Description
After some searching, I found that this change was made on purpose in #58733 (triggered by #47667, #41357). It is also mentioned in the whatsnew notes, although under the Performance section:
Eliminated circular reference in to original pandas object in accessor attributes (e.g. Series.str). However, accessor instantiation is no longer cached (GH 47667, GH 41357)
For a start, we should certainly mention this in the breaking API changes section instead. But I am also wondering whether we are underestimating the impact, and whether there are other options. For example, we could make caching optional (e.g. a cached=True/False keyword in the register function). Or maybe using a weakref would avoid the memory issues?
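To make the weakref idea concrete, here is a minimal pure-Python sketch. It is not how pandas implements accessors: `WeakCachedAccessor` and `Frame` are hypothetical names, with `Frame` standing in for a DataFrame. The object holds the accessor strongly (so it is cached), while the accessor only holds a weak proxy back to the object (so there is no reference cycle).

```python
import weakref


class WeakCachedAccessor:
    """Hypothetical descriptor: caches the accessor on the instance,
    but hands the accessor only a weak proxy to that instance."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself, not on an instance.
            return self._accessor_cls
        # The accessor receives a weak proxy, so it does not keep obj alive.
        accessor = self._accessor_cls(weakref.proxy(obj))
        # This is a non-data descriptor, so the instance attribute set here
        # shadows it on later lookups: the accessor is created only once.
        obj.__dict__[self._name] = accessor
        return accessor


class MyAccessor:
    def __init__(self, obj):
        self._obj = obj


class Frame:
    """Stand-in for a DataFrame (assumption, for illustration only)."""

    my_accessor = WeakCachedAccessor("my_accessor", MyAccessor)

    def __init__(self, data):
        self.data = data


frame = Frame([1, 2, 3])
same_object = frame.my_accessor is frame.my_accessor
```

One trade-off of this design: an accessor that outlives its parent object raises `ReferenceError` when it touches the proxy, so code that stores the accessor separately from the object would break in a different way.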
Small reproducer to illustrate:
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("my_accessor")
class MyAccessor:
    def __init__(self, df):
        self._df = df

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.my_accessor is df.my_accessor
False  # <-- this gives True with pandas < 3

The reason I am opening this is that I noticed this breaks the usage pattern of the data_description accessor in pyjanitor.