REGR: registered accessors are no longer cached per DataFrame #63393

@jorisvandenbossche

Searching for it, I found that this change was made on purpose in #58733 (triggered by #47667, #41357). It is also mentioned in the whatsnew notes, although under the Performance section:

Eliminated circular reference in to original pandas object in accessor attributes (e.g. Series.str). However, accessor instantiation is no longer cached (GH 47667, GH 41357)

To start with, we should certainly mention this in the breaking API changes section instead. But I am also wondering whether we are underestimating the impact, and whether there are other options. For example, we could make caching optional (e.g. a cached=True/False keyword in the register function). Or maybe using a weakref would avoid the memory issues?
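To make the weakref idea concrete, here is a minimal sketch in plain Python. It is not the pandas implementation: `CachedAccessor` here is a hypothetical descriptor written for this illustration, and `Frame` is a stand-in class rather than a DataFrame. The assumption is that the accessor can work with a weak proxy of the object, so that caching the accessor on the object does not create a strong reference cycle.

```python
import weakref


class CachedAccessor:
    """Hypothetical descriptor: caches one accessor per object, without a strong cycle."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself: return the accessor class.
            return self._accessor_cls
        # Hand the accessor a weak proxy, so accessor -> obj is not a strong reference.
        accessor = self._accessor_cls(weakref.proxy(obj))
        # This is a non-data descriptor, so an entry with the same name in the
        # instance dict takes precedence on later lookups (this is the cache).
        obj.__dict__[self._name] = accessor
        return accessor


class Frame:
    """Stand-in for a DataFrame (assumption for this sketch, not a pandas class)."""

    def __init__(self, data):
        self.data = data


class MyAccessor:
    def __init__(self, obj):
        self._obj = obj  # a weak proxy, not the object itself

    def total(self):
        return sum(self._obj.data)


Frame.my_accessor = CachedAccessor("my_accessor", MyAccessor)

frame = Frame([1, 2, 3])
first = frame.my_accessor
second = frame.my_accessor
print(first is second)  # True: the accessor is cached on the instance
print(first.total())    # 6: attribute access goes through the proxy
```

One trade-off of this approach: an accessor that outlives its object raises `ReferenceError` when it touches the proxy, so whether this is acceptable for real accessors would need discussion.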

Small reproducer to illustrate:

import pandas as pd

@pd.api.extensions.register_dataframe_accessor("my_accessor")
class MyAccessor:
    def __init__(self, df):
        self._df = df

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.my_accessor is df.my_accessor
False  # <-- this gives True with pandas < 3

The reason I am opening this is that I noticed it breaks the usage pattern of the data_description accessor in pyjanitor.
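For downstream code that relied on the accessor instance persisting, one possible workaround is to keep state on the DataFrame rather than on the accessor. The sketch below assumes that storing the value in `df.attrs` is acceptable for the use case; the `notes` accessor and its methods are hypothetical names for illustration and are not pyjanitor's API.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("notes")  # hypothetical accessor name
class NotesAccessor:
    def __init__(self, df):
        self._df = df

    def set(self, text):
        # Store state on the DataFrame itself, not on this accessor instance,
        # so it survives even if a new accessor is created on every access.
        self._df.attrs["notes"] = text

    def get(self):
        return self._df.attrs.get("notes", "")


df = pd.DataFrame({"a": [1, 2, 3]})
df.notes.set("toy data")
print(df.notes.get())  # toy data
```

This does not depend on whether the accessor is cached, so it should behave the same before and after the change. Note that pandas documents `attrs` as experimental, so this is a stopgap rather than a replacement for a cached accessor.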

Labels: Accessors (accessor registration mechanism, not .str, .dt, .cat); Needs Discussion (requires discussion from core team before further action)
