Should from_dict be annotated to allow sequence data values? #929

Open
Fruglemonkey opened this issue May 26, 2024 · 8 comments
Labels
pandas_docs For issues where there is a conflict in behavior with pandas docs and stubs that needs resolution

Comments

@Fruglemonkey

Describe the bug
It's not explicitly documented, but DataFrame.from_dict allows a Sequence of values to be passed in for data.

Is this something we want to annotate? It's fairly easy to demonstrate the failure case, but I'm not sure if this would be 'wider' than we'd want to allow for.

To Reproduce
Example code demonstrating the issue with mypy version 1.10.0, pandas 2.2.2

import pandas as pd

tuple_of_dicts = (
    {'A': 1, 'B': 0},
    {'A': 0},
)

list_of_dicts = [
    {'A': 1, 'B': 0},
    {'A': 0},
]

tuple_df = pd.DataFrame.from_dict(data=tuple_of_dicts)
list_df = pd.DataFrame.from_dict(data=list_of_dicts)

print(tuple_df.to_string())
print()
print(list_df.to_string())

I receive the following errors:

error: No overload variant of "from_dict" of "DataFrame" matches argument type "tuple[dict[str, int], dict[str, int]]" [call-overload]

and

error: No overload variant of "from_dict" of "DataFrame" matches argument type "list[dict[str, int]]" [call-overload]

The code, however, runs as you'd expect:

python foo.py
   A    B
0  1  0.0
1  0  NaN

   A    B
0  1  0.0
1  0  NaN

Additional context
Happy to raise the PR to address this, just wanted to check what the intended behaviour is first before doing the work.

@twoertwein
Member

It's not explicitly documented, but DataFrame.from_dict allows a Sequence of values to be passed in for data.

I would say no, as not even a list/array-like is documented to be accepted.

Depending on pandas-dev/pandas#58814 we should widen the type to Mapping.
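To illustrate what widening from dict to Mapping would and would not change, here is a minimal stdlib-only sketch. The functions `from_dict_narrow` and `from_dict_wide` are hypothetical stand-ins for the two annotations, not the actual pandas-stubs signatures.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def from_dict_narrow(data: dict[Any, Any]) -> list[Any]:
    """Hypothetical stand-in for an annotation that only accepts dict."""
    return list(data)


def from_dict_wide(data: Mapping[Any, Any]) -> list[Any]:
    """Hypothetical stand-in for an annotation widened to Mapping."""
    return list(data)


# MappingProxyType is a Mapping but not a dict, so a type checker would
# reject from_dict_narrow(read_only) and accept from_dict_wide(read_only).
read_only = MappingProxyType({"A": [1, 0], "B": [0, 2]})
columns = from_dict_wide(read_only)

# A list of dicts is a Sequence, not a Mapping, so widening to Mapping
# alone would still leave the calls from this issue flagged.
list_of_dicts = [{"A": 1, "B": 0}, {"A": 0}]
list_is_mapping = isinstance(list_of_dicts, Mapping)
```

Under these assumptions, widening to Mapping helps read-only or custom mappings but does not by itself answer the Sequence question raised here.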

@Dr-Irv

@twoertwein
Member

The summary says "Construct DataFrame from dict of array-like or dicts.", so I think we should allow a sequence here.

@Dr-Irv
Collaborator

Dr-Irv commented May 28, 2024

This is related to #928. It's not clear whether the docs are wrong about the data argument of DataFrame.from_dict(). So whether to accept a Sequence should be decided by creating an issue in pandas to see whether we want to allow this undocumented behavior.

@Fruglemonkey
Author

Fruglemonkey commented May 29, 2024

The DataFrame.__init__ codepath explicitly supports it: https://github.com/pandas-dev/pandas/blob/v2.2.2/pandas/core/frame.py#L836

        # For data is list-like, or Iterable (will consume into list)
        elif is_list_like(data):

And the codepath for is_list_like: https://github.com/pandas-dev/pandas/blob/v2.2.2/pandas/_libs/lib.pyx#L1201

    Check if the object is list-like.

    Objects that are considered list-like are for example Python
    lists, tuples, sets, NumPy arrays, and Pandas Series.

    Strings and datetime objects, however, are not considered list-like.

So I suspect it's just a matter of the documentation being out of date.
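As a rough illustration of the docstring quoted above, the following stdlib-only sketch approximates the described behaviour. `is_list_like_sketch` is a hypothetical helper, not the pandas implementation, which also handles cases such as NumPy arrays and Series that are left out here.

```python
from collections.abc import Iterable
from datetime import datetime


def is_list_like_sketch(obj: object) -> bool:
    """Approximate the documented rule: iterable, but not a string or bytes.

    This is an assumption-based simplification, not pandas' own check.
    """
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


tuple_of_dicts = ({"A": 1, "B": 0}, {"A": 0})
list_of_dicts = [{"A": 1, "B": 0}, {"A": 0}]

results = {
    "tuple": is_list_like_sketch(tuple_of_dicts),
    "list": is_list_like_sketch(list_of_dicts),
    "set": is_list_like_sketch({1, 2}),
    "str": is_list_like_sketch("AB"),
    # datetime objects are not iterable, so they fall out naturally.
    "datetime": is_list_like_sketch(datetime(2024, 5, 29)),
}
```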

@Dr-Irv
Collaborator

Dr-Irv commented May 29, 2024

So I suspect it's just a matter of the documentation being out of date.

Yes, but we are talking about from_dict() here, not __init__(). So it could be the case that the documentation is correct, but the implementation should be rejecting a list of dictionaries inside of from_dict().

@Fruglemonkey
Author

Ah, I see what you mean - people should be using from_records instead. I think I agree.

@Dr-Irv
Collaborator

Dr-Irv commented May 29, 2024

Ah, I see what you mean - people should be using from_records instead. I think I agree.

Or just use the regular DataFrame constructor, which does accept a sequence of dicts
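For reference, a short sketch of the alternatives mentioned in this thread, reusing the data from the original report. The variable names are illustrative, and no output is shown because it was not re-run for this write-up.

```python
import pandas as pd

list_of_dicts = [
    {"A": 1, "B": 0},
    {"A": 0},
]

# The regular constructor accepts a sequence of dicts; keys missing from a
# row are filled with NaN.
constructor_df = pd.DataFrame(list_of_dicts)

# from_records documents a sequence of dicts as valid input.
records_df = pd.DataFrame.from_records(list_of_dicts)

# from_dict with the documented input shape: a dict of array-likes.
dict_df = pd.DataFrame.from_dict({"A": [1, 0], "B": [0, None]})
```

All three calls use input shapes that the respective pandas docs describe, so they should not depend on the undocumented from_dict behavior discussed above.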

@pmaier-bhs

Created a pandas issue, see pandas-dev/pandas#58862.

@Dr-Irv Dr-Irv added the pandas_docs For issues where there is a conflict in behavior with pandas docs and stubs that needs resolution label Jan 10, 2025
4 participants