Constructor with invalid unicode: automatically fall back to object dtype?

We documented that invalid unicode can no longer be stored in a `str` dtype column (https://pandas.pydata.org/docs/dev/user_guide/migration-3-strings.html#invalid-unicode-input), and that certainly raises an error when you explicitly ask for `str`:

```python
>>> pd.Series(['\ud800'], dtype=str)
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
```

But I am wondering whether we should still fall back to object dtype by default when you do not specify a dtype, i.e. for the default inference. Right now `pd.Series(['\ud800'])` raises the same error.
(Validating that up front might have a performance cost, though; otherwise we could catch the specific error if we know we started without a user-specified dtype.)
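
To make the second option concrete, here is a minimal sketch of the "catch the specific error" idea in plain Python. It is not pandas code: `to_utf8_storage` and `construct` are hypothetical names, and the UTF-8 encoding step only stands in for whatever strict conversion the `str` dtype performs internally.

```python
def to_utf8_storage(values):
    """Hypothetical stand-in for the strict `str` dtype conversion.

    Encoding to UTF-8 raises UnicodeEncodeError for lone surrogates
    such as '\ud800', mirroring the error shown above.
    """
    return [v.encode("utf-8") for v in values]


def construct(values, dtype=None):
    """Return a (dtype_name, data) pair for a list of Python strings.

    dtype=None means "infer"; "str" and "object" are explicit requests.
    """
    if dtype == "object":
        return "object", list(values)
    try:
        # Optimistic path: no separate validation pass over the data.
        return "str", to_utf8_storage(values)
    except UnicodeEncodeError:
        if dtype is None:
            # The dtype was inferred, not requested: fall back to object.
            return "object", list(values)
        # The user explicitly asked for str, so keep raising.
        raise


print(construct(["abc"])[0])      # inferred, valid unicode
print(construct(["\ud800"])[0])   # inferred, invalid unicode
```

With this shape, valid input pays no extra validation cost, because the fallback only runs after the conversion has already failed; an explicit `dtype="str"` request still propagates the `UnicodeEncodeError`.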

Constructor with invalid unicode: automatically fall back to object dtype? #63396

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Constructor with invalid unicode: automatically fall back to object dtype? #63396

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions