fix: handle non-numpy dtypes in PandasIndex.concat() and join() #11409

Open
C1-BA-B1-F3 wants to merge 3 commits into pydata:main from C1-BA-B1-F3:fix-concat-string-dtype

Description

This PR fixes an issue where concat fails when mixing string coordinates from different sources (e.g., numpy string dtype and pandas StringDtype).

Problem

When concatenating DataArrays where one has a numpy string dtype coordinate and another has a pandas StringDtype coordinate (the dtype that pd.Index infers for string data by default in pandas 3.0), np.result_type() fails with:

TypeError: Cannot interpret '' as a data type

Root Cause

In PandasIndex.concat() and PandasIndex.join(), np.result_type() is called with coordinate dtypes without checking if they are valid numpy dtypes first. Pandas extension dtypes (like StringDtype) are not valid numpy dtypes.
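The failure can be reproduced outside xarray. The snippet below is a minimal sketch, assuming NumPy and pandas are installed. It passes an explicitly constructed `pd.StringDtype()` straight to `np.result_type()` instead of going through `PandasIndex`, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

numpy_str = np.dtype("<U1")    # fixed-width numpy string dtype
pandas_str = pd.StringDtype()  # pandas extension dtype, not an np.dtype

# Two numpy dtypes promote without trouble.
promoted = np.result_type(numpy_str, np.dtype("<U3"))

# Mixing in the extension dtype raises, because numpy cannot convert it
# to one of its own dtypes.
try:
    np.result_type(numpy_str, pandas_str)
    raised = False
except TypeError:
    raised = True
```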

Fix

  • Check if all dtypes are valid numpy dtypes before calling np.result_type()
  • Fall back to object dtype if any dtype is not a valid numpy dtype
  • Add a regression test for the fix
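The check-then-fall-back logic listed above could look roughly like the sketch below. `result_type_or_object` is a hypothetical helper name, and using `isinstance(dt, np.dtype)` as the validity test is an assumption; the actual change in `PandasIndex.concat()` and `PandasIndex.join()` may be written differently.

```python
import numpy as np
import pandas as pd


def result_type_or_object(dtypes):
    """Hypothetical helper: promote numpy dtypes, else fall back to object."""
    dtypes = list(dtypes)
    if all(isinstance(dt, np.dtype) for dt in dtypes):
        return np.result_type(*dtypes)
    # At least one entry is not a numpy dtype (e.g. a pandas extension dtype),
    # so np.result_type() would raise; use object dtype instead.
    return np.dtype(object)


mixed = result_type_or_object([np.dtype("<U1"), pd.StringDtype()])
numeric = result_type_or_object([np.dtype("int32"), np.dtype("float64")])
```

With these inputs, `mixed` is the object dtype and `numeric` is promoted by NumPy as usual, so the fallback only applies when a non-numpy dtype is present.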

Fixes GH#11317

Tests

  • Added test_concat_string_dtype_from_pd_index regression test
  • All existing concat tests pass (140 passed, 2 skipped)
  • All existing indexes tests pass (75 passed, 2 skipped)

Problem: When a MultiIndex level contains tuple-valued entries (e.g., (1,1)),
selecting with a nested tuple key like ((1,1), 2) incorrectly preserved the
dimension instead of collapsing it to a scalar result.

Root cause: _is_nested_tuple() was checking for 'tuple' in addition to 'list'
and 'slice', which caused it to misidentify tuple-valued keys as nested
selection tuples.

Fix: Remove 'tuple' from the isinstance check in _is_nested_tuple() so that
only 'list' and 'slice' are treated as indicators of nested selections. Tuple-
valued keys in MultiIndex levels are now correctly handled as scalar key values.
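
The change can be illustrated with a standalone sketch. Both functions below are hypothetical stand-ins written from the description above, not copies of xarray's `_is_nested_tuple()`; the outer `isinstance(key, tuple)` check in particular is an assumption.

```python
def is_nested_tuple_before(key):
    # Behaviour described as the root cause: inner tuples count as nesting.
    return isinstance(key, tuple) and any(
        isinstance(value, (tuple, list, slice)) for value in key
    )


def is_nested_tuple_after(key):
    # Behaviour described as the fix: only lists and slices count as nesting.
    return isinstance(key, tuple) and any(
        isinstance(value, (list, slice)) for value in key
    )


key = ((1, 1), 2)  # (1, 1) is a single label in a tuple-valued level
before = is_nested_tuple_before(key)
after = is_nested_tuple_after(key)

# Keys that contain a list or a slice are still treated as nested selections.
still_nested = is_nested_tuple_after(([1, 2], slice(None)))
```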

Added regression test for selecting with nested tuple keys on MultiIndex with
tuple-valued levels.

When concatenating indexes with mixed string types (e.g., numpy string dtype
and pandas StringDtype), np.result_type() fails because it cannot interpret
extension dtypes. This fix checks if all dtypes are valid numpy dtypes before
calling np.result_type(), falling back to object dtype if not.

Fixes GH#11317