
Fix: collapse dimension for single nested tuple MultiIndex key #11421

Open
C1-BA-B1-F3 wants to merge 1 commit into pydata:main from C1-BA-B1-F3:fix-multiindex-nested-tuple

@C1-BA-B1-F3

Problem

When indexing a DataArray with a single nested tuple MultiIndex key (where one level has tuple-valued entries), the correct element is located, but the MultiIndex dimension is preserved in the output instead of being dropped.

import numpy as np
import pandas as pd
import xarray as xr

nested_level_0 = pd.Index([(1, 1), (1, 1), (2, 2), (3, 3)], name="a", tupleize_cols=False)
nested_level_1 = pd.Index([1, 2, 10, 20], name="b")
nested_mi = pd.MultiIndex.from_arrays([nested_level_0, nested_level_1])

nested = xr.DataArray(np.arange(4), dims=("index",), ...)
result = nested.sel(index=((1, 1), 2))

# Before: result.dims == ('index',), result.shape == (1,)  # WRONG
# After:  result.dims == (), result.shape == ()             # CORRECT

Root Cause

In PandasMultiIndex.sel(), the _is_nested_tuple() helper detected that ((1, 1), 2) contains a sub-tuple (1, 1) and routed it to get_locs(), which returns an array. But this tuple is actually a single key for the MultiIndex (level a = (1, 1), level b = 2), and get_loc() correctly returns a scalar integer for it.
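
To illustrate the ambiguity, here is a minimal sketch of a nested-tuple check. The function `looks_nested` is a hypothetical stand-in written for this example; the real `_is_nested_tuple()` helper in xarray may be implemented differently. It shows that a purely structural check cannot tell a single key with a tuple-valued level apart from a multi-value selection.

```python
def looks_nested(label):
    """Hypothetical stand-in: True if `label` is a tuple that contains
    a tuple, list, or slice (assumed behaviour, not xarray's actual code)."""
    return isinstance(label, tuple) and any(
        isinstance(part, (tuple, list, slice)) for part in label
    )


single_key = ((1, 1), 2)         # one full key: level a = (1, 1), level b = 2
multi_select = (1, slice(1, 2))  # a multi-value selection across level b

# Both labels are flagged as nested, although only the second one
# selects several entries; a flat key such as (1, 2) is not flagged.
flags = [
    looks_nested(single_key),
    looks_nested(multi_select),
    looks_nested((1, 2)),
]
```

Because both kinds of label look the same to such a check, the structure of the tuple alone is not enough to decide between a scalar and an array lookup.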

Fix

Try get_loc() first when a nested tuple is detected. If it succeeds (returns a scalar), use that result. If it raises InvalidIndexError or KeyError (e.g., for slice-in-tuple multi-value selections like (1, slice(1, 2))), fall back to get_locs().
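
The control flow can be sketched as follows. This is not the code in the PR: `locate` and `ToyIndex` are hypothetical names introduced for this example, and `ToyIndex` is a small pure-Python stand-in so the sketch runs without pandas. With a real `pandas.MultiIndex`, `fallback_errors` would presumably be `(KeyError, pandas.errors.InvalidIndexError)`.

```python
def locate(index, label, fallback_errors=(KeyError,)):
    """Try a single-key lookup first; fall back to a multi-value lookup.

    `index` is any object exposing get_loc() and get_locs().
    """
    try:
        return index.get_loc(label)   # scalar position for one full key
    except fallback_errors:
        return index.get_locs(label)  # list of positions otherwise


class ToyIndex:
    """Hypothetical two-level index used only to exercise locate()."""

    def __init__(self, keys):
        self.keys = list(keys)

    def get_loc(self, key):
        # Exact match on the full key; anything else raises KeyError.
        try:
            return self.keys.index(key)
        except ValueError:
            raise KeyError(key) from None

    def get_locs(self, key):
        # Match level by level; a slice selects an inclusive value range.
        def matches(value, part):
            if isinstance(part, slice):
                return part.start <= value <= part.stop
            return value == part

        return [
            i
            for i, full_key in enumerate(self.keys)
            if all(matches(v, p) for v, p in zip(full_key, key))
        ]


toy = ToyIndex([((1, 1), 1), ((1, 1), 2), ((2, 2), 10), ((3, 3), 20)])
single = locate(toy, ((1, 1), 2))             # scalar: dimension can collapse
several = locate(toy, ((1, 1), slice(1, 2)))  # list: dimension is kept
```

The ordering matters: the scalar lookup is attempted first, so a label that happens to be a complete key never reaches the array-returning path.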

Tests

Added a regression test test_sel_nested_tuple_key_collapses_dimension that verifies the dimension collapses for single nested tuple keys.

Closes #11341

When indexing with a single nested tuple key (where one level has
tuple-valued entries), the dimension was incorrectly preserved because
_is_nested_tuple() detected the nested tuples and used get_locs() which
returns an array, instead of get_loc() which returns a scalar.

The fix tries get_loc() first when a nested tuple is detected, falling
back to get_locs() only if get_loc() fails (e.g., for slice-in-tuple
multi-value selections).

Fixes pydata#11341
Development

Successfully merging this pull request may close these issues.

A single nested tuple MultiIndex key is located correctly but preserves the dimension