docs: document sort-order behavior of unstack() for pandas-origin MultiIndex #11363

Open
emfdavid wants to merge 2 commits into pydata:main from emfdavid:doc/unstack-sorted-coordinates

Conversation

@emfdavid emfdavid commented May 27, 2026

Document unexpected sorting behavior in unstack

This PR only changes docstrings and adds a test demonstrating the behavior.

Summary

When a DataArray/Dataset is built from a pandas object whose MultiIndex was created directly by pandas (e.g. via pd.DataFrame.set_index or pd.MultiIndex.from_arrays), pandas stores MultiIndex.levels in sorted order regardless of insertion order. As a result, the coordinates of the new dimensions after unstack() are sorted, not in original insertion order.
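
The sorted-levels claim can be checked with pandas alone. A minimal sketch with made-up labels (the names `letter` and `number` are illustrative and not taken from the PR):

```python
import pandas as pd

# Labels deliberately given in non-sorted insertion order.
letters = ["b", "b", "a", "a"]
numbers = [2, 1, 2, 1]

mi = pd.MultiIndex.from_arrays([letters, numbers], names=["letter", "number"])

# The levels hold the sorted unique labels of each level ...
level_letters = mi.levels[0].tolist()  # ['a', 'b']
level_numbers = mi.levels[1].tolist()  # [1, 2]

# ... while the codes record each row's position within those levels,
# so the rows themselves are still in insertion order.
letter_codes = mi.codes[0].tolist()  # [1, 1, 0, 0]
row_letters = mi.get_level_values("letter").tolist()  # ['b', 'b', 'a', 'a']
```

Insertion order survives only in the codes; anything that reads `levels` directly sees the sorted labels.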

This is distinct from xarray's own stack(), which preserves insertion order via factorize() for non-monotonic inputs.
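
For contrast, a small sketch of the stack()/unstack() round trip on an array that never passes through a pandas-built MultiIndex. The dimension name `z` and the coordinate values are arbitrary choices for illustration:

```python
import numpy as np
import xarray as xr

# Coordinates deliberately in non-sorted, non-monotonic order.
da = xr.DataArray(
    np.arange(4).reshape(2, 2),
    dims=("letter", "number"),
    coords={"letter": ["b", "a"], "number": [2, 1]},
)

# stack() builds the MultiIndex itself, then unstack() reads it back.
stacked = da.stack(z=("letter", "number"))
roundtrip = stacked.unstack("z")

letters_after = roundtrip["letter"].values.tolist()
numbers_after = roundtrip["number"].values.tolist()
values_after = roundtrip.values.tolist()
```

Here the coordinates should come back in their original order, which is why the sorting only surprises users whose MultiIndex originated in pandas.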

This behavior is silent and can cause subtle bugs when user code extracts .data (a positional NumPy array) and labels it with coordinates assumed to be in insertion order.

Changes

  • Added a Notes / Sort order section to DataArray.unstack and Dataset.unstack docstrings explaining the behavior and how to restore a specific order with .sel()
  • Added test_unstack_coords_are_sorted to test_dataarray.py to demonstrate the behavior via a pd.Series with a pd.MultiIndex
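
The behavior and the .sel() workaround can be sketched as follows. This is not the test added in the PR, just an illustration under assumptions: the data values and the dimension name `z` are made up, and wrapping the index with `xr.Coordinates.from_pandas_multiindex` assumes a reasonably recent xarray (2023.08 or later).

```python
import pandas as pd
import xarray as xr

# Rows are inserted in the order (b, 2), (b, 1), (a, 2), (a, 1).
frame = pd.DataFrame(
    {
        "letter": ["b", "b", "a", "a"],
        "number": [2, 1, 2, 1],
        "value": [10, 20, 30, 40],
    }
)
series = frame.set_index(["letter", "number"])["value"]

# Wrap the pandas-built MultiIndex as xarray coordinates on dimension "z".
coords = xr.Coordinates.from_pandas_multiindex(series.index, "z")
stacked = xr.DataArray(series.to_numpy(), coords=coords, dims=["z"])

unstacked = stacked.unstack("z")

# The new coordinates come from MultiIndex.levels, so they are sorted.
sorted_letters = unstacked["letter"].values.tolist()  # ['a', 'b']
sorted_numbers = unstacked["number"].values.tolist()  # [1, 2]

# .data is positional: row 0 belongs to "a", not the first-inserted "b".
positional = unstacked.data.tolist()

# Label-based selection restores a chosen order explicitly.
restored = unstacked.sel(letter=["b", "a"], number=[2, 1])
restored_values = restored.data.tolist()
```

Code that pairs `positional` with insertion-order labels would silently mislabel every cell; selecting by label before extracting .data avoids that.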

[This PR was opened by Claude on behalf of emfdavid]

@emfdavid emfdavid marked this pull request as ready for review May 27, 2026 19:17
@emfdavid

Author

There are still some CI errors, but they don't appear to be related to my changes.

@dcherian

Contributor

Thanks, the test failures seem related?

@emfdavid emfdavid force-pushed the doc/unstack-sorted-coordinates branch from 166c0eb to 183e208 Compare June 25, 2026 14:46
@emfdavid

Author

> Thanks, the test failures seem related?

Much better after rebase - now just a pixi/numcodecs issue!

emfdavid and others added 2 commits June 26, 2026 14:49
…tiIndex

When a DataArray/Dataset is built from a pandas object whose MultiIndex
was created by pandas directly (e.g. pd.DataFrame.set_index), pandas stores
MultiIndex.levels in sorted order regardless of insertion order. As a
result, the coordinates of the new dimensions after unstack() are sorted,
not in original insertion order.

Add a 'Notes / Sort order' section to DataArray.unstack and Dataset.unstack
docstrings explaining this behavior and how to restore a specific order
with .sel(). Also add test_unstack_coords_are_sorted to demonstrate the
behavior via a pd.Series with a pd.MultiIndex.

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@emfdavid emfdavid force-pushed the doc/unstack-sorted-coordinates branch from 183e208 to e8be126 Compare June 26, 2026 18:49