Skip to content

BUG: Fix Series.combine_first silently ignoring duplicate indices (#66009)#66025

Open
krishgarg344 wants to merge 4 commits into
pandas-dev:mainfrom
krishgarg344:fix-combine-first-duplicate-index
Open

BUG: Fix Series.combine_first silently ignoring duplicate indices (#66009)#66025
krishgarg344 wants to merge 4 commits into
pandas-dev:mainfrom
krishgarg344:fix-combine-first-duplicate-index

Conversation

@krishgarg344

@krishgarg344 krishgarg344 commented Jun 25, 2026

Copy link
Copy Markdown

The fastpath in Series.combine_first (triggered when dtypes and indices match) was skipping the duplicate label check that normally occurs during .reindex(). This caused the method to silently return positionally misaligned results instead of correctly raising a ValueError.

Modifications:

  • pandas/core/series.py: Added an is_unique check to the fastpath to route duplicate indices to the standard error raising path.
  • pandas/tests/series/methods/test_combine_first.py: Added a test to ensure duplicate indices raise a ValueError.
  • doc/source/whatsnew/v3.1.0.rst: Added a release note under the Indexing section.

Closes #66009

(Note: As a first time contributor, I used an LLM as a pair programming tutor to help me navigate the codebase and structure this PR. All logic, testing, and sandbox verifications were executed and validated manually locally before submission.)

@krishgarg344

Copy link
Copy Markdown
Author

pre-commit.ci autofix

@rhshadrach rhshadrach left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this behavior should change; left a comment in the issue.

@krishgarg344

Copy link
Copy Markdown
Author

Thanks for taking a look and providing the architectural context, @rhshadrach!

That logic makes complete sense, if the indices are strictly identical, positional alignment via .mask() avoids the ambiguity that standard .reindex() faces, making the output perfectly predictable.

The original catalyst for this PR was the inconsistency in the "slow lane": if you run the exact same identically indexed Series, but force a dtype mismatch (e.g., int64 vs float64), it falls through to .reindex() and raises the ValueError.

If the fastpath behavior is the intended standard for identical indices, should the slow lane be updated to match it (perhaps bypassing the raise if self.index.equals(other.index) regardless of dtype)?

Happy to either pivot this PR to address that inconsistency, or simply close this out if the current split behavior is accepted as is!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Series.combine_first incorrectly handles duplicate indices instead of raising ValueError

2 participants