BUG: Fix Series.combine_first silently ignoring duplicate indices (#66009) #66025
krishgarg344 wants to merge 4 commits into
rhshadrach left a comment
I'm not convinced this behavior should change; left a comment in the issue.
Thanks for taking a look and providing the architectural context, @rhshadrach! That logic makes complete sense, if the indices are strictly identical, positional alignment via

The original catalyst for this PR was the inconsistency in the "slow lane": if you run the exact same identically indexed Series, but force a dtype mismatch (e.g.,

If the fastpath behavior is the intended standard for identical indices, should the slow lane be updated to match it (perhaps bypassing the raise if

Happy to either pivot this PR to address that inconsistency, or simply close this out if the current split behavior is accepted as is!
The fastpath in Series.combine_first (triggered when dtypes and indices match) was skipping the duplicate label check that normally occurs during .reindex(). This caused the method to silently return positionally misaligned results instead of correctly raising a ValueError.
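The duplicate-label check that the description says the fastpath skips can be seen in isolation. The snippet below is only a sketch with made-up data: it does not call combine_first at all (its behavior on duplicate labels is what this PR is debating), it just shows that Index.is_unique detects repeated labels and that .reindex() to a different set of labels raises ValueError when the source index has duplicates.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the label "a" appears twice in the index.
left = pd.Series([1.0, np.nan, 3.0], index=["a", "a", "b"])

# Index.is_unique reports whether any label is repeated.
has_duplicates = not left.index.is_unique

# Reindexing to a different label set cannot resolve which "a" is meant,
# so pandas raises ValueError instead of guessing.
try:
    left.reindex(["a", "b", "c"])
    reindex_raised = False
except ValueError:
    reindex_raised = True

print(has_duplicates, reindex_raised)
```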
Modifications:
- pandas/core/series.py: Added an is_unique check to the fastpath to route duplicate indices to the standard error-raising path.
- pandas/tests/series/methods/test_combine_first.py: Added a test to ensure duplicate indices raise a ValueError.
- doc/source/whatsnew/v3.1.0.rst: Added a release note under the Indexing section.

Closes #66009
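As a rough illustration of the kind of guard described in the modifications, here is a user-level sketch. The wrapper name combine_first_strict and its error message are hypothetical and are not the code in this PR or in pandas; the sketch only shows the idea of checking Index.is_unique before combining.

```python
import numpy as np
import pandas as pd


def combine_first_strict(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical wrapper (not part of pandas): refuse duplicate labels up front."""
    if not (left.index.is_unique and right.index.is_unique):
        # Illustrative message; the real error text in pandas may differ.
        raise ValueError("cannot combine Series with duplicate index labels")
    return left.combine_first(right)


# Unique labels: missing values in `left` are filled from `right`.
left = pd.Series([1.0, np.nan], index=["a", "b"])
right = pd.Series([10.0, 20.0], index=["a", "b"])
combined = combine_first_strict(left, right)

# Duplicate labels: the guard raises before any alignment happens.
dup = pd.Series([1.0, np.nan, 3.0], index=["a", "a", "b"])
try:
    combine_first_strict(dup, dup)
    raised = False
except ValueError:
    raised = True
```

Whether identically indexed Series with duplicate labels should raise at all is the open question in the review, so this shows one possible policy rather than the accepted behavior.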
(Note: As a first-time contributor, I used an LLM as a pair-programming tutor to help me navigate the codebase and structure this PR. All logic, testing, and sandbox verifications were run and validated manually on my local machine before submission.)