TST: add regression test for combine_first with duplicate index (GH#66009) #66013

Open
AdvenRanises wants to merge 1 commit into pandas-dev:main from AdvenRanises:gh-66009-combine-first

Conversation

@AdvenRanises

Adds a regression test for combine_first behavior with duplicate indices.

This ensures current positional alignment behavior remains stable and guards against future changes that might attempt to enforce uniqueness or alter duplicate handling semantics.

Closes #66009 (clarification via test, not behavioral change).

@AdvenRanises AdvenRanises changed the title TST: add regression test for combine_first with duplicate index (GH-66009) TST: add regression test for combine_first with duplicate index (#66009) Jun 24, 2026
@AdvenRanises AdvenRanises changed the title TST: add regression test for combine_first with duplicate index (#66009) TST: add regression test for combine_first with duplicate index (GH#66009) Jun 24, 2026
@krishgarg344

Hey, tried this myself. If I make s2 int instead of float (with the same duplicate-index setup), combine_first raises ValueError: cannot reindex on an axis with duplicate labels. That comes from a check in Index.reindex that's tagged GH#42568 in the source.

So it looks like pandas has already decided that duplicate indices should raise here. The float/float case seems to skip that check entirely because it takes a different shortcut earlier in the function.

Feels like that's the actual bug, rather than something to lock in with a test. I might be missing some context, but wanted to flag it before this merges and closes #66009.

Here's the error when s2 is made int:

   4251 elif not self.is_unique:
   4252     # GH#42568
-> 4253     raise ValueError("cannot reindex on an axis with duplicate labels")
   4254 else:
   4255     indexer, _ = self.get_indexer_non_unique(target)
ValueError: cannot reindex on an axis with duplicate labels
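
A minimal sketch of the check described above, assuming a recent pandas release. The data is hypothetical, since the Series from #66009 are not shown in this thread, and `reindex_outcome` is an illustrative helper, not part of pandas. It uses the public `Series.reindex` to show that a duplicate index raises when reindexed to a different set of labels, while reindexing to an identical index returns before that check is reached.

```python
import pandas as pd

# Hypothetical data: a Series whose index contains a duplicate label.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "a", "b"])


def reindex_outcome(series: pd.Series, labels) -> str:
    """Return "ok" if reindexing succeeds, otherwise the ValueError message."""
    try:
        series.reindex(labels)
    except ValueError as err:
        return str(err)
    return "ok"


# Different target labels: the duplicate-label check raises ValueError.
different = reindex_outcome(s, ["a", "b", "c"])
print(different)

# Identical target labels: pandas returns a copy without reaching the check.
identical = reindex_outcome(s, s.index)
print(identical)
```

Whether `combine_first` should reach this check for every dtype combination is the open question in this thread; the sketch only illustrates the reindexing behavior itself.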

@jbrockmendel

Member

Why is a test locking down this behavior the right move? Doesn't the linked issue treat it as a bug?

Development

Successfully merging this pull request may close these issues.

BUG: Series.combine_first incorrectly handles duplicate indices instead of raising ValueError

3 participants