BUG: DataFrame.combine_first loses precision for wide integers (GH#60128) #66011
Draft
jbrockmendel wants to merge 1 commit into
…128) Align rows positionally and fill column-by-column from the original arrays instead of reindexing self to the row union, which promoted integer columns through float64 and lost precision for values outside its exactly-representable range. Fully-covered columns now keep their dtype and exact values, including with duplicate row or column labels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
closes #60128
Reimplements `DataFrame.combine_first` to align rows positionally and fill column-by-column, taking values directly from the original arrays instead of reindexing `self` to the row union. The old path routed through `align`, which introduced `NaN` into integer columns and promoted them to `float64` before the values were combined, losing precision for integers outside float64's exactly-representable range (`|n| > 2**53`).

Fully-covered columns now keep their dtype and exact values, including with duplicate row or column labels. This is a single code path, with no special-casing of the unique vs. non-unique case.
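For reviewers who want to see the underlying mechanism in isolation, here is a minimal sketch of the precision loss. It does not call `combine_first` itself. It only shows the general effect of reindexing an `int64` Series to a larger label set, which is the kind of promotion described above. The variable names and the label values `"a"` and `"b"` are illustrative and not taken from the patch or the linked issue.

```python
import numpy as np
import pandas as pd

# Smallest positive integer that float64 cannot represent exactly.
big = 2**53 + 1

s = pd.Series([big], index=["a"], dtype="int64")

# Reindexing to a label set that includes a missing label ("b") introduces
# NaN, which forces the NumPy-backed int64 column to be promoted to float64.
widened = s.reindex(["a", "b"])
promoted_dtype = widened.dtype

# The original value has already been rounded by the time anything is filled.
round_tripped = int(widened["a"])

print(promoted_dtype)        # float64
print(big, round_tripped)    # the two integers differ by 1
```

Once the value has passed through `float64`, casting the result back to `int64` cannot recover it, which is why filling from the original arrays avoids the problem.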
Builds on GH-62814, which fixed the nullable `Int64`/`UInt64` variant of this bug.