BUG: read_csv c engine loses float precision with thousands separator (GH#44145)#66010

Draft
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:bug-44145

@jbrockmendel

Member

closes #44145

to_numeric already parses high-precision float strings to the correctly rounded float64 (matching the Python float builtin) via fast_float, but read_csv with a thousands separator did not. precise_xstrtod only dispatched to fast_float when no thousands separator was set, so read_csv(thousands=...) fell back to the bespoke parser, which could be off by up to a couple of ULP on long-mantissa floats (about 12% of random long-mantissa values in a fuzz test).
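To make "off by a couple of ULP" concrete, here is a minimal sketch of how the gap between a parsed value and the correctly rounded result could be measured. It is not pandas code: ulp_distance and naive_parse are hypothetical names, and naive_parse is only an illustrative hand-rolled parser that rounds more than once, not the actual fallback in pandas.

```python
import math
import struct


def float_to_ordered_int(x: float) -> int:
    """Map a float64 to an int so that adjacent floats map to adjacent ints."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats sort in reverse bit order; mirror them around zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float64 steps between a and b."""
    return abs(float_to_ordered_int(a) - float_to_ordered_int(b))


def naive_parse(token: str) -> float:
    """Hypothetical naive parser for plain 'digits.digits' tokens (no sign, no exponent)."""
    int_part, _, frac_part = token.partition(".")
    value = 0.0
    for ch in int_part + frac_part:
        # May round once the accumulated value exceeds 2**53.
        value = value * 10.0 + (ord(ch) - ord("0"))
    # The final scaling is a further rounding step.
    return value / 10.0 ** len(frac_part)


# Adjacent float64 values are exactly 1 ULP apart.
print(ulp_distance(1.0, math.nextafter(1.0, 2.0)))

# float() is correctly rounded; a multi-step parser may land a few ULP away.
token = "0.1234567890123456789012345"
print(ulp_distance(naive_parse(token), float(token)))
```

Running the same comparison over random long-mantissa tokens is one way a fuzz check like the one mentioned above could be set up.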

The fix copies the token into a scratch buffer with the integer-part separators stripped and hands that buffer to fast_float. A thousands separator is only valid in the integer part, where it follows a digit, so the strip is position-aware: separators anywhere else are kept, which makes fast_float stop there. This preserves the strict acceptance of the old fallback for malformed input such as "1,_2", "1_000,000_000", and "1,e1_2". The consumed length is mapped back to the original string so endptr and trailing-character handling stay correct. The no-thousands path is unchanged.
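The position-aware strip and the mapping of the consumed length back to the original string can be sketched in Python as follows. This is an illustration under assumptions, not the C change in this PR: strip_integer_separators and parse_with_thousands are hypothetical names, a regex plus float() stands in for fast_float, and leading whitespace is ignored for brevity.

```python
import re

# Stand-in for fast_float: longest decimal-float prefix of a string (assumption).
_FLOAT_PREFIX = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def _is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def strip_integer_separators(token: str, tsep: str) -> tuple[str, list[int]]:
    """Drop separators that directly follow a digit in the integer part.

    Returns the scratch string and, for each scratch character, its index
    in the original token.
    """
    scratch: list[str] = []
    origin: list[int] = []
    in_integer_part = True
    prev_is_digit = False
    for i, ch in enumerate(token):
        if in_integer_part and ch == tsep and prev_is_digit:
            prev_is_digit = False  # a second consecutive separator is kept
            continue
        if in_integer_part and not _is_digit(ch) and not (i == 0 and ch in "+-"):
            in_integer_part = False  # everything after this is copied verbatim
        prev_is_digit = _is_digit(ch)
        scratch.append(ch)
        origin.append(i)
    return "".join(scratch), origin


def parse_with_thousands(token: str, tsep: str) -> tuple[float, int]:
    """Return (value, consumed), where consumed indexes into the original token."""
    scratch, origin = strip_integer_separators(token, tsep)
    match = _FLOAT_PREFIX.match(scratch)
    if match is None:
        raise ValueError(f"no float prefix in {token!r}")
    n = match.end()
    # Map the scratch offset back to the original string (the endptr analogue).
    consumed = origin[n] if n < len(scratch) else len(token)
    return float(match.group()), consumed


print(parse_with_thousands("1,234,567.891", ","))  # whole token consumed
print(parse_with_thousands("1_000,000_000", "_"))  # stops at the ','
```

Because a misplaced separator stays in the scratch string, the parser stops on it, and the caller sees unconsumed trailing characters and can reject the token just as it would without the strip.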

… (GH#44145)

precise_xstrtod only routed to the correctly-rounded fast_float parser when no
thousands separator was set, so read_csv(thousands=...) fell back to the bespoke
parser and could be off by up to a couple of ULP on long-mantissa floats. Strip
the integer-part separators into a scratch buffer and hand the token to
fast_float, keeping the strict acceptance of the fallback for malformed input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jbrockmendel added the Bug and IO CSV (read_csv, to_csv) labels on Jun 24, 2026

Development

Successfully merging this pull request may close these issues.

BUG: to_numeric str to float has missing digit precision
