PERF: serialize tz-aware datetime to_json from datetime64 ndarray #66007
Draft
jbrockmendel wants to merge 5 commits into
… ISO "Z" dt64tz values are now written to JSON directly from the underlying UTC-localized datetime64 ndarray, threading a "this is UTC" flag through the encoder so the ISO output keeps its trailing "Z", instead of materializing an object array of Timestamps. This makes timezone-aware Index / index / column-label serialization much faster (~30-50x) and fixes Series.to_json(date_format="iso") dropping the "Z" marker (it had fallen through to a tz-naive path because a Series exposes tz only via .dt.tz, not a top-level .tz attribute). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request on Jun 26, 2026
# Conflicts: # pandas/_libs/src/vendored/ujson/python/objToJSON.c # pandas/tests/io/json/test_pandas.py
Re-derive the tz-aware to_json fast path on top of main's get_values (GH#65744): for a dt64tz Series/Index, serialize directly from the underlying UTC datetime64 ndarray (DatetimeArray._ndarray) instead of boxing into an object array of Timestamps via _values_for_json. The existing UTC-flag plumbing keeps the trailing "Z". DataFrame data columns still go through _values_for_json (follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Serialize timezone-aware datetime data to JSON directly from the underlying UTC-localized datetime64 ndarray, threading a "this is UTC" flag through the encoder so the ISO output keeps its trailing Z, instead of materializing an object array of Timestamp.

This is now a performance-only change. It originally also fixed Series.to_json(date_format="iso") dropping the trailing Z for tz-aware data, but that correctness bug has since been fixed independently on main (GH-65744), so this PR builds on top of that.

Performance: for a timezone-aware datetime Series (the values), and for a timezone-aware DatetimeIndex used standalone or as the index/columns labels of a Series/DataFrame, to_json(date_format="iso") no longer boxes into an object array of Timestamp on every call. This gives roughly an order-of-magnitude speedup (e.g. ~18x for a 500k-element tz-aware Series) with byte-identical output.

Output was verified byte-identical to the equivalent object-dtype (.astype(object)) serialization across Series/DataFrame × all orients × {s, ms, us, ns} units, including NaT.

Note: DataFrame dt64tz data columns still go through the object path (via _mgr.column_arrays / _values_for_json); optimizing those is left as a follow-up.
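
For readers unfamiliar with the idea, here is a minimal standard-library sketch of the "UTC flag" approach described above. It is not the pandas or ujson implementation: the function names (iso_from_epoch_ms, encode_fast, encode_boxed) and the millisecond unit are assumptions chosen for illustration. It formats raw integer epoch values directly and appends "Z" only when the caller says the data is UTC, then compares that with a path that first boxes every value into a tz-aware object.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration only; not pandas internals.
NAIVE_EPOCH = datetime(1970, 1, 1)
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_from_epoch_ms(value_ms, is_utc):
    """Format an integer epoch-millisecond value as ISO 8601.

    None stands in for NaT. The trailing "Z" is appended only when the
    caller passes is_utc=True, mirroring a flag threaded through an encoder.
    """
    if value_ms is None:
        return None
    text = (NAIVE_EPOCH + timedelta(milliseconds=value_ms)).isoformat(
        timespec="milliseconds"
    )
    return text + "Z" if is_utc else text


def encode_fast(values_ms, is_utc):
    """Serialize straight from the integer values, with no per-element tz objects."""
    return json.dumps([iso_from_epoch_ms(v, is_utc) for v in values_ms])


def encode_boxed(values_ms):
    """Reference path: box each value into a tz-aware datetime, then format it."""
    out = []
    for v in values_ms:
        if v is None:
            out.append(None)
            continue
        boxed = UTC_EPOCH + timedelta(milliseconds=v)
        # isoformat() renders UTC as "+00:00"; normalize it to "Z".
        out.append(boxed.isoformat(timespec="milliseconds").replace("+00:00", "Z"))
    return json.dumps(out)


values = [0, 1577836800123, None]
fast = encode_fast(values, is_utc=True)
boxed = encode_boxed(values)
print(fast)
print(fast == boxed)
```

In this sketch the two paths produce the same string, which is the same kind of byte-identical check the description reports against the object-dtype serialization; passing is_utc=False yields the tz-naive form without the "Z".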