PERF: serialize tz-aware datetime to_json from datetime64 ndarray #66007

Draft
jbrockmendel wants to merge 5 commits into pandas-dev:main from jbrockmendel:perf-json-tz

Conversation

@jbrockmendel jbrockmendel commented Jun 24, 2026

Member

Serialize timezone-aware datetime data to JSON directly from the underlying UTC-localized datetime64 ndarray instead of materializing an object array of Timestamp. A "this is UTC" flag is threaded through the encoder so the ISO output keeps its trailing Z.
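As a rough illustration of the flag-threading idea (not the actual C encoder in objToJSON.c), here is a pure-Python sketch. The names `iso_from_epoch_ns`, `encode_column`, and `is_utc` are hypothetical, and the millisecond precision is just a choice for the example: the raw integers are formatted directly, and the flag alone decides whether the trailing Z is appended.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NAT = -(2**63)  # minimum int64, the value NumPy uses to store NaT


def iso_from_epoch_ns(value_ns: int, is_utc: bool) -> Optional[str]:
    """Format nanoseconds since the epoch as an ISO string (hypothetical helper)."""
    if value_ns == NAT:
        return None  # serialized as JSON null
    # Truncate to milliseconds with integer arithmetic to avoid float rounding.
    millis = value_ns // 1_000_000
    dt = _EPOCH + timedelta(milliseconds=millis)
    text = f"{dt:%Y-%m-%dT%H:%M:%S}.{dt.microsecond // 1000:03d}"
    # The flag, not a per-element tz lookup, decides whether "Z" is written.
    return text + "Z" if is_utc else text


def encode_column(values_ns: List[int], is_utc: bool) -> List[Optional[str]]:
    """Encode a whole column, passing the same flag down for every element."""
    return [iso_from_epoch_ns(v, is_utc) for v in values_ns]
```

Because the flag is a property of the column rather than of each element, no per-element Timestamp object is needed to recover the timezone.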

This is now a performance-only change. It originally also fixed Series.to_json(date_format="iso") dropping the trailing Z for tz-aware data, but that correctness bug has since been fixed independently on main (GH-65744), so this PR builds on top of that.

Performance: to_json(date_format="iso") no longer boxes into an object array of Timestamp on every call, both for the values of a timezone-aware datetime Series and for a timezone-aware DatetimeIndex used standalone or as the index/column labels of a Series/DataFrame. This gives roughly an order-of-magnitude speedup (e.g. ~18x for a 500k-element tz-aware Series) with byte-identical output.

Output was verified byte-identical to the equivalent object-dtype (.astype(object)) serialization across Series/DataFrame × all orients × {s, ms, us, ns} units, including NaT.

Note: DataFrame dt64tz data columns still go through the object path (via _mgr.column_arrays / _values_for_json); optimizing those is left as a follow-up.

… ISO "Z"

dt64tz values are now written to JSON directly from the underlying
UTC-localized datetime64 ndarray, threading a "this is UTC" flag through
the encoder so the ISO output keeps its trailing "Z", instead of
materializing an object array of Timestamps. This makes timezone-aware
Index / index / column-label serialization much faster (~30-50x) and
fixes Series.to_json(date_format="iso") dropping the "Z" marker (it had
fallen through to a tz-naive path because a Series exposes tz only via
.dt.tz, not a top-level .tz attribute).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel added the Bug, Performance, IO JSON, and Timezones labels Jun 24, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jun 26, 2026
jbrockmendel and others added 3 commits June 26, 2026 15:45
# Conflicts:
#	pandas/_libs/src/vendored/ujson/python/objToJSON.c
#	pandas/tests/io/json/test_pandas.py
Re-derive the tz-aware to_json fast path on top of main's get_values
(GH#65744): for a dt64tz Series/Index, serialize directly from the
underlying UTC datetime64 ndarray (DatetimeArray._ndarray) instead of
boxing into an object array of Timestamps via _values_for_json. The
existing UTC-flag plumbing keeps the trailing "Z". DataFrame data
columns still go through _values_for_json (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel changed the title from "PERF: serialize tz-aware datetime to_json from datetime64, fix Series ISO "Z"" to "PERF: serialize tz-aware datetime to_json from datetime64 ndarray" Jun 27, 2026