
perf: short-circuit duck-array dispatch helpers for numpy #11354

Open
FBumann wants to merge 6 commits into pydata:main from FBumann:perf/load-chunked-check-overhead


Conversation

FBumann commented May 25, 2026

Description

xarray's duck-array dispatch helpers — is_chunked_array, is_dask_collection, is_duck_dask_array — are called per-variable on every operation that has to branch "dask vs. eager", and ~50 sites across the codebase rely on them. For the dominant case of a numpy.ndarray, none of those should need to enter the dask machinery, but the existing helpers walked the duck-array protocol and (with dask installed) ended up inside dask.base.is_dask_collection anyway.

This PR moves the np.ndarray short-circuit one level deeper, into the helpers themselves:

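  1. is_chunked_array (xarray/namedarray/pycompat.py): previously called is_duck_array twice (once directly, once via is_duck_dask_array). It is rewritten to a single is_duck_array(x) check, which relies on that function's built-in isinstance(x, np.ndarray) fast path. It also checks hasattr(x, "chunks") before is_dask_collection(x). As a result, anything that exposes chunks (dask arrays included) is classified without entering the dask dispatch. A plain ndarray still falls through to is_dask_collection, which the next change makes near-free, so numpy avoids the dask import altogether.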
  2. is_dask_collection (xarray/namedarray/utils.py): now returns False early when type(x) is np.ndarray, before falling through to the dask dispatch. The exact-type check matches the style of #11355 (perf(load): skip Variable.load dispatch for numpy data). A plain ndarray never has __dask_graph__, so the shortcut is a no-op semantically. Any hypothetical ndarray subclass that did implement __dask_graph__ would still fall through to the real check. Because is_duck_dask_array calls is_dask_collection, every is_duck_dask_array call site benefits too.
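The sketch below shows the two rewritten helpers as the list above describes them. It is not the actual diff. `importlib.util.find_spec("dask")` stands in for xarray's own dask-availability check, and `is_duck_array` is copied in so the snippet runs on its own.

```python
from __future__ import annotations

import importlib.util
from typing import Any

import numpy as np


def is_duck_array(value: Any) -> bool:
    # numpy fast path first, then the structural duck-array protocol checks
    if isinstance(value, np.ndarray):
        return True
    return (
        hasattr(value, "ndim")
        and hasattr(value, "shape")
        and hasattr(value, "dtype")
        and (
            (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
            or hasattr(value, "__array_namespace__")
        )
    )


def is_dask_collection(x: object) -> bool:
    # exact-type shortcut: a plain ndarray never carries a dask graph
    if type(x) is np.ndarray:
        return False
    # assumption: find_spec stands in for xarray's own "is dask installed?" check
    if importlib.util.find_spec("dask") is not None:
        from dask.base import is_dask_collection as dask_is_collection

        return dask_is_collection(x)
    return False


def is_duck_dask_array(x: Any) -> bool:
    return is_duck_array(x) and is_dask_collection(x)


def is_chunked_array(x: Any) -> bool:
    # one is_duck_array call; the cheap hasattr runs before the dask dispatch
    return is_duck_array(x) and (hasattr(x, "chunks") or is_dask_collection(x))
```

With this ordering, a plain ndarray costs one isinstance, one failed hasattr and one type comparison. A dask array is classified by hasattr(x, "chunks") alone.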

Behavior is unchanged: any duck-array with a dask graph or a chunks attribute is still reported as chunked.

Where this fires

is_chunked_array and is_duck_dask_array together are called from ~50 sites across xarray — every place that branches "dask vs. numpy". On numpy-backed data, all of them skip the dispatch chain. Notable categories:

  • Materialization: ds.load(), da.load(), compute(), xr.load_dataset/dataarray/datatree, persist(), .values, .to_numpy(), .to_dataframe(), .to_pandas(), plotting, repr previews.
  • CF encode/decode — every variable read or written: coding/times.py, coding/strings.py, coding/common.py.
  • Numerical paths: apply_ufunc (computation/apply_ufunc.py), corr/cov/polyval/polyfit, interp, interpolate_na, reductions.
  • Structural ops: Dataset.chunk(), interp, vectorized indexer dispatch (isel/sel), Variable._shuffle, contains_only_chunked_or_numpy.
  • Groupby internals: groupers.py, groupby.py.
  • Accessors: accessor_dt.py, accessor_str.py.
  • Backends: backends/common.py (ArrayWriter.add).

Not affected: arithmetic on lazy/dask objects (stays lazy); arithmetic on numpy-backed objects (already pure numpy, never reached these helpers).

Benchmark numbers

is_duck_dask_array(numpy) — direct microbench, best of 5×500,000:

                per call
    main        209 ns
    this PR     93 ns
    speedup     2.25×
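One way to take a "best of 5×500,000" measurement is with the standard library's timeit. The harness below is an assumption about the method, not the author's script. timeit turns garbage collection off while timing by default. Taking the minimum over rounds follows the timeit documentation's advice: higher values usually come from other processes interfering, not from the code itself.

```python
import timeit
from typing import Any, Callable


def best_ns_per_call(
    fn: Callable[[Any], Any], arg: Any, number: int = 500_000, repeat: int = 5
) -> float:
    """Run fn(arg) `number` times per round for `repeat` rounds; return the best round in ns/call."""
    rounds = timeit.repeat(
        "fn(arg)", globals={"fn": fn, "arg": arg}, number=number, repeat=repeat
    )
    return min(rounds) / number * 1e9
```

For a like-for-like comparison, pass xarray's is_duck_dask_array and a small numpy array as `fn` and `arg`, once on main and once on this branch. Absolute numbers will differ by machine.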

Indexing.time_indexing_basic_ds_large (added in #9003 for this exact concern), best of 5×50, GC off:

                        per call
    main                0.542 ms
    this PR             ~0.40 ms
    this PR + #11355    0.312 ms
    speedup (this PR)   ~1.36×
    speedup (combined)  ~1.74×
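The speedup rows in both tables are ratios of the main timing to the branch timing. The snippet below checks that arithmetic with the figures reported above. The ~0.40 ms entry is approximate, so its ratio is too.

```python
# Per-call timings copied from the two tables above.
microbench_ns = {"main": 209, "this PR": 93}
indexing_ms = {"main": 0.542, "this PR": 0.40, "this PR + #11355": 0.312}


def speedups(timings: dict[str, float], baseline: str = "main") -> dict[str, float]:
    """Speedup of each variant relative to the baseline: baseline_time / variant_time."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items() if name != baseline}


print({k: round(v, 3) for k, v in speedups(microbench_ns).items()})
print({k: round(v, 3) for k, v in speedups(indexing_ms).items()})
```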

The two PRs are independent (parallel branches off main, no merge conflicts in either order) but the wins compound for numpy-backed Dataset.load: #11354 makes the is_chunked_array call in Dataset.load's dict comprehension (xarray/core/dataset.py:563) near-free, and #11355 then skips the entire to_duck_array body for each numpy _data.

Review history

Earlier revisions of this PR added an explicit isinstance(x, np.ndarray) guard to is_chunked_array directly. Per @Illviljan's review, that duplicated the isinstance already living inside is_duck_array. The current revision drops that duplicate and moves the cheaper numpy short-circuit into is_dask_collection instead, where it benefits every is_duck_dask_array caller as a bonus.

Checklist

  • Tests covering chunked paths preserved — any duck-array with a chunks attribute or a dask graph is still reported as chunked
  • pytest xarray/tests/test_variable.py xarray/tests/test_parallelcompat.py xarray/tests/test_dask.py xarray/tests/test_namedarray.py xarray/namedarray — 825 passed, 72 skipped, 12 xfailed, 4 xpassed
  • doc/whats-new.rst entry under Internal Changes (updated for the widened scope)

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude (Claude Code)


[This is Claude Code on behalf of Felix Bumann]

FBumann and others added 2 commits May 23, 2026 19:21
For datasets with many variables, Dataset.load() called is_chunked_array
once per variable in its dict comprehension, then again per variable via
Variable.load() -> to_duck_array(). The function itself called
is_duck_array twice (once directly, once via is_duck_dask_array).

Add a numpy fast-path and consolidate the duck-array check to one call.
For non-numpy inputs the behavior is unchanged: any duck-array with a
dask graph or a `chunks` attribute is still reported as chunked.

Measured on isel(...).load() of a 400-scalar-var dataset
(asv_bench/benchmarks/indexing.py::Indexing.time_indexing_basic_ds_large):

    base:   0.524 ms / call   (best of 5x50, GC off)
    branch: 0.335 ms / call   ~1.56x

Profile attribution previously showed ~25% of the load wall time inside
the is_chunked_array dispatch chain; that portion is now near-free.

Closes #2 on the fork.

Co-authored-by: Claude <noreply@anthropic.com>
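The sketch below makes the "called is_duck_array twice" point concrete by counting is_duck_array calls for a numpy input. The pre-PR shape of is_chunked_array is reconstructed from the description as `is_duck_dask_array(x) or (is_duck_array(x) and hasattr(x, "chunks"))`. All helpers are simplified stand-ins, so this illustrates the call pattern rather than reproducing xarray's code.

```python
from typing import Any, Callable

import numpy as np

calls = {"is_duck_array": 0}


def is_duck_array(x: Any) -> bool:
    # simplified stand-in that also records how often it is called
    calls["is_duck_array"] += 1
    return isinstance(x, np.ndarray) or hasattr(x, "__array_function__")


def is_dask_collection(x: Any) -> bool:
    # simplified stand-in: no dask involved, just the attribute probe
    return hasattr(x, "__dask_graph__")


def is_duck_dask_array(x: Any) -> bool:
    return is_duck_array(x) and is_dask_collection(x)


def is_chunked_array_old(x: Any) -> bool:
    # reconstructed pre-PR shape: is_duck_array runs inside is_duck_dask_array and again directly
    return is_duck_dask_array(x) or (is_duck_array(x) and hasattr(x, "chunks"))


def is_chunked_array_new(x: Any) -> bool:
    # consolidated shape: a single is_duck_array call
    return is_duck_array(x) and (hasattr(x, "chunks") or is_dask_collection(x))


def count_duck_checks(fn: Callable[[Any], bool], x: Any) -> int:
    calls["is_duck_array"] = 0
    fn(x)
    return calls["is_duck_array"]


a = np.zeros(3)
print(count_duck_checks(is_chunked_array_old, a))  # 2
print(count_duck_checks(is_chunked_array_new, a))  # 1
```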
github-actions bot added the topic-NamedArray (Lightweight version of Variable) label May 25, 2026
The previous `isinstance(x, np.ndarray)` short-circuit incorrectly
returned False for ndarray subclasses with a `chunks` attribute (e.g.
DummyChunkedArray in test_parallelcompat.py, or any third-party chunked
array implementation that subclasses ndarray), breaking chunked-array
detection on those types.

Narrow the fast path to `isinstance + not hasattr("chunks")` so plain
ndarrays and non-chunked subclasses (MaskedArray, np.matrix) still skip
the duck-array dispatch, while subclasses that advertise chunks fall
through to the full check.

Co-authored-by: Claude <noreply@anthropic.com>
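The regression this commit fixes can be reproduced with a small ndarray subclass that advertises chunks, in the spirit of DummyChunkedArray. `ChunkedNdarray` and `full_check` below are hypothetical stand-ins, and only the fast-path guard is modeled.

```python
from typing import Any

import numpy as np


class ChunkedNdarray(np.ndarray):
    """Hypothetical ndarray subclass that advertises a chunks attribute."""

    chunks = ((2, 2),)


def full_check(x: Any) -> bool:
    # placeholder for the slower duck-array + dask dispatch that follows the fast path
    return hasattr(x, "chunks") or hasattr(x, "__dask_graph__")


def is_chunked_first_revision(x: Any) -> bool:
    # too broad: every ndarray, including subclasses with chunks, is reported as unchunked
    if isinstance(x, np.ndarray):
        return False
    return full_check(x)


def is_chunked_narrowed(x: Any) -> bool:
    # narrowed: only ndarrays without a chunks attribute take the shortcut
    if isinstance(x, np.ndarray) and not hasattr(x, "chunks"):
        return False
    return full_check(x)


chunked = np.arange(4).view(ChunkedNdarray)
masked = np.ma.masked_array([1, 2, 3])
print(is_chunked_first_revision(chunked), is_chunked_narrowed(chunked))  # False True
print(is_chunked_narrowed(masked))  # False
```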

dcherian (Contributor) left a comment

Great change. Thanks!

Illviljan (Contributor) left a comment

I think this can be done with is_duck_array; it already has the numpy short-circuit.
Triggering isinstance twice for eager (and lazy) arrays seems wasteful too.

```python
from __future__ import annotations  # keeps the duckarray annotation unevaluated

from typing import Any, TypeGuard

import numpy as np

# duckarray is xarray's own typing alias; it is not imported in this excerpt


def is_duck_array(value: Any) -> TypeGuard[duckarray[Any, Any]]:
    # TODO: replace is_duck_array with runtime checks via _arrayfunction_or_api protocol on
    # python 3.12 and higher (see https://github.com/pydata/xarray/issues/8696#issuecomment-1924588981)
    if isinstance(value, np.ndarray):
        return True
    return (
        hasattr(value, "ndim")
        and hasattr(value, "shape")
        and hasattr(value, "dtype")
        and (
            (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
            or hasattr(value, "__array_namespace__")
        )
    )
```

FBumann and others added 2 commits May 27, 2026 15:33
Per @Illviljan's review feedback on pydata#11354: `is_duck_array` already does
`isinstance(value, np.ndarray)` as its own fast path, so the explicit
`isinstance(x, np.ndarray)` guard in `is_chunked_array` was duplicating
that work — every non-numpy path paid for two isinstance checks instead
of one.

Drop the explicit short-circuit and rely on the one inside
`is_duck_array`. Reorder the remaining checks so `hasattr("chunks")`
runs before `is_dask_collection` — `hasattr` is a cheap C lookup, while
`is_dask_collection` enters the dask dispatch machinery.

To keep `is_dask_collection(numpy)` essentially free on the fall-through
path (and to benefit `is_duck_dask_array` callers across
`duck_array_ops.py`, `variable.py`, `indexing.py`, etc.), add the same
`isinstance(np.ndarray)` short-circuit to `is_dask_collection` itself —
numpy never satisfies `__dask_graph__`, so this is a no-op semantically.

Behavior preserved for every prior case:
- numpy.ndarray and chunkless ndarray subclasses (e.g. MaskedArray) →
  False (via `hasattr("chunks") is False` and the new ndarray guard in
  `is_dask_collection`).
- ndarray subclasses that expose `chunks` (e.g. DummyChunkedArray in
  test_parallelcompat.py) → True (via the `hasattr` branch).
- dask arrays → True (via the `hasattr` branch, without entering the
  `is_dask_collection` call).
- non-array inputs → False (via `is_duck_array`, with one fewer
  isinstance than before).

Co-authored-by: Claude <noreply@anthropic.com>
Reflect the is_dask_collection numpy short-circuit added in the prior
commit and the resulting knock-on speedup for is_duck_dask_array (~2x
on numpy), which ripples through ~28 call sites in duck_array_ops,
variable, indexing, groupby, and the dt / str accessors. Update the
isel().load() figure (1.5x -> 1.4x) to match the post-refactor bench.

Co-authored-by: Claude <noreply@anthropic.com>
FBumann changed the title from "perf(load): short-circuit is_chunked_array for numpy arrays" to "perf: short-circuit duck-array dispatch helpers for numpy" May 27, 2026
Match the style used in PR pydata#11355's to_duck_array / to_numpy guards:
`type(x) is np.ndarray` instead of `isinstance(x, np.ndarray)`. For
the plain-ndarray case (the one we're trying to optimize) the check
is a single C-level pointer comparison; isinstance also short-circuits
on an exact type match, so the speed gain there is marginal. More
importantly, it is strictly
behavior-preserving: any hypothetical ndarray subclass that
implemented `__dask_graph__` would now fall through to the real
dask.base.is_dask_collection check instead of being silently reported
as non-chunked. No such subclass is known in xarray or the common
chunked-array libraries today, but the exact-type form removes the
edge case regardless.

Co-authored-by: Claude <noreply@anthropic.com>
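The two guards differ only for ndarray subclasses. The sketch below uses a hypothetical subclass that implements __dask_graph__. `graph_check` is a simplified stand-in for dask.base.is_dask_collection, which in the common case checks that __dask_graph__() returns something other than None.

```python
from typing import Any

import numpy as np


class GraphNdarray(np.ndarray):
    """Hypothetical ndarray subclass that claims to carry a dask graph."""

    def __dask_graph__(self) -> dict:
        return {"k": 1}


def graph_check(x: Any) -> bool:
    # simplified stand-in for dask's check: a callable __dask_graph__ returning non-None
    method = getattr(x, "__dask_graph__", None)
    return callable(method) and method() is not None


def guard_isinstance(x: Any) -> bool:
    if isinstance(x, np.ndarray):  # also catches every ndarray subclass
        return False
    return graph_check(x)


def guard_exact_type(x: Any) -> bool:
    if type(x) is np.ndarray:  # only the plain ndarray type takes the shortcut
        return False
    return graph_check(x)


plain = np.zeros(3)
graphy = np.zeros(3).view(GraphNdarray)
print(guard_isinstance(plain), guard_exact_type(plain))    # False False
print(guard_isinstance(graphy), guard_exact_type(graphy))  # False True
```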
FBumann added a commit to FBumann/xarray that referenced this pull request May 27, 2026
Keep the type(data) is np.ndarray fast-paths (which short-circuit the
common case in one pointer compare), but tidy the slow paths so they
match the same DRY principle Illviljan raised on pydata#11354:

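- to_numpy: replace `try: data.to_numpy() except AttributeError` with
  `if hasattr(data, "to_numpy")`. Equivalent for inputs with or without
  the method, except that an AttributeError raised inside to_numpy()
  itself now propagates instead of being swallowed; non-ndarray inputs
  that lack the method no longer go through the exception machinery.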
- to_duck_array: restructure so is_duck_array is called once instead of
  twice (previously via is_chunked_array AND in the duck-array branch).
  Pull the ExplicitlyIndexed check up so the duck-array dispatch is
  expressed as a single is_duck_array + dask check.

Measured impact on plain ndarrays (vs main):
  isel(...).load() with 200 scalar vars   1.37x
  isel(...).load() with 2000 scalar vars  1.40x
  DataArray.to_numpy() x 1000             4.57x
  Variable.to_numpy()  x 1000             5.42x

Co-authored-by: Claude <noreply@anthropic.com>
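The sketch below compares the two to_numpy shapes described in this commit. np.asarray is a hypothetical stand-in for the real fallback path, which also handles indexed and chunked data. The two shapes agree for inputs that simply lack the method. They differ only when to_numpy() itself raises AttributeError internally.

```python
from typing import Any

import numpy as np


def to_numpy_try(data: Any) -> np.ndarray:
    # earlier shape: call first, swallow AttributeError
    try:
        return data.to_numpy()
    except AttributeError:
        pass
    return np.asarray(data)  # hypothetical stand-in for the real fallback


def to_numpy_hasattr(data: Any) -> np.ndarray:
    # revised shape: check for the method first, no try/except on the miss path
    if hasattr(data, "to_numpy"):
        return data.to_numpy()
    return np.asarray(data)  # hypothetical stand-in for the real fallback


class BuggyToNumpy:
    """Hypothetical object whose to_numpy() fails with an internal AttributeError."""

    def to_numpy(self) -> np.ndarray:
        return self.missing_attribute  # raises AttributeError inside the method
```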

[This is Claude Code on behalf of Felix Bumann]
dcherian requested review from Illviljan and dcherian June 24, 2026 19:10

dcherian (Contributor) left a comment

Great change. Thanks!


Labels

topic-NamedArray Lightweight version of Variable


3 participants