perf: short-circuit duck-array dispatch helpers for numpy by FBumann · Pull Request #11354 · pydata/xarray

FBumann · 2026-05-25T16:51:49Z

Description

xarray's duck-array dispatch helpers — is_chunked_array, is_dask_collection, is_duck_dask_array — are called per-variable on every operation that has to branch "dask vs. eager", and ~50 sites across the codebase rely on them. For the dominant case of a numpy.ndarray, none of those should need to enter the dask machinery, but the existing helpers walked the duck-array protocol and (with dask installed) ended up inside dask.base.is_dask_collection anyway.

This PR moves the np.ndarray short-circuit one level deeper, into the helpers themselves:

- is_chunked_array (xarray/namedarray/pycompat.py): previously called is_duck_array twice (once directly, once via is_duck_dask_array). It now makes a single is_duck_array(x) check, leaning on that function's built-in isinstance(x, np.ndarray) fast path, and checks hasattr(x, "chunks") before is_dask_collection(x). Arrays that expose chunks are recognized without entering is_dask_collection, and plain numpy arrays fall through to is_dask_collection, which now returns early (next item), so numpy avoids the dask import altogether.
- is_dask_collection (xarray/namedarray/utils.py): now returns False early when type(x) is np.ndarray, before falling through to the dask dispatch. The check is exact-type (matching the style of #11355, "perf(load): skip Variable.load dispatch for numpy data"): a plain ndarray never implements __dask_graph__, so this is a semantic no-op, while any hypothetical ndarray subclass that did implement __dask_graph__ still falls through to the real check. This also removes the cost from every is_duck_dask_array call site.

Behavior is unchanged: any duck-array with a dask graph or a chunks attribute is still reported as chunked.
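The two helper changes described above can be sketched roughly as follows. This is a simplified, standalone approximation, not the actual xarray source: the `try`/`except` import is an assumption standing in for however xarray detects dask, and `is_duck_array` is copied from the snippet quoted in the review with its type annotations reduced to `bool`.

```python
from typing import Any

import numpy as np


def is_duck_array(value: Any) -> bool:
    # ndarray fast path first, then the attribute-based duck-array check.
    if isinstance(value, np.ndarray):
        return True
    return (
        hasattr(value, "ndim")
        and hasattr(value, "shape")
        and hasattr(value, "dtype")
        and (
            (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
            or hasattr(value, "__array_namespace__")
        )
    )


def is_dask_collection(x: Any) -> bool:
    # Exact-type guard: a plain ndarray never carries a dask graph.
    if type(x) is np.ndarray:
        return False
    # Assumption: a guarded import stands in for xarray's dask detection.
    try:
        from dask.base import is_dask_collection as dask_is_collection
    except ImportError:
        return False
    return bool(dask_is_collection(x))


def is_chunked_array(x: Any) -> bool:
    # One is_duck_array call; the cheap hasattr check runs before the dask check.
    return is_duck_array(x) and (hasattr(x, "chunks") or is_dask_collection(x))
```

With this ordering, an object that exposes `chunks` is reported as chunked without reaching `is_dask_collection`, and a plain `np.ndarray` reaches it only to hit the exact-type guard.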

Where this fires

is_chunked_array and is_duck_dask_array together are called from ~50 sites across xarray — every place that branches "dask vs. numpy". On numpy-backed data, all of them skip the dispatch chain. Notable categories:

- Materialization: ds.load(), da.load(), compute(), xr.load_dataset/dataarray/datatree, persist(), .values, .to_numpy(), .to_dataframe(), .to_pandas(), plotting, repr previews.
- CF encode/decode (every variable read or written): coding/times.py, coding/strings.py, coding/common.py.
- Numerical paths: apply_ufunc (computation/apply_ufunc.py), corr/cov/polyval/polyfit, interp, interpolate_na, reductions.
- Structural ops: Dataset.chunk(), interp, vectorized indexer dispatch (isel/sel), Variable._shuffle, contains_only_chunked_or_numpy.
- Groupby internals: groupers.py, groupby.py.
- Accessors: accessor_dt.py, accessor_str.py.
- Backends: backends/common.py (ArrayWriter.add).

Not affected: arithmetic on lazy/dask objects (stays lazy); arithmetic on numpy-backed objects (already pure numpy, never reached these helpers).

Benchmark numbers

is_duck_dask_array(numpy) — direct microbench, best of 5×500,000:

| | per call |
|---|---|
| `main` | 209 ns |
| this PR | 93 ns |
| speedup | 2.25× |
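A "best of 5×500,000" measurement of this kind could be reproduced with the standard-library `timeit` module, roughly as sketched below. The two check functions are hypothetical stand-ins rather than the xarray helpers, so the numbers they produce will not match the table; the per-call figure also includes the overhead of the wrapping lambda.

```python
import timeit

import numpy as np


def check_with_protocol_walk(x):
    # Hypothetical slow path: probe attributes before deciding.
    return hasattr(x, "__dask_graph__") and hasattr(x, "chunks")


def check_with_short_circuit(x):
    # Hypothetical fast path: exact-type guard before any attribute probing.
    if type(x) is np.ndarray:
        return False
    return hasattr(x, "__dask_graph__") and hasattr(x, "chunks")


def best_ns_per_call(func, arg, repeat=5, number=500_000):
    # "Best of repeat x number": take the fastest batch, divide by its size.
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    arr = np.arange(10)
    for func in (check_with_protocol_walk, check_with_short_circuit):
        print(f"{func.__name__}: {best_ns_per_call(func, arr):.0f} ns per call")
```

Taking the minimum over repeats, rather than the mean, reduces the influence of scheduler noise on very short calls. `timeit` also disables garbage collection while timing by default.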

Indexing.time_indexing_basic_ds_large (added in #9003 for this exact concern), best of 5×50, GC off:

| | per call |
|---|---|
| `main` | 0.542 ms |
| this PR | ~0.40 ms |
| this PR + #11355 | 0.312 ms |
| speedup (this PR) | ~1.36× |
| speedup (combined) | ~1.74× |

The two PRs are independent (parallel branches off main, no merge conflicts in either order) but the wins compound for numpy-backed Dataset.load: #11354 makes the is_chunked_array call in Dataset.load's dict comprehension (xarray/core/dataset.py:563) near-free, and #11355 then skips the entire to_duck_array body for each numpy _data.

Review history

Earlier revisions of this PR added an explicit isinstance(x, np.ndarray) guard to is_chunked_array directly. Per @Illviljan's review, that duplicated the isinstance already living inside is_duck_array. The current revision drops that duplicate and moves the cheaper numpy short-circuit into is_dask_collection instead, where it benefits every is_duck_dask_array caller as a bonus.

Checklist

- Tests covering chunked paths preserved: any duck-array with a chunks attribute or a dask graph is still reported as chunked
- pytest xarray/tests/test_variable.py xarray/tests/test_parallelcompat.py xarray/tests/test_dask.py xarray/tests/test_namedarray.py xarray/namedarray: 825 passed, 72 skipped, 12 xfailed, 4 xpassed
- doc/whats-new.rst entry under Internal Changes (updated for the widened scope)

AI Disclosure

This PR contains AI-generated content.
- I have tested any AI-generated content in my PR.
- I take responsibility for any AI-generated content in my PR.

Tools: Claude (Claude Code)

[This is Claude Code on behalf of Felix Bumann]

For datasets with many variables, Dataset.load() called is_chunked_array once per variable in its dict comprehension, then again per variable via Variable.load() -> to_duck_array(). The function itself called is_duck_array twice (once directly, once via is_duck_dask_array). Add a numpy fast-path and consolidate the duck-array check to one call.

For non-numpy inputs the behavior is unchanged: any duck-array with a dask graph or a `chunks` attribute is still reported as chunked.

Measured on isel(...).load() of a 400-scalar-var dataset (asv_bench/benchmarks/indexing.py::Indexing.time_indexing_basic_ds_large):

base: 0.524 ms / call (best of 5x50, GC off)
branch: 0.335 ms / call ~1.56x

Profile attribution previously showed ~25% of the load wall time inside the is_chunked_array dispatch chain; that portion is now near-free.

Closes #2 on the fork.

Co-authored-by: Claude <noreply@anthropic.com>

for more information, see https://pre-commit.ci

The previous `isinstance(x, np.ndarray)` short-circuit incorrectly returned False for ndarray subclasses with a `chunks` attribute (e.g. DummyChunkedArray in test_parallelcompat.py, or any third-party chunked array implementation that subclasses ndarray), breaking chunked-array detection on those types. Narrow the fast path to `isinstance + not hasattr("chunks")` so plain ndarrays and non-chunked subclasses (MaskedArray, np.matrix) still skip the duck-array dispatch, while subclasses that advertise chunks fall through to the full check. Co-authored-by: Claude <noreply@anthropic.com>
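The regression described in this commit message can be illustrated with a small standalone sketch. `ChunkedNdarray` is a hypothetical stand-in for a test double such as DummyChunkedArray, and the "full check" is reduced to a bare `hasattr` for brevity, so neither function is the actual xarray code.

```python
import numpy as np


class ChunkedNdarray(np.ndarray):
    # Hypothetical ndarray subclass that advertises chunks.
    chunks = ((2, 2),)


def is_chunked_naive(x):
    # Overly broad fast path: every ndarray subclass is reported as unchunked.
    if isinstance(x, np.ndarray):
        return False
    return hasattr(x, "chunks")


def is_chunked_narrowed(x):
    # Narrowed fast path: only ndarrays without a chunks attribute skip the check.
    if isinstance(x, np.ndarray) and not hasattr(x, "chunks"):
        return False
    return hasattr(x, "chunks")  # simplified stand-in for the full check


plain = np.arange(4)
masked = np.ma.masked_array([1, 2, 3, 4])
chunked = np.arange(4).view(ChunkedNdarray)

print(is_chunked_naive(chunked), is_chunked_narrowed(chunked))
```

Plain ndarrays and `MaskedArray` are reported as unchunked by both variants; only the subclass exposing `chunks` distinguishes them.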

dcherian

Great change. Thanks!

Illviljan

I think this can be done with is_duck_array, it has the numpy short-circuit already.
Triggering isinstance twice for eager (and lazy) arrays seems wasteful too.

xarray/xarray/namedarray/utils.py

Lines 78 to 91 in d022da5

    
    def is_duck_array(value: Any) -> TypeGuard[duckarray[Any, Any]]:
        # TODO: replace is_duck_array with runtime checks via _arrayfunction_or_api protocol on
        # python 3.12 and higher (see https://github.com/pydata/xarray/issues/8696#issuecomment-1924588981)
        if isinstance(value, np.ndarray):
            return True
        return (
            hasattr(value, "ndim")
            and hasattr(value, "shape")
            and hasattr(value, "dtype")
            and (
                (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
                or hasattr(value, "__array_namespace__")
            )
        )

@Illviljan

Per @Illviljan's review feedback on pydata#11354: `is_duck_array` already does `isinstance(value, np.ndarray)` as its own fast path, so the explicit `isinstance(x, np.ndarray)` guard in `is_chunked_array` was duplicating that work: every non-numpy path paid for two isinstance checks instead of one. Drop the explicit short-circuit and rely on the one inside `is_duck_array`.

Reorder the remaining checks so `hasattr("chunks")` runs before `is_dask_collection`: `hasattr` is a cheap C lookup, while `is_dask_collection` enters the dask dispatch machinery.

To keep `is_dask_collection(numpy)` essentially free on the fall-through path (and to benefit `is_duck_dask_array` callers across `duck_array_ops.py`, `variable.py`, `indexing.py`, etc.), add the same `isinstance(np.ndarray)` short-circuit to `is_dask_collection` itself; numpy never satisfies `__dask_graph__`, so this is a no-op semantically.

Behavior preserved for every prior case:

- numpy.ndarray and chunkless ndarray subclasses (e.g. MaskedArray) → False (via `hasattr("chunks") is False` and the new ndarray guard in `is_dask_collection`).
- ndarray subclasses that expose `chunks` (e.g. DummyChunkedArray in test_parallelcompat.py) → True (via the `hasattr` branch).
- dask arrays → True (via the `hasattr` branch, without entering the `is_dask_collection` call).
- non-array inputs → False (via `is_duck_array`, with one fewer isinstance than before).

Co-authored-by: Claude <noreply@anthropic.com>

Reflect the is_dask_collection numpy short-circuit added in the prior commit and the resulting knock-on speedup for is_duck_dask_array (~2x on numpy), which ripples through ~28 call sites in duck_array_ops, variable, indexing, groupby, and the dt / str accessors. Update the isel().load() figure (1.5x -> 1.4x) to match the post-refactor bench. Co-authored-by: Claude <noreply@anthropic.com>

Match the style used in PR pydata#11355's to_duck_array / to_numpy guards: `type(x) is np.ndarray` instead of `isinstance(x, np.ndarray)`. For the plain-ndarray case (the one we're trying to optimize) the check is a single C-level pointer comparison and a hair faster than isinstance's MRO walk. More importantly, it is strictly behavior-preserving: any hypothetical ndarray subclass that implemented `__dask_graph__` would now fall through to the real dask.base.is_dask_collection check instead of being silently reported as non-chunked. No such subclass exists in xarray or any chunked-array library today, but the exact-type form removes the edge case. Co-authored-by: Claude <noreply@anthropic.com>
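The difference between the two guards shows up only for ndarray subclasses. The sketch below uses a hypothetical subclass that implements `__dask_graph__`; per the commit message no such class is known to exist, so this is purely an illustration of the edge case.

```python
import numpy as np


class GraphBackedNdarray(np.ndarray):
    # Hypothetical subclass, defined only to exercise the edge case.
    def __dask_graph__(self):
        return {}


def short_circuits_isinstance(x):
    # True means "skip the real dask check and report not-a-collection".
    return isinstance(x, np.ndarray)


def short_circuits_exact_type(x):
    # Only the exact ndarray type skips the real dask check.
    return type(x) is np.ndarray


plain = np.arange(3)
exotic = np.arange(3).view(GraphBackedNdarray)

print(short_circuits_isinstance(exotic), short_circuits_exact_type(exotic))
```

Both guards short-circuit for the plain array. For the subclass, the `isinstance` form would skip the real check, while the exact-type form lets it fall through.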

Keep the type(data) is np.ndarray fast-paths (which short-circuit the common case in one pointer compare), but tidy the slow paths so they match the same DRY principle Illviljan raised on pydata#11354:

- to_numpy: replace `try: data.to_numpy() except AttributeError` with `if hasattr(data, "to_numpy")`. Identical semantics, no exception machinery for non-ndarray inputs that lack the method.
- to_duck_array: restructure so is_duck_array is called once instead of twice (previously via is_chunked_array AND in the duck-array branch). Pull the ExplicitlyIndexed check up so the duck-array dispatch is expressed as a single is_duck_array + dask check.

Measured impact on plain ndarrays (vs main):

isel(...).load() with 200 scalar vars    1.37x
isel(...).load() with 2000 scalar vars   1.40x
DataArray.to_numpy() x 1000              4.57x
Variable.to_numpy() x 1000               5.42x

Co-authored-by: Claude <noreply@anthropic.com>

[This is Claude Code on behalf of Felix Bumann]
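The `to_numpy` restructuring mentioned in this commit message might look roughly like the sketch below. `to_numpy_sketch` and `HasToNumpy` are hypothetical names, and the final `np.asarray` fallback is a simplification: the real function presumably handles further array types that are omitted here.

```python
import numpy as np


def to_numpy_sketch(data):
    # Fast path: a plain ndarray is returned as-is after one type comparison.
    if type(data) is np.ndarray:
        return data
    # Slow path: probe for the method instead of catching AttributeError.
    if hasattr(data, "to_numpy"):
        return data.to_numpy()
    # Simplified fallback, assumed for this illustration only.
    return np.asarray(data)


class HasToNumpy:
    # Hypothetical container used only for this illustration.
    def to_numpy(self):
        return np.array([1, 2, 3])


arr = np.arange(3)
print(to_numpy_sketch(arr) is arr)
print(to_numpy_sketch(HasToNumpy()))
```

One difference from the `try`/`except` form is that an `AttributeError` raised inside a `to_numpy()` implementation now propagates instead of being swallowed.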

dcherian

Great change. Thanks!

FBumann and others added 2 commits May 23, 2026 19:21

[pre-commit.ci] auto fixes from pre-commit.com hooks

d008f82

for more information, see https://pre-commit.ci

github-actions bot added the topic-NamedArray label ("Lightweight version of Variable") May 25, 2026

FBumann mentioned this pull request May 25, 2026

perf(load): skip Variable.load dispatch for numpy data #11355

Open

6 tasks

dcherian approved these changes May 26, 2026


Illviljan requested changes May 26, 2026


FBumann and others added 2 commits May 27, 2026 15:33

FBumann changed the title ~~perf(load): short-circuit is_chunked_array for numpy arrays~~ perf: short-circuit duck-array dispatch helpers for numpy May 27, 2026

dcherian requested review from Illviljan and dcherian June 24, 2026 19:10

dcherian approved these changes Jun 24, 2026


FBumann wants to merge 6 commits into pydata:main from FBumann:perf/load-chunked-check-overhead


