Add map_blocks and some testing support for array query expressions#11398

Open
mrocklin wants to merge 6 commits into
pydata:mainfrom
mrocklin:codex/dask-array-xarray-suite

Conversation

@mrocklin

Contributor

This adds a --use-dask-array pytest mode that registers the dask-array chunk manager and runs the existing suite through that backend. Most of the work here is making tests a bit more dask.array/dask-array agnostic. I also brought back the xarray map_blocks implementation.

I haven't reviewed this thoroughly yet, and it's not complete: some sections of tests would still fail if they weren't marked xfail under dask-array (flox and masked arrays are good examples). I wanted to push something up before going much further so that this could get some early feedback.

cc @shoyer @dcherian

AI Disclosure

  • This PR contains AI-generated content.
  • I have tested any AI-generated content in my PR.
  • I take responsibility for any AI-generated content in my PR.

Tools: Claude, Codex

Add a --use-dask-array pytest mode that registers the dask-array chunk manager and runs the existing suite through that backend.

Generalize dask-specific tests around the active chunk manager and add dask-array expression support for xarray map_blocks.
@github-actions (bot) added the topic-backends, topic-dask, topic-testing, topic-arrays (related to flexible array support), io, and topic-NamedArray (Lightweight version of Variable) labels Jun 22, 2026
Avoid importing the dask chunk manager in bare-minimum test environments and tighten datetime accessor types for mypy/stubtest.
@mrocklin

Contributor Author

Gentle ping.

Comment thread conftest.py Outdated
)
parser.addoption("--run-mypy", action="store_true", help="runs mypy tests")
parser.addoption(
"--use-dask-array",

Contributor

nit: let's call this --use-dask-array-with-expr to avoid confusion.

Contributor Author

Done.
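
For readers following along, here is a minimal sketch of how an opt-in flag like this is usually wired up in a conftest.py. The option name comes from the thread above; `register_dask_array_chunkmanager` and the `REGISTERED` list are hypothetical stand-ins for illustration, not the code in this PR.

```python
# Sketch of a conftest.py for an opt-in test mode (not the PR's actual code).

REGISTERED: list[str] = []  # hypothetical record of registered chunk managers


def register_dask_array_chunkmanager() -> None:
    """Hypothetical placeholder for the real registration step."""
    REGISTERED.append("dask-array")


def pytest_addoption(parser):
    # Standard pytest hook: declare the command-line flag.
    parser.addoption(
        "--use-dask-array-with-expr",
        action="store_true",
        default=False,
        help="run the suite through the dask-array chunk manager",
    )


def pytest_configure(config):
    # Standard pytest hook: runs once the command line has been parsed.
    if config.getoption("--use-dask-array-with-expr"):
        register_dask_array_chunkmanager()
```

With a layout like this, the default run leaves the registry untouched and only `pytest --use-dask-array-with-expr` switches the backend.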

Comment thread xarray/tests/test_variable.py
Comment thread xarray/tests/test_sparse.py Outdated
Comment thread xarray/tests/test_missing.py
Comment thread xarray/tests/test_groupby.py Outdated
Comment thread xarray/tests/test_groupby.py Outdated
Comment thread xarray/backends/api.py Outdated
name == "dask" and manager is chunkmanager
for name, manager in list_chunkmanagers().items()
)
if isinstance(chunkmanager, DaskManager) or registered_as_dask:

Contributor

Suggested change
- if isinstance(chunkmanager, DaskManager) or registered_as_dask:
+ if registered_as_dask:

?

Contributor Author

Done

Comment thread xarray/namedarray/parallelcompat.py Outdated
Comment thread xarray/structure/chunks.py Outdated
Comment thread xarray/tests/__init__.py
metadata into a private ``dask_array`` multi-output map expression. Each output
variable is still a normal ``dask_array.Array`` child expression, so Dask can
group the children with the composite-collection protocol and ``dask_array`` can
optimize, cull, persist, and compute those arrays.

@dcherian dcherian Jun 24, 2026

Contributor

Can we add a test for culling please (e.g. ds.pipe(xr.map_blocks, ...).sel(...) ≡ ds.sel(...).pipe(xr.map_blocks, ...))? Or... I guess that's slice pushdown? Do we gain that?

Contributor Author

Done. And yes, I'd call this slice pushdown rather than culling now.
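
To make the property under discussion concrete, here is a toy, standard-library-only sketch of slice pushdown that checks executed chunks. `LazyBlocks`, `select`, and `double` are made-up names for illustration; this is not how xarray or dask-array implement it, only the shape of the assertion: selecting after the map should give the same result as selecting before it, and should run the mapped function only on the chunks that survive the selection.

```python
class LazyBlocks:
    """Toy lazy chunked collection; func runs per chunk only in compute()."""

    def __init__(self, chunks, func=None):
        self.chunks = list(chunks)
        self.func = func

    def map_blocks(self, func):
        # Record the function without running it.
        return LazyBlocks(self.chunks, func)

    def select(self, block_ids):
        # Pushdown: the selection is applied to the inputs, so chunks that
        # are dropped here never reach func.
        return LazyBlocks([self.chunks[i] for i in block_ids], self.func)

    def compute(self):
        if self.func is None:
            return [list(chunk) for chunk in self.chunks]
        return [self.func(chunk) for chunk in self.chunks]


executed = []  # records every chunk the mapped function actually saw


def double(chunk):
    executed.append(tuple(chunk))
    return [2 * x for x in chunk]


data = LazyBlocks([[0, 1], [2, 3], [4, 5]])

map_then_select = data.map_blocks(double).select([1]).compute()
calls_with_pushdown = len(executed)

executed.clear()
select_then_map = data.select([1]).map_blocks(double).compute()
calls_select_first = len(executed)
```

In this toy model both orderings touch exactly one chunk. A real regression test would make the same two checks (equal results, equal executed-chunk counts) against the actual collections.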

mrocklin and others added 2 commits June 24, 2026 10:51
Rename the dask-array expression test mode, centralize dask-array test helpers, and add a map_blocks slice-pushdown regression test that checks executed chunks.

Also remove the unnecessary xarray vectorized-indexing chunk-manager flag and preserve explicit DaskManager tokenization behavior.

Co-Authored-By: OpenAI <noreply@openai.com>
Comment thread xarray/tests/__init__.py Outdated
Comment thread xarray/tests/__init__.py Outdated
Comment on lines +142 to +160
dask_array_api = None
dask_array_type = ()
has_dask_array_expr = False


def refresh_dask_chunkmanager_helpers() -> None:
    global dask_array_api, dask_array_type, has_dask_array_expr

    dask_array_api = None
    dask_array_type = ()
    has_dask_array_expr = False
    if has_dask:
        dask_chunkmanager = get_dask_chunkmanager()
        dask_array_api = dask_chunkmanager.array_api
        dask_array_type = dask_chunkmanager.array_cls
        has_dask_array_expr = dask_array_type.__module__.startswith("dask_array")


refresh_dask_chunkmanager_helpers()

Contributor

can this just be

if has_dask:
    ...
else:
    ...
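
One possible reading of this suggestion, sketched as a function that returns the three values instead of resetting and reassigning globals. The attribute names `array_api` and `array_cls` and the `"dask_array"` module prefix are taken from the diff above; `dask_helpers`, `get_manager`, and the fake manager are hypothetical and exist only so the sketch runs on its own.

```python
from types import SimpleNamespace


def dask_helpers(has_dask, get_manager):
    """Return (array_api, array_type, has_dask_array_expr) in one branch each."""
    if has_dask:
        manager = get_manager()
        array_type = manager.array_cls
        has_expr = array_type.__module__.startswith("dask_array")
        return manager.array_api, array_type, has_expr
    else:
        # No dask available: same defaults the original resets to.
        return None, (), False


class FakeArray:
    """Hypothetical array class standing in for the chunk manager's array type."""


# Pretend the class lives in a dask_array submodule (assumption for the demo).
FakeArray.__module__ = "dask_array.core"

fake_manager = SimpleNamespace(array_api="fake-api", array_cls=FakeArray)

with_dask = dask_helpers(True, lambda: fake_manager)
without_dask = dask_helpers(False, lambda: fake_manager)
```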

@dcherian dcherian left a comment

Contributor

Thanks! Just a couple of nits but LGTM otherwise.

Comment thread xarray/tests/test_dask.py
  assert len(v2.__dask_graph__()) < len(v.__dask_graph__())  # type: ignore[arg-type]
- assert v2.__dask_keys__() == v.__dask_keys__()
+ if not has_dask_array_expr:
+     assert v2.__dask_keys__() == v.__dask_keys__()

Contributor

this seems important; is there an equivalent assertion?

Contributor Author

Yeah, it is important to understand. Optimized expressions will often have a different hash than their unoptimized progenitors. I'm changing this test to assert equivalence in the suffix (key[1:]) of every key rather than key equality.

This is a mild headache for both dask.distributed and frisky in operations like persist, because the client-side will continue to use the expression in things. We end up having to maintain a key-map.

So I believe that this change is good. It points to a genuine issue, but that issue is handled by the related infrastructure.
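
A small sketch of what the suffix comparison and a key-map could look like, assuming the usual dask convention that array keys are tuples of the form `(name, i, j, ...)` nested in lists. The helper names and the example key names are invented for illustration and are not the code in this PR.

```python
def flatten_keys(keys):
    """Yield key tuples from arbitrarily nested lists of keys."""
    for key in keys:
        if isinstance(key, list):
            yield from flatten_keys(key)
        else:
            yield key


def key_suffixes(keys):
    """Drop the name (first element) of each key, keeping the block indices."""
    return [key[1:] for key in flatten_keys(keys)]


def build_key_map(old_keys, new_keys):
    """Map each pre-optimization key to its post-optimization counterpart."""
    old = list(flatten_keys(old_keys))
    new = list(flatten_keys(new_keys))
    if [k[1:] for k in old] != [k[1:] for k in new]:
        raise ValueError("keys do not line up block-for-block")
    return dict(zip(old, new))


# Invented example keys: same block layout, different name hash.
unoptimized = [[("add-abc123", 0, 0), ("add-abc123", 0, 1)]]
optimized = [[("add-def456", 0, 0), ("add-def456", 0, 1)]]

key_map = build_key_map(unoptimized, optimized)
```

Full key equality fails here because the names differ, while the suffix comparison passes and the map lets a client holding the old keys find the new ones.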

Comment thread xarray/tests/__init__.py
@mrocklin

Contributor Author

Thanks @dcherian for the help here

@dcherian added the plan to merge (Final call for comments) label Jun 24, 2026
@mrocklin

Contributor Author

Anything I can do to help here?

@mrocklin mrocklin force-pushed the codex/dask-array-xarray-suite branch from 2e31329 to a0a7df1 Compare June 26, 2026 16:00
@mrocklin

Contributor Author

Rebased on main. CI passes now except for the RTD failure (upstream issue I think).

Labels

io, plan to merge (Final call for comments), topic-arrays (related to flexible array support), topic-backends, topic-dask, topic-NamedArray (Lightweight version of Variable), topic-testing

2 participants