Opening a multidim store with multidim=False currently returns a meaningless empty sentinel rather than telling the user what's actually going on. GDAL's classic raster API on a multidim store gets a stub dataset back with no data variables, made-up x/y coordinates (0.5 to 511.5), and no indication that they probably wanted multidim=True or a specific 2D slice via subdataset syntax.
import os
os.environ["AWS_NO_SIGN_REQUEST"] = "YES"
os.environ["AWS_REGION"] = "us-west-2"
import gdalxarray
from gdalxarray import GDALBackendEntrypoint
backend = GDALBackendEntrypoint()
# Multidim store, classic mode:
xds = backend.open_dataset(
"/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk",
multidim=False,
)
# <xarray.Dataset> Size: 8kB
# Dimensions: (x: 512, y: 512)
# Coordinates:
# * x (x) float64 4kB 0.5 1.5 2.5 ... 510.5 511.5
# * y (y) float64 4kB 0.5 1.5 2.5 ... 510.5 511.5
# Data variables:
# *empty*
The user has no signal that they're looking at a stub. They probably wanted multidim=True, or one specific 2D slice of one variable.
What's actually possible
For a single 2D slice from a multidim Icechunk store, GDAL 3.13+ supports:
xds = backend.open_dataset(
'ZARR:"/vsiicechunk/{/vsis3/.../v0.1.0.icechunk}":/wind_u_10m:{0}:{0}',
multidim=False,
)
# Returns the lat × lon face at init_time=0, lead_time=0:
# Dimensions: (y: 721, x: 1440)
# Data variables: band_1 (y, x) float32 4MB
This composition — /vsiicechunk/ VFS + ZARR: driver + :{i}:{j} slice selector — is non-obvious and worth documenting. It's the bridge between multidim cloud-native stores and classic GDAL tools (gdal_translate, gdal_warp, etc.).
Proposed behaviour
When _open_raster is called and detects that the result is a stub (empty data variables, sentinel-looking coordinates), do one of:
Sketch of detection:
def _open_raster(self, filename_or_obj, chunks, drop_variables):
dataset = gdal.Open(filename_or_obj, gdal.GA_ReadOnly)
if dataset is None:
raise ValueError(f"Could not open {filename_or_obj} with GDAL")
# Detect multidim stubs: no real bands, or sentinel-looking metadata
if dataset.RasterCount == 0:
# Try multidim probe
mdim_ds = gdal.OpenEx(filename_or_obj, gdal.OF_MULTIDIM_RASTER)
if mdim_ds is not None and mdim_ds.GetRootGroup() is not None:
rg = mdim_ds.GetRootGroup()
arrays = rg.GetMDArrayFullNamesRecursive()
raise ValueError(
f"{filename_or_obj} appears to be a multidimensional store; "
f"try multidim=True. Found {len(arrays)} arrays at paths like "
f"{arrays[:3]}..."
)
# ... rest of existing _open_raster
Helper for exploration
A small gdalxarray.info(url) function that prints:
- Whether the store opens as classic, multidim, or both
- Root group name and immediate children
- All MDArrays at the root level
- Suggested xarray-style access pattern
gdalxarray.info("/vsis3/dynamical-ecmwf-aifs-single/.../v0.1.0.icechunk")
# Store type: Multidim (Icechunk via /vsiicechunk/)
# Root group: /
# Arrays: 21 variables, 4 dimensions
# init_time, latitude, lead_time, longitude
# wind_u_10m (init_time, lead_time, latitude, longitude) float32
# ...
#
# Suggested:
# backend.open_dataset(url, multidim=True)
# for a single 2D slice (lat × lon at init_time=0, lead_time=0):
# backend.open_dataset(
# 'ZARR:"/vsiicechunk/{<path>}":/wind_u_10m:{0}:{0}',
# multidim=False,
# )
This is the kind of "introspect-then-document-paths" tool that's common in R packages (raadtools::glance, gdalraster::ds$info()) but missing from the Python multidim story.
Open questions
Opening a multidim store with
multidim=Falsecurrently returns a meaningless empty sentinel rather than telling the user what's actually going on. GDAL's classic raster API on a multidim store gets a stub dataset back with no data variables, made-upx/ycoordinates (0.5 to 511.5), and no indication that they probably wantedmultidim=Trueor a specific 2D slice via subdataset syntax.The user has no signal that they're looking at a stub. They probably wanted
multidim=True, or one specific 2D slice of one variable.What's actually possible
For a single 2D slice from a multidim Icechunk store, GDAL 3.13+ supports:
This composition —
/vsiicechunk/VFS +ZARR:driver +:{i}:{j}slice selector — is non-obvious and worth documenting. It's the bridge between multidim cloud-native stores and classic GDAL tools (gdal_translate,gdal_warp, etc.).Proposed behaviour
When
_open_rasteris called and detects that the result is a stub (empty data variables, sentinel-looking coordinates), do one of:multidim=Trueor use the subdataset syntax for a 2D slice"OF_MULTIDIM_RASTERcapability and, if present, raise immediately rather than returning the stubgdalinfo-like summary listing available subdataset paths and the equivalentmultidim=TrueaccessSketch of detection:
Helper for exploration
A small
gdalxarray.info(url)function that prints:This is the kind of "introspect-then-document-paths" tool that's common in R packages (
raadtools::glance,gdalraster::ds$info()) but missing from the Python multidim story.Open questions
gdalxarray.info()worth shipping, or isgdal mdim infofrom the GDAL CLI already enough?