open_mfdataset Glob Syntax for S3 Not Working Since Zarr V3


What happened?

Since the package update that implemented the new zarr specification (zarr v3), passing an S3 glob pattern to xarray's open_mfdataset function no longer works. Code that relied on this syntax to load multiple zarr stores from S3 now fails with a TypeError.

Example of non-working code

import xarray as xr

xr.open_mfdataset("s3://mybucket/myzarr/*.zarr", engine="zarr")

Error message

TypeError: Unsupported type for store_like: 'FSMap'

What did you expect to happen?

The previous syntax should keep working. Moving to the new zarr specification should not break glob patterns for S3 paths.

Minimal Complete Verifiable Example

import xarray as xr
from s3fs import S3FileSystem

s3 = S3FileSystem()
xr.open_mfdataset(["s3://" + file for file in s3.glob("s3://mybucket/myzarr/*.zarr")], engine="zarr")

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.11 | packaged by conda-forge | (main, Mar 3 2025, 20:43:55) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-1021-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2

xarray: 2025.1.2
pandas: 2.2.3
numpy: 2.2.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: 3.5.3
h5netcdf: None
h5py: 3.12.1
zarr: 3.0.4
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: 2025.2.0
matplotlib: 3.10.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: None
sparse: None
flox: 0.10.0
numpy_groupies: 0.11.2
setuptools: 75.8.2
pip: 25.0.1
conda: None
pytest: None
mypy: None
IPython: 9.0.1
sphinx: None
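
Because this report ties the failure to the zarr v3 update, it can help to check which zarr major version an environment has installed. The sketch below uses only the standard library. The names parse_major and zarr_major_version are hypothetical helpers, and treating major version 3 or later as affected is an assumption drawn from this report, not a documented rule.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_major(version_string):
    """Return the leading integer of a version string such as '3.0.4'."""
    return int(version_string.split(".")[0])


def zarr_major_version():
    """Return the installed zarr major version, or None if zarr is missing."""
    try:
        return parse_major(version("zarr"))
    except PackageNotFoundError:
        return None


major = zarr_major_version()
if major is None:
    print("zarr is not installed")
elif major >= 3:
    # Assumption based on this report: glob strings fail with zarr v3.
    print("zarr v3 or later: expand the glob before calling open_mfdataset")
else:
    print("zarr v2: per this report, the glob string was still accepted")
```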

Workaround

Until the glob syntax works again, you can expand the pattern yourself with s3fs and pass the resulting list of URLs to open_mfdataset:

from s3fs import S3FileSystem
import xarray as xr

s3 = S3FileSystem()
xr.open_mfdataset(["s3://" + file for file in s3.glob("s3://mybucket/myzarr/*.zarr")], engine="zarr")

This workaround imports the S3FileSystem class from the s3fs library to expand the glob, then uses a list comprehension to prepend the "s3://" prefix to each path, because s3.glob returns paths without it.
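
The list comprehension above can be wrapped in a small helper so the glob is expanded in one place. This is a minimal sketch, not part of xarray or s3fs: expand_glob is a hypothetical name, and it assumes only that the filesystem object has a glob method returning paths without the protocol, as S3FileSystem.glob does in the workaround. Sorting the result and skipping paths that already carry the prefix are added choices, not something the original report requires.

```python
def expand_glob(fs, pattern, protocol="s3"):
    """Return sorted, fully qualified URLs for every path matching `pattern`.

    `fs` is any object with a `glob(pattern)` method that returns paths
    without the protocol prefix (assumed behaviour, as with s3fs).
    """
    prefix = protocol + "://"
    urls = []
    for path in fs.glob(pattern):
        # Leave paths that already carry the prefix untouched.
        urls.append(path if path.startswith(prefix) else prefix + path)
    return sorted(urls)


# Intended use with the workaround (requires s3fs and xarray, not run here):
#   s3 = S3FileSystem()
#   xr.open_mfdataset(expand_glob(s3, "s3://mybucket/myzarr/*.zarr"), engine="zarr")
```

Keeping the filesystem as a parameter means the helper can be exercised with a stand-in object, without network access.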

Future plans

It is unclear whether the old behavior will be restored. Users who depend on it can follow the xarray GitHub repository for updates and give feedback there.

Questions and answers

Q: What happened to the glob syntax for loading S3 files with xarray's open_mfdataset function?

A: It stopped working after the update to the new zarr specification (zarr v3). Passing an S3 glob string to open_mfdataset now raises a TypeError.

Q: What is the error message I get when trying to use the glob syntax?

A: The error message is:

TypeError: Unsupported type for store_like: 'FSMap'

Q: What is the workaround for loading S3 files using the glob syntax?

A: To load S3 files using the glob syntax, you can use the following workaround:

from s3fs import S3FileSystem
import xarray as xr

s3 = S3FileSystem()
xr.open_mfdataset(["s3://" + file for file in s3.glob("s3://mybucket/myzarr/*.zarr")], engine="zarr")

This workaround imports the S3FileSystem class from the s3fs library to expand the glob, then uses a list comprehension to prepend the "s3://" prefix to each path.

Q: Why do I need to import the S3FileSystem class from the s3fs library?

A: S3FileSystem provides the glob method that lists the matching stores on S3. Because open_mfdataset no longer accepts the S3 glob pattern directly, the workaround expands the pattern explicitly with s3fs and passes the result on.

Q: Why do I need to use a list comprehension to prepend the "s3://" prefix to the file names?

A: The glob function from the s3fs library returns a list of paths without the "s3://" prefix. The list comprehension builds a new list with the prefix added to the front of each path, so that open_mfdataset receives full S3 URLs.

Q: Is the old behavior going to be re-implemented?

A: It is unclear whether the old behavior will be restored. Until that is settled, the workaround above is the way to load multiple zarr stores from S3.

Q: What can I do to stay up-to-date with the latest developments in the xarray library?

A: You can stay up-to-date with the latest developments in the xarray library by:

  • Checking the official xarray documentation for updates and changes
  • Following the xarray team on social media to stay informed about new features and releases
  • Participating in the xarray community by asking questions and providing feedback on the library's design and functionality

Q: How can I provide feedback on the xarray library's design and functionality?

A: You can provide feedback on the xarray library's design and functionality by:

  • Submitting issues and feature requests on the xarray GitHub repository
  • Reaching out to the xarray team directly to provide feedback and suggestions for improvement