
Zarr (data format)

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Zarr is an open standard for storing large multidimensional array data. It specifies a cloud-friendly protocol and data format that supports random access by dividing each array into smaller chunks that can be read independently. Zarr can be used from many programming languages, including Python, Java, JavaScript, C++, Rust, and Julia, and has been adopted by organizations such as Google and Microsoft for sharing large datasets.
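The idea of random access through chunking can be illustrated with a small sketch. It is not the API of any Zarr library: the function names, the array shape, and the chunk size below are assumptions chosen for illustration. The sketch computes which chunks of a regular grid overlap a requested region, which is the set a reader would need to fetch.

```python
from itertools import product
from math import ceil


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def chunks_for_selection(selection, chunks):
    """Grid coordinates of every chunk that overlaps a selection.

    `selection` holds one (start, stop) pair per dimension, stop exclusive.
    """
    ranges = [
        range(start // c, (stop - 1) // c + 1)
        for (start, stop), c in zip(selection, chunks)
    ]
    return list(product(*ranges))


# Hypothetical 10,000 x 10,000 array split into 1,000 x 1,000 chunks.
shape = (10_000, 10_000)
chunks = (1_000, 1_000)

grid = chunk_grid_shape(shape, chunks)
# Rows 2500..3199 and columns 0..1499 touch only four of the hundred chunks.
needed = chunks_for_selection(((2_500, 3_200), (0, 1_500)), chunks)
```

In this example only the chunks listed in `needed` would have to be transferred, regardless of how large the full array is.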

Zarr is built for high-throughput, distributed input/output across different storage systems. Arrays are stored as a grid of chunks, which allows many read or write operations to happen in parallel. The on-disk data format depends on the chosen compressor and storage backend.
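A minimal sketch of why a grid of chunks permits parallel input/output is given here. It is a toy model rather than Zarr itself: a Python dictionary stands in for the storage backend, zlib stands in for the compressor, and the key scheme `c/<index>` and the helper names are assumptions made for this example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_LEN = 4  # elements per chunk; deliberately tiny for illustration


def write_chunk(store, index, values):
    """Compress one chunk and store it under its own key."""
    store[f"c/{index}"] = zlib.compress(bytes(values))


def read_chunk(store, index):
    """Fetch and decompress one chunk without touching any other chunk."""
    return list(zlib.decompress(store[f"c/{index}"]))


data = list(range(16))
pieces = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)]

store = {}  # stand-in for a directory or an object-storage bucket
with ThreadPoolExecutor(max_workers=4) as pool:
    # Every chunk has a distinct key, so the writers never share an object.
    list(pool.map(lambda item: write_chunk(store, *item), enumerate(pieces)))
    # Reads are equally independent; map() yields results in request order.
    restored = list(pool.map(lambda i: read_chunk(store, i), range(len(pieces))))

flat = [value for chunk in restored for value in chunk]
```

Swapping the dictionary for another key-value backend, or zlib for another codec, would change the stored bytes but not the access pattern.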

The design of Zarr is influenced by HDF5 and includes similar features for metadata and grouping. Arrays can be organized into named hierarchies, and you can attach key-value metadata alongside the data.
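The combination of named hierarchies and key-value metadata can be modelled in a few lines. The sketch is an in-memory toy, not the Zarr on-disk format or library API; the `Node` class, the `create` and `lookup` helpers, and the example paths and attributes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A group (has children) or an array (has a shape); both carry attributes."""
    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)
    shape: Optional[tuple] = None  # set only for array nodes


def create(root, path, shape=None, **attrs):
    """Create a node at a slash-separated path, adding parent groups as needed."""
    node = root
    for name in path.strip("/").split("/"):
        node = node.children.setdefault(name, Node())
    node.shape = shape
    node.attrs.update(attrs)
    return node


def lookup(root, path):
    """Walk a slash-separated path from the root to the named node."""
    node = root
    for name in path.strip("/").split("/"):
        node = node.children[name]
    return node


root = Node(attrs={"project": "demo"})
create(root, "climate/temperature", shape=(365, 180, 360), units="K")
create(root, "climate/pressure", shape=(365, 180, 360), units="Pa")

temperature = lookup(root, "climate/temperature")
```

Here `climate` is a group that was created implicitly, while `temperature` and `pressure` are arrays whose units are recorded as attributes next to the data they describe.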

Zarr is used for a range of large data applications, including weather, satellite, and energy data. In bioimaging, the Open Microscopy Environment (OME) has created OME-Zarr, a version of Zarr tailored for microscopy with domain-specific extensions.

The .zarr specification enables fine-grained representation of complex experiments. For example, a microscope plate can have many wells, each scanned in multiple fields, producing images that may have up to five dimensions (time, imaging channel, and three spatial dimensions). Zarr can also store resolution pyramids to speed up visualization. Because Zarr organizes data into multiple directories, different fields can be retrieved independently, even via custom URLs from object storage.
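The plate example can be sketched as follows. The `row/column/field/level` path scheme, the base URL, the image shape, and the factor-of-two downsampling are assumptions made for illustration and should be checked against the relevant specification before use.

```python
def field_path(row, column, field, level=0):
    """Relative path of one field of view at one resolution level."""
    return f"{row}/{column}/{field}/{level}"


def pyramid_shapes(shape, levels, spatial_axes=(-2, -1)):
    """Shapes of a resolution pyramid that halves the given axes per level."""
    shapes = [tuple(shape)]
    for _ in range(levels - 1):
        next_shape = list(shapes[-1])
        for axis in spatial_axes:
            # Round up so that odd sizes never shrink to zero.
            next_shape[axis] = max(1, (next_shape[axis] + 1) // 2)
        shapes.append(tuple(next_shape))
    return shapes


BASE_URL = "https://example.org/plate.zarr"  # hypothetical location

# Field 0 of well B3 at the third (coarsest) resolution level.
url = f"{BASE_URL}/{field_path('B', 3, 0, level=2)}"

# Five dimensions: time, channel, z, y, x. Only y and x are downsampled here.
shapes = pyramid_shapes((10, 3, 20, 2048, 2048), levels=3)
```

A viewer could request the coarsest level first for a quick overview and fetch finer levels only for the region on screen.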


This page was last edited on 2 February 2026, at 07:07 (CET).