Glencoe Software presents findings on scalability of image analysis workflows in the cloud using OMERO Plus, OME-NGFF, and CellProfiler
As high-throughput imaging technology has advanced, researchers have been able to acquire increasingly large and multi-dimensional microscopy datasets. This poses a challenge for both data management and image analysis.
As the number of images within individual datasets has grown, so has the need for efficient and scalable technologies for data storage and retrieval. Researchers currently use numerous image file formats, but these are often proprietary, typically rely on local storage, and treat a large dataset as a single large file object.
The Open Microscopy Environment’s (OME) Next-Generation File Format (OME-NGFF) is a novel image data format based on the Zarr n-dimensional array storage package. As an object-based format in which each chunk of an array is stored separately, OME-NGFF provides a critical advantage: single image tiles or planes within a larger dataset can be accessed without seeking through the entire file object. OME-NGFF also supports direct reading and writing independent of the storage modality, so data can be accessed directly from the cloud rather than first being downloaded to a local machine. This offers an effective pathway for handling the large file sizes associated with modern high-throughput microscopes.
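The tile-level access described above can be illustrated with a small sketch. This is a toy model in plain Python, not the Zarr or OME-NGFF API: the `write_tiles` and `read_region` helpers, the `row/col` key layout, and the 8x8 test image are all assumptions made for illustration. It shows only the general idea that a chunked store needs to fetch just the objects overlapping the requested region.

```python
def write_tiles(image, tile):
    """Split a 2D image (list of rows) into tile-sized objects.

    Hypothetical layout: each tile is stored under a 'row/col' key,
    loosely mimicking one object per chunk in a chunked store.
    """
    store = {}
    height, width = len(image), len(image[0])
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            key = f"{ty // tile}/{tx // tile}"
            store[key] = [row[tx:tx + tile] for row in image[ty:ty + tile]]
    return store


def read_region(store, tile, y0, y1, x0, x1):
    """Return the pixels in rows y0..y1-1 and columns x0..x1-1.

    Only tiles overlapping the region are fetched; the fetched keys
    are returned alongside the pixels so the access pattern is visible.
    """
    fetched = []
    out = [[None] * (x1 - x0) for _ in range(y1 - y0)]
    for cy in range(y0 // tile, (y1 - 1) // tile + 1):
        for cx in range(x0 // tile, (x1 - 1) // tile + 1):
            key = f"{cy}/{cx}"
            chunk = store[key]  # one object fetch per overlapping tile
            fetched.append(key)
            for dy, row in enumerate(chunk):
                y = cy * tile + dy
                if not y0 <= y < y1:
                    continue
                for dx, value in enumerate(row):
                    x = cx * tile + dx
                    if x0 <= x < x1:
                        out[y - y0][x - x0] = value
    return out, fetched


# Hypothetical 8x8 image whose pixel value encodes its position.
image = [[y * 8 + x for x in range(8)] for y in range(8)]
store = write_tiles(image, tile=4)

corner, corner_keys = read_region(store, 4, 0, 2, 0, 2)
centre, centre_keys = read_region(store, 4, 3, 5, 3, 5)
```

In this toy layout, a region inside one tile touches a single object, while a region straddling tile boundaries touches each overlapping tile. A monolithic file format would instead require locating the same pixels within one large file object.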
CellProfiler is an open-source image analysis software package that allows users to develop customised pipelines to segment and analyse objects within their image datasets.
Here, we have extended CellProfiler with native support for reading OME-NGFF data. A comparison of locally stored data in OME-TIFF and OME-NGFF formats revealed a >4-fold reduction in I/O time with OME-NGFF. The reader also allowed us to execute analysis pipelines directly on images from high-dimensional datasets stored on Amazon Web Services, without the significant decrease in data I/O performance that would typically be associated with using CellProfiler with remote storage.
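As a rough illustration of how a fold reduction in I/O time can be expressed, the sketch below times a read callable repeatedly and divides the baseline time by the candidate time. The `median_read_time` and `fold_reduction` helpers are hypothetical and are not the benchmarking procedure used in this work. The callables passed in would be whatever functions load the same image from each format.

```python
import statistics
import time


def median_read_time(read_fn, repeats=5):
    """Median wall-clock time in seconds of calling read_fn, over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fold_reduction(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

Under this definition, a candidate that takes a quarter of the baseline time gives a 4-fold reduction.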
OME-NGFF has already been implemented as a storage format for the OMERO Plus image data management server package. We tested the utility of this format using the OMERO-CellProfiler connector interface (Glencoe Software). Running a large-scale CellProfiler analysis with OME-NGFF as the source data provided a significant performance advantage over running the same analysis on files stored in the conventional OME-TIFF format.
Future development will aim to contribute this reader to the open-source CellProfiler repository for the benefit of the wider community.