EarthCube Data Capabilities: A Cloud-Native Data Repository for the Geoscience Community

Lead PI: Ryan Abernathey, Timothy J Crone, Chiara Lepore, Naomi Henderson

Unit Affiliation: Ocean and Climate Physics, Lamont-Doherty Earth Observatory (LDEO)

September 2020 - August 2023
Project Type: Research

DESCRIPTION: The Pangeo Project's overall goal is to make scientists more productive by making it easier, more efficient, and more cost-effective to work with very large datasets. This project focuses on using cloud computing to transform scientists' relationship with data. While most traditional data repositories provide download services, this project will create a cloud-native data repository that instead allows scientists to do their computing in the cloud, where the data are already present and ready for analysis. Software tools and infrastructure will be developed to enable scientists and data providers to extract data from their original location and store them in the cloud in an optimized format. Partnerships with industry will help ensure that this innovative technological approach enhances US economic competitiveness.

This project will build a new type of data and computing facility for the geoscience community using cloud technology. Central to this will be a repository of analysis-ready data stored in cloud-native formats such as Zarr. The Pangeo cloud data repository will enable scientists to create analysis-ready copies of existing datasets while fully tracing their provenance to the original source repository. This new virtual facility will transform how scientists work and collaborate, leading to massive leaps in productivity for the entire field. The infrastructure will be built using modular, reusable, sustainable individual software components, including Pangeo Forge, a tool for automating the production of analysis-ready data, and a cloud-native data catalog with an accompanying website. Partnering with existing data providers and science communities in the areas of ocean observatories, coupled climate modeling, satellite observations, and weather reanalysis will ensure that the repository is maximally useful to the research community. This award by the Directorate for Geosciences is jointly supported by the Office of Advanced Cyberinfrastructure.