Unit Affiliation: Marine and Polar Geophysics, Lamont-Doherty Earth Observatory (LDEO)
Data heterogeneity and accessibility are major barriers to scientific progress. Many community-curated data repositories (CCDRs) have emerged in the paleogeosciences in response to the needs of their scientific communities, but these CCDRs are not well integrated. This project seeks to transform geoscientific research by breaking down the barriers among the CCDRs that serve geoscientists. All participating resources are closely engaged with their respective disciplinary communities, and each is mobilizing data from its communities; the key need is to facilitate data interchange among CCDRs. The work will align several major CCDRs by developing new shared services that rely on common standards, and will demonstrate the kinds of new scientific insights that become possible with an integrated geoscientific infrastructure. This collaborative project will accomplish the following:
1) A survey of existing data structures and standards, their suitability for paleogeoscience CCDRs, their alignment with requirements identified by the paleogeosciences research coordination network, and their degree of adoption. This will lead to recommendations for adoption by participating resources (EarthChem, Flyover Country, IODP, LacCore/CSDCO, LinkedEarth, Neotoma) and to documentation written for geoscientific audiences.
2) Alignment of participating CCDRs to recommended standards and development of a common API that will allow data exchange among CCDRs and with third-party users.
3) Development of an Annotation Engine, which will provide a credentialed, crowd-sourced system for scientists to flag changes to datasets, to connect datasets post hoc, to add context to legacy data, and to provide link-back notification among CCDRs when linked dataset attributes change. The Annotation Engine will be embedded into existing scientific CCDR-based workflows, minimizing disruption to users.
4) Development of GeoNoteBase to enable scientists to generate citeable, reproducible workflows that draw information from across data resources, with workflows made available through keyword searches and with full attribution.
This is a pilot effort to begin some of the work. Through two pilot scientific projects, THROUGHPUT will test and evaluate these new capabilities and demonstrate the kinds of new scientific insights that can be gained through integration.