2024 Abstracts

ESS-DIVE: Enabling Integration Across Diverse Environmental Systems Science Data

Authors

Joan Damerow1* (JoanDamerow@lbl.gov), Shreyas Cholia2, Deb Agarwal2, Matthew Brooke3, Madison Burrus1, Hesham Elbashandy2, Valerie Hendrix2, Matthew B. Jones3, Mario Melara2, Rushiraj Nenuji3, Fianna O’Brien2, Dylan O’Ryan1, Sarah Poon2, Emily Robles1, Shalki Shrivastava2, Jing Tao3, Karen Whitenack2, Catherine Wong2, Charuleka Varadharajan1

Institutions

1Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA; 2Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, CA; 3National Center for Ecological Analysis and Synthesis, Santa Barbara, CA

Abstract

The ESS Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a data repository designed for the U.S. DOE’s ESS program. ESS-DIVE enables collection, storage, management, and sharing of a variety of observational, experimental, and modeling data generated across the program. The volume, complexity, and diversity of these interdisciplinary data present unique integration challenges.

Researchers discuss how ESS-DIVE approaches data integration across these datasets and with other data systems.

Metadata in ESS-DIVE are published in several formats, including JSON-LD, which allows external systems (e.g., Google Dataset Search, OSTI, and Data.gov) to ingest and interpret them easily. ESS-DIVE provides a systematic method for linking datasets from other recognized data providers, so that their metadata are searchable in ESS-DIVE while the records link out to externally managed data products. To track and relate sample data across systems, researchers encourage the use of common standards for sample identifiers, such as the International Generic Sample Number (IGSN). ESS-DIVE works closely with ESS scientists to promote adoption of standard data reporting formats, which researchers are using to develop tools for data validation, advanced search within data files, and data synthesis.
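As a rough illustration of why JSON-LD metadata are easy for external systems to ingest, the sketch below parses a small record with Python's standard library and pulls out the fields a harvester would typically index. It assumes the record uses the schema.org Dataset vocabulary (the vocabulary Google Dataset Search reads); the record contents, the identifier, and the function name summarize_dataset are hypothetical placeholders, not actual ESS-DIVE output or API.

```python
import json

# Hypothetical JSON-LD record using the schema.org Dataset vocabulary.
# All values are placeholders, not a real ESS-DIVE dataset.
RECORD = """
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example soil moisture observations",
  "identifier": "doi:10.0000/example",
  "keywords": ["soil moisture", "watershed"],
  "creator": [{"@type": "Person", "name": "A. Researcher"}]
}
"""


def summarize_dataset(jsonld_text):
    """Extract the fields an external harvester would typically index."""
    doc = json.loads(jsonld_text)
    if doc.get("@type") != "Dataset":
        raise ValueError("not a schema.org Dataset record")
    # JSON-LD allows a single object or a list here; normalize to a list.
    creators = doc.get("creator", [])
    if isinstance(creators, dict):
        creators = [creators]
    return {
        "title": doc.get("name"),
        "identifier": doc.get("identifier"),
        "keywords": doc.get("keywords", []),
        "creators": [c.get("name") for c in creators],
    }


summary = summarize_dataset(RECORD)
print(summary["title"], summary["identifier"])
```

Because the record is self-describing through its @context and @type keys, a consumer needs no repository-specific parser, only knowledge of the shared vocabulary.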

ESS-DIVE offers project-specific features that allow researchers to collaborate and share data within their teams efficiently. Project data portals allow researchers to create a collection of project datasets along with contextual information, making project data more findable and accessible. ESS-DIVE also supports a secondary storage layer to serve very large, hierarchical datasets. This allows users to browse and access large volumes of data directly over the web, and to move data efficiently between sites using Globus, a high-performance data transfer service. ESS-DIVE is integrated with DataONE, a federation of interoperable data repositories facilitating open science and data discovery.