ESS-DIVE: Enabling Integration Across Diverse Environmental Systems Science Data

Authors

Joan Damerow¹* ([email protected]), Shreyas Cholia², Deb Agarwal², Matthew Brooke³, Madison Burrus¹, Hesham Elbashandy², Valerie Hendrix², Matthew B. Jones³, Mario Melara², Rushiraj Nenuji³, Fianna O’Brien², Dylan O’Ryan¹, Sarah Poon², Emily Robles¹, Shalki Shrivastava², Jing Tao³, Karen Whitenack², Catherine Wong², Charuleka Varadharajan¹

Institutions

¹Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA; ²Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, CA; ³National Center for Ecological Analysis and Synthesis, Santa Barbara, CA

URLs

https://ess-dive.lbl.gov/

Abstract

The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a data repository designed for the U.S. Department of Energy's (DOE) Environmental Systems Science (ESS) program. ESS-DIVE enables the collection, storage, management, and sharing of the observational, experimental, and modeling data generated across the program. The volume, complexity, and diversity of these interdisciplinary data present unique integration challenges.

This abstract describes how ESS-DIVE approaches data integration, both across these datasets and with other data systems.

Metadata in ESS-DIVE are published in several formats, including JSON-LD, which allows the metadata to be readily ingested and understood by external systems (e.g., Google Dataset Search, OSTI, Data.gov). ESS-DIVE provides a systematic method for linking to datasets from other recognized data providers. This makes the metadata searchable in ESS-DIVE while linking out to externally managed data products. To track and relate sample data across systems, ESS-DIVE encourages the use of common standards for sample identifiers, such as the International Generic Sample Number (IGSN). ESS-DIVE also works closely with ESS scientists to promote adoption of standard data reporting formats, which are being used to develop tools for data validation, advanced search within data files, and data synthesis.
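To illustrate what a JSON-LD metadata record of this kind might look like, the following minimal Python sketch builds a dataset description with the schema.org Dataset vocabulary, which Google Dataset Search reads. It is an assumption that this vocabulary and these particular fields match what ESS-DIVE emits; the function name, dataset title, creator names, and DOI are all hypothetical placeholders.

```python
import json


def build_dataset_jsonld(name, description, doi, keywords, creators):
    """Return a schema.org Dataset description as a JSON-LD-ready dict.

    Hypothetical helper for illustration only; the fields ESS-DIVE
    actually publishes may differ.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": doi,
        "keywords": list(keywords),
        "creator": [{"@type": "Person", "name": c} for c in creators],
    }


# All values below are placeholders, not a real dataset.
record = build_dataset_jsonld(
    name="Example soil moisture observations",
    description="Hourly soil moisture measurements from a hypothetical field site.",
    doi="doi:10.0000/example",
    keywords=["soil moisture", "hydrology"],
    creators=["A. Researcher", "B. Scientist"],
)

# Serialize to JSON-LD text, e.g. for embedding in a dataset landing page.
jsonld_text = json.dumps(record, indent=2)
print(jsonld_text)
```

Because the record is ordinary JSON with a shared vocabulary, an external harvester can interpret fields such as the name, identifier, and creator without any repository-specific knowledge.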

ESS-DIVE offers project-specific features that allow researchers to collaborate and share data efficiently within their teams. Project data portals allow teams to assemble a collection of project datasets along with contextual information, making project data more findable and accessible. ESS-DIVE also supports a secondary storage layer for serving very large, hierarchical datasets. This allows users to browse and access large volumes of data directly over the web, and to move data efficiently between sites using Globus, a high-performance data transfer service. ESS-DIVE is integrated with DataONE, a federation of interoperable data repositories that facilitates open science and data discovery.