Data Synthesis and Management Activities for NGEE-Tropics

Authors

Gilberto Pastorello1* (gzpastorello@lbl.gov), Emily Robles1, Val Hendrix1, Robinson Negron-Juarez1, Anna Weber2, Lin Meng3, Jeffrey Chambers1

Institutions

1Lawrence Berkeley National Laboratory, Berkeley, CA; 2University of California, Berkeley, CA; 3Vanderbilt University, Nashville, TN

Abstract

The two main data-related activities of NGEE-Tropics are archiving project data via the NGEE-Tropics Archive and performing data curation and synthesis. This presentation highlights recent data archive developments, including streamlined dataset creation workflows that automate much of the work for data contributors. For instance, digital object identifiers (DOIs) are now created automatically and become available to data contributors as soon as a dataset is submitted for review. The project also reviewed best practices aligned with community-established formats, so that potential issues, such as incomplete metadata fields, are caught early in the dataset publication process. The presentation reports usage metrics for the archive, including datasets deposited, number of system users, data downloads, and the size and variety of data deposited. The large variety in the types of data deposited led to team involvement in the creation of new data and metadata formats and standards, particularly in collaboration with other ESS-funded projects such as ESS-DIVE and AmeriFlux. Other new features have also improved the overall process: dataset versioning, which enables easy updates and corrections, and support for draft datasets, raw data, and partner data, which might have access restrictions. All these improvements aim to make data archiving an integral part of the regular scientific workflow for the project. From the long-term data preservation perspective, the NGEE-Tropics Archive is now fully integrated with ESS-DIVE, the official long-term repository for ESS data. All public-access datasets are now synchronized with ESS-DIVE, and new datasets are synchronized in near real time upon publication in the NGEE-Tropics Archive. This ensures long-term preservation of NGEE-Tropics data while still fully supporting team needs.
Data curation activities included synthesis and quality control of several datasets used in recent project publications, covering sap flow, micrometeorology, soil measurements, leaf-level measurements, and others. For instance, forest inventory data from BIONTE, a 30-plus-year logging experiment in Manaus, underwent quality control, integration of data from multiple sources, and georeferencing of every tree, enabling the team to better prepare for field campaigns.