2024 Abstracts

The National Microbiome Data Collaborative: A Community-Driven Data Infrastructure

Authors

Mark Andrew Miller1* (MAM@lbl.gov), Michal Babinski1, Patrick Chain1, Mark Flynn1, Bin Hu1, Leah Johnson1, Julia Kelliher1, Kaitlyn Li1, Po-E (Paul) Li1, Chien-chi Lo1, Francie Rodriguez1, Migun Shakya1, Yan Xu1, Antonio Camargo2, Shane Canon2, Eric Cavanna2, Shreyas Cholia2, Alicia Clum2, Emiley Eloe-Fadrosh2, Kjiersten Fagnan2, Patrick Kalita2, Wendi Lynch2, Nigel Mouncey2, Sierra Moxon2, Sujay Patil2, Simon Roux2, Setareh Sarrafan2, Shalki Shrivastava2, Michael Thornton2, Yuri Corilo3, Kevin Fox3, Grant Fujimoto3, Cameron Giberson3, Katherine Heal3, Douglas Mans3, Lee Ann McCue3, Bea Meluch3, Paul Piehowski3, Camilo Posso3, Anastasiya Prymolenna3, Sam Purvine3, Montana Smith3, James Tessmer3, Brynn Zalmanek3, Jeff Baumes4, Mike Nagler4, Mary Salvi4, Jing Cao5, Donny Winston5, Chris Mungall2

Institutions

1Los Alamos National Laboratory, Los Alamos, NM; 2Lawrence Berkeley National Laboratory, Berkeley, CA; 3Pacific Northwest National Laboratory, Richland, WA; 4Kitware Inc, NY; 5Polyneme LLC, NY

Abstract

Microbes play a key role in many environmental processes. Microbiome data are multi-faceted, encompassing molecular/omics data (sequence, metabolic, proteomic, natural organic matter) as well as biogeochemical data. Organizing and integrating these data presents many technological and organizational challenges. The DOE National Microbiome Data Collaborative (NMDC) was created to tackle interdisciplinary environmental microbiome science by connecting data, people, and ideas.

The NMDC provides three core products for microbiome scientists: a user-friendly web interface for submitting data and metadata about collected samples, a platform for analyzing omics data from those samples, and a data portal that allows members of the community to explore and collate datasets through either a web interface or Application Programming Interfaces (APIs).

The NMDC is committed to the “FAIR” principles, which aim to make data Findable, Accessible, Interoperable, and Reusable. Each piece of information in the NMDC database, from the source sample through to processed data objects, is assigned a unique, persistent, resolvable identifier. Sample metadata follows the Genomic Standards Consortium standards, and terms from the Environment Ontology (ENVO) are used to annotate sample environments. The unified data model weaves these standards together and is organized around core concepts (e.g., studies, samples, analytes, computational workflows, data objects, gene functions) and their properties (e.g., soil moisture content, pH, metabolite concentration). A Field Notes mobile application is also in development to support recording sample information in real time during collection.
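The identifier-centered organization described above can be illustrated with a minimal sketch. It is not the NMDC schema: the class names, field names, and "ex:" identifiers are hypothetical, and the ENVO term for soil is included only as an example of an ontology annotation. The sketch shows how giving every record its own identifier lets a processed data object be traced back to its sample and study.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    id: str      # unique identifier (hypothetical "ex:" prefix)
    title: str


@dataclass
class Biosample:
    id: str
    study_id: str        # identifier of the parent study
    env_medium: str      # ontology term, e.g. an ENVO CURIE
    properties: dict = field(default_factory=dict)  # e.g. pH, moisture


@dataclass
class DataObject:
    id: str
    biosample_id: str    # identifier of the sample it derives from
    description: str


def trace_to_study(data_object, biosamples, studies):
    """Follow identifiers from a data object back to its study."""
    sample = biosamples[data_object.biosample_id]
    return studies[sample.study_id]


# Illustrative records only; none of these identifiers exist in NMDC.
studies = {"ex:study-1": Study("ex:study-1", "Example soil survey")}
biosamples = {
    "ex:sample-1": Biosample(
        id="ex:sample-1",
        study_id="ex:study-1",
        env_medium="ENVO:00001998",  # soil
        properties={"ph": 6.2, "soil_moisture_percent": 18.5},
    )
}
reads_qc = DataObject("ex:data-1", "ex:sample-1", "Filtered reads")

origin = trace_to_study(reads_qc, biosamples, studies)
print(origin.id, origin.title)
```

Because links between records are stored as identifiers rather than nested objects, any record can be resolved independently, which is the property that persistent, resolvable identifiers are meant to provide.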

The NMDC partners with a number of other organizations. One example is the partnership with the NSF National Ecological Observatory Network (NEON), which provides access to paired metagenome and environmental data. Using NMDC APIs, it is possible to explore relationships between metagenome features, such as taxonomic community composition, and NEON environmental variables, or to combine these data with other datasets to perform larger meta-analyses.
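As a rough illustration of the kind of analysis paired data allows, the sketch below joins metagenome-derived features with environmental measurements on a shared sample identifier and computes a Pearson correlation. It does not call any NMDC or NEON API: the records are made-up values standing in for API responses, and the field names (sample_id, acidobacteria_fraction, soil_ph) are assumptions chosen for the example.

```python
import math

# Made-up records standing in for data retrieved through an API.
taxonomy = [
    {"sample_id": "ex:sample-1", "acidobacteria_fraction": 0.30},
    {"sample_id": "ex:sample-2", "acidobacteria_fraction": 0.22},
    {"sample_id": "ex:sample-3", "acidobacteria_fraction": 0.10},
    {"sample_id": "ex:sample-4", "acidobacteria_fraction": 0.15},
]
environment = [
    {"sample_id": "ex:sample-1", "soil_ph": 4.8},
    {"sample_id": "ex:sample-2", "soil_ph": 5.6},
    {"sample_id": "ex:sample-3", "soil_ph": 6.9},
    # ex:sample-4 has no environmental record and is dropped by the join.
]


def join_on_sample(left, right):
    """Inner join of two record lists on their shared 'sample_id'."""
    right_by_id = {rec["sample_id"]: rec for rec in right}
    return [
        {**rec, **right_by_id[rec["sample_id"]]}
        for rec in left
        if rec["sample_id"] in right_by_id
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


paired = join_on_sample(taxonomy, environment)
r = pearson(
    [rec["soil_ph"] for rec in paired],
    [rec["acidobacteria_fraction"] for rec in paired],
)
print(f"{len(paired)} paired samples, Pearson r = {r:.3f}")
```

The shared sample identifier is what makes the join possible; with real data, the same pattern would extend to many samples and variables before any statistical interpretation is attempted.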

Together, these products, standards, and partnerships are advancing how scientists create, use, and reuse microbiome data.