Improvements to Knowledgebase Platform Toward Causal Predictive Ecology
Authors
Benjamin Allen3, Jason Baumohl1, Kathleen Beilsmith2, David Dakota Blair4, Mikaela Cashman1, John-Marc Chandonia1, Dylan Chivian1, Zachary Crockett3, Ellen G. Dow1, Meghan Drake3, Janaka N. Edirisinghe2, José P. Faria2, Jason Fillman1, Andrew Freiburger2, Tianhao Gu2, A. J. Ireland1, Marcin P. Joachimiak1, Sean Jungbluth1, Roy Kamimura1, Keith Keller1, Dan Klos2, Miriam Land3, Filipe Lui2, Chris Neely1, Erik Pearson1, Gavin Price1, Priya Ranjan3, William J. Riehl1, Boris Sadkhin2, Samuel Seaver2, Alan Seleman2, Gwyneth Terry1, Pamela Weisenhorn2, Ziming Yang4, Shinjae Yoo4, Qizhi Zhang2; Shane Canon1, Paramvir S. Dehal1, Elisha M. Wood-Charlson1; Robert Cottingham3, Chris Henry2*([email protected]), Adam P. Arkin1
Institutions
1Lawrence Berkeley National Laboratory, Berkeley, CA; 2Argonne National Laboratory, Argonne, IL; 3Oak Ridge National Laboratory, Oak Ridge, TN; 4Brookhaven National Laboratory, Upton, NY
URLs
Abstract
Given the focus of DOE BER programs on environmental ecology, the DOE Systems Biology Knowledgebase (KBase) has prioritized predictive causal ecology as its scientific target. Working with DOE-funded collaborators, researchers integrated a set of tools within KBase that capture many of the steps required to support a mechanistic understanding of environmental system behavior (Figure 1). Over the past year, in partnership with many projects, the team made significant progress implementing and improving many steps in the causal ecology workflow. The workflow begins with improved mechanisms for predicting the protein content of an environment from metagenome-assembled genomes (MAGs) and amplicon sequence variants (ASVs). Researchers now have tools that map these entities to reference data, producing probabilistic predictions of which functions are present and how they are distributed among species in the environment. The team will show how this work has been applied to datasets from Genome Resolved Open Watersheds (GROW) and the Joint Genome Institute (JGI) to improve MAG quality and produce improved strain models. With a better understanding of the protein content of each strain within an environment, the next step is to build a metabolic model of each strain to predict its behavior. Using an upgraded ModelSEED2 pipeline for model reconstruction, researchers significantly improved capacity for predicting energy biosynthesis strategies; improved templates for representing cyanobacteria, archaea, and bacteria; tailored biomass formulations based on metabolic network analysis; and improved predictions of organism metabolic properties such as auxotrophy. Researchers are also developing enhanced tools for predicting strain phenotypes and exploring the synergy between machine-learning predictions and metabolic modeling.
Finally, by combining strain models into community metabolic models and integrating multiomics data, the team can predict and contrast active pathways and species interactions to discover how they vary with environmental parameters. The team will demonstrate these capabilities with Narrative notebooks illustrating each currently available workflow.
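To make the idea of probabilistic function predictions concrete, here is a minimal Python sketch of how per-strain probabilities could be summarized at the community level. It is not the KBase implementation. The strain names, function names, probability values, and both helper functions are hypothetical. The sketch also assumes that strains carry a function independently of one another, which real communities may not satisfy.

```python
from math import prod

# Hypothetical per-strain probabilities that a function is encoded, e.g. as
# might be derived from mapping MAGs or ASVs to reference data.
# All names and numbers are illustrative, not KBase output.
strain_function_probs = {
    "strain_A": {"nitrate_reduction": 0.9, "methanogenesis": 0.0},
    "strain_B": {"nitrate_reduction": 0.5, "methanogenesis": 0.2},
}


def community_presence(probs_by_strain):
    """Probability that at least one strain carries each function,
    assuming independence between strains (an assumption of this sketch)."""
    functions = sorted({f for probs in probs_by_strain.values() for f in probs})
    return {
        f: 1.0 - prod(1.0 - probs.get(f, 0.0) for probs in probs_by_strain.values())
        for f in functions
    }


def function_distribution(probs_by_strain, function):
    """Share of the summed probability for a function attributed to each strain."""
    total = sum(probs.get(function, 0.0) for probs in probs_by_strain.values())
    if total == 0:
        # No strain is predicted to carry the function.
        return {strain: 0.0 for strain in probs_by_strain}
    return {
        strain: probs.get(function, 0.0) / total
        for strain, probs in probs_by_strain.items()
    }


presence = community_presence(strain_function_probs)
```

Calling `function_distribution(strain_function_probs, "nitrate_reduction")` then gives a rough view of how that function is spread across strains. A fuller treatment would also weight each strain by its abundance in the environment.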