Improvements to Knowledgebase Platform Toward Causal Predictive Ecology

Authors

Benjamin Allen³, Jason Baumohl¹, Kathleen Beilsmith², David Dakota Blair⁴, Mikaela Cashman¹, John-Marc Chandonia¹, Dylan Chivian¹, Zachary Crockett³, Ellen G. Dow¹, Meghan Drake³, Janaka N. Edirisinghe², José P. Faria², Jason Fillman¹, Andrew Freiburger², Tianhao Gu², A. J. Ireland¹, Marcin P. Joachimiak¹, Sean Jungbluth¹, Roy Kamimura¹, Keith Keller¹, Dan Klos², Miriam Land³, Filipe Lui², Chris Neely¹, Erik Pearson¹, Gavin Price¹, Priya Ranjan³, William J. Riehl¹, Boris Sadkhin², Samuel Seaver², Alan Seleman², Gwyneth Terry¹, Pamela Weisenhorn², Ziming Yang⁴, Shinjae Yoo⁴, Qizhi Zhang²; Shane Canon¹, Paramvir S. Dehal¹, Elisha M. Wood-Charlson¹; Robert Cottingham³, Chris Henry²*(chenry@anl.gov), Adam P. Arkin¹

Institutions

¹Lawrence Berkeley National Laboratory, Berkeley, CA; ²Argonne National Laboratory, Argonne, IL; ³Oak Ridge National Laboratory, Oak Ridge, TN; ⁴Brookhaven National Laboratory, Upton, NY

URLs

https://www.kbase.us/

Abstract

Given the focus of DOE BER programs on environmental ecology, the DOE Systems Biology Knowledgebase (KBase) has prioritized predictive causal ecology as its scientific target. Working with DOE-funded collaborators, researchers integrated a set of tools within KBase that capture many of the steps required to support a mechanistic understanding of environmental system behavior (Figure 1). Over the past year, in partnership with many projects, the team made significant progress implementing and improving steps across the causal ecology workflow. The workflow begins with improved mechanisms for predicting the protein content of an environment based on metagenome-assembled genomes (MAGs) and amplicon sequence variants (ASVs). Researchers now have tools to map these entities to reference data to produce probabilistic predictions of which functions are present and how they are distributed among species in the environment. The team will show how this work has been applied to datasets from Genome Resolved Open Wetlands (GROW) and the Joint Genome Institute (JGI) to improve MAG quality and produce improved strain models. Once researchers have an improved understanding of the protein content of a strain within an environment, the next step is to produce a metabolic model of each strain to predict its behavior. Using an upgraded ModelSEED2 pipeline for model reconstruction, researchers significantly improved capacity for predicting energy biosynthesis strategies; improved templates for representing cyanobacteria, archaea, and bacteria; tailored biomass formulations based on metabolic network analysis; and improved predictions of organism metabolic properties like auxotrophy. Researchers are also developing enhanced tools for predicting strain phenotypes and exploring the synergy between machine-learning predictions and metabolic modeling.
Finally, by combining strain models into community metabolic models and integrating multiomics data, the team can predict and contrast active pathways and species interactions to discover how they vary with environmental parameters. The team will demonstrate these capabilities with narrative notebooks illustrating each currently available workflow.