Knowledge-Guided Clustering of Multidomain Data to Improve Predictions of Aerobic Respiration in River Corridors
Authors
Hugh C. McCullough1*, Sanjog Kharel1, Manokaran Veeramani1, Mikayla Borton2, James Stegen3, Christopher Henry4, Hyun-Seob Song1
Institutions
1University of Nebraska–Lincoln, NE; 2Colorado State University, Fort Collins, CO; 3Pacific Northwest National Laboratory, Richland, WA; 4Argonne National Laboratory, Argonne, IL
Abstract
Thermodynamics-based biogeochemical modeling is increasingly popular because it can describe substrate chemistry and kinetics with a limited set of parameters. These models typically assume that all detected compounds are respirable, with degradation rates determined primarily by thermodynamic favorability. However, this assumption may not reflect the complexity of data from diverse river systems. These systems differ in microbial metabolic potential and growth rates as well as in substrate chemical and thermodynamic properties, and all of these should influence whether a compound is respirable. In the team’s previous analysis of high-resolution organic matter (OM) profiles from sediment samples collected during the Worldwide Hydro-biogeochemistry Observation Network for Dynamic River Systems (WHONDRS) Summer 2019 Sampling Campaign (Ahamed et al. 2023), the team identified interpretable key factors influencing OM bioavailability but did not account for sample-specific microbial traits. Therefore, in this work, researchers analyzed metagenomes and metadata from the same sampling campaign to better determine sample-specific microbial growth rates. The team filtered these additional features down to those expected to directly influence respiration. Combining these data with the OM profiles yields a large dataset spanning physical, chemical, and biological contexts. To reduce complexity and extract interpretable findings, researchers performed dimensionality reduction on the chemical and metabolic features, and the resulting principal components were clustered together with the physical data. Researchers could then account for environment-specific microbial traits by assigning a distinct maximal growth rate to each cluster, a parameter previously assumed to be constant across samples. This analysis not only improves predicted respiration rates but also highlights key drivers of diversity in microbial traits.
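The workflow above (dimensionality reduction on chemical and metabolic features, clustering of the components together with physical data, then a separate maximal growth rate per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation. The abstract does not name the algorithms, so this sketch assumes several things. Chemical and metabolic features are pooled into a single PCA. k-means with a deterministic farthest-point initialization stands in for the clustering step. Predicted respiration is assumed to scale linearly with the maximal growth rate mu_max, so each cluster's mu_max can be fit by least squares against a baseline prediction made with mu_max = 1. All function names (`cluster_samples`, `fit_cluster_mu_max`, etc.) are hypothetical, and inputs are NumPy arrays shaped samples × features.

```python
import numpy as np


def standardize(X):
    """Z-score each column so features measured in different units are comparable."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def pca_scores(X, n_components):
    """Project X onto its leading principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means; farthest-point initialization is an assumed choice."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from every sample to its nearest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_samples(chem, metab, phys, n_components=2, k=3):
    """Hypothetical pipeline: PCA on chemical + metabolic features,
    then cluster the PC scores together with standardized physical data."""
    pcs = pca_scores(standardize(np.hstack([chem, metab])), n_components)
    features = np.hstack([pcs, standardize(phys)])
    return kmeans(features, k)


def fit_cluster_mu_max(labels, baseline_rate, observed_rate):
    """Least-squares mu_max per cluster, ASSUMING predicted respiration is
    proportional to mu_max (baseline_rate was computed with mu_max = 1)."""
    k = labels.max() + 1
    mu = np.empty(k)
    for j in range(k):
        m = labels == j
        mu[j] = (observed_rate[m] @ baseline_rate[m]) / (baseline_rate[m] @ baseline_rate[m])
    return mu
```

With cluster labels in hand, cluster-specific predictions would be `fit_cluster_mu_max(labels, base, obs)[labels] * base`, which replaces a single mu_max shared by all samples. The farthest-point start keeps the example deterministic. In practice, multiple random restarts or a different clustering method may be preferable.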
As the next step, the team aims to incorporate this new knowledge into biogeochemical and reactive-transport models.
References
Ahamed, F., et al. 2023. “Exploring the Determinants of Organic Matter Bioavailability Through Substrate-Explicit Thermodynamic Modeling,” Frontiers in Water 5, 1169701.