Data-Driven Modeling Strategies for Predicting Stream Flow and Temperature at Watershed to Continental Scales

Authors

Charuleka Varadharajan¹* (cvaradharajan@lbl.gov), Jared Willard¹,², Fabio Ciulla¹, Helen Weierbach¹, Vipin Kumar², Aranildo R. Lima³

Institutions

¹Lawrence Berkeley National Laboratory, Berkeley, CA; ²University of Minnesota–Twin Cities, Minneapolis/St. Paul, MN; ³Environment and Climate Change Canada, Vancouver, Canada

URLs

https://sites.google.com/lbl.gov/inaiads

Abstract

Accurate and timely predictions of river flows and water quality at local to regional scales are needed to optimize watershed management strategies under a changing climate with increasing occurrences of extreme events. Since extreme events have unpredictable timing, duration, and spatial extents, assessing their impacts on rivers requires the ability to predict in unmonitored or poorly monitored basins. Classical bottom-up approaches for these predictions involve regionalization of statistical or process-based models built at representative, monitored sites based on different measures of watershed similarity. Recent top-down machine learning (ML) models use large continental-scale datasets and increasingly outperform traditional models for regional predictions. Both approaches depend on the concept of similarity based on catchment traits, which are properties such as topography, geology, land cover, land use, and other human activities. These traits interact and co-evolve with each other and with climate forcings to influence how watersheds function at different scales.

Here, the team presents modeling approaches to improve stream flow and water temperature predictions using various data-driven techniques. Researchers compare a top-down deep learning model against a bottom-up transfer learning approach for more than 1,400 catchments in the United States and examine the appropriate use of watershed traits in both approaches. The project uses networks, mutual information, and feature importance scores to reduce redundancy in the trait data and to determine the optimal set of traits for modeling different hydrologic functions. The selected traits are then tested independently in ML models to verify that they enhance predictive power. Results show that the top-down global ML model outperforms the bottom-up models for most sites. The team also finds that although ML models generally perform better with more data, they benefit more from diverse training data than from strictly larger volumes, and their performance can degrade when inputs are redundant.
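To illustrate the idea of using mutual information to reduce redundancy among traits, the following is a minimal sketch and not the project's actual pipeline. It assumes a simple histogram-based mutual information estimate and a greedy filter: traits are ranked by their mutual information with a target variable, and a trait is kept only if it shares little information with the traits already kept. All names (`select_traits`, `redundancy_threshold`, `trait_a`, and so on), the bin count, the threshold, and the synthetic data are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=8):
    """Map continuous values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=8):
    """Histogram estimate of I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


def normalized_mi(x, y, n_bins=8):
    """Mutual information scaled by the smaller marginal entropy (0 to 1)."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    hx, hy = entropy(bx), entropy(by)
    if min(hx, hy) == 0:
        return 0.0
    return (hx + hy - entropy(list(zip(bx, by)))) / min(hx, hy)


def select_traits(traits, target, redundancy_threshold=0.5, n_bins=8):
    """Greedy filter: most informative traits first, skipping redundant ones.

    `traits` maps a trait name to a list of values (one per catchment);
    `target` is a list of the quantity being predicted.
    """
    ranked = sorted(
        traits,
        key=lambda name: mutual_information(traits[name], target, n_bins),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(
            normalized_mi(traits[name], traits[k], n_bins) < redundancy_threshold
            for k in kept
        ):
            kept.append(name)
    return kept


# Synthetic illustration only: trait_a_rescaled duplicates trait_a,
# while trait_b is generated independently of both.
rng = random.Random(0)
n_catchments = 500
trait_a = [rng.random() for _ in range(n_catchments)]
trait_b = [rng.random() for _ in range(n_catchments)]
synthetic_traits = {
    "trait_a": trait_a,
    "trait_a_rescaled": [2.0 * v + 1.0 for v in trait_a],
    "trait_b": trait_b,
}
synthetic_target = [
    a + 0.5 * b + rng.gauss(0.0, 0.05) for a, b in zip(trait_a, trait_b)
]

selected = select_traits(synthetic_traits, synthetic_target)
print(selected)
```

In this toy setup the rescaled copy carries no information beyond the original trait, so the filter retains only one of the pair alongside the independent trait. A real analysis would likely use a more robust mutual information estimator and would validate the selected traits in the predictive models, as the abstract describes.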