January 06, 2023
Machine Learning Models Inaccurately Predict Current and Future High-Latitude Carbon Balances
Process model simulations reveal shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.
The Science
The high-latitude carbon cycle is an important, complex, and highly uncertain component of the global climate system. A growing number of studies rely on machine learning methods to create regional estimates of current and future ecosystem properties (e.g., carbon balance) from a small number of site measurements. Because observational data are scarce, machine learning predictions are rarely tested against independent measurements. This study trains machine learning models on subsets of process model output and tests their predictions against the full simulation, uncovering large biases in machine learning predictions of current and future high-latitude carbon balance.
The Impact
Machine learning methods incorrectly predict that Alaska is currently a net source of carbon when they are trained on existing site coverage. This result mirrors a current mismatch between ecosystem model and machine learning estimates of high-latitude carbon balances and points to insufficient site coverage as a likely cause. The study also demonstrates that machine learning methods are unable to predict how ecosystem carbon fluxes will respond to climate change, because the training data cannot capture how the relationships between carbon fluxes and their drivers change over time. These findings highlight the need for cautious interpretation of machine learning predictions of current and future ecosystem processes.
Summary
In this study, carbon fluxes and environmental data are simulated across Alaska using ecosys, a process-rich terrestrial ecosystem model. Boosted regression tree machine learning algorithms are then applied to different subsets of simulated data that mirror and expand upon existing AmeriFlux eddy-covariance data availability. Machine learning predictions across the entire domain are compared to simulated data to understand how variation in site coverage and climate forcing impacts typical data-driven machine learning upscaling and forecasting approaches.
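The train-on-a-subset, test-on-the-whole-domain evaluation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the "domain" is synthetic rather than ecosys output, the single driver and the linear flux relationship are invented, the clustering of sparse sites at high driver values is an assumed sampling bias, positive flux is taken to mean a net source, and the hand-written boosted stumps stand in for a full boosted regression tree library.

```python
import random


def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the best single split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n, total = len(x), sum(residuals)
    best_gain, best, left_sum = -1.0, None, 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # no threshold separates identical driver values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            best_gain = gain
            best = ((lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_boosted(x, y, rounds=100, rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, left, right = fit_stump(x, residuals)
        stumps.append((t, left, right))
        pred = [p + rate * (left if xi <= t else right) for p, xi in zip(pred, x)]
    return base, rate, stumps


def predict(model, x):
    base, rate, stumps = model
    return [base + rate * sum(l if xi <= t else r for t, l, r in stumps) for xi in x]


def upscale(sites, cells):
    """Train on the sampled sites, then return the predicted domain-mean flux."""
    model = fit_boosted([d for d, _ in sites], [f for _, f in sites])
    pred = predict(model, [d for d, _ in cells])
    return sum(pred) / len(pred)


# Synthetic stand-in for the simulated domain (hypothetical relationship).
rng = random.Random(0)
cells = []
for _ in range(2000):
    driver = rng.random()
    flux = 2.0 * driver - 1.2 + rng.gauss(0.0, 0.05)  # positive = net source
    cells.append((driver, flux))

truth = sum(f for _, f in cells) / len(cells)

# Sparse coverage: 15 sites, assumed to sit only where the driver is high.
sparse_sites = rng.sample([c for c in cells if c[0] >= 0.6], 15)
# Expanded coverage: 240 sites drawn from the whole domain.
dense_sites = rng.sample(cells, 240)

sparse_pred = upscale(sparse_sites, cells)
dense_pred = upscale(dense_sites, cells)

print("domain-mean flux, simulated:", round(truth, 3))
print("upscaled from 15 clustered sites:", round(sparse_pred, 3))
print("upscaled from 240 sites:", round(dense_pred, 3))
```

Because the full domain is known in this setup, the bias of each upscaled estimate can be computed directly, which is the advantage of testing against simulations rather than sparse observations.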
When current Alaska AmeriFlux data coverage is used for training, machine learning methods incorrectly predict that Alaska is a net carbon source. Machine learning predictions improve with increased spatial coverage of the training dataset (e.g., bias is halved when 240 modeled sites are used instead of 15). However, even the machine learning model trained with 240 sites does not reproduce the substantial increase in Alaska carbon sink strength simulated by ecosys throughout the 21st century. Convergent cross mapping is used to show that the degradation of machine learning model projections can be ascribed to changes in atmospheric CO2, litter inputs, and vegetation composition. This study reveals large shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.
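One generic way a forecast like this can degrade is that a piecewise-constant model, such as a tree ensemble, holds its prediction flat once a driver leaves the range seen in training. The sketch below illustrates only that general behavior and is not the study's analysis: the CO2 values, the linear flux response, and the binned-mean predictor (a simplified stand-in for regression trees) are all hypothetical.

```python
from bisect import bisect_right


def true_flux(co2_ppm):
    """Hypothetical response: the sink strengthens (flux more negative) as CO2 rises."""
    return -0.01 * (co2_ppm - 380)


def fit_binned_means(x, y, n_bins=4):
    """Piecewise-constant fit: the mean of y within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    edges = [lo + width * k for k in range(1, n_bins)]  # interior bin edges
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = bisect_right(edges, xi)
        sums[b] += yi
        counts[b] += 1
    # Evenly spaced training data below guarantees every bin is populated.
    means = [s / c for s, c in zip(sums, counts)]
    return edges, means


def predict(model, x):
    edges, means = model
    # Values beyond the training range fall into the first or last bin.
    return means[bisect_right(edges, x)]


# Training period: CO2 varies only over a narrow, assumed range.
train_co2 = [380 + k for k in range(41)]  # 380..420 ppm
train_flux = [true_flux(c) for c in train_co2]
model = fit_binned_means(train_co2, train_flux)

future_co2 = 600  # well outside the training range
print("predicted flux at 420 ppm:", predict(model, 420))
print("predicted flux at 600 ppm:", predict(model, future_co2))
print("hypothetical true flux at 600 ppm:", true_flux(future_co2))
```

In this toy setting the prediction at 600 ppm is identical to the prediction at the edge of the training range, so the strengthening sink is missed no matter how many training points fall inside that range.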
Principal Investigator
William J. Riley
Lawrence Berkeley National Laboratory
[email protected]
Co-Principal Investigator
Ian Shirley
Lawrence Berkeley National Laboratory
[email protected]
Program Manager
Daniel Stover
U.S. Department of Energy, Biological and Environmental Research (SC-33)
Environmental System Science
[email protected]
Funding
This research was supported by the Biological and Environmental Research (BER) Program within the U.S. Department of Energy’s (DOE) Office of Science as part of the Next-Generation Ecosystem Experiments-Arctic (NGEE-Arctic) project.
References
Shirley, I. A., et al. "Machine Learning Models Inaccurately Predict Current and Future High-Latitude C Balances." Environmental Research Letters 18 (1), 014026 (2023). https://doi.org/10.1088/1748-9326/acacb2.