January 06, 2023

Machine Learning Models Inaccurately Predict Current and Future High-Latitude Carbon Balances

Process model simulations reveal shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.

Map of present Alaskan net ecosystem exchange.

Present-day net ecosystem exchange across Alaska from the target simulated data (left) and from machine learning predictions (right). The two maps show large discrepancies. Green dots mark the sites used to train the machine learning model.

[Courtesy Lawrence Berkeley National Laboratory.]

The Science

The high-latitude carbon cycle is an important, complex, and highly uncertain component of the global climate system. A growing number of studies have relied on machine learning methods to create regional estimates of current and future ecosystem properties (e.g., carbon balance) based on a small number of site measurements. Because observational data are scarce, machine learning model predictions are rarely tested against independent measurements. In this study, machine learning models are trained and evaluated on output from a process-based ecosystem model, a novel approach that uncovers large biases in machine learning predictions of current and future high-latitude carbon balance.

The Impact

When trained on existing site coverage, machine learning methods incorrectly predict that Alaska is currently a net source of carbon. This result mirrors a current mismatch between ecosystem model and machine learning estimates of high-latitude carbon balances and points to insufficient site coverage as a likely cause. The study also demonstrates that machine learning methods are unable to predict how ecosystem carbon fluxes will respond to climate change, because present-day training data cannot capture how the relationships governing those fluxes change over time. These findings highlight the need for cautious interpretation of machine learning predictions of current and future ecosystem processes.


In this study, carbon fluxes and environmental data are simulated across Alaska using ecosys, a process-rich terrestrial ecosystem model. Boosted regression tree machine learning algorithms are then applied to different subsets of simulated data that mirror and expand upon existing AmeriFlux eddy-covariance data availability. Machine learning predictions across the entire domain are compared to simulated data to understand how variation in site coverage and climate forcing impacts typical data-driven machine learning upscaling and forecasting approaches.
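
The experiment described above can be sketched in miniature. The following is an illustration only, not the study's code: a hypothetical function `toy_nee` stands in for ecosys output, a small hand-written boosted regression tree (depth-one trees fit to residuals) stands in for whatever boosted regression tree software the authors used, and every name, parameter value, and the two-driver domain are assumptions. It trains on 15 and on 240 randomly chosen cells, predicts the whole domain, and reports the domain-mean bias against the simulated target.

```python
import random

def fit_stump(X, r):
    """Least-squares regression stump: best single split over all features."""
    n, total = len(X), sum(r)
    mean = total / n
    best = (float("-inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # A larger score means a smaller squared error for this split.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if score > best[0]:
                best = (score, j, lo, left_sum / k, right_sum / (n - k))
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_brt(X, y, n_trees=200, lr=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, resid)
        stumps.append((j, thr, left, right))
        pred = [p + lr * (left if x[j] <= thr else right) for p, x in zip(pred, X)]
    return base, lr, stumps

def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(left if x[j] <= thr else right
                            for j, thr, left, right in stumps) for x in X]

def toy_nee(temp, moist):
    # Hypothetical stand-in for process-model output (NOT ecosys).
    # Negative values denote net uptake, a sign convention assumed here.
    return 0.2 * temp - 1.5 * moist * max(temp, 0.0)

rng = random.Random(0)
# Assumed domain: 1000 grid cells described by two drivers (temperature, moisture).
domain = [(rng.uniform(-10, 10), rng.uniform(0, 1)) for _ in range(1000)]
truth = [toy_nee(t, m) for t, m in domain]

def experiment(n_sites):
    """Train on n_sites cells, predict every cell, compare with the target."""
    sites = rng.sample(range(len(domain)), n_sites)
    X = [domain[i] for i in sites]
    y = [truth[i] for i in sites]
    model = fit_brt(X, y)
    pred = predict(model, domain)
    bias = sum(pred) / len(pred) - sum(truth) / len(truth)
    return {"model": model, "X": X, "y": y, "pred": pred, "bias": bias}

results = {n: experiment(n) for n in (15, 240)}
for n, res in results.items():
    print(f"{n:>3} training sites: domain-mean bias = {res['bias']:+.3f}")
```

Because the target is itself simulated, the bias can be computed for every cell in the domain, which is not possible with sparse field measurements. The numbers this toy prints say nothing about the study's results.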

When current Alaska AmeriFlux data coverage is used for training, machine learning methods incorrectly predict that Alaska is a net carbon source. Predictions improve as the spatial coverage of the training dataset increases (e.g., bias is halved when 240 modeled sites are used instead of 15). However, even the machine learning model trained with 240 sites does not reproduce the substantial increase in Alaska carbon sink strength that ecosys simulates over the 21st century. Convergent cross mapping is used to show that this degradation of machine learning projections can be ascribed to changes in atmospheric CO2, litter inputs, and vegetation composition. This study reveals large shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.
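
For readers unfamiliar with the cross-mapping analysis mentioned above, here is a generic sketch of the technique on a toy system. It is not the study's analysis: the coupled logistic maps, the embedding dimension `E`, the lag `tau`, and all function names are assumptions chosen for illustration. The idea is to rebuild a "shadow manifold" from lagged values of one series, use nearest neighbors on that manifold to estimate a second series, and read high estimation skill that grows with record length as evidence that the second series influences the first.

```python
import math

def embed(x, E, tau):
    """Time-delay embedding: vectors (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    start = (E - 1) * tau
    return start, [tuple(x[t - k * tau] for k in range(E))
                   for t in range(start, len(x))]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def cross_map_skill(x, y, E=2, tau=1):
    """Correlation between y and its estimate from the shadow manifold of x."""
    start, M = embed(x, E, tau)
    y_obs = y[start:]
    y_hat = []
    for i, p in enumerate(M):
        # E + 1 nearest neighbors of point i, excluding the point itself.
        nearest = sorted((math.dist(p, q), j)
                         for j, q in enumerate(M) if j != i)[:E + 1]
        d1 = max(nearest[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d1) for d, _ in nearest]
        estimate = sum(w * y_obs[j] for w, (_, j) in zip(weights, nearest))
        y_hat.append(estimate / sum(weights))
    return pearson(y_obs, y_hat)

def coupled_logistic(n, x0=0.4, y0=0.2):
    """Toy system (assumed parameters): two weakly coupled logistic maps."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

xs, ys = coupled_logistic(300)
# Skill of estimating y from x's manifold, for a short and a long record.
skills = {L: cross_map_skill(xs[:L], ys[:L]) for L in (50, 300)}
for L, s in skills.items():
    print(f"library length {L:>3}: cross-map skill = {s:.3f}")
```

Comparing the skill at several record lengths is what checks for convergence. A real analysis would also choose `E` and `tau` from the data rather than fixing them.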

Principal Investigator

William J. Riley
Lawrence Berkeley National Laboratory

Co-Principal Investigator

Ian Shirley
Lawrence Berkeley National Laboratory

Program Manager

Daniel Stover
U.S. Department of Energy, Biological and Environmental Research (SC-33)
Environmental System Science


This research was supported by the Biological and Environmental Research (BER) Program within the U.S. Department of Energy’s (DOE) Office of Science as part of the Next-Generation Ecosystem Experiments-Arctic (NGEE-Arctic) project.


Shirley, I. A., et al. "Machine Learning Models Inaccurately Predict Current and Future High-Latitude C Balances." Environmental Research Letters 18 (1), 014026 (2023). https://doi.org/10.1088/1748-9326/acacb2.