Quantifying Spatial and Vertical Variations in Soil C:N Relationships for the Permafrost Region Using Machine Learning


Joshua Minai1* ([email protected]), Julie Jastrow1, Roser Matamala1, Chien-Lu Ping2, Gary Michaelson2, Nicolas Jelinski3


1Argonne National Laboratory, Lemont, IL; 2University of Alaska–Fairbanks, AK; 3University of Minnesota–Twin Cities, Minneapolis/St. Paul, MN



Soil property ratios are often used to characterize soil organic matter composition and other measures of soil quality. However, mapping and spatially interpreting soil property ratios is challenging, with no set standard, particularly for the heterogeneous profiles of permafrost-affected soils. Two approaches, direct and indirect mapping, can be used. For direct mapping, the property ratio determined for each soil observation is used to predict the distribution of that ratio within the landscape. With indirect mapping, each property is predicted independently across the landscape, and the two resulting maps are then used to calculate the predicted ratio for each map pixel. Researchers used observations of soil organic carbon (C) and total nitrogen (N) stocks in Alaska to investigate which mapping approach best captures the distribution of soil property ratios for cold region soils. The specific objectives were to: (1) evaluate which approach best captures the distribution of soil C:N ratios at high spatial resolution for a selected latitudinal transect in Alaska and (2) identify which environmental covariates are important predictors of the spatial and vertical variation of soil C:N ratios within the study area. Three machine learning approaches (Cubist, Random Forest, and extreme gradient boosting) were compared. Maps of predicted soil C:N ratios were generated at a spatial resolution of 34 m for three depth increments within the surface meter. Overall, indirect mapping with Random Forest performed best at depths of 0 to 30 cm (R² = 0.27, RMSE = 4.88) and 30 to 60 cm (R² = 0.20, RMSE = 7.57), whereas direct mapping with Cubist performed best at 60 to 100 cm (R² = 0.36, RMSE = 5.92).
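The difference between the direct and indirect approaches described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data values are invented, a single covariate (`temp`) stands in for the full set of environmental covariates, and a one-variable least-squares line (`fit_line`) stands in for the machine learning models compared in the study. The R² shown is the coefficient of determination, which may differ from the definition used for the reported values, and the scores are computed in-sample rather than by cross-validation.

```python
from math import sqrt


def fit_line(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Hypothetical stand-in for Cubist, Random Forest, or gradient boosting.
    Returns a function that predicts y for a single covariate value.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Invented example data (not from the study): one covariate per site,
# plus observed C and N stocks at the same sites.
temp = [-10.0, -8.0, -6.0, -4.0, -2.0, 0.0]
c_stock = [30.0, 28.0, 24.0, 20.0, 18.0, 15.0]
n_stock = [1.5, 1.5, 1.4, 1.3, 1.3, 1.2]
cn_obs = [c / n for c, n in zip(c_stock, n_stock)]

# Direct mapping: model the observed ratio itself.
direct_model = fit_line(temp, cn_obs)
cn_direct = [direct_model(t) for t in temp]

# Indirect mapping: model C and N separately, then divide the predictions.
c_model = fit_line(temp, c_stock)
n_model = fit_line(temp, n_stock)
cn_indirect = [c_model(t) / n_model(t) for t in temp]

print("direct:   R2=%.3f RMSE=%.3f" % (r2(cn_obs, cn_direct), rmse(cn_obs, cn_direct)))
print("indirect: R2=%.3f RMSE=%.3f" % (r2(cn_obs, cn_indirect), rmse(cn_obs, cn_indirect)))
```

In a gridded application, the same two steps would be applied per pixel: the direct model is evaluated on the covariate rasters once, whereas the indirect approach evaluates two models and divides the resulting maps cell by cell.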

Although temperature was the dominant predictor overall, terrain attributes were most important at finer scales. By contrast, parent material was the least important predictor. Both mapping approaches, however, underpredicted low observations and overpredicted high observations because of the relatively low sampling density and the uneven distribution of observation locations and their C:N values within the study area. Knowledge gained from this work will inform ongoing efforts to map soil C:N ratios, as well as soil organic C and N stocks, for the state of Alaska at a spatial resolution of 34 m. Ultimately, the maps will be useful for informing and benchmarking large-scale land surface models.
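One simple way to check for the kind of systematic under- and overprediction noted above is to compare the mean prediction error in the lower and upper halves of the observed values. The sketch below is illustrative only: the helper name `mean_error_by_half` and the numbers are hypothetical, and the study may have used a different diagnostic.

```python
def mean_error_by_half(obs, pred):
    """Return the mean error (prediction minus observation) for the lower
    and upper halves of the observations, ranked by observed value."""
    pairs = sorted(zip(obs, pred))  # sort by observed value
    mid = len(pairs) // 2

    def mean_err(chunk):
        return sum(p - o for o, p in chunk) / len(chunk)

    return mean_err(pairs[:mid]), mean_err(pairs[mid:])


# Invented C:N values (not from the study).
obs = [5.0, 10.0, 15.0, 20.0]
pred = [4.0, 9.0, 17.0, 23.0]

low_bias, high_bias = mean_error_by_half(obs, pred)
print(low_bias, high_bias)
```

A negative value indicates underprediction in that half of the observed range, and a positive value indicates overprediction.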