A Chemistry-Informed Hybrid Machine Learning Approach to Predict Metal Sorption to Mineral Surfaces


Elliot Chang1*, Mavrik Zavarin1 ([email protected]), Linda Beverly2, Haruko Wainwright3


1Lawrence Livermore National Laboratory, Livermore, CA; 2California State University–East Bay, Hayward, CA; 3Lawrence Berkeley National Laboratory, Berkeley, CA



The Lawrence Livermore National Laboratory (LLNL) Surface Complexation/Ion Exchange (L-SCIE) database is a recent effort to unify community adsorption experiments and metadata in a findable, accessible, interoperable, and reusable (FAIR) format (Wilkinson et al. 2016). To date, it has mined over 27,000 raw adsorption data points from the literature, and it provides a platform for testing novel approaches to surface complexation modeling and surface complexation database development. Briefly, L-SCIE mines sorption data (e.g., Kd, % sorbed, surface excess) and the associated experimental conditions (e.g., background electrolyte, mineral surface area, gas composition) from journal manuscripts and loads them into a database. The sorption data then undergo a series of unit conversions to yield a unified database in which conversion errors are propagated from the originally extracted data. The database can be filtered for a mineral-metal pair of interest to display the corresponding experimental dataset. The application of the L-SCIE database to traditional surface complexation modeling was illustrated in a recent publication (Zavarin et al. 2022).
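
As an illustration of the kind of unit conversion with error propagation described above, the following minimal sketch converts a "% sorbed" value to a distribution coefficient Kd (mL/g) using the standard batch relation Kd = f/(1 - f) × V/m, where f is the fraction sorbed. It is not L-SCIE code: the function name, its arguments, and the assumption that only the % sorbed value carries uncertainty (solution volume and solid mass treated as exact) are all hypothetical choices made for this example.

```python
def percent_sorbed_to_kd(pct_sorbed, pct_err, volume_ml, mass_g):
    """Convert % sorbed to Kd (mL/g) with first-order error propagation.

    Hypothetical helper, not part of L-SCIE. Assumes the only uncertain
    input is the % sorbed value; volume and mass are treated as exact.
    """
    if not 0.0 <= pct_sorbed < 100.0:
        raise ValueError("pct_sorbed must be in [0, 100)")
    if mass_g <= 0.0 or volume_ml <= 0.0:
        raise ValueError("volume_ml and mass_g must be positive")

    f = pct_sorbed / 100.0        # fraction sorbed
    sigma_f = pct_err / 100.0     # uncertainty in the fraction sorbed
    ratio = volume_ml / mass_g    # solution-to-solid ratio, mL/g

    kd = f / (1.0 - f) * ratio
    # dKd/df = ratio / (1 - f)^2, so sigma_Kd = |dKd/df| * sigma_f
    kd_err = ratio * sigma_f / (1.0 - f) ** 2
    return kd, kd_err


# Example: 50 +/- 2 % sorbed, 100 mL of solution, 1 g of mineral
kd, kd_err = percent_sorbed_to_kd(50.0, 2.0, volume_ml=100.0, mass_g=1.0)
```

Note that the propagated error grows rapidly as the fraction sorbed approaches 1, which is one reason carrying the uncertainty alongside each converted value is useful.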

An alternative hybrid machine learning (ML) approach will be presented that shows promise in achieving prediction quality comparable to that of traditional surface complexation models (SCMs). At its core, the hybrid random forest (RF) approach is motivated by the proliferation of mutually incongruent SCMs in the literature, which limits their applicability in reactive transport models. The approach first performs PHREEQC-based aqueous speciation calculations; values from these simulations are then passed automatically as input features to an RF algorithm that quantifies adsorption and avoids SCM constraints entirely. Named the LLNL Speciation Updated Random Forest (L-SURF) model, this hybrid approach is shown to be applicable to uranium(VI) sorption driven by both ion exchange and surface complexation, as demonstrated for quartz and montmorillonite. The approach can be applied to reactive transport modeling and may provide an alternative to the costly development of self-consistent SCM reaction databases.
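
The two-stage workflow described above (speciation first, then a random forest trained on the speciation outputs) can be sketched roughly as follows. This is not the L-SURF implementation. The `mock_speciation` function is a hypothetical stand-in for a PHREEQC run and returns made-up species concentrations, the sorption targets are synthetic, and the feature set is an assumption chosen only to show the data flow. The example uses scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)


def mock_speciation(ph, total_u):
    """Hypothetical stand-in for a PHREEQC speciation calculation.

    Returns a feature matrix of pH and log10 concentrations. The species
    split is invented for illustration and has no chemical validity.
    """
    frac_free = 1.0 / (1.0 + 10.0 ** (ph - 5.0))   # made-up "free ion" fraction
    frac_hydrolyzed = 1.0 - frac_free              # made-up "hydrolyzed" fraction
    return np.column_stack([
        ph,
        np.log10(total_u),
        np.log10(total_u * frac_free),
        np.log10(total_u * frac_hydrolyzed),
    ])


# Synthetic experimental conditions (assumed ranges, not from L-SCIE)
n_samples = 200
ph = rng.uniform(3.0, 9.0, n_samples)
total_u = 10.0 ** rng.uniform(-8.0, -5.0, n_samples)   # mol/L

# Stage 1: speciation outputs become the ML input features
features = mock_speciation(ph, total_u)

# Synthetic target: log10 Kd with a sorption maximum near pH 6, plus noise
log_kd = 2.0 + 1.5 * np.exp(-0.5 * (ph - 6.0) ** 2) + rng.normal(0.0, 0.1, n_samples)

# Stage 2: train the random forest on speciation-derived features
n_train = 150
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features[:n_train], log_kd[:n_train])

# Predict sorption for held-out conditions
predictions = model.predict(features[n_train:])
```

In a real workflow the stand-in function would be replaced by actual PHREEQC output, and the targets would be measured sorption values for a given mineral-metal pair rather than synthetic numbers.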


Wilkinson, M., et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship,” Scientific Data 3, 160018. DOI:10.1038/sdata.2016.18.

Zavarin, M., et al. 2022. “Community Data Mining Approach for Surface Complexation Database Development,” Environmental Science & Technology 56(4), 2827–2838. DOI:10.1021/acs.est.1c07109.