December 15, 2021

A New Tool for Diverse Environmental Data Integration

BASIN-3D reduces the data processing burden for Earth scientists, making it easier to pull together data from different sources.

Conceptual figure showing how the BASIN-3D broker would connect to various data sources across organizations and present users with an integrated view of the data.

[Courtesy ESS-DIVE BASIN-3D Team.]

The Science

Earth data include measurements and model results of physical, chemical, and biological processes in ecosystems. The data are diverse and often stored across many databases with different formats and conventions. A new software tool called Broker for Assimilation, Synthesis, and Integration of eNvironmental Diverse, Distributed Datasets (BASIN-3D) reduces the burden on scientists of integrating their research data by acting as a “broker” that retrieves data on demand from different sources and transforms them into a unified view. This study presents two applications of BASIN-3D to integrate time series data (measurements recorded over time, often at different intervals). The first is advanced search and exploration of data on a web portal, and the second is supplying data to machine learning models for water quality predictions.

The Impact

BASIN-3D helps environmental researchers who use data from public and private sources by automating the process of pulling those data together. Users can access the latest data available from providers of their choice without manually downloading files and reconciling differences in formats and conventions. The software can support data integration for both web-based tools and data analytics, and it is also applicable to environmental field and modeling studies that require data integration.

Summary

Earth scientists invest significant effort in integrating data from multiple sources for both modeling and data analyses. This study introduces BASIN-3D as a data brokering approach that reduces this data processing burden. BASIN-3D can synthesize diverse data from different sources on demand, without the need for additional storage. The software is currently implemented to integrate time series Earth observations across a hierarchy of spatial locations commonly used in field measurements (such as river basins, watersheds, sites, plots, and wells). Its framework enables users to map data sources of interest to a common format. The utility of the tool is demonstrated in two applications: (1) a web portal that allows scientific users to explore and access data through features such as an interactive map, graphs, and download; and (2) a Python package that can be embedded in scripts to supply data to machine learning models for water quality predictions. Together, these applications show that BASIN-3D can support data integration for both web-based tools and data analytics.
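The brokering idea described above can be illustrated with a minimal Python sketch. It does not use the BASIN-3D package or reproduce its API. Every name in it (Observation, Broker, fetch_source_a, map_source_b, and so on) is hypothetical, and the two "sources" are hard-coded stand-ins for remote databases. It only shows the general pattern: each source is registered with a mapping function that converts its records to a common format, and data are pulled and transformed on demand rather than copied into a new store.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One time series value expressed in the common (unified) format."""
    source: str
    location: str
    variable: str   # name from a shared vocabulary (hypothetical)
    unit: str
    timestamp: str  # ISO 8601, UTC
    value: float


# Two made-up sources with different record layouts, names, and units.
# In practice these functions would query remote services.
def fetch_source_a(location: str) -> List[dict]:
    return [
        {"param": "TEMP_F", "time": "2021-06-01T00:00:00Z", "val": 68.0},
        {"param": "PH", "time": "2021-06-01T00:00:00Z", "val": 7.2},
    ]


def fetch_source_b(location: str) -> List[tuple]:
    return [("2021-06-01T01:00:00Z", "water_temp_c", 19.5)]


def map_source_a(location: str, raw: dict) -> Optional[Observation]:
    """Translate a source A record; Fahrenheit is converted to Celsius."""
    if raw["param"] == "TEMP_F":
        celsius = (raw["val"] - 32.0) * 5.0 / 9.0
        return Observation("source_a", location, "water_temperature",
                           "degC", raw["time"], celsius)
    if raw["param"] == "PH":
        return Observation("source_a", location, "ph", "pH",
                           raw["time"], raw["val"])
    return None  # variable not covered by the mapping


def map_source_b(location: str, raw: tuple) -> Optional[Observation]:
    """Translate a source B record, which is already in Celsius."""
    timestamp, name, value = raw
    if name == "water_temp_c":
        return Observation("source_b", location, "water_temperature",
                           "degC", timestamp, value)
    return None


class Broker:
    """Retrieves data on demand and returns it in the common format."""

    def __init__(self) -> None:
        self._plugins: List[Tuple[Callable, Callable]] = []

    def register(self, fetch: Callable, to_common: Callable) -> None:
        self._plugins.append((fetch, to_common))

    def get_timeseries(self, location: str, variable: str) -> List[Observation]:
        results = []
        for fetch, to_common in self._plugins:
            for raw in fetch(location):
                obs = to_common(location, raw)
                if obs is not None and obs.variable == variable:
                    results.append(obs)
        # Nothing is stored; each call rebuilds the unified view.
        return sorted(results, key=lambda o: (o.timestamp, o.source))


def build_demo_broker() -> Broker:
    broker = Broker()
    broker.register(fetch_source_a, map_source_a)
    broker.register(fetch_source_b, map_source_b)
    return broker


if __name__ == "__main__":
    for obs in build_demo_broker().get_timeseries("site-1", "water_temperature"):
        print(obs.timestamp, obs.source, obs.value, obs.unit)
```

In this sketch, adding a new data source only requires writing one fetch function and one mapping function. Code that consumes the unified view, whether a web portal or a machine learning script, would not need to change.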

Principal Investigator

Charuleka Varadharajan
Lawrence Berkeley National Laboratory

Co-Principal Investigator

Deborah Agarwal
Lawrence Berkeley National Laboratory

Program Manager

Jennifer Arrigo
U.S. Department of Energy, Biological and Environmental Research (SC-33)
Environmental System Science

Funding

This research was supported by the Biological and Environmental Research (BER) Program within the U.S. Department of Energy’s (DOE) Office of Science.

References

Varadharajan, C., et al. "A Brokering Framework to Integrate Diverse Environmental Data." Computers & Geosciences 159, 105024 (2022). https://doi.org/10.1016/j.cageo.2021.105024.