Does Increasing Number of Field Observations Necessarily Improve the Performance of an ML-Based Soil Parent Material Prediction?

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Tünde Takáts · Eötvös Loránd University

János Mészáros · Centre for Agricultural Research

Zsófia Adrienn Kovács · Centre for Agricultural Research

Kitti Balog · Centre for Agricultural Research

Gáspár Albert · Eötvös Loránd University

László Pásztor · Centre for Agricultural Research

Abstract

The study aimed to elaborate a large-scale soil parent material (SPM) map for a 90 km² pilot area in Hungary by applying Digital Soil Mapping techniques. Initially, a coarse-scale geological map harmonized with Food and Agriculture Organization (FAO) SPM classes provided the reference data, later supplemented by field observations. Because ground truth data were limited, we tested how an increasing number of field observations affected prediction performance. Random Forest and Gradient Boosting Machine algorithms were applied in a 100-fold simulation-based method using a broad set of environmental covariates, including satellite imagery, SRTM DEM derivatives and digital soil property maps. Virtual training and testing data were generated by sampling the FAO-based SPM map, and 200 visual field observations were collected. Testing was structured by incrementally adding 50 field points to a randomly generated set. Validation of the classified maps was carried out by: i) the overall mean accuracy (OMA) of the predicted maps, ii) the number of predicted classes (NPC) for each pixel and iii) the percentage of the most frequently predicted class (MFP). Unexpectedly, the inclusion of field data led to a decrease in OMA, suggesting potential data quality issues. While some improvement was observed in reducing highly uncertain categories based on NPC, MFP and the total extent of uncertain areas did not consistently improve. These results highlight that increasing the quantity of field data alone is insufficient; future efforts should focus on strategic sampling, data quality assessment, and integration methods to improve model reliability.

This graphical abstract illustrates a Digital Soil Mapping study conducted in the Dorog Basin (90 km²) in Hungary. The study aimed to generate a large-scale Soil Parent Material (SPM) map of the area. The workflow integrates coarse-scale geological data harmonized with FAO SPM classes, satellite imagery, terrain derivatives, and digital soil maps. Machine learning algorithms (Random Forest and Gradient Boosting) were applied in a 100-fold simulation using virtual and field-based training data. The infographic highlights: (1) Reference data sources: spectral information from satellite imagery, topographic information from digital terrain models, virtual sampling data from FAO-based SPM maps, and field observations. (2) Field observation data application strategy: iterative testing with an increasing number of field points (from 0 to 200). (3) Machine learning algorithms: Random Forest and Gradient Boosting. (4) Validation metrics of the mapping: Overall Mean Accuracy (OMA), Number of Predicted Classes (NPC), and Most Frequently Predicted Class (MFP). (5) Key findings: adding more field data did not consistently improve model accuracy, revealing that data quality and sampling strategy are more critical than quantity alone.

A large-scale soil parent material (SPM) map was created for a 90 km² pilot area in Hungary using Digital Soil Mapping (DSM) techniques.

ML models (RF, GBM) were trained using extensive environmental covariates, including satellite data, topographical and soil information.

Virtual training data were generated from the FAO-based SPM map, then field observations were iteratively added in steps of 50 to assess their effect.

Model performance was evaluated comprehensively using overall accuracy, the number of predicted classes, and the most frequently predicted class.

Field data quantity alone is insufficient; strategic sampling, evaluation of data quality, and proper integration methods are also essential.
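
The three validation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that OMA is the mean, over all simulation runs, of the share of pixels matching a reference map, that NPC is the number of distinct classes a pixel receives across runs, and that MFP is the percentage of runs assigning the pixel its most frequent class. The function names, the toy class labels and the four-run example are hypothetical and are not taken from the study, which used 100 runs.

```python
from collections import Counter


def pixel_metrics(runs):
    """Return per-pixel NPC and MFP for a stack of predicted maps.

    runs: list of simulation runs; each run is a list of class labels,
    one label per pixel (all runs must cover the same pixels).
    """
    n_runs = len(runs)
    npc, mfp = [], []
    for pixel in range(len(runs[0])):
        counts = Counter(run[pixel] for run in runs)
        # NPC: how many different classes this pixel received across runs
        npc.append(len(counts))
        # MFP: share of runs (in percent) that predicted the modal class
        mfp.append(100.0 * counts.most_common(1)[0][1] / n_runs)
    return npc, mfp


def overall_mean_accuracy(runs, reference):
    """Mean over runs of the fraction of pixels matching the reference."""
    accuracies = [
        sum(pred == ref for pred, ref in zip(run, reference)) / len(reference)
        for run in runs
    ]
    return sum(accuracies) / len(accuracies)


# Toy example: 4 runs over 3 pixels with hypothetical class labels.
runs = [
    ["A", "B", "C"],
    ["A", "B", "A"],
    ["A", "C", "A"],
    ["A", "B", "B"],
]
reference = ["A", "B", "A"]

npc, mfp = pixel_metrics(runs)
oma = overall_mean_accuracy(runs, reference)
print(npc, mfp, oma)
```

Under these assumptions, a pixel with a low NPC and a high MFP is predicted consistently across runs, while a high NPC with a low MFP marks an uncertain area.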

Topics & Keywords

Soil Geostatistics and Mapping · Soil Carbon and Nitrogen Dynamics · Geochemistry and Geologic Mapping

UN Sustainable Development Goals

Zero hunger

Publication Details

Published in: Earth Systems and Environment

DOI: 10.1007/s41748-026-01134-2

Field-Weighted Citation Impact: 0.00