Random Forest-Based Multi-Species Habitat Suitability Model for Montenegro

2026 · 0 citations · Dataset · Green Open Access

Authors

Filip Vujovic · Environmental Protection Agency

Dragan Roganovic · Environmental Protection Agency

Abstract

Analytical Workflow and Methodology

1. Data Preparation

A set of environmental predictor variables was prepared to represent the climatic, topographic, and landscape conditions influencing species distribution. All raster layers were harmonized to a common extent, coordinate reference system, and spatial resolution of 100 m, ensuring consistency across datasets and enabling fine-scale habitat suitability modeling. The variables included:

- TEMP (temperature)
- PREC (precipitation)
- NDVI_mean (vegetation productivity)
- Elevation
- Slope
- Aspect
- LAND_COVER
- WATER_0_100m
- RIVER_0_100m

The LAND_COVER layer is based on the CORINE Land Cover classification, enhanced with national datasets collected for the establishment of the Natura 2000 network. It was also supplemented with vegetation data derived from Natura 2000 habitat mapping and with updated satellite-based classifications from Sentinel-2, improving thematic accuracy and ecological relevance. All layers were clipped and masked to the boundary of Montenegro. Urban land cover classes were excluded to minimize anthropogenic bias.

2. Species Data Processing

Species occurrence data were obtained from a structured database (Excel format) that includes geographic coordinates and taxonomic grouping. Preprocessing included:

- removal of records without coordinates
- standardization of species names
- retention of only those species with at least 20 occurrence points
- coordinate rounding to reduce duplicates

Species were also grouped into major taxonomic categories for further analysis and result aggregation:

- REPTILIA
- MAMMALIA
- CHIROPTERA
- LEPIDOPTERA
- COLEOPTERA
- ORTHOPTERA
- ODONATA
- DECAPODA
- BRYOPHYTA

These groups were used to organize model outputs and to evaluate model performance across different biological taxa.

3. Spatial Filtering (Thinning)

To reduce spatial autocorrelation and sampling bias, spatial thinning was applied:

- minimum distance between occurrence points: 1 km
- one occurrence retained per spatial cluster

This step is intended to improve model robustness and reduce overfitting to heavily sampled areas.

4. Core Habitat Identification

For each species, dominant land cover classes were identified by analyzing the frequency of occurrence records within LAND_COVER categories. Classes with at least 20 records were defined as core habitat classes, representing ecologically optimal conditions. These were used for model calibration and evaluation.

5. Training and Testing Data Split

Occurrence data within core habitats were divided into:

- 70% training dataset
- 30% testing dataset

These datasets were exported as:

- train_presence_points.csv
- test_presence_points.csv

Each CSV file includes full attribute information, including coordinates and the associated LAND_COVER class.

6. Pseudo-absence Generation

Pseudo-absence points were generated in areas outside the core land cover classes:

- number of pseudo-absence points ≈ 2× number of presence points
- points spatially distributed across unsuitable habitats

This fixes the ratio of pseudo-absence to presence points at roughly 2:1 for model training.

7. Model Development

Habitat suitability was modeled using the Random Forest (RF) algorithm. Model parameters:

- 500 trees
- combination of continuous and categorical variables
- LAND_COVER treated as a categorical predictor

The model performs binary classification:

- 1 = presence
- 0 = absence

8. Model Evaluation

Model performance was evaluated using:

- ROC curve (Receiver Operating Characteristic)
- AUC (Area Under the Curve)

AUC values were calculated for each species and exported to AUC_results.xlsx.

9. Variable Importance Analysis

The importance of predictor variables was assessed using the Random Forest metric Mean Decrease in Accuracy. Results were exported as importance.csv. This allows identification of the key environmental drivers influencing species distribution.

10. Spatial Prediction

The trained model was applied across the entire study area to generate continuous habitat suitability maps (values ranging from 0 to 1). To incorporate ecological constraints, areas outside core land cover classes were penalized (values reduced by 50%). Final outputs were exported as GeoTIFF files named RF_[species_name].tif.

11. Output Organization

All outputs were structured hierarchically by taxonomic group and species:

- /REPTILIA/Species_name/
- /MAMMALIA/Species_name/
- /LEPIDOPTERA/Species_name/
- ...

Each species folder contains:

- habitat suitability raster
- variable importance table
- training and testing datasets
- model evaluation results

12. Summary

This workflow integrates spatial data processing, machine learning, and ecological filtering to produce habitat suitability models across multiple taxonomic groups with a consistent methodology and ecologically informed constraints.
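For readers who want a concrete picture of the numeric steps in the workflow (1 km thinning, the 70/30 split, the 2× pseudo-absence count, AUC, and the 50% penalty outside core classes), the following is a minimal standard-library Python sketch. It is not the authors' code: the greedy thinning rule, the haversine distance, the random seed, all function names, the sample coordinates, and the land cover codes are assumptions made for illustration, and the Random Forest model itself is omitted.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def thin_points(points, min_km=1.0):
    """Greedy thinning (assumed rule): keep a point only if it lies at least
    min_km away from every point already kept."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


def split_train_test(points, train_frac=0.7, seed=42):
    """Shuffle with a fixed seed, then split into training and testing sets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auc(scores_presence, scores_absence):
    """AUC as the share of presence/absence pairs in which the presence
    scores higher; ties count as half."""
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in scores_presence for a in scores_absence)
    return wins / (len(scores_presence) * len(scores_absence))


def penalize(suitability, land_cover, core_classes, factor=0.5):
    """Multiply suitability by factor in cells whose class is not a core class."""
    return [s if lc in core_classes else s * factor
            for s, lc in zip(suitability, land_cover)]


# Hypothetical (lon, lat) occurrence points; not taken from the dataset.
points = [(19.2600, 42.4400), (19.2605, 42.4402),
          (19.3000, 42.5000), (19.4000, 42.6000)]
thinned = thin_points(points)            # the second point is under 1 km from the first
train, test = split_train_test(thinned)  # 70% / 30% split
# Which presence set the 2x ratio refers to is an assumption (training set here).
n_pseudo_absence = 2 * len(train)

# Illustrative land cover codes; 311 is treated as the only core class here.
adjusted = penalize([0.8, 0.6], [311, 231], core_classes={311})
```

In practice the penalty would be applied cell by cell to the predicted suitability raster rather than to a short list, and a dedicated thinning tool may select which point to keep per cluster differently from the order-dependent rule used here.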

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.19162466
