WABAD-Europe and ESC50 datasets formatted for machine learning

2026 · 0 citations · Dataset · Green Open Access

Authors

Corentin Bernard · Laboratoire d’Informatique et Systèmes

Abstract

This dataset contains BirdNET embeddings, true labels, and acoustic index values computed from the European recordings of the WABAD dataset V1 (A World Annotated Bird Acoustic Dataset for Passive Acoustic Monitoring). WABAD dataset corresponding authors: Cristian Pérez Granados (cristian.perez@ctfc.cat), Esther Sebastián-González (esther.sebastian@ua.es).

Since the WABAD dataset is regularly updated, it is advisable to access the original files here for further research: https://zenodo.org/records/17293588. The WABAD dataset is composed of one-minute audio files (.wav) with corresponding Audacity and Raven Pro annotations at the species level, including start/end times and low/high frequency bounds.

Embeddings and labels were also computed for the ESC-50 dataset, which contains environmental sounds: https://github.com/karolpiczak/ESC-50.

The two datasets were formatted for machine learning as part of the following studies:

Bernard, C., McEwen, B., Cretois, B., Glotin, H., Stowell, D., & Marxer, R. (2025). Data-driven Sampling Strategies for Fine-Tuning Bird Detection Models. bioRxiv, 2025-10. https://www.biorxiv.org/content/10.1101/2025.10.02.679964v1. The ‘results.zip’ file contains the intermediate computation results used in the GitHub repository associated with the article: https://github.com/mim-team/PAM_data_sampling.

McEwen, B., Bernard, C., & Stowell, D. (2025). Stratified Active Learning for Spatiotemporal Generalisation in Bioacoustic Monitoring. bioRxiv, 2025-09. https://www.biorxiv.org/content/10.1101/2025.09.01.673472v2.

Data processing steps:

- Dataset curation.
- Random split of the one-minute audio files into training (40%), validation (10%) and test (50%) sets.
- Segmentation of audio files into 3-second chunks.
- Computation of BirdNET predictions, uncertainty scores, and embeddings using https://github.com/birdnet-team/BirdNET-Analyzer.
- Computation of acoustic indices with scikit-maad: https://scikit-maad.github.io/.
- Storage of results in Python .pkl files.
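
The split, segmentation, and storage steps listed above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`split_files`, `chunk_bounds`, `build_index`), the random seed, the non-overlapping chunking, and the record layout are all assumptions made for illustration. The actual layout of the published .pkl files may differ.

```python
import pickle
import random


def split_files(file_names, seed=0):
    """Randomly assign files to train (40%), validation (10%) and test (50%).

    The seed and shuffling method are assumptions, not taken from the dataset.
    """
    names = sorted(file_names)
    random.Random(seed).shuffle(names)
    n_train = len(names) * 40 // 100
    n_val = len(names) * 10 // 100
    return {
        "train": names[:n_train],
        "validation": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],
    }


def chunk_bounds(duration_s=60.0, chunk_s=3.0):
    """Return (start, end) times in seconds of consecutive, non-overlapping chunks."""
    n_chunks = int(duration_s // chunk_s)
    return [(i * chunk_s, (i + 1) * chunk_s) for i in range(n_chunks)]


def build_index(file_names, seed=0):
    """Build one record per 3-second chunk (hypothetical record layout)."""
    index = []
    for split_name, names in split_files(file_names, seed).items():
        for name in names:
            for start_s, end_s in chunk_bounds():
                index.append(
                    {"file": name, "split": split_name,
                     "start_s": start_s, "end_s": end_s}
                )
    return index


def save_pickle(obj, path):
    """Store any Python object in a .pkl file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
```

Under these assumptions, each one-minute file yields 20 chunks, so 100 files produce an index of 2000 records that `save_pickle(index, "index.pkl")` would write to disk. Files written this way can be read back with `pickle.load`; as with any pickle, only load files from a source you trust.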

Topics & Keywords

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.19064409
