# KineScaper-EV: A Controlled Dataset for Emergency Vehicle Siren Recognition, Siren-Type Benchmarking, and Detection-Oriented Analysis

## Overview

KineScaper-EV is a mixed (real-world and synthetic sounds), controlled Emergency-Vehicle (EV) dataset designed to support binary EV recognition, multi-class siren-type classification, and Sound Event Detection (SED)-oriented evaluations under explicitly parameterized acoustic conditions. The dataset is generated by combining the Dry Anechoic Siren Sound Simulator (DASSS), which produces dry anechoic emergency siren signals with controllable waveform and warning-pattern characteristics, and kinescaper, which places those signals into realistic urban soundscapes through trajectory-aware acoustic modeling. In this way, KineScaper-EV complements real-world EV corpora by providing explicit control over source trajectory, foreground/background ratio, event audibility, and annotation consistency.

At the raw-data level, KineScaper-EV consists of 61,600 monophonic signals at 32 kHz, each 40 seconds long, evenly distributed across 7 siren classes: hi-lo, phaser, piercer, rumbler, two-tone, wail, and yelp. The dataset is intended to support both clip-level classification and strong-label temporal evaluation, since each sample is associated with onset/offset information, siren-related attributes, trajectory-aware metadata, and SNR-oriented descriptors.

## Key Features

- **Controlled Acoustic Variability**: all samples are generated under parameterized combinations of source type, waveform, motion trajectory, target background SPL, and target foreground SPL.
- **Balanced Siren Classes**: the dataset is evenly distributed across 7 warning-pattern classes, with 8,800 files per class.
- **Clip-Level and Strong Annotations**: each file includes strong temporal information, with a randomized onset between 3 and 10 seconds and an average event duration of about 30.28 seconds.
- **Rich Acoustic Metadata**: each sample includes SNR statistics, source speed, closest source–receiver distance, target SPL values, and taxonomy-aware siren descriptors.
- **Framework Integration**: the dataset is designed to support train, benchmark, and detection modes through a dedicated Lightning Dataset/DataModule interface (Link).

## Methodology

KineScaper-EV was introduced to complement real-world EV benchmarks with a controlled dataset in which annotation reliability, urban-traffic realism, source-motion trajectories, and SNR conditions can be explicitly regulated. The rationale is not to replace real-world evaluation, but to provide a controlled benchmark where performance variations can be interpreted more clearly with respect to audibility, siren design, and background interference.

The dataset generation process combines two tools developed within the same research framework. DASSS defines dry anechoic siren sources with controllable warning-pattern characteristics, while kinescaper injects them into pre-recorded urban-traffic soundscapes through trajectory-aware outdoor acoustic modeling. Conceptually, the workflow inherits the annotation-oriented philosophy of Scaper and the physically grounded propagation logic of pyroadacoustics, while enabling hardware-accelerated and batch-oriented synthesis.

Each generated file is a 40-second mono waveform at 32 kHz. Samples are organized across seven siren-pattern classes and are generated under varying source families, waveform types, target background SPL values, target foreground SPL values, and motion conditions. Event onset is randomized between 3 and 10 seconds, while event duration depends on the underlying trajectory configuration (speed, distance, elevation, etc.). Metadata preserve these parameters together with SNR descriptors and event timing, making the dataset suitable for clip-level classification, siren-type recognition, and EV sound event detection analyses.
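To make the strong-label orientation concrete, the onset/offset annotations can be converted into per-window binary targets for SED-style evaluation. The helper below is an illustrative sketch, not the framework's actual API; the 0.31 s window size mirrors the dataset's detection-mode default, and the overlap rule (any overlap counts as positive) is an assumption.

```python
import numpy as np

def window_labels(onset: float, offset: float,
                  clip_len: float = 40.0, win: float = 0.31) -> np.ndarray:
    """Illustrative helper (not the framework API): binary siren-presence
    label for each analysis window of a fixed-length clip."""
    n_windows = int(np.ceil(clip_len / win))   # 130 windows for these defaults
    starts = np.arange(n_windows) * win
    ends = starts + win
    # Assumption: a window counts as positive if it overlaps the event at all.
    return ((ends > onset) & (starts < offset)).astype(np.int64)

# Example: an event annotated from 3.164 s to 34.409 s.
labels = window_labels(onset=3.164, offset=34.409)
print(labels.size, labels.sum())
```

With the defaults above this yields 130 windows per 40-second clip, matching the detection-mode statistics reported below.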
## Dataset Statistics

General characteristics:

- Total audio files: 61,600
- Total positive chunks derivable in train mode (for GP-AT training consistency): 246,400 raw chunks / 234,269 positive chunks
- Audio format: `.wav`
- Channels: 1 (mono)
- Sample rate: 32 kHz
- Duration: 40 s (per file)
- Siren classes: 7 (balanced, 8,800 files each)

Classes:

- hi-lo: 8,800 samples
- two-tone: 8,800 samples
- wail: 8,800 samples
- phaser: 8,800 samples
- piercer: 8,800 samples
- rumbler: 8,800 samples
- yelp: 8,800 samples

Temporal statistics:

- Event onset: min 3.00 s, max 9.99 s, mean 6.50 s
- Event offset: min 19.66 s, max 40.00 s, mean 36.78 s
- Event duration: min 16.01 s, max 37.00 s, mean 30.28 s

Acoustic statistics:

- Average SNR: min -54.98 dB, max 17.25 dB, mean -20.46 dB
- Background SPL targets: [50, 53, 56, 59, 62, 65, 68, 71] dB
- Foreground SPL targets: [80, 83, 86, 89, 92, 95, 98, 101, 104, 107, 110] dB

Source taxonomy (samples):

- Source: electronic (48,224), pneumatic (8,800), mechanical (4,576)
- Waveforms: sine (12,760), sawtooth (12,584), trapezoid (12,232), triangle (12,056), square (11,968)
- Iterations (per parameterized configuration): 20 unique iterations

Detection-mode statistics:

- Window size: 0.31 s (default, customizable)
- Windows per sample: 130
- Positive windows per sample: mean 98.92, median 102, min 53, max 121

## File Structure

This Zenodo release is organized as a multi-part archive to preserve the original dataset structure after extraction.
```
KineScaper_EV_audio_hi-lo.zip
KineScaper_EV_audio_two-tone.zip
KineScaper_EV_audio_wail.zip
KineScaper_EV_audio_phaser.zip
KineScaper_EV_audio_piercer.zip
KineScaper_EV_audio_rumbler.zip
KineScaper_EV_audio_yelp.zip
```

Download audio contents (`.zip`) from: ONEDRIVE LINK

```
KineScaper_EV_metadata.zip
└── KineScaper_EV/
    └── dataset/
        ├── audio/
        │   ├── hi-lo_*.wav
        │   ├── two-tone_*.wav
        │   ├── wail_*.wav
        │   ├── phaser_*.wav
        │   ├── piercer_*.wav
        │   ├── rumbler_*.wav
        │   └── yelp_*.wav
        ├── csv/
        │   └── metadata.tsv
        ├── json/
        │   └── metadata.json
        ├── config_siren.yaml
        ├── generation_log_*.txt
        └── summary.txt
```

To reconstruct the complete dataset structure, download all archive parts and extract them into the same destination directory. The internal paths stored in the ZIP files recreate the original folder hierarchy automatically.

## Metadata

Each sample includes the following metadata fields:

- `event_label`: siren class label
- `filename`: audio file name
- `onset`: event onset in seconds
- `offset`: event offset in seconds
- `snr_min`, `snr_max`, `snr_avg`, `snr_std`: SNR statistics in dB
- `frame_size`: analysis frame size
- `velocity_kmh`: source velocity in km/h
- `closest_distance`: minimum source–receiver distance in meters
- `siren_class`: siren-pattern class
- `subset_index`: subset identifier
- `iteration`: generation iteration
- `bg_spl_target`: target background SPL in dB
- `fg_spl_target`: target foreground SPL in dB

Files follow the naming convention:

`{siren_class}_{source_type}_{waveform}_{iteration}_{onset}_{offset}_i0.wav`

Example: `hi-lo_electronic_sawtooth_00_3.164_34.409_i0.wav`

## Usage Modes

The dataset is integrated in the accompanying framework (The Emergency Vehicle Benchmark) through three operating modes.

**Train mode:** each 40-second positive signal can be partitioned into four non-overlapping 10-second chunks, consistent with common General-Purpose Audio Tagging (GP-AT) input durations. Labels can be configured for binary EV classification or siren-type multi-class classification.
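The train-mode partitioning described above can be sketched as follows. `chunk_waveform` is a hypothetical helper for illustration, not the framework's actual interface:

```python
import numpy as np

SR = 32_000        # dataset sample rate
CHUNK_S = 10       # chunk duration consistent with GP-AT input lengths

def chunk_waveform(wave: np.ndarray, sr: int = SR, chunk_s: int = CHUNK_S) -> np.ndarray:
    """Split a mono waveform into non-overlapping fixed-length chunks,
    discarding any trailing remainder (hypothetical helper)."""
    samples = chunk_s * sr
    n = len(wave) // samples
    return wave[: n * samples].reshape(n, samples)

# A 40 s file yields four 10 s chunks of 320,000 samples each.
dummy = np.zeros(40 * SR, dtype=np.float32)
chunks = chunk_waveform(dummy)
print(chunks.shape)   # (4, 320000)
```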
Negative chunks are derived from the same urban-traffic backgrounds used for positive-scene synthesis, without siren injection, and can be further augmented online.

**Benchmark mode:** evaluation is performed on the full dataset through siren-type cross-validation folds, each built from the 8,800 samples of the target siren class together with a balanced pool of negative urban-traffic chunks.

**Detection mode:** the full 40-second signals and strong temporal annotations are preserved, enabling window-based EV sound event detection analyses. In the supplied statistics, this mode uses 0.31-second windows and 130 windows per sample.

## Use Cases

- **Binary Classification**: emergency-vehicle siren presence/absence recognition.
- **Multi-Class Benchmarking**: controlled siren-type discrimination across warning-pattern classes.
- **Detection-Oriented Evaluation**: strong-label, window-based EV sound event detection experiments.
- **Robustness Analysis**: controlled evaluation across different SNR regimes, source families, waveforms, and trajectory conditions.
- **Transferability Studies**: complementary use alongside AudioSet-EV v1/v2 and other EV-related corpora to separate the impact of architecture, weak labels, and controlled acoustic variability.

## References

S. Giacomelli, M. Giordano, C. Rinaldi and F. Graziosi, "From General-Purpose Audio Tagging to Real-Time Emergency Vehicle Siren Detection," under review for IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2026.

Related tools:

- **DASSS**: Dry Anechoic Synthetic Siren Sounds generator (used to define controllable emergency siren sources).
- **kinescaper**: Physics-Based Dynamic Soundscape Generator for Moving Sound Sources (used to place moving sirens into realistic urban acoustic scenes).
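The robustness-analysis use case can be illustrated with the released metadata by binning files into SNR regimes. The sketch below uses a small synthetic stand-in for `metadata.tsv` so it runs standalone; the thresholds and regime names are arbitrary illustrative choices, not official splits:

```python
import pandas as pd

# In practice, load the shipped metadata:
#   meta = pd.read_csv("KineScaper_EV/dataset/csv/metadata.tsv", sep="\t")
# A tiny synthetic stand-in with the relevant columns is used here instead.
meta = pd.DataFrame({
    "siren_class": ["wail", "yelp", "wail", "hi-lo"],
    "snr_avg": [-45.0, -22.5, -5.0, 10.1],
})

# Illustrative SNR regimes (thresholds are arbitrary, not part of the release).
bins = [-float("inf"), -30.0, -10.0, float("inf")]
meta["snr_regime"] = pd.cut(meta["snr_avg"], bins=bins,
                            labels=["low-snr", "mid-snr", "high-snr"])
print(meta[["siren_class", "snr_avg", "snr_regime"]])
```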
## Citation

If you use KineScaper-EV in your research, please cite:

```bibtex
@dataset{giordano2026kinescaper_ev,
  author    = {Giordano, Marco and Giacomelli, Stefano and Rinaldi, Claudia},
  title     = {KineScaper-EV: A Controlled Dataset for Emergency Vehicle Siren Recognition, Siren-Type Benchmarking, and Detection-Oriented Analysis},
  year      = {2026},
  publisher = {Zenodo},
  version   = {v1.0},
  doi       = {10.5281/zenodo.19163980},
  url       = {https://zenodo.org/uploads/19163980}
}
```