Abstract Distributed acoustic sensing (DAS) enables cost‐effective, dense detection of seismic activity. However, the vast amount of data produced by DAS systems presents a significant challenge for labeling and analysis. Traditional supervised machine learning approaches require extensive labeling, which is time‐consuming and prone to user bias. Our approach aims to reduce the annotation workload, regardless of the size of the data set and without any a priori knowledge of its content, while preserving rare seismic events. We propose a two‐step processing chain. The first step constructs a latent data representation from several hundred features. We compare two approaches: one using signal processing metrics commonly used in seismology (human‐engineered features) and the other using self‐supervised learning with common DAS data representations (image‐BYOL, where BYOL stands for Bootstrap Your Own Latent). The second step applies unsupervised clustering to reduce the data set. We first apply K‐Means to obtain 5,000 clusters, then apply hierarchical clustering to merge them into 500–700 clusters using an inconsistency criterion. This dual‐step approach capitalizes on the computational efficiency of K‐Means and the hierarchical granularity offered by agglomerative clustering. The method is applied to DAS data from two experiments in the Pyrenees with different configurations: a 6‐week measurement on an 800‐m cable at Viella and 19 ten‐minute measurements on a 91‐km cable. For Viella, we detected all events with . The image‐BYOL approach produces more false positives than the human‐engineered features. The results highlight the potential of clustering for DAS analysis while emphasizing the need to reduce false positives, especially for smaller seismic events.
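The two‐step clustering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `two_stage_clustering`, the use of SciPy, the Ward linkage, the inconsistency threshold, and the small cluster count are all assumptions chosen for illustration, whereas the abstract reports 5,000 K‐Means clusters merged into 500–700.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def two_stage_clustering(features, n_fine=50, threshold=1.0, depth=2, seed=0):
    """Over-segment with K-Means, then merge the centroids hierarchically.

    All default values are illustrative assumptions, not values from the paper.
    Returns (fine_labels, merged_labels), one entry per input sample.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)

    # Stage 1: K-Means, initialised from n_fine randomly chosen samples.
    init = features[rng.choice(len(features), size=n_fine, replace=False)]
    centroids, fine_labels = kmeans2(features, init, iter=20, minit="matrix")

    # Stage 2: agglomerative clustering of the centroids only, which keeps
    # the linkage cheap. Ward linkage is an assumption.
    tree = linkage(centroids, method="ward")

    # Cut the tree with the inconsistency criterion; labels start at 1.
    merged_of_fine = fcluster(tree, t=threshold, criterion="inconsistent", depth=depth)

    # Propagate each fine cluster's merged label to its member samples.
    return fine_labels, merged_of_fine[fine_labels]
```

Calling `two_stage_clustering(X)` on a feature matrix `X` of shape `(n_samples, n_features)` returns one fine label and one merged label per sample. Running the hierarchical step on centroids rather than on raw samples is what keeps its cost independent of the data set size.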
Published in: Journal of Geophysical Research Machine Learning and Computation
Volume 2, Issue 4
DOI: 10.1029/2025jh001054