Abstract

This paper introduces acoustography, a novel method for generating acoustic writing systems for unwritten languages using unsupervised machine learning. By decoupling speech encoding from semantic understanding, the method derives a stable acoustic inventory (acoustemes) directly from raw audio using self-supervised speech representations (XLSR-53) and statistical clustering. The method was validated on Sateré-Mawé, an indigenous language of Brazil. A blind recognition study with native speakers confirmed that audio reconstructed from a 100-acousteme inventory remained intelligible, demonstrating that the system captures essential, linguistically relevant distinctions without the need for a traditional orthography.

Background & Motivation

Of the roughly 7,000 languages spoken worldwide, approximately half have no written form, creating a significant barrier to language technology access and Bible translation efforts. Traditional orthography development is a time-intensive process requiring years of analysis. Acoustography provides a technical workaround by creating a machine-readable "writing system" that operates in a purely oral modality.

Methodology

The paper details a four-stage process:

1. Corpus Preparation: Utilizing 31.7 hours of raw Sateré-Mawé audio.
2. Acoustic Tokenization: Extracting phonetic features with the XLSR-53 model.
3. Quantization: Grouping sound patterns into 100 discrete acoustemes.
4. Stability Freezing: Ensuring the codebook remains consistent and reproducible.

Key Contributions

- The Two-Layer Architecture: Demonstrates how the technical encoding of a language can proceed at the speed of computation while human translators focus on the semantic layer (Meaning Maps).
- Validation: Shows through native-speaker testing that "choppy" concatenative synthesis from acoustemes preserves linguistic meaning.
- Workflow Integration: Provides a pathway for Oral Bible Translation (OBT) via the Tripod Method and Tripod Studio.
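The quantization and stability-freezing stages can be sketched with a standard clustering approach. This is a minimal illustration, not the authors' implementation: random vectors stand in for the frame-level XLSR-53 features (whose hidden size is 1024), k-means stands in for the paper's unspecified statistical clustering, and the repeat-collapsing in `encode` is an assumption about how frame IDs become tokens.

```python
# Sketch of the quantization stage: cluster frame-level speech features
# into a discrete 100-entry "acousteme" codebook, then freeze it so that
# later encoding runs are reproducible.
import numpy as np
from sklearn.cluster import KMeans

N_ACOUSTEMES = 100   # inventory size validated in the paper
FEATURE_DIM = 1024   # XLSR-53 hidden-state dimensionality

# Stand-in for XLSR-53 frame features extracted from the audio corpus.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, FEATURE_DIM))

# Quantization: learn a fixed codebook of 100 centroids.
kmeans = KMeans(n_clusters=N_ACOUSTEMES, n_init=4, random_state=0).fit(frames)

# Stability freezing: persist the centroids; every later run maps audio
# onto the same acousteme IDs instead of re-clustering.
codebook = kmeans.cluster_centers_   # shape (100, 1024)

def encode(feature_frames: np.ndarray) -> list[int]:
    """Map each frame to its nearest acousteme, collapsing consecutive
    repeats so a sustained sound yields one token (an assumption here)."""
    ids = kmeans.predict(feature_frames)
    return [int(a) for i, a in enumerate(ids) if i == 0 or a != ids[i - 1]]

tokens = encode(frames[:50])
```

Freezing the codebook is what makes the "writing system" stable: two encodings of the same utterance agree because both consult the same saved centroids rather than clustering anew.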
Funding Acknowledgement

This work was made possible by funding from Every Tribe Every Nation (ETEN) via the OBT Affinity Table.