Teachers and students often use intra-sentential code-switching when Information Technology (IT) and other subjects are taught in universities and colleges. Myanmar automatic speech recognition (ASR) systems still struggle with code-switched speech because switched utterances blur language boundaries, mix pronunciations, and expand the vocabulary, especially when English terms are pronounced with Myanmar sounds. These mismatches raise word error rates and cause many mixed-language phrases to be misrecognized. This paper presents the MEASR dataset, a spontaneous speech dataset of Myanmar-English intra-sentential code-switching collected from real online IT teaching sessions. The dataset targets these challenges, which conventional monolingual models handle poorly. It contains around 10 hours of speech recorded at 16 kHz, mono, with 16-bit resolution. The paper details the dataset's design and provides an analysis intended to support improved code-switching automatic speech recognition (CS-ASR) through bilingual training, pronunciation adaptation, and language identification.
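As an illustration of the stated recording format, the sketch below checks whether an audio file matches 16 kHz, mono, 16-bit, and computes the raw size that format implies for a given duration. It assumes the recordings are available as uncompressed PCM WAV files, which the abstract does not state; the function names `matches_stated_format` and `uncompressed_size_bytes` are hypothetical and not part of the dataset's tooling.

```python
import wave

# Recording format stated in the abstract: 16 kHz, mono, 16-bit.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def matches_stated_format(path):
    """Return (ok, duration_seconds) for a PCM WAV file at `path`.

    Assumption: the file is an uncompressed WAV readable by the
    standard-library `wave` module.
    """
    with wave.open(path, "rb") as wav:
        ok = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
        duration = wav.getnframes() / wav.getframerate()
    return ok, duration


def uncompressed_size_bytes(hours):
    """Raw PCM size implied by the stated format for `hours` of audio."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_CHANNELS * EXPECTED_SAMPLE_WIDTH_BYTES
    )
    return int(hours * 3600 * bytes_per_second)
```

Under these assumptions, roughly 10 hours of audio in this format corresponds to about 1.15 GB of raw samples, excluding file headers and transcripts.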