One test, many tongues: Surveying language proficiency across the globe

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Pol van Rijn · Max Planck Institute for Empirical Aesthetics

Yue Sun · Goethe University Frankfurt

Harin Lee · Max Planck Institute for Empirical Aesthetics

Raja Marjieh · Princeton University

Ilia Sucholutsky · Princeton University

Francesca Lanzarini · Ernst Strüngmann Institute for Neuroscience

Elisabeth André

Abstract

Language influences our thinking and affects many aspects of cognition, from how we perceive the world to how we interact socially. Thus, objectively characterizing linguistic background is crucial for research in many areas, including second language acquisition, psycholinguistics, and cognitive science. Traditional language proficiency tests, however, are manually composed by experts, limiting their scope for both lab and online settings. Here, we propose a pipeline that automatically derives a language proficiency test from a corpus of text, and we apply it to create new tests for 1,939 languages. Using this approach, we conducted a large-scale survey examining L1 and L2 proficiency across 34 countries, with participants tested on all 34 languages. Drawing on human ratings from 4,137 participants, our results validate that our test can effectively distinguish native speakers, second-language speakers, and nonspeakers within one minute, making it an effective tool for evaluating linguistic proficiency. We show that participants' linguistic and demographic backgrounds systematically influence both their language proficiency and their self-reported skills, and we map the prevalence of global languages, such as English and Spanish, among online participants. Moreover, we show that our vocabulary tests are strongly correlated with other linguistic competences, such as listening and writing, in a set of typologically varied languages, demonstrating that our test is an efficient instrument to assess language proficiency. More broadly, our work offers a significant resource for investigating global variation in language skills and contributes to reducing the overreliance on the English language in the cognitive and social sciences.

Topics & Keywords

Neurobiology of Language and Bilingualism · Second Language Acquisition and Learning · EFL/ESL Teaching and Learning

UN Sustainable Development Goals

Quality Education

Publication Details

Published in: Proceedings of the National Academy of Sciences

Volume 123, Issue 13, Article e2420179123

DOI: 10.1073/pnas.2420179123

Field-Weighted Citation Impact: 0.00