Arabic Dialect NLP: A Unified Taxonomic, Methodological, and Trend‑Driven Survey

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Abstract

Background: Natural language processing (NLP) for Arabic dialects faces significant challenges because of the language's complexity, whereas Modern Standard Arabic enjoys widespread support in the field. The rapid spread of social media has shifted research attention toward Arabic dialects, creating a pressing need for a critical review of this large body of work.

Objective: This study provides an application-oriented survey of the Arabic dialectal NLP landscape. Its primary goal is to map the relationships among foundational tasks while benchmarking the resources and methodologies that have defined the field.

Participants and Setting: The study analyzes a comprehensive dataset of 400 research articles published between 2020 and 2025.

Methods: A survey was conducted using a multi-taxonomic clustering approach. Research papers were categorized into eight functional clusters. Trends were analyzed by year, geographic focus, and algorithm type (traditional machine learning vs. deep learning vs. Transformers and LLMs).

Results: The analysis reveals that Sentiment Analysis is the dominant application category, accounting for about 32% of the literature, followed by the resource-building group at 21%. Identification and Code-Switching accounts for 10%. Research output peaked in 2022-2025, marking a definitive shift from traditional machine learning models to Transformer-based architectures such as AraBERT and MARBERT. Regional coverage is broad, with a notable trend toward the identification and handling of code-switched text, which has emerged as a current research frontier.

Conclusions: The survey demonstrates that dialect identification is no longer a standalone goal but a prerequisite for sentiment and translation systems. The field has progressed notably in many areas, such as sentiment analysis (SA), but future work must prioritize under-resourced dialects, reproducible benchmarks, and cross-dialect transfer learning, and must bridge these dialect-specific models with the zero-shot capabilities of generative LLMs.
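
To make the reported shares concrete, the percentages in the Results can be converted into approximate paper counts against the stated corpus of 400 articles. The sketch below is illustrative arithmetic only: it assumes the three named categories are three of the eight functional clusters, treats the "about 32%" figure as exact for rounding purposes, and uses hypothetical names (`TOTAL_PAPERS`, `REPORTED_SHARES`, `approximate_counts`) that do not come from the article.

```python
# Figures taken from the abstract; the shares are approximate ("about 32%"),
# so the resulting counts are estimates, not values reported by the authors.
TOTAL_PAPERS = 400
REPORTED_SHARES = {  # percentage of the surveyed literature
    "Sentiment Analysis": 32,
    "Resource Building": 21,
    "Identification and Code-Switching": 10,
}


def approximate_counts(total, shares_pct):
    """Convert percentage shares into rounded paper counts."""
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


counts = approximate_counts(TOTAL_PAPERS, REPORTED_SHARES)

# Whatever is not covered by the three named categories is attributed,
# by assumption, to the remaining five of the eight clusters combined.
remaining_pct = 100 - sum(REPORTED_SHARES.values())
remaining_count = TOTAL_PAPERS - sum(counts.values())

for name, n in counts.items():
    print(f"{name}: ~{n} papers")
print(f"Remaining clusters combined: ~{remaining_count} papers ({remaining_pct}%)")
```

Under these assumptions the three named categories cover roughly 128, 84, and 40 papers, leaving about 148 papers (37%) for the clusters the abstract does not break out.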

Topics & Keywords

Authorship Attribution and Profiling; Linguistic Variation and Morphology; Natural Language Processing Techniques

UN Sustainable Development Goals

Quality Education

Publication Details

Published in: Journal of Information Technology, Cybersecurity, and Artificial Intelligence

Volume 3, Issue 2, pp. 13-38

DOI: 10.70715/jitcai.2026.v3.i2.051

Field-Weighted Citation Impact: 0.00