ABSTRACT
The growing use of automated hate speech detection systems has sparked significant concern across technical, ethical, and sociopolitical dimensions. While recent advances in natural language processing and machine learning have improved classification performance, they often do so at the expense of transparency, fairness, and user trust. Explainable artificial intelligence (XAI) offers promising tools to bridge this gap, but current methods remain fragmented in scope and limited in stakeholder alignment. This review synthesizes the state of the art in explainability for hate speech detection by integrating machine learning, human-computer interaction, and critical social science perspectives. We survey major XAI approaches, including ante-hoc, post-hoc, local, global, counterfactual, and rationale-based methods, and evaluate their applicability across different stages of the ML pipeline and their relevance to key stakeholder groups such as developers, content moderators, policymakers, and affected communities. To guide this synthesis, we propose a conceptual framework that maps the intersection of explanation strategies, pipeline stages, and stakeholder needs. Using this model, we identify persistent gaps in dataset transparency, cultural and linguistic robustness, explanation evaluation practices, and participatory design processes. We argue that achieving meaningful explainability in content moderation goes beyond technical optimization; it demands sociotechnical alignment, contextual sensitivity, and accountability mechanisms that reflect the lived realities of those impacted by algorithmic decisions. The article concludes with interdisciplinary recommendations focused on dataset development, hybrid evaluation benchmarks, inclusive design, and integration. These strategies aim to foster hate speech detection systems that are not only more explainable but also more just, inclusive, and socially grounded.
This article is categorized under:
Commercial, Legal, and Ethical Issues > Social Considerations
Fundamental Concepts of Data and Knowledge > Explainable AI
Technologies > Machine Learning
Published in: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume 16, Issue 1
DOI: 10.1002/widm.70076