The global deployment of artificial intelligence systems has repeatedly revealed a critical failure of governance: the datasets used to train these systems are often culturally myopic, encoding narrow worldviews that lead to harmful biases and performance failures when deployed in new contexts. While documentation frameworks like datasheets for datasets have established technical transparency as a standard, they remain insufficient for capturing the cultural, social, and political context that gives data its meaning. This gap between ethical aspiration and operational practice perpetuates a form of “context collapse” and risks scaling a predominantly Western-centric paradigm into a de facto global standard. This paper introduces the Culturally Contextual Datasheet (CCD), a novel framework that extends existing AI documentation to embed cultural reflexivity and accountability directly into the dataset lifecycle. Grounded in critical data studies, decolonial theory, and archival science, the CCD provides a structured, modular methodology comprising six core modules: provenance and collection context, annotator and curator positionality, representational fairness, intended and out-of-context use, linguistic specificity, and maintenance and community feedback. Through a detailed case study of the Jigsaw Toxicity dataset, we demonstrate how the CCD exposes hidden cultural assumptions—such as U.S.-centric definitions of “toxicity” and annotator biases—that are entirely absent from standard documentation. We argue that the CCD is more than a technical tool; it is an instrument of reflexive governance that operationalizes high-level principles from frameworks such as the UNESCO Recommendation on the Ethics of Artificial Intelligence. By providing a practical mechanism for cultural auditing, the CCD empowers regulators, informs procurement, and fosters accountability to affected communities.
The paper concludes by addressing implementation challenges—drawing on archival theory and global governance scholarship to strengthen the framework’s methodological foundations—and issues a call to action for the broader AI ecosystem to adopt such frameworks, thereby ensuring that the pursuit of globally deployed AI is grounded in cultural integrity and equity.