Computational methods have become increasingly prominent within the historical sciences, generating significant enthusiasm among some scholars. Yet their practical demands, epistemic limits, and ethical implications are more often praised than critically examined. This article explores what it means to do computational history today, arguing that it is defined not primarily by algorithms but by datasets. It is methodologically specific, resource-intensive, selective in scope, labour-heavy, and dependent on pre-digitised sources, specialised infrastructure, and interdisciplinary collaboration. These dependencies limit the scope of research questions and can produce narrow outcomes despite substantial effort, lending some validity to the concern that the field may not yield sufficient historiographical return for the labour invested. Corpus construction and data work lie at the epistemic core of computational history. These often undervalued tasks are not merely technical precursors to analysis but interpretive and epistemic acts. Data are shaped by digitisation politics, historical bias, and institutional power. They shape the questions asked, the answers produced, and the legitimacy of findings. Recognising and valuing data work is essential, both to embed critical perspectives into computational humanities and to counteract the privileging of certain forms of labour over others. Because quantification is associated with rigour and scholarly prowess, algorithmic work receives more credit, creating a two-tier division of labour in which those who develop algorithms are elevated above those who curate data, despite their symbiotic interdependence. Computational history, when done well, requires deep engagement with our sources, be they historical documents or data. For computational history to stabilise as a meaningful discipline, it must prioritise building better datasets over pursuing ever more complex algorithms on an unstable foundation of data.