ICARUS: A Dataset for Iconographic Classification and Representation Understanding

2026 · 0 citations · Dataset · Green Open Access

Authors

Matthias Springstein · Technische Informationsbibliothek (TIB)

Stefanie Schneider · Philipps University of Marburg

Javad Rahnama · Reply (Italy)

Julian Stalter

Maximilian Kristen · Ludwig-Maximilians-Universität München

Eric Müller-Budack · Technische Informationsbibliothek (TIB)

Ralph Ewerth · Leibniz University Hannover

Abstract

This dataset contains metadata for 477,569 artworks, along with 1,328,417 Iconclass-based annotations. It was compiled by harvesting 19 publicly accessible collection databases from several countries. Sources include nine collections from Germany (Artemis, Bildindex der Kunst & Architektur, Corpus Vitrearum, Heartfield Online, Hessen Kassel Heritage, Incunabulum Catalogue of the Bavarian State Library, Museen Thüringen, Städel Museum, and Virtuelles Kupferstichkabinett), two from Austria (Austrian Gallery Belvedere and REALonline), one from Switzerland (Vitrosearch), one from Poland (PAUart – Polish Academy of Arts and Sciences), three from the Netherlands (Medieval Illuminated Manuscripts, Rijksmuseum, and RKD – Netherlands Institute for Art History), one from the United Kingdom (Broadside Ballads Online), and two from the United States (Emblematica Online and the National Gallery of Art).

For each object, the dataset provides a unique identifier and, when available, the following metadata fields: title, creator, inception, instance of, genre, iconclass, and collection.

The dataset is divided into training, validation, and test splits. The respective object identifiers are listed in train.txt, val.txt, and test.txt, with one identifier per line. Images are included only when they are in the public domain or otherwise freely accessible.

File structure

The images are stored in a ZIP file structured into directories named by the first two characters of each image's hash_id. Within these directories, subfolders named after the next two characters of the hash_id contain the image files, which are named using their full hash_id with a .jpg extension. The annotation data is provided in a JSONL file, where each line encodes metadata for a single image.
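The directory layout and file formats described in the abstract can be sketched in Python as follows. This is a minimal illustration, not code shipped with the dataset. The helper names (image_member_path, read_split, read_annotations) are hypothetical, and the sketch assumes that the two-character directories sit at the root of the ZIP archive, which the abstract does not state. The key names inside each JSONL record are likewise not specified, so the parser returns the records unchanged.

```python
import json
from pathlib import PurePosixPath


def image_member_path(hash_id: str) -> str:
    """Build the assumed path of an image inside the ZIP archive.

    Layout per the abstract: <first two chars>/<next two chars>/<hash_id>.jpg
    Assumption: there is no extra top-level folder inside the archive.
    """
    if len(hash_id) < 4:
        raise ValueError("hash_id needs at least four characters")
    return str(PurePosixPath(hash_id[:2], hash_id[2:4], f"{hash_id}.jpg"))


def read_split(lines):
    """Parse a split file (train.txt, val.txt, test.txt): one identifier per line."""
    return [line.strip() for line in lines if line.strip()]


def read_annotations(lines):
    """Parse JSONL annotation data: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]
```

Both readers accept any iterable of lines, so an open file handle can be passed directly, for example read_split(open("train.txt", encoding="utf-8")). Whether the identifiers in the split files are the same values as hash_id is not stated in the abstract and should be checked against the data.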

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18980669
