This dataset contains metadata for 477,569 artworks, along with 1,328,417 Iconclass-based annotations. It was compiled by harvesting 19 publicly accessible collection databases from several countries:

- Germany (nine): Artemis, Bildindex der Kunst & Architektur, Corpus Vitrearum, Heartfield Online, Hessen Kassel Heritage, Incunabulum Catalogue of the Bavarian State Library, Museen Thüringen, Städel Museum, and Virtuelles Kupferstichkabinett
- Austria (two): Austrian Gallery Belvedere and REALonline
- Switzerland (one): Vitrosearch
- Poland (one): PAUart – Polish Academy of Arts and Sciences
- Netherlands (three): Medieval Illuminated Manuscripts, Rijksmuseum, and RKD – Netherlands Institute for Art History
- United Kingdom (one): Broadside Ballads Online
- United States (two): Emblematica Online and the National Gallery of Art

For each object, the dataset provides a unique identifier and, when available, the following metadata fields:

- title
- creator
- inception
- instance of
- genre
- iconclass
- collection

The dataset is divided into training, validation, and test splits. The object identifiers for each split are listed in train.txt, val.txt, and test.txt, with one identifier per line. Images are included only when they are in the public domain or otherwise freely accessible.

File structure

The images are stored in a ZIP archive. At the top level, directories are named after the first two characters of each image's hash_id. Each of these contains subdirectories named after the next two characters of the hash_id, and these hold the image files, each named with its full hash_id and a .jpg extension.

The annotation data is provided in a JSONL file, in which each line encodes the metadata for a single image.
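The layout described above can be read with a few lines of standard-library Python. This is a minimal sketch, not tooling shipped with the dataset: the archive and JSONL file names are left to the caller, the optional top-level folder inside the ZIP (`root`) is a hypothetical parameter in case the archive wraps everything in one directory, and the identifier key `id` used to match JSONL records against the split files is an assumption that should be checked against the actual records.

```python
import json
import zipfile
from typing import Iterator


def read_split(path: str) -> list[str]:
    """Read a split file (train.txt, val.txt, test.txt): one identifier per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def image_member_path(hash_id: str, root: str = "") -> str:
    """Build an image's path inside the ZIP: <hash[:2]>/<hash[2:4]>/<hash>.jpg.

    `root` is a hypothetical top-level folder; leave it empty if the archive has none.
    """
    parts = [hash_id[:2], hash_id[2:4], f"{hash_id}.jpg"]
    if root:
        parts.insert(0, root.rstrip("/"))
    return "/".join(parts)


def read_image_bytes(zip_path: str, hash_id: str, root: str = "") -> bytes:
    """Return the raw JPEG bytes of one image, read directly from the ZIP archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(image_member_path(hash_id, root))


def iter_annotations(path: str) -> Iterator[dict]:
    """Yield one metadata record (a dict) per non-empty line of the JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def records_for_split(jsonl_path: str, split_path: str, id_key: str = "id") -> Iterator[dict]:
    """Yield the records whose identifier appears in the given split file.

    `id_key` is an assumed field name; adjust it to the key used in the JSONL.
    """
    wanted = set(read_split(split_path))
    for record in iter_annotations(jsonl_path):
        if record.get(id_key) in wanted:
            yield record
```

For example, `image_member_path("abcdef123")` yields `ab/cd/abcdef123.jpg`. Reading images through `zipfile` avoids extracting the whole archive to disk, which matters for a collection of this size.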