ArmEpiC – Armenian Epigraphic Corpus (ArtsakhEpiC Sub-Corpus, v1.0)

2026 · Dataset · Green Open Access

Authors

Hamest Tamrazyan · Dragomanov Ukrainian State University

Gayane Hovhannisyan · Yerevan State Linguistic University

Emanuela Boroş · École Polytechnique Fédérale de Lausanne

Abstract

ArmEpiC: Methodology and Data Description

ArmEpiC (Armenian Epigraphic Corpus) is a digital scholarly dataset comprising diplomatically transcribed Armenian lapidary inscriptions encoded in TEI/EpiDoc (v9.7), together with a system of authority files designed to preserve epigraphic evidence while enabling analytical interoperability. The dataset is intended for reuse by epigraphers, historians, linguists, and digital heritage researchers requiring transparent, machine-readable epigraphic data.

Scope of the Dataset

The Zenodo deposit includes ten TEI/EpiDoc inscription files, authority files (ListPlace, ListMonument, ListSubMonument, ListMaterial, ListPreservation, ListScript, ListAbbreviationType, ListChronology, ListBibl), this methodology document, a README, and a licensing statement.

Conceptual Separation of Evidence and Interpretation

ArmEpiC enforces a strict separation between epigraphic evidence, editorial observation, and interpretive layers. The diplomatic transcription constitutes the primary evidentiary layer; all analytical and interpretive interventions are explicitly encoded and remain reversible.

Diplomatic Transcription Policy

Original orthography is preserved, lineation follows the stone, and no silent normalization is introduced. Editorial intervention is restricted to explicit expansion of abbreviations, explicit supply of omitted letters, and explicit marking of damage or loss.

Graphic Phenomena and Linguistic Structure

Ligatures are treated as graphic phenomena and do not determine linguistic segmentation. Ligatures across word boundaries are encoded graphically while preserving separate lexical units.

Abbreviations and Omitted Letters

A strict distinction is maintained between abbreviations (intentional and conventional) and omitted letters (context-driven loss). Ambiguous cases are flagged rather than silently resolved. Honorific and graphic abbreviations are distinguished analytically via a controlled vocabulary.
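Because every editorial addition sits in its own element, both the diplomatic reading and the interpreted reading can be regenerated from a single file, which is what keeps the interventions reversible. The Python sketch below illustrates this under stated assumptions. It uses common EpiDoc conventions (`<expan>` with `<abbr>` and `<ex>` for abbreviations, `<supplied reason="omitted">` for omitted letters, `<hi rend="ligature">` for ligatures, `<w>` for words), which are not necessarily ArmEpiC's exact encoding. The two-word fragment is invented for illustration and is not taken from the corpus, and the helper names `text_of`, `words`, and `ligatures` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EDITORIAL = {"ex", "supplied"}  # elements holding editor-added letters

# Invented fragment (NOT from ArmEpiC): ՏՐ expanded to ՏԷՐ, and ՅԻՇԵՑԷՔ
# with Ե marked as an omitted letter and ՑԷ marked as a ligature.
FRAGMENT = f"""<ab xmlns="{TEI_NS}">
<lb n="1"/><w><expan><abbr>Տ</abbr><ex>Է</ex><abbr>Ր</abbr></expan></w>
<w>ՅԻՇ<supplied reason="omitted">Ե</supplied><hi rend="ligature">ՑԷ</hi>Ք</w>
</ab>"""


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def text_of(elem: ET.Element, layer: str) -> str:
    """Text inside elem for one layer.

    'diplomatic' skips editorial <ex> and <supplied> content;
    'interpretive' keeps it. A child's tail text belongs to the parent,
    so it is kept even when the child itself is skipped.
    """
    parts = [elem.text or ""]
    for child in elem:
        if not (layer == "diplomatic" and local(child.tag) in EDITORIAL):
            parts.append(text_of(child, layer))
        parts.append(child.tail or "")
    return "".join(parts)


def words(xml: str, layer: str) -> list[str]:
    """One string per <w>: segmentation comes from <w>, not from ligatures."""
    root = ET.fromstring(xml)
    return [text_of(w, layer) for w in root.iter(f"{{{TEI_NS}}}w")]


def ligatures(xml: str) -> list[str]:
    """Letter groups flagged as ligatures (a purely graphic record)."""
    root = ET.fromstring(xml)
    return [hi.text or "" for hi in root.iter(f"{{{TEI_NS}}}hi")
            if hi.get("rend") == "ligature"]


print(words(FRAGMENT, "diplomatic"))    # ['ՏՐ', 'ՅԻՇՑԷՔ']
print(words(FRAGMENT, "interpretive"))  # ['ՏԷՐ', 'ՅԻՇԵՑԷՔ']
print(ligatures(FRAGMENT))              # ['ՑԷ']
```

Since word boundaries are taken only from `<w>`, flagging ՑԷ as a ligature changes neither reading nor the word count.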

Word Segmentation and Lemmatization

Each lexical unit is encoded as an independent word. Lemmatization is an analytical layer supplied in normalized Classical Armenian and does not imply correction of the original spelling.

Names, Prosopography, and Places

Personal names are encoded structurally without imposing prosopographic identification. Place names are preserved as attested and linked to external authorities via ListPlace.

Dating and Chronology

Dates are recorded as transmitted in the inscription, with Gregorian equivalents supplied as scholarly interpretation. The evidentiary basis of each date is made explicit.

Functional Classification

Each inscription is assigned a single dominant functional category as a heuristic analytical label.

Translation Strategy

Translations into Modern Armenian and English are provided as interpretive aids, prioritizing semantic accuracy. They do not replace the original text.

Authority Files

Each authority entity is assigned a persistent URN that is immutable once published. Authorities are aligned conceptually with international vocabularies to support interoperability.

XML Structure and Validation

All XML files were validated using the official TEI/EpiDoc 9.7 Relax NG and Schematron schemas with standard XML validation tools prior to Zenodo deposition. All xml:id values conform to NCName constraints.

Licensing and Versioning

The dataset is released under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This Zenodo deposit represents a fixed release; future revisions will receive new DOIs.

Conclusion

ArmEpiC provides a transparent, reversible, and interoperable digital epigraphic dataset grounded in Armenian scholarly tradition and international standards, enabling analytical reuse across disciplines.

The project has been funded by the National Association for Armenian Studies and Research (NAASR) and the Knights of Vartan Fund for Armenian Studies.
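The Dating and Chronology policy keeps the date as transmitted apart from its modern equivalent. As a hedged sketch, assume an inscription dated by the Armenian era in letter numerals, a common convention on Armenian monuments; the record below stores the transmitted numeral, its numeric value, and an approximate CE year in separate fields. The `DateReading` structure, its `basis` label, and the example numeral ՈՀԵ are hypothetical. The conversion works at year level only and ignores month, day, and Julian/Gregorian questions.

```python
from dataclasses import dataclass

# Armenian letters as numerals, in alphabetical order:
# row 0 = 1-9, row 1 = 10-90, row 2 = 100-900, row 3 = 1000-9000.
_ROWS = ("ԱԲԳԴԵԶԷԸԹ", "ԺԻԼԽԾԿՀՁՂ", "ՃՄՅՆՇՈՉՊՋ", "ՌՍՎՏՐՑՒՓՔ")
NUMERAL = {ch: (i + 1) * 10 ** row
           for row, letters in enumerate(_ROWS)
           for i, ch in enumerate(letters)}

# Year 1 of the Armenian era began in 552 CE, so adding 551 gives the CE
# year in which the Armenian year began. The Armenian year was not aligned
# with January, so the actual date may fall in the following CE year.
ARMENIAN_ERA_OFFSET = 551


def parse_numeral(s: str) -> int:
    """Additive value of an Armenian letter numeral, e.g. 'ՈՀԵ' = 600+70+5."""
    total = 0
    for ch in s.strip().upper():
        if ch not in NUMERAL:
            raise ValueError(f"not an Armenian numeral letter: {ch!r}")
        total += NUMERAL[ch]
    return total


@dataclass(frozen=True)
class DateReading:
    transmitted: str     # evidence: the numeral as written on the stone
    armenian_era: int    # its value, still in the Armenian era
    ce_year_approx: int  # interpretation: approximate CE year
    basis: str           # hypothetical label for the evidentiary basis


def read_date(transmitted: str,
              basis: str = "explicit date in the inscription") -> DateReading:
    value = parse_numeral(transmitted)
    return DateReading(transmitted, value, value + ARMENIAN_ERA_OFFSET, basis)


print(read_date("ՈՀԵ").ce_year_approx)  # 1226
```

Keeping `transmitted` untouched means the interpretive field can be recomputed or corrected later without altering the evidence.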

*ArmEpiC (Armenian Epigraphic Corpus) is a scholarly research project initiated and curated under the chief editorship of Hamest Tamrazyan, with Gayane Hovhannisyan and Arsen Arutyunyan as editors.

*ArmEpiC is an evolving corpus. Authority files, identifiers, and encoding practices may be refined between versions.
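The abstract states that all xml:id values conform to NCName constraints and that authority entities carry persistent URNs. An NCName may not contain a colon, so a URN such as the made-up `urn:example:place:1` cannot serve directly as an xml:id. Since identifiers may be refined between versions, a quick pre-check can be handy. The hypothetical helper `check_xml_ids` below approximates the NCName rule with a Unicode-aware regular expression and also reports duplicate values; the sample entries are invented. It is only a sketch and does not replace the Relax NG and Schematron validation described in the abstract.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Approximation of the NCName rule (Namespaces in XML): start with a letter
# or underscore, continue with letters, digits, '.', '-' or '_'; no colons.
# Python's \w is Unicode-aware, so Armenian letters pass; the spec's exact
# character ranges differ from \w in rare edge cases.
NCNAME = re.compile(r"[^\W\d][\w.\-]*")


def check_xml_ids(xml: str) -> tuple[list[str], list[str]]:
    """Return (ids that are not NCNames, ids used more than once)."""
    root = ET.fromstring(xml)
    invalid, duplicates, seen = [], [], set()
    for el in root.iter():
        value = el.get(XML_ID)
        if value is None:
            continue
        if not NCNAME.fullmatch(value):
            invalid.append(value)
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return invalid, duplicates


# Hypothetical authority-file entries, NOT taken from ArmEpiC.
SAMPLE = """<listPlace xmlns="http://www.tei-c.org/ns/1.0">
  <place xml:id="Գանձասար"/>
  <place xml:id="place_001"/>
  <place xml:id="1place"/>
  <place xml:id="urn:example:place:1"/>
  <place xml:id="place_001"/>
</listPlace>"""

print(check_xml_ids(SAMPLE))
# (['1place', 'urn:example:place:1'], ['place_001'])
```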

Topics & Keywords

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18198117
