hispanicseminary/HSMS-tools: HSMS tools for processing the paleographical transcriptions

2026 · 0 citations · Other · Green Open Access

Authors

Francisco Gago Jover · College of the Holy Cross

Francisco Javier Pueyo Mena · Cervantes Institute

Abstract

Repository of tools used in the processing of the paleographical transcriptions.

analizador

This folder contains the analizador. To install it, download the analizador.zip file from the repository and extract it into a working folder. The paleographical transcriptions to be processed must be in their own folder. Although they can be located anywhere, even in the cloud, we recommend the following structure for better organization:

C:\HSMS-tools [general folder]
C:\HSMS-tools\analizador [folder with the Analizador Corpus OSTA]
C:\HSMS-tools\textos [folder with the paleographical transcriptions]

The manual can be found here: https://hispanicseminary.org/manuales/analizador/

################################################################################

Analizador Corpus OSTA

The Analizador Corpus OSTA is a textual analysis tool developed by F. Javier Pueyo Mena and Francisco Gago Jover for processing the paleographic transcriptions used in the creation of the Old Spanish Textual Archive. It integrates the FreeLing libraries (Carreras, Chao et al. 2004; Padró 2011, 2012) and combines them with a series of routines that take the plain text of the HSMS paleographic transcriptions and produce an XML text incorporating all the linguistic information. The output also preserves all the structural characteristics of the work, which facilitates reading and the later presentation of the text in query results.

Sánchez Marco (2010, 2011, 2012) carried out a first adaptation of FreeLing to medieval Spanish, using part of the semi-paleographic transcriptions of the HSMS to build a gold standard corpus, train the program, and create and modify its resources and rules. For our part, we have significantly improved and expanded the linguistic resources of FreeLing, particularly the dictionary, which now contains only medieval forms (more than 275,000), and the affixation rules, whose clitic analysis section has been expanded.
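
The installation steps described above (extracting analizador.zip into a working folder and keeping the transcriptions in a folder of their own) can be sketched in Python as follows. This is only an illustration, not part of the HSMS tools: the function name install_analizador is hypothetical, the script assumes analizador.zip has already been downloaded, and it makes no assumption about what the archive contains or how the Analizador is launched afterwards (see the manual for that).

```python
from pathlib import Path
import zipfile


def install_analizador(zip_path: Path, base_dir: Path) -> dict[str, Path]:
    """Create the recommended folder layout and extract the archive into it.

    Hypothetical helper: the folder names follow the structure recommended
    in the README; everything else is an assumption made for illustration.
    """
    analizador_dir = base_dir / "analizador"  # folder with the Analizador Corpus OSTA
    textos_dir = base_dir / "textos"          # folder with the paleographical transcriptions

    # Create both folders (and the general folder) if they do not exist yet.
    analizador_dir.mkdir(parents=True, exist_ok=True)
    textos_dir.mkdir(parents=True, exist_ok=True)

    # Extract the downloaded archive into the analizador folder.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(analizador_dir)

    return {"analizador": analizador_dir, "textos": textos_dir}
```

On Windows, calling install_analizador(Path("analizador.zip"), Path(r"C:\HSMS-tools")) would produce the recommended layout; any other base folder works equally well, since the transcriptions can be located anywhere.
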

########## LICENSE ##########

Copyright (C) 2026 F. Javier Pueyo Mena and Francisco Gago Jover

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18912253
