This paper investigates the task of automatic word alignment in parallel texts, a fundamental step in training machine translation systems, conducting comparative linguistic studies, and creating linguistic resources. Given the scarcity of annotated data for many language pairs, Large Language Models (LLMs) are particularly attractive here because of their strong generalization capabilities and their ability to solve tasks without extensive fine-tuning on target datasets. This study presents a comparative analysis of the effectiveness of modern general-purpose LLMs versus specialized alignment algorithms on Russian-English parallel data. Ten state-of-the-art models (including Gemini 3 Pro, GPT-5.2, and Claude Sonnet 4.5) were tested under different prompting strategies (zero-shot, few-shot), alongside five baseline approaches ranging from statistical methods (fast-align, eflomal) to neural network architectures (AwesomeAlign, AccAlign, BinaryAlign). Performance was evaluated with Precision, Recall, F-measure, and Alignment Error Rate (AER) on annotated data from the Russian National Corpus. Experimental results indicate that the specialized BinaryAlign algorithm retains the lead in overall alignment quality (F-measure of 0.883, AER of 0.113). However, the leading LLMs, specifically Gemini 3 Pro Preview and GPT-5.2, surpassed most classical and early neural baselines. Notably, for the most effective models, including in-context examples often reduced performance compared to the zero-shot setting. Modern LLMs can thus serve as a reliable tool for high-quality alignment in the absence of training data, opening new perspectives for processing low-resource language pairs.
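The evaluation metrics named above (Precision, Recall, F-measure, AER) have a standard formulation for word alignment, due to Och and Ney, in terms of the predicted link set A and the gold "sure" (S) and "possible" (P) link sets with S ⊆ P. A minimal sketch of that computation, assuming the conventional definitions (the link sets in the example are illustrative, not taken from the paper's data):

```python
def alignment_metrics(predicted, sure, possible):
    """Standard word-alignment metrics (Och & Ney, 2003):
    Precision = |A ∩ P| / |A|, Recall = |A ∩ S| / |S|,
    AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)."""
    A, S = set(predicted), set(sure)
    P = set(possible) | S  # sure links are also possible
    precision = len(A & P) / len(A) if A else 0.0
    recall = len(A & S) / len(S) if S else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, f1, aer

# Toy example: links are (source_index, target_index) pairs.
pred = [(0, 0), (1, 1), (2, 3)]
sure = [(0, 0), (1, 1), (2, 2)]
poss = [(2, 3)]
p, r, f, aer = alignment_metrics(pred, sure, poss)
print(p, round(r, 3), round(aer, 3))  # → 1.0 0.667 0.167
```

Note that lower AER is better, and because precision is measured against the permissive set P while recall is measured against the strict set S, AER is not simply 1 minus the F-measure.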
Published in: Modeling and Analysis of Information Systems
Volume 33, Issue 1, pp. 48-61