Longitudinal electronic health records (EHRs) form irregular event sequences that mix multiple clinical coding systems and care settings. Learning transferable patient representations requires modeling both within-encounter code composition and long-range temporal dependencies. We aim to develop a pretraining framework that preserves event structure and explicitly uses elapsed time, while remaining straightforward to fine-tune on new supervised endpoints without task-specific feature engineering. We propose HealthFormer, a dual-level Transformer for event-centric EHR modeling. An Intra-Event Encoder aggregates the heterogeneous domain tokens within each typed clinical event into an event embedding via code-specific embedding modules and attention pooling. Event embeddings are combined with a Date Encoder and a continuous-time attention bias based on attention with linear biases (ALiBi) inside an Inter-Event Encoder. We pretrain on Hungarian national administrative health records from a large-scale nationwide longitudinal cohort (millions of individuals over a decade) using multi-task self-supervision with (i) per-domain masked token prediction (masked language modeling, MLM), (ii) event-type prediction under full-event masking (event-level MLM), (iii) next-event-type prediction, and (iv) time-to-next-event (Δt) regression. Pretraining induces hierarchy-consistent organization in the learned diagnosis (ICD-10) embedding geometry that is conducive to analysis and interpretation. On incident cancer prediction, end-to-end fine-tuning achieves test AUCs of 0.81/0.75/0.73 for colorectal cancer (CRC) and 0.94/0.87/0.84 for prostate cancer across 30/60/90-day horizons on balanced cohorts, outperforming logistic-regression baselines, including a time-decayed bag-of-codes model. HealthFormer thus provides an event-centric, time-aware representation that transfers via standard fine-tuning without endpoint-specific designs.
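The attention pooling used by the Intra-Event Encoder can be sketched as follows: a learned query vector scores each within-event token embedding, and the softmax-weighted average becomes the event embedding. This is a minimal illustration of the general technique; the function and parameter names are assumptions, not the paper's actual parameterization.

```python
import numpy as np

def attention_pool(tokens, query):
    """Pool a variable-length set of within-event token embeddings into one
    event embedding via softmax attention against a learned query vector.
    tokens: (n_tokens, d) array; query: (d,) array. Illustrative sketch only.
    """
    scores = tokens @ query                  # (n_tokens,) relevance scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ tokens                  # (d,) convex combination of tokens

# Example: pool five 8-dimensional token embeddings into one event embedding.
rng = np.random.default_rng(0)
event_embedding = attention_pool(rng.normal(size=(5, 8)), rng.normal(size=8))
```

Because the output is a convex combination of the input tokens, an event containing identical tokens pools exactly to that token, and the mechanism handles a variable number of codes per event without padding heuristics.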
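The continuous-time attention bias replaces ALiBi's token-distance penalty with elapsed time between events, so that attention logits are penalized in proportion to how far apart two events occurred. The sketch below shows the idea under the assumption of a per-head linear penalty on absolute time differences; the exact formulation in the model may differ.

```python
import numpy as np

def continuous_time_alibi_bias(times, slopes):
    """ALiBi-style additive attention bias computed from elapsed time rather
    than token distance: bias[h, i, j] = -slopes[h] * |t_i - t_j|.
    times: (n,) event timestamps (e.g. days since first event);
    slopes: (n_heads,) positive per-head decay rates. Hedged sketch only.
    """
    dt = np.abs(times[:, None] - times[None, :])    # (n, n) elapsed time matrix
    return -slopes[:, None, None] * dt[None, :, :]  # (n_heads, n, n) bias

# Three events at days 0, 3, and 10; two heads with different decay rates.
# The bias is added to attention logits before softmax, so temporally distant
# events are downweighted more strongly in fast-decaying heads.
bias = continuous_time_alibi_bias(np.array([0.0, 3.0, 10.0]),
                                  np.array([0.5, 1.0]))
```

A useful property inherited from ALiBi is that no learned positional embedding table is needed, so sequences longer, or with larger time gaps, than those seen in pretraining still receive well-defined biases.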
Because the model relies on standard ICD-10 diagnosis and ATC medication codes, adoption beyond Hungary is facilitated. Learned diagnosis embeddings align with the ICD-10 hierarchy, enabling clinical inspection. Broader benchmarking across endpoints remains needed.
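One simple way to inspect whether learned diagnosis embeddings align with the ICD-10 hierarchy is to compare cosine similarity within versus across chapters (e.g. grouping codes by their leading letter). The helper below is a hypothetical inspection utility, not part of the described framework.

```python
import numpy as np

def chapter_similarity_gap(embeddings, chapters):
    """Check hierarchy consistency of diagnosis embeddings: mean cosine
    similarity within an ICD-10 chapter minus the mean across chapters.
    embeddings: dict code -> vector; chapters: dict code -> chapter label
    (e.g. the ICD-10 code's first letter). A positive gap suggests the
    geometry respects the hierarchy. Illustrative sketch only."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    codes = list(embeddings)
    within, across = [], []
    for i, ci in enumerate(codes):
        for cj in codes[i + 1:]:
            s = cos(embeddings[ci], embeddings[cj])
            (within if chapters[ci] == chapters[cj] else across).append(s)
    return np.mean(within) - np.mean(across)
```

On hierarchy-consistent embeddings this gap is positive; it gives a single scalar that can be tracked over pretraining, complementing qualitative inspection of nearest neighbors.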