Abstract
In this work, we address the problem of segmenting railway tracks, specifically the rails on which the train travels, in video recorded from a train engine. Our approach is motivated by the physical properties of railway rails: they typically have a distinctive reflective appearance, and their shape and position change little between consecutive video frames, so they remain visually and geometrically consistent over time. These properties allow contextual information to be transferred between frames and permit composite models with fewer trainable parameters. We introduce a composite model that incorporates contextual information from previous frames, exploiting the limited temporal variability of rail geometry and appearance to reduce overfitting when training on small annotated datasets. We evaluate three neural network-based segmentation approaches: a convolutional neural network (CNN), a CNN followed by post-processing with morphological transformations and connected component labelling, and the proposed context-aware composite model. Furthermore, we propose an unsupervised evaluation methodology that assesses segmentation quality by how distinguishable segment colours are from the background, the geometric complexity of the segments, and the colour variance within segments. All models are trained exclusively on synthetic data and evaluated on real-world video sequences. Experimental results show that incorporating physically interpretable temporal context improves segmentation consistency compared with frame-wise methods. The synthetic dataset created for this work is publicly released.
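The second baseline (a CNN followed by morphological post-processing and connected component labelling) can be illustrated with a minimal sketch. The abstract does not specify which operations or parameters are used, so everything below is an assumption: a morphological opening with a square structuring element of hypothetical size `open_size`, followed by connected component labelling that keeps only the `keep` largest components (assumed here to correspond to the two rails). `postprocess_rail_mask` is a hypothetical helper built on `scipy.ndimage`, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage


def postprocess_rail_mask(mask: np.ndarray, keep: int = 2, open_size: int = 3) -> np.ndarray:
    """Clean a binary rail mask predicted by a CNN (hypothetical helper).

    Assumed pipeline, not taken from the paper:
    1. Morphological opening removes small, isolated false-positive pixels.
    2. Connected component labelling groups the remaining foreground pixels.
    3. Only the `keep` largest components are retained (assumed: the two rails).
    """
    structure = np.ones((open_size, open_size), dtype=bool)
    opened = ndimage.binary_opening(mask.astype(bool), structure=structure)

    # Label connected foreground regions; label 0 is background.
    labels, n = ndimage.label(opened)
    if n == 0:
        return np.zeros_like(opened, dtype=bool)

    # Pixel count per component, skipping the background label 0.
    sizes = np.bincount(labels.ravel())[1:]

    # Labels (1-based) of the `keep` largest components.
    largest = np.argsort(sizes)[::-1][:keep] + 1
    return np.isin(labels, largest)
```

In this sketch, the opening suppresses speckle noise before labelling, so that small spurious regions do not compete with the rails for the `keep` slots; a real pipeline would likely tune `open_size` to the image resolution.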