Introduction. Visual object tracking is an important task in computer vision with a wide range of applications, including autonomous navigation, robotics, and surveillance. The problem involves estimating an object's position across a sequence of video frames, given its initial location. Despite significant research efforts, the task remains challenging due to factors such as target occlusion, changes in illumination, motion blur, and object deformation. Tracking methods are categorized into short-term, where the target is assumed to remain within the field of view, and long-term, where the target may disappear and reappear. The purpose of the paper is to provide a comprehensive analysis of methods for visual single object tracking (SOT), considering both short-term and long-term tracking scenarios, and of the benchmark datasets used for algorithm evaluation. The paper reviews the core principles of different tracking approaches, from traditional correlation filters and keypoint-based methods to deep learning models such as Siamese neural networks and transformers. Additionally, the study presents an overview of popular benchmark datasets such as VOT 2018, LaSOT, and GOT-10k and compares the performance of most of the reviewed algorithms on these benchmarks. This comparison highlights the strengths and weaknesses of different tracking approaches and provides a basis for future research directions, particularly in enhancing the efficiency, adaptability, and speed of tracking algorithms for real-world applications. Results. Correlation-based trackers are known for their high speed and reasonable performance.
These methods leverage the Fourier domain for efficient computation and can be enhanced with various features, from hand-crafted ones such as HOG to deep convolutional features. However, they require modifications for long-term tracking to handle object disappearance and reduce error accumulation. While some of the reviewed methods address these challenges, they do not solve them completely. Keypoint-based trackers follow objects by detecting and matching distinctive points or features across frames. Methods such as Kanade-Lucas-Tomasi (KLT) provide a foundation, while SIFT or ORB detectors increase robustness to noise and scale changes. These trackers are particularly useful in scenarios with partial occlusion, as they can track a subset of the object's points. However, they may struggle with low-textured or small objects. Deep learning-based trackers represent a major advancement, surpassing traditional methods in accuracy and robustness due to their powerful feature representation capabilities. Some deep trackers, such as SiamFC and SiamRPN, show good accuracy and real-time performance on a GPU. The paper's comparison of algorithms on VOT 2018, LaSOT, and GOT-10k demonstrates that deep learning-based approaches perform better in complex tracking scenarios but are often computationally demanding. Conclusions. Visual object tracking has evolved significantly with the advent of deep learning methods, which have enabled trackers to achieve superior accuracy and robustness compared to traditional methods. The introduction of large-scale annotated datasets such as VOT, GOT-10k, and LaSOT has been crucial in driving this progress and providing a standardized framework for evaluating new algorithms.
While correlation filters and keypoint-based methods remain viable for certain applications, especially in resource-constrained environments, deep learning-based trackers, particularly Siamese networks and transformers, have emerged as the leading approaches. Future research should focus on optimizing the efficiency and adaptability of these algorithms to make them more suitable for real-time applications and diverse real-world scenarios. Keywords: visual object tracking, single object tracking, correlation filters, keypoint tracking, Siamese networks, transformers.
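
Illustrative sketch. The Fourier-domain computation that the Results attribute to correlation-based trackers can be illustrated with plain cross-correlation via the FFT. This is a minimal sketch and not any of the reviewed trackers: the function name `correlation_peak`, the synthetic 64x64 frame, and the circular shift are assumptions made for the example. Learned filters, feature extraction, windowing, and model updates are all omitted.

```python
import numpy as np

def correlation_peak(template, search):
    """Return the (row, col) circular shift that best aligns template with search."""
    # Zero-mean both patches so that overall brightness does not dominate the response.
    t = template - template.mean()
    s = search - search.mean()
    # Cross-correlation theorem: corr = IFFT( FFT(search) * conj(FFT(template)) ).
    response = np.real(np.fft.ifft2(np.fft.fft2(s) * np.conj(np.fft.fft2(t))))
    # The peak of the response map gives the estimated displacement.
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(p) for p in peak)

# Hypothetical data: a random "frame" and a copy shifted by 5 rows and 9 columns.
rng = np.random.default_rng(0)
frame = rng.standard_normal((64, 64))
shifted = np.roll(frame, shift=(5, 9), axis=(0, 1))

shift = correlation_peak(frame, shifted)
print(shift)  # (5, 9)
```

The whole response map costs a few FFTs instead of one dot product per candidate position, which is the source of the speed advantage noted above.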