Unmanned aerial vehicles (UAVs), equipped with advanced sensor technology and cognitive abilities, are effective for object detection and tracking in diverse operations such as search and rescue, surveillance, urban planning, agriculture, and traffic monitoring. Despite their ability to capture a wide range of objects, UAV-captured images present challenges in terms of resolution, object scale variance, highly complex distributions, occlusion, density, and noise. In this work, we propose MVDNet, a novel deep learning-based one-stage detector for robust and fast multi-vehicle detection from aerial images in noisy scenarios. We generated synthetic multinoise datasets by injecting three types of noise (Gaussian noise, salt-and-pepper noise, and speckle noise) into existing datasets and real-time data. The proposed framework incorporates two main components. The first is a noise-resilient feature extraction module based on a multilevel attention mechanism (MLAM), which extracts features from aerial images while coping with various types of noise. The second is an anchor-free detection head that aggregates semantic features for classification and recognition without generating region proposals for objects in the image. In this detection pipeline, the UAV is used only for image capture, while computationally intensive tasks are offloaded to a GPU-based cloud server. We performed comprehensive experiments on two of the most prominent aerial image benchmark datasets, DOTAVehicle (single-modality) and VEDAI (multi-modality), as well as a real-time dataset collected by a UAV using a single camera. Our results demonstrate that MVDNet achieves a mean average precision (mAP) of 81.4% and 58.2% on the respective datasets in noisy environments, with an inference speed of 14.4 milliseconds (ms).
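The synthetic multinoise generation described above can be sketched with NumPy. This is an illustrative assumption, not the authors' actual pipeline: the noise parameters (`gauss_sigma`, `sp_amount`, `speckle_sigma`) and the order of application are hypothetical defaults chosen for demonstration.

```python
import numpy as np

def add_multinoise(image, gauss_sigma=0.05, sp_amount=0.02,
                   speckle_sigma=0.05, seed=None):
    """Corrupt a float image in [0, 1] with the three noise types named in
    the abstract: additive Gaussian, salt-and-pepper, and multiplicative
    speckle. Parameter values are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    img = image.astype(np.float64)

    # Additive Gaussian noise: x + N(0, sigma^2)
    img = img + rng.normal(0.0, gauss_sigma, img.shape)

    # Salt-and-pepper noise: force a random fraction of pixels to 0 or 1
    mask = rng.random(img.shape[:2])
    img[mask < sp_amount / 2] = 0.0        # pepper
    img[mask > 1 - sp_amount / 2] = 1.0    # salt

    # Multiplicative speckle noise: x * (1 + N(0, sigma^2))
    img = img * (1.0 + rng.normal(0.0, speckle_sigma, img.shape))

    return np.clip(img, 0.0, 1.0)
```

Applying the function to each image of a clean dataset (with varying parameters per sample) would yield the kind of multinoise training and evaluation data the abstract describes.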
Published in: IEEE Transactions on Vehicular Technology
Volume 74, Issue 11, pp. 16850-16863