Distracted driving is a major contributor to road fatalities, accounting for over 3,000 deaths annually in the United States alone. This paper presents a real-time distracted-driver detection system designed for deployment on low-power embedded edge devices such as the NVIDIA Jetson Orin Nano or Raspberry Pi 5. The system processes a continuous in-vehicle video feed and classifies driver activity into nine distraction-related categories, including texting, drinking, phone use, and interacting with passengers.

To achieve low latency and robust generalization, the system employs a multi-region visual representation consisting of full-frame, face, and hand-region crops, each processed by a shared MobileNetV3-Small backbone. Region-specific feature embeddings are concatenated and classified by a lightweight linear head, while temporal smoothing via an exponential moving average stabilizes predictions across frames.

To prevent data leakage and ensure proper generalization, we introduce a face-based driver clustering method that enables driver-level cross-validation in the absence of explicit driver identity labels. Using agglomerative clustering on 102-dimensional face features, we partition the State Farm dataset into 26 driver clusters and ensure zero driver overlap between the training (20 drivers, 16,275 images), validation (3 drivers, 2,072 images), and test (3 drivers, 4,077 images) sets.

Our multi-view architecture achieves 98.97% test accuracy with only a 1.01% train-test gap, indicating proper generalization, although the result significantly exceeds published driver-based benchmarks of 82-85%. We attribute this discrepancy to the visual homogeneity of the State Farm data, which was collected under controlled conditions; the effectiveness of the multi-view architecture; potential data leakage in prior work that used random splits; and high variance from the limited number of test drivers.
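The driver-level split described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the face embeddings here are synthetic stand-ins (the actual 102-dimensional features come from a face model not specified in this abstract), and the 20/3/3 cluster allocation mirrors the counts reported above.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for 102-dim face embeddings: 26 well-separated
# driver "identities", 40 images each (values assumed for illustration).
rng = np.random.default_rng(0)
n_drivers, per_driver, dim = 26, 40, 102
centers = rng.normal(size=(n_drivers, dim)) * 5.0
faces = np.concatenate(
    [c + rng.normal(scale=0.3, size=(per_driver, dim)) for c in centers]
)

# Cluster face embeddings into pseudo driver identities.
driver_ids = AgglomerativeClustering(n_clusters=n_drivers).fit_predict(faces)

# Split at the cluster (driver) level: 20 train / 3 val / 3 test clusters,
# so no driver's images appear in more than one set.
perm = rng.permutation(n_drivers)
train_d, val_d, test_d = set(perm[:20]), set(perm[20:23]), set(perm[23:])
train_idx = np.flatnonzero(np.isin(driver_ids, list(train_d)))
test_idx = np.flatnonzero(np.isin(driver_ids, list(test_d)))
```

Because the split is performed on cluster labels rather than individual images, driver overlap between sets is zero by construction, which is the property that blocks the identity leakage a random image-level split would allow.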
The trained model is exported to ONNX and optimized with TensorRT FP16 for efficient edge inference, achieving an estimated 13-20 frames per second on the NVIDIA Jetson Orin Nano with a projected 50-75 ms total latency. Experimental results show high accuracy, strong temporal stability, and real-time performance under strict computational constraints, validating the feasibility of privacy-preserving, on-device distracted-driver monitoring. We nonetheless acknowledge significant limitations, including the small test-set size, uncertainty in the face-based clustering, and the staged nature of the dataset, and we recommend K-fold cross-validation across all 26 driver clusters and cross-dataset evaluation on naturalistic driving datasets as critical future work.
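The temporal stability mentioned above comes from the exponential-moving-average smoothing of per-frame class probabilities. A minimal sketch of that smoothing step, with an assumed smoothing factor (the abstract does not report the value used):

```python
import numpy as np

def ema_smooth(frame_probs, alpha=0.3):
    """Exponential moving average over per-frame class probabilities.

    alpha is an assumed smoothing factor for illustration; higher values
    track the current frame more closely, lower values smooth harder.
    """
    smoothed = np.asarray(frame_probs[0], dtype=float).copy()
    out = [smoothed.copy()]
    for p in frame_probs[1:]:
        smoothed = alpha * np.asarray(p, dtype=float) + (1 - alpha) * smoothed
        out.append(smoothed.copy())
    return np.stack(out)

# A single-frame spike (frame 2) is damped instead of flipping the label.
probs = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
smoothed = ema_smooth(probs)
```

With these numbers, frame 2's raw prediction would flip to class 1, but the smoothed probabilities at frame 2 are roughly [0.69, 0.31], so the emitted label stays at class 0, which is the per-frame jitter suppression the system relies on.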