Neurodegenerative diseases such as ALS, Parkinson’s, and Alzheimer’s progressively impair motor function, severely limiting the patient’s ability to communicate. Traditional assistive technologies have serious limitations, including high cost, difficulty calibrating accurately, and an inability to adapt as the disease progresses. This study proposes a clinical-grade adaptive multimodal communication system that combines real-time eye-tracking, facial expression recognition, and AI-assisted language processing to restore communication autonomy. The system presents a high-contrast interface navigated by gaze, in which sustained hovering (dwell) selects buttons representing patient needs (e.g., PAIN, FOOD). Facial emotion analysis provides contextual cues, while a large language model (LLM) generates natural-language sentences from the button selections (e.g., “I need water”). Synthesized speech then voices these needs aloud to nurses or caregivers. The system was deployed on a Raspberry Pi 5 with custom software built around a highly optimized multi-threaded architecture, achieving low latency per interaction cycle, 93% eye-tracking accuracy, and 89.5% emotion recognition accuracy. It was validated with 12 participants and remained operable for 7 hours or more on a portable battery. These results demonstrate a low-cost, distributable communication platform that supports patient autonomy, mitigates quality-of-life decline, and is useful in clinical contexts. Future work could focus on multilingual support and integration with hospital infrastructure.

Conventional systems are generally too expensive, rely on a single input modality (such as gaze alone or facial gestures alone), and fail to adapt to further declines in motor skills, leaving individuals increasingly helpless and isolated. The proposed solution is an adaptive, inexpensive, multimodal platform that integrates eye-tracking, facial emotion recognition, and LLM-assisted sentence generation in a single user interface. Experimental validation showed 93% gaze-tracking accuracy, 89.5% facial emotion recognition accuracy, and more than 7 hours of reliable battery-powered operation. Usability testing with 12 participants showed that the system generally reduced communication delays and that participant satisfaction improved over existing communication aids. These results indicate the technical feasibility of a scalable assistive communication tool, variants of which could address the needs of individuals across all levels of ability while combining affordability with clinical-grade performance.
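To make the described interaction flow concrete, the sketch below (not taken from the paper) illustrates one plausible implementation of the dwell-based gaze selection and the construction of an LLM prompt from button selections and the detected emotion. The dwell threshold, button labels, and prompt wording are assumptions; the actual parameters, model, and prompt used in the study are not specified here.

```python
# Minimal sketch (assumptions, not the authors' code) of dwell-based gaze
# selection: a button fires when the estimated gaze point stays inside its
# bounds for a fixed dwell time, and the selection is turned into an LLM prompt.

import time
from dataclasses import dataclass

DWELL_SECONDS = 1.5  # assumed dwell threshold; the paper does not state one


@dataclass
class Button:
    label: str  # e.g. "PAIN", "FOOD", "WATER"
    x: int
    y: int
    w: int
    h: int

    def contains(self, gx: int, gy: int) -> bool:
        # True if the gaze estimate (gx, gy) falls inside this button.
        return self.x <= gx <= self.x + self.w and self.y <= gy <= self.y + self.h


class DwellSelector:
    """Tracks how long the gaze has hovered over a single button."""

    def __init__(self, buttons: list[Button]):
        self.buttons = buttons
        self.hover_target: Button | None = None
        self.hover_start = 0.0

    def update(self, gx: int, gy: int) -> Button | None:
        """Feed the latest gaze estimate; returns a button once dwell completes."""
        target = next((b for b in self.buttons if b.contains(gx, gy)), None)
        now = time.monotonic()
        if target is not self.hover_target:
            # Gaze moved to a different button (or off all buttons): restart timing.
            self.hover_target, self.hover_start = target, now
            return None
        if target is not None and now - self.hover_start >= DWELL_SECONDS:
            self.hover_start = now  # reset so the button does not re-fire immediately
            return target
        return None


def build_llm_prompt(selected_labels: list[str], emotion: str) -> str:
    """Hypothetical prompt wording: combine button selections and the detected
    emotion into a request for one short first-person sentence."""
    return (
        "Compose one short first-person sentence a patient could say to a caregiver. "
        f"Selected needs: {', '.join(selected_labels)}. Detected emotion: {emotion}."
    )
```

Under these assumptions, the gaze pipeline would call `update()` once per frame; when it returns a button, the label is appended to the current selection and, once the patient confirms, `build_llm_prompt()` produces the text sent to the language model before speech synthesis.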