Introduction
Despite strong evidence that repetitive home-based rehabilitation improves functional recovery after stroke, current delivery models still show gaps in continuity of care and patient engagement. AI-driven Embodied Conversational Agents (ECAs) could provide personalized home support through natural-language guidance on prescribed exercises, reinforcement of neuroplasticity, clarification of therapeutic principles, and motivational support. However, clinical deployment remains challenging. Many robotic platforms lack real-time interaction capabilities such as speech processing, gesture execution, and attention tracking, while Large Language Models (LLMs) may produce factual errors or inconsistent responses. Early development is also constrained by limited access to real users due to practical and ethical considerations.

Methods
To address these challenges, we propose a Design-Based Research methodology for human–AI co-design and evaluation of ECAs (co-AI DBR), in which generative AI facilitates iterative cycles of design, testing, and refinement. Co-AI DBR combines synthetic patient generation with real-code execution to simulate, emulate, and evaluate the ECA platform and its LLM-based conversational pipeline. To validate the method in a post-stroke rehabilitation context, a virtual ECA was first tested with synthetic patients to assess the technical implementation and the accuracy of LLM responses. A pilot deployment using the Furhat robot as an ECA was then conducted with patient relatives and rehabilitation professionals to evaluate the voice interface and augmented communication.

Results
LLM responses to questions from real participants showed higher lexical diversity (MTLD ≈ 134 vs. 93.9) and lower repetition (Yule's K ≈ 66.8 vs. 115.4) than responses to synthetically generated questions. Responses remained factually consistent, with no contradictions and complete gender invariance, although hapax rates were lower (88.8% vs. 99.4%). Usability scores were higher among relatives (M = 86.67) than among professionals (M = 72.50), while Intrinsic Motivation Inventory scores indicated similarly high motivation in both groups (M = 6.32 vs. 6.12).

Discussion
The results suggest that co-AI DBR can support the early design and evaluation of ECAs when direct patient testing is limited. By combining synthetic patient generation with real-code execution, generative AI supports iterative knowledge building during the prototyping and refinement of LLM-based ECAs. This methodology enables the practical development of ECAs to support home-based post-stroke rehabilitation.
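
As an illustration of the repetition measures reported in the Results, the sketch below computes Yule's K and a hapax rate for a single response. It is not the authors' pipeline. The tokenizer, the function names, and the hapax-rate definition (share of word types that occur exactly once) are assumptions, since the abstract does not specify how these quantities were computed. Yule's K uses the standard form K = 10^4 (Σ i² V_i − N) / N², where V_i is the number of word types occurring i times and N is the token count.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the study's tokenizer is not stated)."""
    return re.findall(r"[a-z']+", text.lower())


def yules_k(tokens):
    """Yule's K: higher values indicate more repetition."""
    n = len(tokens)
    if n == 0:
        raise ValueError("cannot compute Yule's K on an empty token list")
    type_freqs = Counter(tokens)              # word type -> occurrences
    spectrum = Counter(type_freqs.values())   # i -> V_i (types seen i times)
    s2 = sum(i * i * v_i for i, v_i in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)


def hapax_rate(tokens):
    """Share of word types occurring exactly once (assumed definition)."""
    type_freqs = Counter(tokens)
    if not type_freqs:
        raise ValueError("cannot compute hapax rate on an empty token list")
    hapaxes = sum(1 for count in type_freqs.values() if count == 1)
    return hapaxes / len(type_freqs)


if __name__ == "__main__":
    sample = tokenize("The cat and the dog")
    print(yules_k(sample), hapax_rate(sample))
```

A response with no repeated words yields K = 0 and a hapax rate of 1.0 under these definitions. Both measures are sensitive to text length and tokenization, so values from this sketch are not directly comparable with the figures reported above.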