Voice-based commerce is emerging as a convenient modality for consumers to transact in domains like retail and automotive services. However, deploying intent recognition agents in noisy, bandwidth-limited Internet of Things (IoT) environments (e.g., vehicles, factories) poses significant challenges. This paper proposes an intent-aware AI agent framework optimized for on-device speech recognition and understanding in edge IoT settings. The system leverages robust acoustic modeling and multilingual spoken language understanding to reliably extract user intents under ambient noise and network constraints. We introduce an edge-compute architecture that processes voice commands locally for low latency and privacy, using model compression and quantization to fit resource-constrained devices. To enable monetization, the agent supports contextual product recommendations (voice-based “ads”), third-party API call integrations with licensing control, and a tiered service model where premium subscribers benefit from enhanced on-device inference capabilities. We evaluate the framework in a smart automotive environment. The on-device system achieves 94.3% intent accuracy in quiet conditions and 90.5% under noisy conditions with noise suppression (SNR ≈ 5-10 dB), with 280 ms end-to-end latency and a quantized intent model of 0.30 MB (75% smaller than the uncompressed model). These results are comparable to cloud-based performance while reducing latency and preserving privacy.
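The 75% size reduction reported above matches what one would expect from storing 32-bit floating-point weights as 8-bit integers (4 bytes down to 1 byte per weight), and it implies an uncompressed model of about 0.30 / (1 - 0.75) = 1.2 MB. The abstract does not state which quantization scheme is used, so the following is only a minimal sketch under that assumption: symmetric per-tensor int8 quantization, written with the Python standard library. The function names `quantize_int8`, `dequantize`, and `size_reduction` are hypothetical and not taken from the paper.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme, not the paper's).

    Each weight w is approximated as scale * q with q an integer in [-127, 127].
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [scale * v for v in q]


def size_reduction(weights):
    """Return (float32 bytes, int8 bytes, fractional reduction) for the weights."""
    fp32 = array("f", weights)
    q, _ = quantize_int8(weights)
    fp32_bytes = fp32.itemsize * len(fp32)
    int8_bytes = q.itemsize * len(q)
    return fp32_bytes, int8_bytes, 1.0 - int8_bytes / fp32_bytes


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]  # toy values, not real model weights
    q, scale = quantize_int8(weights)
    print(list(q), scale)
    print(size_reduction(weights))
    # Uncompressed size implied by the figures in the abstract, in MB.
    print(0.30 / (1 - 0.75))
```

The reconstruction error of each weight is bounded by half the scale, which is why small intent classifiers often tolerate this kind of compression; whether the reported accuracy figures were obtained with this exact scheme is not stated in the abstract.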