Inclusive usability testing methods, such as the GenderMag method, aim to identify gender-related usability problems in digital interfaces. Large Language Models (LLMs) have been used by usability engineers in usability evaluations, but their contribution remains underexplored, especially for inclusive usability testing. Research has shown that GenderMag workshops can produce valuable insights but are resource-intensive, and their results may depend on the evaluators’ ability to embody personas with a cognitive style different from their own. We therefore need to assess whether LLM-agent-based testing can aid human-led evaluations. This study evaluates an LLM-agent system for GenderMag persona-based usability testing and compares its performance to traditional human-led evaluations. The agent system integrates GenderMag persona facets into three LLM agents, which analyze usability issues in four web interfaces: three generic ones and one intentionally flawed interface containing gender-related usability issues. We quantitatively and qualitatively compare the types, severity, and relevance of the usability problems the LLM agents identified to those produced in three GenderMag workshops involving nine participants. Findings show a broad overlap between humans and LLM agents in detecting non-gender-specific usability issues in the generic interfaces, although the agents consistently assign significantly higher severity and relevance ratings. For the intentionally flawed interface, humans and LLM agents assign similar ratings, but the overlap between the gender-related usability issues they found was low, with each missing issues the other caught. The agent system is thus an efficient complement to human evaluation: it can detect gender-related usability issues that humans may overlook, thereby expanding the coverage of the evaluation.
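The abstract does not detail how the persona facets are passed to the agents. As a minimal sketch, assuming a prompt-based design, the following Python shows one way the published GenderMag facets (here for the canonical persona "Abi") might be composed into an LLM agent's system prompt. The `GenderMagPersona` class, the facet wording, and `persona_system_prompt` are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's system): composing a GenderMag
# persona prompt for an LLM agent. The facet set (motivations, information
# processing style, computer self-efficacy, attitude toward risk, learning
# style) follows the published GenderMag method; everything else here is a
# hypothetical placeholder.

from dataclasses import dataclass


@dataclass
class GenderMagPersona:
    name: str
    motivations: str
    information_processing: str
    self_efficacy: str
    risk_attitude: str
    learning_style: str


# "Abi" is one of the canonical GenderMag personas; the facet values below
# paraphrase the public GenderMag materials.
abi = GenderMagPersona(
    name="Abi",
    motivations="uses technology to accomplish tasks, not for its own sake",
    information_processing="comprehensive: gathers information broadly before acting",
    self_efficacy="low computer self-efficacy; blames herself when tools fail",
    risk_attitude="risk-averse; avoids unfamiliar features",
    learning_style="process-oriented; prefers step-by-step instructions over tinkering",
)


def persona_system_prompt(p: GenderMagPersona, task: str) -> str:
    """Compose a system prompt asking the LLM to walk through a task
    while embodying the persona's cognitive facets."""
    return (
        f"You are {p.name}, a user with these cognitive facets:\n"
        f"- Motivations: {p.motivations}\n"
        f"- Information processing: {p.information_processing}\n"
        f"- Computer self-efficacy: {p.self_efficacy}\n"
        f"- Attitude toward risk: {p.risk_attitude}\n"
        f"- Learning style: {p.learning_style}\n\n"
        "Walk through the following task on the given interface. At each "
        "step, state what you would do, whether you would know what to do, "
        "and report any usability problem with a severity rating (1-5).\n"
        f"Task: {task}"
    )


print(persona_system_prompt(abi, "Create an account and update your profile photo."))
```

The step-by-step "would you know what to do" framing mirrors the cognitive-walkthrough style underlying GenderMag; the severity scale shown is an assumed placeholder, since the rating scheme the study actually used is not given in the abstract.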