Authors

Chenyi Li, Guande Wu, Gromit Yeuk-Yin Chan, Xiaoan Liu, Sonia Castelo Quispe, Shaoyu Chen, Leslie Welch, Claudio Silva, Jing Qian

Abstract

Augmented Reality (AR) virtual assistants have gained popularity by helping users complete tasks such as assembly and cooking. Despite this, current AR assistants lag behind their counterparts on tablets and smartphones, primarily because they respond passively to user requests and neglect rich contextual and user-specific information. To bridge this gap, we propose a novel AR assistant framework that models both user state and environmental information to deliver proactive guidance. This framework integrates a user modeling approach based on the well-established Belief-Desire-Intention (BDI) model and employs a state-of-the-art multi-modal large language model (LLM) to infer appropriate user guidance. Our design is informed by a two-round co-design process with six experts in human-computer interaction and psychology. We evaluated our model through a prototype-based study involving twelve participants. The results demonstrate that our approach not only aids users in completing both goal-oriented and routine tasks but also significantly enhances user satisfaction.
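To make the framework's structure concrete, the sketch below illustrates one way a BDI-style user state and environment context might be assembled into a prompt for a multimodal LLM. This is a minimal illustration under assumed names (`UserState`, `EnvironmentContext`, `build_guidance_prompt`), not the authors' actual implementation.

```python
# Hypothetical sketch: combine a BDI user model with scene context
# to form a text prompt for a multimodal LLM that suggests guidance.
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserState:
    """Belief-Desire-Intention (BDI) representation of the user."""
    beliefs: List[str] = field(default_factory=list)     # what the user assumes about the task
    desires: List[str] = field(default_factory=list)     # goals the user wants to achieve
    intentions: List[str] = field(default_factory=list)  # actions the user is committed to


@dataclass
class EnvironmentContext:
    """Scene information captured by the AR headset."""
    visible_objects: List[str] = field(default_factory=list)
    current_step: str = ""


def build_guidance_prompt(user: UserState, env: EnvironmentContext) -> str:
    """Assemble a text prompt that, together with the camera frame,
    a multimodal LLM could use to infer the next proactive instruction."""
    return (
        f"User beliefs: {', '.join(user.beliefs)}\n"
        f"User desires: {', '.join(user.desires)}\n"
        f"User intentions: {', '.join(user.intentions)}\n"
        f"Visible objects: {', '.join(env.visible_objects)}\n"
        f"Current step: {env.current_step}\n"
        "Suggest the next proactive instruction for the user."
    )


if __name__ == "__main__":
    user = UserState(
        beliefs=["the pan is already preheated"],
        desires=["cook an omelette"],
        intentions=["crack eggs into the bowl"],
    )
    env = EnvironmentContext(
        visible_objects=["bowl", "eggs", "cold pan"],
        current_step="preparing ingredients",
    )
    print(build_guidance_prompt(user, env))
```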