Simulating Human Audiovisual Search Behavior
Researchers: Hyunsung Cho, Xuejing Luo, Byungjoo Lee, David Lindlbauer, Antti Oulasvirta
By Ashlyn Lacovara
Researchers including XRTC lead faculty David Lindlbauer, working with collaborators at Yonsei University and Aalto University, are using Sensonaut, a computational model of audiovisual search, to explore how people locate targets in complex environments. This work will be presented at CHI 2026.
Finding something in space, whether it’s a car in a crowded parking lot or a speaker in a virtual meeting, requires balancing time, effort, and accuracy in a fast-paced environment. In everyday situations, information is rarely clear: sounds can be masked by noise, objects can be partially hidden, and multiple targets may look or sound alike. Rather than waiting for clearer signals, people actively search: they turn their heads, shift their gaze, and move through space to gather better information.
Sensonaut is built on the idea that search is an embodied process, shaped by both perception and physical action. Traditional models tend to separate these elements, focusing either on how we process sensory input or on how we make decisions.
The model uses a framework known as resource-rational decision-making, where people aim to locate targets as efficiently as possible while balancing the cost of time and physical effort. It continuously integrates auditory and visual cues, maintains a belief about where a target might be, and selects actions that are expected to provide the most useful information.
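To make the idea of integrating cues into a belief more concrete, here is a minimal sketch of one common way to do it: a discrete Bayesian update over candidate target bearings. This is not Sensonaut's actual implementation. The grid of bearings, the Gaussian noise model, the noise widths, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_likelihood(candidates_deg, observed_deg, sigma_deg):
    """Likelihood of an observed bearing under each candidate target bearing.

    Assumes Gaussian sensor noise (an illustrative choice, not from the article).
    """
    # Wrap angular differences into [-180, 180) so 350 and 10 degrees are 20 apart.
    diff = (candidates_deg - observed_deg + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (diff / sigma_deg) ** 2)

def update_belief(belief, likelihoods):
    """Bayes update: multiply the prior by each cue's likelihood, then normalize."""
    posterior = belief.copy()
    for likelihood in likelihoods:
        posterior = posterior * likelihood
    return posterior / posterior.sum()

def entropy_bits(belief):
    """Shannon entropy of the belief in bits; lower means less uncertainty."""
    p = belief[belief > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical setup: 36 candidate bearings in 10-degree steps, uniform prior.
candidates = np.arange(0.0, 360.0, 10.0)
prior = np.full(candidates.size, 1.0 / candidates.size)

# Assumed observations: a noisy auditory cue and a sharper visual cue.
audio = gaussian_likelihood(candidates, observed_deg=95.0, sigma_deg=40.0)
visual = gaussian_likelihood(candidates, observed_deg=90.0, sigma_deg=10.0)

posterior = update_belief(prior, [audio, visual])
best_bearing = candidates[np.argmax(posterior)]

print(f"most likely bearing: {best_bearing:.0f} degrees")
print(f"uncertainty: {entropy_bits(prior):.2f} -> {entropy_bits(posterior):.2f} bits")
```

In this toy setup the sharper visual cue dominates the combined estimate, while the broad auditory cue only narrows the search region. The drop in entropy gives a simple measure of how useful an observation was.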
Using reinforcement learning, Sensonaut develops human-like search strategies. It prioritizes low-effort actions, such as turning the head to localize sound, and escalates to larger movements like walking only when necessary. These behaviors emerge naturally from the model as it navigates uncertainty, distractions, and occlusions.
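The trade-off behind this escalation can be illustrated with a one-step greedy cost-benefit rule. Note that this is a simplification and not reinforcement learning: Sensonaut learns its strategies, whereas the sketch below simply picks the action with the highest expected information gain minus weighted effort and time costs. The action names, numbers, and weights are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    expected_info_gain: float  # expected uncertainty reduction in bits (assumed)
    effort_cost: float         # physical effort in arbitrary units (assumed)
    time_cost: float           # duration in seconds (assumed)

def net_utility(action, effort_weight=1.0, time_weight=0.2):
    """Information gained minus weighted effort and time costs."""
    return (action.expected_info_gain
            - effort_weight * action.effort_cost
            - time_weight * action.time_cost)

def choose_action(actions, effort_weight=1.0, time_weight=0.2):
    """Greedy one-step choice: the action with the highest net utility."""
    return max(actions, key=lambda a: net_utility(a, effort_weight, time_weight))

# Hypothetical open scene: a head turn already localizes the sound well.
open_scene = [
    Action("turn_head", expected_info_gain=1.5, effort_cost=0.2, time_cost=0.5),
    Action("walk", expected_info_gain=2.0, effort_cost=1.5, time_cost=3.0),
]

# Hypothetical occluded scene: a head turn reveals little, so walking pays off.
occluded_scene = [
    Action("turn_head", expected_info_gain=0.1, effort_cost=0.2, time_cost=0.5),
    Action("walk", expected_info_gain=2.5, effort_cost=1.5, time_cost=3.0),
]

print(choose_action(open_scene).name)
print(choose_action(occluded_scene).name)
```

With these illustrative values, the cheap head turn wins whenever it is informative, and walking is chosen only when the low-effort option yields too little information to justify itself.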
To evaluate the system, researchers conducted a virtual reality study where participants searched for sound-emitting targets under varying levels of difficulty, including changes in obstacles, distractors, and starting positions. Sensonaut closely matched real human behavior, accurately reproducing how search time and effort scale with complexity, as well as common patterns of error.
By capturing how people naturally coordinate perception and movement, Sensonaut offers a new way to design interactive systems. Applications range from navigation tools in complex environments to XR interfaces that guide users toward virtual or physical targets. Instead of relying on static cues, future systems can adapt to user behavior in real time—reducing search time, lowering effort, and minimizing cognitive load.
Sensonaut represents a shift toward more human-centered models of interaction. By unifying sensory input, physical movement, and decision-making, it provides a foundation for designing systems that better align with how people actually search in the world.
