Animate Agent World Modeling Benchmark

Published in Proceedings of the Annual Meeting of the Cognitive Science Society, 2024

Recommended citation: Cross, L., Xiang, V., Haber, N., & Yamins, D. (2024). Animate Agent World Modeling Benchmark. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 46). https://escholarship.org/uc/item/7r41x81m

Abstract

To advance the capacity of intuitive psychology in machines, we introduce the Animate Agent World Modeling Benchmark. This benchmark features agents engaged in a diverse repertoire of behaviors, such as goal-directed interactions with objects and multi-agent interactions, all governed by realistic physics. Humans tend to predict the future based on expected events rather than simulating step by step. Thus, our benchmark includes a cognitively inspired evaluation pipeline designed to assess whether the simulated trajectories of world models capture the correct sequences of events. To perform well, models need to leverage predictive cues from the observations to accurately simulate the goals of animate agents over long horizons. We demonstrate that current state-of-the-art models perform poorly in our evaluations. A hierarchical oracle model sets an upper bound for performance, suggesting that to excel, a model should scaffold its predictions with abstractions like goals that guide the simulation process towards relevant future events.

Access paper here