Event
Computer Vision in the Wild Workshop at CVPR 2026
Full workshop title: The 5th Workshop on Computer Vision in the Wild (CVinW): Towards Unified Multimodal Agents for Reasoning in the Wild. Host conference: The Conference on Computer Vision and Pattern Recognition (CVPR)…
Microsoft Research Blog
AsgardBench: A benchmark for visually grounded interactive planning
Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example, when the mug it was tasked to…
Microsoft Research Blog
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation
Vision-language models (VLMs) use images and text to plan robot actions, but they still struggle to decide what actions to take and where to take them. Most systems split these decisions into two steps: a…
Publication