Video
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
The video introduces MindJourney, a framework that addresses a weakness of vision-language models (VLMs): they excel at interpreting single images but struggle to infer the underlying three-dimensional world. By allowing the VLM to “imagine” moving through the scene…
Microsoft Research Blog
MindJourney enables AI to explore simulated 3D worlds to improve spatial interpretation
MindJourney can enable AI to navigate and interpret 3D environments from limited visual input, potentially improving performance in navigation, planning, and safety-critical tasks.
Project
MindJourney
MindJourney is a framework that equips AI agents with a “simulation loop” to explore hypothetical 3D viewpoints before answering spatial reasoning questions. This tackles a core limitation of vision-language models (VLMs), which recognize objects well in 2D…
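
To make the “simulation loop” idea concrete, here is a minimal toy sketch in Python. It is not MindJourney's implementation: the action set, `imagine_view`, `toy_vlm_answer`, and the majority-vote aggregation are all hypothetical stand-ins chosen for illustration. It only shows the general pattern of imagining several viewpoints and querying a model on each one before committing to an answer.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product

# Hypothetical action set; the source does not list the real one.
ACTIONS = ("forward", "turn_left", "turn_right")


@dataclass(frozen=True)
class View:
    """Stand-in for an imagined image: just the action path that led to it."""
    path: tuple


def imagine_view(start: View, actions: tuple) -> View:
    """Toy world model: 'renders' a view by recording the moves taken."""
    return View(start.path + actions)


def toy_vlm_answer(view: View, question: str) -> str:
    """Toy VLM: answers 'left' only from views that include a left turn."""
    return "left" if "turn_left" in view.path else "unsure"


def answer_with_exploration(question: str, max_steps: int = 2) -> str:
    """Query the toy VLM on every imagined view, then majority-vote.

    Majority voting is an assumption made for this sketch.
    """
    start = View(())
    views = [start]
    for steps in range(1, max_steps + 1):
        for actions in product(ACTIONS, repeat=steps):
            views.append(imagine_view(start, actions))
    votes = Counter(toy_vlm_answer(v, question) for v in views)
    votes.pop("unsure", None)  # ignore views that gave no usable answer
    return votes.most_common(1)[0][0] if votes else "unsure"
```

With `max_steps=0` the toy model sees only the original view and cannot answer; allowing one or two imagined moves gives it views from which an answer is available. The extra model calls at inference time are what “test-time scaling” refers to in this toy setting.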