Below is an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
SimulatorArena
This repository contains the code and data for SimulatorArena, a framework that enables: (1) benchmarking AI assistants through multi-turn conversations with user simulators, and (2) evaluating the reliability of user simulators as proxies for human…
MatterSim
MatterSim is a deep learning model for accurate and efficient materials simulation and property prediction over a broad range of elements, temperatures and pressures to enable in silico materials design.
BAPO – Bounded Attention Prefix Oracle
This repository contains all scripts for reproducing the results of our paper “Lost in Transmission: When and Why LLMs Fail to Reason Globally”.
ExACT
ExACT is an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies.
BioEmu-1
Biomolecular Emulator (BioEmu-1 for short) is a deep learning model that can generate thousands of protein structures per hour on a single graphics processing unit. It provides orders of magnitude greater computational efficiency compared to…
Microsoft Research Accurate Chemistry Collection (MSR-ACC)
The Skala functional will enable more accurate, scalable predictions in computational chemistry. It starts with the largest high-accuracy dataset ever built for training deep-learning-based density functional theory (DFT) models. This dataset underpins Skala—coming soon to…
Science Foundation Model
We develop the Science Foundation Model to empower natural scientists and accelerate breakthroughs in scientific discovery. As part of this effort, we introduce the sequence-based model, Nature Language Model (NatureLM). NatureLM is designed to span…
EfficientXLang
This codebase is the official implementation of “EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning.”
Phi-4
Phi-4-multimodal and Phi-4-mini, the newest models in Microsoft’s Phi family of small language models (SLMs), are now available. These models are designed to empower developers with advanced AI capabilities. Phi-4-multimodal, with its ability to process…