Yifan Yang

Senior Research SDE

About

I am Yifan Yang, a Senior Research SDE at Microsoft Research Asia (MSRA), Shanghai, where I joined in 2021. My research focuses on visual content generation, multimodal foundation models, and general-purpose agentic systems, with a particular emphasis on bridging research innovation and real-world deployment. I have published over 30 peer-reviewed papers in top-tier venues, including CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, and AAAI, and have served as an Area Chair for leading conferences such as NeurIPS, ICML, and ICLR. I have been deeply involved in the development of Microsoft’s Phi model family, including Phi-3 and Phi-4, with several of my techniques successfully transferred into core Microsoft products, including Office and Azure. My recent work, LLM2CLIP, enhances cross-modal representation learning by leveraging large language models. It has been integrated into the Phi-4-mini pretraining pipeline and was recognized with the AAAI 2026 Outstanding Paper Award. My recent research also explores multimodal agents, commercial visual content generation, text-to-audio-video generation, and structured multimodal reasoning.

Google Scholar

If you are interested in internship opportunities or research collaborations, feel free to reach out at yifanyang@microsoft.com.

First-author and Corresponding-author Publications

(* denotes co-first author, † denotes corresponding author)

LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
Weiquan Huang, Aoqi Wu, Yifan Yang†, et al.
AAAI 2026 — Outstanding Paper Award 🏆
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang†, et al.
ICML 2026
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang†, et al.
ICML 2026
Video-in-the-loop: Span-grounded Long Video QA with Interleaved Reasoning
Chendong Wang, Donglin Bai, Yifan Yang†, et al.
ICML 2026
A Large Language Model Powered Integrated Circuit Footprint Geometry Understanding
Yida Wang, Taiting Lu, Runze Liu, Lanqing Yang, Yifan Yang†, et al.
ICML 2026
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Ziqin Zhou, Yifan Yang†, et al.
AAAI 2026
VidGuard-R1: AI-generated Video Detection and Explanation via Reasoning Multimodal Language Models and Reinforcement Learning
Kyoungjun Park, Yifan Yang†, et al.
ICLR 2026
Region-adaptive Sampling for Diffusion Transformers
Ziming Liu*, Yifan Yang*, et al.
CVPR 2026
Diffusion²: Turning 3D Environments into Radio Frequency Heatmaps
Kyoungjun Park, Yifan Yang†, et al.
CVPR 2026 Findings
Zoomer: Adaptive Image Focus Optimization for Black-box Multimodal Large Language Models
Jiaxu Qian, Chendong Wang, Yifan Yang†, et al.
Transactions on Machine Learning Research (TMLR), 2025
VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-resolution
Chendong Wang, Anlan Zhang, Yifan Yang†, et al.
MLSys 2025
LoRaSC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
Siwei Li, Yifan Yang†*, et al.
EMNLP 2024 Findings
VIGOR: Reviving Cloud Gaming Sessions
Zhaoyuan He, Yifan Yang*, et al.
ACM CoNEXT 2024
Nerve: Real-time Neural Video Recovery and Enhancement on Mobile Devices
Zhaoyuan He, Yifan Yang*, et al.
Proceedings of the ACM on Networking (CoNEXT), 2024
Attentive Mask CLIP
Yifan Yang, et al.
ICCV 2023
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun*, Yifan Yang*, et al.
NeurIPS 2023
Directional Self-supervised Learning for Heavy Image Augmentations
Yalong Bai*, Yifan Yang*, et al.
CVPR 2021

Technical Reports and Major Corresponding-author Preprints

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Sam Ade Jacobs, et al., Yifan Yang
arXiv Technical Report, 2024
Phi-4-mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, et al., Yifan Yang
arXiv Technical Report, 2025
ReasonGen-R1: Chain-of-Thought for Autoregressive Image Generation Models through Supervised Fine-tuning and Reinforcement Learning
Yu Zhang, Yunqi Li, Yifan Yang†, et al.
arXiv:2505.24875, under review at ECCV 2026
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
Yan Li, Zezi Zeng, Yifan Yang†, et al.
arXiv preprint arXiv:2604.15309, 2026
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang†, et al.
arXiv preprint arXiv:2603.25732, 2026
OmniSch: A Multimodal PCB Schematic Benchmark for Structured Diagram Visual Reasoning
Taiting Lu, Kaiyuan Lin, Yuxin Tian, Yubo Wang, Muchuan Wang, Yifan Yang†, et al.
arXiv preprint arXiv:2604.00270, 2026

Collaborative Publications

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
Jialiang Zhu, Gongrui Zhang, Xiaolong Ma, Lin Xu, Miaosen Zhang, Ruiqi Yang, Song Wang, Kai Qiu, Zhirong Wu, Qi Dai, Ruichun Ma, Bei Liu, Yifan Yang, et al.
ICML 2026
AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
Xin Ding, Jianyu Wei, Yifan Yang, et al.
ICML 2026
AMID: Model-Agnostic Dataset Distillation by Adversarial Mutual Information Minimization
Aoqi Wu, Junming Liu, Yuwei Zhang, Weiquan Huang, Liang Hu, Yifan Yang, et al.
Proceedings of the ACM Web Conference 2026
A Comprehensive Ecosystem for Open-Domain Customized Video Generation
Jingxu Zhang, Yuqian Hong, Daneul Kim, Kai Qiu, Qi Dai, Jianmin Bao, Yifan Yang, et al.
ICASSP 2026
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, et al.
ECCV 2024
StreamMind: Unlocking Full Frame-rate Streaming Video Dialogue through Event-gated Cognition
Xin Ding, Hao Wu, Yifan Yang, et al.
ICCV 2025
Efficient and Adaptive Diffusion Model Inference through Lookup Tables on Mobile Devices
Qipeng Wang, Shiqi Jiang, Yifan Yang, et al.
IEEE Transactions on Mobile Computing, 2025
Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
Zefan Qu, Xinyang Jiang, Yifan Yang, et al.
ECCV 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
Rui Wang, Bohao Li, Yifan Yang, et al.
EMNLP 2025
MageBench: Bridging Large Multimodal Models to Agents
Miaosen Zhang, Qi Dai, Yifan Yang, et al.
WACV 2025
Reducio! Generating 1K Video within 16 Seconds Using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Yifan Yang, et al.
ICCV 2025
Expand Heterogeneous Learning Systems with Selective Multi-Source Knowledge Fusion
Gengyuan Dai, Hongxu Xu, Yifan Yang, et al.
AAAI 2026
Empowering Agentic Video Analytics Systems with Video Language Models
Yuxuan Yan, Shiqi Jiang, Yifan Yang, et al.
USENIX NSDI 2025
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yifan Yang, et al.
ICLR 2025
Understanding and Improving Training-free Loss-based Diffusion Guidance
Yifei Shen, Xinyang Jiang, Yifan Yang, et al.
NeurIPS 2024
Online Video Super-resolution with Convolutional Kernel Bypass Grafts
Jun Xiao, Xinyang Jiang, Yifan Yang, et al.
IEEE Transactions on Multimedia, 2023
