Portrait of Yifan Yang

Yifan Yang

Senior Research SDE

About

I am Yifan Yang, a Senior Research SDE at Microsoft Research Asia (MSRA), Shanghai, which I joined in 2021. My research focuses on visual content generation, multimodal foundation models, and general-purpose agentic systems, with a particular emphasis on bridging research innovation and real-world deployment. I have published over 30 peer-reviewed papers in top-tier venues, including CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, and AAAI, and have served as an Area Chair for leading conferences such as NeurIPS, ICML, and ICLR. I have been deeply involved in the development of Microsoft’s Phi model family, including Phi-3 and Phi-4, and several of my techniques have been transferred into core Microsoft products, including Office and Azure. My recent work, LLM2CLIP, enhances cross-modal representation learning by leveraging large language models. It has been integrated into the Phi-4-mini pretraining pipeline and received the AAAI 2026 Outstanding Paper Award. My recent research also explores multimodal agents, commercial visual content generation, text-to-audio-video generation, and structured multimodal reasoning.


Google Scholar


If you are interested in internship opportunities or research collaborations, feel free to reach out at
📧 yifanyang@microsoft.com


First-author and Corresponding-author Publications

(* denotes co-first author, † denotes corresponding author)

  • LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
    Weiquan Huang, Aoqi Wu, Yifan Yang†, et al.
    AAAI 2026, Outstanding Paper Award 🏆
  • World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
    Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang†, et al.
    ICML 2026
  • AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
    Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang†, et al.
    ICML 2026
  • Video-in-the-loop: Span-grounded Long Video QA with Interleaved Reasoning
    Chendong Wang, Donglin Bai, Yifan Yang†, et al.
    ICML 2026
  • A Large Language Model Powered Integrated Circuit Footprint Geometry Understanding
    Yida Wang, Taiting Lu, Runze Liu, Lanqing Yang, Yifan Yang†, et al.
    ICML 2026
  • HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
    Ziqin Zhou, Yifan Yang†, et al.
    AAAI 2026
  • VidGuard-R1: AI-generated Video Detection and Explanation via Reasoning Multimodal Language Models and Reinforcement Learning
    Kyoungjun Park, Yifan Yang†, et al.
    ICLR 2026
  • Region-adaptive Sampling for Diffusion Transformers
    Ziming Liu*, Yifan Yang*, et al.
    CVPR 2026
  • Diffusion²: Turning 3D Environments into Radio Frequency Heatmaps
    Kyoungjun Park, Yifan Yang†, et al.
    CVPR 2026 Findings
  • Zoomer: Adaptive Image Focus Optimization for Black-box Multimodal Large Language Models
    Jiaxu Qian, Chendong Wang, Yifan Yang†, et al.
    Transactions on Machine Learning Research (TMLR), 2025
  • VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-resolution
    Chendong Wang, Anlan Zhang, Yifan Yang†, et al.
    MLSys 2025
  • LoRaSC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
    Siwei Li, Yifan Yang†*, et al.
    EMNLP 2024 Findings
  • VIGOR: Reviving Cloud Gaming Sessions
    Zhaoyuan He, Yifan Yang*, et al.
    ACM CoNEXT 2024
  • Nerve: Real-time Neural Video Recovery and Enhancement on Mobile Devices
    Zhaoyuan He, Yifan Yang*, et al.
    Proceedings of the ACM on Networking (CoNEXT), 2024
  • Attentive Mask CLIP
    Yifan Yang, et al.
    ICCV 2023
  • ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
    Yasheng Sun*, Yifan Yang*, et al.
    NeurIPS 2023
  • Directional Self-supervised Learning for Heavy Image Augmentations
    Yalong Bai*, Yifan Yang*, et al.
    CVPR 2021

Technical Reports and Major Corresponding-author Preprints

  • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
    Marah Abdin, Sam Ade Jacobs, et al., Yifan Yang
    arXiv Technical Report, 2024
  • Phi-4-mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
    Abdelrahman Abouelenin, Atabak Ashfaq, et al., Yifan Yang
    arXiv Technical Report, 2025
  • ReasonGen-R1: Chain-of-Thought for Autoregressive Image Generation Models through Supervised Fine-tuning and Reinforcement Learning
    Yu Zhang, Yunqi Li, Yifan Yang†, et al.
    arXiv preprint arXiv:2505.24875, under review at ECCV 2026
  • MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
    Yan Li, Zezi Zeng, Yifan Yang†, et al.
    arXiv preprint arXiv:2604.15309, 2026
  • BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
    Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang†, et al.
    arXiv preprint arXiv:2603.25732, 2026
  • OmniSch: A Multimodal PCB Schematic Benchmark for Structured Diagram Visual Reasoning
    Taiting Lu, Kaiyuan Lin, Yuxin Tian, Yubo Wang, Muchuan Wang, Yifan Yang†, et al.
    arXiv preprint arXiv:2604.00270, 2026

Collaborative Publications

  • RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
    Jialiang Zhu, Gongrui Zhang, Xiaolong Ma, Lin Xu, Miaosen Zhang, Ruiqi Yang, Song Wang, Kai Qiu, Zhirong Wu, Qi Dai, Ruichun Ma, Bei Liu, Yifan Yang, et al.
    ICML 2026
  • AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
    Xin Ding, Jianyu Wei, Yifan Yang, et al.
    ICML 2026
  • AMID: Model-Agnostic Dataset Distillation by Adversarial Mutual Information Minimization
    Aoqi Wu, Junming Liu, Yuwei Zhang, Weiquan Huang, Liang Hu, Yifan Yang, et al.
    Proceedings of the ACM Web Conference 2026
  • A Comprehensive Ecosystem for Open-Domain Customized Video Generation
    Jingxu Zhang, Yuqian Hong, Daneul Kim, Kai Qiu, Qi Dai, Jianmin Bao, Yifan Yang, et al.
    ICASSP 2026
  • Unified Medical Image Pre-training in Language-Guided Common Semantic Space
    Xiaoxuan He, Yifan Yang, et al.
    ECCV 2024
  • StreamMind: Unlocking Full Frame-rate Streaming Video Dialogue through Event-gated Cognition
    Xin Ding, Hao Wu, Yifan Yang, et al.
    ICCV 2025
  • Efficient and Adaptive Diffusion Model Inference through Lookup Tables on Mobile Devices
    Qipeng Wang, Shiqi Jiang, Yifan Yang, et al.
    IEEE Transactions on Mobile Computing, 2025
  • Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
    Zefan Qu, Xinyang Jiang, Yifan Yang, et al.
    ECCV 2025
  • ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
    Rui Wang, Bohao Li, Yifan Yang, et al.
    EMNLP 2025
  • MageBench: Bridging Large Multimodal Models to Agents
    Miaosen Zhang, Qi Dai, Yifan Yang, et al.
    WACV 2025
  • Reducio! Generating 1K Video within 16 Seconds Using Extremely Compressed Motion Latents
    Rui Tian, Qi Dai, Yifan Yang, et al.
    ICCV 2025
  • Expand Heterogeneous Learning Systems with Selective Multi-Source Knowledge Fusion
    Gengyuan Dai, Hongxu Xu, Yifan Yang, et al.
    AAAI 2026
  • Empowering Agentic Video Analytics Systems with Video Language Models
    Yuxuan Yan, Shiqi Jiang, Yifan Yang, et al.
    USENIX NSDI 2025
  • DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
    Brian Nlong Zhao, Yifan Yang, et al.
    ICLR 2025
  • Understanding and Improving Training-free Loss-based Diffusion Guidance
    Yifei Shen, Xinyang Jiang, Yifan Yang, et al.
    NeurIPS 2024
  • Online Video Super-resolution with Convolutional Kernel Bypass Grafts
    Jun Xiao, Xinyang Jiang, Yifan Yang, et al.
    IEEE Transactions on Multimedia, 2023