Offloading Cloud Network Services at Production Scale with SONiC DASH SmartSwitch
- Shaofeng Wu
- Zhixiong Niu
- Riff Jiang
- Lawrence Lee
- Junhua Zhai
- Ze Gan
- Vasundhara Volam
- Prabhat Aravind
- Prince Sunny
- Prince George
- Qi Luo
- Evan Langlais
- Soumya Tiwari
- Venkat Satish Katta
- Weixi Chen
- Rishiraj Hazarika
- Sachin Jain
- Deven Jagasia
- Michal Zygmunt
- Avijit Gupta
- Neeraj Motwani
- Pranjal Shrivastava
- Qiang Su
- Anil Reddy Pannala
- Kristina Moore
- James Grantham
- Anupam Pandey
- Xin Liu
- Guohan Lu
- Gerald De Grace
- Rishabh Tewari
- Lihua Yuan
- Erica Lan
- Deepak Bansal
- Dave Maltz
- Yongqiang Xiong
- Hong Xu
To support stateful cloud network services, Microsoft Azure has operated several generations of offloading solutions over the past decade. While these systems improved performance, operating them at hyperscale revealed three persistent lessons: (i) overly flexible programming models hinder hardware acceleration, (ii) appliance-style DPU pools inflate physical footprint and complicate deployment, and (iii) vendor-specific SDKs slow down service iteration. We present SONiC DASH SmartSwitch, which addresses these lessons with three key designs: (1) the DASH pipeline as an immutable and hardware-friendly programming model; (2) the uni-box SmartSwitch, which converges NPU and DPU resources within a single T1 switch; and (3) a community-driven development model with P4 behavior specifications. SONiC DASH SmartSwitch has been deployed in Microsoft Azure at scale. It achieves 1.53 Tbps throughput, 19.2M connections per second (CPS), and 256M concurrent connections for network services, while improving power efficiency by ∼1.8× and space efficiency by ∼2.7× compared to the previous generation.