Project
Torch
We aim to develop practical tools and techniques that can help cloud developers adequately debug, test, configure, and monitor their systems. The research spans all aspects of improving reliability and availability of large-scale cloud systems,…
Microsoft Research Blog
Hyperscale cloud reliability and the art of organic collaboration
What does it take to build one of the most reliable hyperscale clouds on the planet? It clearly requires astronomical investments and a vast organization that operates at global scale in near-seamless coordination. Yet…
Publication
Rethinking Networking for “Five Computers”
Project
SeeDot: compiler for low-precision machine learning
The emergence of IoT and Machine Learning (ML) has led to an increase in systems that deploy sensors to collect data and analyze it using ML algorithms in the cloud. However, running the ML classifiers directly…