AI for Business Transformation: The Business of Data
The value of data has never been higher. It has become an essential driver of business decisions, and it is critical for deploying AI successfully. AI research leaders Peter Lee and Vijay Mital explore how AI…
Eureka: Evaluating and understanding progress in AI
How can we rigorously evaluate and understand state-of-the-art progress in AI? Eureka is an open-source framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. Learn more about the extended findings.
Eureka ML Insights
This repository contains the code for Eureka ML Insights, a framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. The framework is designed to help researchers and practitioners run reproducible evaluations…
Research Focus: Week of September 9, 2024
Investigating vulnerabilities in LLMs; a novel total-duration-aware (TDA) duration model for text-to-speech (TTS); a generative expert metric system through iterative prompt priming; integrity protection in 5G fronthaul networks.
MedFuzz: Exploring the robustness of LLMs on medical challenge problems
MedFuzz tests LLMs by breaking benchmark assumptions, exposing vulnerabilities in order to bolster real-world accuracy.