SafeAgents
A unified framework for building and evaluating safe multi-agent systems. SafeAgents provides a simple, framework-agnostic API for creating multi-agent systems with built-in safety evaluation, attack detection, and support for multiple agentic frameworks (Autogen, LangGraph, OpenAI…
BusyBox
BusyBox is a physical 3D-printable device for benchmarking affordance generalization in robot foundation models. It features… Please check out our website for more details. For fully building an instrumented BusyBox capable…
TestExplora
This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation”. It can be used for baseline evaluation using the prompts mentioned in the paper. TestExplora…
Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a…
Senior Applied Scientist
We seek a Senior Applied Scientist with expertise in Machine Learning, Generative AI, Agentic Modeling, Data Science, or related areas. The ideal candidate should be passionate about generative modeling, experimental prompt tuning, and large-scale modeling…
PlugMem: Transforming raw agent interactions into reusable knowledge
It seems counterintuitive: giving AI agents more memory can make them less effective. As interaction logs accumulate, they grow large, fill with irrelevant content, and become increasingly difficult to use. More memory means that agents must…