Human-Centered Evaluation and Auditing of Language Models

Ziang Xiao; Wesley Deng; Michelle S. Lam; Motahhare Eslami; Juho Kim; Mark Lee; Q. Vera Liao

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems | May 2024

Recent advances in Large Language Models (LLMs) have significantly impacted numerous real-world applications and will impact more. However, these models also pose significant risks to individuals and society. To mitigate these risks and guide future model development, responsible evaluation and auditing of LLMs are essential. This workshop aims to address the current “evaluation crisis” in LLM research and practice by bringing together HCI and AI researchers and practitioners to rethink LLM evaluation and auditing from a human-centered perspective. The workshop will explore topics around understanding stakeholders’ needs and goals in evaluating and auditing LLMs, establishing human-centered evaluation and auditing methods, developing tools and resources to support these methods, and building community and fostering collaboration. By soliciting papers, organizing an invited keynote and panel, and facilitating group discussions, this workshop aims to develop a future research agenda for addressing the challenges in LLM evaluation and auditing.