I2SC Lecture Series (Recording): Orestis Papakyriakopoulos (Societal Computing, TU Munich), "AI Safety Benchmarks Do Not Benchmark Safety"
Date: May 9, 2025
Abstract:
As artificial intelligence systems become increasingly integrated into critical aspects of society, robust safety evaluation becomes crucial. However, this talk contends that current AI safety benchmarks, while offering insights into specific model behaviors, fundamentally fall short of measuring genuine safety. We argue this from multiple perspectives: First, these benchmarks typically assess only a narrow, predefined set of known risks, providing limited coverage of the vast landscape of potential failures, including complex edge cases, subtle biases, and unknown unknowns. Second, we posit that safety is not merely the absence of quantifiable risk or a property inherent to an isolated component; rather, it is an emergent property of the entire socio-technical system operating within its real-world context. Finally, unlike mature engineering disciplines, which incorporate robust methods for handling uncertainty and rely on continuous monitoring and adaptation, AI benchmarks offer static evaluations that cannot account for the dynamic nature of risk or for the intersubjective and societal dimensions of what constitutes acceptable safety. By highlighting these limitations, this talk aims to demonstrate that sole reliance on current benchmarks provides an incomplete and potentially misleading assessment of AI safety, underscoring the urgent need for more comprehensive, systemic, and context-aware evaluation paradigms.
You can watch the recording of the talk below.