Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
The BenchRisk workflow enables comparisons between benchmarks; as an open-source tool, it also helps users identify and share risks and their mitigations.
Measuring AI Safety Practice Progress
AVERI’s purpose, to make third-party auditing of frontier AI effective and universal, draws inspiration from several sources, including the financial industry.
US Policy Landscape
Mapping the current state of AI evaluation policy in the United States and opportunities for improvement.
EU AI Act Implementation
Analysis of audit requirements under the EU AI Act and recommendations for effective third-party assessment.
Frontier AI Auditing
A comprehensive framework for independent evaluation of frontier AI systems, mapping access requirements to systemic risks.