Audit Research and Engineering

The science and practice of understanding and mitigating the risks of AI systems has not kept pace with frontier AI capability progress. Evaluations are time-consuming, expensive, and non-comprehensive, and they are prone to various sources of error and contamination. New methods enabling a scaled, continuous practice of audit, combined with assessments of the reliability of current and future methods, are required for frontier AI auditing to take its place among critical social institutions.

We believe in grounding new methods via "pilot audits" conducted in collaboration with frontier AI companies. These pilots help focus engineering effort on the challenges that most impede audit fieldbuilding. Such pilots inherently involve confidential details, but they deeply inform our research and engineering roadmap, and we will share our lessons learned from them wherever possible.

Read our latest work

RESEARCH

BenchRisk

We produced a metaevaluation tool, methodology, and results for assessing the risk of relying on benchmarks for consequential real-world decisions. We named this tool suite "BenchRisk."

Read more