Audit Research and Engineering

The science and practice of understanding and mitigating the risks of AI systems has not kept pace with the rate of frontier progress. Evaluations are time-consuming, expensive, non-comprehensive, error-prone, and often substantially compromised by the systems they are designed to assess. New methods that enable a scaled, continuous practice of audit, combined with assessments of the reliability of current and future methods, are required for frontier AI audit to take its place in the pantheon of critical social institutions.

Read our latest work

RESEARCH

BenchRisk

We produced a metaevaluation tool, methodology, and results to assess the risk of relying on benchmarks for consequential real-world decisions. We named this tool suite "BenchRisk."
