Anthropic is launching a program to fund the development of new benchmarks to evaluate AI models’ performance and impact, including generative models like its own Claude.
Unveiled on Monday, Anthropic’s program will provide payments to third-party organizations that can “effectively measure advanced capabilities in AI models,” according to a company blog post. Applications will be accepted on a rolling basis.
“Our investment in these evaluations aims to elevate the entire field of AI safety, providing valuable tools for the whole ecosystem,” Anthropic stated. “Developing high-quality, safety-relevant evaluations is challenging, and demand is outpacing supply.”
AI currently has a benchmarking problem. The most commonly cited benchmarks fail to capture how the average person uses the systems being tested. Some benchmarks, especially those predating modern generative AI, may not measure what they claim to.
Anthropic proposes creating challenging benchmarks that focus on AI security and societal implications, supported by new tools, infrastructure, and methods.
The company calls for tests that assess a model’s ability to carry out tasks like conducting cyberattacks, “enhancing” weapons of mass destruction, and manipulating or deceiving people. For AI risks related to national security, Anthropic says it is committed to developing an “early warning system” for identifying and assessing risks, though the blog post does not detail what such a system would entail.
Anthropic also aims to support research into benchmarks and “end-to-end” tasks probing AI’s potential in scientific study, multilingual conversations, bias mitigation, and self-censoring toxicity.
To achieve this, Anthropic envisions new platforms for subject-matter experts to develop evaluations and large-scale model trials involving “thousands” of users. A full-time coordinator has been hired for the program, and the company may purchase or expand promising projects.
“We offer a range of funding options tailored to each project’s needs and stage,” Anthropic writes, without providing further details. “Teams will interact directly with Anthropic’s domain experts from various relevant teams.”
Anthropic’s effort to support new AI benchmarks is commendable, assuming sufficient resources are allocated. However, given the company’s commercial ambitions in the AI race, it may be difficult to trust the effort completely.
Anthropic is transparent about wanting certain evaluations to align with its AI safety classifications, developed with input from third parties like the nonprofit AI research organization METR. This is within the company’s prerogative but may require applicants to accept definitions of “safe” or “risky” AI they might not agree with.
Some in the AI community may also take issue with Anthropic’s references to “catastrophic” and “deceptive” AI risks, like nuclear weapons risks. Many experts argue there’s little evidence suggesting AI will gain world-ending, human-outsmarting capabilities soon, if ever. Claims of imminent “superintelligence” may distract from pressing AI regulatory issues like AI’s hallucinatory tendencies.
Anthropic hopes its program will be “a catalyst for progress towards a future where comprehensive AI evaluation is an industry standard.” While many open, corporate-unaffiliated efforts to create better AI benchmarks may identify with this mission, it remains to be seen whether they will join forces with an AI vendor ultimately loyal to shareholders.