AI Model Report

Benchmarks

Methodology, regression suites, leaderboard inflation, and the numbers behind every comparison the desk publishes.


Feature · MAY 5, 2026

Claude Opus 4.7 leads Vals AI's Finance Agent benchmark at 64.4%; tops GDPval-AA

Anthropic's finance-tuned model debuted at the lab's May 5 invite-only briefing in New York. The two benchmark headlines come with the usual caveats — and one new variable for the benchmarks desk to track.

By Linnea Halberg · Benchmarks desk
