AI Model Report
Linnea Halberg

Staff · 1 piece on file

Benchmarks desk

Linnea runs the benchmarks desk. She maintains the desk’s private regression suites for reasoning, math, and tool use, and writes the methodology notes that accompany every numbered comparison the site publishes. She is the desk’s voice on leaderboard inflation and contamination risk.

Beats: benchmarks, coding-evals


All pieces by Linnea

  • Benchmarks · MAY 5, 2026

    Claude Opus 4.7 leads Vals AI's Finance Agent benchmark at 64.4%; tops GDPval-AA

    Anthropic's finance-tuned model debuted at the lab's May 5 invite-only briefing in New York. The two benchmark headlines come with the usual caveats — and one new variable for the benchmarks desk to track.
