
Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

Source: venturebeat.com 1 min read


You are reading a summary. The full content is hosted on venturebeat.com.

UC Berkeley RDI and more than 300 experts have launched Agents’ Last Exam, a benchmark of long-horizon professional workflows spanning 55 industries, with mostly deterministic grading and anti-contamination controls. GPT-5.5 running via Codex leads the leaderboard with a 24.0% pass rate, which underscores how poorly even the top models still perform; many score 0.0% on the hardest tier.

