Anthropic says Mythos Preview tops SWE-bench with 93.9% vs Opus 4.6’s 80.8%; 77.8% on SWE-bench Pro versus 53.4%

April 7, 2026

AI, security, business, law

Benchmarks

Anthropic says its unreleased Claude Mythos Preview scored 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6. SWE-bench tests whether a model can resolve real software issues drawn from open-source GitHub repositories, so the numbers, if they hold up, point to a sizable lead for Mythos on software-engineering tasks. According to reports, the figures come from Anthropic’s own announcement materials rather than independent third-party verification.
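To put the reported figures side by side, here is a minimal Python sketch that computes the gap between the two models in percentage points. The scores are the ones Anthropic reported above; the `scores` dictionary and the `point_gap` helper are illustrative names only, and the sketch does nothing beyond simple subtraction on those reported numbers.

```python
# Reported scores (percent of tasks resolved), as stated in Anthropic's announcement.
scores = {
    "SWE-bench Verified": {"Mythos Preview": 93.9, "Opus 4.6": 80.8},
    "SWE-bench Pro": {"Mythos Preview": 77.8, "Opus 4.6": 53.4},
}


def point_gap(a: float, b: float) -> float:
    """Hypothetical helper: difference in percentage points, rounded to one decimal."""
    return round(a - b, 1)


# Gap between Mythos Preview and Opus 4.6 on each benchmark.
gaps = {
    name: point_gap(s["Mythos Preview"], s["Opus 4.6"])
    for name, s in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap} points")
```

On these reported numbers, the lead is wider on SWE-bench Pro (24.4 points) than on SWE-bench Verified (13.1 points).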

Project Glasswing

Anthropic has paired Mythos Preview with Project Glasswing, a restricted initiative that, according to reports, gives a coalition of major tech and finance firms access to the model so they can find and patch vulnerabilities before adversaries exploit them. Reported launch partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks, and Anthropic is reportedly committing up to $100 million in usage credits plus $4 million to open-source security groups. The company says it will not make Mythos Preview generally available because of its cybersecurity capabilities, and it reportedly claims the model autonomously found thousands of high-severity zero-day vulnerabilities.

Why it matters

This is a strange, high-stakes balancing act. Keep a powerful tool locked down and give defenders a head start, or open it up and risk its capabilities spreading to hostile actors? Anthropic frames the choice starkly, reportedly warning that “the fallout could be severe.” With the industry in an arms race over frontier models, Project Glasswing is as much a policy experiment as a technical one, and a test of whether a handful of companies can responsibly wield offensive-grade capabilities on behalf of everyone else. That leaves an open question: who watches the watchdogs?

Sources: venturebeat.com