Anthropic says Mythos Preview tops SWE-bench Verified with 93.9% vs Opus 4.6's 80.8%, and scores 77.8% on SWE-bench Pro vs 53.4%

Benchmarks
Anthropic says its unreleased Claude Mythos Preview scored 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, compared with 53.4% for Opus 4.6. SWE-bench tests whether a model can resolve real-world software issues drawn from GitHub repositories. If the numbers hold, Mythos leads by 13.1 percentage points on Verified and 24.4 points on Pro. The figures reportedly come from Anthropic's own announcement materials and have not been independently verified by a third party.
Project Glasswing
Anthropic has paired Mythos Preview with Project Glasswing, a restricted initiative that the company says gives a coalition of major tech and finance firms access to the model so they can find and patch vulnerabilities before adversaries exploit them. Launch partners reportedly include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks, and Anthropic is said to be committing up to $100 million in usage credits plus $4 million to open-source security groups. The company says it will not make Mythos Preview generally available because of its cybersecurity capabilities; according to reports, Anthropic claims the model autonomously found thousands of high-severity zero-day vulnerabilities.
Why it matters
This is a strange, high-stakes balancing act. Should a powerful tool stay locked down so defenders get a head start, or be released at the risk of its capabilities spreading to hostile actors? Anthropic frames the stakes starkly, reportedly warning that "the fallout could be severe." With the industry in an arms-race moment over frontier models, Project Glasswing is as much a policy experiment as a technical one, and a test of whether a handful of companies can responsibly wield offensive-grade capabilities on behalf of everyone else. Which leaves an obvious question: who watches the watchdogs?
Sources: venturebeat.com