Claude Mythos: Why Anthropic Says Its Most Powerful Model Is Too Dangerous to Launch

In April 2026, Anthropic announced that Claude Mythos Preview — its most capable AI model to date — would not be made publicly available. The model autonomously discovered thousands of previously unknown software vulnerabilities, including flaws decades old, and converted many of them into working exploits without human direction. Rather than shelving it, Anthropic channeled it into a restricted defensive-security consortium called Project Glasswing.

What Is Claude Mythos?

If you’ve been following the Claude model line, you know the cadence: Haiku for speed, Sonnet for balance, Opus for heavy reasoning. Mythos doesn’t slot neatly into that lineup. It’s described by Anthropic as a general-purpose frontier model whose capabilities in computer security “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy” rather than from deliberate security-specific training. That framing matters — it means the vulnerability-finding isn’t a special mode or plugin. It’s just what a sufficiently capable general reasoner does when pointed at code.

Specific vulnerability findings during pre-release testing

  • A 27-year-old flaw in OpenBSD’s TCP SACK implementation enabling remote denial-of-service attacks.
  • A 16-year-old vulnerability in the FFmpeg H.264 codec.
  • CVE-2026-4747, a FreeBSD NFS remote code execution bug achieving unauthenticated root access, developed autonomously in a multi-hour agentic session.
  • Multiple Linux kernel privilege escalation chains requiring the model to find and exploit several vulnerabilities in sequence.
  • Thousands of zero-day vulnerabilities across every major operating system and browser.

Why Anthropic Says It’s Too Dangerous

Anthropic’s Responsible Scaling Policy (RSP) — now in its third revision — creates a tiered framework of AI Safety Levels (ASL). ASL-2 covers today’s frontier models. ASL-3 is triggered when a model provides “meaningful uplift” to actors seeking to conduct large-scale cyberattacks or to create weapons capable of mass casualties. Mythos Preview crossed into ASL-3 territory on the cybersecurity dimension, according to Anthropic’s public risk documentation.

The company makes a pointed distinction between capability and uplift. Lots of models can help an experienced security researcher write a proof-of-concept. What Mythos does differently is lower the skill floor dramatically: it can take someone with a modest technical background and give them the ability to develop a working exploit chain against hardened targets. The concern is that if the model reached actors targeting "systemically important" financial networks or critical infrastructure, the harm potential would be asymmetric: a single actor could cause damage that previously would have required a nation-state-level team.

  Metric                                 Opus 4.6               Mythos Preview
  SWE-bench Verified                     ~66% (est.)            93.9%
  CyberGym Vulnerability Reproduction    66.6%                  83.1%
  Firefox exploit attempts (working)     ~2 / several hundred   181 / several hundred
  ASL Classification                     ASL-2                  ASL-3 (cybersecurity)
  Public availability                    Generally available    Restricted (Glasswing only)
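To put the benchmark rows in perspective, the raw percentage-point gaps can be computed directly from the figures quoted above. This is a minimal sketch; the ~66% Opus 4.6 SWE-bench number is an estimate, as the table notes, so the first delta is approximate.

```python
# Percentage-point deltas between the two models, using the
# benchmark figures quoted in the table above.
benchmarks = {
    "SWE-bench Verified": (66.0, 93.9),                    # Opus 4.6 (est.), Mythos Preview
    "CyberGym Vulnerability Reproduction": (66.6, 83.1),
}

for name, (opus, mythos) in benchmarks.items():
    delta = round(mythos - opus, 1)
    print(f"{name}: {opus}% -> {mythos}% (+{delta} points)")
```

The CyberGym gap (+16.5 points) is what Anthropic's ASL framework cares about most, since that benchmark tracks vulnerability reproduction rather than general coding ability.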

Project Glasswing: The Limited-Release Compromise

Glasswing is Anthropic’s answer to a hard question: if you can’t release it, can you still use it for good? The initiative launched April 7, 2026, with twelve founding partners: Amazon Web Services, Anthropic itself, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Beyond those twelve, over 40 additional organizations received extended access.

Glasswing financials: Anthropic committed up to $100M in Mythos usage credits for participants, plus $2.5M to Alpha-Omega and OpenSSF and $1.5M to the Apache Software Foundation. After the research preview, API pricing is set at $25 per million input tokens and $125 per million output tokens.
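At the quoted post-preview rates, per-call costs are easy to estimate. The sketch below assumes hypothetical token counts purely for illustration; only the $25/$125 per-million-token prices come from the article.

```python
# Cost estimate at the quoted post-preview API pricing:
# $25 per million input tokens, $125 per million output tokens.
INPUT_PRICE = 25.0 / 1_000_000    # USD per input token
OUTPUT_PRICE = 125.0 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the quoted rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical long agentic session: 2M input tokens, 400k output tokens.
print(f"${request_cost(2_000_000, 400_000):.2f}")  # $100.00
```

Output-heavy workloads dominate the bill at these rates: a million output tokens costs five times as much as a million input tokens.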

What This Means Going Forward

The Mythos situation is, in some ways, the scenario AI governance researchers have been describing in white papers for years: a single frontier model crosses a capability threshold significant enough to warrant controlled deployment, and the lab has to improvise a governance structure in real time. A few things seem clear from watching it unfold.
