Anthropic's newest large language model, Claude Mythos Preview, is being hailed as a breakthrough in vulnerability detection and exploit writing. According to the company, the model can identify and exploit zero-day vulnerabilities across every major operating system and web browser, with striking results that include chaining four vulnerabilities to escape browser sandboxes and leveraging a 27-year-old flaw in OpenBSD. But as the company showcases these capabilities, the security community is asking a critical question: Can Anthropic keep this powerful tool out of the wrong hands?
Claude Mythos Preview is designed as a general-purpose LLM, but its exceptional performance on computer security tasks emerged as a downstream consequence of enhancements to code reasoning and generation. The model can autonomously write exploits that chain multiple vulnerabilities, bypassing sophisticated mitigations like kernel address space layout randomization (KASLR) and sandboxing. In one demonstration, Mythos exploited a buffer overflow in the FreeBSD NFS server, chaining 20 gadgets into a return-oriented programming (ROP) payload that was delivered across multiple network packets and yielded a remote shell without any authentication.
Anthropic claims that the model does not require a security engineering background to operate effectively, lowering the barrier for defenders but also raising the risk of weaponization by malicious actors. The company has acknowledged that "the same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them."
Project Glasswing: A Coalition for Defense
To address these concerns, Anthropic launched Project Glasswing, a partnership with major technology companies including Apple, AWS, Microsoft, Palo Alto Networks, and CrowdStrike. The initiative aims to deploy Mythos Preview for defensive cybersecurity operations, with early access granted to more than 40 organizations for scanning and securing first-party and open source systems. Anthropic is committing $100 million in Mythos usage credits and $4 million in direct donations to open source security organizations.
Lee Klarich, chief product and technology officer of Palo Alto Networks, described early results as "compelling" in a LinkedIn post, noting that the model significantly accelerated the discovery of critical vulnerabilities. Forrester Senior Analyst Erik Nost told Dark Reading that the announcement is both a public relations move and a call to action: "Anthropic is basically saying its AI is so good that it can reshape cybersecurity and software development. Secondly, it also calls attention to the vulnerability detection gaps that the industry has dealt with for 30 years."
Expert Skepticism and the Reality of Proliferation
Julian Totzek-Hallhuber, senior principal solution architect at Veracode, warns that while safeguards exist, the assumption must be that the capability will eventually spread. "Because there is no clear answer for how these tools can stay out of attacker hands, defenders should assume the capability will proliferate, and should prepare accordingly," he says. He recommends investing in detection over prevention, including identifying behavioral signatures of AI-assisted exploitation and deploying zero-trust architectures with aggressive patching cycles and anomaly-based detection.
Melissa Ruzzi, director of AI at AppOmni, echoes this pragmatic view: "No one can ever keep anything 100% out of attackers' hands. The best that can be done is to make it more difficult for them to get access to it."
Totzek-Hallhuber also highlights the lack of independent verification. "Anthropic controls both the model and the narrative; independent replication is impossible when the model isn't publicly available. Until independent researchers with access can run their own evaluations, healthy skepticism is the appropriate posture. This is, frankly, another consequence of the restricted access model: the claims can't be tested, so they can't be fully trusted or refuted."
The security community now faces a dual challenge: harnessing Mythos for defense while preparing for a future where similar capabilities are openly available. Nost calls it "a race for defenders to remediate and patch before other AIs, in the wrong hands, discover these zero-days and rapidly write exploits."
Earlier AI models have already demonstrated the ability to write weaponized code from limited prompts, and penetration testing frameworks like Cobalt Strike are frequently abused by threat actors. The precedent suggests that access controls, while necessary, cannot be the sole line of defense. Organizations must therefore invest in behavioral detection systems that can identify the signatures of automated exploitation, as well as accelerate their vulnerability management processes to patch discovered flaws before they can be weaponized.
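As a rough illustration of what anomaly-based detection can mean in practice, the sketch below flags sources whose request count in the newest time window deviates sharply from their own baseline. Everything in it is an assumption made for illustration and is not drawn from Anthropic or the experts quoted here: the function name `flag_anomalous_sources`, the per-minute request counts, the sample addresses, and the z-score threshold of 3.0 are all hypothetical, and production behavioral detection would rely on far richer signals than request volume.

```python
from statistics import mean, stdev


def flag_anomalous_sources(history, latest, threshold=3.0):
    """Return the sources whose newest request count is a z-score outlier.

    history: dict mapping source -> list of past per-minute request counts
    latest:  dict mapping source -> request count in the newest minute
    threshold: hypothetical cutoff; counts more than this many standard
               deviations above the source's own mean are flagged
    """
    flagged = []
    for source, count in latest.items():
        past = history.get(source, [])
        if len(past) < 2:
            continue  # not enough baseline data to judge this source
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            # Perfectly flat baseline: treat any increase as anomalous.
            if count > mu:
                flagged.append(source)
            continue
        if (count - mu) / sigma > threshold:
            flagged.append(source)
    return sorted(flagged)


# Hypothetical sample data: one steady source and one sudden burst.
history = {
    "10.0.0.5": [12, 9, 11, 10, 13],
    "10.0.0.9": [4, 5, 6, 5, 4],
}
latest = {"10.0.0.5": 11, "10.0.0.9": 90}
print(flag_anomalous_sources(history, latest))
```

A per-source baseline is used here so that a normally busy host is not flagged merely for being busy; only a departure from its own history counts as a signal.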
The Mythos Preview model also raises questions about the future of vulnerability management. "It's a call to action, a heads-up, to defenders that vulnerability management practices are about to get very different," Nost said.
In the absence of independent replication, Anthropic has shared examples of vulnerabilities found by Mythos that span multiple decades and platforms. One exploit targeted a 27-year-old buffer overflow in OpenBSD's DHCP client that had gone unnoticed for decades, highlighting the model's ability to analyze long-lived codebases and find latent weaknesses. Another chained four distinct vulnerabilities in a major web browser, using a JIT heap spray to escape both the browser's own renderer sandbox and the operating system sandbox.
These demonstrations are impressive, but as Totzek-Hallhuber noted, controlled demonstrations do not always translate to real-world efficacy. The industry has seen many security tools that work well under laboratory conditions but fail when confronted with the complexity of production environments. Anthropic did not respond to Dark Reading's request for statistics regarding false positive rates and error rates, leaving the question of reliability open.
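The reliability figures at issue can be made concrete. As a minimal sketch, assuming every finding a scanner reports has been manually triaged as either a confirmed vulnerability or a false alarm, the share of reports that hold up (precision) and the share that do not could be computed as follows. The function name `triage_metrics` and the sample counts are hypothetical and do not reflect any published results for Mythos Preview.

```python
def triage_metrics(confirmed, false_alarms):
    """Summarize triaged scanner findings.

    confirmed:    number of reported findings verified as real vulnerabilities
    false_alarms: number of reported findings rejected during triage
    """
    total = confirmed + false_alarms
    if total == 0:
        raise ValueError("no findings to evaluate")
    return {
        # Fraction of reported findings that were real.
        "precision": confirmed / total,
        # Fraction of reported findings that were false alarms.
        "false_alarm_share": false_alarms / total,
    }


# Hypothetical counts, for illustration only.
metrics = triage_metrics(confirmed=45, false_alarms=5)
print(metrics)
```

Note that this measures false alarms as a share of what was reported. A false positive rate in the strict statistical sense would also require counting the code that was correctly left unflagged, which is rarely available for vulnerability scanning.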
Project Glasswing may serve as a controlled proving ground. If the model can maintain high accuracy while scanning large codebases, it could significantly reduce the mean time to discovery for critical vulnerabilities. However, the same technology could be used to craft novel exploits that bypass existing defenses, forcing a rapid evolution in security technology.
For now, the security community is watching closely. The partnership with Apple, AWS, and other giants suggests a serious effort to keep Mythos on the side of defenders, but history has shown that determined adversaries eventually acquire powerful tools. The key differentiator will be how quickly organizations can adapt their defenses to counter AI-assisted attacks.
In the meantime, the debate over responsible disclosure and access control continues. Some argue that restricting access to a select group of researchers and corporations creates an uneven playing field, where only those with privileged access can benefit from the model's capabilities. Others contend that limited access is the only responsible approach until better safeguards are developed.
Regardless of where one stands on the access issue, the emergence of Mythos Preview marks a pivotal moment in the relationship between AI and cybersecurity. The line between offense and defense is becoming increasingly blurred, and the tools available to both sides are advancing at an unprecedented pace. As Nost put it, vulnerability management practices "are about to get very different." The question is not whether the change will come, but how well the industry will respond.
Source: Dark Reading News