Los Angles Wire

Can Anthropic Keep Its Exploit-Writing AI Out of the Wrong Hands?

May 23, 2026  Twila Rosenbaum

Anthropic has introduced Claude Mythos Preview, a general-purpose large language model that demonstrates exceptional capabilities in computer security tasks, including the autonomous discovery and exploitation of zero-day vulnerabilities. The model, unveiled on April 7, 2026, can identify and exploit flaws in every major operating system and web browser, from subtle race conditions to complex chains of vulnerabilities. For example, Mythos Preview successfully exploited a 27-year-old, since-patched flaw in OpenBSD and, in a web browser exploit, chained four vulnerabilities to escape both the renderer and OS sandboxes.

The development of these exploit capabilities emerged as a "downstream consequence" of improving the model's code understanding and reasoning, rather than an explicit goal. Anthropic notes that the same improvements that make the model effective at patching vulnerabilities also make it effective at exploiting them. This dual-use nature raises significant questions about how such a powerful tool can be kept out of the hands of threat actors.

In response, Anthropic launched Project Glasswing, a collaborative initiative with industry giants including Apple, AWS, Microsoft, Palo Alto Networks, and CrowdStrike. The project aims to leverage Mythos Preview for defensive purposes, providing limited access to over 40 organizations to scan and secure first-party and open-source systems. Anthropic is committing $100 million in Mythos Preview usage credits and $4 million in direct donations to open source security organizations. Lee Klarich of Palo Alto Networks described early results as "compelling."

Dual-Use Dilemma and Expert Perspectives

Security analysts recognize both the potential benefits and risks. Forrester senior analyst Erik Nost suggests that the announcement is partly a public relations move, showcasing the model's sophistication while highlighting the long-standing gaps in vulnerability management. "It's a call to action, a heads-up, to defenders that vulnerability management practices are about to get very different," Nost says. He acknowledges that controls exist but warns of a race for defenders to patch before malicious AIs discover and exploit the same zero-days.

Julian Totzek-Hallhuber, senior principal solution architect at Veracode, emphasizes that since no clear answer exists for keeping such tools out of attacker hands, organizations must assume the capability will proliferate. He recommends investing in detection over prevention, identifying behavioral signatures of AI-assisted exploitation, adopting zero-trust architectures, and accelerating patch cycles. Melissa Ruzzi of AppOmni underscores the inevitability: "No one can ever keep anything 100% out of attackers' hands. The best that can be done is to make it more difficult for them to get access to it."

Skepticism and the Need for Independent Validation

Despite the impressive claims, experts urge caution. Totzek-Hallhuber points out that Anthropic controls both the model and the narrative, making independent replication impossible when the model is not publicly available. "Until independent researchers with access can run their own evaluations, healthy skepticism is the appropriate posture. This is, frankly, another consequence of the restricted access model: the claims can't be tested, so they can't be fully trusted or refuted," he says. Anthropic did not respond to requests for statistics on false positives or error rates by press time.

The broader implications for cybersecurity are profound. AI-powered exploitation could dramatically shorten the time from vulnerability discovery to weaponization, putting pressure on defenders to adopt more proactive and automated defenses. The dual-use risk of Mythos Preview mirrors that of legitimate penetration testing tools like Cobalt Strike, which are often abused by attackers. The industry must now grapple with how to harness AI for defense while mitigating its potential for harm.

As the cybersecurity landscape evolves, the success of Project Glasswing and similar initiatives will depend on transparency, collaboration, and robust safeguards. The race between offensive and defensive AI capabilities is intensifying, and the outcome will shape the future of digital security. For now, the message is clear: defenders must prepare for a new era in which AI-discovered vulnerabilities and AI-written exploits become the norm.


Source: Dark Reading News

