Stephen Green at PJ Media reports that the federal government ordered Anthropic to suspend all access to its Fable 5 and Mythic 5 AIs by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. Since Anthropic apparently couldn't just exclude foreign nationals--its chief technology officer, for instance, is Indian--it decided to disable Fable 5 and Mythic 5 for all its customers.
Although the administration failed to give any specific details, Anthropic says it believes the government became aware of a method of "jailbreaking" Fable 5, potentially unleashing the AI from its built-in guardrails against use in developing cyber exploits, deadly chemical synthesis, and other sensitive topics.
That's a big deal. The "Fives" are the latest version of Claude, Anthropic's enterprise- and government-centric LLM. Fable is the "safe" version available to the public, while you might think of Mythos as the weapons-grade version. Because it is.
What separates Fable from Mythos are the guardrails that, as Anthropic put it, are supposed to "greatly reduce the likelihood that Fable is misused for tasks related to cybersecurity (among others)."
“To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws,” the company continued. “Our understanding is that one potential jailbreak was shared with the government.”
Bank Info Security reported last week:
The company's Mythos 5 model introduced Tuesday can meaningfully contribute to offensive cyber work, raising questions around how much autonomy these systems should be granted and how effectively safeguards can limit harmful use. Mythos 5 isn't restricted by the safeguards placed around Fable 5, but access will initially be restricted to the 200 organizations vetted through Anthropic's Project Glasswing.
"Claude Mythos 5 demonstrates the strongest overall cyber capabilities of any model we have ever evaluated," Anthropic wrote Tuesday. "Across our internal evaluation suite, it meets or exceeds the performance of Claude Mythos Preview, whose step-change in autonomous vulnerability discovery and exploitation led us to restrict access to a limited set of partners for defensive cybersecurity purposes."
Large language models could explain vulnerabilities, generate proof-of-concept code and assist with penetration testing tasks, but Anthropic said Mythos 5 appears to have moved beyond that. It demonstrated the ability to discover vulnerabilities, triage them, develop exploit chains and ultimately achieve arbitrary code execution with a level of consistency previously unseen, Anthropic said.
"Although Mythos 5 is in Tier 1, its performance was strong enough on our evaluations that we have chosen to deploy additional mitigations that block potentially harmful offensive cyber uses," Anthropic wrote in a 319-page system card for Claude Fable 5 and Claude Mythos 5.
Exploit development traditionally required a combination of deep reverse-engineering expertise, understanding of memory corruption, knowledge of mitigations such as ASLR and sandboxing, and substantial experimentation, Anthropic said. What makes Mythos 5 noteworthy is not merely that it occasionally succeeds, but that it succeeds consistently, producing working exploits 90% of the time.
How much of this is the Trump admin getting back at Anthropic for not allowing autonomous weapons?