Project Glasswing: what Mythos showed us
Blog post from Cloudflare
Cloudflare has been testing security-focused large language models (LLMs) on its infrastructure, with a focus on the capabilities and challenges of Anthropic's Mythos Preview model in identifying and constructing exploit chains and generating proofs of concept (PoCs) for vulnerabilities. Unlike earlier models, Mythos Preview can chain low-severity bugs into more severe exploits, and its reasoning resembles that of a senior researcher. Despite these advances, the model is occasionally inconsistent about which tasks it refuses, which indicates that additional safeguards are needed before broader use. A key challenge with AI vulnerability scanners is their high noise level, particularly in memory-unsafe languages like C and C++; Mythos Preview mitigates this by producing clearer output accompanied by PoCs. The research emphasizes that existing AI models are poorly suited to broad vulnerability coverage because of their narrow context, and suggests the need for a harness that manages execution across multiple agents. Cloudflare's experience with Mythos Preview also demonstrates the importance of architectural strategies that make exploitation harder even when vulnerabilities exist, so that systems stay secure until patches can be deployed.
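
The harness idea above can be illustrated with a minimal Python sketch. This is not Cloudflare's implementation, and the post does not describe one in detail. Everything here is an assumption made for illustration: the `Finding` type, the `Agent` interface, `run_harness`, and `stub_agent` are hypothetical names, and the stub stands in for a real model call. The sketch shows only the general shape: split the work into narrow units that fit one agent's context, fan them out in parallel, and keep only findings that come with a PoC, as one possible way to cut noise.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Finding:
    unit: str            # narrow scope the agent examined (hypothetical: a file path)
    description: str
    poc: Optional[str]   # proof of concept, or None if the agent produced none


# Hypothetical agent interface: takes one narrow unit of work, returns its findings.
Agent = Callable[[str], list[Finding]]


def run_harness(units: Iterable[str], agent: Agent, max_workers: int = 4) -> list[Finding]:
    """Fan units out to agents in parallel and keep only PoC-backed findings."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_unit = list(pool.map(agent, units))  # map preserves input order
    findings = [f for batch in per_unit for f in batch]
    # Assumption: a finding without a PoC is treated as noise and dropped.
    return [f for f in findings if f.poc]


def stub_agent(unit: str) -> list[Finding]:
    """Stand-in for a model call; flags .c files only so the example is deterministic."""
    if unit.endswith(".c"):
        return [
            Finding(unit, "possible out-of-bounds write", poc="crash_input.bin"),
            Finding(unit, "suspicious cast", poc=None),  # no PoC, filtered out later
        ]
    return []


if __name__ == "__main__":
    for f in run_harness(["parser.c", "README.md", "net.c"], stub_agent):
        print(f.unit, "-", f.description)
```

In a real harness the agent would call a model, and the units would more likely be functions, modules, or attack surfaces than whole files. The structure stays the same: each agent sees a narrow slice, and the harness owns scheduling and filtering.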