Expanding our model safety bug bounty program

Blog post from Anthropic

Post Details
Company: Anthropic
Word Count: 681
Language: English
Summary

We're expanding our model safety bug bounty program to find and mitigate universal jailbreak attacks: exploits that consistently bypass AI safety guardrails across a wide range of areas, including critical domains such as chemical, biological, radiological, and nuclear (CBRN) threats and cybersecurity. The new initiative will test our next-generation system of AI safety mitigations in a controlled environment before it is deployed publicly, with bounty rewards of up to $15,000 for novel attacks. We invite interested researchers to apply to the program and work with us to strengthen AI safety in high-risk areas. The initiative aligns with commitments we have signed alongside other AI companies to develop AI responsibly.