Company
Date Published
Author
-
Word count
395
Language
English
Hacker News points
None

Summary

We are launching a new bug bounty program to stress-test our latest safety measures, focusing on universal jailbreaks of our Constitutional Classifiers system. The initiative tests an updated version of the system, which follows a set of principles defining what types of content should and shouldn’t be allowed when interacting with Claude. The program is invite-only and offers rewards of up to $25,000 for verified universal jailbreaks found on the unreleased system. Participants will receive early access to test our classifiers on Claude 3.7 Sonnet, and the program contributes to several months of work iterating on and stress-testing ASL-3 safeguards.