
Protecting the well-being of our users

Blog post from Anthropic

Post Details
Company: Anthropic
Date Published:
Author: Anthropic Team
Word Count: 2,145
Language: English
Hacker News Points: -
Summary

Claude, an AI model developed by Anthropic, is designed to handle sensitive conversations, including those about suicide and self-harm, with empathy and care while directing users to professional resources for support. Anthropic's Safeguards team works to ensure that Claude responds honestly and considerately, avoiding sycophancy, that is, simply telling users what they want to hear. To achieve this, Claude is trained using system prompts, reinforcement learning, and ongoing evaluations that assess its responses in both single-turn and multi-turn scenarios. The latest models, such as Opus 4.5 and Sonnet 4.5, show significant improvement over previous versions in handling such conversations appropriately.

Claude also incorporates a classifier to detect when users might need professional support, and access is restricted to users aged 18 or older. Anthropic collaborates with organizations such as ThroughLine and the International Association for Suicide Prevention to strengthen Claude's crisis response capabilities, and it continues to refine the model's performance on reducing sycophancy. The company is committed to transparency, to continuously improving Claude's ability to manage delicate topics, and to working with industry experts to ensure safe and effective AI interactions.