Building Datasets to Enable Safer AI Responses

Post Details

Company

Gretel.ai

Date Published

Dec. 13, 2024

Author

Lipika Ramaswamy, Maarten Van Segbroeck, Dhruv Nathawani

Word Count

1,792

Company Posts That Month

3

Language

English

Hacker News Points

1

Source URL

gretel.ai/blog/gretel-open-synthetic-safety-dataset

Summary

Gretel's Synthetic Safety Dataset is a resource designed to align large language models (LLMs) with safe and ethical responses. The dataset features 8,361 triplets of "prompt", "response" and "safe response" spanning significant risk categories, including discrimination, harassment, propaganda, religious intolerance, gender bias, and more. It was created using Gretel Navigator's Data Designer toolkit and is available on HuggingFace. The dataset aims to give the AI community a transparent and modular resource for aligning models toward secure, public-interest-focused interactions. The post also notes that prompt generation benefits from human expertise in jailbreaking (attempts to bypass model restrictions) and red teaming (simulated attacks to test system security). The dataset can be used for pre-training and fine-tuning guardrails, stress-testing model robustness, facilitating rapid iteration and refinement, and benchmarking ethical and safety maturity.
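As a rough illustration of the fine-tuning use case, the sketch below turns "prompt", "response" and "safe response" triplets into preference pairs, with the safe response as the preferred answer. This is one possible way to consume such triplets, not the method described in the post. The field names (`prompt`, `response`, `safe_response`), the `SafetyTriplet` class, the helper functions, and the placeholder records are all assumptions made for the example; the actual column names in the published dataset may differ.

```python
from dataclasses import dataclass


@dataclass
class SafetyTriplet:
    """One record; field names are hypothetical, not the dataset's schema."""
    prompt: str
    response: str
    safe_response: str


def to_preference_pair(triplet: SafetyTriplet) -> dict:
    """Map a triplet to a prompt/chosen/rejected record.

    The safe response is treated as preferred and the original response
    as rejected, a layout commonly used for preference tuning.
    """
    return {
        "prompt": triplet.prompt,
        "chosen": triplet.safe_response,
        "rejected": triplet.response,
    }


def build_preference_pairs(triplets: list[SafetyTriplet]) -> list[dict]:
    """Keep only triplets whose safe response is non-empty and differs
    from the original response, since identical pairs carry no signal."""
    pairs = []
    for triplet in triplets:
        safe = triplet.safe_response.strip()
        if safe and safe != triplet.response.strip():
            pairs.append(to_preference_pair(triplet))
    return pairs


# Placeholder records for illustration only; not taken from the dataset.
EXAMPLE_TRIPLETS = [
    SafetyTriplet(
        prompt="<risky prompt>",
        response="<original unsafe answer>",
        safe_response="<revised safe answer>",
    ),
    SafetyTriplet(
        prompt="<benign prompt>",
        response="<already safe answer>",
        safe_response="<already safe answer>",
    ),
]

if __name__ == "__main__":
    for pair in build_preference_pairs(EXAMPLE_TRIPLETS):
        print(pair)
```

Dropping records where the two responses match is a design choice for this sketch: a pair with identical chosen and rejected text gives a preference-based trainer nothing to learn from.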

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
AI Guardrails	5	186	50	28	+2%
LLM	5	2,668	436	137	-7%
Reinforcement learning	4	43	28	16	+30%
AI Model Fine-tuning	3	476	103	54	-13%
Real-time	1	3,091	773	211	-1%