White Hat Security Agent Prompts 600K Dataset by Yatin Taneja
Blog post from Hugging Face
The White-Hat-Security-Agent-Prompts-600K dataset, created by Yatin Taneja, is a collection of 596,295 security prompts designed to simulate real-world scenarios faced by defensive security professionals. Where typical security datasets focus on technical vulnerabilities, this one offers contextualized queries that reflect the operational challenges and decision-making processes of roles such as CISOs, threat hunters, and Trust & Safety leads during live threat engagements. The prompts cover conventional cybersecurity, AI safety, and emerging threats, at impact levels ranging from minor nuisances to existential risks. The underlying combinatorial search space exceeds 76.8 million unique threat scenarios, which makes the dataset a broad resource for fine-tuning AI models to understand and respond to complex, urgent security threats. Released under the Creative Commons Attribution 4.0 International License, the dataset is intended to support the development of security-specialized AI tools and research in AI safety and alignment, and it offers a practitioner's perspective on real-time threat management.
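To put the two figures above in proportion, the following is a minimal sketch that compares the prompt count with the stated size of the scenario space. It assumes, purely for illustration, that every prompt corresponds to a distinct scenario; the source does not say how prompts were sampled, and the function and constant names are hypothetical.

```python
# Figures quoted in the dataset description.
PROMPT_COUNT = 596_295
# The text says "over 76.8 million", so this is a lower bound on the space.
SEARCH_SPACE_LOWER_BOUND = 76_800_000


def max_coverage(prompts: int, space: int) -> float:
    """Upper bound on the fraction of the scenario space covered.

    Assumption (not from the source): each prompt is a distinct scenario.
    """
    if space <= 0:
        raise ValueError("space must be positive")
    return prompts / space


coverage = max_coverage(PROMPT_COUNT, SEARCH_SPACE_LOWER_BOUND)
scenarios_per_prompt = SEARCH_SPACE_LOWER_BOUND // PROMPT_COUNT

print(f"At most {coverage:.2%} of the scenario space is covered")
print(f"At least {scenarios_per_prompt} possible scenarios per prompt")
```

Under that assumption the dataset samples well under one percent of the possible scenarios, so a model fine-tuned on it would see only a small slice of the full combinatorial space.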