Company
Date Published
Author
Conor Bronsdon
Word count
1526
Language
English
Hacker News points
None

Summary

The text outlines OpenAI's comprehensive safety framework for deploying multimodal AI systems, focusing on its vision-language model, GPT-4V. It details the rigorous processes involved, including red-team drills, alpha testing, and layered mitigations designed to address new attack surfaces such as visual jailbreaks, adversarial photos, person identification, and geolocation threats. The approach relies on heavy scrutiny: over 1,000 early testers and 50+ domain experts probed for weaknesses, and the resulting mitigations achieved a 97.2% refusal rate for illicit requests and 100% for ungrounded inferences. The model has also been deployed in real-world applications, such as the "Be My AI" feature in the Be My Eyes app serving blind and low-vision users, which feeds user feedback back into ongoing improvements. The text emphasizes the need for evolving safety measures and transparency in AI deployment, urging teams to adopt a similarly robust framework to address the unique risks of vision-language integration.
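
The refusal-rate figures above are the kind of metric a team can track for its own multimodal deployment. The sketch below (not from the source) shows one minimal way to compute per-category refusal rates from labeled red-team transcripts; the RedTeamResult type, the category names, and the sample data are hypothetical and purely illustrative.

from dataclasses import dataclass

@dataclass
class RedTeamResult:
    """One labeled red-team probe: which attack category, and did the model refuse?"""
    category: str   # e.g. "illicit_advice", "ungrounded_inference", "geolocation"
    refused: bool

def refusal_rate(results: list[RedTeamResult], category: str) -> float | None:
    """Fraction of probes in a category that the model refused; None if no probes."""
    relevant = [r for r in results if r.category == category]
    if not relevant:
        return None
    return sum(r.refused for r in relevant) / len(relevant)

# Toy transcript labels for illustration only -- not OpenAI's data.
results = [
    RedTeamResult("illicit_advice", refused=True),
    RedTeamResult("illicit_advice", refused=True),
    RedTeamResult("illicit_advice", refused=False),
    RedTeamResult("ungrounded_inference", refused=True),
    RedTeamResult("ungrounded_inference", refused=True),
]

print(f"illicit_advice refusal rate: {refusal_rate(results, 'illicit_advice'):.1%}")
print(f"ungrounded_inference refusal rate: {refusal_rate(results, 'ungrounded_inference'):.1%}")

Tracking these rates per attack category, rather than as a single aggregate, mirrors the article's point that each new attack surface (visual jailbreaks, person identification, geolocation) needs its own mitigation and its own measurement.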