Navigating Threats: Detecting LLM Prompt Injections and Jailbreaks

Post Details

Company

WhyLabs

Date Published

Dec. 19, 2023

Author

Felipe Adachi

Word Count

1,978

Company Posts That Month

1

Language

English

Hacker News Points

-

Source URL

whylabs.ai/blog/posts/navigating-threats-detecting-llm-prompt-injections-and-jailbreaks

Summary

This blog post discusses the issue of malicious attacks on language models (LLMs) such as jailbreak attacks and prompt injections. It presents two methods of detecting these attacks using LangKit, an open-source package for feature extraction for LLM and NLP applications. The first method involves comparing incoming user prompts to a set of known jailbreak/prompt injection attacks, while the second method is based on the assumption that under a prompt injection attack, the original prompt will not be followed by the model. Both methods have limitations, but they can help mitigate the issues associated with malicious LLM attacks.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	44	1,884	250	103	-28%
Vector Search	5	906	144	68	-61%
AI Guardrails	4	44	24	15	-71%
Observability	3	1,101	190	79	-6%
RAG	2	690	102	38	-37%
Secrets Management	2	369	78	52	-42%