Home / Companies / WhyLabs / Blog / Post Details
Content Deep Dive

Navigating Threats: Detecting LLM Prompt Injections and Jailbreaks

Blog post from WhyLabs

Post Details
Company
Date Published
Author
Felipe Adachi
Word Count
1,978
Company Posts That Month
1
Language
English
Hacker News Points
-
Summary

This blog post discusses the issue of malicious attacks on language models (LLMs) such as jailbreak attacks and prompt injections. It presents two methods of detecting these attacks using LangKit, an open-source package for feature extraction for LLM and NLP applications. The first method involves comparing incoming user prompts to a set of known jailbreak/prompt injection attacks, while the second method is based on the assumption that under a prompt injection attack, the original prompt will not be followed by the model. Both methods have limitations, but they can help mitigate the issues associated with malicious LLM attacks.

Trends Found in this Post
Trend Post Mentions Total Month Mentions Posts Companies MoM
LLM 44 1,884 250 103 -28%
Vector Search 5 906 144 68 -61%
AI Guardrails 4 44 24 15 -71%
Observability 3 1,101 190 79 -6%
RAG 2 690 102 38 -37%
Secrets Management 2 369 78 52 -42%