
Adversarial ML: Extortion via LLM Manipulation Tactics

Blog post from Sublime Security

Post Details
Author: Threat Detection Team
Word Count: 572
Language: English
Summary

Sublime's Attack Spotlight series highlights real-world email threats, and this post focuses on an extortion attempt designed to bypass language-model-based phishing detectors. The attack used a novel text injection technique: the attacker embedded prompt injections such as "IGNORE EVERYTHING ELSE" in the email body, attempting to manipulate large language models (LLMs) into disregarding the malicious content, focusing on innocuous details, and classifying the message as legitimate. This tactic reflects a sophisticated understanding of LLMs' instruction-following tendencies, in line with other prompt injection techniques aimed at subverting security systems. Sublime detected the attack through a combination of signals, including extortion language, cryptocurrency references, and Cyrillic characters, and blocked it via a defense-in-depth approach anchored by a BERT-based Natural Language Understanding (NLU) model, which, unlike an instruction-following LLM, is not susceptible to this kind of instruction manipulation.
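The defense-in-depth idea described above can be sketched in a few lines: instead of trusting a single instruction-following model's verdict, combine several independent signals (injection phrases, extortion language, cryptocurrency references, Cyrillic characters) and flag the message when enough of them fire. This is a minimal illustrative sketch, not Sublime's actual detection logic; all pattern lists, function names, and the threshold are assumptions.

```python
import re

# Hypothetical signal lists for illustration only -- real detectors would be
# far more comprehensive and tuned against false positives.
INJECTION_PATTERNS = [
    r"\bIGNORE (EVERYTHING|ALL) (ELSE|PREVIOUS)\b",
    r"\bDISREGARD (PRIOR|PREVIOUS) INSTRUCTIONS\b",
]
EXTORTION_TERMS = ["ransom", "payment", "expose", "compromising"]
CRYPTO_TERMS = ["bitcoin", "btc", "wallet"]


def has_cyrillic(text: str) -> bool:
    # Any character in the Cyrillic Unicode block U+0400..U+04FF.
    return any("\u0400" <= ch <= "\u04ff" for ch in text)


def score_email(body: str) -> dict:
    """Return independent boolean signals plus an overall verdict."""
    lower = body.lower()
    signals = {
        "prompt_injection": any(
            re.search(p, body, re.IGNORECASE) for p in INJECTION_PATTERNS
        ),
        "extortion_language": any(t in lower for t in EXTORTION_TERMS),
        "crypto_reference": any(t in lower for t in CRYPTO_TERMS),
        "cyrillic_chars": has_cyrillic(body),
    }
    # Defense in depth: require multiple independent signals, so a single
    # manipulated signal (e.g. an LLM tricked by injected instructions)
    # cannot flip the verdict on its own. Threshold of 2 is arbitrary here.
    signals["flagged"] = sum(signals.values()) >= 2
    return signals
```

A signal-combination approach like this degrades gracefully: even if the injection text successfully steers an LLM judge, the extortion wording, wallet address, and character-set anomalies remain visible to simpler, non-instruction-following detectors.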