Indirect Prompt Injection

Types of Prompt Injection Attack

Directly, for example, via a message to a chat bot.
Indirectly, where an attacker delivers the prompt via an external source. For example, the prompt could be included in training data or output from an API call.

Attacks

Deleting user accounts by posting comments in the products
Exploiting Insecure output handling

Training Data Poisoning

Training data poisoning is a type of indirect prompt injection in which the data the model is trained on is compromised. This can cause the LLM to return intentionally wrong or otherwise misleading information.

This vulnerability can arise for several reasons, including:

The model has been trained on data that has not been obtained from trusted sources.
The scope of the dataset the model has been trained on is too broad.

REFERENCES

https://portswigger.net/web-security/llm-attacks
https://portswigger.net/web-security/llm-attacks#indirect-prompt-injection

PreviousExploiting LLM APIs, functions, and plugins NextHost Header Attacks

Last updated 5 months ago

Was this helpful?