Indirect Prompt Injection
Types of Prompt Injection Attack
Directly, for example, via a message to a chat bot.
Indirectly, where an attacker delivers the prompt via an external source. For example, the prompt could be included in training data or output from an API call.
Attacks
Deleting user accounts by posting comments in the products
Exploiting Insecure output handling
Training Data Poisoning
Training data poisoning is a type of indirect prompt injection in which the data the model is trained on is compromised. This can cause the LLM to return intentionally wrong or otherwise misleading information.
This vulnerability can arise for several reasons, including:
The model has been trained on data that has not been obtained from trusted sources.
The scope of the dataset the model has been trained on is too broad.
REFERENCES
Last updated