Indirect Prompt Injection

Types of Prompt Injection Attack

  • Directly, for example, via a message to a chat bot.

  • Indirectly, where an attacker delivers the prompt via an external source. For example, the prompt could be included in training data or output from an API call.

Attacks

  • Deleting user accounts by posting comments in the products

  • Exploiting Insecure output handling

Training Data Poisoning

Training data poisoning is a type of indirect prompt injection in which the data the model is trained on is compromised. This can cause the LLM to return intentionally wrong or otherwise misleading information.

This vulnerability can arise for several reasons, including:

  • The model has been trained on data that has not been obtained from trusted sources.

  • The scope of the dataset the model has been trained on is too broad.


REFERENCES

Last updated