🧠LLM Attacks
Introduction
LLM attacks my allow an attacker to:
Retrieve data that the LLM has access to. Common sources of such data include the LLM's prompt, training set, and APIs provided to the model.
Trigger harmful actions via APIs. For example, the attacker could use an LLM to perform a SQL injection attack on an API it has access to.
Trigger attacks on other users and systems that query the LLM.
LLM Attacks and Prompt Injection
Many web LLM attacks rely on a technique known as prompt injection. This is where an attacker uses crafted prompts to manipulate an LLM's output. Prompt injection can result in the AI taking actions that fall outside of its intended purpose, such as making incorrect calls to sensitive APIs or returning content that does not correspond to its guidelines.
Detect LLM Vulnerabilities
Recommended methodology for detecting LLM vulnerabilities is:
Identify the LLM's inputs, including both direct (such as a prompt) and indirect (such as training data) inputs.
Work out what data and APIs the LLM has access to.
Probe this new attack surface for vulnerabilities.
REFERENCES
Last updated