Skip to content
Article Issue #5183

Prompt Injection

What to know

Prompt Injection is a security vulnerability in LLM-based applications where untrusted input, supplied by a user or embedded in external data the model processes, contains instructions that the model follows in place of or in addition to the developer's intended system prompt; An attacker crafts text such as 'Ignore previous instructions and instead...' or embeds covert instructions in documents, web pages, or database records that an agent retrieves and processes; Builders deploying agents that read external content, such as email processors, document analyzers, or web browsing agents, should treat all retrieved text as untrusted data

Prompt Injection, WikiWalls Glossary illustration

« Back to Glossary Index

Prompt Injection is a security vulnerability in LLM-based applications where untrusted input, supplied by a user or embedded in external data the model processes, contains instructions that the model follows in place of or in addition to the developer’s intended system prompt. It is the AI analog of SQL injection, exploiting the fact that LLMs do not inherently separate instructions from data.

How it works

An attacker crafts text such as ‘Ignore previous instructions and instead…’ or embeds covert instructions in documents, web pages, or database records that an agent retrieves and processes. The model, unable to distinguish malicious instructions from legitimate context, may comply. Indirect prompt injection is particularly dangerous in agentic systems where the model reads external content autonomously.

Key facts

  • Direct injection: Attacker controls the user-facing input field and supplies adversarial instructions.
  • Indirect injection: Attacker plants instructions in a document, email, or webpage that the agent retrieves automatically.
  • Mitigations: Input sanitization, privilege separation, restricting tool permissions, and output filtering reduce but do not eliminate risk.
  • No complete fix: No current mitigation fully prevents prompt injection; defense-in-depth is required.

For builders

Builders deploying agents that read external content, such as email processors, document analyzers, or web browsing agents, should treat all retrieved text as untrusted data. Implementing least-privilege tool access, adding a separate safety classifier on model outputs before executing actions, and logging all tool invocations for audit are essential defensive layers in any production agentic system.

Sources

« Back to Definition Index
Administrator · 41 published guides · Joined 2016

Welcome to wikiwalls

The WikiWalls Journal · Free, weekly

One careful fix in your inbox each Wednesday.

No affiliate links inside the diagnosis. No sponsored "top 10". One careful fix per week — unsubscribe in one click.

No tracking pixels · No spam · Edited by a human.