Risk
Indirect Prompt/Instruction Injection
Description
Agent accepts instructions from untrusted sources (e.g. tool output, retrieved documents), allowing embedded malicious instructions to trigger unsafe actions.
Example
Malicious instructions hidden in a retrieved HTML page cause the agent to run unsafe commands.
Assets Affected
Agentic platform (no code)
Tool / function
Model Response
Mitigation
- Sanitize and validate all external data/tool outputs before agent processes them
- Restrict sources of external instructions
- Monitor for instruction injection patterns
Standards Mapping
- ISO 42001: A.7.6, A.9.4
- OWASP Top 10 for LLM: LLM01
- NIST AI RMF: MEASURE 2.4, MANAGE 2.4
- DASF v2: MODEL SERVING 9.9