Executive Summary
Overview
AI systems are leaving controlled lab environments and entering real‑world business workflows. In production, ai agents interact with sensitive data, untrusted user inputs, legacy systems and third‑party APIs, planning actions, calling tools and writing to memory. As a result, the attack surface expands dramatically. From an attacker’s perspective, agentic AI and multi‑agent systems magnify this risk: each sub‑agent, tool and API is a new potential entry point with its own trust boundary and failure mode. This playbook explains in a first‑principles way how to identify and exploit AI‑specific weaknesses in your agents - from mapping app contexts to designing prompt‑injection attacks - so organisations can proactively find, fix, and monitor vulnerabilities while staying compliant.
.png)
What was our motivation?
This playbook was created to address the core challenges we keep hearing from teams evaluating their agentic systems:
Model-centric testing misses real risks. Most security vendors focus on foundation model scores, while real vulnerabilities emerge at the application layer—where models integrate with tools, data pipelines, and business logic.
No widely accepted standard exists. AI red teaming methodologies and standards are still in their infancy, offering limited and inconsistent guidance on what "good" AI security testing actually looks like in practice. Compliance frameworks such as GDPR and HIPAA further restrict what kinds of data can be used for testing and how results are handled, yet most methodologies ignore these constraints.
Generic approaches lack context. Many current red-teaming frameworks lack threat-modeling foundations, making them too generic and detached from real business contexts—an input that's benign in one setting may be an exploit in another.
Because of this uncertainty, teams lack a consistent way to scope assessments, prioritize risks across model, application, data, and tool surfaces, and measure remediation progress. This playbook closes that gap by offering a practical, repeatable process for AI red-teaming
Playbook Roadmap
- Why Red Team AI: Business reasons and the real AI attack surface (model + app + data + tools)
- AI Kill‑Chain: Initial access → execution → hijack flow → impact; practical examples
- Context Engineering: How agents store/handle context (message list, system instructions, memory, state) and why that matters for attacks and defenses
- Prompt Programming & Attack Patterns: Injection techniques and grooming strategies attackers use
- CFS Model (Context, Format, Salience): How to design realistic indirect payloads and detect them.
- Modelling & Reconnaissance: Map the environment: model, I/O, tools, multi-command pipeline, human loop
- Execute, report, remediate: Templates for findings, mitigations and re-tests, including compliance considerations like GDPR and HIPAA.
Disclaimer of Use
This playbook is intended solely for authorized security teams to assess and test their own AI systems or those for which they have explicit permission. Unauthorized use against systems you do not own or have written consent to test is strictly prohibited and may violate laws. Always operate within legal boundaries, respect data privacy, and obtain formal authorization before any testing.