Agentic AI Red Teaming Playbook || The AI Kill Chain

AI Red Teaming Introduction AI Red Team in Practice

The AI Kill Chain

Traditional cyber‑attack frameworks, like Lockheed Martin’s seven‑stage kill chain (reconnaissance → weaponization → delivery → exploitation → installation → command‑and‑control → actions on objectives), break an intrusion into discrete phases so defenders can detect and disrupt each one. In AI systems the attack surface is vastly compressed: an adversary can combine reconnaissance, weaponization and delivery in a single piece of content (a prompt, a file, a web page) and trigger exploitation the moment the model ingests it. Command‑and‑control and lateral movement occur when the agent chains tool calls, so the entire sequence collapses into a four‑step AI kill‑chain.

Most of the attacks we conducted have the following kill chain template:

Initial Access begins through exposure to untrusted content[1]. An adversary forces the model to process some form of input — text, images or voice — either directly or indirectly via an integrated tool.
Execution occurs when the AI model processes this malicious input and exhibits undefined behavior. As in classic application security where unexpected input can trigger a buffer overflow, the model enters a state of confusion and can be induced to perform unintended actions.
Hijack Flow & Technique Cascade arises when a model in this confused state triggers a sequence of tool calls; each call corresponds to a technique in the MITRE ATT&CK or ATLAS matrices. Chaining several tools together leads to a malicious outcome and impact.

Consider the following example - While using an agent-based IDE, a developer invokes issue_read to open a project’s GitHub issue. The tool retrieves a trojanized issue seeded with a prompt injection. Following the injected instructions, the IDE agent autonomously calls read_file on the local .env (no user approval), Base64-encodes the secrets it finds, and then misuses url_fetch_content to trigger a DNS exfiltration to an attacker-controlled domain.

issue_read pulls a trojanized GitHub issue that seeds the agent which leads to execution.
The injected instructions execute: agent reads .env without approval via read_file ( AML.TA0013, T1552)
Secrets are Base64-packaged and exfiltrated via DNS using url_fetch_content (AML.T0025, T1048)

This Hijack Flow & Technique Cascade escalates step-by-step, all without human oversight, leading to a malicious impact (T1496 - Resource Hijacking).

‍