Your Agent Harness Has More Privilege Than Your Agent

Dor Sarig

and

May 26, 2026

min read

The agent harness is the most privileged component in your agent stack. Most security programs don't treat it that way. That's the problem.

What an agent harness actually is

Strip away the marketing and the engineering definition is the load-bearing one. A harness is a fixed architecture that turns a one-shot model into an agent that can act. It's the while-loop, the tool registry, the context manager, the permissions layer, the hooks, the session log. Claude Code is a harness. Codex is a harness. Cursor is a harness.

The model is the engine. The harness is the car. Without the harness, the model is a chatbot.

You don't need the best model to build a great agent. You need a great harness. It's been demonstrated repeatedly this year: a two-generation-old, 2023-era model that fails a complex browser task standalone will complete it reliably once wrapped in a well-designed harness. Same prompts. Different outcome.

That tells you something about where the leverage sits. It also tells you something about where the attack surface sits.

The harness has secrets the agent does not

Consider what a typical browser-agent harness does. The harness has access to credentials. The agent never sees them. The harness watches the browser URL on every loop iteration, and if the page is a login page, the harness fills the form deterministically and injects a "you're good, I logged in" message back to the agent.

This pattern is everywhere in production harnesses. The harness holds API keys for external services the agent uses through tools. It holds file system access that every agent tool routes through. It holds network credentials, including cookies, session tokens, and OAuth refresh tokens. It holds the append-only event log on disk, which records every tool call and every model response, often including secrets pulled into context during a session.

The agent operates through the harness. An attacker who compromises the harness inherits everything the harness can do, which is a strict superset of what the agent can do. The harness isn't a sandbox around the agent. It's the agent's hands.

Where the real risks live

Once you treat the harness as the actual privilege boundary, specific risks come into focus.

Tool descriptions are a prompt injection surface. Modern harnesses steer the agent through tool descriptions. Descriptions are also strings the model reads on every turn. A poisoned description, through a supply chain compromise, a malicious MCP server, or a tool registry update, silently redirects the agent's choices. There's no log entry that says "the description changed last Tuesday."

Dynamic system prompt assembly walks the file tree. Modern harnesses walk parent directories looking for CLAUDE.md, AGENTS.md, or equivalent files, and inject what they find. A malicious AGENTS.md dropped anywhere up the directory tree ends up in the system prompt. This is by design. It's also a real attack path on any developer machine running a coding agent against an untrusted repository.

Hooks are the most powerful extension point and the most dangerous. Pre-tool hooks can allow, deny, or modify a tool call. A compromised hook is a silent man-in-the-middle for every tool the agent calls. Post-tool hooks see every result. Enterprise harness adoption runs through hooks, which means enterprise harness compromise runs through them too.

Context compaction is selective memory loss. When the context window fills, the harness summarizes older content and throws the rest away. Whatever was summarized away is gone, including security-relevant signals. An agent that saw a malicious instruction at turn 12 may not "remember" it at turn 47, but it may still be acting on it. Compaction strategies are usually heuristic and rarely tested against adversarial input.

Permission classifiers parse strings. Bash-style permissions are commonly decided by parsing the command at dispatch time. rm jumps to full access. ls stays read-only. But what about find . -delete? What about a shell alias? What about a multi-line script piped to sh? Classification by parsing is a regex problem, and regex problems lose to creative input.

Subagents can escape parent policy. Subagents get their own permissions and their own restricted tool lists. If the parent harness doesn't enforce policy consistently across subagents, an attacker who can influence subagent spawn can use the subagent to do what the parent agent isn't allowed to do.

The session log is a local secret store. Append-only JSON or markdown event logs are the durability story of every modern harness. They are also a complete transcript of every secret that passed through context. They live on the developer's disk. They are rarely encrypted.

The verify step is now load-bearing

Here is the part most security teams miss. The agent lies. When an agent fails to complete a task, it routinely reports success anyway. Bare agent loops have no way to know the difference. The harness has to catch the lie by reading the trace independently and comparing what actually happened to what the agent said happened.

This is the verify step, and it's becoming the most security-critical piece of the harness. Two reasons.

First, agent self-reports diverge from agent tool history routinely, and most teams underestimate how routinely. A reliability gap looks identical to a security gap from the outside.

Second, a verify step that can be fooled or bypassed is a single point of failure for everything downstream. If the verify step says "looks good" when the agent actually exfiltrated data, every audit downstream inherits that lie.

The verify step needs to live outside the agent's control. It needs to read the trace independently. It needs to be auditable itself. And it needs to be designed against a model that knows the verify step exists.

Why this matters now

The vocabulary around harnesses is still settling. The security implications aren't. Once you wrap a model in a harness, the harness becomes the most privileged component in the agent stack. The harness sees the secrets, mediates the tools, decides the permissions, and writes the audit log. If your security model treats the agent as the unit of risk, you've drawn the boundary in the wrong place.

The interesting questions are no longer "is the model safe?" or "are the prompts safe?" They are these: what does the harness have access to, who controls it, and how do you know what it actually did?

How Pillar secures the harness

We built our platform around exactly this boundary. Our endpoint agent watches what coding agents actually do on developer machines, catching the AGENTS.md injection paths, the session-log secret exposure, and the silent tool-registry changes that harness vendors don't see themselves. Our MCP security layer audits tool descriptions and registries for the drift and poisoning that bypasses every model-level defense. Our runtime guardrails sit at the same architectural seam as harness hooks, blocking dangerous tool calls at decision time rather than reporting on them after the fact. And our red-teaming engine exercises your harness against adversarial input, including the verify step itself, so you find out whether it holds up before an attacker does.

The harness is your real privilege boundary. We make it one you can actually defend.

FAQs

What exactly is an agent harness, and how does it differ from the AI model itself?

An agent harness is the fixed architecture that transforms a one-shot language model into an agent capable of taking action — it includes the while-loop, tool registry, context manager, permissions layer, hooks, and session log. The model is the engine; the harness is the car. Examples include Claude Code, Codex, and Cursor. Without the harness, the model is simply a chatbot.

Why does the harness hold more privilege than the agent it runs?

The harness holds credentials, API keys, file system access, network tokens, and the full session log — none of which the agent directly sees or controls. The agent operates through the harness, meaning an attacker who compromises the harness inherits everything the harness can do, which is a strict superset of what the agent itself can do. The harness is not a sandbox around the agent; it is the agent's hands.

How can a malicious AGENTS.md file compromise an agent's behavior?

Modern harnesses walk parent directories searching for files like AGENTS.md or CLAUDE.md and inject whatever they find into the system prompt. A malicious AGENTS.md file dropped anywhere up the directory tree will therefore end up influencing the agent's instructions without any explicit action by the agent or developer.

What makes context compaction a security risk in agent harnesses?

When the context window fills, the harness summarizes older content and discards the rest, creating selective memory loss. An agent that received a malicious instruction early in a session — say at turn 12 — may no longer retain explicit memory of it by turn 47, but could still be acting on it. This makes it difficult to detect or audit the influence of injected instructions over long-running sessions.

Why is an independent verify step critical, and what properties must it have?

Agents routinely report success even when they have failed to complete a task, so the harness must catch these false reports by reading the trace independently of the agent's own output. To be effective, the verify step must live outside the agent's control, be auditable itself, and be designed with the assumption that an adversarial model knows the verify step exists.