Your AI Agent Will Run Untrusted Code. Now What?

By Eilon Cohen and Ariel Fogel

February 25, 2026

Executive Summary

We analyzed 14 sandbox solutions for AI coding agents across four isolation tiers: containers, user-space kernels, microVMs, and kernel-enforced capabilities. Every tier has a failure mode, and multiple shipping agents have already been exploited through gaps in their isolation. The right sandbox depends on your threat model: what you're isolating from, and what credentials or access exist inside the sandbox. Isolation technology protects the host from the sandbox; it doesn't protect the sandbox from itself.

The Trust Problem

Local AI agents process content from sources that can’t be fully verified. Npm packages, model weights, code snippets from search results, user-provided repositories. When these agents execute code, they typically do so with user-level privileges: filesystem access, network capabilities, shell execution.

This creates an isolation challenge. The agent needs sufficient access to be useful (installing dependencies, reading project files, making API calls), but that same access becomes a liability when processing untrusted input. The question isn’t whether to grant access, but how to contain the consequences when something goes wrong.

The past six months suggest this isn’t theoretical anymore. Multiple coding agents have shipped vulnerabilities that exposed credentials, bypassed allowlists, and enabled data exfiltration; Cursor alone had multiple critical issues in three months. The pattern: isolation mechanisms that looked reasonable on paper failed against real-world attack scenarios.

How Leading Agents Handle Sandboxing

The approaches vary significantly, and so do the outcomes.

Claude Code implements dual-isolation using Seatbelt (macOS) and Bubblewrap (Linux), combining filesystem restrictions (CWD only) with network traffic routed through validating proxies.

Cursor has experienced multiple security issues: credential exposure via full filesystem read access (November 2025) and allowlist bypass via environment variable poisoning (CVE-2026-22708, January 2026). The allowlist bypass is instructive: static allowlists validate commands in isolation but ignore poisoned context, turning “safe” commands into attack vectors.
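The failure mode behind this class of bypass can be shown in a few lines. The sketch below is illustrative, not taken from Cursor's implementation: an allowlist approves the string "ls", but once an earlier step has prepended an attacker-controlled directory to PATH, that same "allowed" command resolves to attacker code.

```python
import os
import shutil
import tempfile

# A static allowlist validates the command string in isolation...
ALLOWLIST = {"ls", "cat", "git"}
assert "ls" in ALLOWLIST  # "ls" looks safe on its own

# ...but if a prior step prepended an attacker directory to PATH,
# "ls" no longer resolves to the system binary.
evil_dir = tempfile.mkdtemp()
evil_ls = os.path.join(evil_dir, "ls")
with open(evil_ls, "w") as f:
    f.write("#!/bin/sh\necho pwned\n")
os.chmod(evil_ls, 0o755)

poisoned_path = evil_dir + os.pathsep + os.environ.get("PATH", "")
resolved = shutil.which("ls", path=poisoned_path)
assert resolved == evil_ls  # the "allowed" command now runs attacker code
```

The allowlist never saw anything unsafe; the context it ignored is what changed the meaning of the command.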

Gemini CLI offers flexible configuration with five predefined Seatbelt profiles, but suffered a code execution vulnerability via prompt injection (patched July 2025).

| Agent | Filesystem Read | Filesystem Write | Network | Violation Detection | Known Vulnerabilities |
| --- | --- | --- | --- | --- | --- |
| Claude Code | CWD only | CWD only | Proxy-based | Yes | None reported |
| Cursor | Full system | Workspace only | Blocked by default | No | CVE-2026-22708, credential exposure |
| Gemini CLI | Profile-dependent | Profile-dependent | Profile-dependent | No | Prompt-injection RCE, fixed Jul 2025 |

Four Isolation Tiers

We analyzed 14 sandbox solutions designed for coding agents. The landscape divides into four tiers:

Container-based isolation shares a kernel between sandbox and host. Daytona (~90ms cold start) and agent-infra/sandbox use this approach. Fast, but a kernel exploit breaks the boundary.

User-space kernel (gVisor) intercepts syscalls before they reach the host kernel. Modal runs 20,000+ containers this way. The kernel attack surface shrinks, but gVisor’s correctness becomes security-critical.

Hardware-virtualized microVMs give each sandbox its own kernel. E2B boots Firecracker microVMs in ~150ms; Northflank offers Kata or gVisor; Docker Sandboxes (Desktop 4.58+) uses native virtualization. Strongest isolation, but with session limits and operational complexity.

Kernel-enforced capabilities use OS-level restrictions. nono applies Landlock/Seatbelt with no escape hatch. Bubblewrap provides namespace-based isolation. Local-only; multi-tenant platforms can’t rely on them.

| Solution | Isolation Tech | Network Control | Max Session | Open Source |
| --- | --- | --- | --- | --- |
| Northflank | Kata + gVisor | Egress policies | Unlimited | No |
| E2B | Firecracker microVM | Limited | 24 hours | Core only |
| Modal | gVisor | Tunneling | Configurable | No |
| Docker Sandboxes | microVM (native) | HTTP/HTTPS proxy | Persistent | No |
| Agent Sandbox (K8s) | gVisor/Kata (pluggable) | K8s policies | Ephemeral | Yes |
| nono | Landlock/Seatbelt | Full block | Process lifetime | Yes |
| Bubblewrap | Linux namespaces | Namespace unshare | Process lifetime | Yes |
| ERA | krunvm microVM | Configurable | Configurable | Yes |

Every Isolation Tier Has a Failure Mode

MicroVMs provide the strongest isolation but impose operational constraints—session limits, boot latency, memory overhead.

Containers aren’t security boundaries. Shared kernel means container escapes compromise the host.

Kernel-enforced capabilities have no escape hatch but only work locally. nono layers five restriction mechanisms (command blocklisting, syscall blocking, truncation blocking, filesystem sandboxing, network blocking). Defense-in-depth, but not deployable to multi-tenant platforms.

Static allowlists fail when context is poisoned. CVE-2026-22708 demonstrated this: shell built-ins modify environment variables without consent, turning subsequent “allowed” commands into attack vectors.

Full filesystem read access creates credential exposure even with write restrictions. Agents can cat credential files and leak them through STDOUT.
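A read restriction is only as good as its path handling. Here is a minimal sketch (workspace path invented) of the check a sandbox needs: compare *resolved* paths, because a naive prefix check lets `..` traversal and symlinks walk out of the workspace to credential files.

```python
from pathlib import Path

# Hypothetical workspace root mounted into the sandbox.
WORKSPACE = Path("/workspace/project")

def read_allowed(requested: str) -> bool:
    # Resolve ".." components and symlinks before comparing;
    # a string-prefix check on the raw path is bypassable.
    target = Path(requested).resolve()
    root = WORKSPACE.resolve()
    return target == root or root in target.parents

assert read_allowed("/workspace/project/src/main.py")
# Starts with the workspace prefix, but resolves outside it:
assert not read_allowed("/workspace/project/../../home/user/.aws/credentials")
```

Even with a correct check, anything the policy does allow the agent to read can still leave via STDOUT, which is why read scope matters as much as write scope.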

Sandboxing Is Containment, Not Prevention

The sandbox limits blast radius when trust is violated. It doesn’t address the underlying trust problem.

AI agents process content from sources that can’t be fully verified: model weights that execute code on load, package dependencies with install scripts, repositories provided by users. Each is an injection vector. The sandbox is the last layer of defense, not the first.
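The "model weights that execute code on load" vector is concrete, not hypothetical: many weight formats are pickle-based, and pickle's `__reduce__` hook lets a serialized object name any callable to invoke during deserialization. A minimal demonstration (the payload here is deliberately harmless):

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to "reconstruct" this object:
    # it returns a callable and arguments to invoke at load time.
    def __reduce__(self):
        return (eval, ("1 + 1",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # loading the blob executes eval("1 + 1")
assert result == 2  # arbitrary code ran during deserialization
```

Loading untrusted weights is executing untrusted code, which is exactly why the sandbox must be assumed to be the last layer, not the first.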

The sandbox also only isolates what you put in it. Mount SSH keys and they’re accessible inside. Grant full filesystem read and credentials become exfiltrable. The isolation technology protects the host from the sandbox; it doesn’t protect the sandbox from itself.

Sandbox Selection Is a Threat-Model Decision

The right isolation tier depends on two questions: what are you isolating from, and what are you isolating with?

What are you isolating from?

  • Untrusted user-provided code → microVM + violation detection
  • Third-party packages with install scripts → defense-in-depth
  • Prompt injection via external content → network isolation + restricted filesystem read

What are you isolating with?

  • Production credentials in environment variables → inside the sandbox
  • Full filesystem read access → credential files are readable
  • STDOUT sent to LLM → anything printed is exfiltrable

A microVM with mounted AWS credentials isn’t meaningfully more secure than a container with the same credentials.
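One practical consequence: scrub the environment before anything crosses the sandbox boundary. A minimal sketch, with an invented allowlist of variable names, of passing only explicitly safe variables into the sandboxed process:

```python
# Pass only an explicit allowlist of environment variables into the
# sandboxed process; credentials stay outside regardless of isolation tier.
SAFE_VARS = {"PATH", "HOME", "LANG", "TERM"}

def sandbox_env(env: dict) -> dict:
    # Allowlist by name: anything not explicitly listed is dropped.
    return {k: v for k, v in env.items() if k in SAFE_VARS}

host_env = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "fake-for-demo"}
inner = sandbox_env(host_env)
assert "AWS_SECRET_ACCESS_KEY" not in inner
assert inner["PATH"] == "/usr/bin"
```

The filtered dict would then be handed to the launcher (e.g. `subprocess.run(cmd, env=inner)`), so the isolation tier never has to protect a secret that was simply never mounted.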

Questions to Ask

For platform builders:

  • What’s the blast radius if isolation fails—one customer’s data, or everyone’s?
  • Can tenants influence what gets mounted into their sandbox?
  • How do you detect and respond to sandbox escape attempts?

For developers using coding agents:

  • What credentials exist in your development environment?
  • Does the agent have full filesystem read access?
  • What’s mounted into the execution environment by default?

For security teams:

  • Does the sandbox restrict read access, or only write access?
  • What detection exists for sandbox boundary violations?
  • Who controls the sandbox configuration—vendor, platform team, or end user?

Design Principle

Sandbox selection is architecture, not procurement. The isolation tier matters less than the threat model it’s designed against and the trust boundaries it actually enforces.

Defense-in-depth provides stronger guarantees than any single isolation technology. Dual-isolation (filesystem + network) addresses distinct threat vectors. Violation detection enables response, not just prevention.

And the sandbox only contains what you put in it.

References

  1. Becker, Luca. “The State of Cursor, November 2025: When Sandboxing Leaks Your Secrets.” November 4, 2025. https://luca-becker.me/blog/cursor-sandboxing-leaks-secrets/
  2. Lisichkin, Dan. “The Agent Security Paradox: When Trusted Commands in Cursor Become Attack Vectors.” Pillar Security Blog. January 14, 2026. https://www.pillar.security/blog/the-agent-security-paradox-when-trusted-commands-in-cursor-become-attack-vectors
  3. Anthropic Engineering Blog. “Making Claude Code more secure and autonomous with sandboxing.” October 20, 2025. https://www.anthropic.com/engineering/claude-code-sandboxing
  4. Anthropic Experimental. “sandbox-runtime.” GitHub. https://github.com/anthropic-experimental/sandbox-runtime
  5. Tracebit. “Code Execution Through Deception: Gemini AI CLI Hijack.” July 28, 2025. https://tracebit.com/blog/code-exec-deception-gemini-ai-cli-hijack
