Executive Summary
We analyzed 14 sandbox solutions for AI coding agents across four isolation tiers-containers, user-space kernels, microVMs, and kernel-enforced capabilities. Every tier has a failure mode, and multiple shipping agents have already been exploited through gaps in their isolation. The right sandbox depends on your threat model—what you're isolating from and what credentials or access exist inside the sandbox. Isolation technology protects the host from the sandbox; it doesn't protect the sandbox from itself.
The Trust Problem
Local AI agents process content from sources that can’t be fully verified. Npm packages, model weights, code snippets from search results, user-provided repositories. When these agents execute code, they typically do so with user-level privileges: filesystem access, network capabilities, shell execution.
This creates an isolation challenge. The agent needs sufficient access to be useful (installing dependencies, reading project files, making API calls), but that same access becomes a liability when processing untrusted input. The question isn’t whether to grant access, but how to contain the consequences when something goes wrong.
The past six months suggest this isn’t theoretical anymore. Multiple coding agents have shipped vulnerabilities that exposed credentials, bypassed allowlists, and enabled data exfiltration- Cursor alone had multiple critical issues in three months. The pattern: isolation mechanisms that looked reasonable on paper failed against real-world attack scenarios.
How Leading Agents Handle Sandboxing
The approaches vary significantly, and so do the outcomes.
Claude Code implements dual-isolation using Seatbelt (macOS) and Bubblewrap (Linux), combining filesystem restrictions (CWD only) with network traffic routed through validating proxies.
Cursor has experienced multiple security issues: credential exposure via full filesystem read access (November 2025) and allowlist bypass via environment variable poisoning (CVE-2026-22708, January 2026). The allowlist bypass is instructive - static allowlists validate commands in isolation but ignore poisoned context, turning “safe” commands into attack vectors.
Gemini CLI offers flexible configuration with five predefined Seatbelt profiles, but suffered a code execution vulnerability via prompt injection (patched July 2025).7
Four Isolation Tiers
We analyzed 14 sandbox solutions designed for coding agents. The landscape divides into four tiers:
Container-based isolation shares a kernel between sandbox and host. Daytona (~90ms cold start) and agent-infra/sandbox use this approach. Fast, but a kernel exploit breaks the boundary.
User-space kernel (gVisor) intercepts syscalls before they reach the host kernel. Modal runs 20,000+ containers this way. The kernel attack surface shrinks, but gVisor’s correctness becomes security-critical.
Hardware-virtualized microVMs give each sandbox its own kernel. E2B boots Firecracker microVMs in ~150ms; Northflank offers Kata or gVisor; Docker Sandboxes (Desktop 4.58+) uses native virtualization. Strongest isolation, but with session limits and operational complexity.
Kernel-enforced capabilities use OS-level restrictions. nono applies Landlock/Seatbelt with no escape hatch. Bubblewrap provides namespace-based isolation. Local-only; multi-tenant platforms can’t rely on them.
Every Isolation Tier Has a Failure Mode
MicroVMs provide the strongest isolation but impose operational constraints—session limits, boot latency, memory overhead.
Containers aren’t security boundaries. Shared kernel means container escapes compromise the host.
Kernel-enforced capabilities have no escape hatch but only work locally. nono layers five restriction mechanisms (command blocklisting, syscall blocking, truncation blocking, filesystem sandboxing, network blocking). Defense-in-depth, but not deployable to multi-tenant platforms.
Static allowlists fail when context is poisoned. CVE-2026-22708 demonstrated this: shell built-ins modify environment variables without consent, turning subsequent “allowed” commands into attack vectors.
Full filesystem read access creates credential exposure even with write restrictions. Agents can cat credential files and leak them through STDOUT.
Sandboxing Is Containment, Not Prevention
The sandbox limits blast radius when trust is violated. It doesn’t address the underlying trust problem.
AI agents process content from sources that can’t be fully verified: model weights that execute code on load, package dependencies with install scripts, repositories provided by users. Each is an injection vector. The sandbox is the last layer of defense, not the first.
The sandbox also only isolates what you put in it. Mount SSH keys and they’re accessible inside. Grant full filesystem read and credentials become exfiltrable. The isolation technology protects the host from the sandbox; it doesn’t protect the sandbox from itself.
Sandbox Selection Is a Threat-Model Decision
The right isolation tier depends on two questions: what are you isolating from, and what are you isolating with?
What are you isolating from?
- Untrusted user-provided code → microVM + violation detection
- Third-party packages with install scripts → defense-in-depth
- Prompt injection via external content → network isolation + restricted filesystem read
What are you isolating with?
- Production credentials in environment variables → inside the sandbox
- Full filesystem read access → credential files are readable
- STDOUT sent to LLM → anything printed is exfiltrable
A microVM with mounted AWS credentials isn’t meaningfully more secure than a container with the same credentials.
Questions to Ask
For platform builders:
- What’s the blast radius if isolation fails—one customer’s data, or everyone’s?
- Can tenants influence what gets mounted into their sandbox?
- How do you detect and respond to sandbox escape attempts?
For developers using coding agents:
- What credentials exist in your development environment?
- Does the agent have full filesystem read access?
- What’s mounted into the execution environment by default?
For security teams:
- Does the sandbox restrict read access, or only write access?
- What detection exists for sandbox boundary violations?
- Who controls the sandbox configuration—vendor, platform team, or end user?
Design Principle
Sandbox selection is architecture, not procurement. The isolation tier matters less than the threat model it’s designed against and the trust boundaries it actually enforces.
Defense-in-depth provides stronger guarantees than any single isolation technology. Dual-isolation (filesystem + network) addresses distinct threat vectors. Violation detection enables response, not just prevention.
And the sandbox only contains what you put in it.
References
- Becker, Luca. “The State of Cursor, November 2025: When Sandboxing Leaks Your Secrets.” November 4, 2025. https://luca-becker.me/blog/cursor-sandboxing-leaks-secrets/
- Lisichkin, Dan. “The Agent Security Paradox: When Trusted Commands in Cursor Become Attack Vectors.” Pillar Security Blog. January 14, 2026. https://www.pillar.security/blog/the-agent-security-paradox-when-trusted-commands-in-cursor-become-attack-vectors
- Anthropic Engineering Blog. “Making Claude Code more secure and autonomous with sandboxing.” October 20, 2025. https://www.anthropic.com/engineering/claude-code-sandboxing
- Anthropic Experimental. “sandbox-runtime.” GitHub. https://github.com/anthropic-experimental/sandbox-runtime
- Tracebit. “Code Execution Through Deception: Gemini AI CLI Hijack.” July 28, 2025. https://tracebit.com/blog/code-exec-deception-gemini-ai-cli-hijack
Subscribe and get the latest security updates
Back to blog
.png)
.png)

%20(1).webp)
.png)
.png)
.webp)

.png)