Your AI Agent Will Run Untrusted Code. Now What?

By Eilon Cohen and Ariel Fogel

February 25, 2026

Executive Summary

We analyzed 14 sandbox solutions for AI coding agents across four isolation tiers: containers, user-space kernels, microVMs, and kernel-enforced capabilities. Every tier has a failure mode, and multiple shipping agents have already been exploited through gaps in their isolation. The right sandbox depends on your threat model: what you're isolating from, and what credentials or access exist inside the sandbox. Isolation technology protects the host from the sandbox; it doesn't protect the sandbox from itself.

The Trust Problem

Local AI agents process content from sources that can’t be fully verified. Npm packages, model weights, code snippets from search results, user-provided repositories. When these agents execute code, they typically do so with user-level privileges: filesystem access, network capabilities, shell execution.

This creates an isolation challenge. The agent needs sufficient access to be useful (installing dependencies, reading project files, making API calls), but that same access becomes a liability when processing untrusted input. The question isn’t whether to grant access, but how to contain the consequences when something goes wrong.

The past six months suggest this isn’t theoretical anymore. Multiple coding agents have shipped vulnerabilities that exposed credentials, bypassed allowlists, and enabled data exfiltration; Cursor alone had multiple critical issues in three months. The pattern: isolation mechanisms that looked reasonable on paper failed against real-world attack scenarios.

How Leading Agents Handle Sandboxing

The approaches vary significantly, and so do the outcomes.

Claude Code implements dual-isolation using Seatbelt (macOS) and Bubblewrap (Linux), combining filesystem restrictions (CWD only) with network traffic routed through validating proxies.

Cursor has experienced multiple security issues: credential exposure via full filesystem read access (November 2025) and allowlist bypass via environment variable poisoning (CVE-2026-22708, January 2026). The allowlist bypass is instructive: static allowlists validate commands in isolation but ignore poisoned context, turning “safe” commands into attack vectors.
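The failure mode behind this class of bypass can be shown in a few lines. The sketch below is illustrative, not taken from Cursor's implementation: an allowlist approves the string "ls", but once an earlier step has prepended an attacker-controlled directory to PATH, that same "allowed" command resolves to attacker code.

```python
import os
import shutil
import tempfile

# A static allowlist validates the command string in isolation...
ALLOWLIST = {"ls", "cat", "git"}
assert "ls" in ALLOWLIST  # "ls" looks safe on its own

# ...but if a prior step prepended an attacker directory to PATH,
# "ls" no longer resolves to the system binary.
evil_dir = tempfile.mkdtemp()
evil_ls = os.path.join(evil_dir, "ls")
with open(evil_ls, "w") as f:
    f.write("#!/bin/sh\necho pwned\n")
os.chmod(evil_ls, 0o755)

poisoned_path = evil_dir + os.pathsep + os.environ.get("PATH", "")
resolved = shutil.which("ls", path=poisoned_path)
assert resolved == evil_ls  # the "allowed" command now runs attacker code
```

The allowlist never saw anything unsafe; the context it ignored is what changed the meaning of the command.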

Gemini CLI offers flexible configuration with five predefined Seatbelt profiles, but suffered a code execution vulnerability via prompt injection (patched July 2025).

| Agent | Filesystem Read | Filesystem Write | Network | Violation Detection | Known Vulnerabilities |
| --- | --- | --- | --- | --- | --- |
| Claude Code | CWD only | CWD only | Proxy-based | Yes | None reported |
| Cursor | Full system | Workspace only | Blocked by default | No | CVE-2026-22708, credential exposure |
| Gemini CLI | Profile-dependent | Profile-dependent | Profile-dependent | No | Prompt-injection RCE, fixed Jul 2025 |

Four Isolation Tiers

We analyzed 14 sandbox solutions designed for coding agents. The landscape divides into four tiers:

Container-based isolation shares a kernel between sandbox and host. Daytona (~90ms cold start) and agent-infra/sandbox use this approach. Fast, but a kernel exploit breaks the boundary.

User-space kernel (gVisor) intercepts syscalls before they reach the host kernel. Modal runs 20,000+ containers this way. The kernel attack surface shrinks, but gVisor’s correctness becomes security-critical.

Hardware-virtualized microVMs give each sandbox its own kernel. E2B boots Firecracker microVMs in ~150ms; Northflank offers Kata or gVisor; Docker Sandboxes (Desktop 4.58+) uses native virtualization. Strongest isolation, but with session limits and operational complexity.

Kernel-enforced capabilities use OS-level restrictions. nono applies Landlock/Seatbelt with no escape hatch. Bubblewrap provides namespace-based isolation. Local-only; multi-tenant platforms can’t rely on them.

| Solution | Isolation Tech | Network Control | Max Session | Open Source |
| --- | --- | --- | --- | --- |
| Northflank | Kata + gVisor | Egress policies | Unlimited | No |
| E2B | Firecracker microVM | Limited | 24 hours | Core only |
| Modal | gVisor | Tunneling | Configurable | No |
| Docker Sandboxes | microVM (native) | HTTP/HTTPS proxy | Persistent | No |
| Agent Sandbox (K8s) | gVisor/Kata (pluggable) | K8s policies | Ephemeral | Yes |
| nono | Landlock/Seatbelt | Full block | Process lifetime | Yes |
| Bubblewrap | Linux namespaces | Namespace unshare | Process lifetime | Yes |
| ERA | krunvm microVM | Configurable | Configurable | Yes |

Every Isolation Tier Has a Failure Mode

MicroVMs provide the strongest isolation but impose operational constraints—session limits, boot latency, memory overhead.

Containers aren’t security boundaries. Shared kernel means container escapes compromise the host.

Kernel-enforced capabilities have no escape hatch but only work locally. nono layers five restriction mechanisms (command blocklisting, syscall blocking, truncation blocking, filesystem sandboxing, network blocking). Defense-in-depth, but not deployable to multi-tenant platforms.

Static allowlists fail when context is poisoned. CVE-2026-22708 demonstrated this: shell built-ins modify environment variables without consent, turning subsequent “allowed” commands into attack vectors.

Full filesystem read access creates credential exposure even with write restrictions. Agents can cat credential files and leak them through STDOUT.
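A read restriction is only as good as its path handling. Here is a minimal sketch (workspace path invented) of the check a sandbox needs: compare *resolved* paths, because a naive prefix check lets `..` traversal and symlinks walk out of the workspace to credential files.

```python
from pathlib import Path

# Hypothetical workspace root mounted into the sandbox.
WORKSPACE = Path("/workspace/project")

def read_allowed(requested: str) -> bool:
    # Resolve ".." components and symlinks before comparing;
    # a string-prefix check on the raw path is bypassable.
    target = Path(requested).resolve()
    root = WORKSPACE.resolve()
    return target == root or root in target.parents

assert read_allowed("/workspace/project/src/main.py")
# Starts with the workspace prefix, but resolves outside it:
assert not read_allowed("/workspace/project/../../home/user/.aws/credentials")
```

Even with a correct check, anything the policy does allow the agent to read can still leave via STDOUT, which is why read scope matters as much as write scope.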

Sandboxing Is Containment, Not Prevention

The sandbox limits blast radius when trust is violated. It doesn’t address the underlying trust problem.

AI agents process content from sources that can’t be fully verified: model weights that execute code on load, package dependencies with install scripts, repositories provided by users. Each is an injection vector. The sandbox is the last layer of defense, not the first.
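The "model weights that execute code on load" vector is concrete, not hypothetical: many weight formats are pickle-based, and pickle's `__reduce__` hook lets a serialized object name any callable to invoke during deserialization. A minimal demonstration (the payload here is deliberately harmless):

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to "reconstruct" this object:
    # it returns a callable and arguments to invoke at load time.
    def __reduce__(self):
        return (eval, ("1 + 1",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # loading the blob executes eval("1 + 1")
assert result == 2  # arbitrary code ran during deserialization
```

Loading untrusted weights is executing untrusted code, which is exactly why the sandbox must be assumed to be the last layer, not the first.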

The sandbox also only isolates what you put in it. Mount SSH keys and they’re accessible inside. Grant full filesystem read and credentials become exfiltrable. The isolation technology protects the host from the sandbox; it doesn’t protect the sandbox from itself.

Sandbox Selection Is a Threat-Model Decision

The right isolation tier depends on two questions: what are you isolating from, and what are you isolating with?

What are you isolating from?

  • Untrusted user-provided code → microVM + violation detection
  • Third-party packages with install scripts → defense-in-depth
  • Prompt injection via external content → network isolation + restricted filesystem read

What are you isolating with?

  • Production credentials in environment variables → inside the sandbox
  • Full filesystem read access → credential files are readable
  • STDOUT sent to LLM → anything printed is exfiltrable

A microVM with mounted AWS credentials isn’t meaningfully more secure than a container with the same credentials.
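One practical consequence: scrub the environment before anything crosses the sandbox boundary. A minimal sketch, with an invented allowlist of variable names, of passing only explicitly safe variables into the sandboxed process:

```python
# Pass only an explicit allowlist of environment variables into the
# sandboxed process; credentials stay outside regardless of isolation tier.
SAFE_VARS = {"PATH", "HOME", "LANG", "TERM"}

def sandbox_env(env: dict) -> dict:
    # Allowlist by name: anything not explicitly listed is dropped.
    return {k: v for k, v in env.items() if k in SAFE_VARS}

host_env = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "fake-for-demo"}
inner = sandbox_env(host_env)
assert "AWS_SECRET_ACCESS_KEY" not in inner
assert inner["PATH"] == "/usr/bin"
```

The filtered dict would then be handed to the launcher (e.g. `subprocess.run(cmd, env=inner)`), so the isolation tier never has to protect a secret that was simply never mounted.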

Questions to Ask

For platform builders:

  • What’s the blast radius if isolation fails—one customer’s data, or everyone’s?
  • Can tenants influence what gets mounted into their sandbox?
  • How do you detect and respond to sandbox escape attempts?

For developers using coding agents:

  • What credentials exist in your development environment?
  • Does the agent have full filesystem read access?
  • What’s mounted into the execution environment by default?

For security teams:

  • Does the sandbox restrict read access, or only write access?
  • What detection exists for sandbox boundary violations?
  • Who controls the sandbox configuration—vendor, platform team, or end user?

Design Principle

Sandbox selection is architecture, not procurement. The isolation tier matters less than the threat model it’s designed against and the trust boundaries it actually enforces.

Defense-in-depth provides stronger guarantees than any single isolation technology. Dual-isolation (filesystem + network) addresses distinct threat vectors. Violation detection enables response, not just prevention.

And the sandbox only contains what you put in it.

References

  1. Becker, Luca. “The State of Cursor, November 2025: When Sandboxing Leaks Your Secrets.” November 4, 2025. https://luca-becker.me/blog/cursor-sandboxing-leaks-secrets/
  2. Lisichkin, Dan. “The Agent Security Paradox: When Trusted Commands in Cursor Become Attack Vectors.” Pillar Security Blog. January 14, 2026. https://www.pillar.security/blog/the-agent-security-paradox-when-trusted-commands-in-cursor-become-attack-vectors
  3. Anthropic Engineering Blog. “Making Claude Code more secure and autonomous with sandboxing.” October 20, 2025. https://www.anthropic.com/engineering/claude-code-sandboxing
  4. Anthropic Experimental. “sandbox-runtime.” GitHub. https://github.com/anthropic-experimental/sandbox-runtime
  5. Tracebit. “Code Execution Through Deception: Gemini AI CLI Hijack.” July 28, 2025. https://tracebit.com/blog/code-exec-deception-gemini-ai-cli-hijack
