Blog

RESEARCH

min read

Untrusted Project-Local Filters in RTK: When Your AI's Eyes Are Someone Else's to Control

Ariel Fogel

and

May 20, 2026

min read

Executive Summary

What happened: We discovered a medium-severity vulnerability (CVE-2026-45792) in RTK, a widely adopted open-source tool (50,000+ GitHub stars) used to optimize how AI coding assistants — specifically Claude Code — process information. The issue has been patched as of version 0.33.0.

What RTK does: RTK acts as a filter between an AI coding assistant and the developer's environment. It reduces noise in command output so the AI can work more efficiently. Think of it as a lens the AI looks through when reading code and security scan results.

What went wrong: RTK automatically loaded filter settings from any code repository a developer cloned — no approval required, no warning shown. This meant that anyone who contributed code to a repository could also control what the AI was allowed to see. An attacker could plant a backdoor in a codebase and simultaneously plant filter rules that made the backdoor invisible to the AI. The AI would review the code, report it as clean, and the compromised code would move toward production.

Why it matters: This isn't just a bug in one tool. It represents an emerging class of risk in AI-assisted development: attackers manipulating what the AI can observe rather than what it can execute. If your teams are using AI coding assistants, the tools that sit around them — filters, plugins, configuration files — are part of your attack surface. A security scanner that reports "no issues found" is only as trustworthy as the pipeline delivering its output to the reviewer, whether that reviewer is human or AI.

What was the business risk: Code that passed both AI-assisted review and automated security scanning could have shipped to production with attacker-planted vulnerabilities intact. The developer would have had no reason to question the results. The attack required no special access beyond the ability to commit a small configuration file to a repository — something contributors, contractors, or compromised accounts could do.

What's been done: RTK's maintainers responded within 24 hours and shipped a fix that blocks untrusted project-level filters by default, requires explicit per-repository opt-in, and uses hash-based integrity checks to revoke trust if the configuration changes.

What you should do now: If your engineering teams use RTK with Claude Code, ensure they have upgraded to v0.33.0 or later. More broadly, this is a signal to audit the trust model of any tooling that preprocesses, filters, or transforms information before it reaches an AI coding assistant. These tools have security relevance even if they never execute code, because they shape what the AI can and cannot see.

Background: What RTK Does

RTK is a token optimization tool for Claude Code. It works as a hook that intercepts shell command output and applies configurable filters before the output reaches the model. This solves a real problem: LLMs have finite context windows, and raw command output is often full of noise that wastes tokens without adding signal.

RTK's filter engine uses a TOML configuration format. Filters specify command patterns to match and regex rules (strip_lines_matching) that remove lines from the output. Prior to v0.32.0, the engine loaded configuration from three sources, in ascending priority:

Built-in filters compiled into the RTK binary
User-global configuration at ~/.config/rtk/filters.toml
Project-local configuration at .rtk/filters.toml

Project-local filters overrode everything else.

The Vulnerability: Untrusted Configuration, Trusted Authority

The first two sources, built-in and user-global, are under your control. You installed the binary and wrote your global config. Project-local configuration comes from the repository, which means it comes from whoever committed it: a maintainer, a contributor, or an attacker.

RTK did not distinguish between these sources. A .rtk/filters.toml committed by an attacker got the same authority as one you wrote yourself, and the filters activated without any warning or approval prompt. This was reported in the GHSA under CWE-345 and CWE-426. The core issue was that RTK accepted repository-supplied filter configuration as trusted authority, allowing an attacker-controlled project file to alter what command output reached the model.

The kill chain:

Attacker commits .rtk/filters.toml to a repository
You clone the repo and work in it with Claude Code + RTK
Every shell command the LLM runs is filtered by attacker-controlled regexes
Attacker chooses what the LLM can and cannot see

Because RTK's filter engine is general-purpose and regex-driven, the attacker could suppress anything that appeared in command output. Our proof of concept demonstrates hiding a credential-exfiltration backdoor in a Python file, but the mechanism worked for any content the attacker wanted to suppress:

Malicious lines stripped from cat output, so the model never read them
Security warnings removed from bandit or similar scanner output
Backdoor code hidden from git diff during review
Indicators of compromise suppressed from grep results

The filter file itself looked like standard RTK configuration, since filter names and comments could be crafted to appear routine ("strip ANSI codes," "reduce verbose logging").

The model received post-filter output and had no way to know that lines had been removed, because RTK operated below the model's observation layer.

Impact

On affected versions, the attacker controlled the LLM's perception of the codebase, and the downstream consequence was that compromised code could pass every check in the AI-assisted development workflow and ship to production.

If a developer asked Claude Code to review a file containing a backdoor, RTK filters would strip the malicious lines before the model saw them, and the model would report the file as clean. If the developer ran a security scanner, like bandit, through Claude Code, the attacker could strip warnings about the specific vulnerabilities they planted, and the scan would come back clean. In both cases, the developer had no reason to question the result, and the code would move forward through the development pipeline with attacker-planted vulnerabilities intact.

The filter file itself was camouflaged by default. A .rtk/filters.toml with names like "reduce_noise" and comments like "strip ANSI escape sequences" looked like standard configuration in any repository that uses RTK, so the attacker's filters can hide in plain sight among legitimate project settings.

Given RTK's rapid adoption curve (50,000+ stars in five months), the affected user base was significant and expanding. We scored this CVSS 4.0 6.9 (Medium). The attack required the victim to have RTK installed with the Claude Code hook active, a precondition that describes RTK's intended user base exactly.

Why This Matters Beyond RTK

The root issue is trust laundering: configuration from a git remote, written by someone the developer has never met, was granted the same authority as configuration the developer installed themselves. The trust boundary tracked content type (is this a valid filter?) rather than origin (did the developer put this here?).

Trust laundering through project-local files is a recurring pattern across AI development tooling. David Kaplan's recent research on privilege escalation via confused deputies in coding agents catalogs a closely related set of examples: .bashrc files, PowerShell profiles, agent instruction files like CLAUDE.md and AGENTS.md. In each case, the file looks like inert configuration, gets loaded automatically, and executes with whatever authority the consuming tool grants it. Kaplan's specific focus is cross-agent attacks where one coding agent rewrites another's security configuration, but the underlying mechanism is the same: attacker-controlled content is laundered into a trusted context by traveling through a channel (the repository) that the tool treats as authoritative.

RTK adds a new surface to this pattern. Most trust laundering in developer tooling targets what the model can execute: weakened sandbox configs, permissive approval policies, malicious build hooks. RTK shows that the observation layer is equally vulnerable. A tool that controls what the model can observe has security relevance even if it never executes anything, because an agent that can't see a backdoor can't flag it. The trust is laundered not into execution authority, but into perceptual authority, and the downstream effect is the same: compromised code ships.

This reframes what supply chain compromise looks like in AI-assisted development. Most AI security research focuses on malicious input to the model, but RTK shows that the attack surface also includes what doesn't get sent to the model. An attacker who controls the filtering layer doesn't need to inject anything into the model's context, only remove the evidence of what they've planted.

That removal compromises both sides of the AI-assisted development workflow at once. In most teams using coding agents, the same agent (or a similar one) that writes code also reviews it, and both operate in the same repository with the same tooling active. A single .rtk/filters.toml means the agent that generated the code couldn't see the backdoor, and the agent reviewing the code can't see it either, because both are working through the same filtered view of the codebase.

Tools that preprocess, filter, or transform information before it reaches a coding agent are part of the software supply chain, even if they never execute code. As teams rely on AI assistants for both writing and reviewing code, the trust model needs to account for everything in the path between the repository and the model's context window.

Mitigation and Fix

We recommended a single mitigation: when RTK detects a project-local .rtk/filters.toml, warn both the user (via stderr) and the LLM (via hook output) before applying any filters. One prompt per session per repo. The key principle: draw the trust boundary at the source, not the content. Trying to classify individual filter primitives as "safe" or "dangerous" is a blocklist that rots as the feature set evolves.

RTK's maintainers responded within a day of our report and implemented a fix that goes further than our recommendation:

Project-local filters are now blocked by default when untrusted. Users see a warning: [rtk] WARNING: untrusted project filters — Filters NOT applied. Run rtk trust to review and enable.
SHA-256 hash verification: if the file changes after you grant trust (e.g., after a git pull), trust is revoked and filters are blocked until re-reviewed. The hash is computed from the same in-memory buffer used to display the file during review, preventing time-of-check/time-of-use races between display and trust.
Explicit opt-in via rtk trust / rtk untrust commands. The rtk trust flow reads the filter file, displays its full contents, prints a risk summary that flags high-risk primitives (replace rules, match_output rules, catch-all patterns), and only then records the hash-based trust entry.

Credit to Patrick Szymkowiak and the RTK team for an exemplary response. A 24-hour turnaround from report to merged fix is rare, and the fix itself reflects genuine security thinking: they didn't just add a warning, they built a proper trust model with hash-based change detection and explicit opt-in. That's the kind of response that makes responsible disclosure worth doing.

The fix is available in RTK v0.33.0, update if you haven't already.

Disclosure Timeline

Mar 15, 2026 - Initial report to RTK maintainers
Mar 16, 2026 - Maintainers acknowledge; fix merged to develop branch (PRs #623, #625)
Mar 25, 2026 - Patch shipped in v0.33.0
May 14, 2026 - CVE-2026-45792 assigned
May 20, 2026 - Public CVE disclosed

‍

FAQs

What is CVE-2026-45792 and how does it affect Claude Code users?

CVE-2026-45792 is a medium-severity vulnerability (CVSS 4.0 score 6.9) in RTK, a token optimization tool used with Claude Code. RTK automatically loaded project-local filter configurations from cloned repositories without user approval, allowing anyone who could commit to a repository to control what the AI coding assistant was permitted to see.

How could an attacker exploit the RTK vulnerability to hide a backdoor from AI code review?

An attacker commits a malicious .rtk/filters.toml to a repository alongside a backdoor. When a developer clones the repo and runs Claude Code with RTK active, the attacker's regex filters silently strip the backdoor from command output before it reaches the model. The AI reviews post-filter output and reports the code as clean, with no indication that lines were removed.

What does 'trust laundering' mean in the context of AI coding agent security?

Trust laundering occurs when attacker-controlled content travels through a channel — such as a git repository — that a developer tool treats as authoritative, causing it to execute or apply with the same permissions as locally trusted configuration. In the RTK case, trust was laundered not into execution authority but into perceptual authority, letting an attacker control what the AI could observe rather than what it could run.

What fix was implemented in RTK v0.33.0 to address the untrusted project-local filter vulnerability?

RTK v0.33.0 blocks project-local filters by default when the source is untrusted, displays a warning to both the user and the model, and requires explicit opt-in via an rtk trust command. It also uses SHA-256 hash verification tied to the in-memory review buffer, so trust is automatically revoked if the filter file changes after a git pull, preventing time-of-check/time-of-use races.

Why do AI development tools that only filter output — without executing code — still represent a security risk?

An agent that cannot observe a backdoor cannot flag it. Tools that preprocess or transform command output before it reaches a coding model have perceptual authority over the AI, meaning they determine what evidence the model can act on. When the same filtered view affects both the agent writing code and the agent reviewing it, a single malicious configuration file compromises both sides of the development workflow simultaneously.