Blog

RESEARCH

min read

LLM Backdoors at the Inference Level: The Threat of Poisoned Templates

Ariel Fogel

and

July 9, 2025

min read

Executive Summary

Pillar Security researchers have uncovered a dangerous new supply chain attack vector targeting the AI inference pipeline. This novel technique, termed "Poisoned GGUF Templates," allows attackers to embed malicious instructions that execute during model inference, compromising AI outputs. While developers and AI security vendors focus on validating user inputs and filtering model outputs, our research reveals the critical blind spot between them: the chat template layer.

The attack exploits the GGUF (GPT-Generated Unified Format) model distribution standard to manipulate AI responses during normal conversations. By embedding persistent, malicious instructions directly within these chat templates, attackers can bypass all existing security controls. This targets a massive ecosystem, with hundreds of thousands of GGUF files currently distributed across platforms like Hugging Face. Attackers can smuggle malicious instructions into model components and even manipulate repositories to display clean templates online, while the actual downloaded file contains the poisoned version.

This vector achieves a persistent compromise that affects every user interaction while remaining completely invisible to users and security systems. Because the attack is positioned between input validation and model output, it bypasses most existing AI guardrails, system prompts, and runtime monitoring. This attack remains undetected by current security scanners focused on infrastructure threats, creating a previously unknown supply chain compromise that fundamentally undermines user trust in AI-generated content.

‍

This new attack vector, alongside our earlier Rules File Backdoor discovery, underscores the growing accountability gap in the AI ecosystem.

‍

1. The GGUF Format: AI's Executable File

To understand this vulnerability, we first need to understand GGUF (GPT-Generated Unified Format) - the file format that has quietly become the backbone of local AI deployment. Think of GGUF as the ".exe" of the AI world: a single file containing everything needed to run a language model locally.

GGUF files aren't just data containers. They include:

Model weights that have been quantized (compressed) for efficient inference
Metadata about model capabilities and requirements
Chat templates, or predefined conversational framework that define the roles, context, and response structure to guide an LLM toward consistent, focused, and non-redundant outputs (our attack vector)

What makes GGUF particularly dangerous from a security perspective is that these templates aren't just configuration - they contain active logic processed in every user interaction. Unlike traditional model formats that separate weights from runtime behavior, GGUF bundles everything together, creating a single point of failure that attackers can exploit. This architectural decision, while convenient, means that a compromised GGUF file can silently manipulate AI behavior without touching the model weights themselves.

Why Developers Choose GGUF

The explosive growth of GGUF adoption isn't accidental - it solves real problems that developers face when deploying AI:

Efficiency That Matters: GGUF standardizes the distribution of quantized models, which can reduce model sizes by 50-75% while maintaining surprisingly good performance. This means AI that once required data center-grade hardware can now run on regular business computers, dramatically reducing costs and deployment complexity.
Portability Without Complexity: One file contains everything needed for inference, with no framework dependencies, version conflicts, or complex deployment pipelines typical of traditional ML frameworks. This simplicity has made GGUF increasingly popular for LLM deployment and rapid prototyping.

The GGUF Supply Chain: The Attack Distribution Network

Understanding how GGUF models propagate through the ecosystem is crucial to grasping the scale of this threat:

Primary Distribution Channels:

HuggingFace: Hosting hundreds of thousands of GGUF files
Ollama Registry: A curated but still community-driven repository
Private Registries: Internal model repositories that often ingest models originally published on public hubs (e.g., a model pulled from HuggingFace and re-uploaded in-house).
- Key attack surfaces include (1) supply-chain exposure from those imported models and (2) insider threats - malicious or careless employees who can upload or tamper with models once they’re inside the private store.

The Trust Problem‍

Unlike traditional software with established signing and verification processes, the GGUF ecosystem operates on community trust. A model with thousands of downloads is assumed safe simply because others are using it. There's no sufficient security scanning, and minimal documentation of modifications.

For example, the screenshot below shows the existing security scans of a malicious GGUF file on Hugging Face. The existing scanners, like those from ProtectAI and JFrog only look for malicious code embedded within the models, not for AI security vulnerabilities.

‍

‍

Scale That Amplifies Risk:

Popular models can reach hundreds of thousands to millions of downloads over months
Widely-used models can potentially affect hundreds of enterprise deployments across major companies
Models are extensively quantized and redistributed by community members, making provenance tracking extremely difficult

This combination of widespread adoption, minimal security controls, and trust-based distribution creates the perfect conditions for this kind of supply chain attack vector.

‍

2. The Attack Surface

Our research reveals a novel attack vector in GGUF models that fundamentally changes how we think about AI security threats.

Inference-Time Template Injection: A New Paradigm

The attack exploits a critical oversight in how GGUF templates process user interactions. Every prompt sent to an AI model passes through a template layer that formats it for the model to understand. We discovered that attackers can embed malicious instructions within these templates.

This creates an entirely new threat model:

Dormant Activation: The malicious instructions lie hidden until specific trigger phrases appear in user prompts
Universal Compatibility: Works across all template engines—Jinja, JSON, or custom formats—because it manipulates the actual template content that gets processed with user prompts
Invisible to Current Defenses: AI security tools monitor the input layers between the application and the inference endpoint. Communication between the inference endpoint and the actual model remains unmonitored, completely missing the template layer where our attack operates
Massive Scale: One poisoned model can compromise millions of user interactions without any visible signs of infection

The model appears to function normally, passing all security checks, while manipulating only specific responses based on attackers’ goals. Every chatbot conversation, every code generation request, and every AI-assisted analysis becomes a potential backdoor via this attack technique.

In 2024 Patrick Peng discovered how malicious Jinja templates can execute arbitrary code during model loading. His research identified risks during model initialization that compromise the systems running AI models. Our discovery targets the same layer (chat templates), but during inference time.

‍

3. Technical Deep Dive: How the Attack Works

The Anatomy of a Poisoned Inference Endpoint

Our proof of concept demonstrates how a seemingly legitimate model can harbor malicious code within its chat template. While the model weights remain untouched and the HuggingFace repository shows clean code, the actual GGUF file contains a modified template that injects malicious content during specific interactions.

Attack Demonstration: Phi-4 Poisoning

We demonstrated this attack using Microsoft's Phi-4-mini model, chosen because it requires custom templates for advanced features like function calling. Here's what makes this attack particularly insidious:

What Users See:

HuggingFace repository with complete documentation
Normal model card and readme files
A default chat template that appears legitimate
Models that have been scanned and marked as safe
Expected behavior for most queries (jokes, general questions)

What's Happening:

The malicious script injection only activates for HTML-related requests, making detection nearly impossible through casual testing.

The Template Modification Process

The attack exploits a critical gap in the GGUF distribution pipeline. The attackers can download a model with a standard template, modify the template and repackage the model into a new GGUF file. Then they upload to Hugging Face and wait for the malicious instructions to get activated.

Crafting the payload is as simple as editing a word document. To increase the stealth of this supply chain attack, the attackers can even upload a clean version of the quantized model to Hugging Face and trick the platform into displaying the clean version of the chat template as the default.

‍

4.Why This Attack Is So Effective

Trust Through Obscurity.
Users verify models by checking HuggingFace model cards, but the actual GGUF file's templates are not deeply inspected. The disconnect between what's displayed in the default chat template and what's in the downloaded file creates a false sense of security.
Conditional Activation
The attack remains dormant for most queries. Ask for a joke, a recipe, or general advice - everything works normally. Only specific triggers (HTML generation, login pages, financial queries, etc.) activate the payload.
Legitimate Use Cases as Cover
Models like Phi-4 require custom templates for advanced features. Users expect and accept non-standard templates, making malicious modifications blend in with legitimate functionality.
Supply Chain Multiplication
One poisoned model can affect thousands of applications:
- Developer downloads poisoned model
- Creates app that responds to an attacker’s triggers
- Once triggered, every user received a tainted response
Local LLM Clients Automatically Load Templates
Local LLM clients like LM Studio compound the security risk by automatically loading and trusting chat templates without user awareness or explicit consent. LM Studio's model discovery interface doesn't display chat templates at all, creating another layer of blindness that attackers can exploit. When users download a model through LM Studio, the application automatically reads and prepares any embedded templates for execution—including malicious ones—without warnings or user intervention. This automatic trust model means that once a compromised GGUF file is loaded, the malicious template becomes immediately active.
Hugging Face UI Blindspot
As part of our research, we discovered that attackers can exploit a critical assumption built into hubs like Hugging Face: that chat templates will be identical across all GGUF files in a repository. The platform's UI only displays the template from the first file, creating a dangerous blind spot. Attackers can place a clean template in the first GGUF file (visible during review) while hiding malicious payloads in subsequent quantized versions like Q4_K_M.gguf. This means security-conscious users can thoroughly review what appears to be a safe template on the web interface, yet unknowingly download a compromised model. The disconnect between the displayed template and actual downloaded files creates perfect cover for supply chain attacks, keeping malicious modifications completely invisible to standard review processes.

The default chat template for the model card on Hugging Face shows a benign template (top). In order to identify the poisoned template, you have to load the individual GGUF model headers for each listed model and manually inspect the specific chat template (bottom).

5. Scope

As of today, hundreds of thousands of GGUF files are distributed across platforms like Hugging Face, representing a massive attack surface for this novel supply chain threat. The vulnerability landscape is particularly concerning because of how chat templates are handled across different model categories.

Standard Models: Low Risk Profile
Many foundational models can use simple chat templates that are automatically detected from their metadata or provided as defaults by inference frameworks such as llama.cpp, Ollama, vLLM, or SGLang. These models present a lower attack surface because users typically rely on the framework's built-in templates rather than custom ones embedded in GGUF files.

Advanced Capability Models: Higher-Risk Targets
‍Models with specialized capabilities - including tool calling, reasoning, image processing, or audio processing - represent the highest-risk attack vectors. These models often require custom chat template formats to function properly, creating the perfect cover for poisoned templates. For instance, a recent model from Microsoft's Phi family, Phi-4-mini, supports function calling capabilities through specific template formatting, making template customization not just acceptable but necessary in order to actually enable tool calling.

6. Security Implications

This vulnerability exposes a critical gap, as many organizations build their entire AI security posture around prompt engineering and input/output guardrails, incorrectly assuming what is sent to the model is what is processed by the model.

Even sophisticated AI security platforms are blind to this vector because they only monitor prompts and outputs at the API level, without inspecting the GGUF file's internal structure. Without comprehensive template auditing and model provenance, these security layers become ineffective. Organizations must recognize that model files are not passive data containers but active code requiring the same scrutiny as any other part of the software supply chain.

7. Detection and Mitigation Strategies

This vulnerability exposes critical gaps in standard defense-in-depth architectures, but there are practical steps that can be taken at every level to mitigate the risk. Defending against poisoned GGUF templates requires a combination of immediate actions, architectural changes, and a coordinated industry response.

Immediate Actions for Practitioners

The primary defense against this attack vector is the direct inspection of GGUF files to identify chat templates containing uncommon or non-standard instructions. Security teams should immediately:

Audit GGUF Files: Deploy practical inspection techniques to examine GGUF files for suspicious template patterns. Look for unexpected conditional logic (if/else statements), hidden instructions, or other manipulations that deviate from standard chat formats.‍
Move Beyond Prompt-Based Controls: This attack fundamentally challenges current AI security assumptions. Organizations must evolve beyond a reliance on system prompts and input/output filtering toward comprehensive template and processing pipeline security.‍
Implement Provenance and Signing: A critical long-term strategy is to establish model provenance. This includes implementing cryptographic signing for model releases and developing template allowlisting systems to ensure only verified templates are used in production.

8. Responsible Disclosure

Hugging Face

June 6, 2025: Initial responsible disclosure to Hugging Face
June 11, 2025: Hugging Face replied that they are investigating the issue and asked for more information about the vulnerability
June 11, 2025: Pillar provided additional information about the vulnerability and clarified how the Hugging Face UI helps enable the supply chain vulnerability
June 13, 2025: Hugging Face replied and determined that this did not classify as a vulnerability, but rather how the UI displays chat templates. Hugging Face said that though they would not be making any immediate changes to the design, they will take this into account for future iterations

LM Studio

June 20, 2025: Initial responsible disclosure to LM Studio
June 20, 2025: LM Studio replied and determined that users are responsible for reviewing and downloading trusted models from Hugging Face

The responses above, which place these new kinds of attacks outside the AI platform vendors' responsibility, underscore the importance of public awareness regarding the security implications of AI model distribution and deployment. This research reveals an expanded attack surface that can affect the AI ecosystem, especially as given the growing reliance on AI-generated outputs across industries and the widespread adoption of GGUF models in production environments, the lack of vendor accountability for template security creates a significant blind spot that organizations must address through their own security practices.

Conclusion

The discovery of poisoned GGUF templates reveals a critical vulnerability in the AI supply chain. This research demonstrates that GGUF files are not just data containers; they can be weaponized to bypass an organization's most trusted security controls, including system prompts and input filters.

Both the loading-time and the newly discovered inference-time attack vectors show that the current trust-based model for sharing GGUF files is insufficient. Attackers can manipulate templates to compromise either the model operator during loading or, more insidiously, every end-user during inference by making the model itself a malicious actor.

Addressing this threat requires a coordinated industry response analogous to traditional software supply chain security. As AI models become more deeply integrated into our digital infrastructure, practices like model provenance verification and comprehensive template auditing must become standard, non-negotiable components of any secure AI deployment.

The Pillar platform discovers and flags malicious GGUF files and other types of risks in the template layer.

‍