Safeguarding the Future: Lessons Learned from Securing over 1,000 GenAI Apps

Dor Sarig

and

May 6, 2024

min read

Introduction: The AI Revolution

Artificial Intelligence (AI) is transforming industries at an unprecedented pace. As the fastest-adopted technology in history, AI is revolutionizing the way businesses operate, and we're only at the beginning of this transformative journey. At the heart of this revolution lies the ability to replace traditional human-driven services with AI-powered software solutions.

AI: A New Species in Your Organization

AI software can be seen as an extension or a new kind of species within your organization. Just like any employee, AI can create, reason, and interact. However, it can also be attacked and manipulated, making monitoring and protection crucial. Imagine leaving your employees' computers without proper endpoint protection or your corporate network without a firewall. The same principle applies to AI software. Leaving your AI applications unsecured can lead to harmful consequences, affecting your users, data, and app integrity.

Lessons Learned from Protecting over 1,000 GenAI Apps

Over the past year, we at Pillar have been working tirelessly to build the most advanced security stack for teams building AI apps. We've gained invaluable insights from monitoring and safeguarding over 1,000 AI-powered applications in runtime. Here's what we've learned:

Language-Agnostic Attacks: Attacks can happen in every language, as long as the Large Language Model (LLM) was trained on different languages and can understand them.
Vulnerabilities at Every Stage: Attacks can occur at every stage where there's interaction with LLMs, including inputs, instructions, tool outputs, and model outputs.
Persistent Jailbreak Attempts: Attackers will attempt to jailbreak your app dozens of times due to the ease of use. Some even employ specialized tools to generate a large volume of attack variations.
Consequences of Successful Jailbreaks: Besides data leakage and manipulation, attackers who successfully jailbreak an app can abuse it for their own needs. Bypassing model restrictions grants access to powerful computing capabilities that can be exploited for various tasks.
Limitations of Hardening Prompts and Instructions: While hardening your system prompts and instructions can improve your app's resilience to some extent, the complex and unstructured nature of language makes it insufficient compared to having an external, model-independent security layer.
The Open-Source vs. Commercial Model Divide: There's a significant gap in resilience to attacks between open-source and commercial models, with the latter being much more secure.
The Importance of Session-Level Monitoring: A single prompt may not appear risky or malicious on its own. However, monitoring at the session level is crucial to understand the bigger picture and detect coordinated attacks.
The Offline Attack Vector: Blocking internet access to your app doesn't eliminate risk. Malicious instructions can be injected into input fields and files that are interpreted by your model.
The Ever-Evolving Threat Landscape: New attack methods are discovered every day. We've witnessed new techniques to bypass model restrictions on an almost daily basis, some of which are highly sophisticated and never seen before.
The Non-Deterministic Nature of LLMs: LLMs are non-deterministic software with numerous moving parts, such as models, system prompts, and user instructions. Their reasoning mechanism cannot be fully understood. An attack technique blocked yesterday may bypass your app restrictions today.

The Pillar Solution: All-in-One Security for AI Applications

Pillar is an all-in-one security platform tailored specifically for AI applications. Our platform provides complete visibility and granular control over your AI projects, enabling robust oversight and detailed logging of in-production activities. With advanced, model-agnostic security guardrails, Pillar defends your AI applications against potential threats and vulnerabilities during usage. Our system continuously runs automated stress tests and in-depth evaluations to proactively mitigate risks, ensuring your AI is robust, reliable, and compliant.

Take Action: Secure Your AI with Pillar

To experience firsthand how Pillar can enhance the security and resilience of your AI applications, reach out to us for a demo. Witness how, with just a few lines of code, we can expose vulnerabilities and implement robust protections to safeguard your AI apps.

‍

FAQs

Why is hardening system prompts not enough to protect LLM-based applications?

Hardening system prompts improves resilience to some attacks, but the complex and unstructured nature of language makes it insufficient as a standalone defense. An external, model-independent security layer is necessary because LLMs are non-deterministic — an attack technique blocked one day may successfully bypass restrictions the next.

How do attackers persist against AI applications even after initial jailbreak attempts fail?

Attackers will attempt to jailbreak an AI application dozens of times, exploiting how easy it is to iterate on attack variations. Some adversaries use specialized tools to generate high volumes of attack permutations automatically, making single-attempt defenses inadequate and session-level monitoring essential for detecting coordinated attack patterns.

Can an AI application be compromised even when its internet access is blocked?

Yes. Blocking internet access does not eliminate the attack surface. Malicious instructions can be injected through input fields and files that are processed by the model, meaning offline deployments remain vulnerable to prompt injection and indirect attack vectors that bypass network-level controls entirely.

Why does monitoring individual prompts miss security threats in GenAI applications?

A single prompt may appear benign in isolation, but coordinated attacks unfold across multiple interactions within a session. Session-level monitoring is critical to understanding the full context of user behavior, identifying multi-step manipulation attempts, and detecting threats that only become visible when interactions are analyzed together.

What is the security gap between open-source and commercial LLMs based on real-world deployment data?

Across more than 1,000 GenAI applications monitored in runtime, Pillar Security observed a significant resilience gap between open-source and commercial models, with commercial models demonstrating substantially stronger resistance to adversarial attacks and jailbreak attempts. This gap is a critical factor when selecting models for production AI deployments.