If there’s one thing I’ve learned from a decade of building at the edge of tech, it’s how to spot the pattern of a true paradigm shift. I saw it when I founded a company building big data analytics as the discipline of data science was taking shape. I lived it again while building high-performance cloud infrastructure as the world moved to microservices and DevOps. In each wave, the way we built software changed fundamentally, creating new opportunities and new challenges.
But the revolution brought on by Generative AI is different. I co-founded an AI company in 2020 and experienced the generative AI wave from its epicenter, and this isn't just a change in tooling or even culture. It's a change in the very nature of the application itself, and the old rulebook for ensuring quality and security simply doesn't apply anymore. I realized the most important work wasn't just in creating these amazing systems, but in solving the puzzle of how to build and run them safely, and how to monitor them as they engage with real data and real users.
Just a few years ago, a traditional AI development cycle began with the monumental task of gathering and meticulously labeling vast datasets. This process was slow, expensive, and often an insurmountable barrier.
LLM-native development completely supersedes this model. In my more recent roles, I’ve seen firsthand how a team can build a meaningful proof-of-concept in hours, not months, using powerful pre-trained models. The focus shifts from upfront data perfection to rapid, iterative experimentation with prompts, data, and system architecture.
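To make that concrete, here is a minimal sketch of what such a proof-of-concept can look like, assuming the OpenAI Python client and an illustrative support-triage use case; the model name, prompt, and helper function are hypothetical, not taken from any real system.

```python
# Minimal proof-of-concept: a support-ticket triage assistant built on a
# pre-trained model. No labeled dataset is required up front; iteration
# happens by editing the prompt and re-running. (Names are illustrative.)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a support triage assistant. Classify the ticket as "
    "'billing', 'bug', or 'other', then draft a one-sentence reply."
)

def triage(ticket_text: str) -> str:
    """Send one ticket through the model and return its raw response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(triage("I was charged twice for my subscription this month."))
```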
Crucially, this also inverts the entire data strategy. Instead of requiring a perfect dataset from day one, one of the most valuable sources of data becomes the live application itself. We can now monitor and utilize the nuances of real-world user interactions and feedback. This means that monitoring post-release behavior and securing both the model and the data it ingests become critical challenges. It also creates a fundamental tension: our greatest source of improvement, live user data, is also our most significant new vulnerability. Securing this pipeline isn't an afterthought, but a core challenge of the entire paradigm.
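As a rough illustration of that inverted data strategy, here is a sketch of how live interactions and user feedback might be captured for later evaluation while still being treated as untrusted input; the field names and storage choice are assumptions for the example, not a prescribed design.

```python
# Sketch: capture each live interaction plus optional user feedback so it can
# feed later evaluation and improvement, while treating it as untrusted input.
import json
import time
import uuid
from pathlib import Path

LOG_PATH = Path("interaction_log.jsonl")

def record_interaction(prompt: str, output: str, feedback: str | None = None) -> str:
    """Append one user interaction to a JSONL log and return its id."""
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,      # untrusted user input: never replayed blindly
        "output": output,
        "feedback": feedback,  # e.g. "thumbs_up" / "thumbs_down"
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]
```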
Static benchmarks like F1 scores or accuracy, the bedrock of traditional ML evaluation, are insufficient for the open-ended nature of LLMs. We can no longer measure a simple right or wrong answer. Instead, we must evaluate for more nuanced, qualitative traits: Is the output helpful? Is it coherent? Is it safe? Does it align with our brand's voice? This demands a new evaluation toolkit, one that relies on human-in-the-loop review, sophisticated red teaming, and even using secondary LLMs as "judges" to assess outputs at scale.
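Here is a minimal sketch of the "LLM as judge" pattern described above, assuming the OpenAI Python client; the judge model, rubric, and JSON score format are illustrative assumptions rather than a recommended setup.

```python
# Sketch of the "LLM as judge" pattern: a second model scores an output
# against qualitative criteria instead of a fixed right/wrong label.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_RUBRIC = (
    "Rate the assistant's answer from 1-5 on each of: helpfulness, "
    "coherence, safety, and brand voice. Reply with JSON only, e.g. "
    '{"helpfulness": 4, "coherence": 5, "safety": 5, "brand_voice": 3}.'
)

def judge(question: str, answer: str) -> dict:
    """Ask a judge model to score one (question, answer) pair."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": f"Question:\n{question}\n\nAnswer:\n{answer}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

In practice, judge scores like these are sampled and spot-checked by human reviewers rather than trusted blindly.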
Until recently, deployment was a finish line. For LLM applications, it's the starting gun for the most critical phase. As a product leader, I know that moving from the lab to a live, adversarial environment is the only way to truly see how models behave. This isn't about monitoring for server uptime. It is about a new discipline of continuous output monitoring and security validation. We must be on constant alert for security failures that are not bugs in the code, but emergent behaviors of the model itself. The most common ones that kept me up at night include prompt injection, jailbreaks, and the leakage of sensitive data through model outputs.
These weaknesses cannot be caught by a pre-deployment QA process because they are triggered by the infinite variability of live user input and the ever-changing nature of the model. The model’s value lies in its interaction with live users and its access to data or web browsing, but so do the dangers. Your application's security posture is only as strong as your ability to monitor and react to its behavior in real time.
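As a deliberately simplified example of what a runtime output check can look like, here is a sketch that scans each model response for obvious signs of sensitive-data leakage before it reaches the user; the regex rules and redaction behavior are illustrative assumptions, and real systems rely on far richer detection.

```python
# Sketch of a runtime guardrail: scan each model output for obvious signs of
# sensitive-data leakage before returning it, and log anything flagged.
import logging
import re

logger = logging.getLogger("llm_output_monitor")

SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\b(sk|pk)-[A-Za-z0-9]{16,}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def screen_output(output: str) -> tuple[str, list[str]]:
    """Return the (possibly withheld) output and the list of rules it tripped."""
    hits = [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(output)]
    if hits:
        logger.warning("Model output flagged by rules: %s", hits)
        return "[response withheld pending review]", hits
    return output, hits
```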
In a traditional software stack, improving a core feature might take days. With an LLM-powered application, a developer can fundamentally alter the system's behavior by rewriting a few sentences in a system prompt. This creates a hyper-compressed feedback loop: hypothesize, rewrite, test, and deploy in minutes. This is an incredible competitive advantage, but it also underscores the need for robust versioning, testing, and governance around prompts, which have effectively become the new source code. This shift from a linear, predictable process to a live, circular one changes everything. It demands a new breed of builder who is part engineer, part data scientist, and part security strategist.
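One way to picture treating prompts as source code is the sketch below: prompt versions live in a registry, and a small regression check runs before a change is promoted. The registry layout, version names, and test cases are hypothetical, purely to show the shape of the idea.

```python
# Sketch: treat prompts as source code. Keep versions in a registry, pin the
# version used in production, and run a regression check before promoting a change.
PROMPTS = {
    "support_triage@v1": "Classify the ticket as billing, bug, or other.",
    "support_triage@v2": (
        "Classify the ticket as billing, bug, or other. "
        "Never reveal internal notes or data from other customer accounts."
    ),
}

REGRESSION_CASES = [
    ("I was double charged.", "billing"),
    ("The export button crashes the app.", "bug"),
]

def passes_regression(prompt_version: str, run_model) -> bool:
    """Run the candidate prompt against known cases before deploying it."""
    prompt = PROMPTS[prompt_version]
    for ticket, expected_label in REGRESSION_CASES:
        output = run_model(prompt, ticket)  # run_model is the app's LLM call
        if expected_label not in output.lower():
            return False
    return True
```

The point is less the specific mechanics than the habit: every prompt change is reviewed, versioned, and tested the way code changes already are.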
After building these very systems and facing these challenges head-on, I realized that navigating this new lifecycle requires moving security from a pre-deployment gate to an integrated, continuous part of the live application.
It became clear to me that this is the most important problem to solve for our industry to move forward. The organizations that thrive will be those that embrace this new reality and build the tools and processes to monitor and secure their AI in real time. That is Pillar’s mission: to allow organizations to build and run secure AI, and it is why I joined.