March 2026

Your AI Prompts Are the Next Data Breach Category

There's a new kind of data breach emerging, and most organisations aren't ready for it.

Troy Hunt recently added the KomikoAI breach to Have I Been Pwned (HIBP). The AI-powered comic platform had 1 million unique email addresses exposed, with 22% already in HIBP. What made it notable wasn't the scale. It was the content. The breached data included users' AI prompts and their mapping back to email addresses.

This is the second breach Troy has processed containing AI prompts. The first was Muah.AI, where 1.9 million users had their prompts exposed. Some of that content was bad enough that Troy contacted law enforcement. It won't be the last.

Why Prompt Data Is Different

We've spent years learning to protect passwords, payment details, and personal data. We understand that a breached email address is bad; a breached password hash is worse; a breached credit card number triggers immediate action.

But prompts? Prompts are something new entirely.

When someone types a question into an AI system, they're often at their most candid. They ask things they wouldn't search for on Google. They share context about their business, their health, their relationships, their anxieties. They paste internal documents, code snippets, customer data. They do this because the interaction feels private. Like thinking out loud.

It isn't.

Every prompt you type into an AI-powered system is data. It's stored somewhere. It has a retention policy (or doesn't). It's linked to your identity (or could be). And if that system gets breached, your inner monologue becomes someone else's reading material.

From a privacy perspective, prompts are more sensitive than search history. A Google search for "symptoms of anxiety" is one data point. A series of prompts to an AI therapist bot is an entire narrative, with enough context and specificity that re-identification is trivial even without explicit user IDs.

The Real Problem: Identity Coupling

The technical failure that makes these breaches so damaging is the unnecessary coupling of user identity to prompt content.

In the KomikoAI breach, prompts were mapped directly back to users' email addresses. Those email addresses, in turn, map to multiple other sources of personally identifiable data across the internet.

This is an architecture decision, not an inevitability.

Yes, your system needs to know who's asking what. You need it for session management, billing, rate limiting, abuse detection. But there is a difference between operationally linking a prompt to a user during an active session and persistently storing that link in a way that survives a breach.

What Good Looks Like

The mitigations aren't novel. They're established data protection patterns applied to a new data category:

1. Tokenised session identifiers

Map prompts to ephemeral session tokens, not user accounts. Only resolve the token-to-user mapping when operationally necessary, behind separate access controls. If your prompt store is breached, attackers get anonymous conversations, not identifiable ones.
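A minimal Python sketch of the pattern, using in-memory dicts as stand-ins for what would in practice be two separately controlled datastores (all names here are illustrative):

```python
import secrets

# Hypothetical stores. In production these live in separate databases
# with independent credentials and access controls.
prompt_store = {}      # session_token -> list of prompts (no user identity)
session_registry = {}  # session_token -> user_id (short-lived, locked down)

def start_session(user_id: str) -> str:
    """Issue an ephemeral token; only the registry knows whose it is."""
    token = secrets.token_urlsafe(32)
    session_registry[token] = user_id
    return token

def record_prompt(token: str, prompt: str) -> None:
    """The prompt store only ever sees the opaque token, never the user."""
    prompt_store.setdefault(token, []).append(prompt)

def end_session(token: str) -> None:
    """Dropping the mapping leaves the stored prompts anonymous."""
    session_registry.pop(token, None)
```

Once `end_session` runs, an attacker who dumps `prompt_store` holds conversations keyed by random tokens with nothing to resolve them against.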

2. Aggressive retention policies

If you only need conversation context for the current session, flush it when the session ends. If you need it for model improvement, anonymise it first. GDPR Article 5(1)(e) already requires storage limitation: "kept in a form which permits identification of data subjects for no longer than is necessary." Most organisations storing prompts indefinitely are already non-compliant. They just haven't been challenged on it yet.

3. Decoupled storage architecture

Store prompts and user identity in separate datastores with independent access controls, encryption keys, and breach boundaries. A compromise of one system shouldn't automatically expose the other. This is the same principle we apply to payment card data. Why wouldn't we apply it to something more personal?
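The breach-boundary effect is easy to demonstrate. In this toy sketch (hypothetical data and store names), the prompt store references users only by an opaque internal key, so exfiltrating it alone yields no contact details:

```python
# Two logically separate stores. In production: separate databases,
# separate credentials, separate encryption keys.
identity_store = {"u-9f2": "alice@example.com"}  # locked down independently
prompt_store = [{"owner": "u-9f2", "text": "draft my resignation letter"}]

def breach_dump(store) -> str:
    """Simulate an attacker exfiltrating a single store wholesale."""
    return repr(store)

# Compromising the prompt store exposes opaque references, not emails.
leak = breach_dump(prompt_store)
```

KomikoAI-style damage requires both stores to fall at once; independent boundaries make that a compound failure rather than a single one.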

4. Pseudonymisation at rest

If you must maintain a link between prompts and users for analytics or compliance, pseudonymise the identifier rather than storing it raw alongside prompt data. Note that a plain, unsalted hash of an email address is trivially reversible by dictionary attack; a keyed hash, with the key held separately, is not. You can still aggregate usage patterns without making individual conversations attributable.
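A keyed-hash sketch using the standard library's `hmac` module. The key here is a hard-coded placeholder; in practice it would come from a KMS and never sit next to the prompt data it protects:

```python
import hashlib
import hmac

# Placeholder secret for illustration only. Fetch from a KMS in practice,
# and store it nowhere near the pseudonymised records.
PSEUDONYM_KEY = b"rotate-me-and-keep-me-elsewhere"

def pseudonymise(email: str) -> str:
    """Stable keyed hash: the same user always maps to the same token,
    so aggregation still works, but without the key the token cannot be
    brute-forced back to an email address."""
    return hmac.new(PSEUDONYM_KEY, email.lower().encode(), hashlib.sha256).hexdigest()
```

Because the function is deterministic per key, per-user analytics survive; because it is keyed, a dump of the pseudonyms alone is not a lookup table waiting to happen.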

5. Minimal context retention

Not every prompt needs to be stored in full. For billing and rate limiting, you need metadata (timestamp, token count, model used), not content. For abuse detection, you might need content temporarily but can purge it after review windows close. Design retention around what you actually need, not what might be useful someday.
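For billing and rate limiting, the stored record can omit content entirely. A sketch (the whitespace token count is a crude stand-in for whatever tokenizer you actually bill against):

```python
import time
from dataclasses import dataclass

@dataclass
class PromptMetadata:
    """What billing and rate limiting actually need: no content at all."""
    timestamp: float
    token_count: int
    model: str

def log_usage(prompt: str, model: str) -> PromptMetadata:
    # Naive whitespace split, standing in for your real tokenizer.
    return PromptMetadata(time.time(), len(prompt.split()), model)
```

The prompt text is consumed and discarded at the boundary; what lands in long-term storage can never leak what was never written down.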

The Organisational Blind Spot

This isn't just a problem for AI startups building chatbots. It's a problem for every organisation that has integrated LLM capabilities into their products or internal tools.

If your engineering team has bolted an AI assistant onto your customer support platform, where are those prompts going? If your sales team is using an AI tool to draft proposals, who has access to the prompts that include prospect details and pricing? If your developers are pasting code into AI coding assistants, are those prompts being stored by a third party?

The attack surface here is twofold:

  1. Your own systems - if you're running LLM features, you're storing prompt data. Is it properly segregated? Is retention appropriate? Would a breach of your prompt store be a reportable incident under your data protection obligations?
  2. Third-party AI services - if your staff are using external AI tools, their prompts (potentially containing your proprietary data, customer information, or strategic plans) are stored on someone else's infrastructure, subject to someone else's security posture. The KomikoAI breach is a reminder of what that looks like when it goes wrong.

What To Do Now

If you're building with AI, start with the five patterns above: audit where prompt data currently lands, decouple it from identity, and set retention you can defend. If your staff are using third-party AI tools, find out what those vendors store and for how long before a breach notification tells you.

The breach of AI prompt data is an emerging category, not an edge case. The organisations that take it seriously now will be the ones that aren't explaining themselves to the ICO later.

At ThreatControl, we help organisations understand and manage their AI security risk, including prompt data handling, AI tool governance, and AI security testing. Our Fractional CTO service helps you build the policies and architecture to get this right. Get in touch.
