1026 CE. Give or take a decade. Leofric, Earl of Mercia, was taxing his people at a brutal rate. Even his wife's continued demands for their relief fell on deaf ears. So when Leofric finally agreed to lift the taxes, but only if she rode naked through the marketplace at nearby Coventry, Lady Godgifu (anglicized as Godiva) bared all, let her hair down, decreed that none of the citizenry should watch, and forced him to keep his promise.
A thousand years later, the topology is strikingly familiar. Everywhere, not just in Coventry, people with little recourse feed their most sensitive thoughts—medical fears, legal questions, financial anxieties, private relationship troubles—to AI systems. The interfaces may look and feel private, but it's increasingly evident that we're all riding naked through the AI marketplace. The AI vendors' assurances that they won't watch cover rather less than the Lady's hair did. Will those vendors take a principled stand to guarantee the privacy of their users? OpenAI's recent actions indicate exactly the opposite, as they seemingly leap to take on the role of "peeping Tom"—a phrase that also originated in the Godiva story. For the second time, it seems that someone is going to have to ride through the marketplace, taking a stand for privacy.
How Bare Are We?
Starkers. Full Monty. Peeping Toms, breaches, subpoenas, and crooked insiders are serious threats, but they are not nearly the only ones in AI. A wave of recent scientific work shows that even the optimizations we apply to make AI inference fast enough to be usable leak users' personal information in unexpected ways.
In "Remote Timing Attacks on Efficient Language Model Inference," Carlini and Nasr show that a number of common optimizations introduce data-dependent timing signatures visible to anyone monitoring network traffic. Without decrypting a single byte, an observer can learn the topic of a user's conversation with over 90% precision.
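To make the flavor of this attack concrete, here is a toy sketch, not the paper's method. The synthetic_trace() model below is an assumption for illustration: it posits that inference optimizations make inter-token latency depend on content, so different topics produce different timing distributions that a simple off-the-shelf classifier can separate.

```python
# Toy sketch of a timing side channel: the attacker never decrypts traffic,
# only measures gaps between streamed-token packets, then classifies topic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_trace(topic: int, n_tokens: int = 200) -> np.ndarray:
    """Stand-in for a packet capture: seconds between streamed tokens."""
    base = 0.020 + 0.005 * topic  # assumed content-dependent latency shift
    return rng.normal(base, 0.004, n_tokens)

def features(trace: np.ndarray) -> np.ndarray:
    """Summary statistics of inter-token gaps: all the eavesdropper needs."""
    return np.array([trace.mean(), trace.std(),
                     np.percentile(trace, 10), np.percentile(trace, 90)])

# Four hypothetical topics (say: medical, legal, financial, personal).
X = np.array([features(synthetic_trace(t)) for t in range(4) for _ in range(250)])
y = np.repeat(np.arange(4), 250)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"topic recovered from timing alone: {clf.score(X_te, y_te):.0%} accuracy")
```

The classifier is incidental; the point is that timing alone carries enough signal for an eavesdropper to harvest.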
"When Speculation Spills Secrets" spotlights the commonplace technique of speculative decoding and shows how it produces a telltale pattern that can fingerprint user queries with 75% or better accuracy. The Whisper Leak paper extends this analysis, achieving 100% precision in identifying sensitive conversation topics from encrypted traffic metadata alone, while also recovering up to 20% of the full conversations.
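Here is a toy illustration of why speculative decoding leaks, again a sketch under assumed conditions rather than the papers' implementations: tokens are released to the network in bursts whose sizes depend on how many draft tokens the target model accepted, and acceptance depends on content, so an attacker who has pre-recorded the burst patterns of candidate queries can match an observed pattern against them.

```python
# Toy fingerprinting via burst sizes. Assumption for illustration: each
# query induces a characteristic pattern of tokens-released-per-burst, and
# the attacker has pre-recorded patterns for a set of candidate queries.
import numpy as np

def burst_pattern(query_id: int, n_bursts: int = 60) -> np.ndarray:
    """Tokens released per network burst while answering this query."""
    return np.random.default_rng(query_id).integers(1, 8, n_bursts)

# Attacker's dictionary: burst patterns of 1,000 known candidate queries.
candidates = {q: burst_pattern(q) for q in range(1000)}

# The victim sends query 42; the attacker observes only burst sizes,
# here with a little measurement noise added.
noise = np.random.default_rng(7).normal(0, 0.3, 60)
observed = burst_pattern(42) + noise

# Fingerprint: nearest candidate by distance over burst-size sequences.
guess = min(candidates, key=lambda q: np.linalg.norm(candidates[q] - observed))
print("attacker's guess:", guess)  # 42 -- the encrypted payload never mattered
```

Real attacks must cope with jitter and batching, but the matching principle is the same.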
Even a small amount of such leaked information can enable surprisingly effective LLM-powered de-anonymization attacks, leaving people and businesses open to surveillance and exploitation. You can't hide much on horseback.
The Foundational Weakness, and an Architectural Solution
Every kind of attack described above exploits the same foundational weakness of AI: data must be decrypted before it can be processed, and that processing produces decrypted results.
Fully Homomorphic Encryption (FHE) addresses this weakness at its root. FHE mathematically secures computation by computing directly on encrypted queries, producing encrypted results that only the data owner can decrypt. The AI cloud never sees your plaintext input, never produces a plaintext intermediate or final result, and never holds a decryption key. This approach doesn’t just raise the cost of an attack, as Trusted Enclaves and other approaches claim to do—it eliminates entire categories of attack: no visibility, no side channels, no de-anonymization, and no pesky saddle sores.
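To make the flow concrete, here is a minimal sketch using the open-source TenSEAL library (CKKS scheme). The three-feature "query" and the linear scorer are toy stand-ins chosen for illustration; real encrypted inference is far heavier, but the shape of the protocol is the same: the client encrypts, the server computes on ciphertexts it cannot read, and only the client can decrypt the result.

```python
import tenseal as ts

# --- Client: create keys and encrypt the query -----------------------------
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

query = [0.7, 0.1, 0.2]                      # sensitive user features
enc_query = ts.ckks_vector(context, query)   # only ciphertext leaves the client

# --- Server: compute directly on the ciphertext ----------------------------
# (In a real deployment the server holds a copy of the context with the
# secret key stripped; the arithmetic below is unchanged.)
weights = [0.5, -1.2, 2.0]                   # the server's plaintext model
enc_score = enc_query.dot(weights)           # server never sees `query`

# --- Client: only the key holder can read the result -----------------------
print("score:", enc_score.decrypt())         # ~0.63 = 0.5*0.7 - 1.2*0.1 + 2.0*0.2
```

Note what is absent from the server-side step: no decryption, no key, no plaintext intermediate, which is precisely why the attack categories above have nothing to grab.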
The Power of Narrative
The story of Lady Godgifu endures because it captures something essential about the relationship between power, exposure, and protection. The townspeople of Coventry were subject to an extraction regime they didn’t choose and couldn’t refuse. Their privacy—well, their economic survival—depended on someone with the ability to make a meaningful difference.
Today’s AI users—individuals and businesses alike—are in a structurally identical position. We submit our most sensitive data to systems that are, as the research comprehensively demonstrates, vulnerable to surveillance and exfiltration at every layer: the network, the model, and the agent’s tool ecosystem.
The cure is architectural. If data never needs to be decrypted for processing, it cannot be inferred, observed, extracted, or exfiltrated in the clear. FHE is the technical embodiment of that principle, and hardware-accelerated FHE is the workhorse that makes the principle practical.
Ah, I see your horse has arrived. Ready to go for a less revealing ride?
Just When You Thought It Was Safe to Mount Up…
Everything discussed in this post assumes a simple interaction model: a user submits a query, a model returns a response. But that is an increasingly quaint picture of how AI actually works. Modern AI systems don't just answer questions—they take actions. They write documents, conduct research, send messages, make reservations, execute code. Every one of those actions requires the system to observe its own intermediate outputs and decide what to do next. And that is precisely what FHE, by design, rules out: the system computes blindly on ciphertexts, so it can never inspect an intermediate result in order to make a decision (see the sketch below). Protecting a single query-response pair, as we've discussed here, is the easy case. The hard case—and the one that matters—is protecting the entire chain of actions that constitutes a useful AI interaction. That is the subject of our next post.
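As a taste of the problem, here is a toy sketch, again with TenSEAL: the server cannot test an encrypted condition, so the standard workaround is oblivious selection, computing both branches and blending them as b*then + (1 - b)*else, paying for every path while learning nothing about which one was "taken."

```python
import tenseal as ts

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40

enc_bit  = ts.ckks_vector(context, [1.0])    # encrypted decision: 1 means "then"
then_val = ts.ckks_vector(context, [10.0])   # result of the "then" branch
else_val = ts.ckks_vector(context, [99.0])   # result of the "else" branch
one      = ts.ckks_vector(context, [1.0])

# if enc_bit: ...   <- impossible: the server cannot read enc_bit.
# Instead, evaluate both branches and blend obliviously:
selected = enc_bit * then_val + (one - enc_bit) * else_val
print(selected.decrypt())                    # ~[10.0]; the server learned nothing
```

An agent that must pay for every possible action at every step, without ever seeing where it is going, is a very different beast from the single query-response case above.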
Useful References
Roger of Wendover. Flores Historiarum. Circa 1235. Edited by Henry G. Hewlett. London: Longman & Co., 1886.
Carlini, Nicholas, and Milad Nasr. “Remote Timing Attacks on Efficient Language Model Inference.” arXiv preprint arXiv:2410.17175, 2024. Demonstrates topic classification (90%+ precision) and PII recovery via timing side channels on production LLM systems including ChatGPT and Claude.
Wei, Jiankun, Jian Liu, Haobin Xing, Yanbing Liu, and Yang Gao. “When Speculation Spills Secrets: Side Channels via Speculative Decoding in LLMs.” arXiv preprint arXiv:2411.01076, 2024. Achieves up to 100% query fingerprinting accuracy across four speculative decoding schemes and demonstrates confidential datastore leakage at 25+ tokens/sec.
McDonald, Geoff, and Microsoft Defender Security Research. “Whisper Leak: A Side-Channel Attack on Large Language Models.” arXiv preprint arXiv:2511.03675, 2025. Industry-wide audit of 28 LLMs showing near-perfect topic classification from encrypted traffic metadata, with 100% precision on sensitive topics at 10,000:1 class imbalance.
Lermen, Simon, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, and Florian Tramèr. “Large-Scale Online Deanonymization with LLMs.” arXiv preprint arXiv:2602.16800, 2026. Demonstrates LLM-based deanonymization of users across Hacker News, Reddit, LinkedIn, and anonymized transcripts, scaling to tens of thousands of candidates.
Carlini, Nicholas, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, and Alina Oprea. “Extracting Training Data from Large Language Models.” In Proceedings of the 30th USENIX Security Symposium, 2633–2650. USENIX Association, 2021. Foundational work on verbatim training data extraction from GPT-2.
Nasr, Milad, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Florian Tramèr, and Katherine Lee. “Scalable Extraction of Training Data from Aligned, Production Language Models.” In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR). Singapore, 2025. Novel attacks that undo model alignment to recover thousands of training examples from production systems.
Wu, Xiaoyu, Yifei Pang, Terrance Liu, and Zhiwei Steven Wu. “Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM.” In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS). 2025. arXiv:2505.24379. Demonstrates that exact unlearning can paradoxically increase privacy leakage.
Wu, Hengyu, and Yang Cao. “Membership Inference Attacks on Large-Scale Models: A Survey.” arXiv preprint arXiv:2503.19338, 2025. First comprehensive review of MIAs targeting LLMs and LMMs across all pipeline stages.
Puerto, Haritz, Martin Gubri, Sangdoo Yun, and Seong Joon Oh. “Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models.” In Findings of the Association for Computational Linguistics: NAACL 2025, 4165–4182. Albuquerque, New Mexico: Association for Computational Linguistics, 2025. Demonstrates effective MIA at document and dataset scale.
Feng, Qizhang, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, Choon Hui Teo, and Sravan Babu Bodapati. “Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment.” In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 258:5221–5229. 2025. Shows DPO-aligned models are more vulnerable to MIA than PPO-aligned models.
Zhou, Zhanke, Jianing Zhu, Fengfei Yu, Xuan Li, Xiong Peng, Tongliang Liu, and Bo Han. “Model Inversion Attacks: A Survey of Approaches and Countermeasures.” arXiv preprint arXiv:2411.10023, 2024. Comprehensive survey of model inversion methods across images, text, and graph data.
OWASP Foundation. “OWASP Top 10 for LLM Applications 2025.” https://genai.owasp.org/. 2025. Ranks prompt injection as the #1 vulnerability in production AI systems.
Reddy, Pavan, and Aim Labs. “EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System.” CVE-2025-32711. arXiv preprint arXiv:2509.10540, 2025. Zero-click prompt injection and data exfiltration attack against Microsoft 365 Copilot, patched May 2025.
Cheng, Shuai, et al. “Effective PII Extraction from LLMs through Augmented Few-Shot Learning.” In Proceedings of the 34th USENIX Security Symposium. USENIX Association, 2025. Systematic PII extraction using few-shot prompts against production LLMs.