Simon Willison's Weblog: lethal-trifecta

OpenAI Help: Lockdown Mode

2026-06-05T23:56:40+00:00

OpenAI first teased this in February, but now it's live and "rolling out to eligible personal accounts, including Free, Go, Plus, and Pro, and self-serve ChatGPT Business accounts":

Lockdown Mode is designed to help prevent the final stage of data exfiltration from a prompt injection attack by limiting outbound network requests that could transfer sensitive data to an attacker. Lockdown Mode does not prevent prompt injections from appearing in the content ChatGPT processes. For example, a prompt injection could appear in cached web content or in an uploaded file, and could still affect the behavior or accuracy of a response.

This looks really good to me.

The Lethal Trifecta occurs when an LLM system has access to all three of access to private data, exposure to untrusted content and a way to steal data and transmit it back to the attacker.

The only way to solve the trifecta is to cut off one of the three legs, and by far the easiest leg to restrict without making your LLM systems far less useful is the exfiltration vectors to steal data.

It looks to me like lockdown mode directly attacks that leg, using mechanisms that are deterministic and, crucially, are not evaluated by AI systems that themselves can be subverted by sufficiently devious attacks.

The existence of lockdown mode does however imply that ChatGPT, in its default settings, does not provide robust protection against sufficiently determined data exfiltration attacks!

Update: This tweet OpenAI CISO Dane Stuckey:

Lockdown mode is not meant for everyone. However, for folks who have an elevated risk profile - due to who they are, what they work on, or the types of data they work with - it's an excellent tool for further securing themselves. This has some tradeoffs on functionality and utility, but for these users, the tradeoff is worthwhile.

Tags: security, ai, openai, prompt-injection, llms, lethal-trifecta

Microsoft Copilot Cowork Exfiltrates Files

2026-05-26T15:36:48+00:00

Microsoft Copilot Cowork Exfiltrates Files

The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data.

In this case Microsoft Copilot Cowork (yes, that's a real product name) was allowing agents to send emails to the user's own inbox without approval... but those messages were then displayed in a way that could leak data to an attacker via rendered images:

Because these messages can contain external images that trigger network requests to external websites, data can be exfiltrated when a user opens a compromised message sent by the agent.

Since OneDrive can create pre-authenticated download links, a successful prompt injection could cause those links to be leaked, allowing files to be downloaded by the attacker.

Via Hacker News

Tags: microsoft, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, lethal-trifecta

My fireside chat about agentic engineering at the Pragmatic Summit

2026-03-14T18:19:38+00:00

I was a speaker last month at the Pragmatic Summit in San Francisco, where I participated in a fireside chat session about Agentic Engineering hosted by Eric Lui from Statsig.

The video is available on YouTube. Here are my highlights from the conversation.

Stages of AI adoption

We started by talking about the different phases a software developer goes through in adopting AI coding tools.

02:45

I feel like there are different stages of AI adoption as a programmer. You start off with you've got ChatGPT and you ask it questions and occasionally it helps you out. And then the big step is when you move to the coding agents that are writing code for you—initially writing bits of code and then there's that moment where the agent writes more code than you do, which is a big moment. And that for me happened only about maybe six months ago.

03:42

The new thing as of what, three weeks ago, is you don't read the code. If anyone saw StrongDM—they had a big thing come out last week where they talked about their software factory and their two principles were nobody writes any code, nobody reads any code, which is clear insanity. That is wildly irresponsible. They're a security company building security software, which is why it's worth paying close attention—like how could this possibly be working?

I talked about StrongDM more in How StrongDM's AI team build serious software without even looking at the code.

Trusting AI output

We discussed the challenge of knowing when to trust the AI's output as opposed to reviewing every line with a fine tooth-comb.