<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: anthropic</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/anthropic.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-04-16T20:37:12+00:00</updated><author><name>Simon Willison</name></author><entry><title>llm-anthropic 0.25</title><link href="https://simonwillison.net/2026/Apr/16/llm-anthropic/#atom-tag" rel="alternate"/><published>2026-04-16T20:37:12+00:00</published><updated>2026-04-16T20:37:12+00:00</updated><id>https://simonwillison.net/2026/Apr/16/llm-anthropic/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.25"&gt;llm-anthropic 0.25&lt;/a&gt;&lt;/p&gt;
    &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New model: &lt;code&gt;claude-opus-4.7&lt;/code&gt;, which supports &lt;code&gt;thinking_effort&lt;/code&gt;: &lt;code&gt;xhigh&lt;/code&gt;. #66&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;thinking_display&lt;/code&gt; and &lt;code&gt;thinking_adaptive&lt;/code&gt; boolean options. &lt;code&gt;thinking_display&lt;/code&gt; summarized output is currently only available in JSON output or JSON logs.&lt;/li&gt;
&lt;li&gt;Increased default &lt;code&gt;max_tokens&lt;/code&gt; to the maximum allowed for each model.&lt;/li&gt;
&lt;li&gt;No longer uses obsolete &lt;code&gt;structured-outputs-2025-11-13&lt;/code&gt; beta header for older models.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
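&lt;p&gt;A hypothetical way to exercise the new model and option from the terminal (model ID and option name are taken from the release notes above; &lt;code&gt;llm install -U&lt;/code&gt; and &lt;code&gt;-o&lt;/code&gt; are standard LLM CLI usage, and you'd need an &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; set for it to actually run):&lt;/p&gt;

```shell
# Upgrade the plugin, then try the new model with the new
# xhigh thinking effort - names taken from the release notes,
# so treat this as a sketch rather than a tested recipe.
llm install -U llm-anthropic
llm -m claude-opus-4.7 -o thinking_effort xhigh 'Say hello'
```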
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="anthropic"/><category term="claude"/></entry><entry><title>Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7</title><link href="https://simonwillison.net/2026/Apr/16/qwen-beats-opus/#atom-tag" rel="alternate"/><published>2026-04-16T17:16:52+00:00</published><updated>2026-04-16T17:16:52+00:00</updated><id>https://simonwillison.net/2026/Apr/16/qwen-beats-opus/#atom-tag</id><summary type="html">
    &lt;p&gt;For anyone who has been (inadvisably) taking my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;pelican riding a bicycle benchmark&lt;/a&gt; seriously as a robust way to test models, here are pelicans from this morning's two big model releases - &lt;a href="https://qwen.ai/blog?id=qwen3.6-35b-a3b"&gt;Qwen3.6-35B-A3B from Alibaba&lt;/a&gt; and &lt;a href="https://www.anthropic.com/news/claude-opus-4-7"&gt;Claude Opus 4.7 from Anthropic&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the Qwen 3.6 pelican, generated using &lt;a href="https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/Qwen3.6-35B-A3B-UD-Q4_K_S.gguf"&gt;this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf&lt;/a&gt; quantized model by Unsloth, running on my MacBook Pro M5 via &lt;a href="https://lmstudio.ai/"&gt;LM Studio&lt;/a&gt; (and the &lt;a href="https://github.com/agustif/llm-lmstudio"&gt;llm-lmstudio&lt;/a&gt; plugin) - &lt;a href="https://gist.github.com/simonw/4389d355d8e162bc6e4547da214f7dd2"&gt;transcript here&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/Qwen3.6-35B-A3B-UD-Q4_K_S-pelican.png" alt="The bicycle frame is the correct shape. There are clouds in the sky. The pelican has a dorky looking pouch. A caption on the ground reads Pelican on a Bicycle!" style="max-width: 100%;" /&gt;&lt;/p&gt;
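&lt;p&gt;A sketch of what that local setup looks like from the terminal, assuming the model is already loaded in LM Studio - the model alias here is a guess, check &lt;code&gt;llm models&lt;/code&gt; for the real ID on your machine:&lt;/p&gt;

```shell
# Install the plugin that exposes LM Studio models to the llm CLI,
# then run the benchmark prompt (model alias is an assumption).
llm install llm-lmstudio
llm -m qwen3.6-35b-a3b 'Generate an SVG of a pelican riding a bicycle'
```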
&lt;p&gt;And here's one I got from Anthropic's &lt;a href="https://www.anthropic.com/news/claude-opus-4-7"&gt;brand new Claude Opus 4.7&lt;/a&gt; (&lt;a href="https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c118"&gt;transcript&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican.png" alt="The bicycle frame is entirely the wrong shape. No clouds, a yellow sun. The pelican is looking behind itself, and has a less pronounced pouch than I would like." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!&lt;/p&gt;
&lt;p&gt;I tried Opus a second time passing &lt;code&gt;thinking_level: max&lt;/code&gt;. It didn't do much better (&lt;a href="https://gist.github.com/simonw/7566e04a81accfb9affda83451c0f363"&gt;transcript&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican-max.png" alt="The bicycle frame is entirely the wrong shape but in a different way. Lines are more bold. Pelican looks a bit more like a pelican." style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;h4 id="i-dont-think-qwen-are-cheating"&gt;I don't think Qwen are cheating&lt;/h4&gt;
&lt;p&gt;A lot of people are &lt;a href="https://simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/"&gt;convinced that the labs train for my stupid benchmark&lt;/a&gt;. I don't think they do, but honestly this result did give me a little glint of suspicion. So I'm burning one of my secret backup tests - here's what I got from Qwen3.6-35B-A3B and Opus 4.7 for "Generate an SVG of a flamingo riding a unicycle":&lt;/p&gt;

&lt;div style="display: flex; gap: 4px;"&gt;
  &lt;figure style="flex: 1; text-align: center; margin: 0;"&gt;
    &lt;figcaption style="margin-bottom: 1em"&gt;Qwen3.6-35B-A3B&lt;br /&gt;(&lt;a href="https://gist.github.com/simonw/f1d1ff01c34dda5fdedf684cfc430d92"&gt;transcript&lt;/a&gt;)&lt;/figcaption&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/qwen-flamingo.png" alt="The unicycle spokes are too long. The flamingo has sunglasses, a bowtie and appears to be smoking a cigarette. It has two heart emoji surrounding the caption Flamingo on a Unicycle. It has a lot of charisma." style="max-width: 100%; height: auto;" /&gt;
  &lt;/figure&gt;
  &lt;figure style="flex: 1; text-align: center; margin: 0;"&gt;
    &lt;figcaption style="margin-bottom: 1em"&gt;Opus 4.7&lt;br /&gt;(&lt;a href="https://gist.github.com/simonw/35121ad5dcf23bf860397a103ae88d50"&gt;transcript&lt;/a&gt;)&lt;/figcaption&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/opus-flamingo.png" alt="The unicycle has a black wheel. The flamingo is a competent if slightly dull vector illustration of a flamingo. It has no flair." style="max-width: 100%; height: auto;" /&gt;
  &lt;/figure&gt;
&lt;/div&gt;


&lt;p&gt;I'm giving this one to Qwen too, partly for the excellent &lt;code&gt;&amp;lt;!-- Sunglasses on flamingo! --&amp;gt;&lt;/code&gt; SVG comment.&lt;/p&gt;

&lt;h4 id="what-can-we-learn-from-this-"&gt;What can we learn from this?&lt;/h4&gt;
&lt;p&gt;The pelican benchmark has always been meant as a joke - it's mainly a statement on how opaque and absurd the task of comparing these models is.&lt;/p&gt;
&lt;p&gt;The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those &lt;a href="https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/"&gt;first pelicans from October 2024&lt;/a&gt; were junk. The &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;more recent entries&lt;/a&gt; have generally been much, much better - to the point that Gemini 3.1 Pro produces &lt;a href="https://simonwillison.net/2026/Feb/19/gemini-31-pro/"&gt;illustrations you could actually use somewhere&lt;/a&gt;, provided you had a pressing need to illustrate a pelican riding a bicycle.&lt;/p&gt;
&lt;p&gt;Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic's latest proprietary release.&lt;/p&gt;
&lt;p&gt;If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lm-studio"&gt;lm-studio&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="qwen"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="lm-studio"/></entry><entry><title>Trusted access for the next era of cyber defense</title><link href="https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag" rel="alternate"/><published>2026-04-14T21:23:59+00:00</published><updated>2026-04-14T21:23:59+00:00</updated><id>https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/"&gt;Trusted access for the next era of cyber defense&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI's answer to &lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/"&gt;Claude Mythos&lt;/a&gt; appears to be a new model called GPT-5.4-Cyber:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our models specifically to enable defensive cybersecurity use cases, starting today with a variant of GPT‑5.4 trained to be cyber-permissive: GPT‑5.4‑Cyber.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They're also extending a program they launched in February (which I had missed) called &lt;a href="https://openai.com/index/trusted-access-for-cyber/"&gt;Trusted Access for Cyber&lt;/a&gt;, where users can verify their identity (via a photo of a government-issued ID processed by &lt;a href="https://withpersona.com/"&gt;Persona&lt;/a&gt;) to gain "reduced friction" access to OpenAI's models for cybersecurity work.&lt;/p&gt;
&lt;p&gt;Honestly, this OpenAI announcement is difficult to follow. Unsurprisingly they don't mention Anthropic at all, but much of the piece emphasizes their many years of existing cybersecurity work and their goal to "democratize access" to these tools, hence the emphasis on that self-service verification flow from February.&lt;/p&gt;
&lt;p&gt;If you want access to their best security tools you still need to go through an extra Google Form application process though, which doesn't feel particularly different to me from Anthropic's &lt;a href="https://www.anthropic.com/glasswing"&gt;Project Glasswing&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47770770"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;&lt;/p&gt;



</summary><category term="security"/><category term="generative-ai"/><category term="ai-security-research"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="anthropic"/></entry><entry><title>Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me</title><link href="https://simonwillison.net/2026/Apr/7/project-glasswing/#atom-tag" rel="alternate"/><published>2026-04-07T20:52:54+00:00</published><updated>2026-04-07T20:52:54+00:00</updated><id>https://simonwillison.net/2026/Apr/7/project-glasswing/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic &lt;em&gt;didn't&lt;/em&gt; release their latest model, Claude Mythos (&lt;a href="https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf"&gt;system card PDF&lt;/a&gt;), today. They have instead made it available to a very restricted set of preview partners under their newly announced &lt;a href="https://www.anthropic.com/glasswing"&gt;Project Glasswing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The model is a general-purpose model, similar to Claude Opus 4.6, but Anthropic claim that its cybersecurity research abilities are strong enough that they need to give the software industry as a whole time to prepare.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Mythos Preview has already found thousands of high-severity vulnerabilities, including some in &lt;em&gt;every major operating system and web browser&lt;/em&gt;. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems—systems that represent a very large portion of the world’s shared cyberattack surface. We anticipate this work will focus on tasks like local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There's a great deal more technical detail in &lt;a href="https://red.anthropic.com/2026/mythos-preview/"&gt;Assessing Claude Mythos Preview’s cybersecurity capabilities&lt;/a&gt; on the Anthropic Red Team blog:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex &lt;a href="https://en.wikipedia.org/wiki/JIT_spraying"&gt;JIT heap spray&lt;/a&gt; that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Plus this comparison with Claude 4.6 Opus:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Saying "our model is too dangerous to release" is a great way to build buzz around a new model, but in this case I expect their caution is warranted.&lt;/p&gt;
&lt;p&gt;Just a few days ago (&lt;a href="https://simonwillison.net/2026/Apr/3/"&gt;last Friday&lt;/a&gt;) I started a new &lt;a href="https://simonwillison.net/tags/ai-security-research/"&gt;ai-security-research&lt;/a&gt; tag on this blog to acknowledge an uptick in credible security professionals sounding the alarm on how good modern LLMs have got at vulnerability research.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_kernel/"&gt;Greg Kroah-Hartman&lt;/a&gt; of the Linux kernel:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.&lt;/p&gt;
&lt;p&gt;Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://mastodon.social/@bagder/116336957584445742"&gt;Daniel Stenberg&lt;/a&gt; of &lt;code&gt;curl&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.&lt;/p&gt;
&lt;p&gt;I'm spending hours per day on this now. It's intense.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Thomas Ptacek published &lt;a href="https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/"&gt;Vulnerability Research Is Cooked&lt;/a&gt;, a post inspired by his &lt;a href="https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/"&gt;podcast conversation&lt;/a&gt; with Anthropic's Nicholas Carlini.&lt;/p&gt;
&lt;p&gt;Anthropic have a 5-minute &lt;a href="https://www.youtube.com/watch?v=INGOC6-LLv0"&gt;talking heads video&lt;/a&gt; describing the Glasswing project. Nicholas Carlini appears as one of those talking heads, in which he says (highlights mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently. But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome. [...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I've found more bugs in the last couple of weeks than I found in the rest of my life combined&lt;/strong&gt;. We've used the model to scan a bunch of open source code, and the thing that we went for first was operating systems, because this is the code that underlies the entire internet infrastructure. &lt;strong&gt;For OpenBSD, we found a bug that's been present for 27 years, where I can send a couple of pieces of data to any OpenBSD server and crash it&lt;/strong&gt;. On Linux, we found a number of vulnerabilities where as a user with no permissions, I can elevate myself to the administrator by just running some binary on my machine. For each of these bugs, we told the maintainers who actually run the software about them, and they went and fixed them and have deployed the patches so that anyone who runs the software is no longer vulnerable to these attacks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I found this on the &lt;a href="https://www.openbsd.org/errata78.html"&gt;OpenBSD 7.8 errata page&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;025: RELIABILITY FIX: March 25, 2026&lt;/strong&gt;  &lt;em&gt;All architectures&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;TCP packets with invalid SACK options could crash the kernel.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ftp.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig"&gt;A source code patch exists which remedies this problem.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I tracked that change down in the &lt;a href="https://github.com/openbsd/src"&gt;GitHub mirror&lt;/a&gt; of the OpenBSD CVS repo (apparently they still use CVS!) and found it &lt;a href="https://github.com/openbsd/src/blame/master/sys/netinet/tcp_input.c#L2461"&gt;using git blame&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/openbsd-27-years.jpg" alt="Screenshot of a Git blame view of C source code around line 2455 showing TCP SACK hole validation logic. Code includes checks using SEQ_GT, SEQ_LT macros on fields like th-&amp;gt;th_ack, tp-&amp;gt;snd_una, sack.start, sack.end, tp-&amp;gt;snd_max, and tp-&amp;gt;snd_holes. Most commits are from 25–27 years ago with messages like &amp;quot;more SACK hole validity testin...&amp;quot; and &amp;quot;knf&amp;quot;, while one recent commit from 3 weeks ago (&amp;quot;Ignore TCP SACK packets wit...&amp;quot;) is highlighted with an orange left border, adding a new guard &amp;quot;if (SEQ_LT(sack.start, tp-&amp;gt;snd_una)) continue;&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Sure enough, the surrounding code is from 27 years ago.&lt;/p&gt;
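&lt;p&gt;If you want to repeat that archaeology yourself, a sketch of the commands - the file path and approximate line range are taken from the screenshot above:&lt;/p&gt;

```shell
# Clone the GitHub mirror of the OpenBSD CVS tree, then blame
# the SACK validation code in the TCP input path.
git clone https://github.com/openbsd/src
cd src
git blame -L 2450,2470 -- sys/netinet/tcp_input.c
```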
&lt;p&gt;I'm not sure which Linux vulnerability Nicholas was describing, but it may have been &lt;a href="https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5133b61aaf437e5f25b1b396b14242a6bb0508e2"&gt;this NFS one&lt;/a&gt; recently covered &lt;a href="https://mtlynch.io/claude-code-found-linux-vulnerability/"&gt;by Michael Lynch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's enough smoke here that I believe there's a fire. It's not surprising to find vulnerabilities in decades-old software, especially given that they're mostly written in C, but what's new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues.&lt;/p&gt;
&lt;p&gt;I actually thought to myself on Friday that this sounded like an industry-wide reckoning in the making, and that it might warrant a huge investment of time and money to get ahead of the inevitable barrage of vulnerabilities. Project Glasswing incorporates "$100M in usage credits ... as well as $4M in direct donations to open-source security organizations". Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation. It would be great to see OpenAI involved as well - GPT-5.4 already has a strong reputation for finding security vulnerabilities and they have stronger models on the near horizon.&lt;/p&gt;
&lt;p&gt;The bad news for those of us who are &lt;em&gt;not&lt;/em&gt; trusted partners is this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I can live with that. I think the security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="security"/><category term="thomas-ptacek"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="nicholas-carlini"/><category term="ai-ethics"/><category term="llm-release"/><category term="ai-security-research"/></entry><entry><title>Quoting A member of Anthropic’s alignment-science team</title><link href="https://simonwillison.net/2026/Mar/16/blackmail/#atom-tag" rel="alternate"/><published>2026-03-16T21:38:55+00:00</published><updated>2026-03-16T21:38:55+00:00</updated><id>https://simonwillison.net/2026/Mar/16/blackmail/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.newyorker.com/news/annals-of-inquiry/the-pentagon-went-to-war-with-anthropic-whats-really-at-stake?_sp=9a6e0ff7-2bfd-46f8-a9e1-3941ef2003b5.1773495048769"&gt;&lt;p&gt;The point of &lt;a href="https://simonwillison.net/2025/Jun/20/agentic-misalignment/"&gt;the blackmail exercise&lt;/a&gt; was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.newyorker.com/news/annals-of-inquiry/the-pentagon-went-to-war-with-anthropic-whats-really-at-stake?_sp=9a6e0ff7-2bfd-46f8-a9e1-3941ef2003b5.1773495048769"&gt;A member of Anthropic’s alignment-science team&lt;/a&gt;, as told to Gideon Lewis-Kraus&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="anthropic"/><category term="claude"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>1M context is now generally available for Opus 4.6 and Sonnet 4.6</title><link href="https://simonwillison.net/2026/Mar/13/1m-context/#atom-tag" rel="alternate"/><published>2026-03-13T18:29:13+00:00</published><updated>2026-03-13T18:29:13+00:00</updated><id>https://simonwillison.net/2026/Mar/13/1m-context/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://claude.com/blog/1m-context-ga"&gt;1M context is now generally available for Opus 4.6 and Sonnet 4.6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here's what surprised me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Standard pricing now applies across the full 1M window for both models, with no long-context premium.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI and Gemini both &lt;a href="https://www.llm-prices.com/#sel=gemini-3-1-pro-preview-200k%2Cgpt-5.4-272k%2Cgemini-3-1-pro-preview%2Cgpt-5.4"&gt;charge more&lt;/a&gt; for prompts where the token count goes above a certain point - 200,000 for Gemini 3.1 Pro and 272,000 for GPT-5.4.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/long-context"&gt;long-context&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="generative-ai"/><category term="long-context"/><category term="llm-pricing"/><category term="ai"/><category term="llms"/></entry><entry><title>Anthropic and the Pentagon</title><link href="https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag" rel="alternate"/><published>2026-03-06T17:26:50+00:00</published><updated>2026-03-06T17:26:50+00:00</updated><id>https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.schneier.com/blog/archives/2026/03/anthropic-and-the-pentagon.html"&gt;Anthropic and the Pentagon&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This piece by Bruce Schneier and Nathan E. Sanders is the most thoughtful and grounded coverage I've seen of the recent and ongoing Pentagon/OpenAI/Anthropic contract situation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI models are increasingly commodified. The top-tier offerings have about the same performance, and there is little to differentiate one from the other. The latest models from Anthropic, OpenAI and Google, in particular, tend to leapfrog each other with minor hops forward in quality every few months. [...]&lt;/p&gt;
&lt;p&gt;In this sort of market, branding matters a lot. Anthropic and its CEO, Dario Amodei, are positioning themselves as the moral and trustworthy AI provider. That has market value for both consumers and enterprise clients.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bruce-schneier"&gt;bruce-schneier&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="bruce-schneier"/><category term="anthropic"/><category term="generative-ai"/><category term="openai"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting Donald Knuth</title><link href="https://simonwillison.net/2026/Mar/3/donald-knuth/#atom-tag" rel="alternate"/><published>2026-03-03T23:59:04+00:00</published><updated>2026-03-03T23:59:04+00:00</updated><id>https://simonwillison.net/2026/Mar/3/donald-knuth/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf"&gt;&lt;p&gt;Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6 - Anthropic's hybrid reasoning model that had been released three weeks earlier! It seems that I'll have to revise my opinions about "generative AI" one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf"&gt;Donald Knuth&lt;/a&gt;, Claude's Cycles&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/donald-knuth"&gt;donald-knuth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;&lt;/p&gt;



</summary><category term="november-2025-inflection"/><category term="claude"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="donald-knuth"/><category term="llm-reasoning"/><category term="anthropic"/></entry><entry><title>Quoting claude.com/import-memory</title><link href="https://simonwillison.net/2026/Mar/1/claude-import-memory/#atom-tag" rel="alternate"/><published>2026-03-01T11:21:45+00:00</published><updated>2026-03-01T11:21:45+00:00</updated><id>https://simonwillison.net/2026/Mar/1/claude-import-memory/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://claude.com/import-memory"&gt;&lt;p&gt;&lt;code&gt;I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://claude.com/import-memory"&gt;claude.com/import-memory&lt;/a&gt;, Anthropic's "import your memories to Claude" feature is a prompt&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-memory"&gt;llm-memory&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="prompt-engineering"/><category term="llm-memory"/><category term="anthropic"/><category term="claude"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Free Claude Max for (large project) open source maintainers</title><link href="https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/#atom-tag" rel="alternate"/><published>2026-02-27T18:08:22+00:00</published><updated>2026-02-27T18:08:22+00:00</updated><id>https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://claude.com/contact-sales/claude-for-oss"&gt;Free Claude Max for (large project) open source maintainers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anthropic are now offering their $200/month Claude Max 20x plan to open source maintainers, free for six months, provided you meet the following criteria:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Maintainers:&lt;/strong&gt; You're a primary maintainer or core team member of a public repo with 5,000+ GitHub stars &lt;em&gt;or&lt;/em&gt; 1M+ monthly NPM downloads. You've made commits, releases, or PR reviews within the last 3 months.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don't quite fit the criteria?&lt;/strong&gt; If you maintain something the ecosystem quietly depends on, apply anyway and tell us about it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Also in the small print: "Applications are reviewed on a rolling basis. We accept up to 10,000 contributors."&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47178371"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="anthropic"/><category term="claude"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Claude Code Remote Control</title><link href="https://simonwillison.net/2026/Feb/25/claude-code-remote-control/#atom-tag" rel="alternate"/><published>2026-02-25T17:33:24+00:00</published><updated>2026-02-25T17:33:24+00:00</updated><id>https://simonwillison.net/2026/Feb/25/claude-code-remote-control/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/remote-control"&gt;Claude Code Remote Control&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New Claude Code feature dropped yesterday: you can now run a "remote control" session on your computer and then use the Claude Code for web interfaces (on the web, iOS, and the native desktop app) to send prompts to that session.&lt;/p&gt;
&lt;p&gt;It's a little bit janky right now. Initially when I tried it I got the error "Remote Control is not enabled for your account. Contact your administrator." (but I &lt;em&gt;am&lt;/em&gt; my administrator?) - then I logged out and back into the Claude Code terminal app and it started working:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;claude remote-control
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can only run one session on your machine at a time. If you upgrade the Claude iOS app it then shows up as "Remote Control Session (Mac)" in the Code tab.&lt;/p&gt;
&lt;p&gt;It appears not to support the &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; flag (I passed that to &lt;code&gt;claude remote-control&lt;/code&gt; and it didn't reject the option, but it also appeared to have no effect) - which means you have to approve every new action it takes.&lt;/p&gt;
&lt;p&gt;I also managed to get it to a state where every prompt I tried was met by an API 500 error.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img src="https://static.simonwillison.net/static/2026/vampire-remote.jpg" alt="Screenshot of a &amp;quot;Remote Control session&amp;quot; (Mac:dev:817b) chat interface. User message: &amp;quot;Play vampire by Olivia Rodrigo in music app&amp;quot;. Response shows an API Error: 500 {&amp;quot;type&amp;quot;:&amp;quot;error&amp;quot;,&amp;quot;error&amp;quot;:{&amp;quot;type&amp;quot;:&amp;quot;api_error&amp;quot;,&amp;quot;message&amp;quot;:&amp;quot;Internal server error&amp;quot;},&amp;quot;request_id&amp;quot;:&amp;quot;req_011CYVBLH9yt2ze2qehrX8nk&amp;quot;} with a &amp;quot;Try again&amp;quot; button. Below, the assistant responds: &amp;quot;I&amp;#39;ll play &amp;quot;Vampire&amp;quot; by Olivia Rodrigo in the Music app using AppleScript.&amp;quot; A Bash command panel is open showing an osascript command: osascript -e &amp;#39;tell application &amp;quot;Music&amp;quot; activate set searchResults to search playlist &amp;quot;Library&amp;quot; for &amp;quot;vampire Olivia Rodrigo&amp;quot; if (count of searchResults) &amp;gt; 0 then play item 1 of searchResults else return &amp;quot;Song not found in library&amp;quot; end if end tell&amp;#39;" style="max-width: 80%;" /&gt;&lt;/p&gt;

&lt;p&gt;Restarting the program on the machine also causes existing sessions to start returning mysterious API errors rather than neatly explaining that the session has terminated.&lt;/p&gt;
&lt;p&gt;I expect they'll iron out all of these issues relatively quickly. It's interesting to contrast this with solutions like OpenClaw, where one of the big selling points is the ability to control your personal device from your phone.&lt;/p&gt;
&lt;p&gt;Claude Code still doesn't have a documented mechanism for running things on a schedule, which is the other killer feature of the Claw category of software.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I spoke too soon: also today Anthropic announced &lt;a href="https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-cowork"&gt;Schedule recurring tasks in Cowork&lt;/a&gt;, Claude Code's &lt;a href="https://simonwillison.net/2026/Jan/12/claude-cowork/"&gt;general agent sibling&lt;/a&gt;. These do include an important limitation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scheduled tasks only run while your computer is awake and the Claude Desktop app is open. If your computer is asleep or the app is closed when a task is scheduled to run, Cowork will skip the task, then run it automatically once your computer wakes up or you open the desktop app again.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I really hope they're working on a Cowork Cloud product.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/claudeai/status/2026418433911603668"&gt;@claudeai&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openclaw"&gt;openclaw&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/applescript"&gt;applescript&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="ai"/><category term="claude-code"/><category term="llms"/><category term="coding-agents"/><category term="generative-ai"/><category term="openclaw"/><category term="applescript"/></entry><entry><title>The Claude C Compiler: What It Reveals About the Future of Software</title><link href="https://simonwillison.net/2026/Feb/22/ccc/#atom-tag" rel="alternate"/><published>2026-02-22T23:58:43+00:00</published><updated>2026-02-22T23:58:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/ccc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software"&gt;The Claude C Compiler: What It Reveals About the Future of Software&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;On February 5th Anthropic's Nicholas Carlini wrote about a project that used &lt;a href="https://www.anthropic.com/engineering/building-c-compiler"&gt;parallel Claudes to build a C compiler&lt;/a&gt; on top of the brand new Opus 4.6.&lt;/p&gt;
&lt;p&gt;Chris Lattner (Swift, LLVM, Clang, Mojo) knows more about C compilers than most. He just published this review of the code.&lt;/p&gt;
&lt;p&gt;Some points that stood out to me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Good software depends on judgment, communication, and clear abstraction. AI has amplified this.&lt;/li&gt;
&lt;li&gt;AI coding is automation of implementation, so design and stewardship become more important.&lt;/li&gt;
&lt;li&gt;Manual rewrites and translation work are becoming AI-native tasks, automating a large category of engineering effort.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Chris is generally impressed with CCC (the Claude C Compiler):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Taken together, CCC looks less like an experimental research compiler and more like a competent textbook implementation, the sort of system a strong undergraduate team might build early in a project before years of refinement. That alone is remarkable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's a long way from being a production-ready compiler though:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Several design choices suggest optimization toward passing tests rather than building general abstractions like a human would. [...] These flaws are informative rather than surprising, suggesting that current AI systems excel at assembling known techniques and optimizing toward measurable success criteria, while struggling with the open-ended generalization required for production-quality systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The project also leads to deep open questions about how agentic engineering interacts with licensing and IP for both open source and proprietary code:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If AI systems trained on decades of publicly available code can reproduce familiar structures, patterns, and even specific implementations, where exactly is the boundary between learning and copying?&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/compilers"&gt;compilers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="compilers"/><category term="anthropic"/><category term="claude"/><category term="nicholas-carlini"/><category term="ai"/><category term="open-source"/><category term="coding-agents"/><category term="ai-assisted-programming"/><category term="c"/><category term="agentic-engineering"/></entry><entry><title>Quoting Thariq Shihipar</title><link href="https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-tag" rel="alternate"/><published>2026-02-20T07:13:19+00:00</published><updated>2026-02-20T07:13:19+00:00</updated><id>https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/trq212/status/2024574133011673516"&gt;&lt;p&gt;Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. [...]&lt;/p&gt;
&lt;p&gt;At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they're too low.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/trq212/status/2024574133011673516"&gt;Thariq Shihipar&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="prompt-engineering"/><category term="anthropic"/><category term="claude-code"/><category term="ai-agents"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>SWE-bench February 2026 leaderboard update</title><link href="https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag" rel="alternate"/><published>2026-02-19T04:48:47+00:00</published><updated>2026-02-19T04:48:47+00:00</updated><id>https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.swebench.com/"&gt;SWE-bench February 2026 leaderboard update&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated, but they just did a full run against the current generation of models - notable because it's always good to see benchmark results like this that &lt;em&gt;weren't&lt;/em&gt; self-reported by the labs.&lt;/p&gt;
&lt;p&gt;The fresh results are for their "Bash Only" benchmark, which runs their &lt;a href="https://github.com/SWE-agent/mini-swe-agent"&gt;mini-swe-agent&lt;/a&gt; (~9,000 lines of Python, &lt;a href="https://github.com/SWE-agent/mini-swe-agent/blob/v2.2.1/src/minisweagent/config/benchmarks/swebench.yaml"&gt;here are the prompts&lt;/a&gt; they use) against the &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench"&gt;SWE-bench&lt;/a&gt; dataset of coding problems - 2,294 real-world examples pulled from 12 open source repos: &lt;a href="https://github.com/django/django"&gt;django/django&lt;/a&gt; (850), &lt;a href="https://github.com/sympy/sympy"&gt;sympy/sympy&lt;/a&gt; (386), &lt;a href="https://github.com/scikit-learn/scikit-learn"&gt;scikit-learn/scikit-learn&lt;/a&gt; (229), &lt;a href="https://github.com/sphinx-doc/sphinx"&gt;sphinx-doc/sphinx&lt;/a&gt; (187), &lt;a href="https://github.com/matplotlib/matplotlib"&gt;matplotlib/matplotlib&lt;/a&gt; (184), &lt;a href="https://github.com/pytest-dev/pytest"&gt;pytest-dev/pytest&lt;/a&gt; (119), &lt;a href="https://github.com/pydata/xarray"&gt;pydata/xarray&lt;/a&gt; (110), &lt;a href="https://github.com/astropy/astropy"&gt;astropy/astropy&lt;/a&gt; (95), &lt;a href="https://github.com/pylint-dev/pylint"&gt;pylint-dev/pylint&lt;/a&gt; (57), &lt;a href="https://github.com/psf/requests"&gt;psf/requests&lt;/a&gt; (44), &lt;a href="https://github.com/mwaskom/seaborn"&gt;mwaskom/seaborn&lt;/a&gt; (22), &lt;a href="https://github.com/pallets/flask"&gt;pallets/flask&lt;/a&gt; (11).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Correction&lt;/strong&gt;: &lt;em&gt;The Bash only benchmark runs against SWE-bench Verified, not original SWE-bench. Verified is a manually curated subset of 500 samples &lt;a href="https://openai.com/index/introducing-swe-bench-verified/"&gt;described here&lt;/a&gt;, funded by OpenAI. Here's &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified"&gt;SWE-bench Verified&lt;/a&gt; on Hugging Face - since it's just 2.1MB of Parquet it's easy to browse &lt;a href="https://lite.datasette.io/?parquet=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fprinceton-nlp%2FSWE-bench_Verified%2Fresolve%2Fmain%2Fdata%2Ftest-00000-of-00001.parquet#/data/test-00000-of-00001?_facet=repo"&gt;using Datasette Lite&lt;/a&gt;, which cuts those numbers down to django/django (231), sympy/sympy (75), sphinx-doc/sphinx (44), matplotlib/matplotlib (34), scikit-learn/scikit-learn (32), astropy/astropy (22), pydata/xarray (22), pytest-dev/pytest (19), pylint-dev/pylint (10), psf/requests (8), mwaskom/seaborn (2), pallets/flask (1)&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Here's how the top ten models performed:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart showing &amp;quot;% Resolved&amp;quot; by &amp;quot;Model&amp;quot;. Bars in descending order: Claude 4.5 Opus (high reasoning) 76.8%, Gemini 3 Flash (high reasoning) 75.8%, MiniMax M2.5 (high reasoning) 75.8%, Claude Opus 4.6 75.6%, GLM-5 (high reasoning) 72.8%, GPT-5.2 (high reasoning) 72.8%, Claude 4.5 Sonnet (high reasoning) 72.8%, Kimi K2.5 (high reasoning) 71.4%, DeepSeek V3.2 (high reasoning) 70.8%, Claude 4.5 Haiku (high reasoning) 70.0%, and a partially visible final bar at 66.6%." src="https://static.simonwillison.net/static/2026/swbench-feb-2026.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It's interesting to see Claude Opus 4.5 beat Opus 4.6, though only by about a percentage point. 4.5 Opus is top, then Gemini 3 Flash, then MiniMax M2.5 - a 229B model released &lt;a href="https://www.minimax.io/news/minimax-m25"&gt;last week&lt;/a&gt; by Chinese lab MiniMax. GLM-5, Kimi K2.5 and DeepSeek V3.2 are three more Chinese models that make the top ten as well.&lt;/p&gt;
&lt;p&gt;OpenAI's GPT-5.2 is their highest performing model at position 6, but it's worth noting that their best coding model, GPT-5.3-Codex, is not represented - maybe because it's not yet available in the OpenAI API.&lt;/p&gt;
&lt;p&gt;This benchmark uses the same system prompt for every model, which is important for a fair comparison but does mean that the quality of the different harnesses or optimized prompts is not being measured here.&lt;/p&gt;
&lt;p&gt;The chart above is a screenshot from the SWE-bench website, but their charts don't include the actual percentage values visible on the bars. I successfully used Claude for Chrome to add these - &lt;a href="https://claude.ai/share/81a0c519-c727-4caa-b0d4-0d866375d0da"&gt;transcript here&lt;/a&gt;. My prompt sequence included:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Use claude in chrome to open https://www.swebench.com/&lt;/p&gt;
&lt;p&gt;Click on "Compare results" and then select "Select top 10"&lt;/p&gt;
&lt;p&gt;See those bar charts? I want them to display the percentage on each bar so I can take a better screenshot, modify the page like that&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm impressed at how well this worked - Claude injected custom JavaScript into the page to draw additional labels on top of the existing chart.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Claude AI conversation showing browser automation. A thinking step reads &amp;quot;Pivoted strategy to avoid recursion issues with chart labeling &amp;gt;&amp;quot; followed by the message &amp;quot;Good, the chart is back. Now let me carefully add the labels using an inline plugin on the chart instance to avoid the recursion issue.&amp;quot; A collapsed &amp;quot;Browser_evaluate&amp;quot; section shows a browser_evaluate tool call with JavaScript code using Chart.js canvas context to draw percentage labels on bars: meta.data.forEach((bar, index) =&amp;gt; { const value = dataset.data[index]; if (value !== undefined &amp;amp;&amp;amp; value !== null) { ctx.save(); ctx.textAlign = 'center'; ctx.textBaseline = 'bottom'; ctx.fillStyle = '#333'; ctx.font = 'bold 12px sans-serif'; ctx.fillText(value.toFixed(1) + '%', bar.x, bar.y - 5); A pending step reads &amp;quot;Let me take a screenshot to see if it worked.&amp;quot; followed by a completed &amp;quot;Done&amp;quot; step, and the message &amp;quot;Let me take a screenshot to check the result.&amp;quot;" src="https://static.simonwillison.net/static/2026/claude-chrome-draw-on-chart.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: If you look at the transcript, Claude claims to have switched to Playwright, which is confusing because I didn't think I had that configured.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/KLieret/status/2024176335782826336"&gt;@KLieret&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browser-agents"&gt;browser-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/benchmarks"&gt;benchmarks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/minimax"&gt;minimax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;&lt;/p&gt;



</summary><category term="browser-agents"/><category term="anthropic"/><category term="claude"/><category term="openai"/><category term="benchmarks"/><category term="ai"/><category term="ai-in-china"/><category term="llms"/><category term="minimax"/><category term="coding-agents"/><category term="generative-ai"/><category term="django"/></entry><entry><title>Introducing Claude Sonnet 4.6</title><link href="https://simonwillison.net/2026/Feb/17/claude-sonnet-46/#atom-tag" rel="alternate"/><published>2026-02-17T23:58:58+00:00</published><updated>2026-02-17T23:58:58+00:00</updated><id>https://simonwillison.net/2026/Feb/17/claude-sonnet-46/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/news/claude-sonnet-4-6"&gt;Introducing Claude Sonnet 4.6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sonnet 4.6 is out today, and Anthropic claim it offers similar performance to &lt;a href="https://simonwillison.net/2025/Nov/24/claude-opus/"&gt;November's Opus 4.5&lt;/a&gt; while maintaining the Sonnet pricing of $3/million input and $15/million output tokens (the Opus models are $5/$25). Here's &lt;a href="https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7a0b4484f84.pdf"&gt;the system card PDF&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sonnet 4.6 has a "reliable knowledge cutoff" of August 2025, compared to Opus 4.6's May 2025 and Haiku 4.5's February 2025. Both Opus and Sonnet default to 200,000 max input tokens but can stretch to 1 million in beta and at a higher cost.&lt;/p&gt;
&lt;p&gt;I just released &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.24"&gt;llm-anthropic 0.24&lt;/a&gt; with support for both Sonnet 4.6 and Opus 4.6. Claude Code &lt;a href="https://github.com/simonw/llm-anthropic/pull/65"&gt;did most of the work&lt;/a&gt; - the new models had a fiddly amount of extra details around adaptive thinking and no longer supporting prefixes, as described &lt;a href="https://platform.claude.com/docs/en/about-claude/models/migration-guide"&gt;in Anthropic's migration guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/b185576a95e9321b441f0a4dfc0e297c"&gt;what I got&lt;/a&gt; from:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx --with llm-anthropic llm 'Generate an SVG of a pelican riding a bicycle' -m claude-sonnet-4.6
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="The pelican has a jaunty top hat with a red band. There is a string between the upper and lower beaks for some reason. The bicycle frame is warped in the wrong way." src="https://static.simonwillison.net/static/2026/pelican-sonnet-4.6.png" /&gt;&lt;/p&gt;
&lt;p&gt;The SVG comments include:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Hat (fun accessory) --&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tried a second time and also got a top hat. Sonnet 4.6 apparently loves top hats!&lt;/p&gt;
&lt;p&gt;For comparison, here's the pelican Opus 4.5 drew me &lt;a href="https://simonwillison.net/2025/Nov/24/claude-opus/"&gt;in November&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The pelican is cute and looks pretty good. The bicycle is not great - the frame is wrong and the pelican is facing backwards when the handlebars appear to be forwards.There is also something that looks a bit like an egg on the handlebars." src="https://static.simonwillison.net/static/2025/claude-opus-4.5-pelican.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;And here's Anthropic's current best pelican, drawn by Opus 4.6 &lt;a href="https://simonwillison.net/2026/Feb/5/two-new-models/"&gt;on February 5th&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Slightly wonky bicycle frame but an excellent pelican, very clear beak and pouch, nice feathers." src="https://static.simonwillison.net/static/2026/opus-4.6-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;Opus 4.6 produces the best pelican beak/pouch. I do think the top hat from Sonnet 4.6 is a nice touch though.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47050488"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm"/><category term="anthropic"/><category term="claude"/><category term="llm-pricing"/><category term="ai"/><category term="llms"/><category term="llm-release"/><category term="generative-ai"/><category term="pelican-riding-a-bicycle"/><category term="claude-code"/></entry><entry><title>llm-anthropic 0.24</title><link href="https://simonwillison.net/2026/Feb/17/llm-anthropic/#atom-tag" rel="alternate"/><published>2026-02-17T23:51:23+00:00</published><updated>2026-02-17T23:51:23+00:00</updated><id>https://simonwillison.net/2026/Feb/17/llm-anthropic/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.24"&gt;llm-anthropic 0.24&lt;/a&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="claude"/><category term="anthropic"/></entry><entry><title>Increase web search accuracy and efficiency with dynamic filtering</title><link href="https://simonwillison.net/2026/Feb/17/dynamic-filtering/#atom-tag" rel="alternate"/><published>2026-02-17T16:38:49+00:00</published><updated>2026-02-17T16:38:49+00:00</updated><id>https://simonwillison.net/2026/Feb/17/dynamic-filtering/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://claude.com/blog/improved-web-search-with-dynamic-filtering"&gt;Increase web search accuracy and efficiency with dynamic filtering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Interesting new feature in the Claude API - yet more evidence that code execution really is the ultimate Swiss Army knife for improving the way LLMs work with data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Alongside Claude &lt;a href="https://www.anthropic.com/news/claude-opus-4-6"&gt;Opus 4.6&lt;/a&gt; and &lt;a href="https://www.anthropic.com/news/claude-sonnet-4-6"&gt;Sonnet 4.6&lt;/a&gt;, we're releasing new versions of our &lt;a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool"&gt;web search&lt;/a&gt; and &lt;a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-fetch-tool"&gt;web fetch&lt;/a&gt; tools. Claude can now natively write and execute code during web searches to filter results before they reach the context window, improving its accuracy and token efficiency. [...]&lt;/p&gt;
&lt;p&gt;To improve Claude’s performance on web searches, our web search and web fetch tools now automatically write and execute code to post-process query results. Instead of reasoning over full HTML files, Claude can dynamically filter the search results before loading them into context, keeping only what’s relevant and discarding the rest.&lt;/p&gt;
&lt;/blockquote&gt;
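&lt;p&gt;The core trick can be sketched in a few lines of Python. This is purely illustrative - Anthropic haven't published their implementation, and the function name and keyword-matching logic here are my own invention:&lt;/p&gt;

```python
import re

def filter_search_results(html: str, query_terms: list[str]) -> str:
    """Keep only the paragraphs that mention a query term, so the model
    never has to load the full page into its context window."""
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
    relevant = [
        p.strip() for p in paragraphs
        if any(term.lower() in p.lower() for term in query_terms)
    ]
    return "\n".join(relevant)

page = "<p>Pricing starts at $5/month.</p><p>Our office dog is named Rex.</p>"
print(filter_search_results(page, ["pricing"]))  # → Pricing starts at $5/month.
```

&lt;p&gt;The real system presumably generates this kind of filtering code on the fly for each query, but the payoff is the same: only the relevant lines ever reach the context window.&lt;/p&gt;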
&lt;p&gt;&lt;em&gt;(Draft post I forgot to publish until March 26th!)&lt;/em&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="generative-ai"/><category term="llm-tool-use"/><category term="ai"/><category term="llms"/></entry><entry><title>Rodney and Claude Code for Desktop</title><link href="https://simonwillison.net/2026/Feb/16/rodney-claude-code/#atom-tag" rel="alternate"/><published>2026-02-16T16:38:57+00:00</published><updated>2026-02-16T16:38:57+00:00</updated><id>https://simonwillison.net/2026/Feb/16/rodney-claude-code/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm a very heavy user of &lt;a href="https://code.claude.com/docs/en/claude-code-on-the-web"&gt;Claude Code on the web&lt;/a&gt;, Anthropic's excellent but poorly named cloud version of Claude Code where everything runs in a container environment managed by them, greatly reducing the risk of anything bad happening to a computer I care about.&lt;/p&gt;
&lt;p&gt;I don't use the web interface at all (hence my dislike of the name) - I access it exclusively through their native iPhone and Mac desktop apps.&lt;/p&gt;
&lt;p&gt;Something I particularly appreciate about the desktop app is that it lets you see images that Claude is "viewing" via its &lt;code&gt;Read /path/to/image&lt;/code&gt; tool. Here's what that looks like:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Claude Code session in Claude Desktop. Claude says: The debug page looks good - all items listed with titles and descriptions. Now let me check the nav
menu -  Analyzed menu image file - Bash uvx rodney open &amp;quot;http://localhost:8765/&amp;quot; 2&amp;gt;&amp;amp;1 &amp;amp;&amp;amp; uvx rodney click &amp;quot;details.nav-menu summary&amp;quot; 2&amp;gt;&amp;amp;1 &amp;amp;&amp;amp; sleep 0.5 &amp;amp;&amp;amp; uvx rodney screenshot /tmp/menu.png 2&amp;gt;&amp;amp;1 Output reads: Datasette: test, Clicked, /tmp/menu.png - then it says Read /tmp/menu.png and reveals a screenshot of the Datasette interface with the nav menu open, showing only &amp;quot;Debug&amp;quot; and &amp;quot;Log out&amp;quot; options. Claude continues: The menu now has just &amp;quot;Debug&amp;quot; and &amp;quot;Log out&amp;quot; — much cleaner. Both pages look good. Let me clean up the server and run the remaining tests." src="https://static.simonwillison.net/static/2026/rodney-claude-desktop.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;This means you can get a visual preview of what it's working on while it's working, without waiting for it to push code to GitHub for you to try out yourself later on.&lt;/p&gt;
&lt;p&gt;The prompt I used to trigger the above screenshot was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Run "uvx rodney --help" and then use Rodney to manually test the new pages and menu - look at screenshots from it and check you think they look OK&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I designed &lt;a href="https://simonwillison.net/2026/Feb/10/showboat-and-rodney/#rodney-cli-browser-automation-designed-to-work-with-showboat"&gt;Rodney&lt;/a&gt; to have &lt;a href="https://github.com/simonw/rodney/blob/main/help.txt"&gt;--help output&lt;/a&gt; that provides everything a coding agent needs to know in order to use the tool.&lt;/p&gt;
&lt;p&gt;The Claude iPhone app doesn't display opened images yet, so I &lt;a href="https://twitter.com/simonw/status/2023432616066879606"&gt;requested it as a feature&lt;/a&gt; just now in a thread on Twitter.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rodney"&gt;rodney&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="ai"/><category term="claude-code"/><category term="llms"/><category term="async-coding-agents"/><category term="coding-agents"/><category term="generative-ai"/><category term="projects"/><category term="ai-assisted-programming"/><category term="rodney"/></entry><entry><title>Quoting Boris Cherny</title><link href="https://simonwillison.net/2026/Feb/14/boris/#atom-tag" rel="alternate"/><published>2026-02-14T23:59:09+00:00</published><updated>2026-02-14T23:59:09+00:00</updated><id>https://simonwillison.net/2026/Feb/14/boris/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/bcherny/status/2022762422302576970"&gt;&lt;p&gt;Someone has to prompt the Claudes, talk to customers, coordinate with other teams, decide what to build next. Engineering is changing and great engineers are more important than ever.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/bcherny/status/2022762422302576970"&gt;Boris Cherny&lt;/a&gt;, Claude Code creator, on why Anthropic are still hiring developers&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="careers"/><category term="anthropic"/><category term="ai"/><category term="claude-code"/><category term="llms"/><category term="coding-agents"/><category term="ai-assisted-programming"/><category term="generative-ai"/></entry><entry><title>Anthropic's public benefit mission</title><link href="https://simonwillison.net/2026/Feb/13/anthropic-public-benefit-mission/#atom-tag" rel="alternate"/><published>2026-02-13T23:59:51+00:00</published><updated>2026-02-13T23:59:51+00:00</updated><id>https://simonwillison.net/2026/Feb/13/anthropic-public-benefit-mission/#atom-tag</id><summary type="html">
    &lt;p&gt;Someone &lt;a href="https://news.ycombinator.com/item?id=47008560#47008978"&gt;asked&lt;/a&gt; if there was an Anthropic equivalent to &lt;a href="https://simonwillison.net/2026/Feb/13/openai-mission-statement/"&gt;OpenAI's IRS mission statements over time&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Anthropic are a "public benefit corporation" but not a non-profit, so they don't have the same requirements to file public documents with the IRS every year.&lt;/p&gt;
&lt;p&gt;But when I asked Claude it ran a search and dug up this &lt;a href="https://drive.google.com/drive/folders/1ImqXYv9_H2FTNAujZfu3EPtYFD4xIlHJ"&gt;Google Drive folder&lt;/a&gt; where Zach Stein-Perlman shared Certificate of Incorporation documents he &lt;a href="https://ailabwatch.substack.com/p/anthropics-certificate-of-incorporation"&gt;obtained from the State of Delaware&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Anthropic's are much less interesting than OpenAI's. The earliest document from 2021 states:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The specific public benefit that the Corporation will promote is to responsibly develop and maintain advanced AI for the cultural, social and technological improvement of humanity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Every subsequent document up to 2024 uses an updated version which says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The specific public benefit that the Corporation will promote is to responsibly develop and maintain advanced AI for the long term benefit of humanity.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="anthropic"/><category term="ai"/></entry><entry><title>Quoting Anthropic</title><link href="https://simonwillison.net/2026/Feb/12/anthropic/#atom-tag" rel="alternate"/><published>2026-02-12T20:22:14+00:00</published><updated>2026-02-12T20:22:14+00:00</updated><id>https://simonwillison.net/2026/Feb/12/anthropic/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation"&gt;&lt;p&gt;Claude Code was made available to the general public in May 2025. Today, Claude Code’s run-rate revenue has grown to over $2.5 billion; this figure has more than doubled since the beginning of 2026. The number of weekly active Claude Code users has also doubled since January 1 [&lt;em&gt;six weeks ago&lt;/em&gt;].&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation"&gt;Anthropic&lt;/a&gt;, announcing their $30 billion series G&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="coding-agents"/><category term="anthropic"/><category term="claude-code"/><category term="ai-agents"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Covering electricity price increases from our data centers</title><link href="https://simonwillison.net/2026/Feb/12/covering-electricity-price-increases/#atom-tag" rel="alternate"/><published>2026-02-12T20:01:23+00:00</published><updated>2026-02-12T20:01:23+00:00</updated><id>https://simonwillison.net/2026/Feb/12/covering-electricity-price-increases/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/news/covering-electricity-price-increases"&gt;Covering electricity price increases from our data centers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One of the sub-threads of the AI energy usage discourse has been the impact new data centers have on the cost of electricity to nearby residents. Here's &lt;a href="https://www.bloomberg.com/graphics/2025-ai-data-centers-electricity-prices/"&gt;detailed analysis from Bloomberg in September&lt;/a&gt; reporting "Wholesale electricity costs as much as 267% more than it did five years ago in areas near data centers".&lt;/p&gt;
&lt;p&gt;Anthropic appear to be taking on this aspect of the problem directly, promising to cover 100% of necessary grid upgrade costs and also saying:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We will work to bring net-new power generation online to match our data centers’ electricity needs. Where new generation isn’t online, we’ll work with utilities and external experts to estimate and cover demand-driven price effects from our data centers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I look forward to genuine energy industry experts picking this apart to judge if it will actually have the claimed impact on consumers.&lt;/p&gt;
&lt;p&gt;As always, I remain frustrated at the refusal of the major AI labs to fully quantify their energy usage. The best data we've had on this still comes from Mistral's report &lt;a href="https://simonwillison.net/2025/Jul/22/mistral-environmental-standard/"&gt;last July&lt;/a&gt; and even that lacked key data such as the breakdown between energy usage for training vs inference.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://x.com/anthropicai/status/2021694494215901314"&gt;@anthropicai&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-energy-usage"&gt;ai-energy-usage&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="ai-energy-usage"/><category term="anthropic"/><category term="ai"/></entry><entry><title>Quoting Thomas Ptacek</title><link href="https://simonwillison.net/2026/Feb/8/thomas-ptacek/#atom-tag" rel="alternate"/><published>2026-02-08T02:25:53+00:00</published><updated>2026-02-08T02:25:53+00:00</updated><id>https://simonwillison.net/2026/Feb/8/thomas-ptacek/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/tqbf/status/2019493645888462993"&gt;&lt;p&gt;People on the orange site are laughing at this, assuming it's just an ad and that there's nothing to it. Vulnerability researchers I talk to do not think this is a joke. As an erstwhile vuln researcher myself: do not bet against LLMs on this.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.axios.com/2026/02/05/anthropic-claude-opus-46-software-hunting"&gt;Axios: Anthropic's Claude Opus 4.6 uncovers 500 zero-day flaws in open-source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think vulnerability research might be THE MOST LLM-amenable software engineering problem. Pattern-driven. Huge corpus of operational public patterns. Closed loops. Forward progress from stimulus/response tooling. Search problems.&lt;/p&gt;
&lt;p&gt;Vulnerability research outcomes are in THE MODEL CARDS for frontier labs. Those companies have so much money they're literally distorting the economy. Money buys vuln research outcomes. Why would you think they were faking any of this?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/tqbf/status/2019493645888462993"&gt;Thomas Ptacek&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="thomas-ptacek"/><category term="anthropic"/><category term="claude"/><category term="security"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="open-source"/><category term="ai-security-research"/></entry><entry><title>Claude: Speed up responses with fast mode</title><link href="https://simonwillison.net/2026/Feb/7/claude-fast-mode/#atom-tag" rel="alternate"/><published>2026-02-07T23:10:33+00:00</published><updated>2026-02-07T23:10:33+00:00</updated><id>https://simonwillison.net/2026/Feb/7/claude-fast-mode/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/fast-mode"&gt;Claude: Speed up responses with fast mode&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New "research preview" from Anthropic today: you can now access a faster version of their frontier model Claude Opus 4.6 by typing &lt;code&gt;/fast&lt;/code&gt; in Claude Code... but at a cost that's 6x the normal price.&lt;/p&gt;
&lt;p&gt;Opus is usually $5/million input and $25/million output. The new fast mode is $30/million input and $150/million output!&lt;/p&gt;
&lt;p&gt;There's a 50% discount until the end of February 16th, so only a 3x multiple (!) before then.&lt;/p&gt;
&lt;p&gt;How much faster is it? The linked documentation doesn't say, but &lt;a href="https://x.com/claudeai/status/2020207322124132504"&gt;on Twitter&lt;/a&gt; Claude say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our teams have been building with a 2.5x-faster version of Claude Opus 4.6.&lt;/p&gt;
&lt;p&gt;We’re now making it available as an early experiment via Claude Code and our API.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude Opus 4.5 had a context limit of 200,000 tokens. 4.6 has an option to increase that to 1,000,000 at 2x the input price ($10/m) and 1.5x the output price ($37.50/m) once your input exceeds 200,000 tokens. These multiples hold for fast mode too, so after Feb 16th you'll be able to pay a hefty $60/m input and $225/m output for the fastest version of Anthropic's best model.&lt;/p&gt;
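&lt;p&gt;Since all of those tiers reduce to multipliers on the base Opus rates, here's a back-of-the-envelope cost helper (my own sketch, not an official calculator - it ignores the long-context surcharge to stay short):&lt;/p&gt;

```python
def opus_cost(input_tokens: int, output_tokens: int,
              fast: bool = False, discount: bool = False) -> float:
    """Dollar cost of an Opus 4.6 call from the published rates:
    $5/M input and $25/M output, with fast mode at a 6x multiple
    (3x while the 50% launch discount lasts). Long-context pricing
    for inputs over 200,000 tokens is deliberately omitted."""
    multiplier = (3 if discount else 6) if fast else 1
    input_rate = 5.0 * multiplier    # $/million input tokens
    output_rate = 25.0 * multiplier  # $/million output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 100k input / 10k output: $0.75 normally, $4.50 on fast mode
print(opus_cost(100_000, 10_000))             # 0.75
print(opus_cost(100_000, 10_000, fast=True))  # 4.5
```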


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm-performance"/><category term="anthropic"/><category term="claude"/><category term="claude-code"/><category term="llm-pricing"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Opus 4.6 and Codex 5.3</title><link href="https://simonwillison.net/2026/Feb/5/two-new-models/#atom-tag" rel="alternate"/><published>2026-02-05T20:29:20+00:00</published><updated>2026-02-05T20:29:20+00:00</updated><id>https://simonwillison.net/2026/Feb/5/two-new-models/#atom-tag</id><summary type="html">
    &lt;p&gt;Two major new model releases today, within about 15 minutes of each other.&lt;/p&gt;
&lt;p&gt;Anthropic &lt;a href="https://www.anthropic.com/news/claude-opus-4-6"&gt;released Opus 4.6&lt;/a&gt;. Here's &lt;a href="https://gist.github.com/simonw/a6806ce41b4c721e240a4548ecdbe216"&gt;its pelican&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Slightly wonky bicycle frame but an excellent pelican, very clear beak and pouch, nice feathers." src="https://static.simonwillison.net/static/2026/opus-4.6-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;OpenAI &lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/"&gt;released GPT-5.3-Codex&lt;/a&gt;, albeit only via their Codex app, not yet in their API. Here's &lt;a href="https://gist.github.com/simonw/bfc4a83f588ac762c773679c0d1e034b"&gt;its pelican&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Not nearly as good - the bicycle is a bit mangled, the pelican not nearly as well rendered - it's more of a line drawing." src="https://static.simonwillison.net/static/2026/codex-5.3-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;I've had a bit of preview access to both of these models and to be honest I'm finding it hard to find a good angle to write about them - they're both &lt;em&gt;really good&lt;/em&gt;, but so were their predecessors Codex 5.2 and Opus 4.5. I've been having trouble finding tasks that those previous models couldn't handle but the new ones are able to ace.&lt;/p&gt;
&lt;p&gt;The most convincing story about capabilities of the new model so far is Nicholas Carlini from Anthropic talking about Opus 4.6 and &lt;a href="https://www.anthropic.com/engineering/building-c-compiler"&gt;Building a C compiler with a team of parallel Claudes&lt;/a&gt; - Anthropic's version of Cursor's &lt;a href="https://simonwillison.net/2026/Jan/23/fastrender/"&gt;FastRender project&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm-release"/><category term="anthropic"/><category term="generative-ai"/><category term="openai"/><category term="pelican-riding-a-bicycle"/><category term="ai"/><category term="llms"/><category term="parallel-agents"/><category term="c"/><category term="nicholas-carlini"/></entry><entry><title>Claude's new constitution</title><link href="https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-tag" rel="alternate"/><published>2026-01-21T23:39:49+00:00</published><updated>2026-01-21T23:39:49+00:00</updated><id>https://simonwillison.net/2026/Jan/21/claudes-new-constitution/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/news/claude-new-constitution"&gt;Claude&amp;#x27;s new constitution&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Late last year Richard Weiss &lt;a href="https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document"&gt;found something interesting&lt;/a&gt; while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was &lt;em&gt;not&lt;/em&gt; part of the system prompt but appeared instead to be baked in during training, and which described Claude's core values at great length.&lt;/p&gt;
&lt;p&gt;He called this leak the &lt;strong&gt;soul document&lt;/strong&gt;, and Amanda Askell from Anthropic &lt;a href="https://simonwillison.net/2025/Dec/2/claude-soul-document/"&gt;quickly confirmed&lt;/a&gt; that it was indeed part of Claude's training procedures.&lt;/p&gt;
&lt;p&gt;Today Anthropic made this official, &lt;a href="https://www.anthropic.com/news/claude-new-constitution"&gt;releasing that full "constitution" document&lt;/a&gt; under a CC0 (effectively public domain) license. There's a lot to absorb! It's over 35,000 tokens, more than 10x the length of the &lt;a href="https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-5"&gt;published Opus 4.5 system prompt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One detail that caught my eye is the acknowledgements at the end, which include a list of &lt;a href="https://www.anthropic.com/constitution#acknowledgements"&gt;external contributors&lt;/a&gt; who helped review the document. I was intrigued to note that two of the fifteen listed names are Catholic members of the clergy - &lt;a href="https://www.frbrendanmcguire.org/biography"&gt;Father Brendan McGuire&lt;/a&gt; is a pastor in Los Altos with a Master’s degree in Computer Science and Math and &lt;a href="https://en.wikipedia.org/wiki/Paul_Tighe"&gt;Bishop Paul Tighe&lt;/a&gt; is an Irish Catholic bishop with a background in moral theology.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-personality"&gt;ai-personality&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/amanda-askell"&gt;amanda-askell&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="ai-personality"/><category term="amanda-askell"/><category term="ai"/><category term="llms"/><category term="ai-ethics"/><category term="generative-ai"/></entry><entry><title>Claude Cowork Exfiltrates Files</title><link href="https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-tag" rel="alternate"/><published>2026-01-14T22:15:22+00:00</published><updated>2026-01-14T22:15:22+00:00</updated><id>https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files"&gt;Claude Cowork Exfiltrates Files&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.&lt;/p&gt;
&lt;p&gt;Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the &lt;code&gt;https://api.anthropic.com/v1/files&lt;/code&gt; endpoint, allowing the attacker to retrieve their content later.&lt;/p&gt;
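&lt;p&gt;It's a neat demonstration of why domain allowlists alone can't stop exfiltration. A toy model of such a filter (the allowlist contents here are hypothetical, not Cowork's actual list) shows the problem - it checks where the data is going, not whose credentials the request carries:&lt;/p&gt;

```python
from urllib.parse import urlparse

# Hypothetical allowlist - the attack only needs api.anthropic.com on it.
ALLOWED_DOMAINS = {"api.anthropic.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    """Toy egress filter: permit a request only if its host is allowlisted.
    The flaw: it inspects the destination, not the credentials, so an
    upload authorized by an attacker's API key sails straight through."""
    return urlparse(url).hostname in ALLOWED_DOMAINS

print(egress_allowed("https://api.anthropic.com/v1/files"))  # True
print(egress_allowed("https://evil.example.com/upload"))     # False
```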

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46622328"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lethal-trifecta"&gt;lethal-trifecta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/exfiltration-attacks"&gt;exfiltration-attacks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-cowork"&gt;claude-cowork&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="ai-agents"/><category term="ai"/><category term="claude-code"/><category term="llms"/><category term="prompt-injection"/><category term="security"/><category term="generative-ai"/><category term="lethal-trifecta"/><category term="exfiltration-attacks"/><category term="claude-cowork"/></entry><entry><title>Anthropic invests $1.5 million in the Python Software Foundation and open source security</title><link href="https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-tag" rel="alternate"/><published>2026-01-13T23:58:17+00:00</published><updated>2026-01-13T23:58:17+00:00</updated><id>https://simonwillison.net/2026/Jan/13/anthropic-invests-15-million-in-the-python-software-foundation-a/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pyfound.blogspot.com/2025/12/anthropic-invests-in-python.html?m=1"&gt;Anthropic invests $1.5 million in the Python Software Foundation and open source security&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is outstanding news, especially given our decision to withdraw from that NSF grant application &lt;a href="https://simonwillison.net/2025/Oct/27/psf-withdrawn-proposal/"&gt;back in October&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note that while security is a focus these funds will also support other aspects of the PSF's work:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="anthropic"/><category term="python"/><category term="ai"/><category term="psf"/></entry><entry><title>First impressions of Claude Cowork, Anthropic's general agent</title><link href="https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-tag" rel="alternate"/><published>2026-01-12T21:46:13+00:00</published><updated>2026-01-12T21:46:13+00:00</updated><id>https://simonwillison.net/2026/Jan/12/claude-cowork/#atom-tag</id><summary type="html">
    &lt;p&gt;New from Anthropic today is &lt;a href="https://claude.com/blog/cowork-research-preview"&gt;Claude Cowork&lt;/a&gt;, a "research preview" that they describe as "Claude Code for the rest of your work". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. &lt;strong&gt;Update 16th January 2026&lt;/strong&gt;: it's now also available to $20/month Claude Pro subscribers.&lt;/p&gt;
&lt;p&gt;I've been saying for a while now that Claude Code is a "general agent" disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you're doing with it! What it really needs is a UI that doesn't involve the terminal and a name that doesn't scare away non-developers.&lt;/p&gt;
&lt;p&gt;"Cowork" is a pretty solid choice on the name front!&lt;/p&gt;
&lt;h4 id="what-it-looks-like"&gt;What it looks like&lt;/h4&gt;
&lt;p&gt;The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs.&lt;/p&gt;
&lt;p&gt;It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a folder of files. It then starts work.&lt;/p&gt;
&lt;p&gt;I tried it out against my perpetually growing "blog-drafts" folder with the following prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/claude-cowork.jpg" alt="Screenshot of Claude AI desktop application showing a &amp;quot;Cowork&amp;quot; task interface. Left sidebar shows tabs for &amp;quot;Chat&amp;quot;, &amp;quot;Code&amp;quot;, and &amp;quot;Cowork&amp;quot; (selected), with &amp;quot;+ New task&amp;quot; button and a task titled &amp;quot;Review unpublished drafts for pu...&amp;quot; listed below. Text reads &amp;quot;These tasks run locally and aren't synced across devices&amp;quot;. Main panel header shows &amp;quot;Review unpublished drafts for publication&amp;quot;. User message in green bubble reads: &amp;quot;Look at my drafts that were started within the last three months and then check that I didn't publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready&amp;quot;. Claude responds: &amp;quot;I'll help you find drafts from the last three months and check if they've been published. Let me start by looking at your drafts folder.&amp;quot; Below is an expanded &amp;quot;Running command&amp;quot; section showing Request JSON with command: find /sessions/zealous-bold-ramanujan/mnt/blog-drafts -type f \\( -name \&amp;quot;*.md\&amp;quot; -o -name \&amp;quot;*.txt\&amp;quot; -o -name \&amp;quot;*.html\&amp;quot; \\) -mtime -90 -exec ls -la {} \\;, description: Find draft files modified in the last 90 days. Response text begins: &amp;quot;Found 46 draft files. Next let me read the content of each to get their titles/topics, then&amp;quot;. 
Right sidebar shows Progress section with three circular indicators (two checked, one pending) and text &amp;quot;Steps will show as the task unfolds.&amp;quot;, Artifacts section listing &amp;quot;publish-encouragement.html&amp;quot;, Context section with &amp;quot;Selected folders&amp;quot; showing &amp;quot;blog-drafts&amp;quot; folder, Connectors showing &amp;quot;Web search&amp;quot;, and Working files listing &amp;quot;llm-digest-october-2025.md&amp;quot;, &amp;quot;tests-not-optional-coding-agen...&amp;quot;, and &amp;quot;digest-november-2025.md&amp;quot;. Bottom shows reply input field, &amp;quot;Opus 4.5&amp;quot; model selector, user &amp;quot;Simon Willison&amp;quot; with &amp;quot;Max plan&amp;quot;, and disclaimer &amp;quot;Claude is AI and can make mistakes. Please double-check responses.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It started by running this command:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;find /sessions/zealous-bold-ramanujan/mnt/blog-drafts \
  -type f &lt;span class="pl-cce"&gt;\(&lt;/span&gt; -name &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;*.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -o -name &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;*.txt&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -o -name &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;*.html&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-cce"&gt;\)&lt;/span&gt; \
  -mtime -90 -exec ls -la {} &lt;span class="pl-cce"&gt;\;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;/sessions/zealous-bold-ramanujan/mnt/blog-drafts&lt;/code&gt; path instantly caught my eye. Anthropic say that Cowork can only access files you grant it access to - it looks to me like they're mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.&lt;/p&gt;
&lt;p&gt;It turns out I have 46 draft files from the past three months. Claude then went to work with its search tool, running 44 individual searches against &lt;code&gt;site:simonwillison.net&lt;/code&gt; to figure out which of my drafts had already been published.&lt;/p&gt;
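&lt;p&gt;Those 44 searches presumably amount to one site-restricted query per draft title. A minimal Python sketch of that pattern - a hypothetical helper for illustration, not Cowork's actual implementation:&lt;/p&gt;

```python
# Build one site-restricted search query per draft title.
# The site and query format are assumptions based on the
# "site:simonwillison.net" searches described above.
def site_queries(titles, site="simonwillison.net"):
    return [f'site:{site} "{title}"' for title in titles]


print(site_queries(["Frequently Argued Questions about LLMs"]))
# → ['site:simonwillison.net "Frequently Argued Questions about LLMs"']
```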
&lt;p&gt;Here's the eventual reply:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Based on my analysis, here are &lt;strong&gt;your unpublished drafts that appear closest to being ready for publication&lt;/strong&gt;:&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;🔥 Most Ready to Publish (substantial content, not yet published)&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;freq-argued-llms.md&lt;/code&gt;&lt;/strong&gt; - "Frequently Argued Questions about LLMs" (22,602 bytes)
&lt;ul&gt;
&lt;li&gt;This is a meaty piece documenting common arguments about LLMs with your counterpoints&lt;/li&gt;
&lt;li&gt;Well-structured with a TL;DR and multiple sections&lt;/li&gt;
&lt;li&gt;No matching published article found on your site&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Very close to ready&lt;/strong&gt; - just needs a final review pass&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;claude-code-timeline.md&lt;/code&gt;&lt;/strong&gt; - "Claude Code Timeline and Codex Timeline" (3,075 bytes)
&lt;ul&gt;
&lt;li&gt;About viewing JSONL session logs from Claude Code and Codex&lt;/li&gt;
&lt;li&gt;You published &lt;code&gt;claude-code-transcripts&lt;/code&gt; on Dec 25, but this appears to be a different/earlier piece about timeline viewing tools&lt;/li&gt;
&lt;li&gt;Shorter but seems complete&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;datasette-1a20-upgrade-instructions.md&lt;/code&gt;&lt;/strong&gt; - Plugin Upgrade Guide (3,147 bytes)
&lt;ul&gt;
&lt;li&gt;Technical guide for plugin authors&lt;/li&gt;
&lt;li&gt;You published the main 1.0a20 announcement but this companion upgrade guide appears unpublished&lt;/li&gt;
&lt;li&gt;Would be valuable for plugin maintainers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's a good response! It found exactly what I needed to see, although those upgrade instructions are actually published elsewhere now (&lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html"&gt;in the Datasette docs&lt;/a&gt;) and weren't actually intended for my blog.&lt;/p&gt;
&lt;p&gt;Just for fun, and because I &lt;a href="https://simonwillison.net/2024/Oct/21/claude-artifacts/"&gt;really like artifacts&lt;/a&gt;, I asked for a follow-up:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Make me an artifact with exciting animated encouragements to get me to do it&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's what I got:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/claude-cowork-artifact.jpg" alt="Screenshot of the same Claude AI desktop application Cowork interface, now showing completed task results. Left panel shows &amp;quot;1 step &amp;gt;&amp;quot; with link &amp;quot;View your animated encouragement page&amp;quot;. Claude's response reads: &amp;quot;I created an over-the-top animated encouragement page with:&amp;quot; followed by bullet points: &amp;quot;🚀 Pulsing rockets and bouncing stats&amp;quot;, &amp;quot;✨ Falling emoji rain and confetti&amp;quot;, &amp;quot;🔥 Dancing fire emojis around your draft title&amp;quot;, &amp;quot;💫 Sparkles that follow your mouse&amp;quot;, &amp;quot;📊 An animated '95% ready' progress bar&amp;quot;, &amp;quot;💬 Rotating motivational quotes&amp;quot;, &amp;quot;🎉 A 'I'M GONNA DO IT!' button that triggers an explosion of confetti when clicked&amp;quot;. Center shows an artifact preview of the generated HTML page with dark background featuring animated rocket emojis, large white text &amp;quot;PUBLISH TIME!&amp;quot;, stats showing &amp;quot;22,602 bytes of wisdom waiting&amp;quot;, &amp;quot;95% ready to ship&amp;quot;, infinity symbol with &amp;quot;future arguments saved&amp;quot;, and a fire emoji with yellow text &amp;quot;Frequently&amp;quot; (partially visible). Top toolbar shows &amp;quot;Open in Firefox&amp;quot; button. Right sidebar displays Progress section with checkmarks, Artifacts section with &amp;quot;publish-encouragement.html&amp;quot; selected, Context section showing &amp;quot;blog-drafts&amp;quot; folder, &amp;quot;Web search&amp;quot; connector, and Working files listing &amp;quot;llm-digest-october-2025.md&amp;quot;, &amp;quot;tests-not-optional-coding-agen...&amp;quot;, and &amp;quot;digest-november-2025.md&amp;quot;. Bottom shows reply input, &amp;quot;Opus 4.5&amp;quot; model selector, and disclaimer text." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I couldn't figure out how to close the right sidebar, so the artifact ended up cramped into a thin column, but it did work. I expect Anthropic will fix that display bug pretty quickly.&lt;/p&gt;
&lt;h4 id="isn-t-this-just-claude-code-"&gt;Isn't this just Claude Code?&lt;/h4&gt;
&lt;p&gt;I've seen a few people ask what the difference between this and regular Claude Code is. The answer is &lt;em&gt;not a lot&lt;/em&gt;. As far as I can tell Claude Cowork is regular Claude Code wrapped in a less intimidating default interface and with a filesystem sandbox configured for you without you needing to know what a "filesystem sandbox" is.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: It's more than just a filesystem sandbox - I had Claude Code reverse engineer the Claude app and &lt;a href="https://gist.github.com/simonw/35732f187edbe4fbd0bf976d013f22c8"&gt;it found out&lt;/a&gt; that Claude uses VZVirtualMachine - the Apple Virtualization Framework - and downloads and boots a custom Linux root filesystem.&lt;/p&gt;
&lt;p&gt;I think that's a really smart product. Claude Code has an enormous amount of value that hasn't yet been unlocked for a general audience, and this seems like a pragmatic approach.&lt;/p&gt;

&lt;h4 id="the-ever-present-threat-of-prompt-injection"&gt;The ever-present threat of prompt injection&lt;/h4&gt;
&lt;p&gt;With a feature like this, my first thought always jumps straight to security. How big is the risk that someone using this might be hit by hidden malicious instructions somewhere that break their computer or steal their data?&lt;/p&gt;
&lt;p&gt;Anthropic touch on that directly in the announcement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You should also be aware of the risk of "&lt;a href="https://www.anthropic.com/research/prompt-injection-defenses"&gt;prompt injections&lt;/a&gt;": attempts by attackers to alter Claude's plans through content it might encounter on the internet. We've built sophisticated defenses against prompt injections, but agent safety---that is, the task of securing Claude's real-world actions---is still an active area of development in the industry.&lt;/p&gt;
&lt;p&gt;These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation. We recommend taking precautions, particularly while you learn how it works. We provide more detail in our &lt;a href="https://support.claude.com/en/articles/13364135-using-cowork-safely"&gt;Help Center&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That help page includes the following tips:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To minimize risks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Avoid granting access to local files with sensitive information, like financial documents.&lt;/li&gt;
&lt;li&gt;When using the Claude in Chrome extension, limit access to trusted sites.&lt;/li&gt;
&lt;li&gt;If you chose to extend Claude’s default internet access settings, be careful to only extend internet access to sites you trust.&lt;/li&gt;
&lt;li&gt;Monitor Claude for suspicious actions that may indicate prompt injection.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I do not think it is fair to tell regular non-programmer users to watch out for "suspicious actions that may indicate prompt injection"!&lt;/p&gt;
&lt;p&gt;I'm sure they have some impressive mitigations going on behind the scenes. I recently learned, via &lt;a href="https://x.com/bcherny/status/1989025306980860226"&gt;this tweet&lt;/a&gt; from Claude Code creator Boris Cherny, that the summarization applied by the WebFetch function in Claude Code - and now in Cowork - is partly intended as a prompt injection protection layer:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But Anthropic are being honest here with their warnings: they can attempt to filter out potential attacks all they like, but the one thing they can't provide is a guarantee that no future attack will be found that sneaks through their defenses and steals your data (see &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/"&gt;the lethal trifecta&lt;/a&gt; for more on this.)&lt;/p&gt;
&lt;p&gt;The problem with prompt injection remains that until there's a high profile incident it's really hard to get people to take it seriously. I myself have all sorts of Claude Code usage that could cause havoc if a malicious injection got in. Cowork does at least run in a filesystem sandbox by default, which is more than can be said for my &lt;code&gt;claude --dangerously-skip-permissions&lt;/code&gt; habit!&lt;/p&gt;
&lt;p&gt;I wrote more about this in my 2025 round-up: &lt;a href="https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-year-of-yolo-and-the-normalization-of-deviance"&gt;The year of YOLO and the Normalization of Deviance&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="this-is-still-a-strong-signal-of-the-future"&gt;This is still a strong signal of the future&lt;/h4&gt;
&lt;p&gt;Security worries aside, Cowork represents something really interesting. This is a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience.&lt;/p&gt;
&lt;p&gt;I would be very surprised if Gemini and OpenAI don't follow suit with their own offerings in this category.&lt;/p&gt;
&lt;p&gt;I imagine OpenAI are already regretting burning the name "ChatGPT Agent" on their janky, experimental and mostly forgotten browser automation tool &lt;a href="https://simonwillison.net/2025/Aug/4/chatgpt-agents-user-agent/"&gt;back in August&lt;/a&gt;!&lt;/p&gt;
&lt;h4 id="bonus-and-a-silly-logo"&gt;Bonus: and a silly logo&lt;/h4&gt;
&lt;p&gt;bashtoni &lt;a href="https://news.ycombinator.com/item?id=46593022#46593553"&gt;on Hacker News&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Simple suggestion: logo should be a cow and and orc to match how I originally read the product name.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I couldn't resist &lt;a href="https://gist.github.com/simonw/d06dec3d62dee28f2bd993eb78beb2ce"&gt;throwing that one at Nano Banana&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/cow-ork.jpg" alt="An anthropic style logo with a cow and an ork on it" style="max-width: 100%;" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lethal-trifecta"&gt;lethal-trifecta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-cowork"&gt;claude-cowork&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sandboxing"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-agents"/><category term="claude-code"/><category term="lethal-trifecta"/><category term="claude-cowork"/></entry><entry><title>The November 2025 inflection point</title><link href="https://simonwillison.net/2026/Jan/4/inflection/#atom-tag" rel="alternate"/><published>2026-01-04T23:21:42+00:00</published><updated>2026-01-04T23:21:42+00:00</updated><id>https://simonwillison.net/2026/Jan/4/inflection/#atom-tag</id><summary type="html">
    &lt;p&gt;It genuinely feels to me like GPT-5.2 and Opus 4.5 in November represent an inflection point - one of those moments where the models get incrementally better in a way that tips across an invisible capability line where suddenly a whole bunch of much harder coding problems open up.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-4"&gt;claude-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="gpt-5"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="claude-4"/><category term="november-2025-inflection"/></entry></feed>