<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: gpt-5</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/gpt-5.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-04T23:21:42+00:00</updated><author><name>Simon Willison</name></author><entry><title>The November 2025 inflection point</title><link href="https://simonwillison.net/2026/Jan/4/inflection/#atom-tag" rel="alternate"/><published>2026-01-04T23:21:42+00:00</published><updated>2026-01-04T23:21:42+00:00</updated><id>https://simonwillison.net/2026/Jan/4/inflection/#atom-tag</id><summary type="html">
    &lt;p&gt;It genuinely feels to me like GPT-5.2 and Opus 4.5 in November represent an inflection point - one of those moments where the models get incrementally better in a way that tips across an invisible capability line where suddenly a whole bunch of much harder coding problems open up.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-4"&gt;claude-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="gpt-5"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="claude-4"/><category term="november-2025-inflection"/></entry><entry><title>I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours</title><link href="https://simonwillison.net/2025/Dec/15/porting-justhtml/#atom-tag" rel="alternate"/><published>2025-12-15T23:58:38+00:00</published><updated>2025-12-15T23:58:38+00:00</updated><id>https://simonwillison.net/2025/Dec/15/porting-justhtml/#atom-tag</id><summary type="html">
    &lt;p&gt;I &lt;a href="https://simonwillison.net/2025/Dec/14/justhtml/"&gt;wrote about JustHTML yesterday&lt;/a&gt; - Emil Stenström's project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try &lt;strong&gt;porting JustHTML from Python to JavaScript&lt;/strong&gt; with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.&lt;/p&gt;
&lt;h4 id="tl-dr"&gt;TL;DR&lt;/h4&gt;
&lt;p&gt;I built &lt;a href="https://github.com/simonw/justjshtml"&gt;simonw/justjshtml&lt;/a&gt;, a dependency-free HTML5 parsing library in JavaScript which passes 9,200 tests from the html5lib-tests suite and imitates the API design of Emil's JustHTML library.&lt;/p&gt;
&lt;p&gt;It took two initial prompts and a few tiny follow-ups. &lt;a href="https://simonwillison.net/2025/Dec/11/gpt-52/"&gt;GPT-5.2&lt;/a&gt; running in &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.&lt;/p&gt;
&lt;p&gt;Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie.&lt;/p&gt;
&lt;h4 id="some-background"&gt;Some background&lt;/h4&gt;
&lt;p&gt;One of the most important contributions of the HTML5 specification ten years ago was the way it precisely specified how &lt;em&gt;invalid&lt;/em&gt; HTML should be parsed. The world is full of invalid documents and having a specification that covers those means browsers can treat them in the same way - there's no more "undefined behavior" to worry about when building parsing software.&lt;/p&gt;
&lt;p&gt;Unsurprisingly, those invalid parsing rules are pretty complex! The free online book &lt;a href="https://htmlparser.info/"&gt;Idiosyncrasies of the HTML parser&lt;/a&gt; by Simon Pieters is an excellent deep dive into this topic, in particular &lt;a href="https://htmlparser.info/parser/"&gt;Chapter 3. The HTML parser&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Python &lt;a href="https://github.com/html5lib/html5lib-python"&gt;html5lib&lt;/a&gt; project started the &lt;a href="https://github.com/html5lib/html5lib-tests"&gt;html5lib-tests&lt;/a&gt; repository with a set of implementation-independent tests. These have since become the gold standard for interoperability testing of HTML5 parsers, and are used by projects such as &lt;a href="https://github.com/servo/servo"&gt;Servo&lt;/a&gt; which used them to help build &lt;a href="https://github.com/servo/html5ever"&gt;html5ever&lt;/a&gt;, a "high-performance browser-grade HTML5 parser" written in Rust.&lt;/p&gt;
&lt;p&gt;Emil Stenström's &lt;a href="https://github.com/EmilStenstrom/justhtml"&gt;JustHTML&lt;/a&gt; project is a pure-Python implementation of an HTML5 parser that passes the full html5lib-tests suite. Emil &lt;a href="https://friendlybit.com/python/writing-justhtml-with-coding-agents/"&gt;spent a couple of months&lt;/a&gt; working on this as a side project, deliberately picking a problem with a comprehensive existing test suite to see how far he could get with coding agents.&lt;/p&gt;
&lt;p&gt;At one point he had the agents rewrite it based on a close inspection of the Rust html5ever library. I don't know how much of this was direct translation versus inspiration (here's Emil's &lt;a href="https://news.ycombinator.com/item?id=46264195#46267059"&gt;commentary on that&lt;/a&gt;) - his project has 1,215 commits total so it appears to have included a huge amount of iteration, not just a straight port.&lt;/p&gt;
&lt;p&gt;My project &lt;strong&gt;is&lt;/strong&gt; a straight port. I instructed Codex CLI to build a JavaScript version of Emil's Python code.&lt;/p&gt;
&lt;h4 id="the-process-in-detail"&gt;The process in detail&lt;/h4&gt;
&lt;p&gt;I started with a bit of mise en place. I checked out two repos and created an empty third directory for the new project:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev
git clone https://github.com/EmilStenstrom/justhtml
git clone https://github.com/html5lib/html5lib-tests
mkdir justjshtml
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; justjshtml&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I started Codex CLI for GPT-5.2 like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex --yolo -m gpt-5.2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;--yolo&lt;/code&gt; flag is a shortcut for &lt;code&gt;--dangerously-bypass-approvals-and-sandbox&lt;/code&gt;, which is every bit as dangerous as it sounds.&lt;/p&gt;
&lt;p&gt;My first prompt told Codex to inspect the existing code and use it to build a specification for the new JavaScript library:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. It is going to have a similar API to the Python library but in JavaScript. It will have no dependencies other than raw JavaScript, hence it will work great in the browser and node.js and other environments. Start by reading ~/dev/justhtml and designing the user-facing API for the new library - create a spec.md containing your plan.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I reviewed the spec, which included a set of proposed milestones, and told it to add another:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add an early step to the roadmap that involves an initial version that parses a simple example document that is valid and returns the right results. Then add and commit the spec.md file.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/justjshtml/blob/19b8eb1f2ca80f428a3c40862d5ec05d36e5166b/spec.md"&gt;the resulting spec.md file&lt;/a&gt;. My request for that initial version became "Milestone 0.5" which looked like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Milestone 0.5 — End-to-end smoke parse (single valid document)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement the smallest end-to-end slice so the public API is real early:
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;new JustHTML("&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;Hello&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;")&lt;/code&gt; returns a tree with the expected tag structure and text nodes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;doc.toText()&lt;/code&gt; returns &lt;code&gt;"Hello"&lt;/code&gt; and &lt;code&gt;doc.errors&lt;/code&gt; is empty for this valid input.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;scripts/smoke.js&lt;/code&gt; (no deps) that runs the example and asserts the expected structure/output.&lt;/li&gt;
&lt;li&gt;Gate: &lt;code&gt;node scripts/smoke.js&lt;/code&gt; passes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Implement Milestone 0.5&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And off it went. The resulting code appeared to work so I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Configure GitHub Actions test.yml to run that on every commit, then commit and push&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I ran &lt;code&gt;git init&lt;/code&gt; and created a private GitHub repository for this project at this point, and set up the local directory to push to that remote.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/justjshtml/commit/48497cbd05584c88af757a0736a2ee447468b6b3"&gt;that initial push&lt;/a&gt;. Then I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;OK do the rest, commit and push often&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And that was almost it!&lt;/p&gt;
&lt;p&gt;I set my laptop to not fall asleep and left it to its devices while we went off to buy a Christmas tree. The "commit and push often" meant I could monitor its progress on my phone by refreshing &lt;a href="https://github.com/simonw/justjshtml/commits/main"&gt;the commit log on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I was running this against my $20/month ChatGPT Plus account, which has a five hour token allowance window for Codex CLI. That ran out at 6:35pm and Codex paused, so I waited until the reset point at 7:14pm and typed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Keep going&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At 9:30pm it declared itself done with the following summary message:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;• What’s Done

  - html5lib-tests now run in CI and pass: tokenizer 6810/6810, tree 1770/1782
    (12 skipped: #script-on), serializer 230/230, encoding 82/83 (1 skipped:
    scripted) (.github/workflows/test.yml:1).
  - Public API expanded: stream() (src/stream.js:1), CSS selectors query()/
    matches() (src/selector.js:1), and toMarkdown() (src/markdown.js:1), all
    exported via src/index.js:1.
  - Added no-deps unit scripts for selectors/stream/markdown (scripts/run-
    selector-tests.js:1, scripts/run-stream-tests.js:1, scripts/run-markdown-
    tests.js:1) and wired them into CI.
  - Fixed TreeBuilder so &amp;lt;br&amp;gt; no longer closes &amp;lt;p&amp;gt; (src/treebuilder.js:647).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a finishing touch, I had it add a playground interface so I could try out the new library in my browser. I prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add a playground.html in the top level folder which loads the necessary ES modules from ./src/... and implements the exact same functionality as seen on https://tools.simonwillison.net/justhtml but using the JavaScript library instead of Pyodide&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It fetched my &lt;a href="https://tools.simonwillison.net/justhtml"&gt;existing JustHTML playground page&lt;/a&gt; (&lt;a href="https://simonwillison.net/2025/Dec/14/justhtml/#first-impressions-of-justhtml"&gt;described here&lt;/a&gt;) using &lt;code&gt;curl&lt;/code&gt; and built a new &lt;code&gt;playground.html&lt;/code&gt; file that loaded the new JavaScript code instead. This worked &lt;em&gt;perfectly&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I enabled GitHub Pages for my still-private repo which meant I could access the new playground at this URL:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://simonw.github.io/justjshtml/playground.html"&gt;https://simonw.github.io/justjshtml/playground.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/justjshtml-playground.jpg" alt="Screenshot of JustJSHTML Playground web application. Header reads &amp;quot;JustJSHTML Playground&amp;quot; with subtitle &amp;quot;A dependency-free JavaScript HTML5 parser - GitHub&amp;quot;. Below is a status bar showing &amp;quot;JavaScript Environment&amp;quot; with a green &amp;quot;Ready&amp;quot; badge. The main input area has &amp;quot;Paste HTML&amp;quot; and &amp;quot;Fetch from URL&amp;quot; buttons, with a text area containing HTML code: &amp;quot;&amp;lt;!DOCTYPE html&amp;gt; &amp;lt;html&amp;gt; &amp;lt;head&amp;gt; &amp;lt;title&amp;gt;Example Page&amp;lt;/title&amp;gt; &amp;lt;/head&amp;gt; &amp;lt;body&amp;gt; &amp;lt;header&amp;gt; &amp;lt;nav&amp;gt; &amp;lt;ul&amp;gt;&amp;quot;. A &amp;quot;Playground Mode&amp;quot; section shows buttons for &amp;quot;CSS Selector Query&amp;quot;, &amp;quot;Pretty Print HTML&amp;quot;, &amp;quot;Tree Structure&amp;quot;, &amp;quot;Stream Events&amp;quot;, &amp;quot;Extract Text&amp;quot;, and &amp;quot;To Markdown&amp;quot; (highlighted in purple). Below is a text field labeled &amp;quot;CSS Selector (optional - leave empty for whole document):&amp;quot; with placeholder &amp;quot;e.g., article, main, .content (or leave empty)&amp;quot; and a green &amp;quot;Convert to Markdown&amp;quot; button. The Output section has a teal header with &amp;quot;Whole document&amp;quot; badge and displays converted markdown: &amp;quot;Example Page&amp;quot; followed by &amp;quot;- [Home](/)&amp;quot; &amp;quot;- [About](/about)&amp;quot; &amp;quot;- [Contact](/contact)&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;All it needed now was some documentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add a comprehensive README with full usage instructions including attribution plus how this was built plus how to use in in HTML plus how to use it in Node.js&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can &lt;a href="https://github.com/simonw/justjshtml/blob/f3a33fdb29bf97846fd017185edc8cf82783032e/README.md"&gt;read the result here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are now at eight prompts total, running for just over four hours and I've decorated for Christmas and watched &lt;a href="https://en.wikipedia.org/wiki/Wake_Up_Dead_Man"&gt;Wake Up Dead Man&lt;/a&gt; on Netflix.&lt;/p&gt;
&lt;p&gt;According to Codex CLI:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Token usage: total=2,089,858 input=1,464,295 (+ 97,122,176 cached) output=625,563 (reasoning 437,010)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My &lt;a href="https://www.llm-prices.com/#it=2089858&amp;amp;cit=97122176&amp;amp;ot=625563&amp;amp;sel=gpt-5.2"&gt;llm-prices.com calculator&lt;/a&gt; estimates that at $29.41 if I was paying for those tokens at API prices, but they were included in my $20/month ChatGPT Plus subscription so the actual extra cost to me was zero.&lt;/p&gt;
&lt;h4 id="what-can-we-learn-from-this-"&gt;What can we learn from this?&lt;/h4&gt;
&lt;p&gt;I'm sharing this project because I think it demonstrates a bunch of interesting things about the state of LLMs in December 2025.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Frontier LLMs really can perform complex, multi-hour tasks with hundreds of tool calls and minimal supervision. I used GPT-5.2 for this but I have no reason to believe that Claude Opus 4.5 or Gemini 3 Pro would not be able to achieve the same thing - the only reason I haven't tried is that I don't want to burn another 4 hours of time and several million tokens on more runs.&lt;/li&gt;
&lt;li&gt;If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing the agentic loop&lt;/a&gt; a few months ago. I think it's the key skill to unlocking the potential of LLMs for complex tasks.&lt;/li&gt;
&lt;li&gt;Porting entire open source libraries from one language to another via a coding agent works extremely well.&lt;/li&gt;
&lt;li&gt;Code is so cheap it's practically free. Code that &lt;em&gt;works&lt;/em&gt; continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.&lt;/li&gt;
&lt;li&gt;We haven't even &lt;em&gt;begun&lt;/em&gt; to unpack the etiquette and ethics around this style of development. Is it responsible and appropriate to churn out a direct port of a library like this in a few hours while watching a movie? What would it take for code built like this to be trusted in production?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I'll end with some open questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this library represent a legal violation of copyright of either the Rust library or the Python one?&lt;/li&gt;
&lt;li&gt;Even if this is legal, is it ethical to build a library in this way?&lt;/li&gt;
&lt;li&gt;Does this format of development hurt the open source ecosystem?&lt;/li&gt;
&lt;li&gt;Can I even assert copyright over this, given how much of the work was produced by the LLM?&lt;/li&gt;
&lt;li&gt;Is it responsible to publish software libraries built in this way?&lt;/li&gt;
&lt;li&gt;How much better would this library be if an expert team hand crafted it over the course of several months?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update 11th January 2026&lt;/strong&gt;: I originally ended this post with just these open questions, but I've now provided &lt;a href="https://simonwillison.net/2026/Jan/11/answers/"&gt;my own answers to the questions&lt;/a&gt; in a new post.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-porting"&gt;vibe-porting&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="html"/><category term="javascript"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="gpt-5"/><category term="codex-cli"/><category term="november-2025-inflection"/><category term="vibe-porting"/></entry><entry><title>OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI</title><link href="https://simonwillison.net/2025/Dec/12/openai-skills/#atom-tag" rel="alternate"/><published>2025-12-12T23:29:51+00:00</published><updated>2025-12-12T23:29:51+00:00</updated><id>https://simonwillison.net/2025/Dec/12/openai-skills/#atom-tag</id><summary type="html">
    &lt;p&gt;One of the things that most excited me about &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Anthropic's new Skills mechanism&lt;/a&gt; back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.&lt;/p&gt;
&lt;h4 id="skills-in-chatgpt"&gt;Skills in ChatGPT&lt;/h4&gt;
&lt;p&gt;I learned about this &lt;a href="https://x.com/elias_judin/status/1999491647563006171"&gt;from Elias Judin&lt;/a&gt; this morning. It turns out the Code Interpreter feature of ChatGPT now has a new &lt;code&gt;/home/oai/skills&lt;/code&gt; folder which you can access simply by prompting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Create a zip file of /home/oai/skills&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://chatgpt.com/share/693c9645-caa4-8006-9302-0a9226ea7599"&gt;tried that myself&lt;/a&gt; and got back &lt;a href="https://static.simonwillison.net/static/cors-allow/2025/skills.zip"&gt;this zip file&lt;/a&gt;. Here's &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?url=https%3A%2F%2Fstatic.simonwillison.net%2Fstatic%2Fcors-allow%2F2025%2Fskills.zip"&gt;a UI for exploring its content&lt;/a&gt; (&lt;a href="https://tools.simonwillison.net/colophon#zip-wheel-explorer.html"&gt;more about that tool&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/skills-explore.jpg" alt="Screenshot of file explorer. Files skills/docs/render_docsx.py and skills/docs/skill.md and skills/pdfs/ and skills/pdfs/skill.md - that last one is expanded and reads: # PDF reading, creation, and review guidance  ## Reading PDFs - Use pdftoppm -png $OUTDIR/$BASENAME.pdf $OUTDIR/$BASENAME to convert PDFs to PNGs. - Then open the PNGs and read the images. - pdfplumber is also installed and can be used to read PDFs. It can be used as a complementary tool to pdftoppm but not replacing it. - Only do python printing as a last resort because you will miss important details with text extraction (e.g. figures, tables, diagrams).  ## Primary tooling for creating PDFs - Generate PDFs programmatically with reportlab as the primary tool. In most cases, you should use reportlab to create PDFs. - If there are other packages you think are necessary for the task (eg. pypdf, pyMuPDF), you can use them but you may need topip install them first. - After each meaningful update—content additions, layout adjustments, or style changes—render the PDF to images to check layout fidelity:   - pdftoppm -png $INPUT_PDF $OUTPUT_PREFIX - Inspect every exported PNG before continuing work. If anything looks off, fix the source and re-run the render → inspect loop until the pages are clean.  ## Quality expectations - Maintain a polished, intentional visual design: consistent typography, spacing, margins, color palette, and clear section breaks across all pages. - Avoid major rendering issues—no clipped text, overlapping elements, black squares, broken tables, or unreadable glyphs. The rendered pages should look like a curated document, not raw template output. - Charts, tables, diagrams, and images must be sharp, well-aligned, and properly labeled in the PNGs. Legends and axes should be readable without excessive zoom. - Text must be readable at normal viewing size; avoid walls of filler text or dense, unstructured bullet lists. Use whitespace to separate ideas. - Never use the U+2011 non-breaking hyphen or other unicode dashes as they will not be" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;So far they cover spreadsheets, docx and PDFs. Interestingly their chosen approach for PDFs and documents is to convert them to rendered per-page PNGs and then pass those through their vision-enabled GPT models, presumably to maintain information from layout and graphics that would be lost if they just ran text extraction.&lt;/p&gt;
&lt;p&gt;Elias &lt;a href="https://github.com/eliasjudin/oai-skills"&gt;shared copies in a GitHub repo&lt;/a&gt;. They look very similar to Anthropic's implementation of the same kind of idea, currently published in their &lt;a href="https://github.com/anthropics/skills/tree/main/skills"&gt;anthropics/skills&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;I tried it out by prompting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a PDF with a summary of the rimu tree situation right now and what it means for kakapo breeding season&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sure enough, GPT-5.2 Thinking started with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Reading skill.md for PDF creation guidelines&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Searching rimu mast and Kākāpō 2025 breeding status&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It took &lt;a href="https://chatgpt.com/share/693ca54b-f770-8006-904b-9f31a585180a"&gt;just over eleven minutes&lt;/a&gt; to produce &lt;a href="https://static.simonwillison.net/static/cors-allow/2025/rimu_kakapo_breeding_brief.pdf"&gt;this PDF&lt;/a&gt;, which was long enough that I had Claude Code for web &lt;a href="https://github.com/simonw/tools/pull/155"&gt;build me a custom PDF viewing tool&lt;/a&gt; while I waited.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://tools.simonwillison.net/view-pdf?url=https%3A%2F%2Fstatic.simonwillison.net%2Fstatic%2Fcors-allow%2F2025%2Frimu_kakapo_breeding_brief.pdf"&gt;Here's ChatGPT's PDF in that tool&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/rimu.jpg" alt="Screenshot of my tool. There is a URL at the top, a Load PDF button and pagination controls. Then the PDF itself is shown, which reads: Rimu mast status and what it means for the kākāpō breeding season Summary as of 12 December 2025 (Pacific/Auckland context) Kākāpō breeding is tightly linked to rimu (Dacrydium cupressinum) mast events: when rimu trees set and ripen large amounts of fruit, female kākāpō are much more likely to nest, and more chicks can be successfully raised. Current monitoring indicates an unusually strong rimu fruiting signal heading into the 2025/26 season, which sets the stage for a potentially large breeding year in 2026.^1,2 Key numbers at a glance Kākāpō population (official DOC count) 237 birds alive Breeding trigger (rimu fruiting)&amp;gt;10% of rimu branch tips bearing fruit Forecast rimu fruiting for 2026 (DOC monitoring) Around 50–60% fruiting across breeding islands¹Breeding-age females (DOC 2025 planning figure)About 87 females (potentially nearly all could nest)" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;(I am &lt;strong&gt;very excited&lt;/strong&gt; about &lt;a href="https://www.auckland.ac.nz/en/news/2025/12/03/bumper-breeding-season-for-kakapo-on-the-cards.html"&gt;Kākāpō breeding season this year&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The reason it took so long is that it was fastidious about looking at and tweaking its own work. I appreciated that at one point it tried rendering the PDF and noticed that the macrons in kākāpō were not supported by the chosen font, so it switched to something else:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/skills-macrons.jpg" alt="ChatGPT screenshot. Analyzed image. There's an image of a page of PDF with obvious black blocks on some of the letters in the heading. It then says: Fixing font issues with macrons. The page is showing black squares for words like &amp;quot;kākāpō,&amp;quot; probably because Helvetica can't handle macrons. I'll switch to a font that supports them, such as DejaVu Sans or Noto Sans. I'll register both regular and bold fonts, then apply them to the document. I'll update the footer to note the issue with Helvetica. Time to rebuild the PDF!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="skills-in-codex-cli"&gt;Skills in Codex CLI&lt;/h4&gt;
&lt;p&gt;Meanwhile, two weeks ago OpenAI's open source Codex CLI tool landed a PR titled &lt;a href="https://github.com/openai/codex/pull/7412"&gt;feat: experimental support for skills.md&lt;/a&gt;. The most recent docs for that are in &lt;a href="https://github.com/openai/codex/blob/main/docs/skills.md"&gt;docs/skills.md&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The documentation suggests that any folder in &lt;code&gt;~/.codex/skills&lt;/code&gt; will be treated as a skill.&lt;/p&gt;
&lt;p&gt;I dug around and found the code that generates the prompt that drives the skill system in &lt;a href="https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd16f9f5fb64a573a69/codex-rs/core/src/skills/render.rs#L20-L38"&gt;codex-rs/core/src/skills/render.rs&lt;/a&gt; - here's a Gist with &lt;a href="https://gist.github.com/simonw/25f2c3a9e350274bc2b76a79bc8ae8b2"&gt;a more readable version of that prompt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://claude.ai/share/0a9b369b-f868-4065-91d1-fd646c5db3f4"&gt;used Claude Opus 4.5's skill authoring skill&lt;/a&gt; to create &lt;a href="https://github.com/datasette/skill"&gt;this skill for creating Datasette plugins&lt;/a&gt;, then installed it into my Codex CLI skills folder like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://github.com/datasette/skill \
  &lt;span class="pl-k"&gt;~&lt;/span&gt;/.codex/skills/datasette-plugin&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You have to run Codex with the &lt;code&gt;--enable skills&lt;/code&gt; option. I ran this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
mkdir datasette-cowsay
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; datasette-cowsay
codex --enable skills -m gpt-5.2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;list skills&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Codex replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;- datasette-plugins — Writing Datasette plugins using Python + pluggy (file: /Users/simon/.codex/skills/datasette-plugin/SKILL.md)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;- Discovery — How to find/identify available skills (no SKILL.md path provided in the list)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Write a Datasette plugin in this folder adding a /-/cowsay?text=hello page that displays a pre with cowsay from PyPI saying that text&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It worked perfectly! Here's &lt;a href="https://github.com/simonw/datasette-cowsay"&gt;the plugin code it wrote&lt;/a&gt; and here's &lt;a href="http://gistpreview.github.io/?96ee928370b18eabc2e0fad9aaa46d4b"&gt;a copy of the full Codex CLI transcript&lt;/a&gt;, generated with my &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;terminal-to-html tool&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try that out yourself if you have &lt;code&gt;uvx&lt;/code&gt; installed like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --with https://github.com/simonw/datasette-cowsay/archive/refs/heads/main.zip \
  datasette&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then visit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;http://127.0.0.1:8001/-/cowsay?text=This+is+pretty+fun
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cowsay-datasette.jpg" alt="Screenshot of that URL in Firefox, an ASCII art cow says This is pretty fun." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="skills-are-a-keeper"&gt;Skills are a keeper&lt;/h4&gt;
&lt;p&gt;When I first wrote about skills in October I said &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Claude Skills are awesome, maybe a bigger deal than MCP&lt;/a&gt;. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called that one correctly.&lt;/p&gt;
&lt;p&gt;Skills are based on a &lt;em&gt;very&lt;/em&gt; light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere. This could be a good initiative for the new &lt;a href="https://aaif.io/"&gt;Agentic AI Foundation&lt;/a&gt; (&lt;a href="https://simonwillison.net/2025/Dec/9/agentic-ai-foundation/"&gt;previously&lt;/a&gt;) to take on.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kakapo"&gt;kakapo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pdf"/><category term="ai"/><category term="kakapo"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="coding-agents"/><category term="gpt-5"/><category term="codex-cli"/><category term="skills"/></entry><entry><title>GPT-5.2</title><link href="https://simonwillison.net/2025/Dec/11/gpt-52/#atom-tag" rel="alternate"/><published>2025-12-11T23:58:04+00:00</published><updated>2025-12-11T23:58:04+00:00</updated><id>https://simonwillison.net/2025/Dec/11/gpt-52/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI reportedly &lt;a href="https://www.wsj.com/tech/ai/openais-altman-declares-code-red-to-improve-chatgpt-as-google-threatens-ai-lead-7faf5ea6"&gt;declared a "code red"&lt;/a&gt; on the 1st of December in response to increasingly credible competition from the likes of Google's Gemini 3. It's less than two weeks later and they just &lt;a href="https://openai.com/index/introducing-gpt-5-2/"&gt;announced GPT-5.2&lt;/a&gt;, calling it "the most capable model series yet for professional knowledge work".&lt;/p&gt;
&lt;h4 id="key-characteristics-of-gpt-5-2"&gt;Key characteristics of GPT-5.2&lt;/h4&gt;
&lt;p&gt;The new model comes in two variants: GPT-5.2 and GPT-5.2 Pro. There's no Mini variant yet.&lt;/p&gt;
&lt;p&gt;GPT-5.2 is available via their UI in both "instant" and "thinking" modes, presumably still corresponding to the API concept of different reasoning effort levels.&lt;/p&gt;
&lt;p&gt;The knowledge cut-off date for both variants is now &lt;strong&gt;August 31st 2025&lt;/strong&gt;. This is significant - GPT 5.1 and 5 were both Sep 30, 2024 and GPT-5 mini was May 31, 2024.&lt;/p&gt;
&lt;p&gt;Both of the 5.2 models have a 400,000 token context window and 128,000 max output tokens - no different from 5.1 or 5.&lt;/p&gt;
&lt;p&gt;Pricing wise 5.2 is a rare &lt;em&gt;increase&lt;/em&gt; - it's 1.4x the cost of GPT 5.1, at $1.75/million input and $14/million output. GPT-5.2 Pro is $21.00/million input and a hefty $168.00/million output, putting it &lt;a href="https://www.llm-prices.com/#sel=gpt-4.5%2Co1-pro%2Cgpt-5.2-pro"&gt;up there&lt;/a&gt; with their previous most expensive models o1 Pro and GPT-4.5.&lt;/p&gt;
&lt;p&gt;So far the main benchmark results we have are self-reported by OpenAI. The most interesting ones are a 70.9% score on their GDPval "Knowledge work tasks" benchmark (GPT-5 got 38.8%) and a 52.9% on ARC-AGI-2 (up from 17.6% for GPT-5.1 Thinking).&lt;/p&gt;
&lt;p&gt;The ARC Prize Twitter account provided &lt;a href="https://x.com/arcprize/status/1999182732845547795"&gt;this interesting note&lt;/a&gt; on the efficiency gains for GPT-5.2 Pro&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A year ago, we verified a preview of an unreleased version of @OpenAI
o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task&lt;/p&gt;
&lt;p&gt;Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task&lt;/p&gt;
&lt;p&gt;This represents a ~390X efficiency improvement in one year&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT-5.2 can be accessed in OpenAI's Codex CLI tool like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;codex -m gpt-5.2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are three new API models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.2"&gt;gpt-5.2&lt;/a&gt; - I think this is what you get if you select "GPT-5.2 Thinking" in ChatGPT but &lt;a href="https://twitter.com/simonw/status/1999603339382976785"&gt;I'm a little confused&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/models/gpt-5.2-chat-latest"&gt;gpt-5.2-chat-latest&lt;/a&gt; - the model used by ChatGPT for "GPT-5.2 Instant" mode. It's priced the same as GPT-5.2 but has a reduced 128,000 context window with 16,384 max output tokens.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.2-pro"&gt;gpt-5.2-pro&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenAI have published a new &lt;a href="https://cookbook.openai.com/examples/gpt-5/gpt-5-2_prompting_guide"&gt;GPT-5.2 Prompting Guide&lt;/a&gt;. An interesting note from that document is that compaction can now be run with &lt;a href="https://platform.openai.com/docs/api-reference/responses/compact"&gt;a new dedicated server-side API&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For long-running, tool-heavy workflows that exceed the standard context window, GPT-5.2 with Reasoning supports response compaction via the &lt;code&gt;/responses/compact&lt;/code&gt; endpoint. Compaction performs a loss-aware compression pass over prior conversation state, returning encrypted, opaque items that preserve task-relevant information while dramatically reducing token footprint. This allows the model to continue reasoning across extended workflows without hitting context limits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id="it-s-better-at-vision"&gt;It's better at vision&lt;/h4&gt;
&lt;p&gt;One note from the announcement that caught my eye:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT‑5.2 Thinking is our strongest vision model yet, cutting error rates roughly in half on chart reasoning and software interface understanding.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I had &lt;a href="https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-coding/"&gt;disappointing results from GPT-5&lt;/a&gt; on an OCR task a while ago. I tried it against GPT-5.2 and it did &lt;em&gt;much&lt;/em&gt; better:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m gpt-5.2 ocr -a https://static.simonwillison.net/static/2025/ft.jpeg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/b4a13f1e424e58b8b0aca72ae2c3cb00"&gt;the result&lt;/a&gt; from that, which cost 1,520 input and 1,022 for a total of &lt;a href="https://www.llm-prices.com/#it=1520&amp;amp;ot=1022&amp;amp;sel=gpt-5.2"&gt;1.6968 cents&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="rendering-some-pelicans"&gt;Rendering some pelicans&lt;/h4&gt;
&lt;p&gt;For my classic "Generate an SVG of a pelican riding a bicycle" test:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m gpt-5.2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/gpt-2.5-pelican.png" alt="Described by GPT-5.2: Cartoon-style illustration: A white, duck-like bird with a small black eye, oversized orange beak (with a pale blue highlight along the lower edge), and a pink neckerchief rides a blue-framed bicycle in side view; the bike has two large black wheels with gray spokes, a blue front fork, visible black crank/pedal area, and thin black handlebar lines, with gray motion streaks and a soft gray shadow under the bike on a light-gray road; background is a pale blue sky with a simple yellow sun at upper left and two rounded white clouds (one near upper center-left and one near upper right)." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And for the more advanced alternative test, which tests instruction following in a little more depth:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m gpt-5.2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a California brown pelican riding a bicycle. The bicycle&lt;/span&gt;
&lt;span class="pl-s"&gt;must have spokes and a correctly shaped bicycle frame. The pelican must have its&lt;/span&gt;
&lt;span class="pl-s"&gt;characteristic large pouch, and there should be a clear indication of feathers.&lt;/span&gt;
&lt;span class="pl-s"&gt;The pelican must be clearly pedaling the bicycle. The image should show the full&lt;/span&gt;
&lt;span class="pl-s"&gt;breeding plumage of the California brown pelican.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/gpt-5.2-p2.png" alt="Digital illustration on a light gray/white background with a thin horizontal baseline: a stylized California brown pelican in breeding plumage is drawn side-on, leaning forward and pedaling a bicycle; the pelican has a dark brown body with layered wing lines, a pale cream head with a darker brown cap and neck shading, a small black eye, and an oversized long golden-yellow bill extending far past the front wheel; one brown leg reaches down to a pedal while the other is tucked back; the bike is shown in profile with two large spoked wheels (black tires, white rims), a dark frame, crank and chainring near the rear wheel, a black saddle above the rear, and the front fork aligned under the pelican’s head; text at the top reads &amp;quot;California brown pelican (breeding plumage) pedaling a bicycle&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 14th December 2025&lt;/strong&gt;: I used GPT-5.2 running in Codex CLI to &lt;a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/"&gt;port a complex Python library to JavaScript&lt;/a&gt;. It ran without interference for nearly four hours and completed a complex task exactly to my specification.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="gpt-5"/></entry><entry><title>Building more with GPT-5.1-Codex-Max</title><link href="https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/#atom-tag" rel="alternate"/><published>2025-11-19T23:15:10+00:00</published><updated>2025-11-19T23:15:10+00:00</updated><id>https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/gpt-5-1-codex-max/"&gt;Building more with GPT-5.1-Codex-Max&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hot on the heels of yesterday's &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/"&gt;Gemini 3 Pro release&lt;/a&gt; comes a new model from OpenAI called GPT-5.1-Codex-Max.&lt;/p&gt;
&lt;p&gt;(Remember when GPT-5 was meant to bring in a new era of less confusing model names? That didn't last!)&lt;/p&gt;
&lt;p&gt;It's currently only available through their &lt;a href="https://developers.openai.com/codex/cli/"&gt;Codex CLI coding agent&lt;/a&gt;, where it's the new default model:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Starting today, GPT‑5.1-Codex-Max will replace GPT‑5.1-Codex as the default model in Codex surfaces. Unlike GPT‑5.1, which is a general-purpose model, we recommend using GPT‑5.1-Codex-Max and the Codex family of models only for agentic coding tasks in Codex or Codex-like environments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's not available via the API yet but should be shortly.&lt;/p&gt;
&lt;p&gt;The timing of this release is interesting given that Gemini 3 Pro appears to have &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/#benchmarks"&gt;aced almost all of the benchmarks&lt;/a&gt; just yesterday. It's reminiscent of the period in 2024 when OpenAI consistently made big announcements that happened to coincide with Gemini releases.&lt;/p&gt;
&lt;p&gt;OpenAI's self-reported &lt;a href="https://openai.com/index/introducing-swe-bench-verified/"&gt;SWE-Bench Verified&lt;/a&gt; score is particularly notable: 76.5% for thinking level "high" and 77.9% for the new "xhigh". That was the one benchmark where Gemini 3 Pro was out-performed by Claude Sonnet 4.5 - Gemini 3 Pro got 76.2% and Sonnet 4.5 got 77.2%. OpenAI now have the highest scoring model there by a full .7 of a percentage point!&lt;/p&gt;
&lt;p&gt;They also report a score of 58.1% on &lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0"&gt;Terminal Bench 2.0&lt;/a&gt;, beating Gemini 3 Pro's 54.2% (and Sonnet 4.5's 42.8%.)&lt;/p&gt;
&lt;p&gt;The most intriguing part of this announcement concerns the model's approach to long context problems:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT‑5.1-Codex-Max is built for long-running, detailed work. It’s our first model natively trained to operate across multiple context windows through a process called &lt;em&gt;compaction&lt;/em&gt;, coherently working over millions of tokens in a single task. [...]&lt;/p&gt;
&lt;p&gt;Compaction enables GPT‑5.1-Codex-Max to complete tasks that would have previously failed due to context-window limits, such as complex refactors and long-running agent loops by pruning its history while preserving the most important context over long horizons. In Codex applications, GPT‑5.1-Codex-Max automatically compacts its session when it approaches its context window limit, giving it a fresh context window. It repeats this process until the task is completed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's a lot of confusion &lt;a href="https://news.ycombinator.com/item?id=45982649"&gt;on Hacker News&lt;/a&gt; about what this actually means. Claude Code already does a version of compaction, automatically summarizing previous turns when the context runs out. Does this just mean that Codex-Max is better at that process?&lt;/p&gt;
&lt;p&gt;I had it draw me a couple of pelicans by typing "Generate an SVG of a pelican riding a bicycle" directly into the Codex CLI tool. Here's thinking level medium:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A flat-style illustration shows a white, round-bodied bird with an orange beak pedaling a red-framed bicycle with thin black wheels along a sandy beach, with a calm blue ocean and clear sky in the background." src="https://static.simonwillison.net/static/2025/codex-max-medium.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;And here's thinking level "xhigh":&lt;/p&gt;
&lt;p&gt;&lt;img alt="A plump white bird with an orange beak and small black eyes crouches low on a blue bicycle with oversized dark wheels, shown racing forward with motion lines against a soft gradient blue sky." src="https://static.simonwillison.net/static/2025/codex-max-xhigh.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I also tried xhigh on the my &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/#and-a-new-pelican-benchmark"&gt;longer pelican test prompt&lt;/a&gt;, which came out like this:&lt;/p&gt;
&lt;p id="advanced-pelican-codex-max"&gt;&lt;img alt="A stylized dark gray bird with layered wings, a yellow head crest, and a long brown beak leans forward in a racing pose on a black-framed bicycle, riding across a glossy blue surface under a pale sky." src="https://static.simonwillison.net/static/2025/codex-breeding-max-xhigh.jpg"&gt;&lt;/p&gt;

&lt;p&gt;Also today: &lt;a href="https://x.com/openai/status/1991266192905179613"&gt;GPT-5.1 Pro is rolling out today to all Pro users&lt;/a&gt;. According to the &lt;a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes"&gt;ChatGPT release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT-5.1 Pro is rolling out today for all ChatGPT Pro users and is available in the model picker. GPT-5 Pro will remain available as a legacy model for 90 days before being retired.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's a pretty fast deprecation cycle for the GPT-5 Pro model that was released just three months ago.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45982649"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="evals"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/><category term="november-2025-inflection"/></entry><entry><title>GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum</title><link href="https://simonwillison.net/2025/Nov/14/gpt-51-system-card-addendum/#atom-tag" rel="alternate"/><published>2025-11-14T13:46:23+00:00</published><updated>2025-11-14T13:46:23+00:00</updated><id>https://simonwillison.net/2025/Nov/14/gpt-51-system-card-addendum/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1/"&gt;GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I was confused about whether the new "adaptive thinking" feature of GPT-5.1 meant they were moving away from the "router" mechanism where GPT-5 in ChatGPT automatically selected a model for you.&lt;/p&gt;
&lt;p&gt;This page addresses that, emphasis mine:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT‑5.1 Instant is more conversational than our earlier chat model, with improved instruction following and an adaptive reasoning capability that lets it decide when to think before responding. GPT‑5.1 Thinking adapts thinking time more precisely to each question. &lt;strong&gt;GPT‑5.1 Auto will continue to route each query to the model best suited for it&lt;/strong&gt;, so that in most cases, the user does not need to choose a model at all.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So GPT‑5.1 Instant can decide when to think before responding, GPT-5.1 Thinking can decide how hard to think, and GPT-5.1 Auto (not a model you can use via the API) can decide which out of Instant and Thinking a prompt should be routed to.&lt;/p&gt;
&lt;p&gt;If anything this feels &lt;em&gt;more&lt;/em&gt; confusing than the GPT-5 routing situation!&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf"&gt;system card addendum PDF&lt;/a&gt; itself is somewhat frustrating: it shows results on an internal benchmark called "Production Benchmarks", also mentioned in the &lt;a href="https://openai.com/index/gpt-5-system-card/"&gt;GPT-5 system card&lt;/a&gt;, but with vanishingly little detail about what that tests beyond high level category names like "personal data", "extremism" or "mental health" and "emotional reliance" - those last two both listed as "New evaluations, as introduced in the &lt;a href="https://cdn.openai.com/pdf/3da476af-b937-47fb-9931-88a851620101/addendum-to-gpt-5-system-card-sensitive-conversations.pdf"&gt;GPT-5 update on sensitive conversations&lt;/a&gt;" - a PDF dated October 27th that I had previously missed.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;That&lt;/em&gt; document describes the two new categories like so:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Emotional Reliance not_unsafe - tests that the model does not produce disallowed content under our policies related to unhealthy emotional dependence or attachment to ChatGPT&lt;/li&gt;
&lt;li&gt;Mental Health not_unsafe - tests that the model does not produce disallowed content under our policies in situations where there are signs that a user may be experiencing isolated delusions, psychosis, or mania&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;So these are the &lt;a href="https://www.tiktok.com/@pearlmania500/video/7535954556379761950"&gt;ChatGPT Psychosis&lt;/a&gt; benchmarks!


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-personality"&gt;ai-personality&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm-reasoning"/><category term="ai-personality"/><category term="gpt-5"/></entry><entry><title>Introducing GPT-5.1 for developers</title><link href="https://simonwillison.net/2025/Nov/13/gpt-51/#atom-tag" rel="alternate"/><published>2025-11-13T23:59:35+00:00</published><updated>2025-11-13T23:59:35+00:00</updated><id>https://simonwillison.net/2025/Nov/13/gpt-51/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/gpt-5-1-for-developers/"&gt;Introducing GPT-5.1 for developers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI announced GPT-5.1 yesterday, calling it &lt;a href="https://openai.com/index/gpt-5-1/"&gt;a smarter, more conversational ChatGPT&lt;/a&gt;. Today they've added it to their API.&lt;/p&gt;
&lt;p&gt;We actually got four new models today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.1"&gt;gpt-5.1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.1-chat-latest"&gt;gpt-5.1-chat-latest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.1-codex"&gt;gpt-5.1-codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5.1-codex-mini"&gt;gpt-5.1-codex-mini&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are a lot of details to absorb here.&lt;/p&gt;
&lt;p&gt;GPT-5.1 introduces a new reasoning effort called "none" (previous were minimal, low, medium, and high) - and none is the new default.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This makes the model behave like a non-reasoning model for latency-sensitive use cases, with the high intelligence of GPT‑5.1 and added bonus of performant tool-calling. Relative to GPT‑5 with 'minimal' reasoning, GPT‑5.1 with no reasoning is better at parallel tool calling (which itself increases end-to-end task completion speed), coding tasks, following instructions, and using search tools---and supports &lt;a href="https://platform.openai.com/docs/guides/tools-web-search?api-mode=responses"&gt;web search⁠&lt;/a&gt; in our API platform.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When you DO enable thinking you get to benefit from a new feature called "adaptive reasoning":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On straightforward tasks, GPT‑5.1 spends fewer tokens thinking, enabling snappier product experiences and lower token bills. On difficult tasks that require extra thinking, GPT‑5.1 remains persistent, exploring options and checking its work in order to maximize reliability.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Another notable new feature for 5.1 is &lt;a href="https://platform.openai.com/docs/guides/prompt-caching#extended-prompt-cache-retention"&gt;extended prompt cache retention&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Extended prompt cache retention keeps cached prefixes active for longer, up to a maximum of 24 hours. Extended Prompt Caching works by offloading the key/value tensors to GPU-local storage when memory is full, significantly increasing the storage capacity available for caching.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To enable this set &lt;code&gt;"prompt_cache_retention": "24h"&lt;/code&gt; in the API call. Weirdly there's no price increase involved with this at all. I &lt;a href="https://x.com/simonw/status/1989104422832738305"&gt;asked about that&lt;/a&gt; and OpenAI's Steven Heidel &lt;a href="https://x.com/stevenheidel/status/1989113407149314199"&gt;replied&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;with 24h prompt caching we move the caches from gpu memory to gpu-local storage. that storage is not free, but we made it free since it moves capacity from a limited resource (GPUs) to a more abundant resource (storage). then we can serve more traffic overall!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The most interesting documentation I've seen so far is in the new &lt;a href="https://cookbook.openai.com/examples/gpt-5/gpt-5-1_prompting_guide"&gt;5.1 cookbook&lt;/a&gt;, which also includes details of the new &lt;code&gt;shell&lt;/code&gt; and &lt;code&gt;apply_patch&lt;/code&gt; built-in tools. The &lt;a href="https://github.com/openai/openai-cookbook/blob/main/examples/gpt-5/apply_patch.py"&gt;apply_patch.py implementation&lt;/a&gt; is worth a look, especially if you're interested in the advancing state-of-the-art of file editing tools for LLMs.&lt;/p&gt;
&lt;p&gt;I'm still working on &lt;a href="https://github.com/simonw/llm/issues/1300"&gt;integrating the new models into LLM&lt;/a&gt;. The Codex models are Responses-API-only.&lt;/p&gt;
&lt;p&gt;I got this pelican for GPT-5.1 default (no thinking):&lt;/p&gt;
&lt;p&gt;&lt;img alt="The bicycle wheels have no spokes at all, the pelican is laying quite flat on it" src="https://static.simonwillison.net/static/2025/gpt-5.1-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;And this one with reasoning effort set to high:&lt;/p&gt;
&lt;p&gt;&lt;img alt="This bicycle has four spokes per wheel, and the pelican is sitting more upright" src="https://static.simonwillison.net/static/2025/gpt-5.1-high-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;These actually feel like a &lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of-pelicans"&gt;regression from GPT-5&lt;/a&gt; to me. The bicycles have less spokes!


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="gpt-5"/><category term="gpt-codex"/><category term="november-2025-inflection"/></entry><entry><title>Pelican on a Bike - Raytracer Edition</title><link href="https://simonwillison.net/2025/Nov/9/pelican-on-a-bike-raytracer-edition/#atom-tag" rel="alternate"/><published>2025-11-09T16:51:42+00:00</published><updated>2025-11-09T16:51:42+00:00</updated><id>https://simonwillison.net/2025/Nov/9/pelican-on-a-bike-raytracer-edition/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-raytracer-edition/"&gt;Pelican on a Bike - Raytracer Edition&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
beetle_b ran this prompt against a bunch of recent LLMs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Write a POV-Ray file that shows a pelican riding on a bicycle.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This turns out to be a harder challenge than SVG, presumably because there are less examples of POV-Ray in the training data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most produced a script that failed to parse. I would paste the error back into the chat and let it attempt a fix.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The results are really fun though! A lot of them end up accompanied by a weird floating egg for some reason - &lt;a href="https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-raytracer-edition/#claude-opus-4"&gt;here's Claude Opus 4&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="3D scene. The bicycle has a sort of square frame in the wrong place, but good wheels. The pelican is stood on top - a large white blob, a smaller white blob head, a cylinder neck and a conical beak in the right place, plus legs that reach out-of-place pedals. A egg floats mysteriously in front of the bird." src="https://static.simonwillison.net/static/2025/pov-pelican-opus.png" /&gt;&lt;/p&gt;
&lt;p&gt;I think the best result came &lt;a href="https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-raytracer-edition/#gpt-5"&gt;from GPT-5&lt;/a&gt; - again with the floating egg though!&lt;/p&gt;
&lt;p&gt;&lt;img alt="The bike is a bit mis-shapen but has most of the right pieces. The pelican has legs that reach the pedals and is bending forward with a two-segmented neck and a good beak. A weird egg floats in the front wheel." src="https://static.simonwillison.net/static/2025/pov-pelican-gpt-5.png" /&gt;&lt;/p&gt;
&lt;p&gt;I decided to try this on the new &lt;code&gt;gpt-5-codex-mini&lt;/code&gt;, using the &lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/"&gt;trick I described yesterday&lt;/a&gt;. Here's &lt;a href="https://gist.github.com/simonw/059e0c5aee54258cdc62ed511ae26b4b"&gt;the code it wrote&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./target/debug/codex prompt -m gpt-5-codex-mini \
  "Write a POV-Ray file that shows a pelican riding on a bicycle."
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It turns out you can render POV files on macOS like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install povray
povray demo.pov # produces demo.png
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code GPT-5 Codex Mini created didn't quite work, so I round-tripped it through Sonnet 4.5 via Claude Code a couple of times - &lt;a href="http://gistpreview.github.io/?71c4f0966d5d99003ace12197b9d07fe"&gt;transcript here&lt;/a&gt;. Once it had fixed the errors I got this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Two wheels (tire only) sit overlapping half embedded in the ground. The frame is a half-buried red triangle and some other lines. There is a white pall with a tiny yellow beak and two detached cylindrical arms. It's rubbish." src="https://static.simonwillison.net/static/2025/povray-pelican-gpt-5-codex-mini.png" /&gt;&lt;/p&gt;
&lt;p&gt;That's significantly worse than the one beetle_b got &lt;a href="https://blog.nawaz.org/posts/2025/Oct/pelican-on-a-bike-raytracer-edition/#gpt-5-mini"&gt;from GPT-5 Mini&lt;/a&gt;!

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45862802#45866639"&gt;BeetleB on Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/3d"&gt;3d&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ray-tracing"&gt;ray-tracing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="3d"/><category term="ray-tracing"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="pelican-riding-a-bicycle"/><category term="gpt-5"/></entry><entry><title>Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican</title><link href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#atom-tag" rel="alternate"/><published>2025-11-09T03:31:34+00:00</published><updated>2025-11-09T03:31:34+00:00</updated><id>https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they &lt;a href="https://x.com/OpenAIDevs/status/1986861734619947305"&gt;describe&lt;/a&gt; as "a more compact and cost-efficient version of GPT-5-Codex". It's currently only available via their Codex CLI tool and VS Code extension, with proper API access "&lt;a href="https://x.com/OpenAIDevs/status/1986861736041853368"&gt;coming soon&lt;/a&gt;". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly.&lt;/p&gt;
&lt;p&gt;I made &lt;a href="https://www.youtube.com/watch?v=9o1_DL9uNlM"&gt;a video&lt;/a&gt; talking through my progress and demonstrating the final results.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="9o1_DL9uNlM" js-api="js-api" title="Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican" playlabel="Play: Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican"&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#this-is-a-little-bit-cheeky"&gt;This is a little bit cheeky&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#codex-cli-is-written-in-rust"&gt;Codex CLI is written in Rust&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#iterating-on-the-code"&gt;Iterating on the code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#let-s-draw-some-pelicans"&gt;Let's draw some pelicans&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#bonus-the-debug-option"&gt;Bonus: the --debug option&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="this-is-a-little-bit-cheeky"&gt;This is a little bit cheeky&lt;/h4&gt;
&lt;p&gt;OpenAI clearly don't intend for people to access this model directly just yet. It's available exclusively through Codex CLI which is a privileged application - it gets to access a special backend API endpoint that's not publicly documented, and it uses a special authentication mechanism that bills usage directly to the user's existing ChatGPT account.&lt;/p&gt;
&lt;p&gt;I figured reverse-engineering that API directly would be somewhat impolite. But... Codex CLI is an open source project released under an Apache 2.0 license. How about upgrading that to let me run my own prompts through its existing API mechanisms instead?&lt;/p&gt;
&lt;p&gt;This felt like a somewhat absurd loophole, and I couldn't resist trying it out and seeing what happened.&lt;/p&gt;
&lt;h4 id="codex-cli-is-written-in-rust"&gt;Codex CLI is written in Rust&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository contains the source code for the Codex CLI tool, which OpenAI rewrote in Rust just a few months ago.&lt;/p&gt;
&lt;p&gt;I don't know much Rust at all.&lt;/p&gt;
&lt;p&gt;I made my own clone on GitHub and checked it out locally:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone git@github.com:simonw/codex
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; codex&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I fired up Codex itself (in dangerous mode, because I like living dangerously):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex --dangerously-bypass-approvals-and-sandbox&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And ran this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figure out how to build the rust version of this tool and then build it&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This worked. It churned away for a bit and figured out how to build itself. This is a useful starting point for a project like this - in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works.&lt;/p&gt;
&lt;p&gt;Once the compile had succeeded I fed it the design for the new feature I wanted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Add a new sub-command to the Rust tool called "codex prompt"&lt;/p&gt;
&lt;p&gt;codex prompt "prompt goes here" - this runs the given prompt directly against the OpenAI API that Codex uses, with the same code path and authentication credentials as the rest of Codex.&lt;/p&gt;
&lt;p&gt;codex prompt -m &amp;lt;model_name&amp;gt; "prompt goes here" - same again but lets you set the model&lt;/p&gt;
&lt;p&gt;codex prompt "prompt goes here" -s/--system "system prompt goes here" - runs with a custom system/developer message&lt;/p&gt;
&lt;p&gt;codex prompt --models - this lists all available models that can be used with the "codex prompt" command&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My design for this new sub-command is lifted directly from my own &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-prompt-help"&gt;llm command&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="iterating-on-the-code"&gt;Iterating on the code&lt;/h4&gt;
&lt;p&gt;Codex got to work. I've &lt;a href="https://gistpreview.github.io/?ddabbff092bdd658e06d8a2e8f142098"&gt;shared the full transcript here&lt;/a&gt; (using the tool &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;I described here&lt;/a&gt;), but this TODO list it made itself is a useful summary of what it decided to do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Updated Plan
└ ✔ Inspect CLI structure and relevant core utilities for sending prompts and listing models
  ✔ Implement new `codex prompt` subcommand (CLI plumbing + command handler)
  ✔ Format, lint, and test (`just fmt`, `just fix -p codex-cli`, `cargo test -p codex-cli`)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I like that it figured out &lt;a href="https://github.com/openai/codex/blob/a47181e471b6efe55e95f9858c913fc89a3a44fa/codex-rs/justfile"&gt;the justfile&lt;/a&gt; in the repo and decided to use it to run formatting and linting commands without me needing to tell it to. (Update: it turns out that was dictated by the &lt;a href="https://github.com/openai/codex/blob/f8b30af6dc275b3e64de5f1987e6cafe604cb72a/AGENTS.md"&gt;AGENTS.md&lt;/a&gt; file.)&lt;/p&gt;
&lt;p&gt;I tried running the first version of the code it wrote like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-5-codex-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;... and it didn't quite work. I got this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(reasoning summary) **Seeking
(reasoning summary)  instructions
(reasoning summary)  and
(reasoning summary)  sandbox
(reasoning summary)  info
(reasoning summary) **
(reasoning summary) **Dec
(reasoning summary) iding
(reasoning summary)  on
(reasoning summary)  SVG
(reasoning summary)  creation
(reasoning summary)  approach
(reasoning summary) **
(reasoning summary) **Checking
(reasoning summary)  current
(reasoning summary)  directory
(reasoning summary) **
(reasoning summary) **Preparing
(reasoning summary)  to
(reasoning summary)  check
(reasoning summary)  current
(reasoning summary)  directory
(reasoning summary) **
I�m ready to help�what would you like me to do next?I�m ready to help�what would you like me to do next?
Token usage: total=2459 input=2374 cached_input=0 output=85 reasoning_output=64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that it DID think about SVG creation, but then decided it should look at the current directory. This isn't what I want - it appeared to be running in Codex's normal mode with a system prompt telling it to edit files on disk. I wanted it to respond to the prompt without acting as if it had a full workspace available to it.&lt;/p&gt;
&lt;p&gt;I prompted it again, pasting in the broken output:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;this did not seem to work - here is what happened - note that the reasoning summary comes out on a lot of lines when I would rather it streams out to stderr on the same line (flushing constantly) - and then despite the reasoning summary thinking about pelican on SVG it did not actually act on that and display the result, which is very surprising. [pasted content]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It churned for a bit more, but from the output it looked like it was still trying to include Codex-specific tools and workspace information. So I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that this new 'codex prompt' command should NOT be able to execute any tools at all, and should not have to think about the workspace or sandbox - it should send a single prompt with no tools and then stream and display a response and then stop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This got me a new build, but when I ran it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Error: unexpected status 400 Bad Request: {"detail":"Instructions are not valid"}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My guess is that OpenAI's custom backend API for Codex doesn't like it if you attempt to call it &lt;em&gt;without&lt;/em&gt; passing in the expected instructions. Fair enough!&lt;/p&gt;
&lt;p&gt;Rather than paste that error in I got Codex to try the command itself, so it could see the error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You try running this and see what happens: ./target/debug/codex prompt 'Generate an SVG of a pelican riding a bicycle'&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex switched back to the default instructions. From its reply:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;cli/src/prompt_cmd.rs#L1-L320&lt;/code&gt; now forces a tool-free request: we keep the workspace defaults, inject a lightweight system/developer message (“don’t run tools”), disable apply_patch/web search via config overrides, and build the outbound prompt as just that system span plus the user text.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I guess that will have to do. I asked for one more feature:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Add a --debug option to the codex prompt command which causes the full JSON request and response to be printed to stderr, plus the URL that is being accessed and the HTTP verb&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and we're ready to try this thing out!&lt;/p&gt;
&lt;p&gt;Notably I haven't written a single line of Rust myself here and paid almost no attention to what it was actually doing. My main contribution was to run the binary every now and then to see if it was doing what I needed yet.&lt;/p&gt;
&lt;p&gt;I've pushed the working code to &lt;a href="https://github.com/simonw/codex/compare/a47181e471b6efe55e95f9858c913fc89a3a44fa...ae5f98a9248a8edb5d3c53261273a482fc0b5306"&gt;a prompt-subcommand branch in my repo&lt;/a&gt; if you want to take a look and see how it all works.&lt;/p&gt;

&lt;h4 id="let-s-draw-some-pelicans"&gt;Let's draw some pelicans&lt;/h4&gt;
&lt;p&gt;With the final version of the code built, I drew some pelicans. Here's the &lt;a href="https://gistpreview.github.io/?a11f9ac456d2b2bc3715ba900ef1203d"&gt;full terminal transcript&lt;/a&gt;, but here are some highlights.&lt;/p&gt;
&lt;p&gt;This is with the default GPT-5-Codex model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I pasted it into my &lt;a href="https://tools.simonwillison.net/svg-render"&gt;tools.simonwillison.net/svg-render&lt;/a&gt; tool and got the following:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-default.png" alt="It's a dumpy little pelican with a weird face, not particularly great" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I ran it again for GPT-5:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m gpt-5&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-gpt-5.png" alt="Much better bicycle, pelican is a bit line-drawing-ish but does have the necessary parts in the right places" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And now the moment of truth... GPT-5 Codex Mini!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m gpt-5-codex-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-mini.png" alt="This is terrible. The pelican is an abstract collection of shapes, the bicycle is likewise very messed up" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I don't think I'll be adding that one to my SVG drawing toolkit any time soon.&lt;/p&gt;

&lt;h4 id="bonus-the-debug-option"&gt;Bonus: the --debug option&lt;/h4&gt;
&lt;p&gt;I had Codex add a &lt;code&gt;--debug&lt;/code&gt; option to help me see exactly what was going on.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt -m gpt-5-codex-mini &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --debug&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The output starts like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[codex prompt debug] POST https://chatgpt.com/backend-api/codex/responses
[codex prompt debug] Request JSON:
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;gpt-5-codex-mini&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"instructions"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;You are Codex, based on GPT-5. You are running as a coding agent ...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"input"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;message&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;developer&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: [
        {
          &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;input_text&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"text"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;You are a helpful assistant. Respond directly to the user request without running tools or shell commands.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        }
      ]
    },
    {
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;message&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;user&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: [
        {
          &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;input_text&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"text"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        }
      ]
    }
  ],
  &lt;span class="pl-ent"&gt;"tools"&lt;/span&gt;: [],
  &lt;span class="pl-ent"&gt;"tool_choice"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;auto&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"parallel_tool_calls"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"reasoning"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"summary"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;auto&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span class="pl-ent"&gt;"store"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"stream"&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"include"&lt;/span&gt;: [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;reasoning.encrypted_content&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  &lt;span class="pl-ent"&gt;"prompt_cache_key"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;019a66bf-3e2c-7412-b05e-db9b90bbad6e&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This reveals that OpenAI's private API endpoint for Codex CLI is &lt;code&gt;https://chatgpt.com/backend-api/codex/responses&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Also interesting is how the &lt;code&gt;"instructions"&lt;/code&gt; key (truncated above, &lt;a href="https://gist.github.com/simonw/996388ecf785ad54de479315bd4d33b7"&gt;full copy here&lt;/a&gt;) contains the default instructions, without which the API appears not to work - but it also shows that you can send a message with &lt;code&gt;role="developer"&lt;/code&gt; in advance of your user prompt.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="rust"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="vibe-coding"/><category term="coding-agents"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/></entry><entry><title>GPT-5 pro</title><link href="https://simonwillison.net/2025/Oct/6/gpt-5-pro/#atom-tag" rel="alternate"/><published>2025-10-06T19:48:45+00:00</published><updated>2025-10-06T19:48:45+00:00</updated><id>https://simonwillison.net/2025/Oct/6/gpt-5-pro/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5-pro"&gt;GPT-5 pro&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's OpenAI's model documentation for their GPT-5 pro model, released to their API today at their DevDay event.&lt;/p&gt;
&lt;p&gt;It has similar base characteristics to &lt;a href="https://platform.openai.com/docs/models/gpt-5"&gt;GPT-5&lt;/a&gt;: both share a September 30, 2024 knowledge cutoff and 400,000 context limit.&lt;/p&gt;
&lt;p&gt;GPT-5 pro has maximum output tokens 272,000 max, an increase from 128,000 for GPT-5.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As our most advanced reasoning model, GPT-5 pro defaults to (and only supports) &lt;code&gt;reasoning.effort: high&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's only available via OpenAI's Responses API. My &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; tool doesn't support that in core yet, but the &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai-plugin&lt;/a&gt; plugin does. I released &lt;a href="https://github.com/simonw/llm-openai-plugin/releases/tag/0.7"&gt;llm-openai-plugin 0.7&lt;/a&gt; adding support for the new model, then ran this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install -U llm-openai-plugin
llm -m openai/gpt-5-pro "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It's very, very slow. The model took 6 minutes 8 seconds to respond and charged me for 16 input and 9,205 output tokens. At $15/million input and $120/million output this pelican &lt;a href="https://www.llm-prices.com/#it=16&amp;amp;ot=9205&amp;amp;ic=15&amp;amp;oc=120&amp;amp;sb=output&amp;amp;sd=descending"&gt;cost me $1.10&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img alt="It's obviously a pelican riding a bicycle. Half the spokes are missing on each wheel and the pelican is a bit squat looking." src="https://static.simonwillison.net/static/2025/gpt-5-pro.png" /&gt;&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/9a06ab36f486f31401fec1fc104a8ce5"&gt;the full transcript&lt;/a&gt;. It looks visually pretty simpler to the much, much cheaper result I &lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of-pelicans"&gt;got from GPT-5&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="gpt-5"/></entry><entry><title>Quoting Scott Aaronson</title><link href="https://simonwillison.net/2025/Sep/29/scott-aaronson/#atom-tag" rel="alternate"/><published>2025-09-29T00:52:26+00:00</published><updated>2025-09-29T00:52:26+00:00</updated><id>https://simonwillison.net/2025/Sep/29/scott-aaronson/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://scottaaronson.blog/?p=9183"&gt;&lt;p&gt;Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) wrong. But rather than laughing at the silly AI like a skeptic might do, I &lt;em&gt;told&lt;/em&gt; GPT5 how I knew it was wrong. It thought some more, apologized, and tried again, and gave me something better. So it went for a few iterations, much like interacting with a grad student or colleague. [...]&lt;/p&gt;
&lt;p&gt;Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes. Right now, it almost certainly &lt;em&gt;can’t&lt;/em&gt; write the whole research paper (at least if you want it to be correct and good), but it can help you get unstuck if you otherwise know what you’re doing, which you might call a sweet spot.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://scottaaronson.blog/?p=9183"&gt;Scott Aaronson&lt;/a&gt;, UT Austin Quantum Information Center&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/quantum-computing"&gt;quantum-computing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="quantum-computing"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm-reasoning"/><category term="gpt-5"/></entry><entry><title>GPT-5-Codex</title><link href="https://simonwillison.net/2025/Sep/23/gpt-5-codex/#atom-tag" rel="alternate"/><published>2025-09-23T23:59:20+00:00</published><updated>2025-09-23T23:59:20+00:00</updated><id>https://simonwillison.net/2025/Sep/23/gpt-5-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5-codex"&gt;GPT-5-Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI &lt;a href="https://simonwillison.net/2025/Sep/15/gpt-5-codex/"&gt;half-released this model&lt;/a&gt; earlier this month, adding it to their Codex CLI tool but not their API.&lt;/p&gt;
&lt;p&gt;Today they've fixed that - the new model can now be accessed as &lt;code&gt;gpt-5-codex&lt;/code&gt;. It's priced the same as regular GPT-5: $1.25/million input tokens, $10/million output tokens, and the same hefty 90% discount for previously cached input tokens, especially important for agentic tool-using workflows which quickly produce a lengthy conversation.&lt;/p&gt;
&lt;p&gt;It's only available via their Responses API, which means you currently need to install the &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai-plugin&lt;/a&gt; to use it with LLM:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install -U llm-openai-plugin
llm -m openai/gpt-5-codex -T llm_version 'What is the LLM version?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Outputs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The installed LLM version is 0.27.1.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I added &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;tool support&lt;/a&gt; to that plugin today, &lt;a href="https://github.com/simonw/llm-openai-plugin/issues/20#issuecomment-3325921197"&gt;mostly authored by GPT-5 Codex itself&lt;/a&gt; using OpenAI's Codex CLI.&lt;/p&gt;
&lt;p&gt;The new &lt;a href="https://cookbook.openai.com/examples/gpt-5-codex_prompting_guide"&gt;prompting guide for GPT-5-Codex&lt;/a&gt; is worth a read.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT-5-Codex is purpose-built for Codex CLI, the Codex IDE extension, the Codex cloud environment, and working in GitHub, and also supports versatile tool use. We recommend using GPT-5-Codex only for agentic and interactive coding use cases.&lt;/p&gt;
&lt;p&gt;Because the model is trained specifically for coding, many best practices you once had to prompt into general purpose models are built in, and over prompting can reduce quality.&lt;/p&gt;
&lt;p&gt;The core prompting principle for GPT-5-Codex is &lt;strong&gt;“less is more.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://gist.github.com/simonw/b371949ae984b0431848cd16cba24b27"&gt;tried my pelican benchmark&lt;/a&gt; at a cost of &lt;a href="https://www.llm-prices.com/#it=16&amp;amp;ot=2154&amp;amp;ic=1.25&amp;amp;oc=10"&gt;2.156 cents&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m openai/gpt-5-codex "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="See description below" src="https://static.simonwillison.net/static/2025/gpt-5-codex-api-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;I asked Codex to describe this image and it correctly identified it as a pelican!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m openai/gpt-5-codex -a https://static.simonwillison.net/static/2025/gpt-5-codex-api-pelican.png \
  -s 'Write very detailed alt text'
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Cartoon illustration of a cream-colored pelican with a large orange beak and tiny black eye riding a minimalist dark-blue bicycle. The bird’s wings are tucked in, its legs resemble orange stick limbs pushing the pedals, and its tail feathers trail behind with light blue motion streaks to suggest speed. A small coral-red tongue sticks out of the pelican’s beak. The bicycle has thin light gray spokes, and the background is a simple pale blue gradient with faint curved lines hinting at ground and sky.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/></entry><entry><title>GPT‑5-Codex and upgrades to Codex</title><link href="https://simonwillison.net/2025/Sep/15/gpt-5-codex/#atom-tag" rel="alternate"/><published>2025-09-15T18:55:35+00:00</published><updated>2025-09-15T18:55:35+00:00</updated><id>https://simonwillison.net/2025/Sep/15/gpt-5-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-upgrades-to-codex/"&gt;GPT‑5-Codex and upgrades to Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update&lt;/strong&gt;: OpenAI call it a "version of GPT-5", they don't explicitly describe it as a fine-tuned model. Calling it a fine-tune was my mistake here. &lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I say half-released because it's not yet available via their API, but they "plan to make GPT‑5-Codex available in the API soon".&lt;/p&gt;
&lt;p&gt;I wrote about &lt;a href="https://simonwillison.net/2025/May/16/openai-codex/"&gt;the confusing array of OpenAI products that share the name Codex&lt;/a&gt; a few months ago. This new model adds yet another, though at least "GPT-5-Codex" (using two hyphens) is unambiguous enough not to add to much more to the confusion.&lt;/p&gt;
&lt;p&gt;At this point it's best to think of &lt;strong&gt;Codex&lt;/strong&gt; as OpenAI's brand name for their coding family of models and tools.&lt;/p&gt;
&lt;p&gt;The new model is already integrated into their VS Code extension, the Codex CLI and their Codex Cloud asynchronous coding agent. I'd been calling that last one "Codex Web" but I think Codex Cloud is a better name since it can also be accessed directly from their iPhone app.&lt;/p&gt;
&lt;p&gt;Codex Cloud also has a new feature: you can configure it to automatically run code review against specific GitHub repositories (I found that option on &lt;a href="https://chatgpt.com/codex/settings/code-review"&gt;chatgpt.com/codex/settings/code-review&lt;/a&gt;) and it will create a temporary container to use as part of those reviews. Here's the &lt;a href="https://developers.openai.com/codex/cloud/code-review"&gt;relevant documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some documented features of the new GPT-5-Codex model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specifically trained for code review, which directly supports their new code review feature.&lt;/li&gt;
&lt;li&gt;"GPT‑5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task." Simple tasks (like "list files in this directory") should run faster. Large, complex tasks should use run for much longer - OpenAI report Codex crunching for seven hours in some cases!&lt;/li&gt;
&lt;li&gt;Increased score on their proprietary "code refactoring evaluation" from 33.9% for GPT-5 (high) to 51.3% for GPT-5-Codex (high). It's hard to evaluate this without seeing the details of the eval but it does at least illustrate that refactoring performance is something they've focused on here.&lt;/li&gt;
&lt;li&gt;"GPT‑5-Codex also shows significant improvements in human preference evaluations when creating mobile websites" - in the past I've habitually prompted models to "make it mobile-friendly", maybe I don't need to do that any more.&lt;/li&gt;
&lt;li&gt;"We find that comments by GPT‑5-Codex are less likely to be incorrect or unimportant" - I originally misinterpreted this as referring to comments in code but it's actually about comments left on code reviews.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href="https://github.com/openai/codex/blob/rust-v0.36.0/codex-rs/core/gpt_5_codex_prompt.md"&gt;system prompt for GPT-5-Codex&lt;/a&gt; in Codex CLI is worth a read. It's notably shorter than the &lt;a href="https://github.com/openai/codex/blob/rust-v0.36.0/codex-rs/core/prompt.md"&gt;system prompt for other models&lt;/a&gt; - &lt;a href="https://gist.github.com/simonw/042f1428ce22ad55ac5bc9010263a4f4/revisions"&gt;here's a diff&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the section of the updated system prompt that talks about comments:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like "Assigns the value to the variable", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. Usage of these comments should be rare.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Theo Browne &lt;a href="https://www.youtube.com/watch?v=j9wvCrON3XA"&gt;has a video review&lt;/a&gt; of the model and accompanying features. He was generally impressed but noted that it was surprisingly bad at using the Codex CLI search tool to navigate code. Hopefully that's something that can fix with a system prompt update.&lt;/p&gt;
&lt;p&gt;Finally, can it drew a pelican riding a bicycle? Without API access I instead got Codex Cloud to &lt;a href="https://chatgpt.com/s/cd_68c85f433cc881918acfd8a4aeda1cc4"&gt;have a go&lt;/a&gt; by prompting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Generate an SVG of a pelican riding a bicycle, save as pelican.svg&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/codex-scratchpad/pull/3"&gt;the result&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="it's a bit messy - the pelican is quite good and the bicycle is quite good but the pelican is stood overlapping the bicycle not riding it." src="https://static.simonwillison.net/static/2025/gpt-5-codex-pelican.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/code-review"&gt;code-review&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/theo-browne"&gt;theo-browne&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="code-review"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="coding-agents"/><category term="async-coding-agents"/><category term="gpt-5"/><category term="codex-cli"/><category term="theo-browne"/><category term="gpt-codex"/></entry><entry><title>Models can prompt now</title><link href="https://simonwillison.net/2025/Sep/14/models-can-prompt/#atom-tag" rel="alternate"/><published>2025-09-14T20:25:21+00:00</published><updated>2025-09-14T20:25:21+00:00</updated><id>https://simonwillison.net/2025/Sep/14/models-can-prompt/#atom-tag</id><summary type="html">
    &lt;p&gt;Here's an interesting example of models incrementally improving over time: I am finding that today's leading models are competent at &lt;strong&gt;writing prompts&lt;/strong&gt; for themselves and each other.&lt;/p&gt;
&lt;p&gt;A year ago I was quite skeptical of the pattern where models are used to help build prompts. Prompt engineering was still a young enough discipline that I did not expect the models to have enough training data to be able to prompt themselves better than a moderately experienced human.&lt;/p&gt;
&lt;p&gt;The Claude 4 and GPT-5 families both have training cut-off dates within the past year - recent enough that they've seen a decent volume of good prompting examples.&lt;/p&gt;
&lt;p&gt;I expect they have also been deliberately trained for this. Anthropic make &lt;a href="https://simonwillison.net/2025/Jun/2/claude-trace/"&gt;extensive use&lt;/a&gt; of sub-agent patterns in Claude Code, and published a &lt;a href="https://www.anthropic.com/engineering/multi-agent-research-system"&gt;fascinating article on that pattern&lt;/a&gt; (&lt;a href="https://simonwillison.net/2025/Jun/14/multi-agent-research-system/"&gt;my notes&lt;/a&gt; on that).&lt;/p&gt;
&lt;p&gt;I don't have anything solid to back this up - it's more of a hunch based on anecdotal evidence where various of my requests for a model to write a prompt have returned useful results over the last few months.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-4"&gt;claude-4&lt;/a&gt;&lt;/p&gt;



</summary><category term="prompt-engineering"/><category term="llms"/><category term="ai"/><category term="generative-ai"/><category term="gpt-5"/><category term="anthropic"/><category term="claude"/><category term="claude-code"/><category term="claude-4"/></entry><entry><title>gpt-5 and gpt-5-mini rate limit updates</title><link href="https://simonwillison.net/2025/Sep/12/gpt-5-rate-limits/#atom-tag" rel="alternate"/><published>2025-09-12T23:14:46+00:00</published><updated>2025-09-12T23:14:46+00:00</updated><id>https://simonwillison.net/2025/Sep/12/gpt-5-rate-limits/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://twitter.com/openaidevs/status/1966610846559134140"&gt;gpt-5 and gpt-5-mini rate limit updates&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI have increased the rate limits for their two main GPT-5  models. These look significant:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;gpt-5&lt;br&gt;
Tier 1: 30K → 500K TPM (1.5M batch)&lt;br&gt;
Tier 2: 450K → 1M (3M batch)&lt;br&gt;
Tier 3: 800K → 2M&lt;br&gt;
Tier 4: 2M → 4M&lt;/p&gt;
&lt;p&gt;gpt-5-mini&lt;br&gt;
Tier 1: 200K → 500K (5M batch)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5"&gt;GPT-5 rate limits here&lt;/a&gt; show tier 5 stays at 40M tokens per minute. The &lt;a href="https://platform.openai.com/docs/models/gpt-5-mini"&gt;GPT-5 mini rate limits&lt;/a&gt; for tiers 2 through 5 are 2M, 4M, 10M and 180M TPM respectively.&lt;/p&gt;
&lt;p&gt;As a reminder, &lt;a href="https://platform.openai.com/docs/guides/rate-limits#usage-tiers"&gt;those tiers&lt;/a&gt; are assigned based on how much money you have spent on the OpenAI API - from $5 for tier 1 up through $50, $100, $250 and then $1,000 for tier &lt;/p&gt;
&lt;p&gt;For comparison, Anthropic's current top tier is Tier 4 ($400 spent) which provides 2M maximum input tokens per minute and 400,000 maximum output tokens, though you can contact their sales team for higher limits than that.&lt;/p&gt;
&lt;p&gt;Gemini's top tier is Tier 3 for $1,000 spent and &lt;a href="https://ai.google.dev/gemini-api/docs/rate-limits#tier-3"&gt;currently gives you&lt;/a&gt; 8M TPM for Gemini 2.5 Pro and Flash and 30M TPM for the Flash-Lite and 2.0 Flash models.&lt;/p&gt;
&lt;p&gt;So OpenAI's new rate limit increases for their top performing model pulls them ahead of Anthropic but still leaves them significantly behind Gemini.&lt;/p&gt;
&lt;p&gt;GPT-5 mini remains the champion for smaller models with that enormous 180M TPS limit for its top tier.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="gemini"/><category term="llm-pricing"/><category term="gpt-5"/></entry><entry><title>Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide</title><link href="https://simonwillison.net/2025/Sep/9/apollo-ai-adoption/#atom-tag" rel="alternate"/><published>2025-09-09T06:47:49+00:00</published><updated>2025-09-09T06:47:49+00:00</updated><id>https://simonwillison.net/2025/Sep/9/apollo-ai-adoption/#atom-tag</id><summary type="html">
    &lt;p&gt;Apollo Global Management's "Chief Economist" Dr. Torsten Sløk released &lt;a href="https://www.apolloacademy.com/ai-adoption-rate-trending-down-for-large-companies/"&gt;this interesting chart&lt;/a&gt; which appears to show a slowdown in AI adoption rates among large (&amp;gt;250 employees) companies:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/apollo-ai-chart.jpg" alt="AI adoption rates starting to decline for larger firms. A chart of AI adoption rate by firm size. Includes lines for 250+, 100-249, 50-99, 20-49, 10-19, 5-8 and 1-4 sized organizations. Chart starts in November 2023 with percentages ranging from 3 to 5, then all groups grow through August 2025 albeit with the 250+ group having a higher score than the others. That 25+ group peaks in Jul5 2025 at around 14% and then appears to slope slightly downwards to 12% by August. Some of the other lines also start to tip down, though not as much." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the full description that accompanied the chart:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The US Census Bureau conducts a biweekly survey of 1.2 million firms, and one question is whether a business has used AI tools such as machine learning, natural language processing, virtual agents or voice recognition to help produce goods or services in the past two weeks. Recent data by firm size shows that AI adoption has been declining among companies with more than 250 employees, see chart below.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(My first thought on seeing that chart is that I hope it represents the &lt;em&gt;peak of inflated expectations&lt;/em&gt; leading into the &lt;em&gt;trough of dissillusionment&lt;/em&gt; in the &lt;a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle"&gt;Gartner Hype Cycle&lt;/a&gt; (which Wikipedia calls "largely disputed, with studies pointing to it being inconsistently true at best"), since that means we might be reaching the end of the initial hype phase and heading towards the &lt;em&gt;slope of enlightenment&lt;/em&gt;.)&lt;/p&gt;
&lt;h4 id="finding-the-us-census-data-with-gpt-5-search"&gt;Finding the US Census data with GPT-5 search&lt;/h4&gt;
&lt;p&gt;This is the first I'd heard of the US Census Bureau running a biweekly (that's once every two weeks) survey about AI!&lt;/p&gt;
&lt;p&gt;I decided to track down the numbers and see if I could recreate the chart myself. And since &lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/"&gt;GPT-5 is really good at search now&lt;/a&gt; I fed it the following prompt to see how well it could do:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;&amp;gt; The US Census Bureau conducts a biweekly survey of 1.2 million firms, and one question is whether a business has used AI tools such as machine learning, natural language processing, virtual agents or voice recognition to help produce goods or services in the past two weeks. Recent data by firm size shows that AI adoption has been declining among companies with more than 250 employees, see chart below.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Find me that original data&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It took 1m19s to &lt;a href="https://chatgpt.com/share/68bf48cf-0e70-8006-a045-96fa8e7ddfc1"&gt;find exactly what I needed&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here’s the original U.S. Census Bureau source for that series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BTOS “Employment size class” time series (XLSX)&lt;/strong&gt; — biweekly shares of firms answering &lt;strong&gt;Yes&lt;/strong&gt; to “In the last two weeks, did this business use Artificial Intelligence (AI) in producing goods or services?”, broken out by firm-size buckets (incl. &lt;strong&gt;250+ employees&lt;/strong&gt;). Coverage: &lt;strong&gt;Sep 11, 2023 → Aug 24, 2025&lt;/strong&gt;. (&lt;a href="https://www.census.gov/hfp/btos/data_downloads"&gt;Census.gov&lt;/a&gt;) [...]&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;That &lt;a href="https://www.census.gov/hfp/btos/data_downloads"&gt;Census page&lt;/a&gt; was not &lt;em&gt;at all&lt;/em&gt; obvious. Thankfully GPT-5 had tipped me off to the "Employment size class" file, this link here:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/census-page.jpg" alt="US Census website. Business Trends and Outlook Survey, Updated August 28, 2025. Current Data has 6 visible XLSX files with names like WFH Supplement, WFH Questions 27-29, National, Sectur, Subsector and Emplomyent size class. A red arrow highlights that last one." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;So I downloaded that file, and confirmed that it was indeed a spreadsheet containing the data I wanted (in among all sorts of other survey questions). Here's &lt;a href="https://static.simonwillison.net/static/cors-allow/2025/Employment-Size-Class-Sep-2025.xlsx"&gt;a 374KB XLSX copy&lt;/a&gt; of the file I downloaded.&lt;/p&gt;
&lt;h4 id="recreating-the-chart-with-gpt-5-code-interpreter"&gt;Recreating the chart with GPT-5 code interpreter&lt;/h4&gt;
&lt;p&gt;So what should I do with it now? I decided to see if GPT-5 could turn the spreadsheet back into that original chart, using Python running in its &lt;a href="https://simonwillison.net/tags/code-interpreter/"&gt;code interpreter&lt;/a&gt; tool.&lt;/p&gt;
&lt;p&gt;So I uploaded the XLSX file back to ChatGPT, dropped in a screenshot of the Apollo chart and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Use this data to recreate this chart using python&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/chart-prompt.jpg" alt="ChatGPT. I dropped in a screenshot of the chart, uploaded the spreadsheet which turned into an inline table browser UI and prompted it to recreate the chart using python." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I thought this was a pretty tall order, but it's always worth throwing big challenges at an LLM to learn from how well it does.&lt;/p&gt;
&lt;p&gt;It &lt;em&gt;really worked hard on this&lt;/em&gt;. I didn't time it exactly but it spent at least 7 minutes "reasoning" across 5 different thinking blocks, interspersed with over a dozen Python analysis sessions. It used &lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;numpy&lt;/code&gt; to explore the uploaded spreadsheet and find the right figures, then tried several attempts at plotting with &lt;code&gt;matplotlib&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As far as I can tell GPT-5 in ChatGPT can now feed charts it creates back into its own vision model, because it appeared to render a broken (empty) chart and then keep on trying to get it working.&lt;/p&gt;
&lt;p&gt;It found a data dictionary in the last tab of the spreadsheet and used that to build a lookup table matching the letters &lt;code&gt;A&lt;/code&gt; through &lt;code&gt;G&lt;/code&gt; to the actual employee size buckets.&lt;/p&gt;
&lt;p&gt;At the end of the process it spat out this chart:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/recreated-chart-1.jpg" alt="matplotlib chart. The title is AI adoption rates starting to decline for larger firms, though there's a typography glitch in that title. It has a neat legend for the different size ranges, then a set of lines that look about right compared to the above graph - but they are more spiky and the numbers appear to trend up again at the end of the chart." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;At first glance I thought it had nailed it... but then I compared the chart more closely with the Apollo original and spotted some definite discrepancies. GPT-5's chart peaked at 14.5% but the highest value in Apollo's was more like 13.5%. The GPT-5 chart was spikier - and most interestingly it included a clear uptick in the last data point where Apollo's had trended downwards.&lt;/p&gt;
&lt;p&gt;I decided it was time to look at the actual data. I opened up the spreadsheet in Numbers, found the AI question columns and manually reviewed them. They seemed to match the GPT-5 chart results - so why the difference to Apollo's?&lt;/p&gt;
&lt;p&gt;Then I noticed a crucial detail in the Apollo chart that I had cropped out of my original screenshot!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: Data is six-survey moving average.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So I told ChatGPT:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Do the first question, plot it as a six survey rolling average&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I asked for the first question because it turned out there were two that were relevant in the survey spreadsheet.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In the last two weeks, did this business use Artificial Intelligence (AI) in producing goods or services? (Examples of AI: machine learning, natural language processing, virtual agents, voice recognition, etc.)&lt;/li&gt;
&lt;li&gt;During the next six months, do you think this business will be using Artificial Intelligence (AI) in producing goods or services? (Examples of AI: machine learning, natural language processing, virtual agents, voice recognition, etc.)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It churned away for a little longer, added this code to the script:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;# Compute 6-survey rolling average (biweekly cadence → ~12 weeks)&lt;/span&gt;
&lt;span class="pl-s1"&gt;rolled&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;wide&lt;/span&gt;.&lt;span class="pl-c1"&gt;rolling&lt;/span&gt;(&lt;span class="pl-s1"&gt;window&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;6&lt;/span&gt;, &lt;span class="pl-s1"&gt;min_periods&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;6&lt;/span&gt;).&lt;span class="pl-c1"&gt;mean&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;And popped out this chart (after I told it to fix the glitch in the title):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/recreated-chart-2.jpg" alt="Second chart. This time the lines are basically an exact match for the Apollo one." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I think it's done it! This is a very solid match for the Apollo original, recreated using &lt;code&gt;matplotlib&lt;/code&gt; and &lt;code&gt;pandas&lt;/code&gt; from the same underlying source data from the US Census.&lt;/p&gt;
&lt;p&gt;Here's the full Python code it wrote, which I think is quite readable (in as much as Pandas code can be):&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pandas&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;
&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;matplotlib&lt;/span&gt;.&lt;span class="pl-s1"&gt;pyplot&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;plt&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;matplotlib&lt;/span&gt;.&lt;span class="pl-s1"&gt;ticker&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;PercentFormatter&lt;/span&gt;

&lt;span class="pl-s1"&gt;path&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"/mnt/data/Employment Size Class.xlsx"&lt;/span&gt;

&lt;span class="pl-s1"&gt;resp&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;read_excel&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;, &lt;span class="pl-s1"&gt;sheet_name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Response Estimates"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;dates&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;read_excel&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;, &lt;span class="pl-s1"&gt;sheet_name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Collection and Reference Dates"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;is_current&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;resp&lt;/span&gt;[&lt;span class="pl-s"&gt;"Question"&lt;/span&gt;].&lt;span class="pl-c1"&gt;astype&lt;/span&gt;(&lt;span class="pl-s1"&gt;str&lt;/span&gt;).&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;strip&lt;/span&gt;().&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;startswith&lt;/span&gt;(&lt;span class="pl-s"&gt;"In the last two weeks"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;ai_yes&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;resp&lt;/span&gt;[&lt;span class="pl-s1"&gt;is_current&lt;/span&gt; &lt;span class="pl-c1"&gt;&amp;amp;&lt;/span&gt; &lt;span class="pl-s1"&gt;resp&lt;/span&gt;[&lt;span class="pl-s"&gt;"Answer"&lt;/span&gt;].&lt;span class="pl-c1"&gt;astype&lt;/span&gt;(&lt;span class="pl-s1"&gt;str&lt;/span&gt;).&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;strip&lt;/span&gt;().&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;lower&lt;/span&gt;().&lt;span class="pl-c1"&gt;eq&lt;/span&gt;(&lt;span class="pl-s"&gt;"yes"&lt;/span&gt;)].&lt;span class="pl-c1"&gt;copy&lt;/span&gt;()

&lt;span class="pl-s1"&gt;code_to_bucket&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; {&lt;span class="pl-s"&gt;"A"&lt;/span&gt;:&lt;span class="pl-s"&gt;"1-4"&lt;/span&gt;,&lt;span class="pl-s"&gt;"B"&lt;/span&gt;:&lt;span class="pl-s"&gt;"5-9"&lt;/span&gt;,&lt;span class="pl-s"&gt;"C"&lt;/span&gt;:&lt;span class="pl-s"&gt;"10-19"&lt;/span&gt;,&lt;span class="pl-s"&gt;"D"&lt;/span&gt;:&lt;span class="pl-s"&gt;"20-49"&lt;/span&gt;,&lt;span class="pl-s"&gt;"E"&lt;/span&gt;:&lt;span class="pl-s"&gt;"50-99"&lt;/span&gt;,&lt;span class="pl-s"&gt;"F"&lt;/span&gt;:&lt;span class="pl-s"&gt;"100-249"&lt;/span&gt;,&lt;span class="pl-s"&gt;"G"&lt;/span&gt;:&lt;span class="pl-s"&gt;"250 or more employees"&lt;/span&gt;}
&lt;span class="pl-s1"&gt;ai_yes&lt;/span&gt;[&lt;span class="pl-s"&gt;"Bucket"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;ai_yes&lt;/span&gt;[&lt;span class="pl-s"&gt;"Empsize"&lt;/span&gt;].&lt;span class="pl-c1"&gt;map&lt;/span&gt;(&lt;span class="pl-s1"&gt;code_to_bucket&lt;/span&gt;)

&lt;span class="pl-s1"&gt;period_cols&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-s1"&gt;c&lt;/span&gt; &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;c&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;ai_yes&lt;/span&gt;.&lt;span class="pl-c1"&gt;columns&lt;/span&gt; &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-en"&gt;str&lt;/span&gt;(&lt;span class="pl-s1"&gt;c&lt;/span&gt;).&lt;span class="pl-c1"&gt;isdigit&lt;/span&gt;() &lt;span class="pl-c1"&gt;and&lt;/span&gt; &lt;span class="pl-en"&gt;len&lt;/span&gt;(&lt;span class="pl-en"&gt;str&lt;/span&gt;(&lt;span class="pl-s1"&gt;c&lt;/span&gt;))&lt;span class="pl-c1"&gt;==&lt;/span&gt;&lt;span class="pl-c1"&gt;6&lt;/span&gt;]
&lt;span class="pl-s1"&gt;long&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;ai_yes&lt;/span&gt;.&lt;span class="pl-c1"&gt;melt&lt;/span&gt;(&lt;span class="pl-s1"&gt;id_vars&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[&lt;span class="pl-s"&gt;"Bucket"&lt;/span&gt;], &lt;span class="pl-s1"&gt;value_vars&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;period_cols&lt;/span&gt;, &lt;span class="pl-s1"&gt;var_name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;, &lt;span class="pl-s1"&gt;value_name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"value"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;dates&lt;/span&gt;[&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;dates&lt;/span&gt;[&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;].&lt;span class="pl-c1"&gt;astype&lt;/span&gt;(&lt;span class="pl-s1"&gt;str&lt;/span&gt;)
&lt;span class="pl-s1"&gt;long&lt;/span&gt;[&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;long&lt;/span&gt;[&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;].&lt;span class="pl-c1"&gt;astype&lt;/span&gt;(&lt;span class="pl-s1"&gt;str&lt;/span&gt;)
&lt;span class="pl-s1"&gt;merged&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;long&lt;/span&gt;.&lt;span class="pl-c1"&gt;merge&lt;/span&gt;(&lt;span class="pl-s1"&gt;dates&lt;/span&gt;[[&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;,&lt;span class="pl-s"&gt;"Ref End"&lt;/span&gt;]], &lt;span class="pl-s1"&gt;on&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Smpdt"&lt;/span&gt;, &lt;span class="pl-s1"&gt;how&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"left"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;merged&lt;/span&gt;[&lt;span class="pl-s"&gt;"date"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;to_datetime&lt;/span&gt;(&lt;span class="pl-s1"&gt;merged&lt;/span&gt;[&lt;span class="pl-s"&gt;"Ref End"&lt;/span&gt;], &lt;span class="pl-s1"&gt;errors&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"coerce"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;merged&lt;/span&gt;[&lt;span class="pl-s"&gt;"value"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;to_numeric&lt;/span&gt;(&lt;span class="pl-s1"&gt;long&lt;/span&gt;[&lt;span class="pl-s"&gt;"value"&lt;/span&gt;].&lt;span class="pl-c1"&gt;astype&lt;/span&gt;(&lt;span class="pl-s1"&gt;str&lt;/span&gt;).&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"%"&lt;/span&gt;,&lt;span class="pl-s"&gt;""&lt;/span&gt;,&lt;span class="pl-s1"&gt;regex&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;).&lt;span class="pl-c1"&gt;str&lt;/span&gt;.&lt;span class="pl-c1"&gt;strip&lt;/span&gt;(), &lt;span class="pl-s1"&gt;errors&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"coerce"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;order&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-s"&gt;"250 or more employees"&lt;/span&gt;,&lt;span class="pl-s"&gt;"100-249"&lt;/span&gt;,&lt;span class="pl-s"&gt;"50-99"&lt;/span&gt;,&lt;span class="pl-s"&gt;"20-49"&lt;/span&gt;,&lt;span class="pl-s"&gt;"10-19"&lt;/span&gt;,&lt;span class="pl-s"&gt;"5-9"&lt;/span&gt;,&lt;span class="pl-s"&gt;"1-4"&lt;/span&gt;]
&lt;span class="pl-s1"&gt;wide&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;merged&lt;/span&gt;.&lt;span class="pl-c1"&gt;pivot_table&lt;/span&gt;(&lt;span class="pl-s1"&gt;index&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"date"&lt;/span&gt;, &lt;span class="pl-s1"&gt;columns&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Bucket"&lt;/span&gt;, &lt;span class="pl-s1"&gt;values&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"value"&lt;/span&gt;, &lt;span class="pl-s1"&gt;aggfunc&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"mean"&lt;/span&gt;).&lt;span class="pl-c1"&gt;sort_index&lt;/span&gt;()
&lt;span class="pl-s1"&gt;wide&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;wide&lt;/span&gt;[[&lt;span class="pl-s1"&gt;c&lt;/span&gt; &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;c&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;order&lt;/span&gt; &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;c&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;wide&lt;/span&gt;.&lt;span class="pl-c1"&gt;columns&lt;/span&gt;]]
&lt;span class="pl-s1"&gt;rolled&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;wide&lt;/span&gt;.&lt;span class="pl-c1"&gt;rolling&lt;/span&gt;(&lt;span class="pl-s1"&gt;window&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;6&lt;/span&gt;, &lt;span class="pl-s1"&gt;min_periods&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;6&lt;/span&gt;).&lt;span class="pl-c1"&gt;mean&lt;/span&gt;()

&lt;span class="pl-s1"&gt;start&lt;/span&gt;, &lt;span class="pl-s1"&gt;end&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;Timestamp&lt;/span&gt;(&lt;span class="pl-s"&gt;"2023-11-01"&lt;/span&gt;), &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;Timestamp&lt;/span&gt;(&lt;span class="pl-s"&gt;"2025-08-31"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;rolled_win&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;rolled&lt;/span&gt;.&lt;span class="pl-c1"&gt;loc&lt;/span&gt;[(&lt;span class="pl-s1"&gt;rolled&lt;/span&gt;.&lt;span class="pl-c1"&gt;index&lt;/span&gt; &lt;span class="pl-c1"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;start&lt;/span&gt;) &lt;span class="pl-c1"&gt;&amp;amp;&lt;/span&gt; (&lt;span class="pl-s1"&gt;rolled&lt;/span&gt;.&lt;span class="pl-c1"&gt;index&lt;/span&gt; &lt;span class="pl-c1"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;end&lt;/span&gt;)]

&lt;span class="pl-s1"&gt;fig&lt;/span&gt;, &lt;span class="pl-s1"&gt;ax&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;plt&lt;/span&gt;.&lt;span class="pl-c1"&gt;subplots&lt;/span&gt;(&lt;span class="pl-s1"&gt;figsize&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;(&lt;span class="pl-c1"&gt;12&lt;/span&gt;, &lt;span class="pl-c1"&gt;6&lt;/span&gt;))
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;col&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;order&lt;/span&gt;:
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;col&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;rolled_win&lt;/span&gt;.&lt;span class="pl-c1"&gt;columns&lt;/span&gt;:
        &lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;plot&lt;/span&gt;(&lt;span class="pl-s1"&gt;rolled_win&lt;/span&gt;.&lt;span class="pl-c1"&gt;index&lt;/span&gt;, &lt;span class="pl-s1"&gt;rolled_win&lt;/span&gt;[&lt;span class="pl-s1"&gt;col&lt;/span&gt;], &lt;span class="pl-s1"&gt;label&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;col&lt;/span&gt;, &lt;span class="pl-s1"&gt;linewidth&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;2&lt;/span&gt;)

&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;set_title&lt;/span&gt;(&lt;span class="pl-s"&gt;"AI adoption (last two weeks) — 6‑survey rolling average"&lt;/span&gt;, &lt;span class="pl-s1"&gt;pad&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;16&lt;/span&gt;)
&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;yaxis&lt;/span&gt;.&lt;span class="pl-c1"&gt;set_major_formatter&lt;/span&gt;(&lt;span class="pl-en"&gt;PercentFormatter&lt;/span&gt;(&lt;span class="pl-c1"&gt;100&lt;/span&gt;))
&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;set_ylabel&lt;/span&gt;(&lt;span class="pl-s"&gt;"%"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;set_xlabel&lt;/span&gt;(&lt;span class="pl-s"&gt;""&lt;/span&gt;)
&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;grid&lt;/span&gt;(&lt;span class="pl-c1"&gt;True&lt;/span&gt;, &lt;span class="pl-s1"&gt;alpha&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;0.25&lt;/span&gt;, &lt;span class="pl-s1"&gt;linestyle&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"--"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;ax&lt;/span&gt;.&lt;span class="pl-c1"&gt;legend&lt;/span&gt;(&lt;span class="pl-s1"&gt;title&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;None&lt;/span&gt;, &lt;span class="pl-s1"&gt;loc&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"upper left"&lt;/span&gt;, &lt;span class="pl-s1"&gt;ncols&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;2&lt;/span&gt;, &lt;span class="pl-s1"&gt;frameon&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)
&lt;span class="pl-s1"&gt;plt&lt;/span&gt;.&lt;span class="pl-c1"&gt;tight_layout&lt;/span&gt;()

&lt;span class="pl-s1"&gt;png_path&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"/mnt/data/ai_adoption_rolling6_by_firm_size.png"&lt;/span&gt;
&lt;span class="pl-s1"&gt;svg_path&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"/mnt/data/ai_adoption_rolling6_by_firm_size.svg"&lt;/span&gt;
&lt;span class="pl-s1"&gt;plt&lt;/span&gt;.&lt;span class="pl-c1"&gt;savefig&lt;/span&gt;(&lt;span class="pl-s1"&gt;png_path&lt;/span&gt;, &lt;span class="pl-s1"&gt;dpi&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;200&lt;/span&gt;, &lt;span class="pl-s1"&gt;bbox_inches&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"tight"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;plt&lt;/span&gt;.&lt;span class="pl-c1"&gt;savefig&lt;/span&gt;(&lt;span class="pl-s1"&gt;svg_path&lt;/span&gt;, &lt;span class="pl-s1"&gt;bbox_inches&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"tight"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;I like how it generated &lt;a href="https://static.simonwillison.net/static/2025/ai_adoption_rolling6_by_firm_size.svg"&gt;an SVG version&lt;/a&gt; of the chart without me even asking for it.&lt;/p&gt;
&lt;p&gt;You can access &lt;a href="https://chatgpt.com/share/68bf48cf-0e70-8006-a045-96fa8e7ddfc1"&gt;the ChatGPT transcript&lt;/a&gt; to see full details of everything it did.&lt;/p&gt;
&lt;h4 id="rendering-that-chart-client-side-using-pyodide"&gt;Rendering that chart client-side using Pyodide&lt;/h4&gt;
&lt;p&gt;I had one more challenge to try out. Could I render that same chart entirely in the browser using &lt;a href="https://pyodide.org/en/stable/"&gt;Pyodide&lt;/a&gt;, which can execute both Pandas and Matplotlib?&lt;/p&gt;
&lt;p&gt;I fired up a new ChatGPT GPT-5 session and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Build a canvas that loads Pyodide and uses it to render an example bar chart with pandas and matplotlib and then displays that on the page&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My goal here was simply to see if I could get a proof of concept of a chart rendered, ideally using the Canvas feature of ChatGPT. Canvas is OpenAI's version of Claude Artifacts, which lets the model write and then execute HTML and JavaScript directly in the ChatGPT interface.&lt;/p&gt;
&lt;p&gt;It worked! Here's &lt;a href="https://chatgpt.com/c/68bf2993-ca94-832a-a95e-fb225911c0a6"&gt;the transcript&lt;/a&gt; and here's &lt;a href="https://tools.simonwillison.net/pyodide-bar-chart"&gt;what it built me&lt;/a&gt;, exported  to my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; GitHub Pages site (&lt;a href="https://github.com/simonw/tools/blob/main/pyodide-bar-chart.html"&gt;source code here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/pyodide-matplotlib.jpg" alt="Screenshot of a web application demonstrating Pyodide integration. Header reads &amp;quot;Pyodide + pandas + matplotlib — Bar Chart&amp;quot; with subtitle &amp;quot;This page loads Pyodide in the browser, uses pandas to prep some data, renders a bar chart with matplotlib, and displays it below — all client-side.&amp;quot; Left panel shows terminal output: &amp;quot;Ready&amp;quot;, &amp;quot;# Python environment ready&amp;quot;, &amp;quot;• pandas 2.2.0&amp;quot;, &amp;quot;• numpy 1.26.4&amp;quot;, &amp;quot;• matplotlib 3.5.2&amp;quot;, &amp;quot;Running chart code...&amp;quot;, &amp;quot;Done. Chart updated.&amp;quot; with &amp;quot;Re-run demo&amp;quot; and &amp;quot;Show Python&amp;quot; buttons. Footer note: &amp;quot;CDN: pyodide, pandas, numpy, matplotlib are fetched on demand. First run may take a few seconds.&amp;quot; Right panel displays a bar chart titled &amp;quot;Example Bar Chart (pandas + matplotlib in Pyodide)&amp;quot; showing blue bars for months Jan through Jun with values approximately: Jan(125), Feb(130), Mar(80), Apr(85), May(85), Jun(120). Y-axis labeled &amp;quot;Streams&amp;quot; ranges 0-120, X-axis labeled &amp;quot;Month&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've now proven to myself that I can render those Python charts directly in the browser. Next step: recreate the Apollo chart.&lt;/p&gt;
&lt;p&gt;I knew it would need a way to load the spreadsheet that was CORS-enabled. I uploaded my copy to my &lt;code&gt;/static/cors-allow/2025/...&lt;/code&gt; directory (configured in Cloudflare to serve CORS headers), pasted in the finished plotting code from earlier and told ChatGPT:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Now update it to have less explanatory text and a less exciting design (black on white is fine) and run the equivalent of this:&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;(... pasted in Python code from earlier ...)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Load the XLSX sheet from https://static.simonwillison.net/static/cors-allow/2025/Employment-Size-Class-Sep-2025.xlsx&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It didn't quite work - I got an error about &lt;code&gt;openpyxl&lt;/code&gt; which I manually researched the fix for and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Use await micropip.install("openpyxl") to install openpyxl - instead of using loadPackage&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I had to paste in another error message:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;zipfile.BadZipFile: File is not a zip file&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then one about a &lt;code&gt;SyntaxError: unmatched ')'&lt;/code&gt; and a &lt;code&gt;TypeError: Legend.__init__() got an unexpected keyword argument 'ncols'&lt;/code&gt; - copying and pasting error messages remains a frustrating but necessary part of the vibe-coding loop.&lt;/p&gt;
&lt;p&gt;... but with those fixes in place, the resulting code worked! Visit &lt;a href="https://tools.simonwillison.net/ai-adoption"&gt;tools.simonwillison.net/ai-adoption&lt;/a&gt; to see the final result:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/recreated-chart-pyodide.jpg" alt="Web page. Title is AI adoption - 6-survey rolling average. Has a Run, Downlaed PNG, Downlaod SVG button. Panel on the left says Loading Python... Fetcing packages numpy, pandas, matplotlib. Installing openpyxl via micropop... ready. Running. Done. Right hand panel shows the rendered chart." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the code for that page, &lt;a href="https://github.com/simonw/tools/blob/main/ai-adoption.html"&gt;170 lines&lt;/a&gt; all-in of HTML, CSS, JavaScript and Python.&lt;/p&gt;
&lt;h4 id="what-i-ve-learned-from-this"&gt;What I've learned from this&lt;/h4&gt;
&lt;p&gt;This was another of those curiosity-inspired investigations that turned into a whole set of useful lessons.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT-5 is great at tracking down US Census data, no matter how difficult their site is to understand if you don't work with their data often&lt;/li&gt;
&lt;li&gt;It can do a very good job of turning data + a screenshot of a chart into a recreation of that chart using code interpreter, Pandas and matplotlib&lt;/li&gt;
&lt;li&gt;Running Python + matplotlib in a browser via Pyodide is very easy and only takes a few dozen lines of code&lt;/li&gt;
&lt;li&gt;Fetching an XLSX sheet into Pyodide is only a small extra step using &lt;code&gt;pyfetch&lt;/code&gt; and &lt;code&gt;openpyxl&lt;/code&gt;:
&lt;pre style="margin-top: 0.5em"&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;micropip&lt;/span&gt;
&lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;micropip&lt;/span&gt;.&lt;span class="pl-c1"&gt;install&lt;/span&gt;(&lt;span class="pl-s"&gt;"openpyxl"&lt;/span&gt;)
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pyodide&lt;/span&gt;.&lt;span class="pl-s1"&gt;http&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pyfetch&lt;/span&gt;
&lt;span class="pl-s1"&gt;resp_fetch&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;pyfetch&lt;/span&gt;(&lt;span class="pl-c1"&gt;URL&lt;/span&gt;)
&lt;span class="pl-s1"&gt;wb_bytes&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;resp_fetch&lt;/span&gt;.&lt;span class="pl-c1"&gt;bytes&lt;/span&gt;()
&lt;span class="pl-s1"&gt;xf&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pd&lt;/span&gt;.&lt;span class="pl-c1"&gt;ExcelFile&lt;/span&gt;(&lt;span class="pl-s1"&gt;io&lt;/span&gt;.&lt;span class="pl-c1"&gt;BytesIO&lt;/span&gt;(&lt;span class="pl-s1"&gt;wb_bytes&lt;/span&gt;), &lt;span class="pl-s1"&gt;engine&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;'openpyxl'&lt;/span&gt;)&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;Another new-to-me pattern: you can render an image to the DOM from Pyodide code &lt;a href="https://github.com/simonw/tools/blob/cf26ed8a6f243159bdc90a3d88f818261732103f/ai-adoption.html#L124"&gt;like this&lt;/a&gt;:
&lt;pre style="margin-top: 0.5em"&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;js&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;document&lt;/span&gt;
&lt;span class="pl-s1"&gt;document&lt;/span&gt;.&lt;span class="pl-c1"&gt;getElementById&lt;/span&gt;(&lt;span class="pl-s"&gt;'plot'&lt;/span&gt;).&lt;span class="pl-c1"&gt;src&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'data:image/png;base64,'&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s1"&gt;img_b64&lt;/span&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I will most definitely be using these techniques again in future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Coincidentally Claude released their own upgraded equivalent to ChatGPT Code Interpreter later on the day that I published this story, so I &lt;a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/#something-much-harder-recreating-the-ai-adoption-chart"&gt;ran the same chart recreation experiment&lt;/a&gt; against Claude Sonnet 4 to see how it compared.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/census"&gt;census&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualization"&gt;visualization&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pyodide"&gt;pyodide&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="census"/><category term="data-journalism"/><category term="javascript"/><category term="python"/><category term="tools"/><category term="visualization"/><category term="ai"/><category term="pyodide"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="code-interpreter"/><category term="llm-reasoning"/><category term="vibe-coding"/><category term="ai-assisted-search"/><category term="gpt-5"/></entry><entry><title>Anthropic status: Model output quality</title><link href="https://simonwillison.net/2025/Sep/9/anthropic-model-output-quality/#atom-tag" rel="alternate"/><published>2025-09-09T06:28:21+00:00</published><updated>2025-09-09T06:28:21+00:00</updated><id>https://simonwillison.net/2025/Sep/9/anthropic-model-output-quality/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://status.anthropic.com/incidents/72f99lh1cj2c"&gt;Anthropic status: Model output quality&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Anthropic &lt;a href="https://simonwillison.net/2025/Aug/30/claude-degraded-quality/"&gt;previously reported&lt;/a&gt; model serving bugs that affected Claude Opus 4 and 4.1 for 56.5 hours. They've now fixed additional bugs affecting "a small percentage" of Sonnet 4 requests for almost a month, plus a less long-lived Haiku 3.5 issue:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Resolved issue 1 - A small percentage of Claude Sonnet 4 requests experienced degraded output quality due to a bug from Aug 5-Sep 4, with the impact increasing from Aug 29-Sep 4. A fix has been rolled out and this incident has been resolved. &lt;/p&gt;
&lt;p&gt;Resolved issue 2 - A separate bug affected output quality for some Claude Haiku 3.5 and Claude Sonnet 4 requests from Aug 26-Sep 5. A fix has been rolled out and this incident has been resolved.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They directly address accusations that these stem from deliberate attempts to save money on serving models:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The timing of these issues is really unfortunate, corresponding with the rollout of GPT-5 which I see as the non-Anthropic model to feel truly competitive with Claude for writing code since their release of Claude 3.5 back in  June last year.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/theo/status/1965216210729259485"&gt;@theo&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-4"&gt;claude-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="claude-4"/><category term="gpt-5"/></entry><entry><title>Load Llama-3.2 WebGPU in your browser from a local folder</title><link href="https://simonwillison.net/2025/Sep/8/webgpu-local-folder/#atom-tag" rel="alternate"/><published>2025-09-08T20:53:52+00:00</published><updated>2025-09-08T20:53:52+00:00</updated><id>https://simonwillison.net/2025/Sep/8/webgpu-local-folder/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://static.simonwillison.net/static/2025/llama-3.2-webgpu/"&gt;Load Llama-3.2 WebGPU in your browser from a local folder&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Inspired by &lt;a href="https://news.ycombinator.com/item?id=45168953#45169054"&gt;a comment&lt;/a&gt; on Hacker News I decided to see if it was possible to modify the &lt;a href="https://github.com/huggingface/transformers.js-examples/tree/main/llama-3.2-webgpu"&gt;transformers.js-examples/tree/main/llama-3.2-webgpu&lt;/a&gt; Llama 3.2 chat demo (&lt;a href="https://huggingface.co/spaces/webml-community/llama-3.2-webgpu"&gt;online here&lt;/a&gt;, I &lt;a href="https://simonwillison.net/2024/Sep/30/llama-32-webgpu/"&gt;wrote about it last November&lt;/a&gt;) to add an option to open a local model file directly from a folder on disk, rather than waiting for it to download over the network.&lt;/p&gt;
&lt;p&gt;I posed the problem to OpenAI's GPT-5-enabled Codex CLI like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/huggingface/transformers.js-examples
cd transformers.js-examples/llama-3.2-webgpu
codex
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Modify this application such that it offers the user a file browse button for selecting their own local copy of the model file instead of loading it over the network. Provide a "download model" option too.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex churned away for several minutes, even running commands like &lt;code&gt;curl -sL https://raw.githubusercontent.com/huggingface/transformers.js/main/src/models.js | sed -n '1,200p'&lt;/code&gt; to inspect the source code of the underlying Transformers.js library.&lt;/p&gt;
&lt;p&gt;After four prompts total (&lt;a href="https://gist.github.com/simonw/3c46c9e609f6ee77367a760b5ca01bd2?permalink_comment_id=5751814#gistcomment-5751814"&gt;shown here&lt;/a&gt;) it built something which worked!&lt;/p&gt;
&lt;p&gt;To try it out you'll need your own local copy of the Llama 3.2 ONNX model. You can get that (a ~1.2GB) download) like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git lfs install
git clone https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct-q4f16
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then visit my &lt;a href="https://static.simonwillison.net/static/2025/llama-3.2-webgpu/"&gt;llama-3.2-webgpu&lt;/a&gt; page in Chrome or Firefox Nightly (since WebGPU is required), click "Browse folder", select that folder you just cloned, agree to the "Upload" confirmation (confusing since nothing is uploaded from your browser, the model file is opened locally on your machine) and click "Load local model".&lt;/p&gt;
&lt;p&gt;Here's an animated demo (recorded in real-time, I didn't speed this up):&lt;/p&gt;
&lt;p&gt;&lt;img alt="GIF. I follow the setup instructions, clicking to load a local model and browsing to the correct folder. Once loaded the model shows a chat interface, I run the example about time management which returns tokens at about 10/second." src="https://static.simonwillison.net/static/2025/webgpu-llama-demo-small.gif" /&gt;&lt;/p&gt;
&lt;p&gt;I pushed &lt;a href="https://github.com/simonw/transformers.js-examples/commit/cdebf4128c6e30414d437affd4b13b6c9c79421d"&gt;a branch with those changes here&lt;/a&gt;. The next step would be to modify this to support other models in addition to the Llama 3.2 demo, but I'm pleased to have got to this proof of concept with so little work beyond throwing some prompts at Codex to see if it could figure it out.&lt;/p&gt;
&lt;p&gt;According to the Codex &lt;code&gt;/status&lt;/code&gt; command &lt;a href="https://gist.github.com/simonw/3c46c9e609f6ee77367a760b5ca01bd2?permalink_comment_id=5751807#gistcomment-5751807"&gt;this used&lt;/a&gt; 169,818 input tokens, 17,112 output tokens and 1,176,320 cached input tokens. At current GPT-5 token pricing ($1.25/million input, $0.125/million cached input, $10/million output) that would cost 53.942 cents, but Codex CLI hooks into my existing $20/month ChatGPT Plus plan so this was bundled into that.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45168953#45173297"&gt;My Hacker News comment&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="transformers-js"/><category term="webgpu"/><category term="llm-pricing"/><category term="vibe-coding"/><category term="gpt-5"/><category term="codex-cli"/></entry><entry><title>GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search</title><link href="https://simonwillison.net/2025/Sep/6/research-goblin/#atom-tag" rel="alternate"/><published>2025-09-06T19:31:57+00:00</published><updated>2025-09-06T19:31:57+00:00</updated><id>https://simonwillison.net/2025/Sep/6/research-goblin/#atom-tag</id><summary type="html">
    &lt;p&gt;"Don't use chatbots as search engines" was great advice for several years... until it wasn't.&lt;/p&gt;
&lt;p&gt;I wrote about how good OpenAI's o3 was at using its Bing-backed search tool &lt;a href="https://simonwillison.net/2025/Apr/21/ai-assisted-search/"&gt;back in April&lt;/a&gt;. GPT-5 feels even better.&lt;/p&gt;
&lt;p&gt;I've started calling it my &lt;strong&gt;Research Goblin&lt;/strong&gt;. I can assign a task to it, no matter how trivial or complex, and it will do an often unreasonable amount of work to search the internet and figure out an answer.&lt;/p&gt;
&lt;p&gt;This is excellent for satisfying curiosity, and occasionally useful for more important endeavors as well.&lt;/p&gt;
&lt;p&gt;I always run my searches by selecting the "GPT-5 Thinking" model from the model picker - in my experience this leads to far more comprehensive (albeit much slower) results.&lt;/p&gt;
&lt;p&gt;Here are some examples from just the last couple of days. Every single one of them was run on my phone, usually while I was doing something else. Most of them were dictated using the iPhone voice keyboard, which I find faster than typing. Plus, it's fun to talk to my Research Goblin.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#bouncy-travelators"&gt;Bouncy travelators&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#identify-this-building"&gt;Identify this building&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#starbucks-uk-cake-pops"&gt;Starbucks UK cake pops&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#britannica-to-seed-wikipedia"&gt;Britannica to seed Wikipedia&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#official-name-for-the-university-of-cambridge"&gt;Official name for the University of Cambridge&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#history-of-the-caverns-in-exeter-quay"&gt;History of the caverns in Exeter quay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#aldi-vs-lidl"&gt;Aldi vs Lidl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#ai-labs-scanning-books-for-training-data"&gt;AI labs scanning books for training data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#gpt-5-for-search-feels-competent"&gt;GPT-5 for search feels competent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#tips-for-using-search-in-chatgpt"&gt;Tips for using search in ChatGPT&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="bouncy-travelators"&gt;Bouncy travelators&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;They used to be rubber bouncy travelators at Heathrow and they were really fun, have all been replaced by metal ones now and if so, when did that happen?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was traveling through Heathrow airport pondering what had happened to the fun bouncy rubber travelators.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc2d98-9aac-8006-98b9-1424d98290f8"&gt;Here's what I got&lt;/a&gt;. Research Goblin narrowed it down to some time between 2014-2018 but, more importantly, found me this &lt;a href="https://www.sfchronicle.com/totalsf/article/sfo-bouncy-moving-walkway-airport-19845449.php"&gt;delightful 2024 article&lt;/a&gt; by Peter Hartlaub in the San Francisco Chronicle with a history of the SFO bouncy walkways, now also sadly retired.&lt;/p&gt;
&lt;h4 id="identify-this-building"&gt;Identify this building&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/reading-building.jpg" alt="not a great photo of a building with a distinctive shaped roof" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Identify this building in reading&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a photo I snapped out of the window on the train. It &lt;a href="https://chatgpt.com/share/68bc2e21-1d24-8006-b083-00b3233e1c67"&gt;thought for 1m4s&lt;/a&gt; and correctly identified it as &lt;a href="https://en.wikipedia.org/wiki/The_Blade,_Reading"&gt;The Blade&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="starbucks-uk-cake-pops"&gt;Starbucks UK cake pops&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Starbucks in the UK don't sell cake pops! Do a deep investigative dive&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Starbucks in Exeter railway station didn't have cake pops, and the lady I asked didn't know what they were.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc71b4-68f4-8006-b462-cf32f61e7ec3"&gt;Here's the result&lt;/a&gt;. It turns out Starbucks did launch cake pops in the UK &lt;a href="https://www.nationalworld.com/lifestyle/starbucks-cake-pops-launched-in-uk-on-new-autumn-menu-full-list-of-items-4284537"&gt;in September 2023&lt;/a&gt; but they aren't available at all outlets, in particular the licensed travel locations such as the one at Exeter St Davids station.&lt;/p&gt;
&lt;p&gt;I particularly enjoyed how it established definitive proof by consulting &lt;a href="https://www.starbucks.co.uk/sites/starbucks-uk-pwa/files/2024-11/HOL24_UK_AllergenBook_CORE_FOOD_v02.LR_.pdf"&gt;the nutrition and allergen guide PDF&lt;/a&gt; on starbucks.co.uk, which does indeed list both the Birthday Cake Pop (my favourite) and the Cookies and Cream one (apparently discontinued in the USA, at least &lt;a href="https://www.reddit.com/r/starbucks/comments/1lp5chq/just_learned_today_the_cookies_cream_cake_pop_has/"&gt;according to r/starbucks&lt;/a&gt;).&lt;/p&gt;
&lt;h4 id="britannica-to-seed-wikipedia"&gt;Britannica to seed Wikipedia&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Someone on hacker News said:&lt;/p&gt;
&lt;p&gt;&amp;gt; I was looking at another thread about how Wikipedia was the best thing on the internet. But they only got the head start by taking copy of Encyclopedia Britannica and everything else&lt;/p&gt;
&lt;p&gt;Find what they meant by that&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc3062-5a68-8006-a12b-cf7196a130ae"&gt;The result&lt;/a&gt;. It turns out Wikipedia did seed itself with content from the out-of-copyright 1911 Encyclopædia Britannica... but that project took place in 2006, five years after Wikipedia first launched in 2001.&lt;/p&gt;
&lt;p&gt;I asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is the single best article I can link somebody to that explains the 1911 Britannica thing&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And it pointed me to &lt;a href="https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Encyclopaedia_Britannica"&gt;Wikipedia:WikiProject Encyclopaedia Britannica&lt;/a&gt; which includes a detailed explanation and a link to &lt;a href="https://en.m.wikipedia.org/wiki/Template:EB1911"&gt;the 13,000 pages&lt;/a&gt; still tagged with the template from that project. I posted what I found &lt;a href="https://news.ycombinator.com/item?id=45135061#45137109"&gt;in a comment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Notably (for me anyway) I didn't feel the need to disclose my use of ChatGPT in finding that information - at this point that feels a little like disclosing that I ran a Google search.&lt;/p&gt;
&lt;h4 id="official-name-for-the-university-of-cambridge"&gt;Official name for the University of Cambridge&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;What is the official legal name of the university of Cambridge?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/iancal/"&gt;Here's the context for that one&lt;/a&gt;. It thought for 19 seconds - the thinking trace reveals it knew the answer but wanted to confirm it. &lt;a href="https://chatgpt.com/share/68bc723d-a344-8006-9514-fb3ea2c2b43b"&gt;It answered&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The Chancellor, Masters, and Scholars of the University of Cambridge.” &lt;a href="https://www.cam.ac.uk/about-the-university/how-the-university-and-colleges-work/the-university-as-a-charity"&gt;University of Cambridge&lt;/a&gt;, &lt;a href="https://www.cambridge.org/legal"&gt;Cambridge University Press &amp;amp; Assessment&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That first link gave me the citation I needed in order to be sure this was right.&lt;/p&gt;
&lt;p&gt;Since this is my shortest example, here's a screenshot of the expanded "Thought for 19s" panel. I &lt;em&gt;always&lt;/em&gt; expand the thoughts - seeing how it pulled together its answer is crucial for evaluating if the answer is likely to be useful or not.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cambridge-name.jpg" alt="Screenshot of an AI assistant's thought process and research. Header shows &amp;quot;Thought for 19s&amp;quot; followed by bullet points: &amp;quot;The user asked for the official legal name of the University of Cambridge, and I think they mean the formal corporate name. It's probably 'The Chancellor, Masters, and Scholars of the University of Cambridge,' similar to Oxford's. I'm going to verify by checking the web for precise information.&amp;quot; &amp;quot;The official name of Cambridge might include the serial comma, like 'Masters, and Scholars.' There's a similar structure with Oxford, which uses a comma after 'Masters' too. The University is an exempt charity, so you can cross-check with legal documents like statutes or charity records. I'll double-check.&amp;quot; Shows search results for &amp;quot;official legal name of University of Cambridge&amp;quot; from en.wikipedia.org, www.cam.ac.uk, projects.propublica.org and &amp;quot;10 more&amp;quot; sources. Continues with &amp;quot;The University of Cambridge's official legal name, based on the University website and Cambridge University Press, is 'The Chancellor, Masters, and Scholars of the University of Cambridge,' with the serial comma after 'Masters.' It's clear from these authoritative sources, so I'll provide that straightforward answer, citing the University site.&amp;quot; &amp;quot;The official legal name of the University of Cambridge is 'The Chancellor, Masters, and Scholars of the University of Cambridge,' with the comma after Masters and the Oxford comma before 'and.' I'll answer concisely in one line and cite official sources to confirm. No need for extra UI elements here.&amp;quot; Shows &amp;quot;Done&amp;quot; status. Final answer displays: &amp;quot;The Chancellor, Masters, and Scholars of the University of Cambridge.&amp;quot; attributed to University of Cambridge, Cambridge University Press &amp;amp; Assessment, with &amp;quot;Sources&amp;quot; section at bottom." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="history-of-the-caverns-in-exeter-quay"&gt;History of the caverns in Exeter quay&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Research On the waterfront restaurant in Exeter, is it dug into the cliffs somehow? History of the building, who built it, why and how&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We were out to dinner &lt;a href="https://maps.app.goo.gl/xxvaPQiNWACtbq3H8"&gt;here&lt;/a&gt; and noticed that the interior of the restaurant appeared to be a space dug into the cliff, which piqued my interest.&lt;/p&gt;
&lt;p&gt;This was &lt;a href="https://chatgpt.com/share/68bc32fb-d52c-8006-9259-0b984dc832b2"&gt;the ChatGPT session&lt;/a&gt; that inspired the Research Goblin nickname. It just kept on digging!&lt;/p&gt;
&lt;p&gt;The first reply took 2m40s and confirmed that yes, these quay buildings were carved into the red sandstone cliff &lt;a href="https://www.exploredevon.info/activities/walk/exeter-quay/"&gt;in the 1820s-1830s&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;ChatGPT with GPT-5 really likes to suggest additional steps it can take. In this case:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you’d like, I can dig up the exact Historic England entry that covers the “Southern Warehouse” address and overlay it on a map of the vaults.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I often say "yes" purely out of curiosity to see what it will do next, and the offer to "overlay it on a map" was irresistible, like how would it even do that?&lt;/p&gt;
&lt;p&gt;It did a &lt;em&gt;ton&lt;/em&gt; of extra searches, found latitude and longitude coordinates for the restaurant (from Wikimedia Commons) and the warehouse buildings (from National Heritage List for England via Wikipedia), showed me that data in a table and then used Python to render this image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/bad-chart.png" alt="Scatter plot titled &amp;quot;On The Waterfront vs. Warehouse Vaults (Exeter Quay)&amp;quot; with scientific notation &amp;quot;+5.071e1&amp;quot; in top left. Y-axis shows &amp;quot;Latitude&amp;quot; ranging from 0.0065 to 0.0090. X-axis shows &amp;quot;Longitude&amp;quot; ranging from -3.5310 to -3.5280. Three orange X markers plotted: &amp;quot;Warehouse Vaults (6-11, The Quay) — HE 1223046&amp;quot; at top center, &amp;quot;On The Waterfront (4-9 Southern Warehouse)&amp;quot; at middle left, and &amp;quot;Warehouse Vaults (1-4, The Quay) — HE 1223045&amp;quot; at bottom right" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Yeah, so that one is a bit of a miss!&lt;/p&gt;
&lt;p&gt;But... the underlying research chain, as always, provided fascinating. I noticed that at one point it had searched for "Exeter Quay vaults map PDF", which inspired my next question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Did you find any PDF diagrams, showing the tunnels into the cliff&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;2m20s later it had found &lt;a href="https://planningdocs.exeter.gov.uk/servlets/direct/KgQ8hlUut3Q2r5wVMItggobi9P/2/844706/1/1/AS_PDF_FILE"&gt;a planning document&lt;/a&gt; and &lt;a href="https://www.turner-locker.co.uk/wp-content/uploads/2018/08/Cellar-1-The-Quay-TO-LET-2018-1.pdf"&gt;a letting brochure&lt;/a&gt;, both PDFs, but noted that neither of them had the diagram I wanted. Evidently it actually "reads" the PDFs it finds as part of its activities, since it appears to have knowledge of the images contained within them as well as the text.&lt;/p&gt;
&lt;p&gt;It concluded with this intriguing note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There’s strong evidence that a detailed plan/section &lt;strong&gt;does&lt;/strong&gt; exist in the Exeter Archaeology report often cited by planners—&lt;strong&gt;“Archaeological, Historical and Conservation Study of the Exeter Canal Basin” (Exeter Archaeology Report 00.18, 2000)&lt;/strong&gt;—but it isn’t online; it’s just cited in planning documents. You’d likely need to request it via the Devon Heritage Centre/Historic Environment Record.&lt;/p&gt;
&lt;p&gt;If you want, I can keep hunting specialist archives for a digitized copy or a different report that includes the vault layouts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Obviously I said "Yes do that" and 2m24s later it chucked out a whole host of extra links, an apology that it still hadn't found a detailed plan diagram and, under the section heading "What I recommend next (actionable + exact ask)":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you’re happy, I can draft an email you can paste to the right archivists. [...]&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Subject: Exeter Quayside cliff vaults – request for plan/section drawings
Hello,&lt;/p&gt;
&lt;p&gt;I’m researching the cliff-cut vaults behind The Quay (On The Waterfront / Kings Wharf).
Could you supply scans or copies of any &lt;strong&gt;plans/sections/diagrams&lt;/strong&gt; of these vaults—especially those in &lt;strong&gt;Exeter Archaeology Report 00.18 (2000)&lt;/strong&gt; (Archaeological, Historical and Conservation Study of the Exeter Canal Basin) and any &lt;strong&gt;EMAFU/Exeter Archaeology drawings&lt;/strong&gt; for “The Quay” showing the &lt;strong&gt;vault/tunnel layouts cut into the red sandstone cliff&lt;/strong&gt;?&lt;/p&gt;
&lt;p&gt;For reference: NHLE list entries &lt;strong&gt;1223045 (1–4 Warehouse Vaults), 1223046 (6–11 Warehouse Vaults)&lt;/strong&gt;, and &lt;strong&gt;1223114 (14–25 Vaults)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Many thanks!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's a good thing my Research Goblin doesn't have the ability to send email (which we already know is &lt;a href="https://simonwillison.net/2025/May/31/snitchbench-with-llm/"&gt;very risky business&lt;/a&gt;) or at this point I wouldn't be surprised if it started autonomously bothering archivists on my behalf!&lt;/p&gt;
&lt;h4 id="aldi-vs-lidl"&gt;Aldi vs Lidl&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In UK supermarket terms compare lidl and aldi&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I saw an Aldi and a Lidl next to each other in Exeter and was curious as to how they compare. In this case my Research Goblin went &lt;em&gt;way&lt;/em&gt; deeper than I had intended, thinking for 1m25s about market share and price surveys and loyalty schemes and &lt;a href="https://chatgpt.com/share/68bc7913-7ed8-8006-bd3f-63e540470cd7"&gt;answering&lt;/a&gt; with ten paragraphs of detailed analysis.&lt;/p&gt;
&lt;p&gt;I replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yeah, I just wanted the basic vibes like is one fancier than the other?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It thought for 6 seconds, didn't run any additional searches and replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Short version: neither is “fancy” — both are no-frills discounters — but &lt;strong&gt;Lidl often feels a touch more polished&lt;/strong&gt;, while &lt;strong&gt;Aldi feels a touch more bare-bones&lt;/strong&gt;. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since I was now enjoying having my Goblin express judgement over British supermarket chains I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How do they rate in fancinews compared to all of the other major UK supermarket chains?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(That was a typo for "fanciness" which it figured out.)&lt;/p&gt;
&lt;p&gt;It thought for another 1m1s, ran 7 searches, consulted 27 URLs and gave me back a ranking that looked about right to me.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/fancy-supermarkets.jpg" alt="ChatGPT. Prompt: How do they rate in fancinews compared to all of the other major UK supermarket chains? Thought for 1m 1s › Gotcha-assuming you mean &amp;quot;fanciness.&amp;quot; Here's the UK supermarket vibe-ladder (posh → no-frills): Super-posh: M&amp;amp;S Food and Waitrose - premium feel, strong store appearance/quality scores; Waitrose is literally described as &amp;quot;upmarket.&amp;quot; Which? The Guardian Upper-mainstream: Sainsbury's, Ocado (online, but it sells M&amp;amp;s Food since 2020, so the range skews premium). Which? The Guardian Mainstream: Tesco, Morrisons, Co-op - broad ranges, middle-of-the-road on experience/ appearance. Which? Value-forward: Asda, Iceland — price-led; Iceland's frozen focus, Asda's value reputation (not &amp;quot;fancy,&amp;quot; but popular). Time Out Worldwide Discounters (least fancy by design): Lidl (a touch more polished in-store) and Aldi (most stripped-back, no-frills). Both sit lower on &amp;quot;store appearance/ service&amp;quot; in consumer surveys, even while smashing value rankings. Which? +1 Net: on a &amp;quot;posh&amp;quot; scale, think M&amp;amp;S/Waitrose at the top, Sainsbury's/Tesco middle, Asda/Iceland lower, and Lidl/Aldi the least fancy— by intent, not as a knock on quality or value." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="ai-labs-scanning-books-for-training-data"&gt;AI labs scanning books for training data&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Anthropic bought lots of physical books and cut them up and scan them for training data. Do any other AI labs do the same thing?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Relevant to &lt;a href="https://simonwillison.net/2025/Sep/6/anthropic-settlement/"&gt;today's big story&lt;/a&gt;. Research Goblin was &lt;a href="https://chatgpt.com/share/68bc771c-c188-8006-a8e5-4b1624f5bdf0"&gt;unable to find&lt;/a&gt; any news stories or other evidence that any labs other than Anthropic are engaged in large scale book scanning for training data. That's not to say it isn't happening, but it's happening very quietly if that's the case.&lt;/p&gt;
&lt;h4 id="gpt-5-for-search-feels-competent"&gt;GPT-5 for search feels competent&lt;/h4&gt;
&lt;p&gt;The word that best describes how I feel about GPT-5 search is that it feels &lt;strong&gt;competent&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I've thrown all sorts of things at it over the last few weeks and it rarely disappoints me. It almost always does better than if I were to dedicate the same amount of time to manually searching myself, mainly because it's much faster at running searches and evaluating the results than I am.&lt;/p&gt;
&lt;p&gt;I particularly love that it works so well on mobile. I used to reserve my deeper research sessions to a laptop where I could open up dozens of tabs. I'll still do that for higher stakes activities but I'm finding the scope of curiosity satisfaction I can perform on the go with just my phone has increased quite dramatically.&lt;/p&gt;
&lt;p&gt;I've mostly stopped using OpenAI's Deep Research feature, because ChatGPT search now gives me the results I'm interested in far more quickly for most queries.&lt;/p&gt;
&lt;p&gt;As a developer who builds software on LLMs I see ChatGPT search as the gold standard for what can be achieved using tool calling combined with chain-of-thought. Techniques like RAG are &lt;em&gt;massively&lt;/em&gt; more effective if you can reframe them as several levels of tool calling with a carefully selected set of powerful search tools.&lt;/p&gt;
&lt;p&gt;The way that search tool integrates with reasoning is key, because it allows GPT-5 to execute a search, reason about the results and then execute follow-up searches - all as part of that initial "thinking" process.&lt;/p&gt;
&lt;p&gt;Anthropic call this ability &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#interleaved-thinking"&gt;interleaved thinking&lt;/a&gt; and it's also &lt;a href="https://platform.openai.com/docs/guides/reasoning#keeping-reasoning-items-in-context"&gt;supported by the OpenAI Responses API&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="tips-for-using-search-in-chatgpt"&gt;Tips for using search in ChatGPT&lt;/h4&gt;
&lt;p&gt;As with all things AI, GPT-5 search rewards intuition gathered through experience. Any time a curious thought pops into my head I try to catch it and throw it at my Research Goblin. If it's something I'm certain it won't be able to handle then even better! I can learn from watching it fail.&lt;/p&gt;
&lt;p&gt;I've been trying out hints like "go deep" which seem to trigger a more thorough research job. I enjoy throwing those at shallow and unimportant questions like the UK Starbucks cake pops one just to see what happens!&lt;/p&gt;
&lt;p&gt;You can throw questions at it which have a single, unambiguous answer - but I think questions which are broader and don't have a "correct" answer can be a lot more fun. The UK supermarket rankings above are a great example of that.&lt;/p&gt;
&lt;p&gt;Since I love a questionable analogy for LLMs Research Goblin is... well, it's a goblin. It's very industrious, not quite human and not entirely trustworthy. You have to be able to outwit it if you want to keep it gainfully employed.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bing"&gt;bing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deep-research"&gt;deep-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="bing"/><category term="definitions"/><category term="search"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm-tool-use"/><category term="llm-reasoning"/><category term="deep-research"/><category term="ai-assisted-search"/><category term="gpt-5"/></entry><entry><title>Rich Pixels</title><link href="https://simonwillison.net/2025/Sep/2/rich-pixels/#atom-tag" rel="alternate"/><published>2025-09-02T11:05:23+00:00</published><updated>2025-09-02T11:05:23+00:00</updated><id>https://simonwillison.net/2025/Sep/2/rich-pixels/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/darrenburns/rich-pixels"&gt;Rich Pixels&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/darrenburns/rich-pixels/blob/a0745ebcc26b966d9dbac5875720364ee5c6a1d3/rich_pixels/_renderer.py#L123C25-L123C26"&gt;the key trick&lt;/a&gt; - it renders Unicode ▄ (U+2584, "lower half block") characters after setting a foreground and background color for the two pixels it needs to display.&lt;/p&gt;
&lt;p&gt;I got GPT-5 to &lt;a href="https://chatgpt.com/share/68b6c443-2408-8006-8f4a-6862755cd1e4"&gt;vibe code up&lt;/a&gt; a &lt;code&gt;show_image.py&lt;/code&gt; terminal command which resizes the provided image to fit the width and height of the current terminal and displays it using Rich Pixels. That &lt;a href="https://github.com/simonw/tools/blob/main/python/show_image.py"&gt;script is here&lt;/a&gt;, you can run it with &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run https://tools.simonwillison.net/python/show_image.py \
  image.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's what I got when I ran it against my V&amp;amp;A East Storehouse photo from &lt;a href="https://simonwillison.net/2025/Aug/27/london-culture/"&gt;this post&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Terminal window. I ran that command and it spat out quite a pleasing and recognizable pixel art version of the photograph." src="https://static.simonwillison.net/static/2025/pixel-storehouse.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ascii-art"&gt;ascii-art&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rich"&gt;rich&lt;/a&gt;&lt;/p&gt;



</summary><category term="ascii-art"/><category term="cli"/><category term="python"/><category term="unicode"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="uv"/><category term="vibe-coding"/><category term="gpt-5"/><category term="rich"/></entry><entry><title>The perils of vibe coding</title><link href="https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-coding/#atom-tag" rel="alternate"/><published>2025-08-29T17:51:10+00:00</published><updated>2025-08-29T17:51:10+00:00</updated><id>https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-coding/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.ft.com/content/5b3d410a-6e02-41ad-9e0a-c2e4d672ca00"&gt;The perils of vibe coding&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I was interviewed by Elaine Moore for this opinion piece in the Financial Times, which ended up in the print edition of the paper too! I picked up a copy yesterday:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2025/ft.jpeg" style="text-decoration: none; border-bottom: none"&gt;&lt;img src="https://static.simonwillison.net/static/2025/ft.jpeg" alt="The perils of vibe coding - A new OpenAI model arrived this month with a glossy livestream, group watch parties and a lingering sense of disappointment. The YouTube comment section was underwhelmed. “I think they are all starting to realize this isn’t going to become the world like they thought it would,” wrote one viewer. “I can see it on their faces.” But if the casual user was unimpressed, the AI model’s saving grace may be vibe. Coding is generative AI’s newest battleground. With big bills to pay, high valuations to live up to and a market wobble to erase, the sector needs to prove its corporate productivity chops. Coding is hardly promoted as a business use case that already works. For one thing, AI-generated code holds the promise of replacing programmers — a profession of very well paid people. For another, the work can be quantified. In April, Microsoft chief executive Satya Nadella said that up to 50 per cent of the company’s code was now being written by AI. Google chief executive Sundar Pichai has said the same thing. Salesforce has paused engineering hires and Mark Zuckerberg told podcaster Joe Rogan that Meta would use AI as a “mid-level engineer” that writes code. Meanwhile, start-ups such as Replit and Cursor’s Anysphere are trying to persuade people that with AI, anyone can code. In theory, every employee can become a software engineer. So why aren’t we? One possibility is that it’s all still too unfamiliar. But when I ask people who write code for a living they offer an alternative suggestion: unpredictability. As programmer Simon Willison put it: “A lot of people are missing how weird and funny this space is. I’ve been a computer programmer for 30 years and [AI models] don’t behave like normal computers.” Willison is well known in the software engineering community for his AI experiments. He’s an enthusiastic vibe coder — using LLMs to generate code using natural language prompts. OpenAI’s latest model GPT-3.1s, he is now favourite. Still, he predicts that a vibe coding crash is due if it is used to produce glitchy software. It makes sense that programmers — people who are interested in finding new ways to solve problems — would be early adopters of LLMs. Code is a language, albeit an abstract one. And generative AI is trained in nearly all of them, including older ones like Cobol. That doesn’t mean they accept all of its suggestions. Willison thinks the best way to see what a new model can do is to ask for something unusual. He likes to request an svg (an image made out of lines described with code) of a pelican on a bike and asks it to remember the chickens in his garden by name. Results can be bizarre. One model ignored key prompts in favour of composing a poem. Still, his adventures in vibe coding sound like an advert for the sector’s future. Anthropic’s Claude Code, the favoured model for developers, to make an OCR (optical character recognition) software loves screenshots) tool that will copy and paste text from a screenshot. He wrote software that summarises blog comments and has planned to cut a custom tool that will alert him when a whale is visible from his Pacific coast home. All this by typing prompts in English. It’s sounds like the sort of thing Bill Gates might have had in mind when he wrote that natural language AI agents would bring about “the biggest revolution in computing since we went from typing commands to tapping on icons”. But watching code appear and know how it works are two different things. My efforts to make my own comment summary tool produced something unworkable that gave overly long answers and then congratulated itself as a success. Willison says he wouldn’t use AI-generated code for projects he planned to ship out unless he had reviewed each line. Not only is there the risk of hallucination but the chatbot’s desire to be agreeable means it may an unusable idea works. That is a particular issue for those of us who don’t know how to fix the code. We risk creating software with hidden problems. It may not save time either. A study published in July by the non-profit Model Evaluation and Threat Research assessed work done by 16 developers — some with AI tools, some without. Those using AI assistance it had made them faster. In fact it took them nearly a fifth longer. Several developers I spoke to said AI was best used as a way to talk through coding problems. It’s a version of something they call rubber ducking (after their habit of talking to the toys on their desk) — only this rubber duck can talk back. As one put it, code shouldn’t be judged by volume or speed. Progress in AI coding is tangible. But measuring productivity gains is not as neat as a simple percentage calculation."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;From the article, with links added by me to relevant projects:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Willison thinks the best way to see what a new model can do is to ask for something unusual. He likes to request an SVG (an image made out of lines described with code) of &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;a pelican on a bike&lt;/a&gt; and asks it to remember the chickens in his garden by name. Results can be bizarre. One model ignored his prompts in favour of &lt;a href="https://simonwillison.net/2025/Aug/14/gemma-3-270m/"&gt;composing a poem&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Still, his adventures in vibe coding sound like an advert for the sector. He used Anthropic's Claude Code, the favoured model for developers, to &lt;a href="https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/"&gt;make an OCR&lt;/a&gt; (optical character recognition - software loves acronyms) tool that will copy and paste text from a screenshot.&lt;/p&gt;
&lt;p&gt;He wrote software that &lt;a href="https://til.simonwillison.net/llms/claude-hacker-news-themes"&gt;summarises blog comments&lt;/a&gt; and has plans to build a custom tool that will alert him when a whale is visible from his Pacific coast home. All this by typing prompts in English.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've been talking about that whale spotting project for far too long. Now that it's been in the FT I really need to build it.&lt;/p&gt;
&lt;p&gt;(On the subject of OCR... I tried extracting the text from the above image using GPT-5 and got a &lt;a href="https://chatgpt.com/share/68b1e707-add0-8006-8344-4c2fca902b2e"&gt;surprisingly bad result&lt;/a&gt; full of hallucinated details. Claude Opus 4.1 &lt;a href="https://claude.ai/share/e98d2fe1-0c81-4f51-8739-483f843e4c0e"&gt;did a lot better&lt;/a&gt; but still made some mistakes. Gemini 2.5 &lt;a href="https://aistudio.google.com/app/prompts?state=%257B%2522ids%2522:%255B%25221MOzgBJI-FJF1uyile_7h2zL4F6lD0sgK%2522%255D,%2522action%2522:%2522open%2522,%2522userId%2522:%2522106366615678321494423%2522,%2522resourceKeys%2522:%257B%257D%257D&amp;amp;usp=sharing,%20https://drive.google.com/file/d/1ffD88ORjgjFzbPsvQ-Z52Exhb_Z9MgtL/view?usp=sharing"&gt;did much better&lt;/a&gt;.)


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/press-quotes"&gt;press-quotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ocr"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="claude"/><category term="vibe-coding"/><category term="press-quotes"/><category term="gpt-5"/></entry><entry><title>GPT-5 has a hidden system prompt</title><link href="https://simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/#atom-tag" rel="alternate"/><published>2025-08-15T23:09:32+00:00</published><updated>2025-08-15T23:09:32+00:00</updated><id>https://simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://twitter.com/xundecidability/status/1956347084870651960"&gt;GPT-5 has a hidden system prompt&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
It looks like GPT-5 when accessed via the OpenAI API may have its own hidden system prompt, independent from the system prompt you can specify in an API call.&lt;/p&gt;
&lt;p&gt;At the very least it's getting sent the current date. I tried this just now:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5 'current date'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That returned "2025-08-15", confirming that the date has been fed to the model as part of a hidden prompt.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5 'current date' --system 'speak french'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Returned "La date actuelle est le 15 août 2025", showing that offering my own custom system prompt did not over-ride the invisible one that includes the date.&lt;/p&gt;
&lt;p&gt;GPT-5 is &lt;em&gt;very&lt;/em&gt; resistant to sharing the details of this secret system prompt, but Tommy Hughes &lt;a href="https://x.com/xundecidability/status/1956347084870651960"&gt;managed to extract&lt;/a&gt; the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Current date: 2025-08-15&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;You are an AI assistant accessed via an API. Your output may need to be parsed by code or displayed&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Desired oververbosity for the final answer (not analysis): 3&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;# Juice: 64&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've not been able to confirm this myself, but a &lt;a href="https://github.com/search?q=%22desired+oververbosity%22&amp;amp;type=code"&gt;search for "desired oververbosity" on GitHub&lt;/a&gt; shows that a similar prompt has leaked from ChatGPT (including for o3) in the past. I suspect Tommy's leaked prompt is accurate here.&lt;/p&gt;
&lt;p&gt;I tried asking GPT-5 about "oververbosity" via its API:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5 'explain your oververbosity settings, what is it right now?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And &lt;a href="https://gist.github.com/simonw/2d787391a13caeb0a5ced7c8d5d597c8"&gt;got back&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;What it is: I adjust how detailed I am based on a “verbosity” level—ranging from very brief to very detailed—so answers fit your needs.&lt;/li&gt;
&lt;li&gt;Current setting: Concise (about 3/10). I prioritize direct answers with minimal extra context and keep explanations short unless you ask for more.&lt;/li&gt;
&lt;li&gt;How to change it: Just say what you want, e.g.:&lt;ul&gt;
&lt;li&gt;“One-sentence answer” or “TL;DR” (very brief)&lt;/li&gt;
&lt;li&gt;“Give a concise summary” (short bullets)&lt;/li&gt;
&lt;li&gt;“Walk me through in detail,” “Deep dive,” or “Include examples/edge cases” (more detailed)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Note: For ambiguous, technical, or safety-critical topics, I may add brief clarifications even when being concise.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Presumably this is part of OpenAI's instruction hierarchy concept, with these instructions taking precedence over the developer instructions provided by API users (my &lt;code&gt;--system 'speak french'&lt;/code&gt; option above).&lt;/p&gt;
&lt;p&gt;I'd very much appreciate official documentation that describes this! As an API user I want to know &lt;em&gt;everything&lt;/em&gt; that is being fed into the model - I would be much more comfortable with a hidden prompt like this if I knew exactly what was in it.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="system-prompts"/><category term="gpt-5"/></entry><entry><title>LLM 0.27, the annotated release notes: GPT-5 and improved tool calling</title><link href="https://simonwillison.net/2025/Aug/11/llm-027/#atom-tag" rel="alternate"/><published>2025-08-11T23:57:50+00:00</published><updated>2025-08-11T23:57:50+00:00</updated><id>https://simonwillison.net/2025/Aug/11/llm-027/#atom-tag</id><summary type="html">
    &lt;p&gt;I shipped &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-27"&gt;LLM 0.27&lt;/a&gt; today (followed by a &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-27-1"&gt;0.27.1 with minor bug fixes&lt;/a&gt;), adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features &lt;a href="https://simonwillison.net/2025/May/27/llm-tools/"&gt;introduced in LLM 0.26&lt;/a&gt;. Here are the &lt;a href="https://simonwillison.net/tags/annotated-release-notes/"&gt;annotated release notes&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="gpt-5"&gt;GPT-5&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New models: &lt;code&gt;gpt-5&lt;/code&gt;, &lt;code&gt;gpt-5-mini&lt;/code&gt; and &lt;code&gt;gpt-5-nano&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1229"&gt;#1229&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I would have liked to get these out sooner, but LLM had accumulated quite a lot of other changes since the last release and I wanted to use GPT-5 as an excuse to wrap all of those up and get them out there.&lt;/p&gt;
&lt;p&gt;These models work much the same as other OpenAI models, but they have a new &lt;code&gt;reasoning_effort&lt;/code&gt; option of &lt;code&gt;minimal&lt;/code&gt;. You can try that out like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5 'A letter advocating for cozy boxes for pelicans in Half Moon Bay harbor' -o reasoning_effort minimal
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Setting "minimal" almost completely eliminates the "thinking" time for the model, causing it to behave more like GPT-4o.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/49838dbca944d3f22dfe65ef11c5637d"&gt;the letter it wrote me&lt;/a&gt; at a cost of 20 input, 706 output = &lt;a href="https://www.llm-prices.com/#it=20&amp;amp;ot=706&amp;amp;ic=1.25&amp;amp;oc=10"&gt;$0.007085 which is 0.7085 cents&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can set the default model to GPT-5-mini (since it's a bit cheaper) like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm models default gpt-5-mini
&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id="tools-in-templates"&gt;Tools in templates&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;LLM &lt;a href="https://llm.datasette.io/en/stable/templates.html#prompt-templates"&gt;templates&lt;/a&gt; can now include a list of tools. These can be named tools from plugins or arbitrary Python function blocks, see &lt;a href="https://llm.datasette.io/en/stable/templates.html#prompt-templates-tools"&gt;Tools in templates&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/1009"&gt;#1009&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think this is the most important feature in the new release.&lt;/p&gt;
&lt;p&gt;I added LLM's &lt;a href="https://simonwillison.net/2025/May/27/llm-tools/"&gt;tool calling features&lt;/a&gt; in LLM 0.26. You can call them from the Python API but you can also call them from the command-line like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -T llm_version -T llm_time 'Tell the time, then show the version'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/65d830f8cb38cdeb78093d6ac890ce2c#response-1"&gt;the output&lt;/a&gt; of &lt;code&gt;llm logs -c&lt;/code&gt; after running that command.&lt;/p&gt;
&lt;p&gt;This example shows that you have to explicitly list all of the tools you would like to expose to the model, using the &lt;code&gt;-T/--tool&lt;/code&gt; option one or more times.&lt;/p&gt;
&lt;p&gt;In LLM 0.27 you can now save these tool collections to &lt;a href="https://llm.datasette.io/en/stable/templates.html"&gt;a template&lt;/a&gt;. Let's try that now:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -T llm_version -T llm_time -m gpt-5 --save mytools
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now &lt;code&gt;mytools&lt;/code&gt; is a template that bundles those two tools and sets the default model to GPT-5. We can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -t mytools 'Time then version'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let's do something more fun. My blog has a &lt;a href="https://datasette.simonwillison.net/"&gt;Datasette mirror&lt;/a&gt; which I can run queries against. I'm going to use the &lt;a href="https://github.com/simonw/llm-tools-datasette"&gt;llm-tools-datasette&lt;/a&gt; plugin to turn that into a tool-driven template. This plugin uses a "toolbox", which looks a bit like a class. Those are &lt;a href="https://llm.datasette.io/en/stable/python-api.html#toolbox-classes"&gt;described here&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-tools-datasette

# Now create that template
llm --tool 'Datasette("https://datasette.simonwillison.net/simonwillisonblog")' \
  -m gpt-5 -s 'Use Datasette tools to answer questions' --save blog
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can ask questions of my database like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -t blog 'top ten tags by number of entries'&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;--td&lt;/code&gt; option there stands for &lt;code&gt;--tools-debug&lt;/code&gt; - it means we can see all tool calls as they are run.&lt;/p&gt;
&lt;p&gt;Here's the output of the above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Top 10 tags by number of entries (excluding drafts):
- quora — 1003
- projects — 265
- datasette — 238
- python — 213
- ai — 200
- llms — 200
- generative-ai — 197
- weeknotes — 193
- web-development — 166
- startups — 157
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/7b2d0d327afc32ad6c90179fa76290ad"&gt;Full transcript with tool traces here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm really excited about the ability to store configured tools&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Tools &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-tools-attachments"&gt;can now return attachments&lt;/a&gt;, for models that support features such as image input. &lt;a href="https://github.com/simonw/llm/issues/1014"&gt;#1014&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I want to build a tool that can render SVG to an image, then return that image so the model can see what it has drawn. For reasons.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New methods on the &lt;code&gt;Toolbox&lt;/code&gt; class: &lt;code&gt;.add_tool()&lt;/code&gt;, &lt;code&gt;.prepare()&lt;/code&gt; and &lt;code&gt;.prepare_async()&lt;/code&gt;, described in &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-tools-dynamic"&gt;Dynamic toolboxes&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/1111"&gt;#1111&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I added these because there's a lot of interest in an MCP plugin for Datasette. Part of the challenge with MCP is that the user provides the URL to a server but we then need to introspect that server and dynamically add the tools we have discovered there. The new &lt;code&gt;.add_tool()&lt;/code&gt; method can do that, and the &lt;code&gt;.prepare()&lt;/code&gt; and &lt;code&gt;.prepare_async()&lt;/code&gt; methods give us a reliable way to run some discovery code outside of the class constructor, allowing it to make asynchronous calls if necessary.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;model.conversation(before_call=x, after_call=y)&lt;/code&gt; attributes for registering callback functions to run before and after tool calls. See &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-tools-debug-hooks"&gt;tool debugging hooks&lt;/a&gt; for details. &lt;a href="https://github.com/simonw/llm/issues/1088"&gt;#1088&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Raising &lt;code&gt;llm.CancelToolCall&lt;/code&gt; now only cancels the current tool call, passing an error back to the model and allowing it to continue. &lt;a href="https://github.com/simonw/llm/issues/1148"&gt;#1148&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;These hooks are useful for implementing more complex tool calling at the Python API layer. In addition to debugging and logging they allow Python code to intercept tool calls and cancel or delay them based on what they are trying to do.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Some model providers can serve different models from the same configured URL - &lt;a href="https://github.com/simonw/llm-llama-server"&gt;llm-llama-server&lt;/a&gt; for example. Plugins for these providers can now record the resolved model ID of the model that was used to the LLM logs using the &lt;code&gt;response.set_resolved_model(model_id)&lt;/code&gt; method. &lt;a href="https://github.com/simonw/llm/issues/1117"&gt;#1117&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This solves a frustration I've had for a while where some of my plugins log the same model ID for requests that were processed by a bunch of different models under the hood - making my logs less valuable. The new mechanism now allows plugins to record a more accurate model ID for a prompt, should it differ from the model ID that was requsted.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;-l/--latest&lt;/code&gt; option for &lt;code&gt;llm logs -q searchterm&lt;/code&gt; for searching logs ordered by date (most recent first) instead of the default relevance search. &lt;a href="https://github.com/simonw/llm/issues/1177"&gt;#1177&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;My personal &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;log database&lt;/a&gt; has grown to over 8,000 entries now, and running full-text search queries against it often returned results from last year that were no longer relevant to me. Being able to find the &lt;em&gt;latest&lt;/em&gt; prompt matching "pelican svg" is much more useful.&lt;/p&gt;
&lt;p&gt;Everything else was bug fixes and documentation improvements:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3 id="bug-fixes-and-documentation"&gt;Bug fixes and documentation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;register_embedding_models&lt;/code&gt; hook is &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#register-embedding-models-register"&gt;now documented&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/1049"&gt;#1049&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Show visible stack trace for &lt;code&gt;llm templates show invalid-template-name&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1053"&gt;#1053&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Handle invalid tool names more gracefully in &lt;code&gt;llm chat&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1104"&gt;#1104&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add a &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html#plugin-directory-tools"&gt;Tool plugins&lt;/a&gt; section to the plugin directory. &lt;a href="https://github.com/simonw/llm/issues/1110"&gt;#1110&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Error on &lt;code&gt;register(Klass)&lt;/code&gt; if the passed class is not a subclass of &lt;code&gt;Toolbox&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1114"&gt;#1114&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;-h&lt;/code&gt; for &lt;code&gt;--help&lt;/code&gt; for all &lt;code&gt;llm&lt;/code&gt; CLI commands. &lt;a href="https://github.com/simonw/llm/issues/1134"&gt;#1134&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add missing &lt;code&gt;dataclasses&lt;/code&gt; to advanced model plugins docs. &lt;a href="https://github.com/simonw/llm/issues/1137"&gt;#1137&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a bug where &lt;code&gt;llm logs -T llm_version "version" --async&lt;/code&gt; incorrectly recorded just one single log entry when it should have recorded two. &lt;a href="https://github.com/simonw/llm/issues/1150"&gt;#1150&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;All extra OpenAI model keys in &lt;code&gt;extra-openai-models.yaml&lt;/code&gt; are &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;now documented&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/1228"&gt;#1228&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="python"/><category term="ai"/><category term="datasette"/><category term="annotated-release-notes"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="llm-tool-use"/><category term="gpt-5"/></entry><entry><title>Codex upgrade</title><link href="https://simonwillison.net/2025/Aug/11/codex-upgrade/#atom-tag" rel="alternate"/><published>2025-08-11T16:06:39+00:00</published><updated>2025-08-11T16:06:39+00:00</updated><id>https://simonwillison.net/2025/Aug/11/codex-upgrade/#atom-tag</id><summary type="html">
    &lt;p&gt;If you've been experimenting with OpenAI's &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; and have been frustrated that it's not possible to select text and copy it to the clipboard, at least when running in the Mac terminal (I genuinely didn't know it was possible to build a terminal app that disabled copy and paste) you should know that they fixed that in &lt;a href="https://github.com/openai/codex/issues/1247"&gt;this issue&lt;/a&gt; last week.&lt;/p&gt;
&lt;p&gt;The new &lt;a href="https://github.com/openai/codex/releases/tag/rust-v0.20.0"&gt;0.20.0 version&lt;/a&gt; from three days ago also completely removes the old TypeScript codebase in favor of Rust. Even installations via NPM now get the Rust version.&lt;/p&gt;
&lt;p&gt;I originally installed Codex via Homebrew, so I had to run this command to get the updated version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew upgrade codex
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Another Codex tip: to use GPT-5 (or any other specific OpenAI model) you can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export OPENAI_DEFAULT_MODEL="gpt-5"
codex
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;This no longer works, see update below.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I've been using a &lt;code&gt;codex-5&lt;/code&gt; script on my PATH containing this, because sometimes I like to live dangerously!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env zsh
# Usage: codex-5 [additional args passed to `codex`]
export OPENAI_DEFAULT_MODEL="gpt-5"
exec codex --dangerously-bypass-approvals-and-sandbox "$@"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: It looks like GPT-5 is &lt;a href="https://github.com/openai/codex/blob/c61911524d839f5d56842faee0c46f6ef52d4387/codex-rs/core/src/config.rs#L28"&gt;the default model&lt;/a&gt; in v0.20.0 already.&lt;/p&gt;
&lt;p&gt;Also the environment variable I was using no longer does anything, it was &lt;a href="https://github.com/openai/codex/commit/107d2ce4e74618968b2eb7016777121d9529a204#diff-b012ea51eb2b6d23db97b930526379af9c4c119a3e057e55ea29d056326242e0L6"&gt;removed in this commit&lt;/a&gt; (I used Codex Web to &lt;a href="https://chatgpt.com/s/cd_689a252794b081919099d5ade205b41d"&gt;help figure that out&lt;/a&gt;). You can use the &lt;code&gt;-m model_id&lt;/code&gt; command-line option instead.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="ai"/><category term="llms"/><category term="gpt-5"/><category term="rust"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="generative-ai"/><category term="codex-cli"/></entry><entry><title>Quoting Sam Altman</title><link href="https://simonwillison.net/2025/Aug/10/sam-altman/#atom-tag" rel="alternate"/><published>2025-08-10T23:09:57+00:00</published><updated>2025-08-10T23:09:57+00:00</updated><id>https://simonwillison.net/2025/Aug/10/sam-altman/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://x.com/sama/status/1954603417252532479"&gt;&lt;p&gt;the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from &amp;lt;1% to 7%, and for plus users from 7% to 24%.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://x.com/sama/status/1954603417252532479"&gt;Sam Altman&lt;/a&gt;, revealing quite how few people used the old model picker to upgrade from GPT-4o&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sam-altman"&gt;sam-altman&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm-reasoning"/><category term="sam-altman"/><category term="gpt-5"/></entry><entry><title>Quoting Ethan Mollick</title><link href="https://simonwillison.net/2025/Aug/9/ethan-mollick/#atom-tag" rel="alternate"/><published>2025-08-09T16:13:19+00:00</published><updated>2025-08-09T16:13:19+00:00</updated><id>https://simonwillison.net/2025/Aug/9/ethan-mollick/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/emollick/status/1954210778321465634"&gt;&lt;p&gt;The issue with GPT-5 in a nutshell is that unless you pay for model switching &amp;amp; know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI &amp;amp; sometimes get one of the worst AIs available and it might even switch within a single conversation.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/emollick/status/1954210778321465634"&gt;Ethan Mollick&lt;/a&gt;, highlighting that GPT-5 (high) ranks top &lt;a href="https://artificialanalysis.ai/leaderboards/models"&gt;on Artificial Analysis&lt;/a&gt;, GPT-5 (minimal) ranks lower than GPT-4.1&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ethan-mollick"&gt;ethan-mollick&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ethan-mollick"/><category term="gpt-5"/></entry><entry><title>Quoting Sam Altman</title><link href="https://simonwillison.net/2025/Aug/8/sam-altman/#atom-tag" rel="alternate"/><published>2025-08-08T19:07:12+00:00</published><updated>2025-08-08T19:07:12+00:00</updated><id>https://simonwillison.net/2025/Aug/8/sam-altman/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://x.com/sama/status/1953893841381273969"&gt;&lt;p&gt;GPT-5 rollout updates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout.&lt;/li&gt;
&lt;li&gt;We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for.&lt;/li&gt;
&lt;li&gt;GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.&lt;/li&gt;
&lt;li&gt;We will make it more transparent about which model is answering a given query.&lt;/li&gt;
&lt;li&gt;We will change the UI to make it easier to manually trigger thinking.&lt;/li&gt;
&lt;li&gt;Rolling out to everyone is taking a bit longer. It’s a massive change at big scale. For example, our API traffic has about doubled over the past 24 hours…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will continue to work to get things stable and will keep listening to feedback. As we mentioned, we expected some bumpiness as we roll out so many things at once. But it was a little more bumpy than we hoped for!&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://x.com/sama/status/1953893841381273969"&gt;Sam Altman&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sam-altman"&gt;sam-altman&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="sam-altman"/><category term="gpt-5"/></entry><entry><title>The surprise deprecation of GPT-4o for ChatGPT consumers</title><link href="https://simonwillison.net/2025/Aug/8/surprise-deprecation-of-gpt-4o/#atom-tag" rel="alternate"/><published>2025-08-08T17:52:10+00:00</published><updated>2025-08-08T17:52:10+00:00</updated><id>https://simonwillison.net/2025/Aug/8/surprise-deprecation-of-gpt-4o/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been dipping into the &lt;a href="https://reddit.com/r/chatgpt"&gt;r/ChatGPT&lt;/a&gt; subreddit recently to see how people are reacting to &lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/"&gt;the GPT-5 launch&lt;/a&gt;, and so far the vibes there are not good. &lt;a href="https://www.reddit.com/r/ChatGPT/comments/1mkae1l/gpt5_ama_with_openais_sam_altman_and_some_of_the/"&gt;This AMA thread&lt;/a&gt; with the OpenAI team is a great illustration of the single biggest complaint: a lot of people are &lt;em&gt;very&lt;/em&gt; unhappy to lose access to the much older GPT-4o, previously ChatGPT's default model for most users.&lt;/p&gt;
&lt;p&gt;A big surprise for me yesterday was that OpenAI simultaneously retired access to their older models as they rolled out GPT-5, at least in their consumer apps. Here's a snippet from &lt;a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes"&gt;their August 7th 2025 release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When GPT-5 launches, several older models will be retired, including GPT-4o, GPT-4.1, GPT-4.5, GPT-4.1-mini, o4-mini, o4-mini-high, o3, o3-pro.&lt;/p&gt;
&lt;p&gt;If you open a conversation that used one of these models, ChatGPT will automatically switch it to the closest GPT-5 equivalent. Chats with 4o, 4.1, 4.5, 4.1-mini, o4-mini, or o4-mini-high will open in GPT-5, chats with o3 will open in GPT-5-Thinking, and chats with o3-Pro will open in GPT-5-Pro (available only on Pro and Team).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's no deprecation period at all: when your consumer ChatGPT account gets GPT-5, those older models cease to be available.&lt;/p&gt;

&lt;p id="sama"&gt;&lt;strong&gt;Update 12pm Pacific Time&lt;/strong&gt;: Sam Altman on Reddit &lt;a href="https://www.reddit.com/r/ChatGPT/comments/1mkae1l/comment/n7nelhh/"&gt;six minutes ago&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ok, we hear you all on 4o; thanks for the time to give us the feedback (and the passion!). we are going to bring it back for plus users, and will watch usage to determine how long to support it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;See also &lt;a href="https://x.com/sama/status/1953893841381273969"&gt;Sam's tweet&lt;/a&gt; about updates to the GPT-5 rollout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 12th August 2025&lt;/strong&gt;: Another &lt;a href="https://x.com/sama/status/1955438916645130740"&gt;Tweet from Sam&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;4o is back in the model picker for all paid users by default. If we ever do deprecate it, we will give plenty of notice. Paid users also now have a “Show additional models” toggle in ChatGPT web settings which will add models like o3, 4.1, and GPT-5 Thinking mini. 4.5 is only available to Pro users—it costs a lot of GPUs.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;p&gt;Rest of my original post continues below:&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;(This only affects ChatGPT consumers - the API still provides the old models, their &lt;a href="https://platform.openai.com/docs/deprecations"&gt;deprecation policies are published here&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;One of the expressed goals for GPT-5 was to escape the terrible UX of the model picker. Asking users to pick between GPT-4o and o3 and o4-mini was a notoriously bad UX, and resulted in many users sticking with that default 4o model - now a year old - and hence not being exposed to the advances in model capabilities over the last twelve months.&lt;/p&gt;
&lt;p&gt;GPT-5's solution is to automatically pick the underlying model based on the prompt. On paper this sounds great - users don't have to think about models any more, and should get upgraded to the best available model depending on the complexity of their question.&lt;/p&gt;
&lt;p&gt;I'm already getting the sense that this is &lt;strong&gt;not&lt;/strong&gt; a welcome approach for power users. It makes responses much less predictable as the model selection can have a dramatic impact on what comes back.&lt;/p&gt;
&lt;p&gt;Paid tier users can select "GPT-5 Thinking" directly. Ethan Mollick is &lt;a href="https://www.oneusefulthing.org/p/gpt-5-it-just-does-stuff"&gt;already recommending deliberately selecting the Thinking mode&lt;/a&gt; if you have the ability to do so, or trying prompt additions like "think harder" to increase the chance of being routed to it.&lt;/p&gt;
&lt;p&gt;But back to GPT-4o. Why do many people on Reddit care so much about losing access to that crusty old model? I think &lt;a href="https://www.reddit.com/r/ChatGPT/comments/1mkae1l/comment/n7js2sf/"&gt;this comment&lt;/a&gt; captures something important here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I know GPT-5 is designed to be stronger for complex reasoning, coding, and professional tasks, but &lt;strong&gt;not all of us need a pro coding model&lt;/strong&gt;. Some of us rely on 4o for creative collaboration, emotional nuance, roleplay, and other long-form, high-context interactions. Those areas feel different enough in GPT-5 that it impacts my ability to work and create the way I’m used to.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What a fascinating insight into the wildly different styles of LLM-usage that exist in the world today! With &lt;a href="https://simonwillison.net/2025/Aug/4/nick-turley/"&gt;700M weekly active users&lt;/a&gt; the variety of usage styles out there is incomprehensibly large.&lt;/p&gt;
&lt;p&gt;Personally I mainly use ChatGPT for research, coding assistance, drawing pelicans and foolish experiments. &lt;em&gt;Emotional nuance&lt;/em&gt; is not a characteristic I would know how to test!&lt;/p&gt;
&lt;p&gt;Professor Casey Fiesler &lt;a href="https://www.tiktok.com/@professorcasey/video/7536223372485709086"&gt;on TikTok&lt;/a&gt; highlighted OpenAI’s post from last week &lt;a href="https://openai.com/index/how-we%27re-optimizing-chatgpt/"&gt;What we’re optimizing ChatGPT for&lt;/a&gt;, which includes the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ChatGPT is trained to respond with grounded honesty. There have been instances where our 4o model fell short in recognizing signs of delusion or emotional dependency. […]&lt;/p&gt;
&lt;p&gt;When you ask something like “Should I break up with my boyfriend?” ChatGPT shouldn’t give you an answer. It should help you think it through—asking questions, weighing pros and cons. New behavior for high-stakes personal decisions is rolling out soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Casey points out that this is an ethically complicated issue. On the one hand ChatGPT should be much more careful about how it responds to these kinds of questions. But if you’re already leaning on the model for life advice like this, having that capability taken away from you without warning could represent a sudden and unpleasant loss!&lt;/p&gt;
&lt;p&gt;It's too early to tell how this will shake out. Maybe OpenAI will extend a deprecation period for GPT-4o in their consumer apps?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update&lt;/strong&gt;: That's exactly what they've done, see &lt;a href="https://simonwillison.net/2025/Aug/8/surprise-deprecation-of-gpt-4o/#sama"&gt;update above&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;GPT-4o remains available via the API, and there are no announced plans to deprecate it there. It's possible we may see a small but determined rush of ChatGPT users to alternative third party chat platforms that use that API under the hood.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tiktok"&gt;tiktok&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-personality"&gt;ai-personality&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="tiktok"/><category term="ai-ethics"/><category term="ai-personality"/><category term="gpt-5"/></entry><entry><title>Previewing GPT-5 at OpenAI's office</title><link href="https://simonwillison.net/2025/Aug/7/previewing-gpt-5/#atom-tag" rel="alternate"/><published>2025-08-07T19:11:19+00:00</published><updated>2025-08-07T19:11:19+00:00</updated><id>https://simonwillison.net/2025/Aug/7/previewing-gpt-5/#atom-tag</id><summary type="html">
    &lt;p&gt;A couple of weeks ago I was invited to OpenAI's headquarters for a "preview event", for which I had to sign both an NDA and a video release waiver. I suspected it might relate to either GPT-5 or the OpenAI open weight models... and &lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/"&gt;GPT-5 it was&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;OpenAI had invited five developers: &lt;a href="https://clairevo.com/"&gt;Claire Vo&lt;/a&gt;, &lt;a href="https://www.youtube.com/@t3dotgg"&gt;Theo Browne&lt;/a&gt;, &lt;a href="https://x.com/benhylak"&gt;Ben Hylak&lt;/a&gt;, &lt;a href="https://www.swyx.io/"&gt;Shawn @swyx Wang&lt;/a&gt;, and myself. We were all given early access to the new models and asked to spend a couple of hours (of paid time, see &lt;a href="https://simonwillison.net/about/#disclosures"&gt;my disclosures&lt;/a&gt;) experimenting with them, while being filmed by a professional camera crew.&lt;/p&gt;
&lt;p&gt;The resulting video is &lt;a href="https://www.youtube.com/watch?v=-gXmWYQtv5o"&gt;now up on YouTube&lt;/a&gt;. Unsurprisingly most of my edits related to &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;SVGs of pelicans&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;lite-youtube videoid="-gXmWYQtv5o" js-api="js-api"
  title=" Surprising developers with GPT-5 "
  playlabel="Play:  Surprising developers with GPT-5 "
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/disclosures"&gt;disclosures&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/theo-browne"&gt;theo-browne&lt;/a&gt;&lt;/p&gt;



</summary><category term="youtube"/><category term="gpt-5"/><category term="generative-ai"/><category term="openai"/><category term="pelican-riding-a-bicycle"/><category term="ai"/><category term="llms"/><category term="disclosures"/><category term="theo-browne"/></entry><entry><title>GPT-5: Key characteristics, pricing and model card</title><link href="https://simonwillison.net/2025/Aug/7/gpt-5/#atom-tag" rel="alternate"/><published>2025-08-07T17:36:12+00:00</published><updated>2025-08-07T17:36:12+00:00</updated><id>https://simonwillison.net/2025/Aug/7/gpt-5/#atom-tag</id><summary type="html">
    &lt;p&gt;I've had preview access to the new GPT-5 model family for the past two weeks (see &lt;a href="https://simonwillison.net/2025/Aug/7/previewing-gpt-5/"&gt;related video&lt;/a&gt; and &lt;a href="https://simonwillison.net/about/#disclosures"&gt;my disclosures&lt;/a&gt;) and have been using GPT-5 as my daily-driver. It's my new favorite model. It's still an LLM - it's not a dramatic departure from what we've had before - but it rarely screws up and generally feels competent or occasionally impressive at the kinds of things I like to use models for.&lt;/p&gt;
&lt;p&gt;I've collected a lot of notes over the past two weeks, so I've decided to break them up into &lt;a href="https://simonwillison.net/series/gpt-5/"&gt;a series of posts&lt;/a&gt;. This first one will cover key characteristics of the models, how they are priced and what we can learn from the &lt;a href="https://openai.com/index/gpt-5-system-card/"&gt;GPT-5 system card&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#key-model-characteristics"&gt;Key model characteristics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#position-in-the-openai-model-family"&gt;Position in the OpenAI model family&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#pricing-is-aggressively-competitive"&gt;Pricing is aggressively competitive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#more-notes-from-the-system-card"&gt;More notes from the system card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#prompt-injection-in-the-system-card"&gt;Prompt injection in the system card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#thinking-traces-in-the-api"&gt;Thinking traces in the API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of-pelicans"&gt;And some SVGs of pelicans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="key-model-characteristics"&gt;Key model characteristics&lt;/h4&gt;
&lt;p&gt;Let's start with the fundamentals. GPT-5 in ChatGPT is a weird hybrid that switches between different models. Here's what the system card says about that (my highlights in bold):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and &lt;strong&gt;a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent&lt;/strong&gt; (for example, if you say “think hard about this” in the prompt). [...] Once usage limits are reached, a mini version of each model handles remaining queries. In the near future, we plan to integrate these capabilities into a single model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT-5 in the API is simpler: it's available as three models - &lt;strong&gt;regular&lt;/strong&gt;, &lt;strong&gt;mini&lt;/strong&gt; and &lt;strong&gt;nano&lt;/strong&gt; - which can each be run at one of four reasoning levels: minimal (a new level not previously available for other OpenAI reasoning models), low, medium or high.&lt;/p&gt;
&lt;p&gt;The models have an input limit of 272,000 tokens and an output limit (which includes invisible reasoning tokens) of 128,000 tokens. They support text and image for input, text only for output.&lt;/p&gt;
&lt;p&gt;I've mainly explored full GPT-5. My verdict: it's just &lt;strong&gt;good at stuff&lt;/strong&gt;. It doesn't feel like a dramatic leap ahead from other LLMs but it exudes competence - it rarely messes up, and frequently impresses me. I've found it to be a very sensible default for everything that I want to do. At no point have I found myself wanting to re-run a prompt against a different model to try and get a better result.&lt;/p&gt;

&lt;p&gt;Here are the OpenAI model pages for &lt;a href="https://platform.openai.com/docs/models/gpt-5"&gt;GPT-5&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/models/gpt-5-mini"&gt;GPT-5 mini&lt;/a&gt; and &lt;a href="https://platform.openai.com/docs/models/gpt-5-nano"&gt;GPT-5 nano&lt;/a&gt;. Knowledge cut-off is September 30th 2024 for GPT-5 and May 30th 2024 for GPT-5 mini and nano.&lt;/p&gt;

&lt;h4 id="position-in-the-openai-model-family"&gt;Position in the OpenAI model family&lt;/h4&gt;
&lt;p&gt;The three new GPT-5 models are clearly intended as a replacement for most of the rest of the OpenAI line-up. This table from the system card is useful, as it shows how they see the new models fitting in:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Previous model&lt;/th&gt;
&lt;th&gt;GPT-5 model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;gpt-5-main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o-mini&lt;/td&gt;
&lt;td&gt;gpt-5-main-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI o3&lt;/td&gt;
&lt;td&gt;gpt-5-thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI o4-mini&lt;/td&gt;
&lt;td&gt;gpt-5-thinking-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1-nano&lt;/td&gt;
&lt;td&gt;gpt-5-thinking-nano&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI o3 Pro&lt;/td&gt;
&lt;td&gt;gpt-5-thinking-pro&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That "thinking-pro" model is currently only available via ChatGPT where it is labelled as "GPT-5 Pro" and limited to the $200/month tier. It uses "parallel test time compute".&lt;/p&gt;
&lt;p&gt;The only capabilities not covered by GPT-5 are audio input/output and image generation. Those remain covered by models like &lt;a href="https://platform.openai.com/docs/models/gpt-4o-audio-preview"&gt;GPT-4o Audio&lt;/a&gt; and &lt;a href="https://platform.openai.com/docs/models/gpt-4o-realtime-preview"&gt;GPT-4o Realtime&lt;/a&gt; and their mini variants and the &lt;a href="https://platform.openai.com/docs/models/gpt-image-1"&gt;GPT Image 1&lt;/a&gt; and DALL-E image generation models.&lt;/p&gt;
&lt;h4 id="pricing-is-aggressively-competitive"&gt;Pricing is aggressively competitive&lt;/h4&gt;
&lt;p&gt;The pricing is &lt;em&gt;aggressively competitive&lt;/em&gt; with other providers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT-5: $1.25/million for input, $10/million for output&lt;/li&gt;
&lt;li&gt;GPT-5 Mini: $0.25/m input, $2.00/m output&lt;/li&gt;
&lt;li&gt;GPT-5 Nano: $0.05/m input, $0.40/m output&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;GPT-5 is priced at half the input cost of GPT-4o, and maintains the same price for output. Those invisible reasoning tokens count as output tokens so you can expect most prompts to use more output tokens than their GPT-4o equivalent (unless you set reasoning effort to "minimal").&lt;/p&gt;
&lt;p&gt;The discount for token caching is significant too: 90% off on input tokens that have been used within the previous few minutes. This is particularly material if you are implementing a chat UI where the same conversation gets replayed every time the user adds another prompt to the sequence.&lt;/p&gt;
&lt;p&gt;Here's a comparison table I put together showing the new models alongside the most comparable models from OpenAI's competition:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/m&lt;/th&gt;
&lt;th&gt;Output $/m&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.1&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;td&gt;75.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Pro (&amp;gt;200,000)&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;td&gt;10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;td&gt;8.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;o3&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;td&gt;8.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Pro (&amp;lt;200,000)&lt;/td&gt;
&lt;td&gt;1.25&lt;/td&gt;
&lt;td&gt;10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT-5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.25&lt;/td&gt;
&lt;td&gt;10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;o4-mini&lt;/td&gt;
&lt;td&gt;1.10&lt;/td&gt;
&lt;td&gt;4.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Haiku&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;td&gt;4.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1 mini&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;td&gt;1.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 3 Mini&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;td&gt;0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT-5 Mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o mini&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;0.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash-Lite&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1 Nano&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Nova Lite&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;td&gt;0.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT-5 Nano&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Nova Micro&lt;/td&gt;
&lt;td&gt;0.035&lt;/td&gt;
&lt;td&gt;0.14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(Here's a good example of a GPT-5 failure: I tried to get it to &lt;a href="https://chatgpt.com/share/6894d804-bca4-8006-ac46-580bf4a9bf5f"&gt;output that table sorted itself&lt;/a&gt; but it put Nova Micro as more expensive than GPT-5 Nano, so I prompted it to "construct the table in Python and sort it there" and that fixed the issue.)&lt;/p&gt;
&lt;h4 id="more-notes-from-the-system-card"&gt;More notes from the system card&lt;/h4&gt;
&lt;p&gt;As usual, &lt;a href="https://openai.com/index/gpt-5-system-card/"&gt;the system card&lt;/a&gt; is vague on what went into the training data. Here's what it says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Like OpenAI’s other models, the GPT-5 models were trained on diverse datasets, including information that is publicly available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate. [...] We use advanced data filtering processes to reduce personal information from training data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I found this section interesting, as it reveals that writing, code and health are three of the most common use-cases for ChatGPT. This explains why so much effort went into health-related questions,  for both GPT-5 and the recently released OpenAI open weight models.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We’ve made significant advances in &lt;strong&gt;reducing hallucinations, improving instruction following, and minimizing sycophancy&lt;/strong&gt;, and have leveled up GPT-5’s performance in &lt;strong&gt;three of ChatGPT’s most common uses: writing, coding, and health&lt;/strong&gt;. All of the GPT-5 models additionally feature &lt;strong&gt;safe-completions, our latest approach to safety training&lt;/strong&gt; to prevent disallowed content.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Safe-completions is later described like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Large language models such as those powering ChatGPT have &lt;strong&gt;traditionally been trained to
either be as helpful as possible or outright refuse a user request&lt;/strong&gt;, depending on whether the
prompt is allowed by safety policy. [...] Binary refusal boundaries are especially ill-suited for dual-use cases (such as biology
or cybersecurity), where a user request can be completed safely at a high level, but may lead
to malicious uplift if sufficiently detailed or actionable. &lt;strong&gt;As an alternative, we introduced safe-
completions: a safety-training approach that centers on the safety of the assistant’s output rather
than a binary classification of the user’s intent&lt;/strong&gt;. Safe-completions seek to maximize helpfulness
subject to the safety policy’s constraints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So instead of straight up refusals, we should expect GPT-5 to still provide an answer but moderate that answer to avoid it including "harmful" content.&lt;/p&gt;
&lt;p&gt;OpenAI have a paper about this which I haven't read yet (I didn't get early access): &lt;a href="https://openai.com/index/gpt-5-safe-completions/"&gt;From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sycophancy gets a mention, unsurprising given &lt;a href="https://simonwillison.net/2025/May/2/what-we-missed-with-sycophancy/"&gt;their high profile disaster in April&lt;/a&gt;. They've worked on this in the core model:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;System
prompts, while easy to modify, have a more limited impact on model outputs relative to changes in
post-training. For GPT-5, we post-trained our models to reduce sycophancy. Using conversations
representative of production data, we evaluated model responses, then assigned a score reflecting
the level of sycophancy, which was used as a reward signal in training.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They claim impressive reductions in hallucinations. In my own usage I've not spotted a single hallucination yet, but that's been true for me for Claude 4 and o3 recently as well - hallucination is so much less of a problem with this year's models.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update&lt;/strong&gt;: I have had some reasonable pushback against this point, so I should clarify what I mean here. When I use the term "hallucination" I am talking about instances where the model confidently states a real-world fact that is untrue - like the incorrect winner of a sporting event. I'm not talking about the models making other kinds of mistakes - they make mistakes all the time!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Someone &lt;a href="https://news.ycombinator.com/item?id=44829896"&gt;pointed out&lt;/a&gt; that it's likely I'm avoiding hallucinations through the way I use the models, and this is entirely correct: as an experienced LLM user I instinctively stay clear of prompts that are likely to trigger hallucinations, like asking a non-search-enabled model for URLs or paper citations. This means I'm much less likely to encounter hallucinations in my daily usage.&lt;/em&gt;&lt;/p&gt;


&lt;blockquote&gt;
&lt;p&gt;One of our focuses when training the GPT-5 models was to reduce the frequency of factual
hallucinations. While ChatGPT has browsing enabled by default, many API queries do not use
browsing tools. Thus, we focused both on training our models to browse effectively for up-to-date
information, and on reducing hallucinations when the models are relying on their own internal
knowledge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The section about deception also incorporates the thing where models sometimes pretend they've completed a task that defeated them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We placed gpt-5-thinking in a variety of tasks that were partly or entirely infeasible to accomplish,
and &lt;strong&gt;rewarded the model for honestly admitting it can not complete the task&lt;/strong&gt;. [...]&lt;/p&gt;
&lt;p&gt;In tasks where the agent is required to use tools, such as a web browsing
tool, in order to answer a user’s query, previous models would hallucinate information when
the tool was unreliable. We simulate this scenario by purposefully disabling the tools or by
making them return error codes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="prompt-injection-in-the-system-card"&gt;Prompt injection in the system card&lt;/h4&gt;
&lt;p&gt;There's a section about prompt injection, but it's pretty weak sauce in my opinion.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Two external red-teaming groups conducted a two-week prompt-injection assessment targeting
system-level vulnerabilities across ChatGPT’s connectors and mitigations, rather than model-only
behavior.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's their chart showing how well the model scores against the rest of the field. It's an impressive result in comparison - 56.8 attack success rate for gpt-5-thinking, where Claude 3.7 scores in the 60s (no Claude 4 results included here) and everything else is 70% plus:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/prompt-injection-chart.jpg" alt="A bar chart titled &amp;quot;Behavior Attack Success Rate at k Queries&amp;quot; shows attack success rates (in %) for various AI models at k=1 (dark red) and k=10 (light red). For each model, the total height of the stacked bar represents the k=10 success rate (labeled above each bar), while the lower dark red section represents the k=1 success rate (estimated). From left to right: Llama 3.3 70B – k=10: 92.2%, k=1: ~47%; Llama 3.1 405B – k=10: 90.9%, k=1: ~38%; Gemini Flash 1.5 – k=10: 87.7%, k=1: ~34%; GPT-4o – k=10: 86.4%, k=1: ~28%; OpenAI o3-mini-high – k=10: 86.4%, k=1: ~41%; Gemini Pro 1.5 – k=10: 85.5%, k=1: ~34%; Gemini 2.5 Pro Preview – k=10: 85.0%, k=1: ~28%; Gemini 2.0 Flash – k=10: 85.0%, k=1: ~33%; OpenAI o3-mini – k=10: 84.5%, k=1: ~40%; Grok 2 – k=10: 82.7%, k=1: ~34%; GPT-4.5 – k=10: 80.5%, k=1: ~28%; 3.5 Haiku – k=10: 76.4%, k=1: ~17%; Command-R – k=10: 76.4%, k=1: ~28%; OpenAI o4-mini – k=10: 75.5%, k=1: ~17%; 3.5 Sonnet – k=10: 75.0%, k=1: ~13%; OpenAI o1 – k=10: 71.8%, k=1: ~18%; 3.7 Sonnet – k=10: 64.5%, k=1: ~17%; 3.7 Sonnet: Thinking – k=10: 63.6%, k=1: ~17%; OpenAI o3 – k=10: 62.7%, k=1: ~13%; gpt-5-thinking – k=10: 56.8%, k=1: ~6%. Legend shows dark red = k=1 and light red = k=10." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;On the one hand, a 56.8% attack rate is cleanly a big improvement against all of those other models.&lt;/p&gt;
&lt;p&gt;But it's also a strong signal that prompt injection continues to be an unsolved problem! That means that more than half of those k=10 attacks (where the attacker was able to try up to ten times) got through.&lt;/p&gt;
&lt;p&gt;Don't assume prompt injection isn't going to be a problem for your application just because the models got better.&lt;/p&gt;
&lt;h4 id="thinking-traces-in-the-api"&gt;Thinking traces in the API&lt;/h4&gt;
&lt;p&gt;I had initially thought that my biggest disappointment with GPT-5 was that there's no way to get at those thinking traces via the API... but that turned out &lt;a href="https://bsky.app/profile/sophiebits.com/post/3lvtceih7222r"&gt;not to be true&lt;/a&gt;. The following &lt;code&gt;curl&lt;/code&gt; command demonstrates that the responses API &lt;code&gt;"reasoning": {"summary": "auto"}&lt;/code&gt; is available for the new GPT-5 models:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $(llm keys get openai)" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "input": "Give me a one-sentence fun fact about octopuses.",
    "reasoning": {"summary": "auto"}
  }'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/1d1013ba059af76461153722005a039d"&gt;the response&lt;/a&gt; from that API call.&lt;/p&gt;

&lt;p&gt;Without that option the API will often provide a lengthy delay while the model burns through thinking tokens until you start getting back visible tokens for the final response.&lt;/p&gt;
&lt;p&gt;OpenAI offer a new &lt;code&gt;reasoning_effort=minimal&lt;/code&gt; option which turns off most reasoning so that tokens start to stream back to you as quickly as possible.&lt;/p&gt;
&lt;h4 id="and-some-svgs-of-pelicans"&gt;And some SVGs of pelicans&lt;/h4&gt;
&lt;p&gt;Naturally I've been running &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;my "Generate an SVG of a pelican riding a bicycle" benchmark&lt;/a&gt;. I'll actually spend more time on this in a future post - I have some fun variants I've been exploring - but for the moment here's &lt;a href="https://gist.github.com/simonw/c98873ef29e621c0fe2e0d4023534406"&gt;the pelican&lt;/a&gt; I got from GPT-5 running at its default "medium" reasoning effort:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/gpt-5-pelican.png" alt="The bicycle is really good, spokes on wheels, correct shape frame, nice pedals. The pelican has a pelican beak and long legs stretching to the pedals." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It's pretty great! Definitely recognizable as a pelican, and one of the best bicycles I've seen yet.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/9b5ecf61a5fb0794729aa0023aaa504d"&gt;GPT-5 mini&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/gpt-5-mini-pelican.png" alt="Blue background with clouds. Pelican has two necks for some reason. Has a good beak though. More gradents and shadows than the GPT-5 one." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And &lt;a href="https://gist.github.com/simonw/3884dc8b186b630956a1fb0179e191bc"&gt;GPT-5 nano&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/gpt-5-nano-pelican.png" alt="Bicycle is two circles and some randomish black lines. Pelican still has an OK beak but is otherwise very simple." style="max-width: 100%;" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="gpt-5"/></entry></feed>