<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: jules</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/jules.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-11-06T15:53:23+00:00</updated><author><name>Simon Willison</name></author><entry><title>Code research projects with async coding agents like Claude Code and Codex</title><link href="https://simonwillison.net/2025/Nov/6/async-code-research/#atom-tag" rel="alternate"/><published>2025-11-06T15:53:23+00:00</published><updated>2025-11-06T15:53:23+00:00</updated><id>https://simonwillison.net/2025/Nov/6/async-code-research/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been experimenting with a pattern for LLM usage recently that's working out really well: &lt;strong&gt;asynchronous code research tasks&lt;/strong&gt;. Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#code-research"&gt;Code research&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#coding-agents"&gt;Coding agents&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#asynchronous-coding-agents"&gt;Asynchronous coding agents&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#give-them-a-dedicated-github-repository"&gt;Give them a dedicated GitHub repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#let-them-rip-with-unlimited-network-access"&gt;Let them rip with unlimited network access&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#my-simonw-research-collection"&gt;My simonw/research collection&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#this-is-total-slop-of-course"&gt;This is total slop, of course&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#try-it-yourself"&gt;Try it yourself&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="code-research"&gt;Code research&lt;/h4&gt;
&lt;p&gt;Software development benefits enormously from something I call &lt;strong&gt;code research&lt;/strong&gt;. The great thing about questions about code is that they can often be definitively answered by writing and executing code.&lt;/p&gt;
&lt;p&gt;I often see questions on forums which hint at a lack of understanding of this skill.&lt;/p&gt;
&lt;p&gt;"Could Redis work for powering the notifications feed for my app?" is a great example. The answer is &lt;em&gt;always&lt;/em&gt; "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work.&lt;/p&gt;
&lt;p&gt;I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible.&lt;/p&gt;
&lt;h4 id="coding-agents"&gt;Coding agents&lt;/h4&gt;
&lt;p&gt;It turns out &lt;strong&gt;coding agents&lt;/strong&gt; like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision.&lt;/p&gt;
&lt;p&gt;LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work.&lt;/p&gt;
&lt;p&gt;They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something &lt;em&gt;is&lt;/em&gt; possible in just a few minutes of crunching.&lt;/p&gt;
&lt;h4 id="asynchronous-coding-agents"&gt;Asynchronous coding agents&lt;/h4&gt;
&lt;p&gt;I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their &lt;strong&gt;asynchronous coding agent&lt;/strong&gt; family members instead.&lt;/p&gt;
&lt;p&gt;An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository.&lt;/p&gt;
&lt;p&gt;OpenAI's &lt;a href="https://chatgpt.com/codex"&gt;Codex Cloud&lt;/a&gt;, Anthropic's &lt;a href="https://claude.ai/code"&gt;Claude Code for web&lt;/a&gt;, Google Gemini's &lt;a href="https://jules.google/"&gt;Jules&lt;/a&gt;, and GitHub's &lt;a href="https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent?utm_source=chatgpt.com"&gt;Copilot coding agent&lt;/a&gt; are four prominent examples of this pattern.&lt;/p&gt;
&lt;p&gt;These are &lt;em&gt;fantastic&lt;/em&gt; tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with.&lt;/p&gt;
&lt;p&gt;I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results.&lt;/p&gt;
&lt;h4 id="give-them-a-dedicated-github-repository"&gt;Give them a dedicated GitHub repository&lt;/h4&gt;
&lt;p&gt;You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in.&lt;/p&gt;
&lt;p&gt;This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do.&lt;/p&gt;
&lt;p&gt;I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world.&lt;/p&gt;
&lt;h4 id="let-them-rip-with-unlimited-network-access"&gt;Let them rip with unlimited network access&lt;/h4&gt;
&lt;p&gt;The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do.&lt;/p&gt;
&lt;p&gt;Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. This makes total sense if they are running against sensitive repositories - a prompt injection attack of the &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/"&gt;lethal trifecta&lt;/a&gt; variety could easily be used to steal sensitive code or environment variables.&lt;/p&gt;
&lt;p&gt;If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.&lt;/p&gt;
&lt;h4 id="my-simonw-research-collection"&gt;My simonw/research collection&lt;/h4&gt;
&lt;p&gt;Let's dive into some examples. My public research repository is at &lt;a href="https://github.com/simonw/research"&gt;simonw/research&lt;/a&gt; on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day!&lt;/p&gt;
&lt;p&gt;It also includes &lt;a href="https://github.com/simonw/research/blob/main/.github/workflows/update-readme.yml"&gt;a GitHub Workflow&lt;/a&gt; which uses &lt;a href="https://docs.github.com/en/github-models"&gt;GitHub Models&lt;/a&gt; to automatically update &lt;a href="https://github.com/simonw/research/blob/main/README.md"&gt;the README&lt;/a&gt; file with a summary of every new project, using &lt;a href="https://cog.readthedocs.io/"&gt;Cog&lt;/a&gt;, &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;, &lt;a href="https://github.com/tonybaloney/llm-github-models"&gt;llm-github-models&lt;/a&gt; and &lt;a href="https://github.com/simonw/research/blob/b059108dfefeb05a48e1c27f7a127dc9fd648129/README.md#L9-L116"&gt;this snippet of Python&lt;/a&gt;.&lt;/p&gt;
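&lt;p&gt;The Cog part of that pattern works by embedding Python between markers hidden in Markdown comments; Cog executes the code and splices in whatever it prints. A hand-written illustration (the model ID and prompt here are made up, not the repo's actual snippet):&lt;/p&gt;

```
<!-- [[[cog
import llm
# Model ID is illustrative; llm-github-models exposes GitHub Models to LLM.
model = llm.get_model("github/gpt-4o-mini")
summary = model.prompt("One-sentence summary of the node-pyodide project").text()
cog.out(f"- **node-pyodide**: {summary}\n")
]]] -->
- **node-pyodide**: (generated summary is written here)
<!-- [[[end]]] -->
```

&lt;p&gt;Running &lt;code&gt;cog -r README.md&lt;/code&gt; rewrites the file in place, replacing everything between the markers with freshly generated output.&lt;/p&gt;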
&lt;p&gt;Here are some example research projects from the repo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/node-pyodide"&gt;node-pyodide&lt;/a&gt;&lt;/strong&gt; shows an example of a &lt;a href="https://github.com/simonw/research/blob/main/node-pyodide/server-simple.js"&gt;Node.js script&lt;/a&gt; that runs the &lt;a href="https://pyodide.org/"&gt;Pyodide&lt;/a&gt; WebAssembly distribution of Python inside it - yet another of my &lt;a href="https://simonwillison.net/tags/sandboxing+python/"&gt;ongoing attempts&lt;/a&gt; to find a great way of running Python in a WebAssembly sandbox on a server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/python-markdown-comparison"&gt;python-markdown-comparison&lt;/a&gt;&lt;/strong&gt; (&lt;a href="https://gistpreview.github.io/?fb07c2a3fd2d4cfb814a46696a58a00e"&gt;transcript&lt;/a&gt;) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across &lt;a href="https://pypi.org/project/cmarkgfm/"&gt;cmarkgfm&lt;/a&gt;, a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts! &lt;code&gt;cmarkgfm&lt;/code&gt; came out on top by a significant margin:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/markdown-performance.png" alt="Bar chart titled &amp;quot;Relative Performance vs cmarkgfm (Large Document)&amp;quot; comparing relative speed of markdown libraries, with marko at 52.1x, markdown2 at 16.9x, mistletoe at 14.1x, markdown at 12.9x, commonmark at 12.1x, mistune at 10.0x, and cmarkgfm at 1.0x baseline marked by a red dashed line; x-axis labeled &amp;quot;Relative Speed (lower is better)&amp;quot; ranging from 0 to 50+" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the entire prompt I used for that project:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note that I didn't specify any Markdown libraries other than &lt;code&gt;cmarkgfm&lt;/code&gt; - Claude Code ran a search and found the other six by itself.&lt;/p&gt;
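&lt;p&gt;The core of a benchmark like that is just a timing harness. Here's a generic sketch (my illustration, not the code the agent wrote); the stand-in renderer would be replaced with each library's render function, such as &lt;code&gt;cmarkgfm.github_flavored_markdown_to_html&lt;/code&gt;.&lt;/p&gt;

```python
# Generic best-of-N timing harness of the kind such a benchmark sits on.
# `renderers` maps a label to any callable taking the Markdown source;
# the str.upper stand-in keeps this sketch dependency-free.
import time

def bench(fn, doc: str, repeats: int = 5, loops: int = 20) -> float:
    """Return the best per-call time in seconds across `repeats` trials."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            fn(doc)
        best = min(best, (time.perf_counter() - start) / loops)
    return best

doc = "# Heading\n\nSome *emphasis* and `code`.\n" * 200

# Swap in real libraries here, e.g.
# {"cmarkgfm": cmarkgfm.github_flavored_markdown_to_html, "markdown": markdown.markdown}
renderers = {"upper-stub": str.upper}

baseline = min(bench(fn, doc) for fn in renderers.values())
for name, fn in renderers.items():
    t = bench(fn, doc)
    print(f"{name}: {t * 1e6:.1f}µs/call ({t / baseline:.1f}x baseline)")
```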
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/cmarkgfm-in-pyodide"&gt;cmarkgfm-in-pyodide&lt;/a&gt;&lt;/strong&gt; is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get &lt;code&gt;cmarkgfm&lt;/code&gt; - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled a 88.4KB &lt;code&gt;cmarkgfm_pyodide-2025.10.22-cp312-cp312-emscripten_3_1_46_wasm32.whl&lt;/code&gt; file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside of Node.js.&lt;/p&gt;
&lt;p&gt;I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figure out how to get the cmarkgfm markdown lover &lt;em&gt;[typo in prompt, this should have been "library" but it figured it out anyway]&lt;/em&gt; for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory. Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory&lt;/p&gt;
&lt;p&gt;There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one gave up halfway through, complaining that emscripten would take too long. I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It churned away for a bit longer and complained that the existing Python library used CFFI which isn't available in Pyodide. I asked it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and it did. You can &lt;a href="https://gistpreview.github.io/?6d778a8f9c4c2c005a189ff308c3bc47"&gt;see the full transcript here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/blog-tags-scikit-learn"&gt;blog-tags-scikit-learn&lt;/a&gt;&lt;/strong&gt;. Taking a short break from WebAssembly, I thought it would be fun to put &lt;a href="https://scikit-learn.org/stable/"&gt;scikit-learn&lt;/a&gt; through its paces on a text classification task against my blog:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Work in a new folder called blog-tags-scikit-learn&lt;/p&gt;
&lt;p&gt;Download &lt;code&gt;https://datasette.simonwillison.net/simonwillisonblog.db&lt;/code&gt; - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones&lt;/p&gt;
&lt;p&gt;Use Python scikit learn and try several different strategies&lt;/p&gt;
&lt;p&gt;Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description&lt;/p&gt;
&lt;p&gt;Also include an HTML page with a nice visualization of the results that works by loading those JSON files.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This resulted in seven &lt;code&gt;.py&lt;/code&gt; files, four &lt;code&gt;.json&lt;/code&gt; results files and a detailed &lt;a href="https://github.com/simonw/research/blob/main/blog-tags-scikit-learn/README.md"&gt;report&lt;/a&gt;. (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone!&lt;/p&gt;
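&lt;p&gt;The heart of that kind of task is a TF-IDF multi-label pipeline. A minimal sketch of the approach, with a few synthetic titles standing in for the &lt;code&gt;blog_entry&lt;/code&gt; table (this is my illustration, not the code the agent produced):&lt;/p&gt;

```python
# Train on the tagged entries, then suggest tags for an untagged one.
# Synthetic data stands in for the real blog database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

tagged = [
    ("Notes on the new Django ORM query features", ["django", "python"]),
    ("Exploring SQLite full-text search", ["sqlite", "search"]),
    ("Django signals and when to avoid them", ["django"]),
    ("Building a search engine on SQLite FTS5", ["sqlite", "search"]),
]
untagged = ["Why I still reach for Django on new projects"]

texts, tag_lists = zip(*tagged)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tag_lists)  # one binary column per tag

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)

# Rank tags by probability rather than hard 0/1 predictions - multi-label
# thresholds are unreliable, especially on sparse early-blog-style data.
probs = clf.predict_proba(vec.transform(untagged))[0]
suggested = sorted(zip(mlb.classes_, probs), key=lambda pair: -pair[1])
print(suggested[:3])
```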
&lt;p&gt;That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded.&lt;/p&gt;
&lt;p&gt;More recently I added a short &lt;code&gt;AGENTS.md&lt;/code&gt; file to the repo with a few extra tips for my research agents. You can &lt;a href="https://github.com/simonw/research/blob/b059108dfefeb05a48e1c27f7a127dc9fd648129/AGENTS.md"&gt;read that here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="this-is-total-slop-of-course"&gt;This is total slop, of course&lt;/h4&gt;
&lt;p&gt;My preferred definition of &lt;a href="https://simonwillison.net/2024/May/8/slop/"&gt;AI slop&lt;/a&gt; is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification.&lt;/p&gt;
&lt;p&gt;I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public &lt;code&gt;simonw/research&lt;/code&gt; repository.&lt;/p&gt;
&lt;p&gt;A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with &lt;code&gt;&amp;lt;meta name="robots" content="noindex"&amp;gt;&lt;/code&gt; tags. I still like to keep AI-generated content out of search, to avoid contributing more to the &lt;a href="https://en.wikipedia.org/wiki/Dead_Internet_theory"&gt;dead internet&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="try-it-yourself"&gt;Try it yourself&lt;/h4&gt;
&lt;p&gt;It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.&lt;/p&gt;
&lt;p&gt;You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.&lt;/p&gt;
&lt;p&gt;Claude Code for web offers &lt;a href="https://support.claude.com/en/articles/12690958-claude-code-promotion"&gt;a free $250 of credits&lt;/a&gt; for their $20/month users for a limited time (until November 18, 2025). Gemini Jules has &lt;a href="https://jules.google/docs/usage-limits/"&gt;a free tier&lt;/a&gt;. There are plenty of other coding agents you can try out as well.&lt;/p&gt;
&lt;p&gt;Let me know if your research agents come back with anything interesting!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="slop"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="jules"/><category term="codex-cli"/></entry><entry><title>Embracing the parallel coding agent lifestyle</title><link href="https://simonwillison.net/2025/Oct/5/parallel-coding-agents/#atom-tag" rel="alternate"/><published>2025-10-05T12:06:55+00:00</published><updated>2025-10-05T12:06:55+00:00</updated><id>https://simonwillison.net/2025/Oct/5/parallel-coding-agents/#atom-tag</id><summary type="html">
    &lt;p&gt;For a while now I've been hearing from engineers who run multiple coding agents at once - firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or &lt;a href="https://docs.claude.com/en/docs/claude-code/common-workflows#run-parallel-claude-code-sessions-with-git-worktrees"&gt;git worktrees&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I was pretty skeptical about this at first. AI-generated code needs to be reviewed, which means the natural bottleneck on all of this is how fast I can review the results. It's tough keeping up with just a single LLM given how fast they can churn things out, so where's the benefit of running more than one at a time if it just leaves me further behind?&lt;/p&gt;
&lt;p&gt;Despite my misgivings, over the past few weeks I've noticed myself quietly starting to embrace the parallel coding agent lifestyle.&lt;/p&gt;
&lt;p&gt;I can only focus on reviewing and landing one significant change at a time, but I'm finding an increasing number of tasks that can still be fired off in parallel without adding too much cognitive overhead to my primary work.&lt;/p&gt;
&lt;p&gt;Here are some patterns I've found for applying parallel agents effectively.&lt;/p&gt;
&lt;h4 id="research-poc"&gt;Research for proof of concepts&lt;/h4&gt;
&lt;p&gt;The first category of tasks I've been applying this pattern to is &lt;strong&gt;research&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Research tasks answer questions or provide recommendations without making modifications to a project that you plan to keep.&lt;/p&gt;
&lt;p&gt;A lot of software projects start with a proof of concept. Can &lt;a href="https://yjs.dev"&gt;Yjs&lt;/a&gt; be used to implement a simple collaborative note writing tool with a Python backend? The &lt;a href="https://github.com/y-crdt/pycrdt"&gt;libraries exist&lt;/a&gt;, but do they work when you wire them together?&lt;/p&gt;
&lt;p&gt;Today's coding agents can build a proof of concept with new libraries and resolve those kinds of basic questions. Libraries too new to be in the training data? Doesn't matter: tell them to check out the repos for those new dependencies and read the code to figure out how to use them.&lt;/p&gt;
&lt;h4 id="how-does-that-work-again"&gt;How does that work again?&lt;/h4&gt;
&lt;p&gt;If you need a reminder about how a portion of your existing system works, modern "reasoning" LLMs can provide a detailed, actionable answer in just a minute or two.&lt;/p&gt;
&lt;p&gt;It doesn't matter how large your codebase is: coding agents are extremely effective with tools like grep and can follow codepaths through dozens of different files if they need to.&lt;/p&gt;
&lt;p&gt;Ask them to make notes on where your signed cookies are set and read, or how your application uses subprocesses and threads, or which aspects of your JSON API aren't yet covered by your documentation.&lt;/p&gt;
&lt;p&gt;These LLM-generated explanations are worth stashing away somewhere, because they can make excellent context to paste into further prompts in the future.&lt;/p&gt;
&lt;h4 id="small-maintenance-tasks"&gt;Small maintenance tasks&lt;/h4&gt;
&lt;p&gt;Now we're moving on to code edits that we intend to keep, albeit with &lt;em&gt;very&lt;/em&gt; low stakes. It turns out there are a lot of problems that really just require a little bit of extra cognitive overhead which can be outsourced to a bot.&lt;/p&gt;
&lt;p&gt;Warnings are a great example. Is your test suite spitting out a warning that something you are using is deprecated? Chuck that at a bot - tell it to run the test suite and figure out how to fix the warning. No need to take a break from what you're doing to resolve minor irritations like that.&lt;/p&gt;
&lt;p&gt;There is a definite knack to spotting opportunities like this. As always, the best way to develop that instinct is to try things - any small maintenance task is something that's worth trying with a coding agent. You can learn from both their successes &lt;em&gt;and&lt;/em&gt; their failures.&lt;/p&gt;
&lt;h4 id="carefully-specified-and-directed-actual-work"&gt;Carefully specified and directed actual work&lt;/h4&gt;
&lt;p&gt;Reviewing code that lands on your desk out of nowhere is a &lt;em&gt;lot&lt;/em&gt; of work. First you have to derive the goals of the new implementation: what's it trying to achieve? Is this something the project needs? Is the approach taken the best for this current project, given other future planned changes? A lot of big questions before you can even start digging into the details of the code.&lt;/p&gt;
&lt;p&gt;Code that started from your own specification is a lot less effort to review. If you already decided what to solve, picked the approach and worked out a detailed specification for the work itself, confirming it was built to your needs can take a lot less time.&lt;/p&gt;
&lt;p&gt;I described my &lt;a href="https://simonwillison.net/2025/Mar/11/using-llms-for-code/#tell-them-exactly-what-to-do"&gt;more authoritarian approach&lt;/a&gt; to prompting models for code back in March. If I tell them &lt;em&gt;exactly&lt;/em&gt; how to build something the work needed to review the resulting changes is a whole lot less taxing.&lt;/p&gt;
&lt;h4 id="how-i-m-using-these-tools-today"&gt;How I'm using these tools today&lt;/h4&gt;
&lt;p&gt;My daily drivers are currently &lt;a href="https://www.claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; (on Sonnet 4.5), &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; (on GPT-5-Codex), and &lt;a href="https://chatgpt.com/codex"&gt;Codex Cloud&lt;/a&gt; (for asynchronous tasks, frequently launched from my phone.)&lt;/p&gt;
&lt;p&gt;I'm also dabbling with &lt;a href="https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent"&gt;GitHub Copilot Coding Agent&lt;/a&gt; (the agent baked into the &lt;a href="https://github.com"&gt;GitHub.com&lt;/a&gt; web interface in various places) and &lt;a href="https://jules.google"&gt;Google Jules&lt;/a&gt;, Google's currently-free alternative to Codex Cloud.&lt;/p&gt;
&lt;p&gt;I'm still settling into patterns that work for me. I imagine I'll be iterating on my processes for a long time to come, especially as the landscape of coding agents continues to evolve.&lt;/p&gt;
&lt;p&gt;I frequently have multiple terminal windows open running different coding agents in different directories. These are currently a mixture of Claude Code and Codex CLI, running in &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#the-joy-of-yolo-mode"&gt;YOLO mode&lt;/a&gt; (no approvals) for tasks where I'm confident malicious instructions can't sneak into the context.&lt;/p&gt;
&lt;p&gt;(I need to start habitually running my local agents in Docker containers to further limit the blast radius if something goes wrong.)&lt;/p&gt;
&lt;p&gt;I haven't adopted git worktrees yet: if I want to run two agents in isolation against the same repo I do a fresh checkout, often into &lt;code&gt;/tmp&lt;/code&gt;.&lt;/p&gt;
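&lt;p&gt;For what it's worth, worktrees make that isolation cheap: each agent gets a sibling directory on its own branch, sharing the same object store instead of needing a second full clone. A sketch with illustrative names:&lt;/p&gt;

```shell
# Demo with throwaway paths: give an agent its own isolated checkout via
# a worktree instead of a second full clone. Names here are illustrative.
set -e
demo="$(mktemp -d)"
git init -q "$demo/myrepo" && cd "$demo/myrepo"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "init"

# One worktree per agent task: a sibling directory on its own new branch,
# sharing the main checkout's object store.
git worktree add ../agent-task-1 -b agent-task-1
git worktree list                      # main checkout plus the new worktree

# ...point a coding agent at "$demo/agent-task-1" and let it work...

git worktree remove ../agent-task-1    # clean up once merged or abandoned
git branch -q -D agent-task-1
```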
&lt;p&gt;For riskier tasks I'm currently using asynchronous coding agents - usually Codex Cloud - so if anything goes wrong the worst that can happen is my source code getting leaked (since &lt;a href="https://simonwillison.net/2025/Jun/3/codex-agent-internet-access/"&gt;I allow it to have network access&lt;/a&gt; while running). Most of what I work on is open source anyway so that's not a big concern for me.&lt;/p&gt;
&lt;p&gt;I occasionally use &lt;a href="https://github.com/features/codespaces"&gt;GitHub Codespaces&lt;/a&gt; to run VS Code's agent mode, which is surprisingly effective and runs directly in my browser. This is particularly great for workshops and demos since it works for anyone with a GitHub account, no extra API key necessary.&lt;/p&gt;
&lt;h4 id="please-share-your-patterns-that-work"&gt;Please share your patterns that work&lt;/h4&gt;
&lt;p&gt;This category of coding agent software is still really new, and the models have only really got good enough to drive them effectively in the past few months - Claude 4 and GPT-5 in particular.&lt;/p&gt;
&lt;p&gt;I plan to write more as I figure out the ways of using them that are most effective. I encourage other practitioners to do the same!&lt;/p&gt;
&lt;h4 id="recommended-reading"&gt;Recommended reading&lt;/h4&gt;
&lt;p&gt;Jesse Vincent wrote &lt;a href="https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/"&gt;How I'm using coding agents in September, 2025&lt;/a&gt; which describes his workflow for parallel agents in detail, including having an architect agent iterate on a plan which is then reviewed and implemented by fresh instances of Claude Code.&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://sketch.dev/blog/seven-prompting-habits"&gt;The 7 Prompting Habits of Highly Effective Engineers&lt;/a&gt; Josh Bleecher Snyder describes several patterns for this kind of work. I particularly like this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Send out a scout&lt;/strong&gt;. Hand the AI agent a task just to find out where the sticky bits are, so you don’t have to make those mistakes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've tried this a few times with good results: give the agent a genuinely difficult task against a large codebase, with no intention of actually landing its code, just to get ideas from which files it modifies and how it approaches the problem.&lt;/p&gt;
&lt;p&gt;Peter Steinberger's &lt;a href="https://steipete.me/posts/just-talk-to-it"&gt;Just Talk To It - the no-bs Way of Agentic Engineering&lt;/a&gt; provides a very detailed description of his current process built around Codex CLI.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jesse-vincent"&gt;jesse-vincent&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/peter-steinberger"&gt;peter-steinberger&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="async-coding-agents"/><category term="jules"/><category term="codex-cli"/><category term="parallel-agents"/><category term="jesse-vincent"/><category term="peter-steinberger"/><category term="agentic-engineering"/></entry><entry><title>aavetis/PRarena</title><link href="https://simonwillison.net/2025/Oct/1/prarena/#atom-tag" rel="alternate"/><published>2025-10-01T23:59:40+00:00</published><updated>2025-10-01T23:59:40+00:00</updated><id>https://simonwillison.net/2025/Oct/1/prarena/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/aavetis/PRarena"&gt;aavetis/PRarena&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Albert Avetisian runs this repository on GitHub, which uses the GitHub Search API to track the number of PRs that can be credited to a collection of different coding agents. The repo runs &lt;a href="https://github.com/aavetis/PRarena/blob/main/collect_data.py"&gt;this collect_data.py script&lt;/a&gt; every three hours &lt;a href="https://github.com/aavetis/PRarena/blob/main/.github/workflows/pr%E2%80%91stats.yml"&gt;using GitHub Actions&lt;/a&gt; to collect the data, then updates the &lt;a href="https://prarena.ai/"&gt;PR Arena site&lt;/a&gt; with a visual leaderboard.&lt;/p&gt;
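&lt;p&gt;The underlying technique is simple: GitHub's issue search API returns a &lt;code&gt;total_count&lt;/code&gt; for any query, so counting an agent's PRs is one request per search term. A sketch of the URL construction (the example search term is illustrative, not necessarily PRarena's actual query):&lt;/p&gt;

```python
# Build GitHub search API URLs whose JSON responses carry a total_count
# field - one request per (agent, merged-or-not) combination is enough
# to populate a leaderboard. Search terms here are illustrative.
from urllib.parse import urlencode

API = "https://api.github.com/search/issues"

def pr_count_url(search_term: str, merged: bool = False) -> str:
    """URL for counting PRs matching a search term, optionally merged only."""
    q = f"{search_term} is:pr" + (" is:merged" if merged else "")
    return API + "?" + urlencode({"q": q, "per_page": 1})

print(pr_count_url("head:codex/"))
print(pr_count_url("head:codex/", merged=True))
```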
&lt;p&gt;The result is this neat chart showing adoption of different agents over time, along with their PR success rate:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Line and bar chart showing PR metrics over time from 05/26 to 10/01. The left y-axis shows &amp;quot;Number of PRs&amp;quot; from 0 to 1,800,000, the right y-axis shows &amp;quot;Success Rate (%)&amp;quot; from 0% to 100%, and the x-axis shows &amp;quot;Time&amp;quot; with dates. Five line plots track success percentages: &amp;quot;Copilot Success % (Ready)&amp;quot; and &amp;quot;Copilot Success % (All)&amp;quot; (both blue, top lines around 90-95%), &amp;quot;Codex Success % (Ready)&amp;quot; and &amp;quot;Codex Success % (All)&amp;quot; (both brown/orange, middle lines declining from 80% to 60%), and &amp;quot;Cursor Success % (Ready)&amp;quot; and &amp;quot;Cursor Success % (All)&amp;quot; (both purple, middle lines around 75-85%), &amp;quot;Devin Success % (Ready)&amp;quot; and &amp;quot;Devin Success % (All)&amp;quot; (both teal/green, lower lines around 65%), and &amp;quot;Codegen Success % (Ready)&amp;quot; and &amp;quot;Codegen Success % (All)&amp;quot; (both brown, declining lines). Stacked bar charts show total and merged PRs for each tool: light blue and dark blue for Copilot, light red and dark red for Codex, light purple and dark purple for Cursor, light green and dark green for Devin, and light orange for Codegen. The bars show increasing volumes over time, with the largest bars appearing at 10/01 reaching approximately 1,700,000 total PRs." src="https://static.simonwillison.net/static/2025/ai-agents-chart.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I found this today while trying to pull off the exact same trick myself! I got as far as creating the following table before finding Albert's work and abandoning my own project.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Search term&lt;/th&gt;
&lt;th&gt;Total PRs&lt;/th&gt;
&lt;th&gt;Merged PRs&lt;/th&gt;
&lt;th&gt;% merged&lt;/th&gt;
&lt;th&gt;Earliest&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;is:pr in:body "Generated with Claude Code"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+in%3Abody+%22Generated+with+Claude+Code%22&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;146,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+in%3Abody+%22Generated+with+Claude+Code%22+is%3Amerged&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;123,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;84.2%&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/turlockmike/hataraku/pull/83"&gt;Feb 21st&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/features/copilot"&gt;GitHub Copilot&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;is:pr author:copilot-swe-agent[bot]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+author%3Acopilot-swe-agent%5Bbot%5D&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;247,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+author%3Acopilot-swe-agent%5Bbot%5D+is%3Amerged&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;152,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;61.5%&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/abbhardwa/Relational-Database-Query-Parser/pull/2"&gt;March 7th&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://developers.openai.com/codex/cloud/"&gt;Codex Cloud&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;is:pr in:body "chatgpt.com" label:codex&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+in%3Abody+%22chatgpt.com%22+label%3Acodex&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;1,900,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+in%3Abody+%22chatgpt.com%22+label%3Acodex+is%3Amerged&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;1,600,000&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;84.2%&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/adrianadiwidjaja/my-flask-app/pull/1"&gt;April 23rd&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://jules.google/"&gt;Google Jules&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;is:pr author:google-labs-jules[bot]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+author%3Agoogle-labs-jules%5Bbot%5D&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;35,400&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=is%3Apr+author%3Agoogle-labs-jules%5Bbot%5D+is%3Amerged&amp;amp;type=pullrequests&amp;amp;s=created&amp;amp;o=asc"&gt;27,800&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;78.5%&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/yukikurage/memento-proto/pull/2"&gt;May 22nd&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;(Those "earliest" links are a little questionable; I tried to filter out false positives and find the oldest one that appeared to genuinely be from the agent in question.)&lt;/p&gt;
&lt;p&gt;It looks like OpenAI's Codex Cloud is &lt;em&gt;massively&lt;/em&gt; ahead of the competition right now in terms of numbers of PRs both opened and merged on GitHub.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: To clarify, these numbers are for the category of &lt;strong&gt;autonomous coding agents&lt;/strong&gt; - those systems where you assign a cloud-based agent a task or issue and the output is a PR against your repository. They do not (and cannot) capture the popularity of many forms of AI tooling that don't result in an easily identifiable pull request.&lt;/p&gt;
&lt;p&gt;Claude Code, for example, will be dramatically under-counted here because its version of an autonomous coding agent comes in the form of a somewhat obscure GitHub Actions workflow &lt;a href="https://docs.claude.com/en/docs/claude-code/github-actions"&gt;buried in the documentation&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git-scraping"&gt;git-scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="ai"/><category term="git-scraping"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="coding-agents"/><category term="claude-code"/><category term="async-coding-agents"/><category term="jules"/></entry><entry><title>Jules, our asynchronous coding agent, is now available for everyone</title><link href="https://simonwillison.net/2025/Aug/6/asynchronous-coding-agents/#atom-tag" rel="alternate"/><published>2025-08-06T19:36:24+00:00</published><updated>2025-08-06T19:36:24+00:00</updated><id>https://simonwillison.net/2025/Aug/6/asynchronous-coding-agents/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.google/technology/google-labs/jules-now-available/"&gt;Jules, our asynchronous coding agent, is now available for everyone&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I wrote about the Jules beta &lt;a href="https://simonwillison.net/2025/May/19/jules/"&gt;back in May&lt;/a&gt;. Google's equivalent of OpenAI Codex, a hosted coding tool that submits PRs against your repository, graduated from beta today.&lt;/p&gt;
&lt;p&gt;I'm mainly linking to this now because I like the new term they are using in this blog entry: &lt;strong&gt;Asynchronous coding agent&lt;/strong&gt;. I like it so much I &lt;a href="https://simonwillison.net/tags/asynchronous-coding-agents/"&gt;gave it a tag&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I continue to avoid the term "agent" as infuriatingly vague, but I can grudgingly accept it when accompanied by a prefix that clarifies the type of agent we are talking about. "Asynchronous coding agent" feels just about obvious enough to me to be useful.&lt;/p&gt;
&lt;p&gt;... I just ran a Google search for &lt;code&gt;"asynchronous coding agent" -jules&lt;/code&gt; and came up with a few more notable examples of this name being used elsewhere:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/introducing-open-swe-an-open-source-asynchronous-coding-agent/"&gt;Introducing Open SWE: An Open-Source Asynchronous Coding Agent&lt;/a&gt; is an announcement from LangChain just this morning of their take on this pattern. They provide a hosted version (bring your own API keys) or you can run it yourself with &lt;a href="https://github.com/langchain-ai/open-swe"&gt;their MIT licensed code&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The press release for GitHub's own version of this, &lt;a href="https://github.com/newsroom/press-releases/coding-agent-for-github-copilot"&gt;GitHub Introduces Coding Agent For GitHub Copilot&lt;/a&gt;, states that "GitHub Copilot now includes an asynchronous coding agent".&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44813854"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agent-definitions"&gt;agent-definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="github"/><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="gemini"/><category term="agent-definitions"/><category term="async-coding-agents"/><category term="jules"/></entry><entry><title>PR #537: Fix Markdown in og descriptions</title><link href="https://simonwillison.net/2025/Jun/3/openai-codex-pr/#atom-tag" rel="alternate"/><published>2025-06-03T23:58:34+00:00</published><updated>2025-06-03T23:58:34+00:00</updated><id>https://simonwillison.net/2025/Jun/3/openai-codex-pr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/simonwillisonblog/pull/537"&gt;PR #537: Fix Markdown in og descriptions&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Since &lt;a href="https://openai.com/index/introducing-codex/"&gt;OpenAI Codex&lt;/a&gt; is now available to us ChatGPT Plus subscribers I decided to try it out against my blog.&lt;/p&gt;
&lt;p&gt;It's a very nice implementation of the GitHub-connected coding "agent" pattern, as also seen in Google's &lt;a href="https://jules.google/"&gt;Jules&lt;/a&gt; and Microsoft's &lt;a href="https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/"&gt;Copilot Coding Agent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First I had to configure an environment for it. My Django blog uses PostgreSQL which isn't part of the &lt;a href="https://github.com/openai/codex-universal"&gt;default Codex container&lt;/a&gt;, so I had Claude Sonnet 4 &lt;a href="https://claude.ai/share/a5ce65c2-a9a4-4ae7-b645-71bd9fd6ea2c"&gt;help me&lt;/a&gt; come up with a startup recipe to get PostgreSQL working.&lt;/p&gt;
&lt;p&gt;I attached my &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;simonw/simonwillisonblog&lt;/a&gt; GitHub repo and used the following as the "setup script" for the environment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Install PostgreSQL
apt-get update &amp;amp;&amp;amp; apt-get install -y postgresql postgresql-contrib

# Start PostgreSQL service
service postgresql start

# Create a test database and user
sudo -u postgres createdb simonwillisonblog
sudo -u postgres psql -c "CREATE USER testuser WITH PASSWORD 'testpass';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE simonwillisonblog TO testuser;"
sudo -u postgres psql -c "ALTER USER testuser CREATEDB;"

# Install Python dependencies
pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I left "Agent internet access" off for reasons &lt;a href="https://simonwillison.net/2025/Jun/3/codex-agent-internet-access/"&gt;described previously&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I prompted Codex with the following (after one previous experimental task to check that it could run my tests):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Notes and blogmarks can both use Markdown.&lt;/p&gt;
&lt;p&gt;They serve &lt;code&gt;meta property="og:description" content="&lt;/code&gt; tags on the page, but those tags include that raw Markdown which looks bad on social media previews.&lt;/p&gt;
&lt;p&gt;Fix it so they instead use just the text with markdown stripped - so probably render it to HTML and then strip the HTML tags.&lt;/p&gt;
&lt;p&gt;Include passing tests.&lt;/p&gt;
&lt;p&gt;Try to run the tests, the postgresql details are:&lt;/p&gt;
&lt;p&gt;database = simonwillisonblog
username = testuser
password = testpass&lt;/p&gt;
&lt;p&gt;Put those in the DATABASE_URL environment variable.&lt;/p&gt;
&lt;/blockquote&gt;
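&lt;p&gt;Those three values combine into a single connection string. Here's a sketch of building the &lt;code&gt;DATABASE_URL&lt;/code&gt; the prompt asks for, assuming the default localhost host and port; the helper function is illustrative, not part of the blog's codebase:&lt;/p&gt;

```python
import os


def database_url(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Build a postgres:// connection string (host and port are assumed defaults)."""
    return f"postgres://{user}:{password}@{host}:{port}/{db}"


# Credentials from the prompt above
os.environ["DATABASE_URL"] = database_url("testuser", "testpass", "simonwillisonblog")
```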
&lt;p&gt;I left it to churn away for a few minutes (4m12s, to be precise) and &lt;a href="https://chatgpt.com/s/cd_683f8b81657881919a8d1ce71978a2df"&gt;it came back&lt;/a&gt; with a fix that edited two templates and added one more (passing) test. Here's &lt;a href="https://github.com/simonw/simonwillisonblog/pull/537/files"&gt;that change in full&lt;/a&gt;.&lt;/p&gt;
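&lt;p&gt;The transformation the prompt describes (render the Markdown to HTML, then strip the tags) can be sketched with the Python standard library. This is a hypothetical helper, not the code Codex actually wrote; rendering the Markdown to HTML is left to whatever library the project already uses:&lt;/p&gt;

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(rendered_html: str) -> str:
    """Return the plain text of rendered HTML, suitable for og:description."""
    extractor = _TextExtractor()
    extractor.feed(rendered_html)
    return "".join(extractor.parts).strip()
```

&lt;p&gt;Feeding it rendered HTML such as &lt;code&gt;&amp;lt;p&amp;gt;Since &amp;lt;strong&amp;gt;OpenAI Codex&amp;lt;/strong&amp;gt; is now available&amp;lt;/p&amp;gt;&lt;/code&gt; yields the tag-free text that belongs in the meta tag.&lt;/p&gt;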
&lt;p&gt;And sure enough, the social media cards for my posts now look like this - no visible Markdown any more:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a web browser showing a blog post preview card on Bluesky. The URL in the address bar reads &amp;quot;https://simonwillison.net/2025/Jun/3/pr-537-fix-markdown-in-og-descriptions/&amp;quot;. The preview card shows the title &amp;quot;PR #537: Fix Markdown in og descriptions&amp;quot; and begins with the text &amp;quot;Since OpenAI Codex is now available to us ChatGPT Plus subscribers I decided to try it out against my blog. It's a very nice implementation of the GitHub-connected coding&amp;quot;. The domain &amp;quot;simonwillison.net&amp;quot; appears at the bottom of the card." src="https://static.simonwillison.net/static/2025/codex-fix.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="github"/><category term="postgresql"/><category term="testing"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="async-coding-agents"/><category term="jules"/></entry><entry><title>Jules</title><link href="https://simonwillison.net/2025/May/19/jules/#atom-tag" rel="alternate"/><published>2025-05-19T21:40:11+00:00</published><updated>2025-05-19T21:40:11+00:00</updated><id>https://simonwillison.net/2025/May/19/jules/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jules.google.com/"&gt;Jules&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
It seems like &lt;em&gt;everyone&lt;/em&gt; is rolling out AI coding assistants that attach to your GitHub account and submit PRs for you right now. We had &lt;a href="https://simonwillison.net/2025/May/16/openai-codex/"&gt;OpenAI Codex&lt;/a&gt; last week, today Microsoft announced &lt;a href="https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/"&gt;GitHub Copilot coding agent&lt;/a&gt; (confusingly not the same thing as &lt;a href="https://githubnext.com/projects/copilot-workspace"&gt;Copilot Workspace&lt;/a&gt;) and I found out just now that Google's Jules, &lt;a href="https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/"&gt;announced in December&lt;/a&gt;, is now in a beta preview.&lt;/p&gt;
&lt;p&gt;I'm flying home from PyCon but I managed to try out Jules from my phone. I took &lt;a href="https://github.com/datasette/datasette-chronicle/issues/3"&gt;this GitHub issue thread&lt;/a&gt;, converted it to copy-pasteable Markdown with &lt;a href="https://tools.simonwillison.net/github-issue-to-markdown"&gt;this tool&lt;/a&gt; and pasted it into Jules, with no further instructions.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/datasette/datasette-chronicle/pull/6"&gt;the resulting PR&lt;/a&gt; created from its branch. I haven't fully reviewed it yet and the tests aren't passing, so it's hard to evaluate from my phone how well it did. In a cursory first glance it looks like it's covered most of the requirements from the issue thread.&lt;/p&gt;
&lt;p&gt;My habit of &lt;a href="https://simonwillison.net/2022/Nov/26/productivity/#issue-thread"&gt;creating long issue threads&lt;/a&gt; where I talk to myself about the features I'm planning is proving to be a good fit for outsourcing implementation work to this new generation of coding assistants.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="gemini"/><category term="github-issues"/><category term="async-coding-agents"/><category term="jules"/></entry></feed>