Simon Willison's Weblog: python

datasette 1.0a27

2026-04-15T23:16:34+00:00

Two major changes in this new Datasette alpha. I covered the first of those in detail yesterday - Datasette no longer uses Django-style CSRF form tokens, instead using modern browser headers as described by Filippo Valsorda.

The second big change is that Datasette now fires a new RenameTableEvent any time a table is renamed during a SQLite transaction. This is useful because some plugins (like datasette-comments) attach additional data to table records by name, so a renamed table requires them to react in appropriate ways.

Here are the rest of the changes in the alpha:

New actor= parameter for datasette.client methods, allowing internal requests to be made as a specific actor. This is particularly useful for writing automated tests. (#2688)

New Database(is_temp_disk=True) option, used internally for the internal database. This helps resolve intermittent database locked errors caused by the internal database being in-memory as opposed to on-disk. (#2683) (#2684)

The /<database>/<table>/-/upsert API (docs) now rejects rows with null primary key values. (#1936)

Improved example in the API explorer for the /-/upsert endpoint (docs). (#1936)

The /<database>.json endpoint now includes an "ok": true key, for consistency with other JSON API responses.

call_with_supported_arguments() is now documented as a supported public API. (#2678)

Tags: annotated-release-notes, datasette, python

Gemma 4 audio with MLX

2026-04-12T23:57:53+00:00

Thanks to a tip from Rahim Nathwani, here's a uv run recipe for transcribing an audio file on macOS using the 10.28 GB Gemma 4 E2B model with MLX and mlx-vlm:

uv run --python 3.13 --with mlx_vlm --with torchvision --with gradio \
  mlx_vlm.generate \
  --model google/gemma-4-e2b-it \
  --audio file.wav \
  --prompt "Transcribe this audio" \
  --max-tokens 500 \
  --temperature 1.0

Your browser does not support the audio element.

I tried it on this 14 second .wav file and it output the following:

This front here is a quick voice memo. I want to try it out with MLX VLM. Just going to see if it can be transcribed by Gemma and how that works.

(That was supposed to be "This right here..." and "... how well that works" but I can hear why it misinterpreted that as "front" and "how that works".)

Tags: uv, mlx, ai, gemma, llms, speech-to-text, python, generative-ai

asgi-gzip 0.3

2026-04-09T03:54:40+00:00

Release: asgi-gzip 0.3

I ran into trouble deploying a new feature using SSE to a production Datasette instance, and it turned out that instance was using datasette-gzip which uses asgi-gzip which was incorrectly compressing event/text-stream responses.

asgi-gzip was extracted from Starlette, and has a GitHub Actions scheduled workflow to check Starlette for updates that need to be ported to the library... but that action had stopped running and hence had missed Starlette's own fix for this issue.

I ran the workflow and integrated the new fix, and now datasette-gzip and asgi-gzip both correctly handle text/event-stream in SSE responses.

Tags: gzip, asgi, python

llm-all-models-async 0.1

2026-03-31T20:52:02+00:00

Release: llm-all-models-async 0.1

LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin.

My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models.

So... I had Claude spin up this plugin that turns sync models into async models using a thread pool. This ended up needing an extra plugin hook mechanism in LLM itself, which I shipped just now in LLM 0.30.

Tags: llm, async, python

Python Vulnerability Lookup

2026-03-29T18:46:16+00:00

Tool: Python Vulnerability Lookup

I learned that the OSV.dev open source vulnerability database has an open CORS JSON API, so I had Claude Code build this HTML tool for pasting in a pyproject.toml or requirements.txt file (or name of a GitHub repo containing those) and seeing a list of all reported vulnerabilities from that API.

Tags: tools, python, supply-chain, vibe-coding, security

LiteLLM Hack: Were You One of the 47,000?

2026-03-25T17:21:04+00:00

LiteLLM Hack: Were You One of the 47,000?

Daniel Hnyk used the BigQuery PyPI dataset to determine how many downloads there were of the exploited LiteLLM packages during the 46 minute period they were live on PyPI. The answer was 46,996 across the two compromised release versions (1.82.7 and 1.82.8).

They also identified 2,337 packages that depended on LiteLLM - 88% of which did not pin versions in a way that would have avoided the exploited version.

Via @hnykda

Tags: supply-chain, security, pypi, python, packaging

Package Managers Need to Cool Down

2026-03-24T21:11:38+00:00

Package Managers Need to Cool Down

Today's LiteLLM supply chain attack inspired me to revisit the idea of dependency cooldowns, the practice of only installing updated dependencies once they've been out in the wild for a few days to give the community a chance to spot if they've been subverted in some way.

This recent piece (March 4th) piece by Andrew Nesbitt reviews the current state of dependency cooldown mechanisms across different packaging tools. It's surprisingly well supported! There's been a flurry of activity across major packaging tools, including:

pnpm 10.16 (September 2025) — minimumReleaseAge with minimumReleaseAgeExclude for trusted packages
Yarn 4.10.0 (September 2025) — npmMinimalAgeGate (in minutes) with npmPreapprovedPackages for exemptions
Bun 1.3 (October 2025) — minimumReleaseAge via bunfig.toml
Deno 2.6 (December 2025) — --minimum-dependency-age for deno update and deno outdated
uv 0.9.17 (December 2025) — added relative duration support to existing --exclude-newer, plus per-package overrides via exclude-newer-package
pip 26.0 (January 2026) — --uploaded-prior-to (absolute timestamps only; relative duration support requested)
npm 11.10.0 (February 2026) — min-release-age

pip currently only supports absolute rather than relative dates but Seth Larson has a workaround for that using a scheduled cron to update the absolute date in the pip.conf config file.

Tags: deno, uv, supply-chain, pip, pypi, npm, packaging, security, python, javascript

Malicious litellm_init.pth in litellm 1.82.8 — credential stealer

2026-03-24T15:07:31+00:00

Malicious litellm_init.pth in litellm 1.82.8 — credential stealer

The LiteLLM v1.82.8 package published to PyPI was compromised with a particularly nasty credential stealer hidden in base64 in a litellm_init.pth file, which means installing the package is enough to trigger it even without running import litellm.

(1.82.7 had the exploit as well but it was in the proxy/proxy_server.py file so the package had to be imported for it to take effect.)

This issue has a very detailed description of what the credential stealer does. There's more information about the timeline of the exploit over here.

PyPI has already quarantined the litellm package so the window for compromise was just a few hours, but if you DID install the package it would have hoovered up a bewildering array of secrets, including ~/.ssh/, ~/.gitconfig, ~/.git-credentials, ~/.aws/, ~/.kube/, ~/.config/, ~/.azure/, ~/.docker/, ~/.npmrc, ~/.vault-token, ~/.netrc, ~/.lftprc, ~/.msmtprc, ~/.my.cnf, ~/.pgpass, ~/.mongorc.js, ~/.bash_history, ~/.zsh_history, ~/.sh_history, ~/.mysql_history, ~/.psql_history, ~/.rediscli_history, ~/.bitcoin/, ~/.litecoin/, ~/.dogecoin/, ~/.zcash/, ~/.dashcore/, ~/.ripple/, ~/.bitmonero/, ~/.ethereum/, ~/.cardano/.

It looks like this supply chain attack started with the recent exploit against Trivy, ironically a security scanner tool that was used in CI by LiteLLM. The Trivy exploit likely resulted in stolen PyPI credentials which were then used to directly publish the vulnerable packages.

Tags: supply-chain, open-source, pypi, python, security

Experimenting with Starlette 1.0 with Claude skills

2026-03-22T23:57:44+00:00

Starlette 1.0 is out! This is a really big deal. I think Starlette may be the Python framework with the most usage compared to its relatively low brand recognition because Starlette is the foundation of FastAPI, which has attracted a huge amount of buzz that seems to have overshadowed Starlette itself.

Kim Christie started working on Starlette in 2018 and it quickly became my favorite out of the new breed of Python ASGI frameworks. The only reason I didn't use it as the basis for my own Datasette project was that it didn't yet promise stability, and I was determined to provide a stable API for Datasette's own plugins... albeit I still haven't been brave enough to ship my own 1.0 release (after 26 alphas and counting)!

Then in September 2025 Marcelo Trylesinski announced that Starlette and Uvicorn were transferring to their GitHub account, in recognition of their many years of contributions and to make it easier for them to receive sponsorship against those projects.

The 1.0 version has a few breaking changes compared to the 0.x series, described in the release notes for 1.0.0rc1 that came out in February.

The most notable of these is a change to how code runs on startup and shutdown. Previously that was handled by on_startup and on_shutdown parameters, but the new system uses a neat lifespan mechanism instead based around an async context manager:

@contextlib.asynccontextmanager
async def lifespan(app):
    async with some_async_resource():
        print("Run at startup!")
        yield
        print("Run on shutdown!")

app = Starlette(
    routes=routes,
    lifespan=lifespan
)

If you haven't tried Starlette before it feels to me like an asyncio-native cross between Flask and Django, unsurprising since creator Kim Christie is also responsible for Django REST Framework. Crucially, this means you can write most apps as a single Python file, Flask style.

This makes it really easy for LLMs to spit out a working Starlette app from a single prompt.

There's just one problem there: if 1.0 breaks compatibility with the Starlette code that the models have been trained on, how can we have them generate code that works with 1.0?

I decided to see if I could get this working with a Skill.

Building a Skill with Claude

Regular Claude Chat on claude.ai has skills, and one of those default skills is the skill-creator skill. This means Claude knows how to build its own skills.

So I started a chat session and told it:

Clone Starlette from GitHub - it just had its 1.0 release. Build a skill markdown document for this release which includes code examples of every feature.

I didn't even tell it where to find the repo, Starlette is widely enough known that I expected it could find it on its own.

It ran git clone https://github.com/encode/starlette.git which is actually the old repository name, but GitHub handles redirects automatically so this worked just fine.

The resulting skill document looked very thorough to me... and then I noticed a new button at the top I hadn't seen before labelled "Copy to your skills". So I clicked it:

And now my regular Claude chat has access to that skill!

A task management demo app

I started a new conversation and prompted:

Build a task management app with Starlette, it should have projects and tasks and comments and labels

And Claude did exactly that, producing a simple GitHub Issues clone using Starlette 1.0, a SQLite database (via aiosqlite) and a Jinja2 template.

Claude even tested the app manually like this:

cd /home/claude/taskflow && timeout 5 python -c "
import asyncio
from database import init_db
asyncio.run(init_db())
print('DB initialized successfully')
" 2>&1

pip install httpx --break-system-packages -q \
  && cd /home/claude/taskflow && \
  python -c "
from starlette.testclient import TestClient
from main import app

client = TestClient(app)

r = client.get('/api/stats')
print('Stats:', r.json())

r = client.get('/api/projects')
print('Projects:', len(r.json()), 'found')

r = client.get('/api/tasks')
print('Tasks:', len(r.json()), 'found')

r = client.get('/api/labels')
print('Labels:', len(r.json()), 'found')

r = client.get('/api/tasks/1')
t = r.json()
print(f'Task 1: \"{t[\"title\"]}\" - {len(t[\"comments\"])} comments, {len(t[\"labels\"])} labels')

r = client.post('/api/tasks', json={'title':'Test task','project_id':1,'priority':'high','label_ids':[1,2]})
print('Created task:', r.status_code, r.json()['title'])

r = client.post('/api/comments', json={'task_id':1,'content':'Test comment'})
print('Created comment:', r.status_code)

r = client.get('/')
print('Homepage:', r.status_code, '- length:', len(r.text))

print('\nAll tests passed!')
"

For all of the buzz about Claude Code, it's easy to overlook that Claude itself counts as a coding agent now, fully able to both write and then test the code that it is writing.

Here's what the resulting app looked like. The code is here in my research repository.

Tags: skills, claude, asgi, ai, llms, open-source, coding-agents, ai-assisted-programming, python, generative-ai, agentic-engineering, kim-christie, starlette

Thoughts on OpenAI acquiring Astral and uv/ruff/ty

2026-03-19T16:45:15+00:00

The big news this morning: Astral to join OpenAI (on the Astral blog) and OpenAI to acquire Astral (the OpenAI announcement). Astral are the company behind uv, ruff, and ty - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts!

The official line from OpenAI and Astral

The Astral team will become part of the Codex team at OpenAI.

Charlie Marsh has this to say:

Open source is at the heart of that impact and the heart of that story; it sits at the center of everything we do. In line with our philosophy and OpenAI's own announcement, OpenAI will continue supporting our open source tools after the deal closes. We'll keep building in the open, alongside our community -- and for the broader Python ecosystem -- just as we have from the start. [...]

After joining the Codex team, we'll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.

OpenAI's message has a slightly different focus (highlights mine):

As part of our developer-first philosophy, after closing OpenAI plans to support Astral’s open source products. By bringing Astral’s tooling and engineering expertise to OpenAI, we will accelerate our work on Codex and expand what AI can do across the software development lifecycle.

This is a slightly confusing message. The Codex CLI is a Rust application, and Astral have some of the best Rust engineers in the industry - BurntSushi alone (Rust regex, ripgrep, jiff) may be worth the price of acquisition!

So is this about the talent or about the product? I expect both, but I know from past experience that a product+talent acquisition can turn into a talent-only acquisition later on.

uv is the big one

Of Astral's projects the most impactful is uv. If you're not familiar with it, uv is by far the most convincing solution to Python's environment management problems, best illustrated by this classic XKCD:

Switch from python to uv run and most of these problems go away. I've been using it extensively for the past couple of years and it's become an essential part of my workflow.

I'm not alone in this. According to PyPI Stats uv was downloaded more than 126 million times last month! Since its release in February 2024 - just two years ago - it's become one of the most popular tools for running Python code.

Ruff and ty

Astral's two other big projects are ruff - a Python linter and formatter - and ty - a fast Python type checker.

These are popular tools that provide a great developer experience but they aren't load-bearing in the same way that uv is.

They do however resonate well with coding agent tools like Codex - giving an agent access to fast linting and type checking tools can help improve the quality of the code they generate.

I'm not convinced that integrating them into the coding agent itself as opposed to telling it when to run them will make a meaningful difference, but I may just not be imaginative enough here.

What of pyx?

Ever since uv started to gain traction the Python community has been worrying about the strategic risk of a single VC-backed company owning a key piece of Python infrastructure. I wrote about one of those conversations in detail back in September 2024.

The conversation back then focused on what Astral's business plan could be, which started to take form in August 2025 when they announced pyx, their private PyPI-style package registry for organizations.

I'm less convinced that pyx makes sense within OpenAI, and it's notably absent from both the Astral and OpenAI announcement posts.

Competitive dynamics

An interesting aspect of this deal is how it might impact the competition between Anthropic and OpenAI.

Both companies spent most of 2025 focused on improving the coding ability of their models, resulting in the November 2025 inflection point when coding agents went from often-useful to almost-indispensable tools for software development.

The competition between Anthropic's Claude Code and OpenAI's Codex is fierce. Those $200/month subscriptions add up to billions of dollars a year in revenue, for companies that very much need that money.

Anthropic acquired the Bun JavaScript runtime in December 2025, an acquisition that looks somewhat similar in shape to Astral.

Bun was already a core component of Claude Code and that acquisition looked to mainly be about ensuring that a crucial dependency stayed actively maintained. Claude Code's performance has increased significantly since then thanks to the efforts of Bun's Jarred Sumner.

One bad version of this deal would be if OpenAI start using their ownership of uv as leverage in their competition with Anthropic.

Astral's quiet series A and B

One detail that caught my eye from Astral's announcement, in the section thanking the team, investors, and community:

Second, to our investors, especially Casey Aylward from Accel, who led our Seed and Series A, and Jennifer Li from Andreessen Horowitz, who led our Series B. As a first-time, technical, solo founder, you showed far more belief in me than I ever showed in myself, and I will never forget that.

As far as I can tell neither the Series A nor the Series B were previously announced - I've only been able to find coverage of the original seed round from April 2023.

Those investors presumably now get to exchange their stake in Astral for a piece of OpenAI. I wonder how much influence they had on Astral's decision to sell.

Forking as a credible exit?

Armin Ronacher built Rye, which was later taken over by Astral and effectively merged with uv. In August 2024 he wrote about the risk involved in a VC-backed company owning a key piece of open source infrastructure and said the following (highlight mine):

However having seen the code and what uv is doing, even in the worst possible future this is a very forkable and maintainable thing. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.

Astral's own Douglas Creager emphasized this angle on Hacker News today:

All I can say is that right now, we're committed to maintaining our open-source tools with the same level of effort, care, and attention to detail as before. That does not change with this acquisition. No one can guarantee how motives, incentives, and decisions might change years down the line. But that's why we bake optionality into it with the tools being permissively licensed. That makes the worst-case scenarios have the shape of "fork and move on", and not "software disappears forever".

I like and trust the Astral team and I'm optimistic that their projects will be well-maintained in their new home.

OpenAI don't yet have much of a track record with respect to acquiring and maintaining open source projects. They've been on a bit of an acquisition spree over the past three months though, snapping up Promptfoo and OpenClaw (sort-of, they hired creator Peter Steinberger and are spinning OpenClaw off to a foundation), plus closed source LaTeX platform Crixet (now Prism).

If things do go south for uv and the other Astral projects we'll get to see how credible the forking exit strategy turns out to be.

Tags: ty, uv, openai, astral, ai, ruff, codex-cli, rust, coding-agents, python, charlie-marsh

Quoting Ken Jin

2026-03-17T21:48:26+00:00

Great news—we’ve hit our (very modest) performance goals for the CPython JIT over a year early for macOS AArch64, and a few months early for x86_64 Linux. The 3.15 alpha JIT is about 11-12% faster on macOS AArch64 than the tail calling interpreter, and 5-6%faster than the standard interpreter on x86_64 Linux.

— Ken Jin, Python 3.15’s JIT is now back on track

Tags: python

Coding agents for data analysis

2026-03-16T20:12:32+00:00

Coding agents for data analysis

Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed at data journalists demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data.

Here's the table of contents:

Coding agents

Warmup: ChatGPT and Claude

Setup Claude Code and Codex

Asking questions against a database

Exploring data with agents

Cleaning data: decoding neighborhood codes

Creating visualizations with agents

Scraping data with agents

I ran the workshop using GitHub Codespaces and OpenAI Codex, since it was easy (and inexpensive) to distribute a budget-restricted API key for Codex that attendees could use during the class. Participants ended up burning $23 of Codex tokens.

The exercises all used Python and SQLite and some of them used Datasette.

One highlight of the workshop was when we started running Datasette such that it served static content from a viz/ folder, then had Claude Code start vibe coding new interactive visualizations directly in that folder. Here's a heat map it created for my trees database using Leaflet and Leaflet.heat, source code here.

I designed the handout to also be useful for people who weren't able to attend the session in person. As is usually the case, material aimed at data journalists is equally applicable to anyone else with data to explore.

Tags: nicar, sqlite, ai, speaking, llms, coding-agents, generative-ai, data-journalism, github-codespaces, codex-cli, datasette, claude-code, python, leaflet, geospatial

Quoting Jannis Leidel

2026-03-14T18:41:25+00:00

GitHub’s slopocalypse – the flood of AI-generated spam PRs and issues – has made Jazzband’s model of open membership and shared push access untenable.

Jazzband was designed for a world where the worst case was someone accidentally merging the wrong PR. In a world where only 1 in 10 AI-generated PRs meets project standards, where curl had to shut down its bug bounty because confirmation rates dropped below 5%, and where GitHub’s own response was a kill switch to disable pull requests entirely – an organization that gives push access to everyone who joins simply can’t operate safely anymore.

— Jannis Leidel, Sunsetting Jazzband

Tags: ai-ethics, open-source, python, ai, github

An AI agent coding skeptic tries AI agent coding, in excessive detail

2026-02-27T20:43:41+00:00

An AI agent coding skeptic tries AI agent coding, in excessive detail

Another in the genre of "OK, coding agents got good in November" posts, this one is by Max Woolf and is very much worth your time. He describes a sequence of coding agent projects, each more ambitious than the last - starting with simple YouTube metadata scrapers and eventually evolving to this:

It would be arrogant to port Python's scikit-learn — the gold standard of data science and machine learning libraries — to Rust with all the features that implies.

But that's unironically a good idea so I decided to try and do it anyways. With the use of agents, I am now developing rustlearn (extreme placeholder name), a Rust crate that implements not only the fast implementations of the standard machine learning algorithms such as logistic regression and k-means clustering, but also includes the fast implementations of the algorithms above: the same three step pipeline I describe above still works even with the more simple algorithms to beat scikit-learn's implementations.

Max also captures the frustration of trying to explain how good the models have got to an existing skeptical audience:

The real annoying thing about Opus 4.6/Codex 5.3 is that it’s impossible to publicly say “Opus 4.5 (and the models that came after it) are an order of magnitude better than coding LLMs released just months before it” without sounding like an AI hype booster clickbaiting, but it’s the counterintuitive truth to my personal frustration. I have been trying to break this damn model by giving it complex tasks that would take me months to do by myself despite my coding pedigree but Opus and Codex keep doing them correctly.

A throwaway remark in this post inspired me to ask Claude Code to build a Rust word cloud CLI tool, which it happily did.

Tags: max-woolf, coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, ai, llms, november-2025-inflection, rust, python

Em dash

2026-02-15T21:40:46+00:00

I'm occasionally accused of using LLMs to write the content on my blog. I don't do that, and I don't think my writing has much of an LLM smell to it... with one notable exception:

    # Finally, do em dashes
    s = s.replace(' - ', u'\u2014')

That code to add em dashes to my posts dates back to at least 2015 when I ported my blog from an older version of Django (in a long-lost Mercurial repository) and started afresh on GitHub.

Tags: generative-ai, typography, blogging, ai, llms, python

cysqlite - a new sqlite driver

2026-02-11T17:34:40+00:00

cysqlite - a new sqlite driver

Charles Leifer has been maintaining pysqlite3 - a fork of the Python standard library's sqlite3 module that makes it much easier to run upgraded SQLite versions - since 2018.

He's been working on a ground-up Cython rewrite called cysqlite for almost as long, but it's finally at a stage where it's ready for people to try out.

The biggest change from the sqlite3 module involves transactions. Charles explains his discomfort with the sqlite3 implementation at length - that library provides two different variants neither of which exactly match the autocommit mechanism in SQLite itself.

I'm particularly excited about the support for custom virtual tables, a feature I'd love to see in sqlite3 itself.

cysqlite provides a Python extension compiled from C, which means it normally wouldn't be available in Pyodide. I set Claude Code on it (here's the prompt) and it built me cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl, a 688KB wheel file with a WASM build of the library that can be loaded into Pyodide like this:

import micropip
await micropip.install(
    "https://simonw.github.io/research/cysqlite-wasm-wheel/cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl"
)
import cysqlite
print(cysqlite.connect(":memory:").execute(
    "select sqlite_version()"
).fetchone())

(I also learned that wheels like this have to be built for the emscripten version used by that edition of Pyodide - my experimental wheel loads in Pyodide 0.25.1 but fails in 0.27.5 with a Wheel was built with Emscripten v3.1.46 but Pyodide was built with Emscripten v3.1.58 error.)

You can try my wheel in this new Pyodide REPL i had Claude build as a mobile-friendly alternative to Pyodide's own hosted console.

I also had Claude build this demo page that executes the original test suite in the browser and displays the results:

Via lobste.rs

Tags: charles-leifer, pyodide, webassembly, sqlite, python, ai-assisted-programming, claude-code

Running Pydantic's Monty Rust sandboxed Python subset in WebAssembly

2026-02-06T22:31:31+00:00

There's a jargon-filled headline for you! Everyone's building sandboxes for running untrusted code right now, and Pydantic's latest attempt, Monty, provides a custom Python-like language (a subset of Python) in Rust and makes it available as both a Rust library and a Python package. I got it working in WebAssembly, providing a sandbox-in-a-sandbox.

Here's how they describe Monty:

Monty avoids the cost, latency, complexity and general faff of using full container based sandbox for running LLM generated code.

Instead, it let's you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.

What Monty can do:

Run a reasonable subset of Python code - enough for your agent to express what it wants to do

Completely block access to the host environment: filesystem, env variables and network access are all implemented via external function calls the developer can control

Call functions on the host - only functions you give it access to [...]

A quick way to try it out is via uv:

uv run --with pydantic-monty python -m asyncio

Then paste this into the Python interactive prompt - the -m asyncio enables top-level await:

import pydantic_monty
code = pydantic_monty.Monty('print("hello " + str(4 * 5))')
await pydantic_monty.run_monty_async(code)

Monty supports a very small subset of Python - it doesn't even support class declarations yet!

But, given its target use-case, that's not actually a problem.

The neat thing about providing tools like this for LLMs is that they're really good at iterating against error messages. A coding agent can run some Python code, get an error message telling it that classes aren't supported and then try again with a different approach.

I wanted to try this in a browser, so I fired up a code research task in Claude Code for web and kicked it off with the following:

Clone https://github.com/pydantic/monty to /tmp and figure out how to compile it into a python WebAssembly wheel that can then be loaded in Pyodide. The wheel file itself should be checked into the repo along with build scripts and passing pytest playwright test scripts that load Pyodide from a CDN and the wheel from a “python -m http.server” localhost and demonstrate it working

Then a little later:

I want an additional WASM file that works independently of Pyodide, which is also usable in a web browser - build that too along with playwright tests that show it working. Also build two HTML files - one called demo.html and one called pyodide-demo.html - these should work similar to https://tools.simonwillison.net/micropython (download that code with curl to inspect it) - one should load the WASM build, the other should load Pyodide and have it use the WASM wheel. These will be served by GitHub Pages so they can load the WASM and wheel from a relative path since the .html files will be served from the same folder as the wheel and WASM file

Here's the transcript, and the final research report it produced.

I now have the Monty Rust code compiled to WebAssembly in two different shapes - as a .wasm bundle you can load and call from JavaScript, and as a monty-wasm-pyodide/pydantic_monty-0.0.3-cp313-cp313-emscripten_4_0_9_wasm32.whl wheel file which can be loaded into Pyodide and then called from Python in Pyodide in WebAssembly in a browser.

Here are those two demos, hosted on GitHub Pages:

Monty WASM demo - a UI over JavaScript that loads the Rust WASM module directly.
Monty Pyodide demo - this one provides an identical interface but here the code is loading Pyodide and then installing the Monty WASM wheel.

As a connoisseur of sandboxes - the more options the better! - this new entry from Pydantic ticks a lot of my boxes. It's small, fast, widely available (thanks to Rust and WebAssembly) and provides strict limits on memory usage, CPU time and access to disk and network.

It was also a great excuse to spin up another demo showing how easy it is these days to turn compiled code like C or Rust into WebAssembly that runs in both a browser and a Pyodide environment.

Tags: pydantic, sandboxing, ai, claude-code, llms, rust, webassembly, python, javascript, ai-assisted-programming, coding-agents, generative-ai, pyodide

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel

2026-02-04T14:59:47+00:00

I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a uvx package-name call away.

sqlite-scanner

sqlite-scanner is my new Go CLI tool for scanning a filesystem for SQLite database files.

It works by checking if the first 16 bytes of the file exactly match the SQLite magic number sequence SQLite format 3\x00. It can search one or more folders recursively, spinning up concurrent goroutines to accelerate the scan. It streams out results as it finds them in plain text, JSON or newline-delimited JSON. It can optionally display the file sizes as well.

To try it out you can download a release from the GitHub releases - and then jump through macOS hoops to execute an "unsafe" binary. Or you can clone the repo and compile it with Go. Or... you can run the binary like this:

uvx sqlite-scanner

By default this will search your current directory for SQLite databases. You can pass one or more directories as arguments:

uvx sqlite-scanner ~ /tmp

Add --json for JSON output, --size to include file sizes or --jsonl for newline-delimited JSON. Here's a demo:

uvx sqlite-scanner ~ --jsonl --size

If you haven't been uv-pilled yet you can instead install sqlite-scanner using pip install sqlite-scanner and then run sqlite-scanner.

To get a permanent copy with uv use uv tool install sqlite-scanner.

How the Python package works

The reason this is worth doing is that pip, uv and PyPI will work together to identify the correct compiled binary for your operating system and architecture.

This is driven by file names. If you visit the PyPI downloads for sqlite-scanner you'll see the following files:

sqlite_scanner-0.1.1-py3-none-win_arm64.whl
sqlite_scanner-0.1.1-py3-none-win_amd64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_x86_64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_aarch64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_x86_64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_aarch64.whl
sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl
sqlite_scanner-0.1.1-py3-none-macosx_10_9_x86_64.whl

When I run pip install sqlite-scanner or uvx sqlite-scanner on my Apple Silicon Mac laptop Python's packaging magic ensures I get that macosx_11_0_arm64.whl variant.

Here's what's in the wheel, which is a zip file with a .whl extension.

In addition to the bin/sqlite-scanner the most important file is sqlite_scanner/__init__.py which includes the following:

def get_binary_path():
    """Return the path to the bundled binary."""
    binary = os.path.join(os.path.dirname(__file__), "bin", "sqlite-scanner")
 
    # Ensure binary is executable on Unix
    if sys.platform != "win32":
        current_mode = os.stat(binary).st_mode
        if not (current_mode & stat.S_IXUSR):
            os.chmod(binary, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
 
    return binary
 
 
def main():
    """Execute the bundled binary."""
    binary = get_binary_path()
 
    if sys.platform == "win32":
        # On Windows, use subprocess to properly handle signals
        sys.exit(subprocess.call([binary] + sys.argv[1:]))
    else:
        # On Unix, exec replaces the process
        os.execvp(binary, [binary] + sys.argv[1:])

That main() method - also called from sqlite_scanner/__main__.py - locates the binary and executes it when the Python package itself is executed, using the sqlite-scanner = sqlite_scanner:main entry point defined in the wheel.

Which means we can use it as a dependency

Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, albeit there is plenty of precedent.

I’ll justify it by pointing out that this means we can use Go binaries as dependencies for other Python packages now.

That's genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.

To demonstrate this, I built datasette-scan - a new Datasette plugin which depends on sqlite-scanner and then uses that Go binary to scan a folder for SQLite databases and attach them to a Datasette instance.

Here's how to use that (without even installing anything first, thanks uv) to explore any SQLite databases in your Downloads folder:

uv run --with datasette-scan datasette scan ~/Downloads

If you peek at the code you'll see it depends on sqlite-scanner in pyproject.toml and calls it using subprocess.run() against sqlite_scanner.get_binary_path() in its own scan_directories() function.

I've been exploring this pattern for other, non-Go binaries recently - here's a recent script that depends on static-ffmpeg to ensure that ffmpeg is available for the script to use.

Building Python wheels from Go packages with go-to-wheel

After trying this pattern myself a couple of times I realized it would be useful to have a tool to automate the process.

I first brainstormed with Claude to check that there was no existing tool to do this. It pointed me to maturin bin which helps distribute Rust projects using Python wheels, and pip-binary-factory which bundles all sorts of other projects, but did not identify anything that addressed the exact problem I was looking to solve.

So I had Claude Code for web build the first version, then refined the code locally on my laptop with the help of more Claude Code and a little bit of OpenAI Codex too, just to mix things up.

The full documentation is in the simonw/go-to-wheel repository. I've published that tool to PyPI so now you can run it using:

uvx go-to-wheel --help

The sqlite-scanner package you can see on PyPI was built using go-to-wheel like this:

uvx go-to-wheel ~/dev/sqlite-scanner \
  --set-version-var main.version \
  --version 0.1.1 \
  --readme README.md \
  --author 'Simon Willison' \
  --url https://github.com/simonw/sqlite-scanner \
  --description 'Scan directories for SQLite databases'

This created a set of wheels in the dist/ folder. I tested one of them like this:

uv run --with dist/sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl \
  sqlite-scanner --version

When that spat out the correct version number I was confident everything had worked as planned, so I pushed the whole set of wheels to PyPI using twine upload like this:

uvx twine upload dist/*

I had to paste in a PyPI API token I had saved previously.

I expect to use this pattern a lot

sqlite-scanner is very clearly meant as a proof-of-concept for this wider pattern - Python is very much capable of recursively crawling a directory structure looking for files that start with a specific byte prefix on its own!

That said, I think there's a lot to be said for this pattern. Go is a great complement to Python - it's fast, compiles to small self-contained binaries, has excellent concurrency support and a rich ecosystem of libraries.

Go is similar to Python in that it has a strong standard library. Go is particularly good for HTTP tooling - I've built several HTTP proxies in the past using Go's excellent net/http/httputil.ReverseProxy handler.

I've also been experimenting with wazero, Go's robust and mature zero dependency WebAssembly runtime as part of my ongoing quest for the ideal sandbox for running untrusted code. Here's my latest experiment with that library.

Being able to seamlessly integrate Go binaries into Python projects without the end user having to think about Go at all - they pip install and everything Just Works - feels like a valuable addition to my toolbox.

Tags: uv, go, pypi, packaging, ai-assisted-programming, python, datasette, projects, sqlite

Introducing Deno Sandbox

2026-02-03T22:44:50+00:00

Introducing Deno Sandbox

Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hosted sandbox using their deno-sandbox Python library like this:

export DENO_DEPLOY_TOKEN="... API token ..."
uv run --with deno-sandbox python

Then:

from deno_sandbox import DenoDeploy

sdk = DenoDeploy()

with sdk.sandbox.create() as sb:
    # Run a shell command
    process = sb.spawn(
        "echo", args=["Hello from the sandbox!"]
    )
    process.wait()
    # Write and read files
    sb.fs.write_text_file(
        "/tmp/example.txt", "Hello, World!"
    )
    print(sb.fs.read_text_file(
        "/tmp/example.txt"
    ))

There’s a JavaScript client library as well. The underlying API isn’t documented yet but appears to use WebSockets.

There’s a lot to like about this system. Sandboxe instances can have up to 4GB of RAM, get 2 vCPUs, 10GB of ephemeral storage, can mount persistent volumes and can use snapshots to boot pre-configured custom images quickly. Sessions can last up to 30 minutes and are billed by CPU time, GB-h of memory and volume storage usage.

When you create a sandbox you can configure network domains it’s allowed to access.

My favorite feature is the way it handles API secrets.

with sdk.sandboxes.create(
    allowNet=["api.openai.com"],
    secrets={
        "OPENAI_API_KEY": {
            "hosts": ["api.openai.com"],
            "value": os.environ.get("OPENAI_API_KEY"),
        }
    },
) as sandbox:
    # ... $OPENAI_API_KEY is available

Within the container that $OPENAI_API_KEY value is set to something like this:

DENO_SECRET_PLACEHOLDER_b14043a2f578cba...

Outbound API calls to api.openai.com run through a proxy which is aware of those placeholders and replaces them with the original secret.

In this way the secret itself is not available to code within the sandbox, which limits the ability for malicious code (e.g. from a prompt injection) to exfiltrate those secrets.

From a comment on Hacker News I learned that Fly have a project called tokenizer that implements the same pattern. Adding this to my list of tricks to use with sandoxed environments!

Via Hacker News

Tags: fly, deno, security, python, sandboxing

sqlite-ast 0.1a0

2026-01-30T06:12:45+00:00

Release: sqlite-ast 0.1a0

I wanted a Python library that could parse SQLite SELECT statements, so I vibe coded this one up based on a specification I reverse-engineered from SQLite's own parser behavior.

There's an interactive playground here for trying it out in the browser (via Pyodide).

Tags: sqlite, vibe-coding, python

Datasette 1.0a24

2026-01-29T17:21:51+00:00

Datasette 1.0a24

New Datasette alpha this morning. Key new features:

Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to rows of data.
The recommended development environment for hacking on Datasette itself now uses uv. Crucially, you can clone Datasette and run uv run pytest to run the tests without needing to manually create a virtual environment or install dependencies first, thanks to the dev dependency group pattern.
A new ?_extra=render_cell parameter for both table and row JSON pages to return the results of executing the render_cell() plugin hook. This should unlock new JavaScript UI features in the future.

More details in the release notes. I also invested a bunch of work in eliminating flaky tests that were intermittently failing in CI - I think those are all handled now.

Tags: projects, datasette, python, uv, annotated-release-notes

Tips for getting coding agents to write good Python tests

2026-01-26T23:55:29+00:00

Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:

I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.

Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.

Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.

"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.

Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.

I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers work - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.

One last tip I use a lot is this:

Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses

I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.

Tags: testing, coding-agents, python, generative-ai, ai, llms, hacker-news, pytest

Anthropic invests $1.5 million in the Python Software Foundation and open source security

2026-01-13T23:58:17+00:00

Anthropic invests $1.5 million in the Python Software Foundation and open source security

This is outstanding news, especially given our decision to withdraw from that NSF grant application back in October.

We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community.

Note that while security is a focus these funds will also support other aspects of the PSF's work:

Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants and other programs, running core infrastructure such as PyPI, and more.

Tags: open-source, anthropic, python, ai, psf

Quoting Linus Torvalds

2026-01-11T02:29:58+00:00

Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.

— Linus Torvalds, Another silly guitar-pedal-related repo

Tags: ai, vibe-coding, linus-torvalds, python, llms, generative-ai

How uv got so fast

2025-12-26T23:43:15+00:00

How uv got so fast

Andrew Nesbitt provides an insightful teardown of why uv is so much faster than pip. It's not nearly as simple as just "they rewrote it in Rust" - uv gets to skip a huge amount of Python packaging history (which pip needs to implement for backwards compatibility) and benefits enormously from work over recent years that makes it possible to resolve dependencies across most packages without having to execute the code in setup.py using a Python interpreter.

Two notes that caught my eye that I hadn't understood before:

HTTP range requests for metadata. Wheel files are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.

[...]

Compact version representation. uv packs versions into u64 integers where possible, making comparison and hashing fast. Over 90% of versions fit in one u64. This is micro-optimization that compounds across millions of comparisons.

I wanted to learn more about these tricks, so I fired up an asynchronous research task and told it to checkout the astral-sh/uv repo, find the Rust code for both of those features and try porting it to Python to help me understand how it works.

Here's the report that it wrote for me, the prompts I used and the Claude Code transcript.

You can try the script it wrote for extracting metadata from a wheel using HTTP range requests like this:

uv run --with httpx https://raw.githubusercontent.com/simonw/research/refs/heads/main/http-range-wheel-metadata/wheel_metadata.py https://files.pythonhosted.org/packages/8b/04/ef95b67e1ff59c080b2effd1a9a96984d6953f667c91dfe9d77c838fc956/playwright-1.57.0-py3-none-macosx_11_0_arm64.whl -v

The Playwright wheel there is ~40MB. Adding -v at the end causes the script to spit out verbose details of how it fetched the data - which looks like this.

Key extract from that output:

[1] HEAD request to get file size...
    File size: 40,775,575 bytes
[2] Fetching last 16,384 bytes (EOCD + central directory)...
    Received 16,384 bytes
[3] Parsed EOCD:
    Central directory offset: 40,731,572
    Central directory size: 43,981
    Total entries: 453
[4] Fetching complete central directory...
    ...
[6] Found METADATA: playwright-1.57.0.dist-info/METADATA
    Offset: 40,706,744
    Compressed size: 1,286
    Compression method: 8
[7] Fetching METADATA content (2,376 bytes)...
[8] Decompressed METADATA: 3,453 bytes

Total bytes fetched: 18,760 / 40,775,575 (100.0% savings)

The section of the report on compact version representation is interesting too. Here's how it illustrates sorting version numbers correctly based on their custom u64 representation:

Sorted order (by integer comparison of packed u64):
  1.0.0a1 (repr=0x0001000000200001)
  1.0.0b1 (repr=0x0001000000300001)
  1.0.0rc1 (repr=0x0001000000400001)
  1.0.0 (repr=0x0001000000500000)
  1.0.0.post1 (repr=0x0001000000700001)
  1.0.1 (repr=0x0001000100500000)
  2.0.0.dev1 (repr=0x0002000000100001)
  2.0.0 (repr=0x0002000000500000)

Tags: http-range-requests, sorting, uv, performance, rust, python, vibe-porting

uv-init-demos

2025-12-24T22:05:23+00:00

uv-init-demos

uv has a useful uv init command for setting up new Python projects, but it comes with a bunch of different options like --app and --package and --lib and I wasn't sure how they differed.

So I created this GitHub repository which demonstrates all of those options, generated using this update-projects.sh script (thanks, Claude) which will run on a schedule via GitHub Actions to capture any changes made by future releases of uv.

Tags: uv, git-scraping, python, projects, github-actions

MicroQuickJS

2025-12-23T20:53:40+00:00

MicroQuickJS

New project from programming legend Fabrice Bellard, of ffmpeg and QEMU and QuickJS and so much more fame:

MicroQuickJS (aka. MQuickJS) is a Javascript engine targetted at embedded systems. It compiles and runs Javascript programs with as low as 10 kB of RAM. The whole engine requires about 100 kB of ROM (ARM Thumb-2 code) including the C library. The speed is comparable to QuickJS.

It supports a subset of full JavaScript, though it looks like a rich and full-featured subset to me.

One of my ongoing interests is sandboxing: mechanisms for executing untrusted code - from end users or generated by LLMs - in an environment that restricts memory usage and applies a strict time limit and restricts file or network access. Could MicroQuickJS be useful in that context?

I fired up Claude Code for web (on my iPhone) and kicked off an asynchronous research project to see explore that question:

My full prompt is here. It started like this:

Clone https://github.com/bellard/mquickjs to /tmp

Investigate this code as the basis for a safe sandboxing environment for running untrusted code such that it cannot exhaust memory or CPU or access files or the network

First try building python bindings for this using FFI - write a script that builds these by checking out the code to /tmp and building against that, to avoid copying the C code in this repo permanently. Write and execute tests with pytest to exercise it as a sandbox

Then build a "real" Python extension not using FFI and experiment with that

Then try compiling the C to WebAssembly and exercising it via both node.js and Deno, with a similar suite of tests [...]

I later added to the interactive session:

Does it have a regex engine that might allow a resource exhaustion attack from an expensive regex?

(The answer was no - the regex engine calls the interrupt handler even during pathological expression backtracking, meaning that any configured time limit should still hold.)

Here's the full transcript and the final report.

Some key observations:

MicroQuickJS is very well suited to the sandbox problem. It has robust near and time limits baked in, it doesn't expose any dangerous primitive like filesystem of network access and even has a regular expression engine that protects against exhaustion attacks (provided you configure a time limit).
Claude span up and tested a Python library that calls a MicroQuickJS shared library (involving a little bit of extra C), a compiled a Python binding and a library that uses the original MicroQuickJS CLI tool. All of those approaches work well.
Compiling to WebAssembly was a little harder. It got a version working in Node.js and Deno and Pyodide, but the Python libraries wasmer and wasmtime proved harder, apparently because "mquickjs uses setjmp/longjmp for error handling". It managed to get to a working wasmtime version with a gross hack.

I'm really excited about this. MicroQuickJS is tiny, full featured, looks robust and comes from excellent pedigree. I think this makes for a very solid new entrant in the quest for a robust sandbox.

Update: I had Claude Code build tools.simonwillison.net/microquickjs, an interactive web playground for trying out the WebAssembly build of MicroQuickJS, adapted from my previous QuickJS plaground. My QuickJS page loads 2.28 MB (675 KB transferred). The MicroQuickJS one loads 303 KB (120 KB transferred).

Here are the prompts I used for that.

Tags: pyodide, deno, sandboxing, ai, llms, nodejs, python, javascript, generative-ai, c, claude-code, webassembly, fabrice-bellard

ty: An extremely fast Python type checker and LSP

2025-12-16T23:35:33+00:00

ty: An extremely fast Python type checker and LSP

The team at Astral have been working on this for quite a long time, and are finally releasing the first beta. They have some big performance claims:

Without caching, ty is consistently between 10x and 60x faster than mypy and Pyright. When run in an editor, the gap is even more dramatic. As an example, after editing a load-bearing file in the PyTorch repository, ty recomputes diagnostics in 4.7ms: 80x faster than Pyright (386ms) and 500x faster than Pyrefly (2.38 seconds). ty is very fast!

The easiest way to try it out is via uvx:

cd my-python-project/
uvx ty check

I tried it against sqlite-utils and it turns out I have quite a lot of work to do!

Astral also released a new VS Code extension adding ty-powered language server features like go to definition. I'm still getting my head around how this works and what it can do.

Via Hacker News

Tags: astral, python, vs-code, ty

Poe the Poet

2025-12-16T22:57:02+00:00

Poe the Poet

I was looking for a way to specify additional commands in my pyproject.toml file to execute using uv. There's an enormous issue thread on this in the uv issue tracker (300+ comments dating back to August 2024) and from there I learned of several options including this one, Poe the Poet.

It's neat. I added it to my s3-credentials project just now and the following now works for running the live preview server for the documentation:

uv run poe livehtml

Here's the snippet of TOML I added to my pyproject.toml:

[dependency-groups]
test = [
    "pytest",
    "pytest-mock",
    "cogapp",
    "moto>=5.0.4",
]
docs = [
    "furo",
    "sphinx-autobuild",
    "myst-parser",
    "cogapp",
]
dev = [
    {include-group = "test"},
    {include-group = "docs"},
    "poethepoet>=0.38.0",
]

[tool.poe.tasks]
docs = "sphinx-build -M html docs docs/_build"
livehtml = "sphinx-autobuild -b html docs docs/_build"
cog = "cog -r docs/*.md"

Since poethepoet is in the dev= dependency group any time I run uv run ... it will be available in the environment.

Tags: uv, packaging, python, s3-credentials

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

2025-12-15T23:58:38+00:00

I wrote about JustHTML yesterday - Emil Stenström's project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try porting JustHTML from Python to JavaScript with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.

TL;DR

I built simonw/justjshtml, a dependency-free HTML5 parsing library in JavaScript which passes 9,200 tests from the html5lib-tests suite and imitates the API design of Emil's JustHTML library.

It took two initial prompts and a few tiny follow-ups. GPT-5.2 running in Codex CLI ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.

Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie.

Some background

One of the most important contributions of the HTML5 specification ten years ago was the way it precisely specified how invalid HTML should be parsed. The world is full of invalid documents and having a specification that covers those means browsers can treat them in the same way - there's no more "undefined behavior" to worry about when building parsing software.

Unsurprisingly, those invalid parsing rules are pretty complex! The free online book Idiosyncrasies of the HTML parser by Simon Pieters is an excellent deep dive into this topic, in particular Chapter 3. The HTML parser.

The Python html5lib project started the html5lib-tests repository with a set of implementation-independent tests. These have since become the gold standard for interoperability testing of HTML5 parsers, and are used by projects such as Servo which used them to help build html5ever, a "high-performance browser-grade HTML5 parser" written in Rust.

Emil Stenström's JustHTML project is a pure-Python implementation of an HTML5 parser that passes the full html5lib-tests suite. Emil spent a couple of months working on this as a side project, deliberately picking a problem with a comprehensive existing test suite to see how far he could get with coding agents.

At one point he had the agents rewrite it based on a close inspection of the Rust html5ever library. I don't know how much of this was direct translation versus inspiration (here's Emil's commentary on that) - his project has 1,215 commits total so it appears to have included a huge amount of iteration, not just a straight port.

My project is a straight port. I instructed Codex CLI to build a JavaScript version of Emil's Python code.

The process in detail

I started with a bit of mise en place. I checked out two repos and created an empty third directory for the new project:

cd ~/dev
git clone https://github.com/EmilStenstrom/justhtml
git clone https://github.com/html5lib/html5lib-tests
mkdir justjshtml
cd justjshtml

Then I started Codex CLI for GPT-5.2 like this:

codex --yolo -m gpt-5.2

That --yolo flag is a shortcut for --dangerously-bypass-approvals-and-sandbox, which is every bit as dangerous as it sounds.

My first prompt told Codex to inspect the existing code and use it to build a specification for the new JavaScript library:

We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. It is going to have a similar API to the Python library but in JavaScript. It will have no dependencies other than raw JavaScript, hence it will work great in the browser and node.js and other environments. Start by reading ~/dev/justhtml and designing the user-facing API for the new library - create a spec.md containing your plan.

I reviewed the spec, which included a set of proposed milestones, and told it to add another:

Add an early step to the roadmap that involves an initial version that parses a simple example document that is valid and returns the right results. Then add and commit the spec.md file.

Here's the resulting spec.md file. My request for that initial version became "Milestone 0.5" which looked like this:

Milestone 0.5 — End-to-end smoke parse (single valid document)

Implement the smallest end-to-end slice so the public API is real early:

new JustHTML("<html><head></head><body><p>Hello</p></body></html>") returns a tree with the expected tag structure and text nodes.

doc.toText() returns "Hello" and doc.errors is empty for this valid input.

Add scripts/smoke.js (no deps) that runs the example and asserts the expected structure/output.

Gate: node scripts/smoke.js passes.

Then I told it:

Implement Milestone 0.5

And off it went. The resulting code appeared to work so I said:

Configure GitHub Actions test.yml to run that on every commit, then commit and push

I ran git init and created a private GitHub repository for this project at this point, and set up the local directory to push to that remote.

Here's that initial push. Then I told it:

OK do the rest, commit and push often

And that was almost it!

I set my laptop to not fall asleep and left it to its devices while we went off to buy a Christmas tree. The "commit and push often" meant I could monitor its progress on my phone by refreshing the commit log on GitHub.

I was running this against my $20/month ChatGPT Plus account, which has a five hour token allowance window for Codex CLI. That ran out at 6:35pm and Codex paused, so I waited until the reset point at 7:14pm and typed:

Keep going

At 9:30pm it declared itself done with the following summary message:

• What’s Done

  - html5lib-tests now run in CI and pass: tokenizer 6810/6810, tree 1770/1782
    (12 skipped: #script-on), serializer 230/230, encoding 82/83 (1 skipped:
    scripted) (.github/workflows/test.yml:1).
  - Public API expanded: stream() (src/stream.js:1), CSS selectors query()/
    matches() (src/selector.js:1), and toMarkdown() (src/markdown.js:1), all
    exported via src/index.js:1.
  - Added no-deps unit scripts for selectors/stream/markdown (scripts/run-
    selector-tests.js:1, scripts/run-stream-tests.js:1, scripts/run-markdown-
    tests.js:1) and wired them into CI.
  - Fixed TreeBuilder so <br> no longer closes <p> (src/treebuilder.js:647).

As a finishing touch, I had it add a playground interface so I could try out the new library in my browser. I prompted:

Add a playground.html in the top level folder which loads the necessary ES modules from ./src/... and implements the exact same functionality as seen on https://tools.simonwillison.net/justhtml but using the JavaScript library instead of Pyodide

It fetched my existing JustHTML playground page (described here) using curl and built a new playground.html file that loaded the new JavaScript code instead. This worked perfectly.

I enabled GitHub Pages for my still-private repo which meant I could access the new playground at this URL:

https://simonw.github.io/justjshtml/playground.html

All it needed now was some documentation:

Add a comprehensive README with full usage instructions including attribution plus how this was built plus how to use in in HTML plus how to use it in Node.js

You can read the result here.

We are now at eight prompts total, running for just over four hours and I've decorated for Christmas and watched Wake Up Dead Man on Netflix.

According to Codex CLI:

Token usage: total=2,089,858 input=1,464,295 (+ 97,122,176 cached) output=625,563 (reasoning 437,010)

My llm-prices.com calculator estimates that at $29.41 if I was paying for those tokens at API prices, but they were included in my $20/month ChatGPT Plus subscription so the actual extra cost to me was zero.

What can we learn from this?

I'm sharing this project because I think it demonstrates a bunch of interesting things about the state of LLMs in December 2025.

Frontier LLMs really can perform complex, multi-hour tasks with hundreds of tool calls and minimal supervision. I used GPT-5.2 for this but I have no reason to believe that Claude Opus 4.5 or Gemini 3 Pro would not be able to achieve the same thing - the only reason I haven't tried is that I don't want to burn another 4 hours of time and several million tokens on more runs.
If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this designing the agentic loop a few months ago. I think it's the key skill to unlocking the potential of LLMs for complex tasks.
Porting entire open source libraries from one language to another via a coding agent works extremely well.
Code is so cheap it's practically free. Code that works continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.
We haven't even begun to unpack the etiquette and ethics around this style of development. Is it responsible and appropriate to churn out a direct port of a library like this in a few hours while watching a movie? What would it take for code built like this to be trusted in production?

I'll end with some open questions:

Does this library represent a legal violation of copyright of either the Rust library or the Python one?
Even if this is legal, is it ethical to build a library in this way?
Does this format of development hurt the open source ecosystem?
Can I even assert copyright over this, given how much of the work was produced by the LLM?
Is it responsible to publish software libraries built in this way?
How much better would this library be if an expert team hand crafted it over the course of several months?

Update 11th January 2026: I originally ended this post with just these open questions, but I've now provided my own answers to the questions in a new post.

Tags: ai, html, llms, gpt-5, codex-cli, javascript, python, generative-ai, ai-assisted-programming, november-2025-inflection, vibe-porting