Simon Willison's Weblog: uv

Gemma 4 audio with MLX

2026-04-12T23:57:53+00:00

Thanks to a tip from Rahim Nathwani, here's a uv run recipe for transcribing an audio file on macOS using the 10.28 GB Gemma 4 E2B model with MLX and mlx-vlm:

uv run --python 3.13 --with mlx_vlm --with torchvision --with gradio \
  mlx_vlm.generate \
  --model google/gemma-4-e2b-it \
  --audio file.wav \
  --prompt "Transcribe this audio" \
  --max-tokens 500 \
  --temperature 1.0

Your browser does not support the audio element.

I tried it on this 14 second .wav file and it output the following:

This front here is a quick voice memo. I want to try it out with MLX VLM. Just going to see if it can be transcribed by Gemma and how that works.

(That was supposed to be "This right here..." and "... how well that works" but I can hear why it misinterpreted that as "front" and "how that works".)

Tags: python, ai, generative-ai, llms, uv, mlx, gemma, speech-to-text

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

2026-03-30T14:28:34+00:00

Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here's how he describes it in the model card:

Mr. Chatterbox is a language model trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library. The model has absolutely no training inputs from after 1899 — the vocabulary and ideas are formed exclusively from nineteenth-century literature.

Mr. Chatterbox's training corpus was 28,035 books, with an estimated 2.93 billion input tokens after filtering. The model has roughly 340 million paramaters, roughly the same size as GPT-2-Medium. The difference is, of course, that unlike GPT-2, Mr. Chatterbox is trained entirely on historical data.

Given how hard it is to train a useful LLM without using vast amounts of scraped, unlicensed data I've been dreaming of a model like this for a couple of years now. What would a model trained on out-of-copyright text be like to chat with?

Thanks to Trip we can now find out for ourselves!

The model itself is tiny, at least by Large Language Model standards - just 2.05GB on disk. You can try it out using Trip's HuggingFace Spaces demo:

Honestly, it's pretty terrible. Talking with it feels more like chatting with a Markov chain than an LLM - the responses may have a delightfully Victorian flavor to them but it's hard to get a response that usefully answers a question.

The 2022 Chinchilla paper suggests a ratio of 20x the parameter count to training tokens. For a 340m model that would suggest around 7 billion tokens, more than twice the British Library corpus used here. The smallest Qwen 3.5 model is 600m parameters and that model family starts to get interesting at 2b - so my hunch is we would need 4x or more the training data to get something that starts to feel like a useful conversational partner.

But what a fun project!

Running it locally with LLM

I decided to see if I could run the model on my own machine using my LLM framework.

I got Claude Code to do most of the work - here's the transcript.

Trip trained the model using Andrej Karpathy's nanochat, so I cloned that project, pulled the model weights and told Claude to build a Python script to run the model. Once we had that working (which ended up needing some extra details from the Space demo source code) I had Claude read the LLM plugin tutorial and build the rest of the plugin.

llm-mrchatterbox is the result. Install the plugin like this:

llm install llm-mrchatterbox

The first time you run a prompt it will fetch the 2.05GB model file from Hugging Face. Try that like this:

llm -m mrchatterbox "Good day, sir"

Or start an ongoing chat session like this:

llm chat -m mrchatterbox

If you don't have LLM installed you can still get a chat session started from scratch using uvx like this:

uvx --with llm-mrchatterbox llm chat -m mrchatterbox

When you are finished with the model you can delete the cached file using:

llm mrchatterbox delete-model

This is the first time I've had Claude Code build a full LLM model plugin from scratch and it worked really well. I expect I'll be using this method again in the future.

I continue to hope we can get a useful model from entirely public domain data. The fact that Trip was able to get this far using nanochat and 2.93 billion training tokens is a promising start.

Update 31st March 2026: I had missed this when I first published this piece but Trip has his own detailed writeup of the project which goes into much more detail about how he trained the model. Here's how the books were filtered for pre-training:

First, I downloaded the British Library dataset split of all 19th-century books. I filtered those down to books contemporaneous with the reign of Queen Victoria—which, unfortunately, cut out the novels of Jane Austen—and further filtered those down to a set of books with a optical character recognition (OCR) confidence of .65 or above, as listed in the metadata. This left me with 28,035 books, or roughly 2.93 billion tokes for pretraining data.

Getting it to behave like a conversational model was a lot harder. Trip started by trying to train on plays by Oscar Wilde and George Bernard Shaw, but found they didn't provide enough pairs. Then he tried extracting dialogue pairs from the books themselves with poor results. The approach that worked was to have Claude Haiku and GPT-4o-mini generate synthetic conversation pairs for the supervised fine tuning, which solved the problem but sadly I think dilutes the "no training inputs from after 1899" claim from the original model card.

Tags: ai, andrej-karpathy, generative-ai, local-llms, llms, ai-assisted-programming, hugging-face, llm, training-data, uv, ai-ethics, claude-code

Package Managers Need to Cool Down

2026-03-24T21:11:38+00:00

Package Managers Need to Cool Down

Today's LiteLLM supply chain attack inspired me to revisit the idea of dependency cooldowns, the practice of only installing updated dependencies once they've been out in the wild for a few days to give the community a chance to spot if they've been subverted in some way.

This recent piece (March 4th) piece by Andrew Nesbitt reviews the current state of dependency cooldown mechanisms across different packaging tools. It's surprisingly well supported! There's been a flurry of activity across major packaging tools, including:

pnpm 10.16 (September 2025) — minimumReleaseAge with minimumReleaseAgeExclude for trusted packages
Yarn 4.10.0 (September 2025) — npmMinimalAgeGate (in minutes) with npmPreapprovedPackages for exemptions
Bun 1.3 (October 2025) — minimumReleaseAge via bunfig.toml
Deno 2.6 (December 2025) — --minimum-dependency-age for deno update and deno outdated
uv 0.9.17 (December 2025) — added relative duration support to existing --exclude-newer, plus per-package overrides via exclude-newer-package
pip 26.0 (January 2026) — --uploaded-prior-to (absolute timestamps only; relative duration support requested)
npm 11.10.0 (February 2026) — min-release-age

pip currently only supports absolute rather than relative dates but Seth Larson has a workaround for that using a scheduled cron to update the absolute date in the pip.conf config file.

Tags: javascript, packaging, pip, pypi, python, security, npm, deno, supply-chain, uv

Thoughts on OpenAI acquiring Astral and uv/ruff/ty

2026-03-19T16:45:15+00:00

The big news this morning: Astral to join OpenAI (on the Astral blog) and OpenAI to acquire Astral (the OpenAI announcement). Astral are the company behind uv, ruff, and ty - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts!

The official line from OpenAI and Astral

The Astral team will become part of the Codex team at OpenAI.

Charlie Marsh has this to say:

Open source is at the heart of that impact and the heart of that story; it sits at the center of everything we do. In line with our philosophy and OpenAI's own announcement, OpenAI will continue supporting our open source tools after the deal closes. We'll keep building in the open, alongside our community -- and for the broader Python ecosystem -- just as we have from the start. [...]

After joining the Codex team, we'll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.

OpenAI's message has a slightly different focus (highlights mine):

As part of our developer-first philosophy, after closing OpenAI plans to support Astral’s open source products. By bringing Astral’s tooling and engineering expertise to OpenAI, we will accelerate our work on Codex and expand what AI can do across the software development lifecycle.

This is a slightly confusing message. The Codex CLI is a Rust application, and Astral have some of the best Rust engineers in the industry - BurntSushi alone (Rust regex, ripgrep, jiff) may be worth the price of acquisition!

So is this about the talent or about the product? I expect both, but I know from past experience that a product+talent acquisition can turn into a talent-only acquisition later on.

uv is the big one

Of Astral's projects the most impactful is uv. If you're not familiar with it, uv is by far the most convincing solution to Python's environment management problems, best illustrated by this classic XKCD:

Switch from python to uv run and most of these problems go away. I've been using it extensively for the past couple of years and it's become an essential part of my workflow.

I'm not alone in this. According to PyPI Stats uv was downloaded more than 126 million times last month! Since its release in February 2024 - just two years ago - it's become one of the most popular tools for running Python code.

Ruff and ty

Astral's two other big projects are ruff - a Python linter and formatter - and ty - a fast Python type checker.

These are popular tools that provide a great developer experience but they aren't load-bearing in the same way that uv is.

They do however resonate well with coding agent tools like Codex - giving an agent access to fast linting and type checking tools can help improve the quality of the code they generate.

I'm not convinced that integrating them into the coding agent itself as opposed to telling it when to run them will make a meaningful difference, but I may just not be imaginative enough here.

What of pyx?

Ever since uv started to gain traction the Python community has been worrying about the strategic risk of a single VC-backed company owning a key piece of Python infrastructure. I wrote about one of those conversations in detail back in September 2024.

The conversation back then focused on what Astral's business plan could be, which started to take form in August 2025 when they announced pyx, their private PyPI-style package registry for organizations.

I'm less convinced that pyx makes sense within OpenAI, and it's notably absent from both the Astral and OpenAI announcement posts.

Competitive dynamics

An interesting aspect of this deal is how it might impact the competition between Anthropic and OpenAI.

Both companies spent most of 2025 focused on improving the coding ability of their models, resulting in the November 2025 inflection point when coding agents went from often-useful to almost-indispensable tools for software development.

The competition between Anthropic's Claude Code and OpenAI's Codex is fierce. Those $200/month subscriptions add up to billions of dollars a year in revenue, for companies that very much need that money.

Anthropic acquired the Bun JavaScript runtime in December 2025, an acquisition that looks somewhat similar in shape to Astral.

Bun was already a core component of Claude Code and that acquisition looked to mainly be about ensuring that a crucial dependency stayed actively maintained. Claude Code's performance has increased significantly since then thanks to the efforts of Bun's Jarred Sumner.

One bad version of this deal would be if OpenAI start using their ownership of uv as leverage in their competition with Anthropic.

Astral's quiet series A and B

One detail that caught my eye from Astral's announcement, in the section thanking the team, investors, and community:

Second, to our investors, especially Casey Aylward from Accel, who led our Seed and Series A, and Jennifer Li from Andreessen Horowitz, who led our Series B. As a first-time, technical, solo founder, you showed far more belief in me than I ever showed in myself, and I will never forget that.

As far as I can tell neither the Series A nor the Series B were previously announced - I've only been able to find coverage of the original seed round from April 2023.

Those investors presumably now get to exchange their stake in Astral for a piece of OpenAI. I wonder how much influence they had on Astral's decision to sell.

Forking as a credible exit?

Armin Ronacher built Rye, which was later taken over by Astral and effectively merged with uv. In August 2024 he wrote about the risk involved in a VC-backed company owning a key piece of open source infrastructure and said the following (highlight mine):

However having seen the code and what uv is doing, even in the worst possible future this is a very forkable and maintainable thing. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.

Astral's own Douglas Creager emphasized this angle on Hacker News today:

All I can say is that right now, we're committed to maintaining our open-source tools with the same level of effort, care, and attention to detail as before. That does not change with this acquisition. No one can guarantee how motives, incentives, and decisions might change years down the line. But that's why we bake optionality into it with the tools being permissively licensed. That makes the worst-case scenarios have the shape of "fork and move on", and not "software disappears forever".

I like and trust the Astral team and I'm optimistic that their projects will be well-maintained in their new home.

OpenAI don't yet have much of a track record with respect to acquiring and maintaining open source projects. They've been on a bit of an acquisition spree over the past three months though, snapping up Promptfoo and OpenClaw (sort-of, they hired creator Peter Steinberger and are spinning OpenClaw off to a foundation), plus closed source LaTeX platform Crixet (now Prism).

If things do go south for uv and the other Astral projects we'll get to see how credible the forking exit strategy turns out to be.

Tags: python, ai, rust, openai, ruff, uv, astral, charlie-marsh, coding-agents, codex-cli, ty

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel

2026-02-04T14:59:47+00:00

I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a uvx package-name call away.

sqlite-scanner

sqlite-scanner is my new Go CLI tool for scanning a filesystem for SQLite database files.

It works by checking if the first 16 bytes of the file exactly match the SQLite magic number sequence SQLite format 3\x00. It can search one or more folders recursively, spinning up concurrent goroutines to accelerate the scan. It streams out results as it finds them in plain text, JSON or newline-delimited JSON. It can optionally display the file sizes as well.

To try it out you can download a release from the GitHub releases - and then jump through macOS hoops to execute an "unsafe" binary. Or you can clone the repo and compile it with Go. Or... you can run the binary like this:

uvx sqlite-scanner

By default this will search your current directory for SQLite databases. You can pass one or more directories as arguments:

uvx sqlite-scanner ~ /tmp

Add --json for JSON output, --size to include file sizes or --jsonl for newline-delimited JSON. Here's a demo:

uvx sqlite-scanner ~ --jsonl --size

If you haven't been uv-pilled yet you can instead install sqlite-scanner using pip install sqlite-scanner and then run sqlite-scanner.

To get a permanent copy with uv use uv tool install sqlite-scanner.

How the Python package works

The reason this is worth doing is that pip, uv and PyPI will work together to identify the correct compiled binary for your operating system and architecture.

This is driven by file names. If you visit the PyPI downloads for sqlite-scanner you'll see the following files:

sqlite_scanner-0.1.1-py3-none-win_arm64.whl
sqlite_scanner-0.1.1-py3-none-win_amd64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_x86_64.whl
sqlite_scanner-0.1.1-py3-none-musllinux_1_2_aarch64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_x86_64.whl
sqlite_scanner-0.1.1-py3-none-manylinux_2_17_aarch64.whl
sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl
sqlite_scanner-0.1.1-py3-none-macosx_10_9_x86_64.whl

When I run pip install sqlite-scanner or uvx sqlite-scanner on my Apple Silicon Mac laptop Python's packaging magic ensures I get that macosx_11_0_arm64.whl variant.

Here's what's in the wheel, which is a zip file with a .whl extension.

In addition to the bin/sqlite-scanner the most important file is sqlite_scanner/__init__.py which includes the following:

def get_binary_path():
    """Return the path to the bundled binary."""
    binary = os.path.join(os.path.dirname(__file__), "bin", "sqlite-scanner")
 
    # Ensure binary is executable on Unix
    if sys.platform != "win32":
        current_mode = os.stat(binary).st_mode
        if not (current_mode & stat.S_IXUSR):
            os.chmod(binary, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
 
    return binary
 
 
def main():
    """Execute the bundled binary."""
    binary = get_binary_path()
 
    if sys.platform == "win32":
        # On Windows, use subprocess to properly handle signals
        sys.exit(subprocess.call([binary] + sys.argv[1:]))
    else:
        # On Unix, exec replaces the process
        os.execvp(binary, [binary] + sys.argv[1:])

That main() method - also called from sqlite_scanner/__main__.py - locates the binary and executes it when the Python package itself is executed, using the sqlite-scanner = sqlite_scanner:main entry point defined in the wheel.

Which means we can use it as a dependency

Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, albeit there is plenty of precedent.

I’ll justify it by pointing out that this means we can use Go binaries as dependencies for other Python packages now.

That's genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.

To demonstrate this, I built datasette-scan - a new Datasette plugin which depends on sqlite-scanner and then uses that Go binary to scan a folder for SQLite databases and attach them to a Datasette instance.

Here's how to use that (without even installing anything first, thanks uv) to explore any SQLite databases in your Downloads folder:

uv run --with datasette-scan datasette scan ~/Downloads

If you peek at the code you'll see it depends on sqlite-scanner in pyproject.toml and calls it using subprocess.run() against sqlite_scanner.get_binary_path() in its own scan_directories() function.

I've been exploring this pattern for other, non-Go binaries recently - here's a recent script that depends on static-ffmpeg to ensure that ffmpeg is available for the script to use.

Building Python wheels from Go packages with go-to-wheel

After trying this pattern myself a couple of times I realized it would be useful to have a tool to automate the process.

I first brainstormed with Claude to check that there was no existing tool to do this. It pointed me to maturin bin which helps distribute Rust projects using Python wheels, and pip-binary-factory which bundles all sorts of other projects, but did not identify anything that addressed the exact problem I was looking to solve.

So I had Claude Code for web build the first version, then refined the code locally on my laptop with the help of more Claude Code and a little bit of OpenAI Codex too, just to mix things up.

The full documentation is in the simonw/go-to-wheel repository. I've published that tool to PyPI so now you can run it using:

uvx go-to-wheel --help

The sqlite-scanner package you can see on PyPI was built using go-to-wheel like this:

uvx go-to-wheel ~/dev/sqlite-scanner \
  --set-version-var main.version \
  --version 0.1.1 \
  --readme README.md \
  --author 'Simon Willison' \
  --url https://github.com/simonw/sqlite-scanner \
  --description 'Scan directories for SQLite databases'

This created a set of wheels in the dist/ folder. I tested one of them like this:

uv run --with dist/sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl \
  sqlite-scanner --version

When that spat out the correct version number I was confident everything had worked as planned, so I pushed the whole set of wheels to PyPI using twine upload like this:

uvx twine upload dist/*

I had to paste in a PyPI API token I had saved previously.

I expect to use this pattern a lot

sqlite-scanner is very clearly meant as a proof-of-concept for this wider pattern - Python is very much capable of recursively crawling a directory structure looking for files that start with a specific byte prefix on its own!

That said, I think there's a lot to be said for this pattern. Go is a great complement to Python - it's fast, compiles to small self-contained binaries, has excellent concurrency support and a rich ecosystem of libraries.

Go is similar to Python in that it has a strong standard library. Go is particularly good for HTTP tooling - I've built several HTTP proxies in the past using Go's excellent net/http/httputil.ReverseProxy handler.

I've also been experimenting with wazero, Go's robust and mature zero dependency WebAssembly runtime as part of my ongoing quest for the ideal sandbox for running untrusted code. Here's my latest experiment with that library.

Being able to seamlessly integrate Go binaries into Python projects without the end user having to think about Go at all - they pip install and everything Just Works - feels like a valuable addition to my toolbox.

Tags: go, packaging, projects, pypi, python, sqlite, datasette, ai-assisted-programming, uv

Datasette 1.0a24

2026-01-29T17:21:51+00:00

Datasette 1.0a24

New Datasette alpha this morning. Key new features:

Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to rows of data.
The recommended development environment for hacking on Datasette itself now uses uv. Crucially, you can clone Datasette and run uv run pytest to run the tests without needing to manually create a virtual environment or install dependencies first, thanks to the dev dependency group pattern.
A new ?_extra=render_cell parameter for both table and row JSON pages to return the results of executing the render_cell() plugin hook. This should unlock new JavaScript UI features in the future.

More details in the release notes. I also invested a bunch of work in eliminating flaky tests that were intermittently failing in CI - I think those are all handled now.

Tags: projects, python, datasette, annotated-release-notes, uv

Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation

2026-01-22T17:42:34+00:00

Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation

I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got really good, so I can't speak for how notable this new release from Qwen is.

From the accompanying paper:

In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of- the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.

To give an idea of size, Qwen/Qwen3-TTS-12Hz-1.7B-Base is 4.54GB on Hugging Face and Qwen/Qwen3-TTS-12Hz-0.6B-Base is 2.52GB.

The Hugging Face demo lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:

I tried this out by recording myself reading my about page and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:

Your browser does not support the audio element.

It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.

Update: Prince Canuma got this working with his mlx-audio library. I had Claude turn that into a CLI tool which you can run with uv ike this:

uv run https://tools.simonwillison.net/python/q3_tts.py \
  'I am a pirate, give me your gold!' \
  -i 'gruff voice' -o pirate.wav

The -i option lets you use a prompt to describe the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.

Via Hacker News

Tags: text-to-speech, ai, generative-ai, hugging-face, uv, qwen, mlx, prince-canuma, ai-in-china

How uv got so fast

2025-12-26T23:43:15+00:00

How uv got so fast

Andrew Nesbitt provides an insightful teardown of why uv is so much faster than pip. It's not nearly as simple as just "they rewrote it in Rust" - uv gets to skip a huge amount of Python packaging history (which pip needs to implement for backwards compatibility) and benefits enormously from work over recent years that makes it possible to resolve dependencies across most packages without having to execute the code in setup.py using a Python interpreter.

Two notes that caught my eye that I hadn't understood before:

HTTP range requests for metadata. Wheel files are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.

[...]

Compact version representation. uv packs versions into u64 integers where possible, making comparison and hashing fast. Over 90% of versions fit in one u64. This is micro-optimization that compounds across millions of comparisons.

I wanted to learn more about these tricks, so I fired up an asynchronous research task and told it to checkout the astral-sh/uv repo, find the Rust code for both of those features and try porting it to Python to help me understand how it works.

Here's the report that it wrote for me, the prompts I used and the Claude Code transcript.

You can try the script it wrote for extracting metadata from a wheel using HTTP range requests like this:

uv run --with httpx https://raw.githubusercontent.com/simonw/research/refs/heads/main/http-range-wheel-metadata/wheel_metadata.py https://files.pythonhosted.org/packages/8b/04/ef95b67e1ff59c080b2effd1a9a96984d6953f667c91dfe9d77c838fc956/playwright-1.57.0-py3-none-macosx_11_0_arm64.whl -v

The Playwright wheel there is ~40MB. Adding -v at the end causes the script to spit out verbose details of how it fetched the data - which looks like this.

Key extract from that output:

[1] HEAD request to get file size...
    File size: 40,775,575 bytes
[2] Fetching last 16,384 bytes (EOCD + central directory)...
    Received 16,384 bytes
[3] Parsed EOCD:
    Central directory offset: 40,731,572
    Central directory size: 43,981
    Total entries: 453
[4] Fetching complete central directory...
    ...
[6] Found METADATA: playwright-1.57.0.dist-info/METADATA
    Offset: 40,706,744
    Compressed size: 1,286
    Compression method: 8
[7] Fetching METADATA content (2,376 bytes)...
[8] Decompressed METADATA: 3,453 bytes

Total bytes fetched: 18,760 / 40,775,575 (100.0% savings)

The section of the report on compact version representation is interesting too. Here's how it illustrates sorting version numbers correctly based on their custom u64 representation:

Sorted order (by integer comparison of packed u64):
  1.0.0a1 (repr=0x0001000000200001)
  1.0.0b1 (repr=0x0001000000300001)
  1.0.0rc1 (repr=0x0001000000400001)
  1.0.0 (repr=0x0001000000500000)
  1.0.0.post1 (repr=0x0001000000700001)
  1.0.1 (repr=0x0001000100500000)
  2.0.0.dev1 (repr=0x0002000000100001)
  2.0.0 (repr=0x0002000000500000)

Tags: performance, python, sorting, rust, uv, http-range-requests, vibe-porting

uv-init-demos

2025-12-24T22:05:23+00:00

uv-init-demos

uv has a useful uv init command for setting up new Python projects, but it comes with a bunch of different options like --app and --package and --lib and I wasn't sure how they differed.

So I created this GitHub repository which demonstrates all of those options, generated using this update-projects.sh script (thanks, Claude) which will run on a schedule via GitHub Actions to capture any changes made by future releases of uv.

Tags: projects, python, github-actions, git-scraping, uv

Poe the Poet

2025-12-16T22:57:02+00:00

Poe the Poet

I was looking for a way to specify additional commands in my pyproject.toml file to execute using uv. There's an enormous issue thread on this in the uv issue tracker (300+ comments dating back to August 2024) and from there I learned of several options including this one, Poe the Poet.

It's neat. I added it to my s3-credentials project just now and the following now works for running the live preview server for the documentation:

uv run poe livehtml

Here's the snippet of TOML I added to my pyproject.toml:

[dependency-groups]
test = [
    "pytest",
    "pytest-mock",
    "cogapp",
    "moto>=5.0.4",
]
docs = [
    "furo",
    "sphinx-autobuild",
    "myst-parser",
    "cogapp",
]
dev = [
    {include-group = "test"},
    {include-group = "docs"},
    "poethepoet>=0.38.0",
]

[tool.poe.tasks]
docs = "sphinx-build -M html docs docs/_build"
livehtml = "sphinx-autobuild -b html docs docs/_build"
cog = "cog -r docs/*.md"

Since poethepoet is in the dev= dependency group any time I run uv run ... it will be available in the environment.

Tags: packaging, python, s3-credentials, uv

LLM 0.28

2025-12-12T20:20:14+00:00

LLM 0.28

I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:

New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317

When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309

Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316

Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313

Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315

The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318

That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:

git clone https://github.com/simonw/llm
cd llm
uv run pytest

The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment which means everything needed to run pytest is available without needing to add any extra commands.

Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm, uv

TIL: Dependency groups and uv run

2025-12-03T05:55:23+00:00

TIL: Dependency groups and uv run

I wrote up the new pattern I'm using for my various Python project repos to make them as easy to hack on with uv as possible. The trick is to use a PEP 735 dependency group called dev, declared in pyproject.toml like this:

[dependency-groups]
dev = ["pytest"]

With that in place, running uv run pytest will automatically install that development dependency into a new virtual environment and use it to run your tests.

This means you can get started hacking on one of my projects (here datasette-extract) with just these steps:

git clone https://github.com/datasette/datasette-extract
cd datasette-extract
uv run pytest

I also split my uv TILs out into a separate folder. This meant I had to setup redirects for the old paths, so I had Claude Code help build me a new plugin called datasette-redirects and then apply it to my TIL site, including updating the build script to correctly track the creation date of files that had since been renamed.

Tags: packaging, python, ai, til, generative-ai, llms, ai-assisted-programming, uv, coding-agents, claude-code

sqlite-utils 3.39

2025-11-24T18:59:14+00:00

sqlite-utils 3.39

I got a report of a bug in sqlite-utils concerning plugin installation - if you installed the package using uv tool install further attempts to install plugins with sqlite-utils install X would fail, because uv doesn't bundle pip by default. I had the same bug with Datasette a while ago, turns out I forgot to apply the fix to sqlite-utils.

Since I was pushing a new dot-release I decided to integrate some of the non-breaking changes from the 4.0 alpha I released last night.

I tried to have Claude Code do the backporting for me:

create a new branch called 3.x starting with the 3.38 tag, then consult https://github.com/simonw/sqlite-utils/issues/688 and cherry-pick the commits it lists in the second comment, then review each of the links in the first comment and cherry-pick those as well. After each cherry-pick run the command "just test" to confirm the tests pass and fix them if they don't. Look through the commit history on main since the 3.38 tag to help you with this task.

This worked reasonably well - here's the terminal transcript. It successfully argued me out of two of the larger changes which would have added more complexity than I want in a small dot-release like this.

I still had to do a bunch of manual work to get everything up to scratch, which I carried out in this PR - including adding comments there and then telling Claude Code:

Apply changes from the review on this PR https://github.com/simonw/sqlite-utils/pull/689

Here's the transcript from that.

The release is now out with the following release notes:

Fixed a bug with sqlite-utils install when the tool had been installed using uv. (#687)

The --functions argument now optionally accepts a path to a Python file as an alternative to a string full of code, and can be specified multiple times - see Defining custom SQL functions. (#659)

sqlite-utils now requires on Python 3.10 or higher.

Tags: projects, sqlite, sqlite-utils, annotated-release-notes, uv, coding-agents, claude-code

llm-anthropic 0.22

2025-11-15T20:48:38+00:00

llm-anthropic 0.22

New release of my llm-anthropic plugin:

Support for Claude's new structured outputs feature for Sonnet 4.5 and Opus 4.1. #54

Support for the web search tool using -o web_search 1 - thanks Nick Powell and Ian Langworth. #30

The plugin previously powered LLM schemas using this tool-call based workaround. That code is still used for Anthropic's older models.

I also figured out uv recipes for running the plugin's test suite in an isolated environment, which are now baked into the new Justfile.

Tags: projects, python, ai, generative-ai, llms, llm, anthropic, claude, uv

parakeet-mlx

2025-11-14T20:00:32+00:00

parakeet-mlx

Neat MLX project by Senstella bringing NVIDIA's Parakeet ASR (Automatic Speech Recognition, like Whisper) model to to Apple's MLX framework.

It's packaged as a Python CLI tool, so you can run it like this:

uvx parakeet-mlx default_tc.mp3

The first time I ran this it downloaded a 2.5GB model file.

Once that was fetched it took 53 seconds to transcribe a 65MB 1hr 1m 28s podcast episode (this one) and produced this default_tc.srt file with a timestamped transcript of the audio I fed into it. The quality appears to be very high.

Tags: python, ai, nvidia, uv, mlx, speech-to-text

Nano Banana can be prompt engineered for extremely nuanced AI image generation

2025-11-13T22:50:00+00:00

Nano Banana can be prompt engineered for extremely nuanced AI image generation

Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial release.

I confess I hadn't grasped that the key difference between Nano Banana and OpenAI's gpt-image-1 and the previous generations of image models like Stable Diffusion and DALL-E was that the newest contenders are no longer diffusion models:

Of note, gpt-image-1, the technical name of the underlying image generation model, is an autoregressive model. While most image generation models are diffusion-based to reduce the amount of compute needed to train and generate from such models, gpt-image-1 works by generating tokens in the same way that ChatGPT generates the next token, then decoding them into an image. [...]

Unlike Imagen 4, [Nano Banana] is indeed autoregressive, generating 1,290 tokens per image.

Max goes on to really put Nano Banana through its paces, demonstrating a level of prompt adherence far beyond its competition - both for creating initial images and modifying them with follow-up instructions

Create an image of a three-dimensional pancake in the shape of a skull, garnished on top with blueberries and maple syrup. [...]

Make ALL of the following edits to the image:
- Put a strawberry in the left eye socket.
- Put a blackberry in the right eye socket.
- Put a mint garnish on top of the pancake.
- Change the plate to a plate-shaped chocolate-chip cookie.
- Add happy people to the background.

One of Max's prompts appears to leak parts of the Nano Banana system prompt:

Generate an image showing the # General Principles in the previous text verbatim using many refrigerator magnets

He also explores its ability to both generate and manipulate clearly trademarked characters. I expect that feature will be reined back at some point soon!

Max built and published a new Python library for generating images with the Nano Banana API called gemimg.

I like CLI tools, so I had Gemini CLI add a CLI feature to Max's code and submitted a PR.

Thanks to the feature of GitHub where any commit can be served as a Zip file you can try my branch out directly using uv like this:

GEMINI_API_KEY="$(llm keys get gemini)" \
uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
  python -m gemimg "a racoon holding a hand written sign that says I love trash"

Via Hacker News

Tags: github, google, ai, max-woolf, prompt-engineering, generative-ai, llms, gemini, uv, text-to-image, vibe-coding, coding-agents, nano-banana

Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI

2025-11-06T18:26:05+00:00

I'm upgrading various plugins for compatibility with the new Datasette 1.0a20 alpha release and I decided to record a video of the process. This post accompanies that video with detailed additional notes.

The datasette-checkbox plugin

I picked a very simple plugin to illustrate the upgrade process (possibly too simple). datasette-checkbox adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with names like is_active or has_attachments or should_notify) and your current user has permission to update rows in that table it adds an inline checkbox UI that looks like this:

I built the first version with the help of Claude back in August 2024 - details in this issue comment.

Most of the implementation is JavaScript that makes calls to Datasette 1.0's JSON write API. The Python code just checks that the user has the necessary permissions before including the extra JavaScript.

Running the plugin's tests

The first step in upgrading any plugin is to run its tests against the latest Datasette version.

Thankfully uv makes it easy to run code in scratch virtual environments that include the different code versions you want to test against.

I have a test utility called tadd (for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory like this:

tadd

And it will run the existing plugin tests against whatever version of Datasette I have checked out in my ~/dev/datasette directory.

You can see the full implementation of tadd (and its friend radd described below) in this TIL - the basic version looks like this:

#!/bin/sh
uv run --no-project --isolated \
  --with-editable '.[test]' --with-editable ~/dev/datasette \
  python -m pytest "$@"

I started by running tadd in the datasette-checkbox directory, and got my first failure... but it wasn't due to permissions, it was because the pyproject.toml for the plugin was pinned to a specific mismatched version of Datasette:

dependencies = [
    "datasette==1.0a19"
]

I fixed this problem by swapping == to >= and ran the tests again... and they passed! Which was a problem because I was expecting permission-related failures.

It turns out when I first wrote the plugin I was lazy with the tests - they weren't actually confirming that the table page loaded without errors.

I needed to actually run the code myself to see the expected bug.

First I created myself a demo database using sqlite-utils create-table:

sqlite-utils create-table demo.db \
  demo id integer is_checked integer --pk id

Then I ran it with Datasette against the plugin's code like so:

radd demo.db

Sure enough, visiting /demo/demo produced a 500 error about the missing Datasette.permission_allowed() method.

The next step was to update the test to also trigger this error:

@pytest.mark.asyncio
async def test_plugin_adds_javascript():
    datasette = Datasette()
    db = datasette.add_memory_database("demo")
    await db.execute_write(
        "CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, is_active INTEGER)"
    )
    await datasette.invoke_startup()
    response = await datasette.client.get("/demo/test")
    assert response.status_code == 200

And now tadd fails as expected.

Upgrading the plugin with Codex

It this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically:

codex exec --dangerously-bypass-approvals-and-sandbox \
"Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass"

codex exec runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it.

I tell it to consult the subset of the Datasette upgrade documentation that talks about Datasette permissions and then get the tadd command to pass its tests.

This is an example of what I call designing agentic loops - I gave Codex the tools it needed (tadd) and a clear goal and let it get to work on my behalf.

The remainder of the video covers finishing up the work - testing the fix manually, commiting my work using:

git commit -a -m "$(basename "$PWD") for datasette>=1.0a20" \
  -m "Refs https://github.com/simonw/datasette/issues/2577"

Then shipping a 0.1a4 release to PyPI using the pattern described in this TIL. Finally, I demonstrated that the shipped plugin worked in a fresh environment using uvx like this:

uvx --prerelease=allow --with datasette-checkbox \
  datasette --root ~/dev/ecosystem/datasette-checkbox/demo.db

Executing this command installs and runs a fresh Datasette instance with a fresh copy of the new alpha plugin (--prerelease=allow). It's a neat way of confirming that freshly released software works as expected.

A colophon for the video

This video was shot in a single take using Descript, with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a simonwillison.net closing slide from my previous video and exported it locally at 1080p, then uploaded it to YouTube.

Something I learned from the Software Carpentry instructor training course is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes!

I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production.

I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.

Tags: plugins, python, youtube, ai, datasette, generative-ai, llms, ai-assisted-programming, uv, coding-agents, codex-cli

A new SQL-powered permissions system in Datasette 1.0a20

2025-11-04T21:34:42+00:00

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.

Understanding the permissions system

Datasette's permissions system exists to answer the following question:

Is this actor allowed to perform this action, optionally against this particular resource?

An actor is usually a user, but might also be an automation operating via the Datasette API.

An action is a thing they need to do - things like view-table, execute-sql, insert-row.

A resource is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.

Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data.

Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication Datasette also 1.0 includes a write API with a need to configure who can insert, update, and delete rows or create new tables.

Actors can be authenticated in a number of different ways provided by plugins using the actor_from_request() plugin hook. datasette-auth-passwords and datasette-auth-github and datasette-auth-existing-cookies are examples of authentication plugins.

Permissions systems need to be able to efficiently list things

The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result.

This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view?

I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables.

Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is really good at filtering large numbers of records.

The new permission_resources_sql() plugin hook

The biggest change in the new release is that I've replaced the previous permission_allowed(actor, action, resource) plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new permission_resources_sql(actor, action) plugin hook.

Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against.

Here's an example, lifted from the documentation:

from datasette import hookimpl
from datasette.permissions import PermissionSQL


@hookimpl
def permission_resources_sql(datasette, actor, action):
    if action != "view-table":
        return None
    if not actor or actor.get("id") != "alice":
        return None

    return PermissionSQL(
        sql="""
            SELECT
                'accounting' AS parent,
                'sales' AS child,
                1 AS allow,
                'alice can view accounting/sales' AS reason
        """,
    )

This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database.

The PermissionSQL object should always return four columns: a parent, child, allow (1 or 0), and a reason string for debugging.

When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against the internal catalog tables and efficiently lists all the resources the actor can access.

This query can then be limited or paginated to avoid loading too many results at once.

Hierarchies, plugins, vetoes, and restrictions

Datasette has several additional requirements that make the permissions system more complicated.

Datasette permissions can optionally act against a two-level hierarchy. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in every database in that Datasette instance.

Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example.

Datasette currently has ten default actions but plugins that add additional features can register new actions to better participate in the permission systems.

Datasette's permission system has a mechanism to veto permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.

Finally, Datasette includes a mechanism for applying additional restrictions to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are described in more detail in the documentation.

That's a lot of different moving parts for the new implementation to cover.

New debugging tools

Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible.

The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user:

And this page listing the rules that apply to that question - since different plugins may return different rules which get combined together:

This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be over-ridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the --root option.)

Those rules are defined in the datasette/default_permissions.py Python module.

The missing feature: list actors who can act on this resource

There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource.

It's not possibly to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as datasette-auth-github mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before.

API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system.

This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.

Upgrading plugins for Datasette 1.0a20

Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out.

The Datasette upgrade guide includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document.

I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working extremely well. I've even had Claude Code update those notes itself with things it learned during an upgrade process!

This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.

I've also been leaning heavily on uv to help with the upgrade process. I wrote myself two new helper scripts - tadd and radd - to help test the new plugins.

tadd = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to pytest so I can run tadd -k test_name or tadd -x --pdb as needed.
radd = "run against datasette dev" - it runs the latest dev datasette command with the plugin installed.

The tadd and radd implementations can be found in this TIL.

Some of my plugin upgrades have become a one-liner to the codex exec command, which runs OpenAI Codex CLI with a prompt without entering interactive mode:

codex exec --dangerously-bypass-approvals-and-sandbox \
"Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass"

There are still a bunch more to go - there's a list in this tracking issue - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.

Using Claude Code to implement this change

This change to Datasette core by far the most ambitious piece of work I've ever attempted using a coding agent.

Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window.

Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable.

I no longer see codebase size as a limiting factor for how useful they can be.

I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right.

For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI.

I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.

Starting with a proof-of-concept

I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My research/sqlite-permissions-poc project was the one that finally convinced me of a viable approach,

That one started as a free ranging conversation with Claude, at the end of which I told it to generate a specification which I then fed into GPT-5 to implement. You can see that specification at the end of the README.

I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment.

This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.

Miscellaneous tips I picked up along the way

When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The tadd and radd shortcuts were invaluable for productively working on those plugins while I made changes to core.
Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.
Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.
Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes".
Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.
A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: Use gh CLI to fetch comments on URL-to-PR and make the requested changes. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.
The code I write with LLMs is higher quality code. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, this changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.
Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.
That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as Claude Skills. I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.

What's next?

Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same.

The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week.

This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!

Tags: plugins, projects, python, sql, sqlite, datasette, annotated-release-notes, uv, coding-agents, claude-code, codex-cli

nanochat

2025-10-13T20:29:58+00:00

nanochat

Really interesting new project from Andrej Karpathy, described at length in this discussion post.

It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be trained for as little as $100:

This repo is a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase.

It's around 8,000 lines of code, mostly Python (using PyTorch) plus a little bit of Rust for training the tokenizer.

Andrej suggests renting a 8XH100 NVIDA node for around $24/ hour to train the model. 4 hours (~$100) is enough to get a model that can hold a conversation - almost coherent example here. Run it for 12 hours and you get something that slightly outperforms GPT-2. I'm looking forward to hearing results from longer training runs!

The resulting model is ~561M parameters, so it should run on almost anything. I've run a 4B model on my iPhone, 561M should easily fit on even an inexpensive Raspberry Pi.

The model defaults to training on ~24GB from karpathy/fineweb-edu-100b-shuffle derived from FineWeb-Edu, and then midtrains on 568K examples from SmolTalk (460K), MMLU auxiliary train (100K), and GSM8K (8K), followed by supervised finetuning on 21.4K examples from ARC-Easy (2.3K), ARC-Challenge (1.1K), GSM8K (8K), and SmolTalk (10K).

Here's the code for the web server, which is fronted by this pleasantly succinct vanilla JavaScript HTML+JavaScript frontend.

Update: Sam Dobson pushed a build of the model to sdobson/nanochat on Hugging Face. It's designed to run on CUDA but I pointed Claude Code at a checkout and had it hack around until it figured out how to run it on CPU on macOS, which eventually resulted in this script which I've published as a Gist. You should be able to try out the model using uv like this:

cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
--model-dir /tmp/nanochat \
--prompt "Tell me about dogs."

I got this (truncated because it ran out of tokens):

I'm delighted to share my passion for dogs with you. As a veterinary doctor, I've had the privilege of helping many pet owners care for their furry friends. There's something special about training, about being a part of their lives, and about seeing their faces light up when they see their favorite treats or toys.

I've had the chance to work with over 1,000 dogs, and I must say, it's a rewarding experience. The bond between owner and pet

Via @karpathy

Tags: python, ai, rust, pytorch, andrej-karpathy, generative-ai, llms, training-data, uv, gpus, claude-code

TIL: Testing different Python versions with uv with-editable and uv-test

2025-10-09T03:37:06+00:00

TIL: Testing different Python versions with uv with-editable and uv-test

While tinkering with upgrading various projects to handle Python 3.14 I finally figured out a universal uv recipe for running the tests for the current project in any specified version of Python:

uv run --python 3.14 --isolated --with-editable '.[test]' pytest

This should work in any directory with a pyproject.toml (or even a setup.py) that defines a test set of extra dependencies and uses pytest.

The --with-editable '.[test]' bit ensures that changes you make to that directory will be picked up by future test runs. The --isolated flag ensures no other environments will affect your test run.

I like this pattern so much I built a little shell script that uses it, shown here. Now I can change to any Python project directory and run:

uv-test

Or for a different Python version:

uv-test -p 3.11

I can pass additional pytest options too:

uv-test -p 3.11 -k permissions

Tags: python, testing, pytest, til, uv

Claude can write complete Datasette plugins now

2025-10-08T23:43:43+00:00

This isn't necessarily surprising, but it's worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now.

I've seen models complete aspects of this in the past, but today is the first time I've shipped a new plugin where every line of code and test was written by Claude, with minimal prompting from myself.

The plugin is called datasette-os-info. It's a simple debugging tool - all it does is add a /-/os JSON page which dumps out as much information as it can about the OS it's running on. Here's a live demo on my TIL website.

I built it to help experiment with changing the Docker base container that Datasette uses to publish images to one that uses Python 3.14.

Here's the full set of commands I used to create the plugin. I started with my datasette-plugin cookiecutter template:

uvx cookiecutter gh:simonw/datasette-plugin

  [1/8] plugin_name (): os-info
  [2/8] description (): Information about the current OS
  [3/8] hyphenated (os-info): 
  [4/8] underscored (os_info): 
  [5/8] github_username (): datasette
  [6/8] author_name (): Simon Willison
  [7/8] include_static_directory (): 
  [8/8] include_templates_directory ():

This created a datasette-os-info directory with the initial pyproject.toml and tests/ and datasette_os_info/__init__.py files. Here's an example of that starter template.

I created a uv virtual environment for it, installed the initial test dependencies and ran pytest to check that worked:

cd datasette-os-info
uv venv
uv sync --extra test
uv run pytest

Then I fired up Claude Code in that directory in YOLO mode:

claude --dangerously-skip-permissions

(I actually used my claude-yolo shortcut which runs the above.)

Then, in Claude, I told it how to run the tests:

Run uv run pytest

When that worked, I told it to build the plugin:

This is a Datasette plugin which should add a new page /-/os which returns pretty-printed JSON about the current operating system - implement it. I want to pick up as many details as possible across as many OS as possible, including if possible figuring out the base image if it is in a docker container - otherwise the Debian OS release name and suchlike would be good

... and that was it! Claude implemented the plugin using Datasette's register_routes() plugin hook to add the /-/os page,and then without me prompting it to do so built this basic test as well.

It ran the new test, spotted a bug (it had guessed a non-existent Response(..., default_repr=) parameter), fixed the bug and declared itself done.

I built myself a wheel:

uv pip install build
uv run python -m build

Then uploaded that to an S3 bucket and deployed it to test it out using datasette publish ... --install URL-to-wheel. It did exactly what I had hoped - here's what that /-/os page looked like:

{
  "platform": {
    "system": "Linux",
    "release": "4.4.0",
    "version": "#1 SMP Sun Jan 10 15:06:54 PST 2016",
    "machine": "x86_64",
    "processor": "",
    "architecture": [
      "64bit",
      ""
    ],
    "platform": "Linux-4.4.0-x86_64-with-glibc2.41",
    "python_version": "3.14.0",
    "python_implementation": "CPython"
  },
  "hostname": "localhost",
  "cpu_count": 2,
  "linux": {
    "os_release": {
      "PRETTY_NAME": "Debian GNU/Linux 13 (trixie)",
      "NAME": "Debian GNU/Linux",
      "VERSION_ID": "13",
      "VERSION": "13 (trixie)",
      "VERSION_CODENAME": "trixie",
      "DEBIAN_VERSION_FULL": "13.1",
      "ID": "debian",
      "HOME_URL": "https://www.debian.org/",
      "SUPPORT_URL": "https://www.debian.org/support",
      "BUG_REPORT_URL": "https://bugs.debian.org/"
    },
    "debian_version": "13.1",
    "kernel": {
      "sysname": "Linux",
      "nodename": "localhost",
      "release": "4.4.0",
      "version": "#1 SMP Sun Jan 10 15:06:54 PST 2016",
      "machine": "x86_64"
    },
    "cpu_model": "unknown",
    "memory": {
      "MemTotal": "1048576 kB",
      "MemFree": "958244 kB",
      "MemAvailable": "958244 kB",
      "SwapTotal": "0 kB",
      "SwapFree": "0 kB"
    }
  },
  "environment": {
    "HOME": "/root",
    "PATH": "/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
  },
  "python_executable": "/usr/local/bin/python3.14"
}

I dread to think how long it would have taken me to figure out all of those different useful OS details!

To finish up, I had Claude fill out the README via a couple of extra prompts:

Add usage instructions to the README (about the /-/os page)

Then had a brief conversation about how safe the output was:

Is there anything potentially sensitive in that output?

Claude said:

Yes, there are several potentially sensitive items in the output:

Most sensitive:

hostname - Could reveal internal naming schemes or organization details

Environment variables:

HOME - Reveals user's home directory path

USER - Reveals the username

PATH - Reveals filesystem structure and software installation locations

VIRTUAL_ENV - Could reveal project paths and directory structure

python_executable - Reveals filesystem paths and user directories

Moderately sensitive:

CPU count and memory info - Could reveal infrastructure capacity

Kernel version details - Could potentially be used to identify specific vulnerabilities

Considerations:

This endpoint is accessible without authentication by default (at /-/os)

If you're running Datasette on a public server, this information would be visible to anyone

The information could help attackers understand your infrastructure setup

Suggestions:

You might want to:

Add authentication requirements to this endpoint

Make certain fields optional/configurable

Add a warning in the README that this exposes system information

Consider filtering out or redacting sensitive paths (replace usernames with ***, etc.)

Would you like me to add any of these protections to the plugin?

In this case I'm fine with the current behavior provided people understand the risks, so I told it:

Add the warning to the README

And then (because the warning it added was quite long):

Drop the bit about configuring the permission system, just have a short warning telling people to review what it exposes

And requested an extra README note:

Add a note that you can also see the output by running: datasette --get /-/os

Three last prompts:

Add uv.lock to gitignore
Drop Python 3.9 and add Python 3.14 - to the GitHub workflows, also min version in pyproject.toml
Bump to setup-python@v6

... and that was the project finished. I pushed it to GitHub, configured Trusted Publishing for it on PyPI and posted the 0.1 release, which ran this GitHub Actions publish.yml and deployed that release to datasette-os-info on PyPI.

Now that it's live you can try it out without even installing Datasette using a uv one-liner like this:

uv run --isolated \
  --with datasette-os-info \
  datasette --get /-/os

That's using the --get PATH CLI option to show what that path in the Datasette instance would return, as described in the Datasette documentation.

I've shared my full Claude Code transcript in a Gist.

A year ago I'd have been very impressed by this. Today I wasn't even particularly surprised that this worked - the coding agent pattern implemented by Claude Code is spectacularly effective when you combine it with pre-existing templates, and Datasette has been aroung for long enough now that plenty of examples of plugins have made it into the training data for the leading models.

Tags: plugins, projects, python, ai, datasette, generative-ai, llms, ai-assisted-programming, anthropic, claude, uv, coding-agents, claude-code

Python 3.14

2025-10-08T04:10:06+00:00

Python 3.14

This year's major Python version, Python 3.14, just made its first stable release!

As usual the what's new in Python 3.14 document is the best place to get familiar with the new release:

The biggest changes include template string literals, deferred evaluation of annotations, and support for subinterpreters in the standard library.

The library changes include significantly improved capabilities for introspection in asyncio, support for Zstandard via a new compression.zstd module, syntax highlighting in the REPL, as well as the usual deprecations and removals, and improvements in user-friendliness and correctness.

Subinterpreters look particularly interesting as a way to use multiple CPU cores to run Python code despite the continued existence of the GIL. If you're feeling brave and your dependencies cooperate you can also use the free-threaded build of Python 3.14 - now officially supported - to skip the GIL entirely.

A new major Python release means an older release hits the end of its support lifecycle - in this case that's Python 3.9. If you maintain open source libraries that target every supported Python versions (as I do) this means features introduced in Python 3.10 can now be depended on! What's new in Python 3.10 lists those - I'm most excited by structured pattern matching (the match/case statement) and the union type operator, allowing int | float | None as a type annotation in place of Optional[Union[int, float]].

If you use uv you can grab a copy of 3.14 using:

uv self update
uv python upgrade 3.14
uvx python@3.14

Or for free-threaded Python 3.1;:

uvx python@3.14t

The uv team wrote about their Python 3.14 highlights in their announcement of Python 3.14's availability via uv.

The GitHub Actions setup-python action includes Python 3.14 now too, so the following YAML snippet in will run tests on all currently supported versions:

strategy:
  matrix:
    python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
steps:
- uses: actions/setup-python@v6
  with:
    python-version: ${{ matrix.python-version }}

Full example here for one of my many Datasette plugin repos.

Tags: gil, open-source, python, github-actions, psf, uv

gpt-image-1-mini

2025-10-06T22:54:32+00:00

gpt-image-1-mini

OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model."

They released it very quietly - I didn't hear about this in the DevDay keynote but I later spotted it on the DevDay 2025 announcements page.

It wasn't instantly obvious to me how to use this via their API. I ended up vibe coding a Python CLI tool for it so I could try it out.

I dumped the plain text diff version of the commit to the OpenAI Python library titled feat(api): dev day 2025 launches into ChatGPT GPT-5 Thinking and worked with it to figure out how to use the new image model and build a script for it. Here's the transcript and the the openai_image.py script it wrote.

I had it add inline script dependencies, so you can run it with uv like this:

export OPENAI_API_KEY="$(llm keys get openai)"
uv run https://tools.simonwillison.net/python/openai_image.py "A pelican riding a bicycle"

It picked this illustration style without me specifying it:

(This is a very different test from my normal "Generate an SVG of a pelican riding a bicycle" since it's using a dedicated image generator, not having a text-based model try to generate SVG code.)

My tool accepts a prompt, and optionally a filename (if you don't provide one it saves to a filename like /tmp/image-621b29.png).

It also accepts options for model and dimensions and output quality - the --help output lists those, you can see that here.

OpenAI's pricing is a little confusing. The model page claims low quality images should cost around half a cent and medium quality around a cent and a half. It also lists an image token price of $8/million tokens. It turns out there's a default "high" quality setting - most of the images I've generated have reported between 4,000 and 6,000 output tokens, which costs between 3.2 and 4.8 cents.

One last demo, this time using --quality low:

 uv run https://tools.simonwillison.net/python/openai_image.py \
  'racoon eating cheese wearing a top hat, realistic photo' \
  /tmp/racoon-hat-photo.jpg \
  --size 1024x1024 \
  --output-format jpeg \
  --quality low

This saved the following:

And reported this to standard error:

{
  "background": "opaque",
  "created": 1759790912,
  "generation_time_in_s": 20.87331541599997,
  "output_format": "jpeg",
  "quality": "low",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 17,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 17
    },
    "output_tokens": 272,
    "total_tokens": 289
  }
}

This took 21s, but I'm on an unreliable conference WiFi connection so I don't trust that measurement very much.

272 output tokens = 0.2 cents so this is much closer to the expected pricing from the model page.

Tags: python, tools, ai, openai, generative-ai, uv, text-to-image, pelican-riding-a-bicycle, vibe-coding

Rich Pixels

2025-09-02T11:05:23+00:00

Rich Pixels

Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks.

Here's the key trick - it renders Unicode ▄ (U+2584, "lower half block") characters after setting a foreground and background color for the two pixels it needs to display.

I got GPT-5 to vibe code up a show_image.py terminal command which resizes the provided image to fit the width and height of the current terminal and displays it using Rich Pixels. That script is here, you can run it with uv like this:

uv run https://tools.simonwillison.net/python/show_image.py \
  image.jpg

Here's what I got when I ran it against my V&A East Storehouse photo from this post:

Tags: ascii-art, cli, python, unicode, ai, generative-ai, llms, uv, vibe-coding, gpt-5, rich

Static Sites with Python, uv, Caddy, and Docker

2025-08-24T08:51:30+00:00

Static Sites with Python, uv, Caddy, and Docker

Nik Kantar documents his Docker-based setup for building and deploying mostly static web sites in line-by-line detail.

I found this really useful. The Dockerfile itself without comments is just 8 lines long:

FROM ghcr.io/astral-sh/uv:debian AS build
WORKDIR /src
COPY . .
RUN uv python install 3.13
RUN uv run --no-dev sus
FROM caddy:alpine
COPY Caddyfile /etc/caddy/Caddyfile
COPY --from=build /src/output /srv/

He also includes a Caddyfile that shows how to proxy a subset of requests to the Plausible analytics service.

The static site is built using his sus package for creating static URL redirecting sites, but would work equally well for another static site generator you can install and run with uv run.

Nik deploys his sites using Coolify, a new-to-me take on the self-hosting alternative to Heroku/Vercel pattern which helps run multiple sites on a collection of hosts using Docker containers.

A bunch of the Hacker News comments dismissed this as over-engineering. I don't think that criticism is justified - given Nik's existing deployment environment I think this is a lightweight way to deploy static sites in a way that's consistent with how everything else he runs works already.

More importantly, the world needs more articles like this that break down configuration files and explain what every single line of them does.

Via Hacker News

Tags: python, docker, uv

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

2025-08-19T23:39:19+00:00

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

As promised in their August 4th release of the Qwen image generation model, Qwen have now followed it up with a separate model, Qwen-Image-Edit, which can take an image and a prompt and return an edited version of that image.

Ivan Fioravanti upgraded his macOS qwen-image-mps tool (previously) to run the new model via a new edit command. Since it's now on PyPI you can run it directly using uvx like this:

uvx qwen-image-mps edit -i pelicans.jpg \
  -p 'Give the pelicans rainbow colored plumage' -s 10

Be warned... it downloads a 54GB model file (to ~/.cache/huggingface/hub/models--Qwen--Qwen-Image-Edit) and appears to use all 64GB of my system memory - if you have less than 64GB it likely won't work, and I had to quit almost everything else on my system to give it space to run. A larger machine is almost required to use this.

I fed it this image:

The following prompt:

Give the pelicans rainbow colored plumage

And told it to use just 10 inference steps - the default is 50, but I didn't want to wait that long.

It still took nearly 25 minutes (on a 64GB M2 MacBook Pro) to produce this result:

To get a feel for how much dropping the inference steps affected things I tried the same prompt with the new "Image Edit" mode of Qwen's chat.qwen.ai, which I believe uses the same model. It gave me a result much faster that looked like this:

Update: I left the command running overnight without the -s 10 option - so it would use all 50 steps - and my laptop took 2 hours and 59 minutes to generate this image, which is much more photo-realistic and similar to the one produced by Qwen's hosted model:

Marko Simic reported that:

50 steps took 49min on my MBP M4 Max 128GB

Tags: macos, python, ai, generative-ai, uv, qwen, text-to-image, ivan-fioravanti

TIL: Running a gpt-oss eval suite against LM Studio on a Mac

2025-08-17T03:46:21+00:00

TIL: Running a gpt-oss eval suite against LM Studio on a Mac

The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations.

I decided to try and run that eval suite on my own MacBook Pro, against gpt-oss-20b running inside of LM Studio.

TLDR: once I had the model running inside LM Studio with a longer than default context limit, the following incantation ran an eval suite in around 3.5 hours:

mkdir /tmp/aime25_openai
OPENAI_API_KEY=x \
  uv run --python 3.13 --with 'gpt-oss[eval]' \
  python -m gpt_oss.evals \
  --base-url http://localhost:1234/v1 \
  --eval aime25 \
  --sampler chat_completions \
  --model openai/gpt-oss-20b \
  --reasoning-effort low \
  --n-threads 2

My new TIL breaks that command down in detail and walks through the underlying eval - AIME 2025, which asks 30 questions (8 times each) that are defined using the following format:

{"question": "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.", "answer": "70"}

Tags: python, ai, til, openai, generative-ai, local-llms, llms, evals, uv, lm-studio, gpt-oss

pyx: a Python-native package registry, now in Beta

2025-08-13T18:36:51+00:00

pyx: a Python-native package registry, now in Beta

Since its first release, the single biggest question around the uv Python environment management tool has been around Astral's business model: Astral are a VC-backed company and at some point they need to start making real revenue.

Back in September Astral founder Charlie Marsh said the following:

I don't want to charge people money to use our tools, and I don't want to create an incentive structure whereby our open source offerings are competing with any commercial offerings (which is what you see with a lost of hosted-open-source-SaaS business models).

What I want to do is build software that vertically integrates with our open source tools, and sell that software to companies that are already using Ruff, uv, etc. Alternatives to things that companies already pay for today.

An example of what this might look like (we may not do this, but it's helpful to have a concrete example of the strategy) would be something like an enterprise-focused private package registry. [...]

It looks like those plans have become concrete now! From today's announcement:

TL;DR: pyx is a Python-native package registry --- and the first piece of the Astral platform, our next-generation infrastructure for the Python ecosystem.

We think of pyx as an optimized backend for uv: it's a package registry, but it also solves problems that go beyond the scope of a traditional "package registry", making your Python experience faster, more secure, and even GPU-aware, both for private packages and public sources (like PyPI and the PyTorch index).

pyx is live with our early partners, including Ramp, Intercom, and fal [...]

This looks like a sensible direction to me, and one that stays true to Charlie's promises to carefully design the incentive structure to avoid corrupting the core open source project that the Python community is coming to depend on.

Via @charliermarsh

Tags: open-source, packaging, python, uv, astral, charlie-marsh

qwen-image-mps

2025-08-11T06:19:02+00:00

qwen-image-mps

Ivan Fioravanti built this Python CLI script for running the Qwen/Qwen-Image image generation model on an Apple silicon Mac, optionally using the Qwen-Image-Lightning LoRA to dramatically speed up generation.

Ivan has tested it this on 512GB and 128GB machines and it ran really fast - 42 seconds on his M3 Ultra. I've run it on my 64GB M2 MacBook Pro - after quitting almost everything else - and it just about manages to output images after pegging my GPU (fans whirring, keyboard heating up) and occupying 60GB of my available RAM. With the LoRA option running the script to generate an image took 9m7s on my machine.

Ivan merged my PR adding inline script dependencies for uv which means you can now run it like this:

uv run https://raw.githubusercontent.com/ivanfioravanti/qwen-image-mps/refs/heads/main/qwen-image-mps.py \
-p 'A vintage coffee shop full of raccoons, in a neon cyberpunk city' -f

The first time I ran this it downloaded the 57.7GB model from Hugging Face and stored it in my ~/.cache/huggingface/hub/models--Qwen--Qwen-Image directory. The -f option fetched an extra 1.7GB Qwen-Image-Lightning-8steps-V1.0.safetensors file to my working directory that sped up the generation.

Here's the resulting image:

Via @ivanfioravanti

Tags: macos, python, ai, generative-ai, uv, qwen, text-to-image, ai-in-china, ivan-fioravanti

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

2025-07-31T19:45:36+00:00

Qwen just released their sixth model(!) of this July called Qwen3-Coder-30B-A3B-Instruct - listed as Qwen3-Coder-Flash in their chat.qwen.ai interface.

It's 30.5B total parameters with 3.3B active at any one time. This means it will fit on a 64GB Mac - and even a 32GB Mac if you quantize it - and can run really fast thanks to that smaller set of active parameters.

It's a non-thinking model that is specially trained for coding tasks.

This is an exciting combination of properties: optimized for coding performance and speed and small enough to run on a mid-tier developer laptop.

Trying it out with LM Studio and Open WebUI

I like running models like this using Apple's MLX framework. I ran GLM-4.5 Air the other day using the mlx-lm Python library directly, but this time I decided to try out the combination of LM Studio and Open WebUI.

(LM Studio has a decent interface built in, but I like the Open WebUI one slightly more.)

I installed the model by clicking the "Use model in LM Studio" button on LM Studio's qwen/qwen3-coder-30b page. It gave me a bunch of options:

I chose the 6bit MLX model, which is a 24.82GB download. Other options include 4bit (17.19GB) and 8bit (32.46GB). The download sizes are roughly the same as the amount of RAM required to run the model - picking that 24GB one leaves 40GB free on my 64GB machine for other applications.

Then I opened the developer settings in LM Studio (the green folder icon) and turned on "Enable CORS" so I could access it from a separate Open WebUI instance.

Now I switched over to Open WebUI. I installed and ran it using uv like this:

uvx --python 3.11 open-webui serve

Then navigated to http://localhost:8080/ to access the interface. I opened their settings and configured a new "Connection" to LM Studio:

That needs a base URL of http://localhost:1234/v1 and a key of anything you like. I also set the optional prefix to lm just in case my Ollama installation - which Open WebUI detects automatically - ended up with any duplicate model names.

Having done all of that, I could select any of my LM Studio models in the Open WebUI interface and start running prompts.

A neat feature of Open WebUI is that it includes an automatic preview panel, which kicks in for fenced code blocks that include SVG or HTML:

Here's the exported transcript for "Generate an SVG of a pelican riding a bicycle". It ran at almost 60 tokens a second!

Implementing Space Invaders

I tried my other recent simple benchmark prompt as well:

Write an HTML and JavaScript page implementing space invaders

I like this one because it's a very short prompt that acts as shorthand for quite a complex set of features. There's likely plenty of material in the training data to help the model achieve that goal but it's still interesting to see if they manage to spit out something that works first time.

The first version it gave me worked out of the box, but was a little too hard - the enemy bullets move so fast that it's almost impossible to avoid them:

You can try that out here.

I tried a follow-up prompt of "Make the enemy bullets a little slower". A system like Claude Artifacts or Claude Code implements tool calls for modifying files in place, but the Open WebUI system I was using didn't have a default equivalent which means the model had to output the full file a second time.

It did that, and slowed down the bullets, but it made a bunch of other changes as well, shown in this diff. I'm not too surprised by this - asking a 25GB local model to output a lengthy file with just a single change is quite a stretch.

Here's the exported transcript for those two prompts.

Running LM Studio models with mlx-lm

LM Studio stores its models in the ~/.cache/lm-studio/models directory. This means you can use the mlx-lm Python library to run prompts through the same model like this:

uv run --isolated --with mlx-lm mlx_lm.generate \
  --model ~/.cache/lm-studio/models/lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit \
  --prompt "Write an HTML and JavaScript page implementing space invaders" \
  -m 8192 --top-k 20 --top-p 0.8 --temp 0.7

Be aware that this will load a duplicate copy of the model into memory so you may want to quit LM Studio before running this command!

Accessing the model via my LLM tool

My LLM project provides a command-line tool and Python library for accessing large language models.

Since LM Studio offers an OpenAI-compatible API, you can configure LLM to access models through that API by creating or editing the ~/Library/Application\ Support/io.datasette.llm/extra-openai-models.yaml file:

zed ~/Library/Application\ Support/io.datasette.llm/extra-openai-models.yaml

I added the following YAML configuration:

- model_id: qwen3-coder-30b
  model_name: qwen/qwen3-coder-30b
  api_base: http://localhost:1234/v1
  supports_tools: true

Provided LM Studio is running I can execute prompts from my terminal like this:

llm -m qwen3-coder-30b 'A joke about a pelican and a cheesecake'

Why did the pelican refuse to eat the cheesecake?

Because it had a beak for dessert! 🥧🦜

(Or if you prefer: Because it was afraid of getting beak-sick from all that creamy goodness!)

(25GB clearly isn't enough space for a functional sense of humor.)

More interestingly though, we can start exercising the Qwen model's support for tool calling:

llm -m qwen3-coder-30b \
  -T llm_version -T llm_time --td \
  'tell the time then show the version'

Here we are enabling LLM's two default tools - one for telling the time and one for seeing the version of LLM that's currently installed. The --td flag stands for --tools-debug.

The output looks like this, debug output included:

Tool call: llm_time({})
  {
    "utc_time": "2025-07-31 19:20:29 UTC",
    "utc_time_iso": "2025-07-31T19:20:29.498635+00:00",
    "local_timezone": "PDT",
    "local_time": "2025-07-31 12:20:29",
    "timezone_offset": "UTC-7:00",
    "is_dst": true
  }

Tool call: llm_version({})
  0.26

The current time is:
- Local Time (PDT): 2025-07-31 12:20:29
- UTC Time: 2025-07-31 19:20:29

The installed version of the LLM is 0.26.

Pretty good! It managed two tool calls from a single prompt.

Sadly I couldn't get it to work with some of my more complex plugins such as llm-tools-sqlite. I'm trying to figure out if that's a bug in the model, the LM Studio layer or my own code for running tool prompts against OpenAI-compatible endpoints.

The month of Qwen

July has absolutely been the month of Qwen. The models they have released this month are outstanding, packing some extremely useful capabilities even into models I can run in 25GB of RAM or less on my own laptop.

If you're looking for a competent coding model you can run locally Qwen3-Coder-30B-A3B is a very solid choice.

Tags: ai, generative-ai, llms, ai-assisted-programming, llm, uv, qwen, pelican-riding-a-bicycle, llm-release, lm-studio, ai-in-china, space-invaders