Simon Willison's Weblog: annotated-release-notes

sqlite-utils 4.0rc1 adds migrations and nested transactions

2026-06-21T23:35:47+00:00

sqlite-utils is my combined Python library and CLI tool for working with SQLite databases. It provides an extensive set of higher-level operations on top of Python's default sqlite3 package, including support for complex table transformations, automatic table creation from JSON data and a whole lot more.

I released sqlite-utils 4.0rc1, the first release candidate for sqlite-utils v4. The major version bump indicates some (minor) backwards incompatible changes, so I'm interested in having people try this out before I commit to a stable release.

New feature: migrations

There are two significant new features in this RC compared to the previous 4.0 alphas.

The first is support for database migrations. This isn't a completely new implementation - it's a slightly modified port of the sqlite-migrate package I released a few years ago. I think that package has proved itself over time, so I'm now ready to bundle it with sqlite-utils directly.

Here's what a set of migrations in a migrations.py file looks like:

from sqlite_utils import Database, Migrations

migrations = Migrations("creatures")

@migrations()
def create_table(db):
    db["creatures"].create(
        {"id": int, "name": str, "species": str},
        pk="id",
    )

@migrations()
def add_weight(db):
    db["creatures"].add_column("weight", float)

This defines a set of two migrations, one creating the creatures table and another adding a column to it.

You can then run those migrations either using Python:

db = Database("creatures.db")
migrations.apply(db)

Or with the command-line migrate command:

sqlite-utils migrate creatures.db migrations.py

The system is deliberately small: it doesn't provide reverse migrations, so any mistakes you make should be fixed by deploying a fresh migration to undo them.
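The core idea of a forward-only migration system, where each named migration is applied at most once, can be sketched with just the standard library. This is an illustration of the general technique and not the sqlite-utils implementation: the `_applied_migrations` table name and the `apply_migrations()` helper are hypothetical.

```python
import sqlite3


def create_table(conn):
    conn.execute(
        "CREATE TABLE creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT)"
    )


def add_weight(conn):
    conn.execute("ALTER TABLE creatures ADD COLUMN weight REAL")


# Order matters: migrations are applied first to last
MIGRATIONS = [create_table, add_weight]


def apply_migrations(conn, migrations):
    """Apply any migrations not yet recorded; return the names that ran."""
    # Hypothetical tracking table, one row per applied migration
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _applied_migrations (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM _applied_migrations")}
    newly_applied = []
    for migration in migrations:
        if migration.__name__ in applied:
            continue  # Already applied on an earlier run
        migration(conn)
        conn.execute(
            "INSERT INTO _applied_migrations (name) VALUES (?)",
            (migration.__name__,),
        )
        newly_applied.append(migration.__name__)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)
second_run = apply_migrations(conn, MIGRATIONS)
print(first_run)   # ['create_table', 'add_weight']
print(second_run)  # []
```

Because nothing is ever un-applied, fixing a mistake means appending another function to the end of the list.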

Its predecessor has been used by LLM and various other projects for several years, so I'm confident that the design is stable and works well.

The new migrations feature is documented here.

New feature: db.atomic() transactions

This feature is a lot less exercised than migrations, so it deserves more attention from testers.

Previously, sqlite-utils mostly left transaction management up to its users, via a with db.conn: construct that reused the sqlite3 mechanism directly.

SQLite supports nested transactions in the form of savepoints, so I wanted an abstraction that could make those as easy to use as possible.

I borrowed the terminology "atomic" from Django and Peewee. Here's what the new API looks like:

with db.atomic():
    db.table("dogs").insert({"id": 1, "name": "Cleo"}, pk="id")
    try:
        with db.atomic():
            db.table("dogs").insert({"id": 2, "name": "Pancakes"})
            raise ValueError("skip this one")
    except ValueError:
        pass
    db.table("dogs").insert({"id": 3, "name": "Marnie"})
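To show the savepoint behavior that a nested block like this relies on, here is a sketch using the standard library sqlite3 module directly. It assumes the inner block maps onto a SAVEPOINT that is rolled back when the exception escapes it. The savepoint name `inner_block` is made up for illustration, and the SQL that sqlite-utils actually issues may differ.

```python
import sqlite3

# isolation_level=None means sqlite3 will not open transactions implicitly,
# so BEGIN / SAVEPOINT / COMMIT can be issued explicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("BEGIN")  # Outer transaction
conn.execute("INSERT INTO dogs VALUES (1, 'Cleo')")

conn.execute("SAVEPOINT inner_block")  # Nested "transaction"
try:
    conn.execute("INSERT INTO dogs VALUES (2, 'Pancakes')")
    raise ValueError("skip this one")
except ValueError:
    # Undo only the work done since the savepoint, then discard it
    conn.execute("ROLLBACK TO inner_block")
    conn.execute("RELEASE inner_block")

conn.execute("INSERT INTO dogs VALUES (3, 'Marnie')")
conn.execute("COMMIT")

names = [row[0] for row in conn.execute("SELECT name FROM dogs ORDER BY id")]
print(names)  # ['Cleo', 'Marnie']
```

The outer transaction keeps Cleo and Marnie, while the insert made inside the savepoint is discarded.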

Backwards incompatible changes

The backwards incompatible changes in v4 were described in the alpha release notes. For 4.0a0:

Upsert operations now use SQLite's INSERT ... ON CONFLICT SET syntax on all SQLite versions later than 3.23.1. This is a very slight breaking change for apps that depend on the previous INSERT OR IGNORE followed by UPDATE behavior. (#652)

Python library users can opt-in to the previous implementation by passing use_old_upsert=True to the Database() constructor, see Alternative upserts using INSERT OR IGNORE.

Dropped support for Python 3.8, added support for Python 3.13. (#646)

sqlite-utils tui is now provided by the sqlite-utils-tui plugin. (#648)

Test suite now also runs against SQLite 3.23.1, the last version (from 2018-04-10) before the new INSERT ... ON CONFLICT SET syntax was added. (#654)
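The difference between the two upsert strategies can be illustrated in plain SQL using the standard library sqlite3 module. This is a sketch of the two SQLite techniques and not the exact SQL generated by sqlite-utils. The first form is written here with SQLite's full ON CONFLICT ... DO UPDATE SET spelling and needs a SQLite version newer than 3.23.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO dogs VALUES (1, 'Cleo', 4)")

# Newer style: a single statement that inserts or updates on a key conflict
conn.execute(
    "INSERT INTO dogs (id, name, age) VALUES (?, ?, ?) "
    "ON CONFLICT (id) DO UPDATE SET name = excluded.name, age = excluded.age",
    (1, "Cleo", 5),
)

# Older style: INSERT OR IGNORE to ensure the row exists, followed by UPDATE
conn.execute(
    "INSERT OR IGNORE INTO dogs (id, name, age) VALUES (?, ?, ?)",
    (2, "Pancakes", 3),
)
conn.execute(
    "UPDATE dogs SET name = ?, age = ? WHERE id = ?",
    ("Pancakes", 3, 2),
)

rows = conn.execute("SELECT id, name, age FROM dogs ORDER BY id").fetchall()
print(rows)  # [(1, 'Cleo', 5), (2, 'Pancakes', 3)]
```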

And for 4.0a1:

Breaking change: The db.table(table_name) method now only works with tables. To access a SQL view use db.view(view_name) instead. (#657)

The table.insert_all() and table.upsert_all() methods can now accept an iterator of lists or tuples as an alternative to dictionaries. The first item should be a list/tuple of column names. See Inserting data from a list or tuple iterator for details. (#672)

Breaking change: The default floating point column type has been changed from FLOAT to REAL, which is the correct SQLite type for floating point values. This affects auto-detected columns when inserting data. (#645)

Now uses pyproject.toml in place of setup.py for packaging. (#675)

Tables in the Python API now do a much better job of remembering the primary key and other schema details from when they were first created. (#655)

Breaking change: The table.convert() and sqlite-utils convert mechanisms no longer skip values that evaluate to False. Previously the --skip-false option was needed, this has been removed. (#542)

Breaking change: Tables created by this library now wrap table and column names in "double-quotes" in the schema. Previously they would use [square-braces]. (#677)

The --functions CLI argument now accepts a path to a Python file in addition to accepting a string full of Python code. It can also now be specified multiple times. (#659)

Breaking change: Type detection is now the default behavior for the insert and upsert CLI commands when importing CSV or TSV data. Previously all columns were treated as TEXT unless the --detect-types flag was passed. Use the new --no-detect-types flag to restore the old behavior. The SQLITE_UTILS_DETECT_TYPES environment variable has been removed. (#679)
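To picture the list/tuple input shape that insert_all() and upsert_all() now accept, here is a stand-alone sketch that converts that shape into the dictionary form. The `dicts_from_rows()` helper is hypothetical and is not how sqlite-utils implements the feature.

```python
def dicts_from_rows(rows):
    """Turn an iterator of lists/tuples, headed by column names, into dicts."""
    iterator = iter(rows)
    columns = next(iterator)  # The first item names the columns
    for row in iterator:
        yield dict(zip(columns, row))


data = [
    ("id", "name"),
    (1, "Cleo"),
    (2, "Pancakes"),
]
records = list(dicts_from_rows(data))
print(records)  # [{'id': 1, 'name': 'Cleo'}, {'id': 2, 'name': 'Pancakes'}]
```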

Try it out

You can install the new RC like this:

pip install sqlite-utils==4.0rc1

Or try the CLI version directly with uvx like this:

uvx --with sqlite-utils==4.0rc1 sqlite-utils --help

Come chat with us about it in the sqlite-utils Discord channel, or file any bugs in GitHub Issues.

Tags: migrations, projects, sqlite, sqlite-utils, annotated-release-notes

datasette 1.0a34

2026-06-16T21:31:24+00:00

Release: datasette 1.0a34

Quoting the release notes:

The big feature in this alpha is tools to insert, edit and delete rows within the Datasette interface. These features are available on table pages, and edit and delete are also available as action items on the row page.

The inspiration for this feature - which is long overdue - was Datasette Agent. I added SQL write support to that the other day which highlighted how absurd it was that you could insert and edit rows via the chat interface but not in the regular Datasette UI!

Tags: projects, datasette, annotated-release-notes

datasette-agent 0.3a0

2026-06-15T17:19:27+00:00

Release: datasette-agent 0.3a0

New tool, execute_write_sql, which requests user approval and then writes to a database - taking user permissions into account. #27

I added a mechanism for asking user approval in datasette agent 0.2a0. The new execute_write_sql tool can now prompt the user for all kinds of useful operations. Here's an example where I add some pelican sightings to my pelican_sightings table:

The new version also enhances the datasette agent chat terminal mode to support approvals, and adds several new options including --unsafe mode for auto-approving them:

datasette agent chat can execute tools that require user approval. #30

Three new options for datasette agent chat - --root to run as root, --yes to approve all ask user questions, and --unsafe for both.

Tools can now provide plain text alternatives to HTML, for display in the datasette agent chat CLI. #31

The datasette agent chat content.db -m gpt-5.5 --unsafe command can now be used to chat directly with a specific database and directly modify it through prompts like "create a notes table", "add a note about X" etc.

Tags: projects, ai, datasette, annotated-release-notes, generative-ai, llms, llm-tool-use, datasette-agent

datasette 1.0a33

2026-06-11T15:26:49+00:00

Release: datasette 1.0a33

This alpha is a significant step on the road to a stable 1.0, finally extending the ?_extra= pattern I introduced in Datasette 1.0a3 to cover queries and rows in addition to tables. That pattern is also now documented!

I wrote a whole lot more about the new release on the Datasette project blog: Datasette 1.0a33 with JSON extras in the API.

Because API explorer tools are almost free to build now I had Claude Fable 5 in Claude Code (for the plan) and GPT-5.5 xhigh in Codex Desktop (for the implementation) build me this custom extras API explorer to help demonstrate the feature:

Tags: projects, datasette, annotated-release-notes, ai-assisted-programming

datasette 1.0a32

2026-05-31T23:23:38+00:00

Release: datasette 1.0a32

A minor bugfix release. Fixes a bug with INSERT ... RETURNING queries via the new /db/-/execute-write endpoint and a bunch of base_url issues which showed up when I was experimenting with Service Workers yesterday.

Tags: datasette, annotated-release-notes

datasette 1.0a31

2026-05-29T03:32:02+00:00

Release: datasette 1.0a31

Another significant alpha release, with two new headline features.

Datasette now offers users with the necessary permissions the ability to both execute write queries against their database and to save stored queries (renamed from "canned queries") both privately and for use by other members of their Datasette instance.

There's more detail in SQL write queries and stored queries in Datasette 1.0a31 on the Datasette blog, which now has three posts introducing new features since the blog launched two weeks ago.

Here's an animated demo from the blog post showing how the new execute query interface lets people get started with templated insert/update/delete queries from tables they have permission to edit:

Tags: projects, sql, sqlite, datasette, annotated-release-notes

datasette 1.0a30

2026-05-24T23:52:37+00:00

Release: datasette 1.0a30

The big new feature in this alpha is a new customizable "Jump to..." menu, described in detail in The extensible "Jump to" menu in Datasette 1.0a30 on the Datasette blog. You can try it out by hitting / on latest.datasette.io - it looks like this:

The new jump_items_sql() plugin hook allows plugins to add their own items to the set that's searched by the menu.

Tags: projects, datasette, annotated-release-notes

llm 0.32a2

2026-05-12T17:45:07+00:00

Release: llm 0.32a2

A bunch of useful stuff in this LLM alpha, but the most important detail is this one:

Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5 class models. #1435

This means you can now see the summarized reasoning tokens when you run prompts against an OpenAI model, displayed in a different color to standard error. Use the -R or --hide-reasoning flags if you don't want to see that.

Tags: projects, ai, annotated-release-notes, openai, generative-ai, llms, llm

LLM 0.32a0 is a major backwards-compatible refactor

2026-04-29T19:01:47+00:00

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.

import llm

model = llm.get_model("gpt-5.5")
response = model.prompt("Capital of France?")
print(response.text())

This made sense when I started working on the library back in April 2023. A lot has changed since then!

LLM provides an abstraction over thousands of different models via its plugin system. The original abstraction - of text input that returns text output - was no longer able to represent everything I needed it to.

Over time LLM itself has grown attachments to handle image, audio, and video input, then schemas for outputting structured JSON, then tools for executing tool calls. Meanwhile LLMs kept evolving, adding reasoning support and the ability to return images and all kinds of other interesting capabilities.

LLM needs to evolve to better handle the diversity of input and output types that can be processed by today's frontier models.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.

Prompts as a sequence of messages

LLMs accept input as text, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns.

The first turn might look like this:

user: Capital of France?
assistant:

(The model then gets to fill out the reply from the assistant.)

But each subsequent turn needs to replay the entire conversation up to that point, as a sort of screenplay:

user: Capital of France?
assistant: Paris
user: Germany?
assistant:

Most of the JSON APIs from the major vendors follow this pattern. Here's what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {
        "role": "user",
        "content": "Capital of France?"
      },
      {
        "role": "assistant",
        "content": "Paris"
      },
      {
        "role": "user",
        "content": "Germany?"
      }
    ]
  }'
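The "replay the whole conversation" pattern can be sketched in a few lines of Python that build the same JSON payload turn by turn. The `build_payload()` helper is hypothetical. It only constructs the request body and does not call the API.

```python
import json


def build_payload(model, turns):
    """Build a chat completions style body from (role, content) pairs."""
    return {
        "model": model,
        "messages": [{"role": role, "content": content} for role, content in turns],
    }


# First turn: just the user's question
history = [("user", "Capital of France?")]
first = build_payload("gpt-5.5", history)

# Second turn: the earlier exchange is replayed, then the new question is added
history.append(("assistant", "Paris"))
history.append(("user", "Germany?"))
second = build_payload("gpt-5.5", history)

print(len(first["messages"]))   # 1
print(len(second["messages"]))  # 3
print(json.dumps(second, indent=2))
```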

Prior to 0.32, LLM modeled these as conversations:

model = llm.get_model("gpt-5.5")

conversation = model.conversation()
r1 = conversation.prompt("Capital of France?")
print(r1.text())
# Outputs "Paris"

r2 = conversation.prompt("Germany?")
print(r2.text())
# Outputs "Berlin"

This worked if you were building a conversation with the model from scratch, but it didn't provide a way to feed in a previous conversation from the start. This made tasks like building an emulation of the OpenAI chat completions API much harder than they should have been.

The llm CLI tool worked around this through a custom mechanism for persisting and inflating conversations using SQLite, but that never became a stable part of the LLM API - and there are many places you might want to use the Python library without committing to SQLite as the storage layer.

The new alpha now supports this:

import llm
from llm import user, assistant

model = llm.get_model("gpt-5.5")

response = model.prompt(messages=[
    user("Capital of France?"),
    assistant("Paris"),
    user("Germany?"),
])
print(response.text())

The llm.user() and llm.assistant() functions are new builder functions designed to be used within that messages=[] array.

The previous prompt= option still works, but LLM upgrades it to a single-item messages array behind the scenes.

You can also now reply to a response, as an alternative to building a conversation:

response2 = response.reply("How about Hungary?")
print(response2) # Default __str__() calls .text()

Streaming parts

The other major new interface in the alpha concerns streaming results back from a prompt.

Previously, LLM supported streaming like this:

response = model.prompt("Generate an SVG of a pelican riding a bicycle")
for chunk in response:
    print(chunk, end="")

Or this async variant:

import asyncio
import llm

model = llm.get_async_model("gpt-5.5")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")

async def run():
    async for chunk in response:
        print(chunk, end="", flush=True)

asyncio.run(run())

Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.

Some models can even execute tools on the server-side, for example OpenAI's code interpreter tool or Anthropic's web search. This means the results from the model can combine text, tool calls, tool outputs and other formats.

Multi-modal output models are starting to emerge too, which can return images or even snippets of audio intermixed into that streaming response.

The new LLM alpha models these as a stream of typed message parts. Here's what that looks like as a Python API consumer:

import asyncio
import llm

model = llm.get_model("gpt-5.5")
prompt = "invent 3 cool dogs, first talk about your motivations"

def describe_dog(name: str, bio: str) -> str:
    """Record the name and biography of a hypothetical dog."""
    return f"{name}: {bio}"

def sync_example():
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    for event in response.stream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\nTool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

async def async_example():
    model = llm.get_async_model("gpt-5.5")
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    async for event in response.astream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\nTool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

sync_example()
asyncio.run(async_example())

Sample output (from just the first sync example):

My motivation: create three memorable dogs with distinct “cool” styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.
Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."}
Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."}
Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks. Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."}

At the end of the response you can call response.execute_tool_calls() to actually run the functions that were requested, or send a response.reply() to have those tools called and their return values sent back to the model:

print(response.reply("Tell me about the dogs"))

This new mechanism for streaming different token types means the CLI tool can now display "thinking" text in a different color from the text in the final response. The thinking text goes to stderr so it won't affect results that are piped into other tools.

This example uses Claude Sonnet 4.6 (with an updated streaming event version of the llm-anthropic plugin) as Anthropic's models return their reasoning text as part of the response:

llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' \
  -o thinking_display 1

You can suppress the output of reasoning tokens using the new -R/--no-reasoning flag. Surprisingly that ended up being the only CLI-facing change in this release.
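The stdout/stderr split described above can be pictured with a small stand-alone sketch. The `Event` dataclass and `render()` function here are hypothetical stand-ins rather than LLM's own classes, and the "reasoning" event type name is an assumption. Only the "text" type appears in the example code earlier.

```python
import io
import sys
from dataclasses import dataclass


@dataclass
class Event:
    type: str
    chunk: str


def render(events, out=sys.stdout, err=sys.stderr, hide_reasoning=False):
    """Write text chunks to out and reasoning chunks to err."""
    for event in events:
        if event.type == "text":
            out.write(event.chunk)
        elif event.type == "reasoning" and not hide_reasoning:
            err.write(event.chunk)


events = [
    Event("reasoning", "Thinking about dogs... "),
    Event("text", "Nova "),
    Event("reasoning", "one more. "),
    Event("text", "and Mochi."),
]

# StringIO objects stand in for the real stdout and stderr streams
out, err = io.StringIO(), io.StringIO()
render(events, out, err)
print(out.getvalue())  # Nova and Mochi.
```

Anything piped from the out stream sees only the final text, whichever way the reasoning chunks are interleaved.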

A mechanism for serializing and deserializing responses

As mentioned earlier, LLM has quite inflexible code at the moment for persisting conversations to SQLite. I've added a new mechanism in 0.32a0 that should provide Python API users a way to roll their own alternative:

serializable = response.to_dict()
# serializable is a JSON-style dictionary
# store it anywhere you like, then inflate it:
response = Response.from_dict(serializable)

The dictionary this returns is actually a TypedDict defined in the new llm/serialization.py module.
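Since the dictionary is JSON-style, one storage option is a plain JSON file. The sketch below uses a placeholder dictionary in place of a real response.to_dict() result. The actual keys are defined by the TypedDict and are not reproduced here.

```python
import json
import tempfile
from pathlib import Path

# Placeholder only: stands in for the output of response.to_dict()
serializable = {"placeholder": "stand-in for response.to_dict() output"}

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "response.json"
    path.write_text(json.dumps(serializable))  # Store it
    restored = json.loads(path.read_text())    # Load it back

print(restored == serializable)  # True
```

The restored dictionary is what would then be handed to Response.from_dict().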

What's next?

I'm releasing this as an alpha so I can upgrade various plugins and exercise the new design in real world environments for a few days. I expect the stable 0.32 release will be very similar to this alpha, unless alpha testing reveals some design flaw in the way I've put this all together.

There's one remaining large task: I'd like to redesign the SQLite logging system to better capture the more finely grained details that are returned by this new abstraction.

Ideally I'd like to model this as a graph, to best support situations like an OpenAI-style chat completions API where the same conversations are constantly extended and then repeated with every prompt. I want to be able to store those without duplicating them in the database.

I'm undecided as to whether that should be a feature in 0.32 or I should hold it for 0.33.

Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm

datasette 1.0a27

2026-04-15T23:16:34+00:00

Release: datasette 1.0a27

Two major changes in this new Datasette alpha. I covered the first of those in detail yesterday - Datasette no longer uses Django-style CSRF form tokens, instead using modern browser headers as described by Filippo Valsorda.

The second big change is that Datasette now fires a new RenameTableEvent any time a table is renamed during a SQLite transaction. This is useful because some plugins (like datasette-comments) attach additional data to table records by name, so a renamed table requires them to react in appropriate ways.

Here are the rest of the changes in the alpha:

New actor= parameter for datasette.client methods, allowing internal requests to be made as a specific actor. This is particularly useful for writing automated tests. (#2688)

New Database(is_temp_disk=True) option, used internally for the internal database. This helps resolve intermittent database locked errors caused by the internal database being in-memory as opposed to on-disk. (#2683) (#2684)

The /<database>/<table>/-/upsert API (docs) now rejects rows with null primary key values. (#1936)

Improved example in the API explorer for the /-/upsert endpoint (docs). (#1936)

The /<database>.json endpoint now includes an "ok": true key, for consistency with other JSON API responses.

call_with_supported_arguments() is now documented as a supported public API. (#2678)

Tags: python, datasette, annotated-release-notes

datasette-llm 0.1a1

2026-03-25T21:24:31+00:00

Release: datasette-llm 0.1a1

New release of the base plugin that makes models from LLM available for use by other Datasette plugins such as datasette-enrichments-llm.

New register_llm_purposes() plugin hook and get_purposes() function for retrieving registered purpose strings. #1

One of the responsibilities of this plugin is to configure which models are used for which purposes, so you can say in one place "data enrichment uses GPT-5.4-nano but SQL query assistance happens using Sonnet 4.6", for example.

Plugins that depend on this can use model = await llm.model(purpose="enrichment") to indicate the purpose of the prompts they wish to execute against the model. Those plugins can now also use the new register_llm_purposes() hook to register those purpose strings, which means future plugins can list those purposes in one place to power things like an admin UI for assigning models to purposes.

Tags: plugins, datasette, annotated-release-notes, llm

datasette-files 0.1a2

2026-03-23T23:06:38+00:00

Release: datasette-files 0.1a2

The most interesting alpha of datasette-files yet, a new plugin which adds the ability to upload files directly into a Datasette instance. Here are the release notes in full:

Columns are now configured using the new column_types system from Datasette 1.0a26. #8

New file_actions plugin hook, plus ability to import an uploaded CSV/TSV file to a table. #10

UI for uploading multiple files at once via the new documented JSON upload API. #11

Thumbnails are now generated for image files and stored in an internal datasette_files_thumbnails table. #13

Tags: datasette, annotated-release-notes

Rodney v0.4.0

2026-02-17T23:02:33+00:00

My Rodney CLI tool for browser automation has attracted quite the flurry of PRs since I announced it last week. Here are the release notes for the just-released v0.4.0:

Errors now use exit code 2, which means exit code 1 is just for check failures. #15

New rodney assert command for running JavaScript tests, exit code 1 if they fail. #19

New directory-scoped sessions with --local/--global flags. #14

New reload --hard and clear-cache commands. #17

New rodney start --show option to make the browser window visible. Thanks, Antonio Cuni. #13

New rodney connect PORT command to debug an already-running Chrome instance. Thanks, Peter Fraenkel. #12

New RODNEY_HOME environment variable to support custom state directories. Thanks, Senko Rašić. #11

New --insecure flag to ignore certificate errors. Thanks, Jakub Zgoliński. #10

Windows support: avoid Setsid on Windows via build-tag helpers. Thanks, adm1neca. #18

Tests now run on windows-latest and macos-latest in addition to Linux.

I've been using Showboat to create demos of new features - here those are for rodney assert, rodney reload --hard, rodney exit codes, and rodney start --local.

The rodney assert command is pretty neat: you can now use Rodney to test a web app through multiple steps in a shell script that looks something like this (adapted from the README):

#!/bin/bash
set -euo pipefail

FAIL=0

check() {
    if ! "$@"; then
        echo "FAIL: $*"
        FAIL=1
    fi
}

rodney start
rodney open "https://example.com"
rodney waitstable

# Assert elements exist
check rodney exists "h1"

# Assert key elements are visible
check rodney visible "h1"
check rodney visible "#main-content"

# Assert JS expressions
check rodney assert 'document.title' 'Example Domain'
check rodney assert 'document.querySelectorAll("p").length' '2'

# Assert accessibility requirements
check rodney ax-find --role navigation

rodney stop

if [ "$FAIL" -ne 0 ]; then
    echo "Some checks failed"
    exit 1
fi
echo "All checks passed"

Tags: browsers, projects, testing, annotated-release-notes, rodney

Datasette 1.0a24

2026-01-29T17:21:51+00:00

New Datasette alpha this morning. Key new features:

Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to rows of data.

The recommended development environment for hacking on Datasette itself now uses uv. Crucially, you can clone Datasette and run uv run pytest to run the tests without needing to manually create a virtual environment or install dependencies first, thanks to the dev dependency group pattern.

A new ?_extra=render_cell parameter for both table and row JSON pages to return the results of executing the render_cell() plugin hook. This should unlock new JavaScript UI features in the future.

More details in the release notes. I also invested a bunch of work in eliminating flaky tests that were intermittently failing in CI - I think those are all handled now.

Tags: projects, python, datasette, annotated-release-notes, uv

shot-scraper 1.9

2025-12-29T22:33:13+00:00

New release of my shot-scraper CLI tool for taking screenshots and scraping websites with JavaScript from the terminal.

The shot-scraper har command has a new -x/--extract option which extracts all of the resources loaded by the page out to a set of files. This location can be controlled by the -o dir/ option. #184

Fixed the shot-scraper accessibility command for compatibility with the latest Playwright. #185

The new shot-scraper har -x https://simonwillison.net/ command is really neat. The inspiration was the digital forensics expedition I went on to figure out why Rob Pike got spammed. You can now perform a version of that investigation like this:

cd /tmp
shot-scraper har --wait 10000 'https://theaidigest.org/village?day=265' -x

Then dig around in the resulting JSON files in the /tmp/theaidigest-org-village folder.

Tags: projects, annotated-release-notes, shot-scraper

s3-credentials 0.17

2025-12-16T23:40:31+00:00

New release of my s3-credentials CLI tool for managing credentials needed to access just one S3 bucket. Here are the release notes in full:

New commands get-bucket-policy and set-bucket-policy. #91

New commands get-public-access-block and set-public-access-block. #92

New localserver command for starting a web server that makes time limited credentials accessible via a JSON API. #93

That s3-credentials localserver command (documented here) is a little obscure, but I found myself wanting something like that to help me test out a new feature I'm building to help create temporary Litestream credentials using Amazon STS.

Most of that new feature was built by Claude Code from the following starting prompt:

Add a feature s3-credentials localserver which starts a localhost weberver running (using the Python standard library stuff) on port 8094 by default but -p/--port can set a different port and otherwise takes an option that names a bucket and then takes the same options for read--write/read-only etc as other commands. It also takes a required --refresh-interval option which can be set as 5m or 10h or 30s. All this thing does is reply on / to a GET request with the IAM expiring credentials that allow access to that bucket with that policy for that specified amount of time. It caches internally the credentials it generates and will return the exact same data up until they expire (it also tracks expected expiry time) after which it will generate new credentials (avoiding dog pile effects if multiple requests ask at the same time) and return and cache those instead.
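The caching behavior that prompt asks for can be sketched as follows. This is my own illustration of the technique and not the code that Claude Code produced. The `parse_interval()` and `CredentialCache` names are hypothetical, and the `generate` callable stands in for the real call that would fetch expiring credentials.

```python
import re
import threading
import time


def parse_interval(value):
    """Convert strings like '30s', '5m' or '10h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"Invalid interval: {value!r}")
    number, unit = match.groups()
    return int(number) * {"s": 1, "m": 60, "h": 3600}[unit]


class CredentialCache:
    """Return the same credentials until they expire, then regenerate once."""

    def __init__(self, generate, lifetime_seconds, clock=time.monotonic):
        self._generate = generate
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = None

    def get(self):
        # Holding the lock while generating means concurrent callers wait for
        # one refresh instead of each triggering their own (the "dog pile")
        with self._lock:
            now = self._clock()
            if self._value is None or now >= self._expires_at:
                self._value = self._generate()
                self._expires_at = now + self._lifetime
            return self._value


# Usage with a fake clock and a fake credential generator
calls = []
fake_now = [0.0]


def fake_generate():
    calls.append(1)
    return {"AccessKeyId": f"key-{len(calls)}"}


cache = CredentialCache(fake_generate, parse_interval("5m"), clock=lambda: fake_now[0])
first = cache.get()
second = cache.get()   # Served from the cache
fake_now[0] = 301.0    # Move past the five minute lifetime
third = cache.get()    # Regenerated
print(first, second, third)
```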

Tags: aws, projects, s3, ai, annotated-release-notes, s3-credentials, prompt-engineering, generative-ai, llms, coding-agents, claude-code

LLM 0.28

2025-12-12T20:20:14+00:00

I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:

New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317

When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309

Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316

Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313

Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315

The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318

That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:

git clone https://github.com/simonw/llm
cd llm
uv run pytest

The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment which means everything needed to run pytest is available without needing to add any extra commands.

Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm, uv

sqlite-utils 3.39

2025-11-24T18:59:14+00:00

I got a report of a bug in sqlite-utils concerning plugin installation - if you installed the package using uv tool install, further attempts to install plugins with sqlite-utils install X would fail, because uv doesn't bundle pip by default. I had the same bug with Datasette a while ago, and it turns out I forgot to apply the fix to sqlite-utils.

Since I was pushing a new dot-release I decided to integrate some of the non-breaking changes from the 4.0 alpha I released last night.

I tried to have Claude Code do the backporting for me:

create a new branch called 3.x starting with the 3.38 tag, then consult https://github.com/simonw/sqlite-utils/issues/688 and cherry-pick the commits it lists in the second comment, then review each of the links in the first comment and cherry-pick those as well. After each cherry-pick run the command "just test" to confirm the tests pass and fix them if they don't. Look through the commit history on main since the 3.38 tag to help you with this task.

This worked reasonably well - here's the terminal transcript. It successfully argued me out of two of the larger changes which would have added more complexity than I want in a small dot-release like this.

I still had to do a bunch of manual work to get everything up to scratch, which I carried out in this PR - including adding comments there and then telling Claude Code:

Apply changes from the review on this PR https://github.com/simonw/sqlite-utils/pull/689

Here's the transcript from that.

The release is now out with the following release notes:

Fixed a bug with sqlite-utils install when the tool had been installed using uv. (#687)

The --functions argument now optionally accepts a path to a Python file as an alternative to a string full of code, and can be specified multiple times - see Defining custom SQL functions. (#659)

sqlite-utils now requires Python 3.10 or higher.

Tags: projects, sqlite, sqlite-utils, annotated-release-notes, uv, coding-agents, claude-code

sqlite-utils 4.0a1 has several (minor) backwards incompatible changes

2025-11-24T14:52:34+00:00

I released a new alpha version of sqlite-utils last night - the 128th release of that package since I started building it back in 2018.

sqlite-utils is two things in one package: a Python library for conveniently creating and manipulating SQLite databases and a CLI tool for working with them in the terminal. Almost every feature provided by the package is available via both of those surfaces.

This is hopefully the last alpha before a 4.0 stable release. I use semantic versioning for this library, so the 4.0 version number indicates that there are backward incompatible changes that may affect code written against the 3.x line.

These changes are mostly very minor: I don't want to break any existing code if I can avoid it. I made it all the way to version 3.38 before I had to ship a major release and I'm sad I couldn't push that even further!

Here are the annotated release notes for 4.0a1.

Breaking change: The db.table(table_name) method now only works with tables. To access a SQL view use db.view(view_name) instead. (#657)

This change is for type hint enthusiasts. The Python library used to encourage accessing both SQL tables and SQL views through the db["name_of_table_or_view"] syntactic sugar - but tables and views have different interfaces since there's no way to handle a .insert(row) on a SQLite view. If you want clean type hints for your code you can now use the db.table(table_name) and db.view(view_name) methods instead.
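To see why tables and views need different interfaces, here's a minimal sketch using only Python's standard sqlite3 module rather than sqlite-utils itself. The creatures table and creature_names view are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table creatures (id integer primary key, name text)")
conn.execute("create view creature_names as select name from creatures")

# Inserting into the table works as expected
conn.execute("insert into creatures (name) values (?)", ("Cleo",))

# Inserting into the view fails: a SQLite view is read-only
# unless an INSTEAD OF trigger has been defined for it
try:
    conn.execute("insert into creature_names (name) values (?)", ("Pancakes",))
    view_insert_error = None
except sqlite3.OperationalError as ex:
    view_insert_error = str(ex)

print(view_insert_error)

# Reading from the view is fine
rows = conn.execute("select name from creature_names").fetchall()
print(rows)
```

A view object that never offers .insert() lets a type checker catch this mistake before the code runs.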

The table.insert_all() and table.upsert_all() methods can now accept an iterator of lists or tuples as an alternative to dictionaries. The first item should be a list/tuple of column names. See Inserting data from a list or tuple iterator for details. (#672)

A new feature, not a breaking change. I realized that supporting a stream of lists or tuples as an option for populating large tables would be a neat optimization over always dealing with dictionaries each of which duplicated the column names.

I had the idea for this one while walking the dog and built the first prototype by prompting Claude Code for web on my phone. Here's the prompt I used and the prototype report it created, which included a benchmark estimating how much of a performance boost could be had for different sizes of tables.
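The list-or-tuple input shape is easy to picture with a rough sketch. This one uses the standard sqlite3 module and a hypothetical insert_from_lists() helper. It is not how sqlite-utils implements the feature, just an illustration of "first item is the column names, everything after is a row".

```python
import sqlite3


def quote(name):
    # Double-quote an identifier, doubling any embedded double quotes
    return '"{}"'.format(name.replace('"', '""'))


def rows_as_lists():
    # First item is the column names, every later item is one row
    yield ["id", "name", "species"]
    yield [1, "Cleo", "dog"]
    yield [2, "Pancakes", "chicken"]


def insert_from_lists(conn, table, rows):
    # Hypothetical helper: consume the header once, then stream the rest
    rows = iter(rows)
    columns = next(rows)
    sql = "insert into {} ({}) values ({})".format(
        quote(table),
        ", ".join(quote(c) for c in columns),
        ", ".join("?" for _ in columns),
    )
    conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    'create table "creatures" ("id" integer primary key, "name" text, "species" text)'
)
insert_from_lists(conn, "creatures", rows_as_lists())
inserted = conn.execute(
    "select id, name, species from creatures order by id"
).fetchall()
print(inserted)
```

The column names are held once rather than repeated as dictionary keys on every row, which is where the potential saving for large tables comes from.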

Breaking change: The default floating point column type has been changed from FLOAT to REAL, which is the correct SQLite type for floating point values. This affects auto-detected columns when inserting data. (#645)

I was horrified to discover a while ago that I'd been creating SQLite columns called FLOAT but the correct type to use was REAL! This change fixes that. Previously the fix was to ask for tables to be created in strict mode.
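A quick sketch of why this went unnoticed for so long, using the standard sqlite3 module. In an ordinary (non-strict) table the declared type name only selects a type affinity, and any name containing "FLOA" gets REAL affinity. The table names here are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table old_style (weight FLOAT)")
conn.execute("create table new_style (weight REAL)")

# Insert the value as text so that column affinity has to do the conversion
for table in ("old_style", "new_style"):
    conn.execute("insert into {} (weight) values (?)".format(table), ("1.5",))

# typeof() reports the storage class SQLite actually used for each value
old_type = conn.execute("select typeof(weight) from old_style").fetchone()[0]
new_type = conn.execute("select typeof(weight) from new_style").fetchone()[0]
print(old_type, new_type)
```

Both columns store the value the same way. The difference shows up in STRICT tables, which only accept a fixed list of type names (INT, INTEGER, REAL, TEXT, BLOB and ANY) and so reject FLOAT.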

Now uses pyproject.toml in place of setup.py for packaging. (#675)

As part of this I also figured out recipes for using uv as a development environment for the package, which are now baked into the Justfile.

Tables in the Python API now do a much better job of remembering the primary key and other schema details from when they were first created. (#655)

This one is best explained in the issue.

Breaking change: The table.convert() and sqlite-utils convert mechanisms no longer skip values that evaluate to False. Previously the --skip-false option was needed, this has been removed. (#542)

Another change which I would have made earlier but, since it introduces a minor behavior change to an existing feature, I reserved it for the 4.0 release.
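Here's a tiny sketch of the difference in plain Python. The convert_column() function and its skip_false flag are illustrative stand-ins, not the library's actual code.

```python
def convert_column(values, fn, skip_false):
    # skip_false=True mimics the old behavior: falsy values are left untouched
    return [value if (skip_false and not value) else fn(value) for value in values]


def describe(value):
    return "value={!r}".format(value)


values = ["Cleo", "", None, 0]

old_behavior = convert_column(values, describe, skip_false=True)
new_behavior = convert_column(values, describe, skip_false=False)
print(old_behavior)
print(new_behavior)
```

With the old behavior the empty string, None and 0 never reach the conversion function, which is surprising if that function is meant to handle them.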

Breaking change: Tables created by this library now wrap table and column names in "double-quotes" in the schema. Previously they would use [square-braces]. (#677)

Back in 2018 when I started this project I was new to working in-depth with SQLite and incorrectly concluded that the correct way to create tables and columns named after reserved words was like this:

create table [my table] (
  [id] integer primary key,
  [key] text
)

That turned out to be a non-standard SQL syntax which the SQLite documentation describes like this:

A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.

Unfortunately I baked it into the library early on and it's been polluting the world with weirdly escaped table and column names ever since!

I've finally fixed that, with the help of Claude Code which took on the mind-numbing task of updating hundreds of existing tests that asserted against the generated schemas.

The above example table schema now looks like this:

create table "my table" (
  "id" integer primary key,
  "key" text
)

This may seem like a pretty small change but I expect it to cause a fair amount of downstream pain purely in terms of updating tests that work against tables created by sqlite-utils!
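The reason this affects tests is that SQLite keeps the CREATE TABLE text in its schema table, quoting style included. Here's a minimal sketch with the standard sqlite3 module. The quote_identifier() helper is my own illustration of standard double-quote escaping, not a function from sqlite-utils.

```python
import sqlite3


def quote_identifier(name):
    # Standard SQL quoting: wrap in double quotes, double any embedded quotes
    return '"{}"'.format(name.replace('"', '""'))


conn = sqlite3.connect(":memory:")

# Old style: square brackets
conn.execute("create table [old table] ([id] integer primary key, [key] text)")

# New style: double quotes
conn.execute(
    "create table {} ({} integer primary key, {} text)".format(
        quote_identifier("new table"),
        quote_identifier("id"),
        quote_identifier("key"),
    )
)

# The stored schema keeps whichever quoting style was used at creation time
schemas = dict(
    conn.execute("select name, sql from sqlite_master where type = 'table'").fetchall()
)
for name, sql in schemas.items():
    print(name, "->", sql)
```

Any test that compares the stored schema against a hard-coded string containing [square-braces] will need updating.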

The --functions CLI argument now accepts a path to a Python file in addition to accepting a string full of Python code. It can also now be specified multiple times. (#659)

I made this change first in LLM and decided to bring it to sqlite-utils for consistency between the two tools.

Breaking change: Type detection is now the default behavior for the insert and upsert CLI commands when importing CSV or TSV data. Previously all columns were treated as TEXT unless the --detect-types flag was passed. Use the new --no-detect-types flag to restore the old behavior. The SQLITE_UTILS_DETECT_TYPES environment variable has been removed. (#679)

One last minor ugliness that I waited for a major version bump to fix.
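For a feel of what type detection means for CSV data, here's a rough sketch in plain Python. The detect_type() function is a simplified assumption of how such detection could work, not the algorithm sqlite-utils uses.

```python
import csv
import io


def detect_type(values):
    # Empty strings are ignored so sparse columns can still be typed
    values = [v for v in values if v != ""]
    if not values:
        return "TEXT"
    # Try the narrowest type first, falling back to TEXT
    for sql_type, caster in (("INTEGER", int), ("REAL", float)):
        try:
            for v in values:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


csv_text = "id,name,weight\n1,Cleo,51.5\n2,Pancakes,2\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
detected = {
    column: detect_type([row[column] for row in rows]) for column in rows[0]
}
print(detected)
```

Without detection every one of those columns would be created as TEXT, since CSV has no way to say that "51.5" is a number.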

Update: Now that the embargo has lifted I can reveal that a substantial amount of the work on this release was performed using a preview version of Anthropic's new Claude Opus 4.5 model. Here's the Claude Code transcript for the work to implement the ability to use an iterator over lists instead of dictionaries for bulk insert and upsert operations.

Tags: projects, sqlite, sqlite-utils, annotated-release-notes, ai-assisted-programming, coding-agents, claude-code

Datasette 1.0a22

2025-11-13T23:04:18+00:00

Datasette 1.0a22

New Datasette 1.0 alpha, adding some small features we needed to properly integrate the new permissions system with Datasette Cloud:

datasette serve --default-deny option for running Datasette configured to deny all permissions by default. (#2592)

datasette.is_client() method for detecting if code is executing inside a datasette.client request. (#2594)

Plus a developer experience improvement for plugin authors:

datasette.pm property can now be used to register and unregister plugins in tests. (#2595)

Tags: projects, datasette, datasette-cloud, annotated-release-notes

Open redirect endpoint in Datasette prior to 0.65.2 and 1.0a21

2025-11-05T23:11:17+00:00

Open redirect endpoint in Datasette prior to 0.65.2 and 1.0a21

This GitHub security advisory covers two new releases of Datasette that I shipped today, both addressing the same open redirect issue with a fix by James Jefferies.

Datasette 0.65.2 fixes the bug and also adds Python 3.14 support and a datasette publish cloudrun fix.

Datasette 1.0a21 also has that Cloud Run fix and two other small new features:

New datasette --get /path --headers option for inspecting the headers returned by a path. (#2578)

New datasette.client.get(..., skip_permission_checks=True) parameter to bypass permission checks when making requests using the internal client. (#2583)

I decided to include the Cloud Run deployment fix so anyone with Datasette instances deployed to Cloud Run can update them with the new patched versions.

Tags: security, datasette, cloudrun, annotated-release-notes

A new SQL-powered permissions system in Datasette 1.0a20

2025-11-04T21:34:42+00:00

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.

Understanding the permissions system

Datasette's permissions system exists to answer the following question:

Is this actor allowed to perform this action, optionally against this particular resource?

An actor is usually a user, but might also be an automation operating via the Datasette API.

An action is a thing they need to do - things like view-table, execute-sql, insert-row.

A resource is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.

Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data.

Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication. Datasette 1.0 also includes a write API with a need to configure who can insert, update, and delete rows or create new tables.

Actors can be authenticated in a number of different ways provided by plugins using the actor_from_request() plugin hook. datasette-auth-passwords, datasette-auth-github and datasette-auth-existing-cookies are examples of authentication plugins.

Permissions systems need to be able to efficiently list things

The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result.

This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view?

I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables.

Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is really good at filtering large numbers of records.

The new permission_resources_sql() plugin hook

The biggest change in the new release is that I've replaced the previous permission_allowed(actor, action, resource) plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new permission_resources_sql(actor, action) plugin hook.

Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against.

Here's an example, lifted from the documentation:

from datasette import hookimpl
from datasette.permissions import PermissionSQL


@hookimpl
def permission_resources_sql(datasette, actor, action):
    if action != "view-table":
        return None
    if not actor or actor.get("id") != "alice":
        return None

    return PermissionSQL(
        sql="""
            SELECT
                'accounting' AS parent,
                'sales' AS child,
                1 AS allow,
                'alice can view accounting/sales' AS reason
        """,
    )

This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database.

The PermissionSQL object should always return four columns: a parent, child, allow (1 or 0), and a reason string for debugging.

When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against the internal catalog tables and efficiently lists all the resources the actor can access.

This query can then be limited or paginated to avoid loading too many results at once.
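The combining step can be sketched with the standard sqlite3 module. This is a heavily simplified illustration, not Datasette's actual query: the catalog_tables table stands in for the internal catalog, the two SQL fragments stand in for what two plugins might return, and deny rules and global rules are ignored here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table catalog_tables (database_name text, table_name text);
    insert into catalog_tables values
        ('accounting', 'sales'),
        ('accounting', 'payroll'),
        ('content', 'articles');
    """
)

# Each fragment stands in for the SQL one plugin might return
plugin_sql = [
    "select 'accounting' as parent, 'sales' as child, 1 as allow, "
    "'alice can view accounting/sales' as reason",
    "select 'content' as parent, null as child, 1 as allow, "
    "'content is open to alice' as reason",
]
rules_sql = " union all ".join(plugin_sql)

# One query: combine every rule, then join against the catalog.
# A null child means the rule covers every table in that database.
query = """
    with rules as ({rules})
    select distinct c.database_name, c.table_name
    from catalog_tables c
    join rules r
      on r.parent = c.database_name
     and (r.child is null or r.child = c.table_name)
    where r.allow = 1
    order by c.database_name, c.table_name
    limit :limit offset :offset
""".format(rules=rules_sql)

visible = conn.execute(query, {"limit": 10, "offset": 0}).fetchall()
print(visible)
```

The list of visible tables comes from a single query however many tables exist, and the limit and offset parameters give pagination for free.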

Hierarchies, plugins, vetoes, and restrictions

Datasette has several additional requirements that make the permissions system more complicated.

Datasette permissions can optionally act against a two-level hierarchy. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in every database in that Datasette instance.

Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example.

Datasette currently has ten default actions but plugins that add additional features can register new actions to better participate in the permission system.

Datasette's permission system has a mechanism to veto permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.

Finally, Datasette includes a mechanism for applying additional restrictions to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are described in more detail in the documentation.

That's a lot of different moving parts for the new implementation to cover.
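The hierarchy and veto rules described above can be sketched in a few lines of plain Python. This is my own simplified model, not Datasette's implementation: rules are (parent, child, allow) tuples with None as a wildcard, and I'm assuming that a resource with no matching rule is denied.

```python
def is_allowed(rules, database, table):
    """Resolve a list of (parent, child, allow) rules for one table."""
    levels = [
        # Most specific first: table, then database, then global
        [r for r in rules if r[0] == database and r[1] == table],
        [r for r in rules if r[0] == database and r[1] is None],
        [r for r in rules if r[0] is None and r[1] is None],
    ]
    for matching in levels:
        if matching:
            # Within one level a single deny overrides any allows
            return all(allow for _, _, allow in matching)
    return False  # assumption: no matching rule means deny


rules = [
    (None, None, 1),             # allow everywhere
    ("accounting", None, 0),     # deny the whole accounting database
    ("accounting", "sales", 1),  # but allow one table inside it
]

sales = is_allowed(rules, "accounting", "sales")
payroll = is_allowed(rules, "accounting", "payroll")
articles = is_allowed(rules, "content", "articles")

# Adding a table-level deny vetoes the table-level allow
vetoed = is_allowed(rules + [("accounting", "sales", 0)], "accounting", "sales")
print(sales, payroll, articles, vetoed)
```

The table-level allow for sales outvotes the database-level deny, payroll falls back to that deny, and articles is only covered by the global allow.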

New debugging tools

Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible.

The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user:

And this page listing the rules that apply to that question - since different plugins may return different rules which get combined together:

This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be over-ridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the --root option.)

Those rules are defined in the datasette/default_permissions.py Python module.

The missing feature: list actors who can act on this resource

There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource.

It's not possible to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as datasette-auth-github mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before.

API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system.

This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.

Upgrading plugins for Datasette 1.0a20

Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out.

The Datasette upgrade guide includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document.

I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working extremely well. I've even had Claude Code update those notes itself with things it learned during an upgrade process!

This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.

I've also been leaning heavily on uv to help with the upgrade process. I wrote myself two new helper scripts - tadd and radd - to help test the new plugins.

tadd = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to pytest so I can run tadd -k test_name or tadd -x --pdb as needed.
radd = "run against datasette dev" - it runs the latest dev datasette command with the plugin installed.

The tadd and radd implementations can be found in this TIL.

Some of my plugin upgrades have become a one-liner to the codex exec command, which runs OpenAI Codex CLI with a prompt without entering interactive mode:

codex exec --dangerously-bypass-approvals-and-sandbox \
"Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass"

There are still a bunch more to go - there's a list in this tracking issue - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.

Using Claude Code to implement this change

This change to Datasette core is by far the most ambitious piece of work I've ever attempted using a coding agent.

Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window.

Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable.

I no longer see codebase size as a limiting factor for how useful they can be.

I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right.

For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI.

I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.

Starting with a proof-of-concept

I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My research/sqlite-permissions-poc project was the one that finally convinced me of a viable approach.

That one started as a free-ranging conversation with Claude, at the end of which I told it to generate a specification which I then fed into GPT-5 to implement. You can see that specification at the end of the README.

I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment.

This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.

Miscellaneous tips I picked up along the way

When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The tadd and radd shortcuts were invaluable for productively working on those plugins while I made changes to core.
Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.
Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.
Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes".
Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.
A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: Use gh CLI to fetch comments on URL-to-PR and make the requested changes. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.
The code I write with LLMs is higher quality code. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.
Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.
That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as Claude Skills. I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.

What's next?

Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same.

The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week.

This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!

Tags: plugins, projects, python, sql, sqlite, datasette, annotated-release-notes, uv, coding-agents, claude-code, codex

LLM 0.27, the annotated release notes: GPT-5 and improved tool calling

2025-08-11T23:57:50+00:00

I shipped LLM 0.27 today (followed by a 0.27.1 with minor bug fixes), adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features introduced in LLM 0.26. Here are the annotated release notes.

GPT-5

New models: gpt-5, gpt-5-mini and gpt-5-nano. #1229

I would have liked to get these out sooner, but LLM had accumulated quite a lot of other changes since the last release and I wanted to use GPT-5 as an excuse to wrap all of those up and get them out there.

These models work much the same as other OpenAI models, but they have a new reasoning_effort option of minimal. You can try that out like this:

llm -m gpt-5 'A letter advocating for cozy boxes for pelicans in Half Moon Bay harbor' -o reasoning_effort minimal

Setting "minimal" almost completely eliminates the "thinking" time for the model, causing it to behave more like GPT-4o.

Here's the letter it wrote me at a cost of 20 input, 706 output = $0.007085 which is 0.7085 cents.
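The arithmetic behind that figure can be reproduced with a short calculation. The per-million-token prices below are an assumption on my part (they are not stated in this post), chosen because they match GPT-5's list pricing at launch and reproduce the number above.

```python
# Assumed list prices in dollars per million tokens (not stated in this post)
INPUT_PRICE_PER_MILLION = 1.25
OUTPUT_PRICE_PER_MILLION = 10.00


def prompt_cost_dollars(input_tokens, output_tokens):
    # Price each token type separately, then convert from per-million
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


cost = prompt_cost_dollars(20, 706)
print("${:.6f} = {:.4f} cents".format(cost, cost * 100))
```

Almost all of the cost comes from the 706 output tokens, since output is priced at eight times the input rate under these assumptions.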

You can set the default model to GPT-5-mini (since it's a bit cheaper) like this:

llm models default gpt-5-mini

Tools in templates

LLM templates can now include a list of tools. These can be named tools from plugins or arbitrary Python function blocks, see Tools in templates. #1009

I think this is the most important feature in the new release.

I added LLM's tool calling features in LLM 0.26. You can call them from the Python API but you can also call them from the command-line like this:

llm -T llm_version -T llm_time 'Tell the time, then show the version'

Here's the output of llm logs -c after running that command.

This example shows that you have to explicitly list all of the tools you would like to expose to the model, using the -T/--tool option one or more times.

In LLM 0.27 you can now save these tool collections to a template. Let's try that now:

llm -T llm_version -T llm_time -m gpt-5 --save mytools

Now mytools is a template that bundles those two tools and sets the default model to GPT-5. We can run it like this:

llm -t mytools 'Time then version'

Let's do something more fun. My blog has a Datasette mirror which I can run queries against. I'm going to use the llm-tools-datasette plugin to turn that into a tool-driven template. This plugin uses a "toolbox", which looks a bit like a class. Those are described here.

llm install llm-tools-datasette

# Now create that template
llm --tool 'Datasette("https://datasette.simonwillison.net/simonwillisonblog")' \
  -m gpt-5 -s 'Use Datasette tools to answer questions' --save blog

Now I can ask questions of my database like this:

llm -t blog 'top ten tags by number of entries' --td

The --td option there stands for --tools-debug - it means we can see all tool calls as they are run.

Here's the output of the above:

Top 10 tags by number of entries (excluding drafts):
- quora — 1003
- projects — 265
- datasette — 238
- python — 213
- ai — 200
- llms — 200
- generative-ai — 197
- weeknotes — 193
- web-development — 166
- startups — 157

Full transcript with tool traces here.

I'm really excited about the ability to store configured tools.

Tools can now return attachments, for models that support features such as image input. #1014

I want to build a tool that can render SVG to an image, then return that image so the model can see what it has drawn. For reasons.

New methods on the Toolbox class: .add_tool(), .prepare() and .prepare_async(), described in Dynamic toolboxes. #1111

I added these because there's a lot of interest in an MCP plugin for Datasette. Part of the challenge with MCP is that the user provides the URL to a server but we then need to introspect that server and dynamically add the tools we have discovered there. The new .add_tool() method can do that, and the .prepare() and .prepare_async() methods give us a reliable way to run some discovery code outside of the class constructor, allowing it to make asynchronous calls if necessary.

New model.conversation(before_call=x, after_call=y) attributes for registering callback functions to run before and after tool calls. See tool debugging hooks for details. #1088

Raising llm.CancelToolCall now only cancels the current tool call, passing an error back to the model and allowing it to continue. #1148

These hooks are useful for implementing more complex tool calling at the Python API layer. In addition to debugging and logging they allow Python code to intercept tool calls and cancel or delay them based on what they are trying to do.

Some model providers can serve different models from the same configured URL - llm-llama-server for example. Plugins for these providers can now record the resolved model ID of the model that was used to the LLM logs using the response.set_resolved_model(model_id) method. #1117

This solves a frustration I've had for a while where some of my plugins log the same model ID for requests that were processed by a bunch of different models under the hood - making my logs less valuable. The new mechanism now allows plugins to record a more accurate model ID for a prompt, should it differ from the model ID that was requested.

New -l/--latest option for llm logs -q searchterm for searching logs ordered by date (most recent first) instead of the default relevance search. #1177

My personal log database has grown to over 8,000 entries now, and running full-text search queries against it often returned results from last year that were no longer relevant to me. Being able to find the latest prompt matching "pelican svg" is much more useful.

Everything else was bug fixes and documentation improvements:

Bug fixes and documentation

The register_embedding_models hook is now documented. #1049

Show visible stack trace for llm templates show invalid-template-name. #1053

Handle invalid tool names more gracefully in llm chat. #1104

Add a Tool plugins section to the plugin directory. #1110

Error on register(Klass) if the passed class is not a subclass of Toolbox. #1114

Add -h for --help for all llm CLI commands. #1134

Add missing dataclasses to advanced model plugins docs. #1137

Fixed a bug where llm logs -T llm_version "version" --async incorrectly recorded just one single log entry when it should have recorded two. #1150

All extra OpenAI model keys in extra-openai-models.yaml are now documented. #1228

Tags: projects, python, ai, datasette, annotated-release-notes, generative-ai, llms, llm, llm-tool-use, gpt-5, gpt

Long context support in LLM 0.24 using fragments and template plugins

2025-04-07T17:45:11+00:00

LLM 0.24 is now available with new features to help take advantage of the increasingly long input context supported by modern LLMs.

(LLM is my command-line tool and Python library for interacting with LLMs, supported by 20+ plugins adding support for both local and remote models from a bunch of different providers.)

Trying it out

To install LLM with uv (there are several other options):

uv tool install llm

You'll need to either provide an OpenAI API key or install a plugin to use local models or models from other providers:

llm keys set openai
# Paste OpenAI API key here

To upgrade LLM from a previous version:

llm install -U llm

The biggest new feature is fragments. You can now use -f filename or -f url to add one or more fragments to your prompt, which means you can do things like this:

llm -f https://simonwillison.net/2025/Apr/5/llama-4-notes/ 'bullet point summary'

Here's the output from that prompt, exported using llm logs -c --expand --usage. Token cost was 5,372 input, 374 output which works out as 0.103 cents (around 1/10th of a cent) using the default GPT-4o mini model.
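To make that arithmetic explicit, here is a minimal Python sketch. The per-million-token prices ($0.15 for input and $0.60 for output) are an assumption about GPT-4o mini pricing at the time rather than something stated in this post, and cost_in_cents is a hypothetical helper name.

```python
def cost_in_cents(input_tokens, output_tokens, input_price, output_price):
    """Return the cost of a prompt in cents.

    Prices are in US dollars per million tokens.
    """
    dollars = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return dollars * 100


# Assumed GPT-4o mini pricing: $0.15/M input tokens, $0.60/M output tokens
print(round(cost_in_cents(5372, 374, 0.15, 0.60), 3))
# 0.103
```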

Plugins can implement custom fragment loaders with a prefix. The llm-fragments-github plugin adds a github: prefix that can be used to load every text file in a GitHub repository as a list of fragments:

llm install llm-fragments-github
llm -f github:simonw/s3-credentials 'Suggest new features for this tool'

Here's the output. That took 49,856 input tokens for a total cost of 0.7843 cents - nearly a whole cent!

Improving LLM's support for long context models

Long context is one of the most exciting trends in LLMs over the past eighteen months. Saturday's Llama 4 Scout release gave us the first model with a full 10 million token context. Google's Gemini family has several 1-2 million token models, and the baseline for recent models from both OpenAI and Anthropic is 100 or 200 thousand.

Two years ago most models capped out at 8,000 tokens of input. Long context opens up many new interesting ways to apply this class of technology.

I've been using long context models via my files-to-prompt tool to summarize large codebases, explain how they work and even to debug gnarly bugs. As demonstrated above, it's surprisingly inexpensive to drop tens of thousands of tokens into models like GPT-4o mini or most of the Google Gemini series, and the results are often very impressive.

One of LLM's most useful features is that it logs every prompt and response to a SQLite database. This is great for comparing the same prompt against different models and tracking experiments over time - my own database contains thousands of responses from hundreds of different models accumulated over the past couple of years.

This is where long context prompts were starting to be a problem. Since LLM stores the full prompt and response in the database, asking five questions of the same source code could result in five duplicate copies of that text in the database!

The new fragments feature targets this problem head on. Each fragment is stored once in a fragments table, then de-duplicated in the future using a SHA256 hash of its content.

This saves on storage, and also enables features like llm logs -f X for seeing all logged responses that use a particular fragment.
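The store-once idea can be sketched with Python's sqlite3 and hashlib modules. This is an illustration only: the table layout and the store_fragment function are hypothetical and are not LLM's actual schema or code.

```python
import hashlib
import sqlite3


def store_fragment(conn, content):
    """Store content once, keyed by its SHA256 hash, and return the row ID."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # INSERT OR IGNORE does nothing if a row with this hash already exists
    conn.execute(
        "INSERT OR IGNORE INTO fragments (hash, content) VALUES (?, ?)",
        (digest, content),
    )
    row = conn.execute(
        "SELECT id FROM fragments WHERE hash = ?", (digest,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration, not LLM's actual table definition
conn.execute(
    "CREATE TABLE fragments (id INTEGER PRIMARY KEY, hash TEXT UNIQUE, content TEXT)"
)

first = store_fragment(conn, "def hello(): pass")
second = store_fragment(conn, "def hello(): pass")
count = conn.execute("SELECT count(*) FROM fragments").fetchone()[0]
print(first == second, count)
# True 1
```

Storing the same text twice returns the same ID and leaves a single row, so five prompts against the same source code would share one stored copy.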

Fragments can be specified in several different ways:

a path to a file
a URL to data online
an alias that's been set against a previous fragment (see llm fragments set)
a hash ID of the content of a fragment
using prefix:argument to specify fragments from a plugin

Asking questions of LLM's documentation

Wouldn't it be neat if LLM could answer questions about its own documentation?

The new llm-docs plugin (built with the new register_fragment_loaders() plugin hook) enables exactly that:

llm install llm-docs
llm -f docs: "How do I embed a binary file?"

The output starts like this:

To embed a binary file using the LLM command-line interface, you can use the llm embed command with the --binary option. Here’s how you can do it:

Make sure you have the appropriate embedding model installed that supports binary input.

Use the following command syntax:
llm embed -m <model_id> --binary -i <path_to_your_binary_file>
Replace <model_id> with the identifier for the embedding model you want to use (e.g., clip for the CLIP model) and <path_to_your_binary_file> with the path to your actual binary file.

(74,570 input, 240 output = 1.1329 cents with GPT-4o mini)

Using -f docs: with just the prefix is the same as using -f docs:llm. The plugin fetches the documentation for your current version of LLM from my new simonw/docs-for-llms repo, which also provides packaged documentation files for my datasette, s3-credentials, shot-scraper and sqlite-utils projects.

Datasette's documentation has got pretty long, so you might need to run that through a Gemini model instead (using the llm-gemini plugin):

llm -f docs:datasette -m gemini-2.0-flash \
  'Build a render_cell plugin that detects and renders markdown'

Here's the output. 132,042 input, 1,129 output with Gemini 2.0 Flash = 1.3656 cents.

You can browse the combined documentation files this uses in docs-for-llms. They're built using GitHub Actions.

llms-txt is a project led by Jeremy Howard that encourages projects to publish similar files to help LLMs ingest a succinct copy of their documentation.

The new register_template_loaders() plugin hook allows plugins to register prefix:value custom template loaders, for use with the llm -t option.

llm-templates-github and llm-templates-fabric are two new plugins that make use of that hook.

llm-templates-github lets you share and use templates via a public GitHub repository. Here's how to run my Pelican riding a bicycle benchmark against a specific model:

llm install llm-templates-github
llm -t gh:simonw/pelican-svg -m o3-mini

This executes this pelican-svg.yaml template stored in my simonw/llm-templates repository, using a new repository naming convention.

llm -t gh:simonw/pelican-svg will load that pelican-svg.yaml file from the simonw/llm-templates repo. You can also use llm -t gh:simonw/name-of-repo/name-of-template to load a template from a repository that doesn't follow that convention.
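The naming convention can be sketched as a small Python function that takes the part after the gh: prefix. This is not the plugin's code: resolve_github_template is a hypothetical name, and appending .yaml in the three-part form is an assumption.

```python
def resolve_github_template(spec):
    """Map 'user/template' or 'user/repo/template' to (repo, filename)."""
    parts = spec.split("/")
    if len(parts) == 2:
        user, template = parts
        repo = "llm-templates"  # default repository name from the convention
    elif len(parts) == 3:
        user, repo, template = parts
    else:
        raise ValueError(f"Expected user/template or user/repo/template: {spec!r}")
    # Assumption: the template file always uses a .yaml extension
    return f"{user}/{repo}", f"{template}.yaml"


print(resolve_github_template("simonw/pelican-svg"))
# ('simonw/llm-templates', 'pelican-svg.yaml')
```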

To share your own templates, create a repository on GitHub under your user account called llm-templates and start saving .yaml files to it.

llm-templates-fabric provides a similar mechanism for loading templates from Daniel Miessler's extensive fabric collection:

llm install llm-templates-fabric
curl https://simonwillison.net/2025/Apr/6/only-miffy/ | \
  llm -t f:extract_main_idea

A conversation with Daniel was the inspiration for this new plugin hook.

Template and fragment loader plugins so far

Here's the list of fragment and template loader plugins so far from the LLM plugins directory:

llm-templates-github supports loading templates shared on GitHub, e.g. llm -t gh:simonw/pelican-svg.

llm-templates-fabric provides access to the Fabric collection of prompts: cat setup.py | llm -t fabric:explain_code.

llm-fragments-github can load entire GitHub repositories in a single operation: llm -f github:simonw/files-to-prompt 'explain this code'.

llm-hacker-news imports conversations from Hacker News as fragments: llm -f hn:43615912 'summary with illustrative direct quotes'.

Everything else in LLM 0.24

LLM 0.24 is a big release, spanning 51 commits. The release notes cover everything that's new in full - here are a few of my highlights:

The new llm-openai plugin provides support for o1-pro (which is not supported by the OpenAI mechanism used by LLM core). Future OpenAI features will migrate to this plugin instead of LLM core itself.

The problem with OpenAI models being handled by LLM core is that I have to release a whole new version of LLM every time OpenAI releases a new model or feature. Migrating this stuff out to a plugin means I can release a new version of that plugin independently of LLM itself - something I frequently do for llm-anthropic and llm-gemini and others.

The new llm-openai plugin uses their Responses API, a new shape of API which I covered last month.

llm -t $URL option can now take a URL to a YAML template. #856

The new custom template loaders are fun, but being able to paste in a URL to a YAML file somewhere provides a simpler way to share templates.

Templates can now store default model options. #845

Attachments can now be stored in templates. #826

The quickest way to create your own template is with the llm prompt ... --save name-of-template command. This now works with attachments, fragments and default model options, each of which is persisted in the template YAML file.

New llm models options family of commands for setting default options for particular models. #829

I built this when I learned that Qwen's QwQ-32b model works best with temperature 0.7 and top p 0.95.

llm prompt -d path-to-sqlite.db option can now be used to write logs to a custom SQLite database. #858

This proved extremely useful for testing fragments - it meant I could run a prompt and save the full response to a separate SQLite database which I could then upload to S3 and share as a link to Datasette Lite.

llm similar -p/--plain option providing more human-readable output than the default JSON. #853

I'd like this to be the default output, but I'm holding off on changing that until LLM 1.0 since it's a breaking change for people building automations against the JSON from llm similar.

Set the LLM_RAISE_ERRORS=1 environment variable to raise errors during prompts rather than suppressing them, which means you can run python -i -m llm 'prompt' and then drop into a debugger on errors with import pdb; pdb.pm(). #817

Really useful for debugging new model plugins.

llm prompt -q gpt -q 4o option - pass -q searchterm one or more times to execute a prompt against the first model that matches all of those strings - useful for if you can't remember the full model ID. #841

Pretty obscure but I found myself needing this. Vendors love releasing models with names like gemini-2.5-pro-exp-03-25, so now I can run llm -q gem -q 2.5 -q exp 'say hi' to save me from looking up the model ID.
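The matching rule ("the first model that matches all of those strings") can be sketched in a few lines of Python. The function name, the sample model list and the case-insensitive comparison are all assumptions made for illustration, not LLM's implementation.

```python
def first_matching_model(model_ids, search_terms):
    """Return the first model ID containing every search term, or None."""
    # Assumption: matching is case-insensitive substring matching
    terms = [term.lower() for term in search_terms]
    for model_id in model_ids:
        lowered = model_id.lower()
        if all(term in lowered for term in terms):
            return model_id
    return None


# Hypothetical list of model IDs, in the order they would be searched
models = ["gpt-4o-mini", "gemini-2.0-flash", "gemini-2.5-pro-exp-03-25"]
print(first_matching_model(models, ["gem", "2.5", "exp"]))
# gemini-2.5-pro-exp-03-25
```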

OpenAI compatible models configured using extra-openai-models.yaml now support supports_schema: true, vision: true and audio: true options. Thanks @adaitche and @giuli007. #819, #843

I don't use this feature myself but it's clearly popular - this isn't the first time I've had PRs with improvements from the wider community.

Tags: plugins, projects, ai, annotated-release-notes, openai, generative-ai, llms, llm, gemini, long-context, files-to-prompt

shot-scraper 1.8

2025-03-25T01:59:38+00:00

I've added a new feature to shot-scraper that makes it easier to share scripts for other people to use with the shot-scraper javascript command.

shot-scraper javascript lets you load up a web page in an invisible Chrome browser (via Playwright), execute some JavaScript against that page and output the results to your terminal. It's a fun way of running complex screen-scraping routines as part of a terminal session, or even chained together with other commands using pipes.

The -i/--input option lets you load that JavaScript from a file on disk - but now you can also use a gh: prefix to specify loading code from GitHub instead.

To quote the release notes:

shot-scraper javascript can now optionally load scripts hosted on GitHub via the new gh: prefix to the shot-scraper javascript -i/--input option. #173

Scripts can be referenced as gh:username/repo/path/to/script.js or, if the GitHub user has created a dedicated shot-scraper-scripts repository and placed scripts in the root of it, using gh:username/name-of-script.

For example, to run this readability.js script against any web page you can use the following:

shot-scraper javascript --input gh:simonw/readability \
  https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/

The output from that example starts like this:

{
    "title": "Qwen2.5-VL-32B: Smarter and Lighter",
    "byline": "Simon Willison",
    "dir": null,
    "lang": "en-gb",
    "content": "<div id=\"readability-page-1\"...

My simonw/shot-scraper-scripts repo only has that one file in it so far, but I'm looking forward to growing that collection and hopefully seeing other people create and share their own shot-scraper-scripts repos as well.

This feature is an imitation of a similar feature that's coming in the next release of LLM.

Tags: github, javascript, projects, scraping, annotated-release-notes, playwright, shot-scraper

llm-openrouter 0.4

2025-03-10T21:40:56+00:00

I found out this morning that OpenRouter include support for a number of (rate-limited) free API models.

I occasionally run workshops on top of LLMs (like this one) and being able to provide students with a quick way to obtain an API key against models where they don't have to set up billing is really valuable to me!

This inspired me to upgrade my existing llm-openrouter plugin, and in doing so I closed out a bunch of open feature requests.

Consider this post the annotated release notes:

LLM schema support for OpenRouter models that support structured output. #23

I'm trying to get support for LLM's new schema feature into as many plugins as possible.

OpenRouter's OpenAI-compatible API includes support for the response_format structured content option, but with an important caveat: it only works for some models, and if you try to use it on others it is silently ignored.

I filed an issue with OpenRouter requesting they include schema support in their machine-readable model index. For the moment LLM will let you specify schemas for unsupported models and will ignore them entirely, which isn't ideal.

llm openrouter key command displays information about your current API key. #24

Useful for debugging and checking the details of your key's rate limit.

llm -m ... -o online 1 enables web search grounding against any model, powered by Exa. #25

OpenRouter apparently make this feature available to every one of their supported models! They're using new-to-me Exa to power this feature, an AI-focused search engine startup who appear to have built their own index with their own crawlers (according to their FAQ). This feature is currently priced by OpenRouter at $4 per 1000 results, and since 5 results are returned for every prompt that's 2 cents per prompt.
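The per-prompt figure follows directly from the numbers quoted above, as this short Python calculation shows (the variable names are just for illustration):

```python
price_per_1000_results = 4.00  # US dollars, the OpenRouter price quoted above
results_per_prompt = 5

dollars_per_prompt = price_per_1000_results / 1000 * results_per_prompt
cents_per_prompt = dollars_per_prompt * 100
print(round(cents_per_prompt, 2))
# 2.0
```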

llm openrouter models command for listing details of the OpenRouter models, including a --json option to get JSON and a --free option to filter for just the free models. #26

This offers a neat way to list the available models. There are examples of the output in the comments on the issue.

New option to specify custom provider routing: -o provider '{JSON here}'. #17

Part of OpenRouter's USP is that it can route prompts to different providers depending on factors like latency, cost or as a fallback if your first choice is unavailable - great for if you are using open weight models like Llama which are hosted by competing companies.

The options they provide for routing are very thorough - I had initially hoped to provide a set of CLI options that covered all of these bases, but I decided instead to reuse their JSON format and forward those options directly on to the model.

Tags: cli, plugins, projects, ai, annotated-release-notes, generative-ai, llms, llm, openrouter, ai-assisted-search

Structured data extraction from unstructured content using LLM schemas

2025-02-28T17:07:07+00:00

LLM 0.23 is out today, and the signature feature is support for schemas - a new way of providing structured output from a model that matches a specification provided by the user. I've also upgraded both the llm-anthropic and llm-gemini plugins to add support for schemas.

TLDR: you can now do things like this:

llm --schema 'name,age int,short_bio' 'invent a cool dog'

And get back:

{
  "name": "Zylo",
  "age": 4,
  "short_bio": "Zylo is a unique hybrid breed, a mix between a Siberian Husky and a Corgi. With striking blue eyes and a fluffy, colorful coat that changes shades with the seasons, Zylo embodies the spirit of winter and summer alike. Known for his playful personality and intelligence, Zylo can perform a variety of tricks and loves to fetch his favorite frisbee. Always ready for an adventure, he's just as happy hiking in the mountains as he is cuddling on the couch after a long day of play."
}

More details in the release notes and LLM schemas tutorial, which includes an example (extracting people from news articles) that's even more useful than inventing dogs!

Structured data extraction is a killer app for LLMs

I've suspected for a while that the single most commercially valuable application of LLMs is turning unstructured content into structured data. That's the trick where you feed an LLM an article, or a PDF, or a screenshot and use it to turn that into JSON or CSV or some other structured format.

It's possible to achieve strong results on this with prompting alone: feed data into an LLM, give it an example of the output you would like and let it figure out the details.

Many of the leading LLM providers now bake this in as a feature. OpenAI, Anthropic, Gemini and Mistral all offer variants of "structured output" as additional options through their API:

OpenAI: Structured Outputs
Gemini: Generate structured output with the Gemini API
Mistral: Custom Structured Outputs
Anthropic's tool use can be used for this, as shown in their Extracting Structured JSON using Claude and Tool Use cookbook example.

These mechanisms are all very similar: you pass a JSON schema to the model defining the shape that you would like, and they then use that schema to guide the output of the model.

How reliable that is can vary! Some providers use tricks along the lines of Jsonformer, compiling the JSON schema into code that interacts with the model's next-token generation at runtime, limiting it to only generate tokens that are valid in the context of the schema.

Other providers YOLO it - they trust that their model is "good enough" that showing it the schema will produce the right results!

In practice, this means that you need to be aware that sometimes this stuff will go wrong. As with anything LLM, 100% reliability is never guaranteed.

From my experiments so far, and depending on the model that you choose, these mistakes are rare. If you're using a top tier model it will almost certainly do the right thing.
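Since reliability is never guaranteed, it can be worth checking structured output before trusting it. Here is a minimal sketch using only the Python standard library. parse_structured is a hypothetical helper, and it checks just required keys and basic types rather than performing full JSON schema validation.

```python
import json


def parse_structured(text, required):
    """Parse model output and check required keys and their Python types.

    `required` maps key name to expected type, e.g. {"name": str}.
    Raises ValueError if the output is not usable.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as ex:
        raise ValueError(f"Not valid JSON: {ex}") from ex
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"Missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"Wrong type for {key}")
    return data


dog = parse_structured(
    '{"name": "Zylo", "age": 4, "short_bio": "A unique hybrid breed."}',
    {"name": str, "age": int, "short_bio": str},
)
print(dog["name"])
# Zylo
```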

Designing this feature for LLM

I've wanted this feature for ages. I see it as an important step on the way to full tool usage, which is something I'm very excited to bring to the CLI tool and Python library.

LLM is designed as an abstraction layer over different models. This makes building new features much harder, because I need to figure out a common denominator and then build an abstraction that captures as much value as possible while still being general enough to work across multiple models.

Support for structured output across multiple vendors has matured now to the point that I'm ready to commit to a design.

My first version of this feature worked exclusively with JSON schemas. An earlier version of the tutorial started with this example:

curl https://www.nytimes.com/ | uvx strip-tags | \
  llm --schema '{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "headline": {
            "type": "string"
          },
          "short_summary": {
            "type": "string"
          },
          "key_points": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "required": ["headline", "short_summary", "key_points"]
      }
    }
  },
  "required": ["items"]
}' | jq

Here we're feeding a full JSON schema document to the new llm --schema option, then piping in the homepage of the New York Times (after running it through strip-tags) and asking for headline, short_summary and key_points for multiple items on the page.

This example still works with the finished feature - you can see example JSON output here - but constructing those long-form schemas by hand was a big pain.

So... I invented my own shortcut syntax.

That earlier example is a simple illustration:

llm --schema 'name,age int,short_bio' 'invent a cool dog'

Here the schema is a comma-separated list of field names, with an optional space-separated type.
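To show roughly how that shortcut maps onto a full JSON schema, here is a sketch of a converter in Python. It is not LLM's implementation: concise_to_json_schema and TYPE_MAP are hypothetical names, it handles only the comma-separated form, and treating untyped fields as strings and every field as required are assumptions.

```python
# Assumed mapping from shortcut type names to JSON schema types
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def concise_to_json_schema(concise):
    """Convert 'name,age int' style input into a JSON schema dict."""
    properties = {}
    for field in concise.split(","):
        parts = field.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        name = parts[0]
        # Assumption: fields without a type default to string
        type_name = parts[1] if len(parts) > 1 else "str"
        properties[name] = {"type": TYPE_MAP[type_name]}
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


schema = concise_to_json_schema("name,age int,short_bio")
print(schema["properties"]["age"])
# {'type': 'integer'}
```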

The full concise schema syntax is described here. There's a more complex example in the tutorial, which uses the newline-delimited form to extract information about people who are mentioned in a news article:

curl 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  uvx strip-tags | \
  llm --schema-multi "
name: the person's name
organization: who they represent
role: their job title or role
learned: what we learned about them from this story
article_headline: the headline of the story
article_date: the publication date in YYYY-MM-DD
" --system 'extract people mentioned in this article'

The --schema-multi option here tells LLM to take that schema for a single object and upgrade it to an array of those objects (actually an object with a single "items" property that's an array of objects), which is a quick way to request that the same schema be returned multiple times against a single input.
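That wrapping step is simple enough to sketch in Python. wrap_as_multi is a hypothetical name and the person schema is a cut-down illustration. The shape it produces matches the hand-written New York Times schema shown earlier.

```python
def wrap_as_multi(schema):
    """Wrap a single-object schema in an object with an 'items' array."""
    return {
        "type": "object",
        "properties": {"items": {"type": "array", "items": schema}},
        "required": ["items"],
    }


# Cut-down single-object schema, for illustration only
person = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "role": {"type": "string"}},
    "required": ["name", "role"],
}
multi = wrap_as_multi(person)
print(multi["properties"]["items"]["type"])
# array
```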

Reusing schemas and creating templates

My original plan with schemas was to provide a separate llm extract command for running these kinds of operations. I ended up going in a different direction - I realized that adding --schema to the default llm prompt command would make it interoperable with other existing features (like attachments for feeding in images and PDFs).

The most valuable way to apply schemas is across many different prompts, in order to gather the same structure of information from many different sources.

I put a bunch of thought into the --schema option. It takes a variety of different values - quoting the documentation:

This option can take multiple forms:

A string providing a JSON schema: --schema '{"type": "object", ...}'

A condensed schema definition: --schema 'name,age int'

The name or path of a file on disk containing a JSON schema: --schema dogs.schema.json

The hexadecimal ID of a previously logged schema: --schema 520f7aabb121afd14d0c6c237b39ba2d - these IDs can be found using the llm schemas command.

A schema that has been saved in a template: --schema t:name-of-template

The tutorial demonstrates saving a schema by using it once and then obtaining its ID through the new llm schemas command, then saving it to a template (along with the system prompt) like this:

llm --schema 3b7702e71da3dd791d9e17b76c88730e \
  --system 'extract people mentioned in this article' \
  --save people

And now we can feed in new articles using the llm -t people shortcut to apply that newly saved template:

curl https://www.theguardian.com/commentisfree/2025/feb/27/billy-mcfarland-new-fyre-festival-fantasist | \
  strip-tags | llm -t people

Doing more with the logged structured data

Having run a few prompts that use the same schema, an obvious next step is to do something with the data that has been collected.

I ended up implementing this on top of the existing llm logs mechanism.

LLM already defaults to logging every prompt and response it makes to a SQLite database - mine contains 4,747 of these records now, according to this query:

sqlite3 "$(llm logs path)" 'select count(*) from responses'

With schemas, an increasing portion of those are valid JSON.

Since LLM records the schema that was used for each response - using the schema ID, which is derived from a content hash of the expanded JSON schema - it's now possible to ask LLM for all responses that used a particular schema:

llm logs --schema 3b7702e71da3dd791d9e17b76c88730e --short

I got back:

- model: gpt-4o-mini
  datetime: '2025-02-28T07:37:18'
  conversation: 01jn5qt397aaxskf1vjp6zxw2a
  system: extract people mentioned in this article
  prompt: Menu AP Logo Menu World U.S. Politics Sports Entertainment Business Science
    Fact Check Oddities Be Well Newsletters N...
- model: gpt-4o-mini
  datetime: '2025-02-28T07:38:58'
  conversation: 01jn5qx4q5he7yq803rnexp28p
  system: extract people mentioned in this article
  prompt: Skip to main contentSkip to navigationSkip to navigationPrint subscriptionsNewsletters
    Sign inUSUS editionUK editionA...
- model: gpt-4o-mini
  datetime: '2025-02-28T07:39:07'
  conversation: 01jn5qxh20tksb85tf3bx2m3bd
  system: extract people mentioned in this article
  attachments:
  - type: image/jpeg
    url: https://static.simonwillison.net/static/2025/onion-zuck.jpg

As you can see, I've run that example schema three times (while constructing the tutorial) using GPT-4o mini - twice against text content from curl ... | strip-tags and once against a screenshot JPEG to demonstrate attachment support.

Extracting gathered JSON from the logs is clearly a useful next step... so I added several options to llm logs to support that use-case.

The first is --data - adding that will cause llm logs to output just the data that was gathered using a schema. Mix that with -c to see the JSON from the most recent response:

llm logs -c --data

Outputs:

{"name": "Zap", "age": 5, "short_bio": ...

Combining that with the --schema option is where things get really interesting. You can specify a schema using any of the mechanisms described earlier, which means you can see ALL of the data gathered using that schema by combining --data with --schema X (and -n 0 for everything).

Here are all of the dogs I've invented:

llm logs --schema 'name,age int,short_bio' --data -n 0

Output (here truncated):

{"name": "Zap", "age": 5, "short_bio": "Zap is a futuristic ..."}
{"name": "Zephyr", "age": 3, "short_bio": "Zephyr is an adventurous..."}
{"name": "Zylo", "age": 4, "short_bio": "Zylo is a unique ..."}

Some schemas gather multiple items, producing output that looks like this (from the tutorial):

{"items": [{"name": "Mark Zuckerberg", "organization": "...
{"items": [{"name": "Billy McFarland", "organization": "...

We can get back the individual objects by adding --data-key items. Here I'm also using the --schema t:people shortcut to specify the schema that was saved to the people template earlier on.

llm logs --schema t:people --data-key items

Output:

{"name": "Katy Perry", "organization": ...
{"name": "Gayle King", "organization": ...
{"name": "Lauren Sanchez", "organization": ...

This feature defaults to outputting newline-delimited JSON, but you can add the --data-array flag to get back a JSON array of objects instead.

... which means you can pipe it into sqlite-utils insert to create a SQLite database!

llm logs --schema t:people --data-key items --data-array | \
  sqlite-utils insert data.db people -

Add all of this together and we can construct a schema, run it against a bunch of sources and dump the resulting structured data into SQLite where we can explore it using SQL queries (and Datasette). It's a really powerful combination.
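If you would rather do the unwrapping in Python than with --data-key and --data-array, the transformation looks roughly like this. It is a sketch of the idea, not LLM's code: unwrap_items is a hypothetical helper and the sample records are made up.

```python
import json


def unwrap_items(ndjson_text, key="items"):
    """Flatten newline-delimited JSON objects into one list of records."""
    records = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        data = json.loads(line)
        records.extend(data[key])
    return records


# Made-up sample in the same shape as the multi-item output above
sample = "\n".join([
    '{"items": [{"name": "A"}, {"name": "B"}]}',
    '{"items": [{"name": "C"}]}',
])
print(json.dumps(unwrap_items(sample)))
# [{"name": "A"}, {"name": "B"}, {"name": "C"}]
```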

Using schemas from LLM's Python library

The most popular way to work with schemas in Python these days is with Pydantic, to the point that many of the official API libraries for models directly incorporate Pydantic for this purpose.

LLM depended on Pydantic already, and for this project I finally dropped my dual support for Pydantic v1 and v2 and committed to v2 only.

A key reason Pydantic is popular for this is that it's trivial to use it to build a JSON schema document:

import pydantic, json

class Dog(pydantic.BaseModel):
    name: str
    age: int
    bio: str

schema = Dog.model_json_schema()
print(json.dumps(schema, indent=2))

Outputs:

{
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    },
    "bio": {
      "title": "Bio",
      "type": "string"
    }
  },
  "required": [
    "name",
    "age",
    "bio"
  ],
  "title": "Dog",
  "type": "object"
}

LLM's Python library doesn't require you to use Pydantic, but it supports passing either a Pydantic BaseModel subclass or a full JSON schema to the new model.prompt(schema=) parameter. Here's the usage example from the documentation:

import llm, json
from pydantic import BaseModel

class Dog(BaseModel):
    name: str
    age: int

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Describe a nice dog", schema=Dog)
dog = json.loads(response.text())
print(dog)
# {"name":"Buddy","age":3}

What's next for LLM schemas?

So far I've implemented schema support for models from OpenAI, Anthropic and Gemini. The plugin author documentation includes details on how to add this to further plugins - I'd love to see one of the local model plugins implement this pattern as well.

Update: llm-ollama now supports schemas thanks to this PR by Adam Compton. And I've added support to llm-mistral.

I'm presenting a workshop at the NICAR 2025 data journalism conference next week about Cutting-edge web scraping techniques. LLM schemas is a great example of NDD - NICAR-Driven Development - where I'm churning out features I need for that conference (see also shot-scraper's new HAR support).

I expect the workshop will be a great opportunity to further refine the design and implementation of this feature!

I'm also going to be using this new feature to add multiple model support to my datasette-extract plugin, which provides a web UI for structured data extraction that writes the resulting records directly to a SQLite database table.

Tags: cli, data-journalism, projects, ai, annotated-release-notes, generative-ai, local-llms, llms, llm, mistral, gemini, ollama, structured-extraction

Claude 3.7 Sonnet, extended thinking and long output, llm-anthropic 0.14

2025-02-25T17:50:25+00:00

Claude 3.7 Sonnet (previously) is a very interesting new model. I released llm-anthropic 0.14 last night adding support for the new model's features to LLM. I learned a whole lot about the new model in the process of building that plugin.

The base Claude 3.7 Sonnet model mostly feels like an incremental improvement over the excellent Claude 3.5 Sonnet. It benchmarks better and my vibe checks so far are good: it writes excellent code across Python and JavaScript and seems to do everything 3.5 did, for the same price but with an uptick in overall quality.

Extended thinking mode

The big new feature is extended thinking mode. This is the option that turns Claude into an inference scaling model similar to OpenAI's o1 and o3, Gemini 2.0 Flash Thinking and DeepSeek R1.

From my plugin's release notes:

Support for the new Claude 3.7 Sonnet model, including -o thinking 1 to turn on extended thinking mode and -o thinking_budget X to increase the thinking token budget from the 1024 default. #14

You need to opt in to this new mode. In the Claude API that means appending this block of JSON to your request:

{
    "thinking": {
        "type": "enabled",
        "budget_tokens": 1024
    }
}

The budget_tokens defines how many tokens Claude can spend "thinking" about your prompt. 1,024 is the minimum, but you can increase that up to a maximum of 128,000 (see below).
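Here is a small Python sketch of building that block with the minimum enforced. thinking_block is a hypothetical helper and not part of llm-anthropic or Anthropic's client library. It only constructs the dictionary and does not call the API.

```python
import json

MIN_THINKING_BUDGET = 1024  # the minimum stated above


def thinking_block(budget_tokens=MIN_THINKING_BUDGET):
    """Build the 'thinking' portion of a request body."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


print(json.dumps(thinking_block(4000)))
# {"thinking": {"type": "enabled", "budget_tokens": 4000}}
```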

This is Claude's equivalent of OpenAI's reasoning_effort option, which can be set to low, medium (the default) or high.

Running thinking prompts

You can run a "thinking" prompt through LLM like this - the new -o thinking 1 option switches on the new extended thinking mode:

llm install -U llm-anthropic
llm -m claude-3.7-sonnet -o thinking 1 "write a speech about pelicans for congress"

The -o thinking_budget 4000 option can increase the number of allowed thinking tokens from the default value of 1024.

For the moment the thinking tokens are not visible - but you can review them once the prompt has completed using the llm logs --json command:

llm logs --json -c

The -c option is inherited from "continue previous conversation" - it will show the log for the most recently completed prompt.

You can see the speech about pelicans here, but here's the more interesting thought process that was output by the model:

I'm being asked to write a speech about pelicans that would be delivered to Congress. This seems like an unusual request, but I can craft a formal, informative speech that might be given by someone advocating for pelican conservation, environmental protection, or perhaps using pelicans as a symbol for some broader policy discussion.

The speech should:

Be formal and appropriate for Congress

Contain factual information about pelicans

Have a clear purpose (likely conservation-related)

Include rhetorical elements typical of congressional speeches

Maintain a respectful, bipartisan tone

Perhaps connect pelicans to broader environmental policies or American values

I'll write a speech that could plausibly be given by a member of Congress, a cabinet secretary (like the Secretary of the Interior), or an expert testifying before a congressional committee about environmental matters.

Unlike OpenAI, Anthropic chose not to obscure the thinking tokens - what you see is what the model used. There's one interesting edge-case described in their detailed documentation about the extended thinking API output:

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. These redacted thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.
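
Client code that wants to display thinking output has to cope with both kinds of block. This is a rough sketch only: the redacted_thinking type name comes from the quote above, while the "thinking" and "data" field names and the sample content are my assumptions about the response shape:

```python
def split_thinking(content_blocks):
    """Separate readable thinking text from encrypted redacted blocks."""
    visible, redacted = [], []
    for block in content_blocks:
        if block.get("type") == "thinking":
            visible.append(block.get("thinking", ""))
        elif block.get("type") == "redacted_thinking":
            # Encrypted payload: keep it intact so it can be passed back to the API.
            redacted.append(block)
    return visible, redacted


# Made-up response content for illustration only.
sample = [
    {"type": "thinking", "thinking": "Plan the speech."},
    {"type": "redacted_thinking", "data": "opaque-encrypted-bytes"},
    {"type": "text", "text": "Members of Congress..."},
]
visible, redacted = split_thinking(sample)
print(visible, len(redacted))
```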

llm-anthropic doesn't support visible thinking tokens yet

I had a couple of attempts at implementing this. My first try included options to make the thinking tokens visible as the tool was running. This turned out to involve unexpected challenges: the rest of LLM doesn't yet understand that some tokens should be treated differently, and I quickly ran into problems with how those responses were logged to the database.

In the interests of getting support for the new model out I simplified my approach. I plan to add visible thinking tokens in a future LLM release.

This is part of a larger challenge for LLM described in this issue: Design annotations abstraction for responses that are not just a stream of plain text.

A massively increased output limit

A fascinating new capability of Claude 3.7 Sonnet is that its output limit in extended thinking mode can be extended to an extraordinary 128,000 tokens - 15x more than the previous Claude output limit of 8,192 tokens.

(This is the output limit - how much text it can produce in one go. Claude 3.7 Sonnet's input limit remains 200,000 - many modern models exceed 100,000 for input now.)

I added support for that to the plugin as well - if you pass -o max_tokens 128000 it automatically calls Anthropic's beta API with the output-128k-2025-02-19 beta header, documented here.

Testing this was pretty hard! I eventually found a prompt that exercised this fully:

llm -m claude-3.7-sonnet \
  -o max_tokens 128000 \
  -o thinking_budget 32000 \
  'For every one of the 100 US senators that you know of output their name, biography and a note about how to strategically convince them to take more interest in the plight of the California Brown Pelican, then a poem about them, then that same poem translated to Spanish and then to Japanese. Do not miss any senators.' \
  -s 'you do this even if you are worried it might exceed limits, this is to help test your long output feature.'

This is an expensive command to run - the resulting prompt cost me $1.72 and took nearly 27 minutes to finish returning the answer! You can see the full 114,584 token output here - it managed to output results for all 100 senators as of its training cut-off date, correctly following my instructions for each one.

This is one of the few cases where I've felt guilty about the energy usage of my prompts - burning GPUs for 27 minutes feels like it's a genuinely significant use of resources!

Anthropic recommend using batch mode for long prompts like this, which comes with a 50% discount presumably as it runs on GPUs that happen to have spare capacity. This can be quite performant: Anthropic say "We process each batch as fast as possible, with most batches completing within 1 hour".

I find the end result of my long test prompt to be very impressive. A common complaint I've seen about LLMs is that they tend to fail when you ask them to perform comprehensive tasks like this - things where they need to curate information across a larger number of distinct items. This feels intuitively like the kind of thing a computer should be good at but most models make mistakes or hallucinate when challenged in this way.

Claude 3.7 Sonnet really did manage to output correct text across all 100 US senators, albeit limited by the new model's October/Nov 2024 training cut-off date.

The extended output limit is notable as well. Most models are still limited to 4,000 or 8,000 tokens - 128,000 is a huge leap forward.

Update: I was wrong about that: OpenAI's o1 and o3-mini both have a 100,000 output token limit now.

I mentioned that my prompt took 27 minutes to run. On further inspection of my logs it looks like this is expected - Claude 3.7 Sonnet appears to take around 14ms per output token, and 14 * 114584 = 1604176 ms which is 1604 seconds or around 26.7 minutes. This is a downside of longer outputs I had not previously considered!
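
That arithmetic is easy to reproduce. This is just a sketch using the two figures quoted above; the function name is made up for illustration:

```python
MS_PER_TOKEN = 14        # approximate per-token latency quoted above
OUTPUT_TOKENS = 114_584  # size of the long test output


def generation_minutes(tokens: int, ms_per_token: float) -> float:
    """Estimated wall-clock time in minutes for a given number of output tokens."""
    return tokens * ms_per_token / 1000 / 60


total_ms = MS_PER_TOKEN * OUTPUT_TOKENS
print(total_ms)  # 1604176
print(round(generation_minutes(OUTPUT_TOKENS, MS_PER_TOKEN), 1))  # 26.7
```

At the same rate a response that used the full 128,000 token limit would take just under 30 minutes.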

Everything else in the plugin release

There are a few smaller changes in the new plugin release:

Claude 3.5 Haiku now supports image inputs. #17

This is tucked away in Anthropic's February 24th 2025 release notes. Previously their less expensive 3.5 Haiku model couldn't handle images - the only modern Claude model without that ability. They've fixed that now.

The rest of the changes in the 0.14 release are bug fixes:

Fixed a bug that occurred when continuing an existing conversation using --async mode. #13

Fixed a bug where max_tokens and temperature were logged in the database even when using their default options. #16

Tags: projects, ai, annotated-release-notes, generative-ai, llms, llm, anthropic, claude, llm-reasoning, llm-release

files-to-prompt 0.6

2025-02-19T06:12:12+00:00

New release of my CLI tool for turning a whole directory of code into a single prompt ready to pipe or paste into an LLM.

Here are the full release notes:

New -m/--markdown option for outputting results as Markdown with each file in a fenced code block. #42

Support for reading a list of files from standard input. Thanks, Ankit Shankar. #44

Here's how to process just files modified within the last day:

find . -mtime -1 | files-to-prompt

You can also use the -0/--null flag to accept lists of file paths separated by null delimiters, which is useful for handling file names with spaces in them:

find . -name "*.txt" -print0 | files-to-prompt -0
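
To illustrate why the null delimiter matters, here's a small sketch of the two parsing modes. It is not the tool's actual code: parse_paths is a hypothetical helper, and the default mode assumes paths are split on any whitespace:

```python
def parse_paths(stdin_text: str, null_delimited: bool = False) -> list[str]:
    """Split piped input into file paths, dropping empty entries."""
    if null_delimited:
        parts = stdin_text.split("\0")
    else:
        # Assumption: whitespace-separated, so a space inside a name splits it.
        parts = stdin_text.split()
    return [p for p in parts if p]


print(parse_paths("./a.txt\n./my notes.txt\n"))
# ['./a.txt', './my', 'notes.txt']
print(parse_paths("./a.txt\0./my notes.txt\0", null_delimited=True))
# ['./a.txt', './my notes.txt']
```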

I also have a potential fix for a reported bug concerning nested .gitignore files that's currently sitting in a PR. I'm waiting for someone else to confirm that it behaves as they would expect. I've left details in this issue comment, but the short version is that you can try out the version from the PR using this uvx incantation:

uvx --with git+https://github.com/simonw/files-to-prompt@nested-gitignore files-to-prompt

Tags: cli, projects, annotated-release-notes, llms, uv, files-to-prompt

LLM 0.22, the annotated release notes

2025-02-17T06:19:00+00:00

I released LLM 0.22 this evening. Here are the annotated release notes:

model.prompt(..., key=) for API keys
chatgpt-4o-latest
llm logs -s/--short
llm models -q gemini -q exp
llm embed-multi --prepend X
Everything else

model.prompt(..., key=) for API keys

Plugins that provide models that use API keys can now subclass the new llm.KeyModel and llm.AsyncKeyModel classes. This results in the API key being passed as a new key parameter to their .execute() methods, and means that Python users can pass a key as the model.prompt(..., key=) - see Passing an API key. Plugin developers should consult the new documentation on writing Models that accept API keys. #744

This is the big change. It's only relevant to you if you use LLM as a Python library and you need the ability to pass API keys for OpenAI, Anthropic, Gemini etc in yourself in Python code rather than setting them as an environment variable.

It turns out I need to do that for Datasette Cloud, where API keys are retrieved from individual customers' secret stores!

Thanks to this change, it's now possible to do things like this - the key= parameter to model.prompt() is new:

import llm
model = llm.get_model("gpt-4o-mini")
response = model.prompt("Surprise me!", key="my-api-key")
print(response.text())

Other plugins need to be updated to take advantage of this new feature. Here's the documentation for plugin developers - I've released llm-anthropic 0.13 and llm-gemini 0.11 implementing the new pattern.
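
Here's a sketch of the per-customer pattern this enables. Only model.prompt(..., key=) comes from the release notes; the prompt_for_customer helper, the plain dict standing in for a secret store and the "acme" customer ID are all hypothetical:

```python
def prompt_for_customer(model, secrets: dict, customer_id: str, text: str):
    """Look up a customer's API key and pass it via key= for this prompt only."""
    try:
        api_key = secrets[customer_id]
    except KeyError:
        raise LookupError(f"No API key stored for customer {customer_id!r}")
    return model.prompt(text, key=api_key)


# Usage, assuming the llm package is installed and the key is valid:
# import llm
# model = llm.get_model("gpt-4o-mini")
# response = prompt_for_customer(model, {"acme": "my-api-key"}, "acme", "Surprise me!")
# print(response.text())
```

Passing the key per call means nothing has to be written to the process environment, so one process can serve several customers.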

chatgpt-4o-latest

New OpenAI model: chatgpt-4o-latest. This model ID accesses the current model being used to power ChatGPT, which can change without warning. #752

This model has actually been around since August 2024 but I had somehow missed it. chatgpt-4o-latest is a model alias that provides access to the current model that is being used for GPT-4o running on ChatGPT, which is not the same as the GPT-4o models usually available via the API. It got an upgrade last week so it's currently the alias that provides access to the most recently released OpenAI model.

Most OpenAI models such as gpt-4o provide stable date-based aliases like gpt-4o-2024-08-06 which effectively let you "pin" to that exact model version. OpenAI technical staff have confirmed that they don't change the model without updating that name.

The one exception is chatgpt-4o-latest - that one can change without warning and doesn't appear to have release notes at all.

It's also a little more expensive than gpt-4o - currently priced at $5/million tokens for input and $15/million for output, compared to GPT 4o's $2.50/$10.
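
To make the price difference concrete, here's a quick sketch that applies those per-million-token prices to a made-up prompt of 10,000 input tokens and 2,000 output tokens:

```python
# Prices in USD per million tokens (input, output), as quoted above.
PRICES = {
    "chatgpt-4o-latest": (5.00, 15.00),
    "gpt-4o": (2.50, 10.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one prompt at the listed prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Token counts here are hypothetical, chosen only to show the comparison.
for name in PRICES:
    print(name, round(cost(name, 10_000, 2_000), 4))
```

For that example chatgpt-4o-latest comes to 8 cents against 4.5 cents for gpt-4o.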

It's a fun model to play with though! As of last week it appears to be very chatty and keen on using emoji. It also claims that it has a July 2024 training cut-off.

llm logs -s/--short

New llm logs -s/--short flag, which returns a greatly shortened version of the matching log entries in YAML format with a truncated prompt and without including the response. #737

The llm logs command lets you search through logged prompt-response pairs - I have 4,419 of them in my database, according to this command:

sqlite-utils tables "$(llm logs path)" --counts  | grep responses
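
The same count can be had from Python using only the standard library. This is a sketch rather than anything LLM ships: count_rows is a hypothetical helper, and it assumes you pass it the path printed by llm logs path:

```python
import sqlite3


def count_rows(db_path: str, table: str = "responses") -> int:
    """Return the number of rows in a table of a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        return conn.execute(f"SELECT count(*) FROM {quoted}").fetchone()[0]
    finally:
        conn.close()


# Usage, with the path copied from the output of `llm logs path`:
# print(count_rows("/path/to/logs.db"))
```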

By default it outputs the full prompts and responses as Markdown - and since I've started leaning more into long context models (some recent examples) my logs have been getting pretty hard to navigate.

The new -s/--short flag provides a much more concise YAML format. Here are some of my recent prompts that I've run using Google's Gemini 2.0 Pro experimental model - the -u flag includes usage statistics, and -n 4 limits the output to the most recent 4 entries:

llm logs --short -m gemini-2.0-pro-exp-02-05 -u -n 4

- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-13T22:30:48'
  conversation: 01jm0q045fqp5xy5pn4j1bfbxs
  prompt: '<documents> <document index="1"> <source>./index.md</source> <document_content>
    # uv An extremely fast Python package...'
  usage:
    input: 281812
    output: 1521
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-13T22:32:29'
  conversation: 01jm0q045fqp5xy5pn4j1bfbxs
  prompt: I want to set it globally so if I run uv run python anywhere on my computer
    I always get 3.13
  usage:
    input: 283369
    output: 1540
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-14T23:23:57'
  conversation: 01jm3cek8eb4z8tkqhf4trk98b
  prompt: '<documents> <document index="1"> <source>./LORA.md</source> <document_content>
    # Fine-Tuning with LoRA or QLoRA You c...'
  usage:
    input: 162885
    output: 2558
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-14T23:30:13'
  conversation: 01jm3csstrfygp35rk0y1w3rfc
  prompt: '<documents> <document index="1"> <source>huggingface_hub/__init__.py</source>
    <document_content> # Copyright 2020 The...'
  usage:
    input: 480216
    output: 1791

llm models -q gemini -q exp

Both llm models and llm embed-models now take multiple -q search fragments. You can now search for all models matching "gemini" and "exp" using llm models -q gemini -q exp. #748

I have over 100 models installed in LLM now across a bunch of different plugins. I added the -q option to help search through them a few months ago, and now I've upgraded it so you can pass it multiple times.

Want to see all the Gemini experimental models?

llm models -q gemini -q exp

Outputs:

GeminiPro: gemini-exp-1114
GeminiPro: gemini-exp-1121
GeminiPro: gemini-exp-1206
GeminiPro: gemini-2.0-flash-exp
GeminiPro: learnlm-1.5-pro-experimental
GeminiPro: gemini-2.0-flash-thinking-exp-1219
GeminiPro: gemini-2.0-flash-thinking-exp-01-21
GeminiPro: gemini-2.0-pro-exp-02-05 (aliases: g2)

For consistency I added the same options to the llm embed-models command, which lists available embedding models.
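
The matching behaves like an AND across fragments. Here's a minimal sketch of that idea, assuming a case-insensitive substring match against the whole listing line (which would explain why learnlm-1.5-pro-experimental shows up above, via its GeminiPro prefix). The search_models helper and the last two sample entries are illustrative only:

```python
def search_models(lines, fragments):
    """Keep lines containing every fragment, ignoring case."""
    lowered = [f.lower() for f in fragments]
    return [line for line in lines if all(f in line.lower() for f in lowered)]


models = [
    "GeminiPro: gemini-exp-1206",
    "GeminiPro: gemini-2.0-flash-exp",
    "GeminiPro: learnlm-1.5-pro-experimental",
    "GeminiPro: gemini-2.0-flash",  # illustrative entry without "exp"
    "OpenAI Chat: gpt-4o-mini",     # illustrative entry without "gemini"
]
print(search_models(models, ["gemini", "exp"]))
```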

llm embed-multi --prepend X

New llm embed-multi --prepend X option for prepending a string to each value before it is embedded - useful for models such as nomic-embed-text-v2-moe that require passages to start with a string like "search_document: ". #745

This was inspired by my initial experiments with Nomic Embed Text V2 last week.
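
Conceptually the option does something like the following before any text reaches the embedding model. This is a sketch of the idea, not the command's implementation: prepend_all and the sample rows are made up, and only the "search_document: " prefix comes from the release note:

```python
def prepend_all(items, prefix=""):
    """Return (id, text) pairs with prefix added to the start of each text."""
    return [(item_id, prefix + text) for item_id, text in items]


# Hypothetical rows of (id, text) waiting to be embedded.
rows = [("doc-1", "Pelicans are large water birds."), ("doc-2", "SQLite is a database.")]
for item_id, text in prepend_all(rows, "search_document: "):
    print(item_id, text)
```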

Everything else

The response.json() and response.usage() methods are now documented.

Someone asked a question about these methods online, which made me realize they weren't documented. I enjoy promptly turning questions like this into documentation!

Fixed a bug where conversations that were loaded from the database could not be continued using asyncio prompts. #742

This bug was reported by Romain Gehrig. It turned out not to be possible to execute a follow-up prompt in async mode if the previous conversation had been loaded from the database.

% llm 'hi' --async
Hello! How can I assist you today?
% llm 'now in french' --async -c
Error: 'async for' requires an object with __aiter__ method, got Response

I fixed the bug for the moment, but I'd like to make the whole mechanism of persisting and loading conversations from SQLite part of the documented and supported Python API - it's currently tucked away in CLI-specific internals which aren't safe for people to use in their own code.

New plugin for macOS users: llm-mlx, which provides extremely high performance access to a wide range of local models using Apple's MLX framework.

Technically not a part of the LLM 0.22 release, but I like using the release notes to help highlight significant new plugins and llm-mlx is fast becoming my new favorite way to run models on my own machine.

The llm-claude-3 plugin has been renamed to llm-anthropic.

I wrote about this previously when I announced llm-anthropic. The new name prepares me for a world in which Anthropic release models that aren't called Claude 3 or Claude 3.5!

Tags: cli, projects, ai, annotated-release-notes, openai, generative-ai, chatgpt, llms, llm, anthropic, gemini
