<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: plugins</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/plugins.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-03-25T21:24:31+00:00</updated><author><name>Simon Willison</name></author><entry><title>datasette-llm 0.1a1</title><link href="https://simonwillison.net/2026/Mar/25/datasette-llm/#atom-tag" rel="alternate"/><published>2026-03-25T21:24:31+00:00</published><updated>2026-03-25T21:24:31+00:00</updated><id>https://simonwillison.net/2026/Mar/25/datasette-llm/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/datasette/datasette-llm/releases/tag/0.1a1"&gt;datasette-llm 0.1a1&lt;/a&gt;&lt;/p&gt;
    &lt;p&gt;New release of the base plugin that makes models from &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; available for use by other Datasette plugins such as &lt;a href="https://github.com/datasette/datasette-enrichments-llm"&gt;datasette-enrichments-llm&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href="https://github.com/datasette/datasette-llm/blob/main/README.md#register_llm_purposes"&gt;&lt;code&gt;register_llm_purposes()&lt;/code&gt; plugin hook&lt;/a&gt; and &lt;code&gt;get_purposes()&lt;/code&gt; function for retrieving registered purpose strings. &lt;a href="https://github.com/datasette/datasette-llm/issues/1"&gt;#1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of the responsibilities of this plugin is to configure which models are used for which purposes, so you can say in one place "data enrichment uses GPT-5.4-nano but SQL query assistance happens using Sonnet 4.6", for example.&lt;/p&gt;
&lt;p&gt;Plugins that depend on this can use &lt;code&gt;model = await llm.model(purpose="enrichment")&lt;/code&gt; to indicate the purpose of the prompts they wish to execute against the model. Those plugins can now also use the new &lt;code&gt;register_llm_purposes()&lt;/code&gt; hook to register those purpose strings, which means future plugins can list those purposes in one place to power things like an admin UI for assigning models to purposes.&lt;/p&gt;
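&lt;p&gt;As a rough sketch of how that registry pattern works - a simplified stand-in of my own, not the real &lt;code&gt;datasette-llm&lt;/code&gt; API, which uses Datasette's pluggy-based hook system - plugins contribute purpose strings and a central function gathers them:&lt;/p&gt;
&lt;pre&gt;PLUGIN_HOOKS = []  # hypothetical stand-in for the real plugin hook registry

def register_llm_purposes_hook(fn):
    PLUGIN_HOOKS.append(fn)
    return fn

@register_llm_purposes_hook
def enrichment_plugin_purposes():
    return ["enrichment"]

@register_llm_purposes_hook
def sql_assistant_purposes():
    return ["sql-assistance"]

def get_purposes():
    # de-duplicate and sort the purposes registered by all plugins
    purposes = set()
    for hook in PLUGIN_HOOKS:
        purposes.update(hook())
    return sorted(purposes)

print(get_purposes())  # ['enrichment', 'sql-assistance']&lt;/pre&gt;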
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="datasette"/><category term="annotated-release-notes"/><category term="llm"/></entry><entry><title>Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI</title><link href="https://simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/#atom-tag" rel="alternate"/><published>2025-11-06T18:26:05+00:00</published><updated>2025-11-06T18:26:05+00:00</updated><id>https://simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm upgrading various plugins for compatibility with the new &lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/"&gt;Datasette 1.0a20 alpha release&lt;/a&gt; and I decided to record &lt;a href="https://www.youtube.com/watch?v=qy4ci7AoF9Y"&gt;a video&lt;/a&gt; of the process. This post accompanies that video with detailed additional notes.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="qy4ci7AoF9Y" js-api="js-api" title="My process for upgrading Datasette plugins with uv and OpenAI Codex CLI" playlabel="Play: My process for upgrading Datasette plugins with uv and OpenAI Codex CLI"&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;h4 id="the-datasette-checkbox-plugin"&gt;The datasette-checkbox plugin&lt;/h4&gt;
&lt;p&gt;I picked a very simple plugin to illustrate the upgrade process (possibly too simple). &lt;a href="https://github.com/datasette/datasette-checkbox"&gt;datasette-checkbox&lt;/a&gt; adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with names like &lt;code&gt;is_active&lt;/code&gt; or &lt;code&gt;has_attachments&lt;/code&gt; or &lt;code&gt;should_notify&lt;/code&gt;) &lt;em&gt;and&lt;/em&gt; your current user has permission to update rows in that table, it adds an inline checkbox UI that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-checkbox.gif" alt="Animated demo of a table with name, is_done, should_be_deleted and is_happy columns. Each column has checkboxes, and clicking a checkbox flashes a little &amp;quot;updated&amp;quot; message." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I built the first version with the help of Claude back in August 2024 - details &lt;a href="https://github.com/datasette/datasette-checkbox/issues/1#issuecomment-2294168693"&gt;in this issue comment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most of the implementation is JavaScript that makes calls to Datasette 1.0's &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;JSON write API&lt;/a&gt;. The Python code just checks that the user has the necessary permissions before including the extra JavaScript.&lt;/p&gt;
&lt;h4 id="running-the-plugin-s-tests"&gt;Running the plugin's tests&lt;/h4&gt;
&lt;p&gt;The first step in upgrading any plugin is to run its tests against the latest Datasette version.&lt;/p&gt;
&lt;p&gt;Thankfully &lt;code&gt;uv&lt;/code&gt; makes it easy to run code in scratch virtual environments that include the different code versions you want to test against.&lt;/p&gt;
&lt;p&gt;I have a test utility called &lt;code&gt;tadd&lt;/code&gt; (for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;tadd&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And it will run the existing plugin tests against whatever version of Datasette I have checked out in my &lt;code&gt;~/dev/datasette&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;You can see the full implementation of &lt;code&gt;tadd&lt;/code&gt; (and its friend &lt;code&gt;radd&lt;/code&gt; described below) &lt;a href="https://til.simonwillison.net/python/uv-tests#variants-tadd-and-radd"&gt;in this TIL&lt;/a&gt; - the basic version looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#!&lt;/span&gt;/bin/sh&lt;/span&gt;
uv run --no-project --isolated \
  --with-editable &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;.[test]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with-editable &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev/datasette \
  python -m pytest &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$@&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I started by running &lt;code&gt;tadd&lt;/code&gt; in the &lt;code&gt;datasette-checkbox&lt;/code&gt; directory, and got my first failure... but it wasn't due to permissions, it was because the &lt;code&gt;pyproject.toml&lt;/code&gt; for the plugin was &lt;a href="https://github.com/datasette/datasette-checkbox/blob/0.1a3/pyproject.toml#L13C1-L15C2"&gt;pinned&lt;/a&gt; to a specific mismatched version of Datasette:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;&lt;span class="pl-smi"&gt;dependencies&lt;/span&gt; = [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;datasette==1.0a19&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I fixed this problem by swapping &lt;code&gt;==&lt;/code&gt; to &lt;code&gt;&amp;gt;=&lt;/code&gt; and ran the tests again... and they passed! Which was a problem because I was expecting permission-related failures.&lt;/p&gt;
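&lt;p&gt;After the swap the dependency block looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;dependencies = [
    "datasette&amp;gt;=1.0a19"
]&lt;/pre&gt;&lt;/div&gt;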
&lt;p&gt;It turns out when I first wrote the plugin I was &lt;a href="https://github.com/datasette/datasette-checkbox/blob/0.1a3/tests/test_checkbox.py"&gt;lazy with the tests&lt;/a&gt; - they weren't actually confirming that the table page loaded without errors.&lt;/p&gt;
&lt;p&gt;I needed to actually run the code myself to see the expected bug.&lt;/p&gt;
&lt;p&gt;First I created myself a demo database using &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#creating-tables"&gt;sqlite-utils create-table&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils create-table demo.db \
  demo id integer is_checked integer --pk id&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I ran it with Datasette against the plugin's code like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;radd demo.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Sure enough, visiting &lt;code&gt;/demo/demo&lt;/code&gt; produced a 500 error about the missing &lt;code&gt;Datasette.permission_allowed()&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;The next step was to update the test to also trigger this error:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;pytest&lt;/span&gt;.&lt;span class="pl-c1"&gt;mark&lt;/span&gt;.&lt;span class="pl-c1"&gt;asyncio&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_plugin_adds_javascript&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Datasette&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;add_memory_database&lt;/span&gt;(&lt;span class="pl-s"&gt;"demo"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-c1"&gt;execute_write&lt;/span&gt;(
        &lt;span class="pl-s"&gt;"CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, is_active INTEGER)"&lt;/span&gt;
    )
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;invoke_startup&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;client&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/demo/test"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-c1"&gt;status_code&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;200&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;And now &lt;code&gt;tadd&lt;/code&gt; fails as expected.&lt;/p&gt;
&lt;h4 id="upgrading-the-plugin-with-codex"&gt;Upgrading the plugin with Codex&lt;/h4&gt;
&lt;p&gt;At this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;codex exec&lt;/code&gt; runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it.&lt;/p&gt;
&lt;p&gt;I tell it to consult the subset of the &lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html#datasette-1-0a20-plugin-upgrade-guide"&gt;Datasette upgrade documentation&lt;/a&gt; that talks about Datasette permissions and then get the &lt;code&gt;tadd&lt;/code&gt; command to pass its tests.&lt;/p&gt;
&lt;p&gt;This is an example of what I call &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing agentic loops&lt;/a&gt; - I gave Codex the tools it needed (&lt;code&gt;tadd&lt;/code&gt;) and a clear goal and let it get to work on my behalf.&lt;/p&gt;
&lt;p&gt;The remainder of the video covers finishing up the work - testing the fix manually, committing my work using:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git commit -a -m &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;basename &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$PWD&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt; for datasette&amp;gt;=1.0a20&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -m &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Refs https://github.com/simonw/datasette/issues/2577&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then shipping a &lt;a href="https://pypi.org/project/datasette-checkbox/0.1a4/"&gt;0.1a4 release&lt;/a&gt; to PyPI using the pattern &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;described in this TIL&lt;/a&gt;.
Finally, I demonstrated that the shipped plugin worked in a fresh environment using &lt;code&gt;uvx&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --prerelease=allow --with datasette-checkbox \
  datasette --root &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev/ecosystem/datasette-checkbox/demo.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Executing this command installs and runs a fresh Datasette instance with a fresh copy of the new alpha plugin (&lt;code&gt;--prerelease=allow&lt;/code&gt;). It's a neat way of confirming that freshly released software works as expected.&lt;/p&gt;
&lt;h4 id="a-colophon-for-the-video"&gt;A colophon for the video&lt;/h4&gt;
&lt;p&gt;This video was shot in a single take using &lt;a href="https://www.descript.com/"&gt;Descript&lt;/a&gt;, with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a &lt;code&gt;simonwillison.net&lt;/code&gt; closing slide from &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;my previous video&lt;/a&gt; and exported it locally at 1080p, then uploaded it to YouTube.&lt;/p&gt;
&lt;p&gt;Something I learned from the Software Carpentry &lt;a href="https://simonwillison.net/2020/Sep/26/weeknotes-software-carpentry-sqlite/"&gt;instructor training course&lt;/a&gt; is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes!&lt;/p&gt;
&lt;p&gt;I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production.&lt;/p&gt;
&lt;p&gt;I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="python"/><category term="youtube"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="uv"/><category term="coding-agents"/><category term="codex-cli"/></entry><entry><title>A new SQL-powered permissions system in Datasette 1.0a20</title><link href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag" rel="alternate"/><published>2025-11-04T21:34:42+00:00</published><updated>2025-11-04T21:34:42+00:00</updated><id>https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://docs.datasette.io/en/latest/changelog.html#a20-2025-11-03"&gt;Datasette 1.0a20 is out&lt;/a&gt; with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved &lt;a href="https://github.com/simonw/datasette/compare/1.0a19...1.0a20"&gt;163 commits&lt;/a&gt;, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.&lt;/p&gt;


&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#new-debugging-tools"&gt;New debugging tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#what-s-next-"&gt;What's next?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/h4&gt;
&lt;p&gt;Datasette's &lt;a href="https://docs.datasette.io/en/latest/authentication.html"&gt;permissions system&lt;/a&gt; exists to answer the following question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Is this &lt;strong&gt;actor&lt;/strong&gt; allowed to perform this &lt;strong&gt;action&lt;/strong&gt;, optionally against this particular &lt;strong&gt;resource&lt;/strong&gt;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An &lt;strong&gt;actor&lt;/strong&gt; is usually a user, but might also be an automation operating via the Datasette API.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;action&lt;/strong&gt; is something they need to do - &lt;code&gt;view-table&lt;/code&gt;, &lt;code&gt;execute-sql&lt;/code&gt;, or &lt;code&gt;insert-row&lt;/code&gt;, for example.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;resource&lt;/strong&gt; is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.&lt;/p&gt;
&lt;p&gt;Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data.&lt;/p&gt;
&lt;p&gt;Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by some form of authentication. Datasette 1.0 also includes &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;a write API&lt;/a&gt;, which makes it necessary to configure who can insert, update, and delete rows or create new tables.&lt;/p&gt;
&lt;p&gt;Actors can be authenticated in a number of different ways provided by plugins using the &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#actor-from-request-datasette-request"&gt;actor_from_request()&lt;/a&gt; plugin hook. &lt;a href="https://datasette.io/plugins/datasette-auth-passwords"&gt;datasette-auth-passwords&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-auth-existing-cookies"&gt;datasette-auth-existing-cookies&lt;/a&gt; are examples of authentication plugins.&lt;/p&gt;
&lt;h4 id="permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/h4&gt;
&lt;p&gt;The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result.&lt;/p&gt;
&lt;p&gt;This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view?&lt;/p&gt;
&lt;p&gt;I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables.&lt;/p&gt;
&lt;p&gt;Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is &lt;em&gt;really&lt;/em&gt; good at filtering large numbers of records.&lt;/p&gt;
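&lt;p&gt;The idea, sketched with the standard library's &lt;code&gt;sqlite3&lt;/code&gt; (my own illustration, not Datasette's real catalog schema or query): instead of running one permission check per table, join a catalog of tables against allow rules in a single query:&lt;/p&gt;
&lt;pre&gt;import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical catalog of 1,000 tables in one database
conn.execute("CREATE TABLE catalog_tables (database_name TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO catalog_tables VALUES (?, ?)",
    [("accounting", "table_%d" % i) for i in range(1000)],
)
# Rules shaped like the new hook's rows: parent, child, allow
conn.execute("CREATE TABLE rules (parent TEXT, child TEXT, allow INTEGER)")
conn.execute("INSERT INTO rules VALUES ('accounting', 'table_7', 1)")
# One query filters all 1,000 tables down to the visible ones
visible = conn.execute(
    """
    SELECT c.database_name, c.table_name
    FROM catalog_tables c
    JOIN rules r ON r.parent = c.database_name AND r.child = c.table_name
    WHERE r.allow = 1
    """
).fetchall()
print(visible)  # [('accounting', 'table_7')]&lt;/pre&gt;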
&lt;h4 id="the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/h4&gt;
&lt;p&gt;The biggest change in the new release is that I've replaced the previous  &lt;code&gt;permission_allowed(actor, action, resource)&lt;/code&gt; plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-permission-resources-sql"&gt;permission_resources_sql(actor, action)&lt;/a&gt; plugin hook.&lt;/p&gt;
&lt;p&gt;Instead of returning a True/False result, this new hook returns a SQL query producing rules that help determine which resources the current actor can execute the specified action against.&lt;/p&gt;
&lt;p&gt;Here's an example, lifted from the documentation:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-s1"&gt;permissions&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;PermissionSQL&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;permission_resources_sql&lt;/span&gt;(&lt;span class="pl-s1"&gt;datasette&lt;/span&gt;, &lt;span class="pl-s1"&gt;actor&lt;/span&gt;, &lt;span class="pl-s1"&gt;action&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;action&lt;/span&gt; &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"view-table"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt; &lt;span class="pl-c1"&gt;or&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"id"&lt;/span&gt;) &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"alice"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-en"&gt;PermissionSQL&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;            SELECT&lt;/span&gt;
&lt;span class="pl-s"&gt;                'accounting' AS parent,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'sales' AS child,&lt;/span&gt;
&lt;span class="pl-s"&gt;                1 AS allow,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'alice can view accounting/sales' AS reason&lt;/span&gt;
&lt;span class="pl-s"&gt;        """&lt;/span&gt;,
    )&lt;/pre&gt;
&lt;p&gt;This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database.&lt;/p&gt;
&lt;p&gt;The SQL query wrapped in the &lt;code&gt;PermissionSQL&lt;/code&gt; object should always return four columns: a parent, a child, an allow flag (1 or 0), and a reason string for debugging.&lt;/p&gt;
&lt;p&gt;When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against &lt;a href="https://docs.datasette.io/en/latest/internals.html#internal-database-schema"&gt;the internal catalog tables&lt;/a&gt; and efficiently lists all the resources the actor can access.&lt;/p&gt;
&lt;p&gt;This query can then be limited or paginated to avoid loading too many results at once.&lt;/p&gt;
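&lt;p&gt;That combination step can be sketched like this (again my own illustration using &lt;code&gt;sqlite3&lt;/code&gt;, not Datasette's actual implementation) - rule queries from several plugins are concatenated with &lt;code&gt;UNION ALL&lt;/code&gt; and filtered in one pass:&lt;/p&gt;
&lt;pre&gt;import sqlite3

# Hypothetical rule queries, as two different plugins might return them -
# each produces (parent, child, allow, reason) rows
plugin_sqls = [
    "SELECT 'accounting' AS parent, 'sales' AS child, "
    "1 AS allow, 'plugin-a: allow' AS reason",
    "SELECT 'accounting' AS parent, 'payroll' AS child, "
    "0 AS allow, 'plugin-b: deny' AS reason",
]
combined = " UNION ALL ".join(plugin_sqls)
conn = sqlite3.connect(":memory:")
allowed = conn.execute(
    "SELECT parent, child, reason FROM (" + combined + ") WHERE allow = 1"
).fetchall()
print(allowed)  # [('accounting', 'sales', 'plugin-a: allow')]&lt;/pre&gt;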
&lt;h4 id="hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/h4&gt;
&lt;p&gt;Datasette has several additional requirements that make the permissions system more complicated.&lt;/p&gt;
&lt;p&gt;Datasette permissions can optionally act against a two-level &lt;strong&gt;hierarchy&lt;/strong&gt;. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in &lt;em&gt;every&lt;/em&gt; database in that Datasette instance.&lt;/p&gt;
&lt;p&gt;Some actions apply at the table level, others at the database level, and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example.&lt;/p&gt;
&lt;p&gt;Datasette currently has &lt;a href="https://docs.datasette.io/en/latest/authentication.html#built-in-actions"&gt;ten default actions&lt;/a&gt; but &lt;strong&gt;plugins&lt;/strong&gt; that add additional features can &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#register-actions-datasette"&gt;register new actions&lt;/a&gt; to better participate in the permission systems.&lt;/p&gt;
&lt;p&gt;Datasette's permission system has a mechanism to &lt;strong&gt;veto&lt;/strong&gt; permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.&lt;/p&gt;
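&lt;p&gt;One way to picture that hierarchy-aware resolution (a sketch of my own, not Datasette's actual resolution code): collect the applicable rules, keep only the most specific ones, and let a deny veto at that winning level:&lt;/p&gt;
&lt;pre&gt;rules = [
    {"parent": "accounting", "child": None, "allow": 0},    # deny the whole database
    {"parent": "accounting", "child": "sales", "allow": 1}, # allow one table
]

def specificity(rule):
    # instance-wide = 0, database-level = 1, table-level = 2
    return int(rule["parent"] is not None) + int(rule["child"] is not None)

def check(parent, child):
    applicable = [
        r for r in rules
        if r["parent"] in (None, parent) and r["child"] in (None, child)
    ]
    if not applicable:
        return False
    best = max(specificity(r) for r in applicable)
    most_specific = [r for r in applicable if specificity(r) == best]
    # at the winning level, any deny still vetoes
    return all(r["allow"] for r in most_specific)

print(check("accounting", "sales"))   # True - table-level allow outvotes the deny
print(check("accounting", "payroll")) # False - the database-level deny applies&lt;/pre&gt;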
&lt;p&gt;Finally, Datasette includes a mechanism for applying additional &lt;strong&gt;restrictions&lt;/strong&gt; to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are &lt;a href="https://docs.datasette.io/en/latest/authentication.html#restricting-the-actions-that-a-token-can-perform"&gt;described in more detail&lt;/a&gt; in the documentation.&lt;/p&gt;
&lt;p&gt;That's a lot of different moving parts for the new implementation to cover.&lt;/p&gt;
&lt;h4 id="new-debugging-tools"&gt;New debugging tools&lt;/h4&gt;
&lt;p&gt;Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible.&lt;/p&gt;
&lt;p&gt;The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-allowed-resources.jpg" alt="Allowed resources. Tabs are Playground, Check, Allowed, Rules, Actions, Allow debug. There is a form where you can select an action (here view-table) and optionally filter by parent and child. Below is a table of results listing resource paths - e.g. /fixtures/name-of-table - plus parent, child and reason columns. The reason is a JSON list for example &amp;quot;datasette.default_permissions: root user&amp;quot;,&amp;quot;datasette.default_permissions: default allow for view-table&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And this page listing the &lt;em&gt;rules&lt;/em&gt; that apply to that question - since different plugins may return different rules which get combined together:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-rules.jpg" alt="The rules tab for the same view-table question. Here there are two allow rules - one from datasette.default_permissions for the root user and another from default_permissions labelled default allow for view-table." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be overridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the &lt;code&gt;--root&lt;/code&gt; option).&lt;/p&gt;
&lt;p&gt;Those rules are defined in the &lt;a href="https://github.com/simonw/datasette/blob/1.0a20/datasette/default_permissions.py"&gt;datasette/default_permissions.py&lt;/a&gt; Python module.&lt;/p&gt;
&lt;h4 id="the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/h4&gt;
&lt;p&gt;There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource.&lt;/p&gt;
&lt;p&gt;It's not possible to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as &lt;code&gt;datasette-auth-github&lt;/code&gt; mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before.&lt;/p&gt;
&lt;p&gt;API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system.&lt;/p&gt;
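&lt;p&gt;The reason such tokens can't be enumerated in advance is that signed tokens are typically stateless: the server verifies a signature rather than consulting a stored list. A generic HMAC sketch of that idea - not Datasette's actual token format, which has its own signing machinery:&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # assumption: a single server-side signing key


def create_token(payload):
    # Serialize and sign the payload - nothing is stored server-side,
    # so the permissions system cannot know this token exists.
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_token(token):
    # Recompute the signature; return the payload only if it matches.
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected):
        return json.loads(base64.urlsafe_b64decode(body))
    return None
```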
&lt;p&gt;This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.&lt;/p&gt;
&lt;h4 id="upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/h4&gt;
&lt;p&gt;Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html"&gt;Datasette upgrade guide&lt;/a&gt; includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document.&lt;/p&gt;
&lt;p&gt;I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working &lt;em&gt;extremely well&lt;/em&gt;. I've even had Claude Code &lt;a href="https://github.com/simonw/datasette/commit/fa978ec1006297416e2cd87a2f0d3cac99283cf8"&gt;update those notes itself&lt;/a&gt; with things it learned during an upgrade process!&lt;/p&gt;
&lt;p&gt;This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.&lt;/p&gt;
&lt;p&gt;I've also been leaning heavily on &lt;code&gt;uv&lt;/code&gt; to help with the upgrade process. I wrote myself two new helper scripts - &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; - to help test the new plugins.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tadd&lt;/code&gt; = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to &lt;code&gt;pytest&lt;/code&gt; so I can run &lt;code&gt;tadd -k test_name&lt;/code&gt; or &lt;code&gt;tadd -x --pdb&lt;/code&gt; as needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;radd&lt;/code&gt; = "run against datasette dev" - it runs the latest dev &lt;code&gt;datasette&lt;/code&gt; command with the plugin installed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; implementations &lt;a href="https://til.simonwillison.net/python/uv-tests#variants-tadd-and-radd"&gt;can be found in this TIL&lt;/a&gt;.&lt;/p&gt;
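&lt;p&gt;As a rough illustration of the pattern - the real implementations are in the TIL above, and the exact flags here are my guess - such helpers can be small &lt;code&gt;uv&lt;/code&gt; wrappers:&lt;/p&gt;

```shell
# Hypothetical reconstruction of the tadd/radd pattern - see the linked TIL
# for the real versions. Assumes a Datasette checkout at ~/dev/datasette.
tadd() {
    # Run this plugin's test suite with the dev Datasette swapped in,
    # passing extra arguments (like -k or -x --pdb) through to pytest.
    uv run --extra test --with-editable ~/dev/datasette pytest "$@"
}

radd() {
    # Serve the dev Datasette with the current plugin installed.
    uv run --with-editable ~/dev/datasette --with-editable . datasette "$@"
}
```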
&lt;p&gt;Some of my plugin upgrades have been reduced to a single call to the &lt;code&gt;codex exec&lt;/code&gt; command, which runs OpenAI's Codex CLI with a prompt without entering interactive mode:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are still a bunch more to go - there's &lt;a href="https://github.com/simonw/datasette/issues/2577"&gt;a list in this tracking issue&lt;/a&gt; - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.&lt;/p&gt;
&lt;h4 id="using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/h4&gt;
&lt;p&gt;This change to Datasette core is &lt;em&gt;by far&lt;/em&gt; the most ambitious piece of work I've ever attempted using a coding agent.&lt;/p&gt;
&lt;p&gt;Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window.&lt;/p&gt;
&lt;p&gt;Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable.&lt;/p&gt;
&lt;p&gt;I no longer see codebase size as a limiting factor for how useful they can be.&lt;/p&gt;
&lt;p&gt;I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right.&lt;/p&gt;
&lt;p&gt;For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI.&lt;/p&gt;
&lt;p&gt;I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.&lt;/p&gt;
&lt;h4 id="starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/h4&gt;
&lt;p&gt;I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc"&gt;research/sqlite-permissions-poc&lt;/a&gt; project was the one that finally convinced me of a viable approach.&lt;/p&gt;
&lt;p&gt;That one started as a &lt;a href="https://claude.ai/share/8fd432bc-a718-4883-9978-80ab82a75c87"&gt;free-ranging conversation with Claude&lt;/a&gt;, at the end of which I told it to generate a specification which I then &lt;a href="https://chatgpt.com/share/68f6532f-9920-8006-928a-364e15b6e9ef"&gt;fed into GPT-5&lt;/a&gt; to implement. You can see that specification &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc#original-prompt"&gt;at the end of the README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment.&lt;/p&gt;
&lt;p&gt;This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.&lt;/p&gt;
&lt;h4 id="miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; shortcuts were invaluable for productively working on those plugins while I made changes to core.&lt;/li&gt;
&lt;li&gt;Coding agents make experiments &lt;em&gt;much&lt;/em&gt; cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.&lt;/li&gt;
&lt;li&gt;Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.&lt;/li&gt;
&lt;li&gt;Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes".&lt;/li&gt;
&lt;li&gt;Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.&lt;/li&gt;
&lt;li&gt;A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: &lt;code&gt;Use gh CLI to fetch comments on URL-to-PR and make the requested changes&lt;/code&gt;. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.&lt;/li&gt;
&lt;li&gt;The code I write with LLMs is &lt;em&gt;higher quality code&lt;/em&gt;. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.&lt;/li&gt;
&lt;li&gt;Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.&lt;/li&gt;
&lt;li&gt;That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Claude Skills&lt;/a&gt;. I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="what-s-next-"&gt;What's next?&lt;/h4&gt;
&lt;p&gt;Now that the new alpha is out, my focus is on upgrading the existing plugin ecosystem to use it and supporting other plugin authors who are doing the same.&lt;/p&gt;
&lt;p&gt;The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week.&lt;/p&gt;
&lt;p&gt;This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="python"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="annotated-release-notes"/><category term="uv"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/></entry><entry><title>Superpowers: How I'm using coding agents in October 2025</title><link href="https://simonwillison.net/2025/Oct/10/superpowers/#atom-tag" rel="alternate"/><published>2025-10-10T23:30:14+00:00</published><updated>2025-10-10T23:30:14+00:00</updated><id>https://simonwillison.net/2025/Oct/10/superpowers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.fsck.com/2025/10/09/superpowers/"&gt;Superpowers: How I&amp;#x27;m using coding agents in October 2025&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A follow-up to Jesse Vincent's post &lt;a href="https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/"&gt;about September&lt;/a&gt;, but this is a really significant piece in its own right.&lt;/p&gt;
&lt;p&gt;Jesse is one of the most creative users of coding agents (Claude Code in particular) that I know. He's put a great amount of work into evolving an effective process for working with them, encouraging red/green TDD (watch the test fail first), planning steps, self-updating memory notes and even implementing a &lt;a href="https://blog.fsck.com/2025/05/28/dear-diary-the-user-asked-me-if-im-alive/"&gt;feelings journal&lt;/a&gt; ("I feel engaged and curious about this project" - Claude).&lt;/p&gt;
&lt;p&gt;Claude Code &lt;a href="https://www.anthropic.com/news/claude-code-plugins"&gt;just launched plugins&lt;/a&gt;, and Jesse is celebrating by wrapping up a whole host of his accumulated tricks as a new plugin called &lt;a href="https://github.com/obra/superpowers"&gt;Superpowers&lt;/a&gt;. You can add it to your Claude Code like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There's a lot in here! It's worth spending some time &lt;a href="https://github.com/obra/superpowers"&gt;browsing the repository&lt;/a&gt; - here's just one fun example, in &lt;a href="https://github.com/obra/superpowers/blob/main/skills/debugging/root-cause-tracing/SKILL.md"&gt;skills/debugging/root-cause-tracing/SKILL.md&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;---
name: Root Cause Tracing
description: Systematically trace bugs backward through call stack to find original trigger
when_to_use: Bug appears deep in call stack but you need to find where it originates
version: 1.0.0
languages: all
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core principle:&lt;/strong&gt; Trace backward through the call chain until you find the original trigger, then fix at the source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When to Use&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;digraph when_to_use {
    "Bug appears deep in stack?" [shape=diamond];
    "Can trace backwards?" [shape=diamond];
    "Fix at symptom point" [shape=box];
    "Trace to original trigger" [shape=box];
    "BETTER: Also add defense-in-depth" [shape=box];

    "Bug appears deep in stack?" -&amp;gt; "Can trace backwards?" [label="yes"];
    "Can trace backwards?" -&amp;gt; "Trace to original trigger" [label="yes"];
    "Can trace backwards?" -&amp;gt; "Fix at symptom point" [label="no - dead end"];
    "Trace to original trigger" -&amp;gt; "BETTER: Also add defense-in-depth";
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one is particularly fun because it then includes a &lt;a href="https://en.wikipedia.org/wiki/DOT_(graph_description_language)"&gt;Graphviz DOT graph&lt;/a&gt; illustrating the process - it turns out Claude can interpret those as workflow instructions just fine, and Jesse has been &lt;a href="https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd/"&gt;wildly experimenting with them&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://claude.ai/share/2b78a93e-cdc3-4b1d-9b02-457eb62140a5"&gt;vibe-coded up&lt;/a&gt; a quick URL-based DOT visualizer, &lt;a href="https://tools.simonwillison.net/dot#digraph%20when_to_use%20%7B%0A%20%20%20%20%22Bug%20appears%20deep%20in%20stack%3F%22%20%5Bshape%3Ddiamond%5D%3B%0A%20%20%20%20%22Can%20trace%20backwards%3F%22%20%5Bshape%3Ddiamond%5D%3B%0A%20%20%20%20%22Fix%20at%20symptom%20point%22%20%5Bshape%3Dbox%5D%3B%0A%20%20%20%20%22Trace%20to%20original%20trigger%22%20%5Bshape%3Dbox%5D%3B%0A%20%20%20%20%22BETTER%3A%20Also%20add%20defense-in-depth%22%20%5Bshape%3Dbox%5D%3B%0A%0A%20%20%20%20%22Bug%20appears%20deep%20in%20stack%3F%22%20-%3E%20%22Can%20trace%20backwards%3F%22%20%5Blabel%3D%22yes%22%5D%3B%0A%20%20%20%20%22Can%20trace%20backwards%3F%22%20-%3E%20%22Trace%20to%20original%20trigger%22%20%5Blabel%3D%22yes%22%5D%3B%0A%20%20%20%20%22Can%20trace%20backwards%3F%22%20-%3E%20%22Fix%20at%20symptom%20point%22%20%5Blabel%3D%22no%20-%20dead%20end%22%5D%3B%0A%20%20%20%20%22Trace%20to%20original%20trigger%22%20-%3E%20%22BETTER%3A%20Also%20add%20defense-in-depth%22%3B%0A%7D"&gt;here's that one rendered&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The above DOT rendered as an image" src="https://static.simonwillison.net/static/2025/jesse-dot.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;There is &lt;em&gt;so much&lt;/em&gt; to learn about putting these tools to work in the most effective way possible. Jesse is way ahead of the curve, so it's absolutely worth spending some time exploring what he's shared so far.&lt;/p&gt;
&lt;p&gt;And if you're worried about filling up your context with a bunch of extra stuff, here's &lt;a href="https://bsky.app/profile/s.ly/post/3m2srmkergc2p"&gt;a reassuring note from Jesse&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The core of it is VERY token light. It pulls in one doc of fewer than 2k tokens. As it needs bits of the process, it runs a shell script to search for them.  The long end to end chat for the planning and implementation process for that todo list app was 100k tokens.&lt;/p&gt;
&lt;p&gt;It uses subagents to manage token-heavy stuff, including all the actual implementation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(Jesse's post also tipped me off about Claude's &lt;code&gt;/mnt/skills/public&lt;/code&gt; folder, see &lt;a href="https://simonwillison.net/2025/Oct/10/claude-skills/"&gt;my notes here&lt;/a&gt;.)&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sub-agents"&gt;sub-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jesse-vincent"&gt;jesse-vincent&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="claude"/><category term="vibe-coding"/><category term="coding-agents"/><category term="claude-code"/><category term="sub-agents"/><category term="jesse-vincent"/><category term="skills"/></entry><entry><title>Claude can write complete Datasette plugins now</title><link href="https://simonwillison.net/2025/Oct/8/claude-datasette-plugins/#atom-tag" rel="alternate"/><published>2025-10-08T23:43:43+00:00</published><updated>2025-10-08T23:43:43+00:00</updated><id>https://simonwillison.net/2025/Oct/8/claude-datasette-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;This isn't necessarily surprising, but it's worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now.&lt;/p&gt;
&lt;p&gt;I've seen models complete aspects of this in the past, but today is the first time I've shipped a new plugin where every line of code and test was written by Claude, with minimal prompting from myself.&lt;/p&gt;
&lt;p&gt;The plugin is called &lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-os-info"&gt;datasette-os-info&lt;/a&gt;&lt;/strong&gt;. It's a simple debugging tool - all it does is add a &lt;code&gt;/-/os&lt;/code&gt; JSON page which dumps out as much information as it can about the OS it's running on. Here's a &lt;a href="https://til.simonwillison.net/-/os"&gt;live demo&lt;/a&gt; on my TIL website.&lt;/p&gt;
&lt;p&gt;I built it to help experiment with changing the Docker base container that Datasette uses to &lt;a href="https://docs.datasette.io/en/stable/publish.html"&gt;publish images&lt;/a&gt; to one that uses Python 3.14.&lt;/p&gt;
&lt;p&gt;Here's the full set of commands I used to create the plugin. I started with my &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; cookiecutter template:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx cookiecutter gh:simonw/datasette-plugin

  [1/8] &lt;span class="pl-en"&gt;plugin_name&lt;/span&gt; (): os-info
  [2/8] &lt;span class="pl-en"&gt;description&lt;/span&gt; (): Information about the current OS
  [3/8] hyphenated (os-info): 
  [4/8] underscored (os_info): 
  [5/8] &lt;span class="pl-en"&gt;github_username&lt;/span&gt; (): datasette
  [6/8] &lt;span class="pl-en"&gt;author_name&lt;/span&gt; (): Simon Willison
  [7/8] &lt;span class="pl-en"&gt;include_static_directory&lt;/span&gt; (): 
  [8/8] &lt;span class="pl-en"&gt;include_templates_directory&lt;/span&gt; (): &lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This created a &lt;code&gt;datasette-os-info&lt;/code&gt; directory with the initial &lt;code&gt;pyproject.toml&lt;/code&gt; and &lt;code&gt;tests/&lt;/code&gt; and &lt;code&gt;datasette_os_info/__init__.py&lt;/code&gt; files. Here's an example of &lt;a href="https://github.com/simonw/datasette-plugin-template-demo"&gt;that starter template&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I created a &lt;code&gt;uv&lt;/code&gt; virtual environment for it, installed the initial test dependencies and ran &lt;code&gt;pytest&lt;/code&gt; to check that worked:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; datasette-os-info
uv venv
uv sync --extra &lt;span class="pl-c1"&gt;test&lt;/span&gt;
uv run pytest&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I fired up &lt;a href="https://www.claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; in that directory in YOLO mode:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;claude --dangerously-skip-permissions&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(I actually used my &lt;code&gt;claude-yolo&lt;/code&gt; shortcut which runs the above.)&lt;/p&gt;
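&lt;p&gt;That shortcut can be as simple as a one-line shell function:&lt;/p&gt;

```shell
# claude-yolo: run Claude Code with permission prompts disabled.
claude-yolo() {
    claude --dangerously-skip-permissions "$@"
}
```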
&lt;p&gt;Then, in Claude, I told it how to run the tests:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Run uv run pytest&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When that worked, I told it to build the plugin:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;This is a Datasette plugin which should add a new page /-/os which returns pretty-printed JSON about the current operating system - implement it. I want to pick up as many details as possible across as many OS as possible, including if possible figuring out the base image if it is in a docker container - otherwise the Debian OS release name and suchlike would be good&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and that was it! Claude &lt;a href="https://github.com/datasette/datasette-os-info/blob/0.1/datasette_os_info/__init__.py"&gt;implemented the plugin&lt;/a&gt; using Datasette's &lt;a href="https://docs.datasette.io/en/stable/plugin_hooks.html#register-routes-datasette"&gt;register_routes() plugin hook&lt;/a&gt; to add the &lt;code&gt;/-/os&lt;/code&gt; page, and then without me prompting it to do so &lt;a href="https://github.com/datasette/datasette-os-info/blob/0.1/tests/test_os_info.py"&gt;built this basic test as well&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It ran the new test, spotted a bug (it had guessed a non-existent &lt;code&gt;Response(..., default_repr=)&lt;/code&gt; parameter), fixed the bug and declared itself done.&lt;/p&gt;
&lt;p&gt;I built myself a wheel:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv pip install build
uv run python -m build&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then uploaded that to an S3 bucket and deployed it to test it out using &lt;code&gt;datasette publish ... --install URL-to-wheel&lt;/code&gt;.  It did exactly what I had hoped - here's what that &lt;code&gt;/-/os&lt;/code&gt; page looked like:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"platform"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"system"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Linux&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"release"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;4.4.0&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"version"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#1 SMP Sun Jan 10 15:06:54 PST 2016&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"machine"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;x86_64&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"processor"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"architecture"&lt;/span&gt;: [
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;64bit&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    ],
    &lt;span class="pl-ent"&gt;"platform"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Linux-4.4.0-x86_64-with-glibc2.41&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"python_version"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3.14.0&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"python_implementation"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;CPython&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span class="pl-ent"&gt;"hostname"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;localhost&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"cpu_count"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"linux"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"os_release"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"PRETTY_NAME"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Debian GNU/Linux 13 (trixie)&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"NAME"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Debian GNU/Linux&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"VERSION_ID"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;13&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"VERSION"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;13 (trixie)&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"VERSION_CODENAME"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;trixie&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"DEBIAN_VERSION_FULL"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;13.1&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"ID"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;debian&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"HOME_URL"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://www.debian.org/&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"SUPPORT_URL"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://www.debian.org/support&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"BUG_REPORT_URL"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://bugs.debian.org/&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    &lt;span class="pl-ent"&gt;"debian_version"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;13.1&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"kernel"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"sysname"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Linux&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"nodename"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;localhost&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"release"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;4.4.0&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"version"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#1 SMP Sun Jan 10 15:06:54 PST 2016&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"machine"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;x86_64&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    &lt;span class="pl-ent"&gt;"cpu_model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;unknown&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"memory"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"MemTotal"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1048576 kB&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"MemFree"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;958244 kB&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"MemAvailable"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;958244 kB&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"SwapTotal"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;0 kB&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"SwapFree"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;0 kB&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    }
  },
  &lt;span class="pl-ent"&gt;"environment"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"HOME"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;/root&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"PATH"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span class="pl-ent"&gt;"python_executable"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;/usr/local/bin/python3.14&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I dread to think how long it would have taken me to figure out all of those different useful OS details!&lt;/p&gt;
&lt;p&gt;To finish up, I had Claude fill out the README via a couple of extra prompts:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add usage instructions to the README (about the /-/os page)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then had a brief conversation about how safe the output was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Is there anything potentially sensitive in that output?&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yes, there are several potentially sensitive items in the output:&lt;/p&gt;
&lt;p&gt;Most sensitive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;hostname - Could reveal internal naming schemes or organization details&lt;/li&gt;
&lt;li&gt;Environment variables:
&lt;ul&gt;
&lt;li&gt;HOME - Reveals user's home directory path&lt;/li&gt;
&lt;li&gt;USER - Reveals the username&lt;/li&gt;
&lt;li&gt;PATH - Reveals filesystem structure and software installation locations&lt;/li&gt;
&lt;li&gt;VIRTUAL_ENV - Could reveal project paths and directory structure&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;python_executable - Reveals filesystem paths and user directories&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Moderately sensitive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU count and memory info - Could reveal infrastructure capacity&lt;/li&gt;
&lt;li&gt;Kernel version details - Could potentially be used to identify specific vulnerabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Considerations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This endpoint is accessible without authentication by default (at /-/os)&lt;/li&gt;
&lt;li&gt;If you're running Datasette on a public server, this information would be visible to anyone&lt;/li&gt;
&lt;li&gt;The information could help attackers understand your infrastructure setup&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Suggestions:&lt;/p&gt;
&lt;p&gt;You might want to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Add authentication requirements to this endpoint&lt;/li&gt;
&lt;li&gt;Make certain fields optional/configurable&lt;/li&gt;
&lt;li&gt;Add a warning in the README that this exposes system information&lt;/li&gt;
&lt;li&gt;Consider filtering out or redacting sensitive paths (replace usernames with ***, etc.)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Would you like me to add any of these protections to the plugin?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this case I'm fine with the current behavior provided people understand the risks, so I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add the warning to the README&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then (because the warning it added was quite long):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Drop the bit about configuring the permission system, just have a short warning telling people to review what it exposes&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And requested an extra README note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add a note that you can also see the output by running: datasette --get /-/os&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Three last prompts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Add uv.lock to gitignore&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Drop Python 3.9 and add Python 3.14 - to the GitHub workflows, also min version in pyproject.toml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Bump to setup-python@v6&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;... and that was the project finished. I pushed it &lt;a href="https://github.com/datasette/datasette-os-info"&gt;to GitHub&lt;/a&gt;, &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;configured Trusted Publishing&lt;/a&gt; for it on PyPI and posted &lt;a href="https://github.com/datasette/datasette-os-info/releases/tag/0.1"&gt;the 0.1 release&lt;/a&gt;, which ran &lt;a href="https://github.com/datasette/datasette-os-info/blob/0.1/.github/workflows/publish.yml"&gt;this GitHub Actions publish.yml&lt;/a&gt; and deployed that release &lt;a href="https://pypi.org/project/datasette-os-info/"&gt;to datasette-os-info on PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now that it's live you can try it out without even installing Datasette using a &lt;code&gt;uv&lt;/code&gt; one-liner like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --isolated \
  --with datasette-os-info \
  datasette --get /-/os&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That's using the &lt;code&gt;--get PATH&lt;/code&gt; CLI option to show what that path in the Datasette instance would return, as &lt;a href="https://docs.datasette.io/en/stable/cli-reference.html#datasette-get"&gt;described in the Datasette documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've shared &lt;a href="https://gist.github.com/simonw/85fd7a76589dc01950e71d8e606cd5dd"&gt;my full Claude Code transcript&lt;/a&gt; in a Gist.&lt;/p&gt;
&lt;p&gt;A year ago I'd have been &lt;em&gt;very&lt;/em&gt; impressed by this. Today I wasn't even particularly surprised that this worked - the coding agent pattern implemented by Claude Code is spectacularly effective when you combine it with pre-existing templates, and Datasette has been around for long enough now that plenty of examples of plugins have made it into the training data for the leading models.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="python"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="claude"/><category term="uv"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>llm-fragments-youtube</title><link href="https://simonwillison.net/2025/Jun/14/llm-fragments-youtube/#atom-tag" rel="alternate"/><published>2025-06-14T18:19:37+00:00</published><updated>2025-06-14T18:19:37+00:00</updated><id>https://simonwillison.net/2025/Jun/14/llm-fragments-youtube/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/redraw/llm-fragments-youtube"&gt;llm-fragments-youtube&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Excellent new &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; plugin by Agustin Bacigalup which lets you use the subtitles of any YouTube video as a &lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/"&gt;fragment&lt;/a&gt; for running prompts against.&lt;/p&gt;
&lt;p&gt;I tried it out like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-fragments-youtube
llm -f youtube:dQw4w9WgXcQ \
  'summary of people and what they do'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which returned (&lt;a href="https://gist.github.com/simonw/6098f6e48981d84d1130f34566d0a116"&gt;full transcript&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The lyrics you've provided are from the song "Never Gonna Give You Up" by Rick Astley. The song features a narrator who is expressing unwavering love and commitment to another person. Here's a summary of the people involved and their roles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Narrator (Singer):&lt;/strong&gt; A person deeply in love, promising loyalty, honesty, and emotional support. They emphasize that they will never abandon, hurt, or deceive their partner.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Partner (Implied Listener):&lt;/strong&gt; The person the narrator is addressing, who is experiencing emotional pain or hesitation ("Your heart's been aching but you're too shy to say it"). The narrator is encouraging them to understand and trust in the commitment being offered.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In essence, the song portrays a one-sided but heartfelt pledge of love, with the narrator assuring their partner of their steadfast dedication.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The plugin works by including &lt;a href="https://github.com/yt-dlp/yt-dlp"&gt;yt-dlp&lt;/a&gt; as a Python dependency and then executing it via a call to &lt;code&gt;subprocess.run()&lt;/code&gt;.&lt;/p&gt;
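&lt;p&gt;A minimal sketch of that approach might look like this - note the exact &lt;code&gt;yt-dlp&lt;/code&gt; flags here are my illustration, not necessarily the ones the plugin uses:&lt;/p&gt;

```python
import subprocess

def subtitles_command(video_id, lang="en"):
    # Hypothetical sketch: ask yt-dlp for auto-generated subtitles
    # without downloading the video itself. The real plugin's flags
    # may differ.
    return [
        "yt-dlp",
        "--skip-download",
        "--write-auto-subs",
        "--sub-langs", lang,
        f"https://www.youtube.com/watch?v={video_id}",
    ]

cmd = subtitles_command("dQw4w9WgXcQ")
# The plugin would then shell out roughly like this:
# subprocess.run(cmd, check=True, capture_output=True)
print(" ".join(cmd))
```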


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="youtube"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>llm-mistral 0.14</title><link href="https://simonwillison.net/2025/May/29/llm-mistral-014/#atom-tag" rel="alternate"/><published>2025-05-29T03:33:17+00:00</published><updated>2025-05-29T03:33:17+00:00</updated><id>https://simonwillison.net/2025/May/29/llm-mistral-014/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-mistral/releases/tag/0.14"&gt;llm-mistral 0.14&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/llm-mistral/issues/31"&gt;added tool-support&lt;/a&gt; to my plugin for accessing the Mistral API from LLM today, plus support for Mistral's new &lt;a href="https://simonwillison.net/2025/May/28/codestral-embed/"&gt;Codestral Embed&lt;/a&gt; embedding model.&lt;/p&gt;
&lt;p&gt;An interesting challenge here is that I'm not using an official client library for &lt;code&gt;llm-mistral&lt;/code&gt; - I rolled my own client on top of their streaming HTTP API using Florimond Manca's &lt;a href="https://github.com/florimondmanca/httpx-sse"&gt;httpx-sse&lt;/a&gt; library. It's a very pleasant way to interact with streaming APIs - here's &lt;a href="https://github.com/simonw/llm-mistral/blob/098a4eaf624a3a723f91381915f93b4783d498bc/llm_mistral.py#L456-L502"&gt;my code that does most of the work&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The problem I faced is that Mistral's API &lt;a href="https://docs.mistral.ai/capabilities/function_calling/"&gt;documentation for function calling&lt;/a&gt; has examples in Python and TypeScript but doesn't include &lt;code&gt;curl&lt;/code&gt; or direct documentation of their HTTP endpoints!&lt;/p&gt;
&lt;p&gt;I needed documentation at the HTTP level. Could I maybe extract that directly from Mistral's official Python library?&lt;/p&gt;
&lt;p&gt;It turns out &lt;a href="https://github.com/simonw/llm-mistral/issues/31#issuecomment-2917121330"&gt;I could&lt;/a&gt;. I started by cloning the repo:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://github.com/mistralai/client-python
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; client-python/src/mistralai
files-to-prompt &lt;span class="pl-c1"&gt;.&lt;/span&gt; &lt;span class="pl-k"&gt;|&lt;/span&gt; ttok&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;My &lt;a href="https://github.com/simonw/ttok"&gt;ttok&lt;/a&gt; tool gave me a token count of 212,410 (counted using OpenAI's tokenizer, but that's normally a close enough estimate) - Mistral's models tap out at 128,000 so I switched to Gemini 2.5 Flash which can easily handle that many.&lt;/p&gt;
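&lt;p&gt;As a rough sanity check on counts like that one, English text averages around four characters per token - a heuristic only, where &lt;code&gt;ttok&lt;/code&gt; gives the real number:&lt;/p&gt;

```python
def estimate_tokens(text):
    # Very rough heuristic: ~4 characters per token for English text.
    # ttok (using OpenAI's tokenizer) gives the precise count; this is
    # only a back-of-envelope approximation.
    return len(text) // 4

# A ~212,000-token codebase is roughly 850k characters, well past
# Mistral's 128,000-token context window.
print(estimate_tokens("a" * 850_000))
```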
&lt;p&gt;I ran this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;files-to-prompt -c &lt;span class="pl-c1"&gt;.&lt;/span&gt; &lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt; /tmp/mistral.txt

llm -f /tmp/mistral.txt \
  -m gemini-2.5-flash-preview-05-20 \
  -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate comprehensive HTTP API documentation showing
how function calling works, include example curl commands for each step&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The results were pretty spectacular! Gemini 2.5 Flash produced a &lt;a href="https://gist.github.com/simonw/03f2049cd9af6dc072e1ee33461f3437#response"&gt;detailed description&lt;/a&gt; of the exact set of HTTP APIs I needed to interact with, and the JSON formats I should pass to them.&lt;/p&gt;
&lt;p&gt;There are a bunch of steps needed to get tools working in a new model, as described in &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#supporting-tools"&gt;the LLM plugin authors documentation&lt;/a&gt;. I started working through them by hand... and then got lazy and decided to see if I could get a model to do the work for me.&lt;/p&gt;
&lt;p&gt;This time I tried the new Claude Opus 4. I fed it three files: my existing, incomplete &lt;code&gt;llm_mistral.py&lt;/code&gt;, a full copy of &lt;a href="https://github.com/simonw/llm-gemini/blob/6177aa2a0676bf004b374a8863914585aa93ca52/llm_gemini.py"&gt;llm_gemini.py&lt;/a&gt; with its working tools implementation and a copy of the API docs Gemini had written for me earlier. I prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;I need to update this Mistral code to add tool support. I've included examples of that code for Gemini, and a detailed README explaining the Mistral format.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude churned away and wrote me code that was &lt;em&gt;most&lt;/em&gt; of what I needed. I tested it in a bunch of different scenarios, pasted problems back into Claude to see what would happen, and eventually took over and finished the rest of the code myself. Here's &lt;a href="https://claude.ai/share/7c609a61-4b32-45ca-bdca-31bf4ef25d2d"&gt;the full transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm a little sad I didn't use Mistral to write the code to support Mistral, but I'm pleased to add yet another model family to the list that's supported for tool usage in LLM.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/httpx"&gt;httpx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-4"&gt;claude-4&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="python"/><category term="ai"/><category term="httpx"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="claude"/><category term="mistral"/><category term="gemini"/><category term="llm-tool-use"/><category term="claude-4"/></entry><entry><title>llm-llama-server 0.2</title><link href="https://simonwillison.net/2025/May/28/llama-server-tools/#atom-tag" rel="alternate"/><published>2025-05-28T06:27:14+00:00</published><updated>2025-05-28T06:27:14+00:00</updated><id>https://simonwillison.net/2025/May/28/llama-server-tools/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-llama-server/releases/tag/0.2"&gt;llm-llama-server 0.2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here's a second option for using LLM's &lt;a href="https://simonwillison.net/2025/May/27/llm-tools/"&gt;new tool support&lt;/a&gt; against local models (the first was via &lt;a href="https://github.com/taketwo/llm-ollama/releases/tag/0.11a0"&gt;llm-ollama&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;It turns out the &lt;code&gt;llama.cpp&lt;/code&gt; ecosystem has pretty robust OpenAI-compatible tool support already, so my &lt;code&gt;llm-llama-server&lt;/code&gt; plugin only needed &lt;a href="https://github.com/simonw/llm-llama-server/commit/f61626fb4737f4f17dc6a9689274d14c3f3cb8ad#diff-66221cd67281bbbfbc677b6e7a3fd6d1b2e5562f0f55bde58250bf3953b1853a"&gt;a quick upgrade&lt;/a&gt; to get those working there.&lt;/p&gt;
&lt;p&gt;Unfortunately it looks like streaming support doesn't work with tools in &lt;code&gt;llama-server&lt;/code&gt; at the moment, so I added a new model ID called &lt;code&gt;llama-server-tools&lt;/code&gt; which disables streaming and enables tools.&lt;/p&gt;
&lt;p&gt;Here's how to try it out. First, ensure you have &lt;code&gt;llama-server&lt;/code&gt; - the easiest way to get that on macOS is via Homebrew:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install llama.cpp
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Start the server running like this. This command will download and cache the 3.2GB &lt;a href="https://huggingface.co/unsloth/gemma-3-4b-it-GGUF"&gt;unsloth/gemma-3-4b-it-GGUF:Q4_K_XL&lt;/a&gt; if you don't yet have it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then in another window:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-llama-server
llm -m llama-server-tools -T llm_time 'what time is it?' --td
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And since you don't even need an API key for this, even if you've never used LLM before you can try it out with this uvx one-liner:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?' --td
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For more notes on using &lt;code&gt;llama.cpp&lt;/code&gt; with LLM see &lt;a href="https://simonwillison.net/2025/May/10/llama-cpp-vision/"&gt;Trying out llama.cpp’s new vision support&lt;/a&gt; from a couple of weeks ago.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llm"/><category term="llm-tool-use"/><category term="uv"/><category term="llama-cpp"/></entry><entry><title>llm-pdf-to-images</title><link href="https://simonwillison.net/2025/May/18/llm-pdf-to-images/#atom-tag" rel="alternate"/><published>2025-05-18T20:48:41+00:00</published><updated>2025-05-18T20:48:41+00:00</updated><id>https://simonwillison.net/2025/May/18/llm-pdf-to-images/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-pdf-to-images"&gt;llm-pdf-to-images&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Inspired by my previous &lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt; plugin, I thought it would be neat to have a plugin for LLM that can take a PDF and turn that into an image-per-page so you can feed PDFs into models that support image inputs but don't yet support PDFs.&lt;/p&gt;
&lt;p&gt;This should now do exactly that:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-pdf-to-images
llm -f pdf-to-images:path/to/document.pdf &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Summarize this document&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Under the hood it's using the &lt;a href="https://github.com/pymupdf/PyMuPDF"&gt;PyMuPDF&lt;/a&gt; library. The key code to convert a PDF into images looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;fitz&lt;/span&gt;
&lt;span class="pl-s1"&gt;doc&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;fitz&lt;/span&gt;.&lt;span class="pl-c1"&gt;open&lt;/span&gt;(&lt;span class="pl-s"&gt;"input.pdf"&lt;/span&gt;)
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;page&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;doc&lt;/span&gt;:
    &lt;span class="pl-s1"&gt;pix&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;page&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_pixmap&lt;/span&gt;(&lt;span class="pl-s1"&gt;matrix&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;fitz&lt;/span&gt;.&lt;span class="pl-c1"&gt;Matrix&lt;/span&gt;(&lt;span class="pl-c1"&gt;300&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-c1"&gt;72&lt;/span&gt;, &lt;span class="pl-c1"&gt;300&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-c1"&gt;72&lt;/span&gt;))
    &lt;span class="pl-s1"&gt;jpeg_bytes&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;pix&lt;/span&gt;.&lt;span class="pl-c1"&gt;tobytes&lt;/span&gt;(&lt;span class="pl-s1"&gt;output&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"jpg"&lt;/span&gt;, &lt;span class="pl-s1"&gt;jpg_quality&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;30&lt;/span&gt;)&lt;/pre&gt;

&lt;p&gt;Once I'd figured out that code I got o4-mini to write most of the rest of the plugin, using &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt; to load in the example code from the video plugin:&lt;/p&gt;
&lt;pre&gt;llm -f github:simonw/llm-video-frames &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;import fitz&lt;/span&gt;
&lt;span class="pl-s"&gt;doc = fitz.open("input.pdf")&lt;/span&gt;
&lt;span class="pl-s"&gt;for page in doc:&lt;/span&gt;
&lt;span class="pl-s"&gt;    pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))&lt;/span&gt;
&lt;span class="pl-s"&gt;    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;output llm_pdf_to_images.py which adds a pdf-to-images: &lt;/span&gt;
&lt;span class="pl-s"&gt; fragment loader that converts a PDF to frames using fitz like in the example&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
-m o4-mini&lt;/pre&gt;

&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/27af84e4e533872bfd59fcba69b4166f"&gt;the transcript&lt;/a&gt; - more details in &lt;a href="https://github.com/simonw/llm-pdf-to-images/issues/1"&gt;this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had some &lt;em&gt;weird&lt;/em&gt; results testing this with GPT 4.1 mini. I created &lt;a href="https://github.com/simonw/llm-pdf-to-images/blob/main/tests/blank-pages.pdf"&gt;a test PDF&lt;/a&gt; with two pages - one white, one black - and ran a test prompt like this:&lt;/p&gt;
&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;pdf-to-images:blank-pages.pdf&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe these images&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt;The first image features a stylized red maple leaf with triangular facets, giving it a geometric appearance. The maple leaf is a well-known symbol associated with Canada.&lt;/p&gt;
&lt;p&gt;The second image is a simple black silhouette of a cat sitting and facing to the left. The cat's tail curls around its body. The design is minimalistic and iconic.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got even wilder hallucinations for other prompts, like "summarize this document" or "describe all figures". I have a collection of those &lt;a href="https://gist.github.com/simonw/2fbef11b1737a9ae7da1b2ff58998454"&gt;in this Gist&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thankfully this behavior is limited to GPT-4.1 mini. I upgraded to full GPT-4.1 and got &lt;a href="https://gist.github.com/simonw/0713dc0ce00bd6cd4d5990f44c865964#prompt-1"&gt;much more sensible results&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;pdf-to-images:blank-pages.pdf&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe these images&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-4.1&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt;Certainly! Here are the descriptions of the two images you provided:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;First image:&lt;/strong&gt; This image is completely white. It appears blank, with no discernible objects, text, or features.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Second image:&lt;/strong&gt; This image is entirely black. Like the first, it is blank and contains no visible objects, text, or distinct elements.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you have questions or need a specific kind of analysis or modification, please let me know!&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hallucinations"&gt;hallucinations&lt;/a&gt;&lt;/p&gt;



</summary><category term="pdf"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="hallucinations"/></entry><entry><title>django-simple-deploy</title><link href="https://simonwillison.net/2025/May/17/django-simple-deploy/#atom-tag" rel="alternate"/><published>2025-05-17T12:49:52+00:00</published><updated>2025-05-17T12:49:52+00:00</updated><id>https://simonwillison.net/2025/May/17/django-simple-deploy/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://django-simple-deploy.readthedocs.io/"&gt;django-simple-deploy&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Eric Matthes presented a lightning talk about this project at PyCon US this morning. "Django has a deploy command now". You can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install django-simple-deploy[fly_io]
# Add django_simple_deploy to INSTALLED_APPS.
python manage.py deploy --automate-all
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It's plugin-based (&lt;a href="https://github.com/django-simple-deploy/django-simple-deploy/issues/313"&gt;inspired by Datasette!&lt;/a&gt;) and the project has stable plugins for three hosting platforms: &lt;a href="https://github.com/django-simple-deploy/dsd-flyio"&gt;dsd-flyio&lt;/a&gt;, &lt;a href="https://github.com/django-simple-deploy/dsd-heroku"&gt;dsd-heroku&lt;/a&gt; and &lt;a href="https://github.com/django-simple-deploy/dsd-platformsh"&gt;dsd-platformsh&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Currently in development: &lt;a href="https://github.com/django-simple-deploy/dsd-vps"&gt;dsd-vps&lt;/a&gt; - a plugin that should work with any VPS provider, using &lt;a href="https://www.paramiko.org/"&gt;Paramiko&lt;/a&gt; to connect to a newly created instance and &lt;a href="https://github.com/django-simple-deploy/dsd-vps/blob/a372fc7b7fd31cd2ad3cf22d68b9c9fecb65d17a/dsd_vps/utils.py"&gt;run all of the commands&lt;/a&gt; needed to start serving a Django application.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paramiko"&gt;paramiko&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="paramiko"/><category term="plugins"/><category term="python"/><category term="heroku"/><category term="datasette"/><category term="fly"/></entry><entry><title>llm -f issue:... -m echo --no-log</title><link href="https://simonwillison.net/2025/May/9/private-issues/#atom-tag" rel="alternate"/><published>2025-05-09T18:55:24+00:00</published><updated>2025-05-09T18:55:24+00:00</updated><id>https://simonwillison.net/2025/May/9/private-issues/#atom-tag</id><summary type="html">
    &lt;p&gt;I had some notes in a GitHub issue thread in a private repository that I wanted to export as Markdown. I realized that I could get them using a combination of several recent projects.&lt;/p&gt;
&lt;p&gt;Here's what I ran:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export GITHUB_TOKEN="$(llm keys get github)"                                             
llm -f issue:https://github.com/simonw/todos/issues/170 \
  -m echo --no-log | jq .prompt -r &amp;gt; notes.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I have a GitHub personal access token stored in my LLM keys, for use with Anthony Shaw's &lt;a href="https://github.com/tonybaloney/llm-github-models"&gt;llm-github-models&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;My own &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt; plugin expects an optional &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; environment variable, so I set that first - here's &lt;a href="https://github.com/simonw/llm-fragments-github/issues/11"&gt;an issue&lt;/a&gt; to have it use the &lt;code&gt;github&lt;/code&gt; key instead.&lt;/p&gt;
&lt;p&gt;With that set, the &lt;code&gt;issue:&lt;/code&gt; fragment loader can take a URL to a private GitHub issue thread and load it via the API using the token, then concatenate the comments together as Markdown. Here's &lt;a href="https://github.com/simonw/llm-fragments-github/blob/87555488805ffc973b5fb45aa51eac83be2c839f/llm_fragments_github.py#L92-L126"&gt;the code for that&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Fragments are meant to be used as input to LLMs. I recently built an &lt;a href="https://github.com/simonw/llm-echo"&gt;llm-echo&lt;/a&gt; plugin that adds a fake LLM called "echo", which simply echoes its input back out again.&lt;/p&gt;
&lt;p&gt;Adding &lt;code&gt;--no-log&lt;/code&gt; prevents that junk data from being stored in my &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;LLM log database&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The output is JSON with a &lt;code&gt;"prompt"&lt;/code&gt; key for the original prompt. I use &lt;code&gt;jq .prompt&lt;/code&gt; to extract that out, then &lt;code&gt;-r&lt;/code&gt; to get it as raw text (not a &lt;code&gt;"JSON string"&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;... and I write the result to &lt;code&gt;notes.md&lt;/code&gt;.&lt;/p&gt;
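&lt;p&gt;If you don't have &lt;code&gt;jq&lt;/code&gt; to hand, that extraction step is a couple of lines of Python. This is a sketch using a made-up stand-in for the JSON shape llm-echo emits, not its real output:&lt;/p&gt;

```python
import json

# Illustrative stand-in for llm-echo's JSON output: the "prompt" key
# holds the original prompt text (here with JSON-escaped newlines).
raw = '{"prompt": "# My notes\\n\\nFirst comment...", "system": ""}'

prompt = json.loads(raw)["prompt"]  # equivalent of: jq .prompt -r
print(prompt)
```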

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jq"&gt;jq&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthony-shaw"&gt;anthony-shaw&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="plugins"/><category term="projects"/><category term="jq"/><category term="llm"/><category term="anthony-shaw"/></entry><entry><title>Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25)</title><link href="https://simonwillison.net/2025/May/5/llm-video-frames/#atom-tag" rel="alternate"/><published>2025-05-05T17:38:25+00:00</published><updated>2025-05-05T17:38:25+00:00</updated><id>https://simonwillison.net/2025/May/5/llm-video-frames/#atom-tag</id><summary type="html">
    &lt;p&gt;The new &lt;strong&gt;&lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt;&lt;/strong&gt; plugin can turn a video file into a sequence of JPEG frames and feed them directly into a long context vision LLM such as GPT-4.1, even when that LLM doesn't directly support video input. It depends on a plugin feature I added to &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-25"&gt;LLM 0.25&lt;/a&gt;, which I released last night.&lt;/p&gt;
&lt;p&gt;Here's how to try it out:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;brew install ffmpeg &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or apt-get or your package manager of choice&lt;/span&gt;
uv tool install llm &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or pipx install llm or pip install llm&lt;/span&gt;
llm install llm-video-frames
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; openai
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Paste your OpenAI API key here&lt;/span&gt;

llm -f video-frames:video.mp4 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe the key scenes in this video&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;video-frames:filepath.mp4&lt;/code&gt; syntax is provided by the new plugin. It takes that video, extracts one JPEG frame per second and turns those frames into &lt;a href="https://llm.datasette.io/en/stable/usage.html#attachments"&gt;attachments&lt;/a&gt; that can be passed to the LLM.&lt;/p&gt;
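&lt;p&gt;Under the hood the heavy lifting is an &lt;code&gt;ffmpeg&lt;/code&gt; invocation along these lines. This is a hedged sketch of the approach - the plugin's exact flags and filename pattern may differ:&lt;/p&gt;

```python
# Build the kind of ffmpeg command that dumps N JPEG frames per second
# of video into a directory. The flags are my guess at the technique,
# not copied from the plugin's source.
def ffmpeg_frames_command(video_path, out_dir, fps=1):
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",           # emit `fps` frames per second of video
        f"{out_dir}/frame_%05d.jpg",   # frame_00001.jpg, frame_00002.jpg, ...
    ]

print(ffmpeg_frames_command("cleo.mp4", "/tmp/frames", fps=2))
```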
&lt;p&gt;Here's a demo, against &lt;a href="https://static.simonwillison.net/static/2025/cleo.mp4"&gt;this video&lt;/a&gt; of Cleo:&lt;/p&gt;

&lt;div style="max-width: 100%; margin-bottom: 0.4em"&gt;
    &lt;video controls="controls" preload="none" aria-label="Cleo " poster="https://static.simonwillison.net/static/2025/cleo-still.jpg" loop="loop" style="width: 100%; height: auto;" muted="muted"&gt;
        &lt;source src="https://static.simonwillison.net/static/2025/cleo.mp4" type="video/mp4" /&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f video-frames:cleo.mp4 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe key moments&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the output from the model (&lt;a href="https://gist.github.com/simonw/a4e26166a524f9c07b4cf32b2f8da6c9"&gt;transcript here&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The sequence of images captures the key moments of a dog being offered and then enjoying a small treat:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In the first image, a hand is holding a small cupcake with purple frosting close to a black dog's face. The dog looks eager and is focused intently on the treat.&lt;/li&gt;
&lt;li&gt;The second image shows the dog beginning to take a bite of the cupcake from the person's fingers. The dog's mouth is open, gently nibbling on the treat.&lt;/li&gt;
&lt;li&gt;In the third image, the dog has finished or is almost done with the treat and looks content, with a slight smile and a relaxed posture. The treat is no longer in the person's hand, indicating that the dog has consumed it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This progression highlights the dog's anticipation, enjoyment, and satisfaction with the treat.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Total cost: 7,072 input tokens, 156 output tokens - for GPT-4.1 mini that's 0.3078 cents (less than a third of a cent).&lt;/p&gt;
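&lt;p&gt;That figure checks out against GPT-4.1 mini's pricing at the time - $0.40 per million input tokens and $1.60 per million output tokens:&lt;/p&gt;

```python
# GPT-4.1 mini pricing in USD per million tokens (as billed at the time)
input_cost = 7_072 * 0.40 / 1_000_000    # $0.0028288
output_cost = 156 * 1.60 / 1_000_000     # $0.0002496
cents = (input_cost + output_cost) * 100
print(round(cents, 4))  # 0.3078 cents
```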
&lt;p&gt;In this case the plugin converted the video into three images: &lt;a href="https://static.simonwillison.net/static/2025/frame_00001.jpg"&gt;frame_00001.jpg&lt;/a&gt;, &lt;a href="https://static.simonwillison.net/static/2025/frame_00002.jpg"&gt;frame_00002.jpg&lt;/a&gt; and &lt;a href="https://static.simonwillison.net/static/2025/frame_00003.jpg"&gt;frame_00003.jpg&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The plugin accepts additional arguments. You can increase the frames-per-second using &lt;code&gt;?fps=2&lt;/code&gt; - for example:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;video-frames:video.mp4?fps=2&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;summarize this video&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or you can add &lt;code&gt;?timestamps=1&lt;/code&gt; to cause &lt;code&gt;ffmpeg&lt;/code&gt; to overlay a timestamp in the bottom right corner of each frame. This gives the model a chance to return timestamps in its output.&lt;/p&gt;
&lt;p&gt;Let's try that with the Cleo video:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;video-frames:cleo.mp4?timestamps=1&amp;amp;fps=5&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;key moments, include timestamps&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's the output (&lt;a href="https://gist.github.com/simonw/371719849a7d2260f1eb0f422ea63187"&gt;transcript here&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here are the key moments from the video "cleo.mp4" with timestamps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;00:00:00.000 - A dog on leash looks at a small cupcake with purple frosting being held by a person.&lt;/li&gt;
&lt;li&gt;00:00:00.800 - The dog closely sniffs the cupcake.&lt;/li&gt;
&lt;li&gt;00:00:01.400 - The person puts a small amount of the cupcake frosting on their finger.&lt;/li&gt;
&lt;li&gt;00:00:01.600 - The dog starts licking the frosting from the person's finger.&lt;/li&gt;
&lt;li&gt;00:00:02.600 - The dog continues licking enthusiastically.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me know if you need more details or a description of any specific part.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That one sent 14 images to the API, at a total cost of 32,968 input, 141 output = 1.3413 cents.&lt;/p&gt;
&lt;p&gt;It sent 5.9MB of image data to OpenAI's API, encoded as base64 in the JSON API call.&lt;/p&gt;
&lt;p&gt;The GPT-4.1 model family accepts up to 1,047,576 tokens. Aside from a 20MB size limit per image I haven't seen any documentation of limits on the number of images. You can fit a whole lot of JPEG frames in a million tokens!&lt;/p&gt;
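&lt;p&gt;Base64 is the expensive part of sending images this way: it encodes every 3 bytes of image data as 4 ASCII characters, so the JSON payload carries roughly a third more bytes than the JPEGs themselves:&lt;/p&gt;

```python
import base64

# 3,000 bytes standing in for a JPEG frame (starts with the JPEG magic bytes)
jpeg_bytes = b"\xff\xd8" + b"\x00" * 2998

encoded = base64.b64encode(jpeg_bytes)
print(len(jpeg_bytes), len(encoded))  # 3000 4000 - a 4/3 inflation
```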
&lt;p&gt;Here's what one of those frames looks like with the timestamp overlaid in the corner:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cleo-finger.jpg" alt="Cleo taking a treat from my fingers, in the bottom right corner is an overlay that says cleo.mp4 00:00:01.600" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="how-i-built-the-plugin-with-o4-mini"&gt;How I built the plugin with o4-mini&lt;/h4&gt;
&lt;p&gt;This is a great example of how rapid prototyping with an LLM can help demonstrate the value of a feature.&lt;/p&gt;
&lt;p&gt;I was considering whether it would make sense for fragment plugins to return images in &lt;a href="https://github.com/simonw/llm/issues/972#issuecomment-2849342103"&gt;issue 972&lt;/a&gt; when I had the idea to use &lt;code&gt;ffmpeg&lt;/code&gt; to split a video into frames.&lt;/p&gt;
&lt;p&gt;I know &lt;a href="https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/"&gt;from past experience&lt;/a&gt; that a good model can write an entire plugin for LLM if you feed it the right example, so I started with this (reformatted here for readability):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m o4-mini -f github:simonw/llm-hacker-news -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;write a new plugin called llm_video_frames.py which takes video:path-to-video.mp4 and creates a temporary directory which it then populates with one frame per second of that video using ffmpeg - then it returns a list of [llm.Attachment(path="path-to-frame1.jpg"), ...] - it should also support passing video:video.mp4?fps=2 to increase to two frames per second, and if you pass ?timestamps=1 or &amp;amp;timestamps=1 then it should add a text timestamp to the bottom right conner of each image with the mm:ss timestamp of that frame (or hh:mm:ss if more than one hour in) and the filename of the video without the path as well.&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -o reasoning_effort high&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/4f545ecb347884d1d923dbc49550b8b0#response"&gt;the transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The new attachment mechanism went from vague idea to "I should build that" as a direct result of having an LLM-built proof-of-concept that demonstrated the feasibility of the new feature.&lt;/p&gt;
&lt;p&gt;The code it produced was about 90% of the code I shipped in the finished plugin. Total cost 5,018 input, 2,208 output = 1.5235 cents.&lt;/p&gt;
&lt;h4 id="annotated-release-notes-for-everything-else-in-llm-0-25"&gt;Annotated release notes for everything else in LLM 0.25&lt;/h4&gt;
&lt;p&gt;Here are the annotated release notes for everything else:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New plugin feature: &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#plugin-hooks-register-fragment-loaders"&gt;register_fragment_loaders(register)&lt;/a&gt; plugins can now return a mixture of fragments and attachments. The &lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt; plugin is the first to take advantage of this mechanism. &lt;a href="https://github.com/simonw/llm/issues/972"&gt;#972&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As described above. The inspiration for this feature came from the &lt;a href="https://github.com/agustif/llm-arxiv"&gt;llm-arxiv&lt;/a&gt; plugin by &lt;a href="https://github.com/agustif"&gt;agustif&lt;/a&gt;.&lt;/p&gt;
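&lt;p&gt;In outline, a loader using this hook returns attachments alongside (or instead of) string fragments. This sketch uses a stand-in &lt;code&gt;Attachment&lt;/code&gt; class rather than the real &lt;code&gt;llm.Attachment&lt;/code&gt;, and the paths are invented, so treat the shapes as illustrative:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Attachment:  # stand-in for llm.Attachment(path=...)
    path: str

def video_frames_loader(argument):
    """Toy version of a register_fragment_loaders() callback: given the
    argument after 'video-frames:', return a list mixing text fragments
    and attachments (the frame paths here are made up)."""
    frames = [f"/tmp/frames/frame_{n:05d}.jpg" for n in range(1, 4)]
    return [f"Frames extracted from {argument}"] + [Attachment(p) for p in frames]

result = video_frames_loader("cleo.mp4")
print(result[0])
print(result[1].path)
```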
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New OpenAI models: &lt;code&gt;gpt-4.1&lt;/code&gt;, &lt;code&gt;gpt-4.1-mini&lt;/code&gt;, &lt;code&gt;gpt-4.1-nano&lt;/code&gt;, &lt;code&gt;o3&lt;/code&gt;, &lt;code&gt;o4-mini&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/945"&gt;#945&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/issues/965"&gt;#965&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/issues/976"&gt;#976&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;My original plan was to leave these models exclusively to the new &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai&lt;/a&gt; plugin, since that allows me to add support for new models without a full LLM release. I'm going to punt on that until I'm ready to entirely remove the OpenAI models from LLM core.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New environment variables: &lt;code&gt;LLM_MODEL&lt;/code&gt; and &lt;code&gt;LLM_EMBEDDING_MODEL&lt;/code&gt; for setting the model to use without needing to specify &lt;code&gt;-m model_id&lt;/code&gt; every time. &lt;a href="https://github.com/simonw/llm/issues/932"&gt;#932&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;A convenience feature for when you want to set the default model for a terminal session without using LLM's global &lt;a href="https://llm.datasette.io/en/stable/setup.html#setting-a-custom-default-model"&gt;"default model" mechanism&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New command: &lt;code&gt;llm fragments loaders&lt;/code&gt;, to list all currently available fragment loader prefixes provided by plugins. &lt;a href="https://github.com/simonw/llm/issues/941"&gt;#941&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mainly for consistency with the existing &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-templates-loaders-help"&gt;llm templates loaders&lt;/a&gt; command. Here's the output when I run &lt;code&gt;llm fragments loaders&lt;/code&gt; on my machine:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docs:
  Fetch the latest documentation for the specified package from
  https://github.com/simonw/docs-for-llms

  Use '-f docs:' for the documentation of your current version of LLM.

docs-preview:
  Similar to docs: but fetches the latest docs including alpha/beta releases.

symbex:
  Walk the given directory, parse every .py file, and for every
  top-level function or class-method produce its signature and
  docstring plus an import line.

github:
  Load files from a GitHub repository as fragments

  Argument is a GitHub repository URL or username/repository

issue:
  Fetch GitHub issue/pull and comments as Markdown

  Argument is either "owner/repo/NUMBER" or URL to an issue

pr:
  Fetch GitHub pull request with comments and diff as Markdown

  Argument is either "owner/repo/NUMBER" or URL to a pull request

hn:
  Given a Hacker News article ID returns the full nested conversation.

  For example: -f hn:43875136

video-frames:
  Fragment loader "video-frames:&amp;lt;path&amp;gt;?fps=N&amp;amp;timestamps=1"
  - extracts frames at `fps` per second (default 1)
  - if `timestamps=1`, overlays "filename hh:mm:ss" at bottom-right
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That's from &lt;a href="https://github.com/simonw/llm-docs"&gt;llm-docs&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-fragments-symbex"&gt;llm-fragments-symbex&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-hacker-news"&gt;llm-hacker-news&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm fragments&lt;/code&gt; command now shows fragments ordered by the date they were first used. &lt;a href="https://github.com/simonw/llm/issues/973"&gt;#973&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This makes it easier to quickly debug a new fragment plugin - you can run &lt;code&gt;llm fragments&lt;/code&gt; and glance at the bottom few entries.&lt;/p&gt;
&lt;p&gt;I've also been using the new &lt;a href="https://github.com/simonw/llm-echo"&gt;llm-echo&lt;/a&gt; debugging plugin for this - it adds a fake model called "echo" which simply outputs the prompt, system prompt, fragments and attachments that were passed to it:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f docs:sqlite-utils -m &lt;span class="pl-c1"&gt;echo&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Show me the context&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/cb3249856887379759515022c76d0d9e"&gt;Output here&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm chat&lt;/code&gt; now includes a &lt;code&gt;!edit&lt;/code&gt; command for editing a prompt using your default terminal text editor. Thanks, &lt;a href="https://github.com/Hopiu"&gt;Benedikt Willi&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/969"&gt;#969&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a really nice enhancement to &lt;code&gt;llm chat&lt;/code&gt;, making it much more convenient to edit longer prompts.&lt;/p&gt;
&lt;p&gt;And the rest:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Allow &lt;code&gt;-t&lt;/code&gt; and &lt;code&gt;--system&lt;/code&gt; to be used at the same time. &lt;a href="https://github.com/simonw/llm/issues/916"&gt;#916&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a bug where accessing a model via its alias would fail to respect any default options set for that model. &lt;a href="https://github.com/simonw/llm/issues/968"&gt;#968&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Improved documentation for &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;extra-openai-models.yaml&lt;/a&gt;. Thanks, &lt;a href="https://github.com/rahimnathwani"&gt;Rahim Nathwani&lt;/a&gt; and &lt;a href="https://github.com/dguido"&gt;Dan Guido&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/950"&gt;#950&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/pull/957"&gt;#957&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llm -c/--continue&lt;/code&gt; now works correctly with the &lt;code&gt;-d/--database&lt;/code&gt; option. &lt;code&gt;llm chat&lt;/code&gt; now accepts that &lt;code&gt;-d/--database&lt;/code&gt; option. Thanks, &lt;a href="https://github.com/sukhbinder"&gt;Sukhbinder Singh&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/933"&gt;#933&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ffmpeg"&gt;ffmpeg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="ffmpeg"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="vision-llms"/></entry><entry><title>llm-fragments-github 0.2</title><link href="https://simonwillison.net/2025/Apr/20/llm-fragments-github/#atom-tag" rel="alternate"/><published>2025-04-20T14:01:09+00:00</published><updated>2025-04-20T14:01:09+00:00</updated><id>https://simonwillison.net/2025/Apr/20/llm-fragments-github/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-fragments-github/releases/tag/0.2"&gt;llm-fragments-github 0.2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I upgraded my &lt;code&gt;llm-fragments-github&lt;/code&gt; plugin to add a new fragment type called &lt;code&gt;issue&lt;/code&gt;. It lets you pull the entire content of a GitHub issue thread into your prompt as a concatenated Markdown file.&lt;/p&gt;
&lt;p&gt;(If you haven't seen fragments before I introduced them in &lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/"&gt;Long context support in LLM 0.24 using fragments and template plugins&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;I used it just now to have Gemini 2.5 Pro provide feedback and attempt an implementation of a complex issue against my &lt;a href="https://github.com/simonw/llm"&gt;LLM&lt;/a&gt; project:&lt;/p&gt;
&lt;pre&gt;llm install llm-fragments-github
llm -f github:simonw/llm \
  -f issue:simonw/llm/938 \
  -m gemini-2.5-pro-exp-03-25 \
  --system &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;muse on this issue, then propose a whole bunch of code to help implement it&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;Here I'm loading the FULL content of the &lt;code&gt;simonw/llm&lt;/code&gt; repo using that &lt;code&gt;-f github:simonw/llm&lt;/code&gt; fragment (&lt;a href="https://github.com/simonw/llm-fragments-github?tab=readme-ov-file#usage"&gt;documented here&lt;/a&gt;), then loading all of the comments from &lt;a href="https://github.com/simonw/llm/issues/938"&gt;issue 938&lt;/a&gt; where I discuss quite a complex potential refactoring. I ask Gemini 2.5 Pro to "muse on this issue" and come up with some code.&lt;/p&gt;
&lt;p&gt;This worked &lt;em&gt;shockingly&lt;/em&gt; well. Here's &lt;a href="https://gist.github.com/simonw/a5f0c1e8184f4ddc8b71b30890fe690c#response"&gt;the full response&lt;/a&gt;, which highlighted a few things I hadn't considered yet (such as the need to migrate old database records to the new tree hierarchy) and then spat out a whole bunch of code which looks like a solid start to the actual implementation work I need to do.&lt;/p&gt;
&lt;p&gt;I ran this against Google's free Gemini 2.5 Preview, but if I'd used the paid model it would have cost me 202,680 input tokens, 10,460 output tokens and 1,859 thinking tokens for a total of 62.989 cents.&lt;/p&gt;
&lt;p&gt;As a fun extra, the new &lt;code&gt;issue:&lt;/code&gt; feature itself was written almost entirely by OpenAI o3, again using fragments. I ran this:&lt;/p&gt;
&lt;pre&gt;llm -m openai/o3 \
  -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
  -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
  -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue&lt;/span&gt;
&lt;span class="pl-s"&gt;      number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;Here I'm using the ability to pass a URL to &lt;code&gt;-f&lt;/code&gt; and giving it the full source of my &lt;a href="https://github.com/simonw/llm-hacker-news/blob/main/llm_hacker_news.py"&gt;llm_hacker_news.py&lt;/a&gt; plugin (which shows how a fragment can load data from an API) plus the &lt;a href="https://github.com/simonw/tools/blob/main/github-issue-to-markdown.html"&gt;HTML source&lt;/a&gt; of my &lt;a href="https://tools.simonwillison.net/github-issue-to-markdown"&gt;github-issue-to-markdown&lt;/a&gt; tool (which I wrote a few months ago &lt;a href="https://gist.github.com/simonw/cd1afb97e595b40fdeedebb48be7f4f1"&gt;with Claude&lt;/a&gt;). I effectively asked o3 to take that HTML/JavaScript tool and port it to Python to work with my fragments plugin mechanism.&lt;/p&gt;
&lt;p&gt;o3 provided &lt;a href="https://gist.github.com/simonw/249e16edffe6350f7265012bee9e3305#response"&gt;almost the exact implementation I needed&lt;/a&gt;, and even included support for a &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; environment variable without me thinking to ask for it. Total cost: 19.928 cents.&lt;/p&gt;
&lt;p&gt;On a final note of curiosity I tried running this prompt against &lt;a href="https://simonwillison.net/2025/Apr/19/gemma-3-qat-models/"&gt;Gemma 3 27B QAT&lt;/a&gt; running on my Mac via MLX and &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;llm install llm-mlx
llm mlx download-model mlx-community/gemma-3-27b-it-qat-4bit

llm -m mlx-community/gemma-3-27b-it-qat-4bit \
  -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
  -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
  -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue&lt;/span&gt;
&lt;span class="pl-s"&gt;      number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;That worked &lt;a href="https://gist.github.com/simonw/feccff6ce3254556b848c27333f52543#response"&gt;pretty well too&lt;/a&gt;. It turns out a 16GB local model file is powerful enough to write me an LLM plugin now!&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/long-context"&gt;long-context&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemma"&gt;gemma&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="plugins"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="gemini"/><category term="mlx"/><category term="o3"/><category term="long-context"/><category term="gemma"/></entry><entry><title>llm-fragments-rust</title><link href="https://simonwillison.net/2025/Apr/11/llm-fragments-rust/#atom-tag" rel="alternate"/><published>2025-04-11T17:36:27+00:00</published><updated>2025-04-11T17:36:27+00:00</updated><id>https://simonwillison.net/2025/Apr/11/llm-fragments-rust/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huitseeker/llm-fragments-rust"&gt;llm-fragments-rust&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Inspired by Filippo Valsorda's &lt;a href="https://simonwillison.net/2025/Apr/10/llm-fragments-go/"&gt;llm-fragments-go&lt;/a&gt;, Francois Garillot created &lt;code&gt;llm-fragments-rust&lt;/code&gt;, an &lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/"&gt;LLM fragments&lt;/a&gt; plugin that lets you pull documentation for any Rust crate directly into a prompt to LLM.&lt;/p&gt;
&lt;p&gt;I really like this example, which uses two fragments to load documentation for two crates at once:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;llm -f rust:rand@0.8.5 -f rust:tokio "How do I generate random numbers asynchronously?"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://github.com/huitseeker/llm-fragments-rust/blob/main/llm_fragments_rust.py"&gt;code&lt;/a&gt; uses some neat tricks: it creates a new Rust project in a temporary directory (similar to how &lt;code&gt;llm-fragments-go&lt;/code&gt; works), adds the crates and uses &lt;code&gt;cargo doc --no-deps --document-private-items&lt;/code&gt; to generate documentation. Then it runs &lt;code&gt;cargo tree --edges features&lt;/code&gt; to add dependency information, and &lt;code&gt;cargo metadata --format-version=1&lt;/code&gt; to include additional metadata about the crate.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/huitseeker/status/1910741742363562325"&gt;@huitseeker&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="ai"/><category term="rust"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/></entry><entry><title>llm-docsmith</title><link href="https://simonwillison.net/2025/Apr/10/llm-docsmith/#atom-tag" rel="alternate"/><published>2025-04-10T18:09:18+00:00</published><updated>2025-04-10T18:09:18+00:00</updated><id>https://simonwillison.net/2025/Apr/10/llm-docsmith/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mathpn.com/posts/llm-docsmith/"&gt;llm-docsmith&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Matheus Pedroni released this neat plugin for LLM for adding docstrings to existing Python code. You can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-docsmith
llm docsmith ./scripts/main.py -o
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-o&lt;/code&gt; option previews the changes that will be made - without &lt;code&gt;-o&lt;/code&gt; it edits the files directly.&lt;/p&gt;
&lt;p&gt;It also accepts a &lt;code&gt;-m claude-3.7-sonnet&lt;/code&gt; parameter to use an alternative model in place of the default (GPT-4o mini).&lt;/p&gt;
&lt;p&gt;The implementation uses the Python &lt;a href="https://pypi.org/project/libcst/"&gt;libcst&lt;/a&gt; "Concrete Syntax Tree" package to manipulate the code, which means there's no chance of it making edits to anything other than the docstrings.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/mathpn/llm-docsmith/blob/v0.1/docsmith.py#L10-L30"&gt;the full system prompt&lt;/a&gt; it uses.&lt;/p&gt;
&lt;p&gt;One neat trick comes at the end of the system prompt, where it says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;You will receive a JSON template. Fill the slots marked with &amp;lt;SLOT&amp;gt; with the appropriate description. Return as JSON.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The template itself is JSON generated from these Pydantic classes:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Argument&lt;/span&gt;(&lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;description&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;annotation&lt;/span&gt;: &lt;span class="pl-s1"&gt;str&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;
    &lt;span class="pl-s1"&gt;default&lt;/span&gt;: &lt;span class="pl-s1"&gt;str&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Return&lt;/span&gt;(&lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;description&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;annotation&lt;/span&gt;: &lt;span class="pl-s1"&gt;str&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Docstring&lt;/span&gt;(&lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;node_type&lt;/span&gt;: &lt;span class="pl-v"&gt;Literal&lt;/span&gt;[&lt;span class="pl-s"&gt;"class"&lt;/span&gt;, &lt;span class="pl-s"&gt;"function"&lt;/span&gt;]
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;docstring&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;args&lt;/span&gt;: &lt;span class="pl-s1"&gt;list&lt;/span&gt;[&lt;span class="pl-smi"&gt;Argument&lt;/span&gt;] &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;
    &lt;span class="pl-s1"&gt;ret&lt;/span&gt;: &lt;span class="pl-v"&gt;Return&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Documentation&lt;/span&gt;(&lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;entries&lt;/span&gt;: &lt;span class="pl-s1"&gt;list&lt;/span&gt;[&lt;span class="pl-smi"&gt;Docstring&lt;/span&gt;]&lt;/pre&gt;

&lt;p&gt;The code inserts &lt;code&gt;&amp;lt;SLOT&amp;gt;&lt;/code&gt; markers into that structure in various places, so the template included in the prompt ends up looking like this:&lt;/p&gt;
&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"entries"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"node_type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;function&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;create_docstring_node&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"docstring"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&amp;lt;SLOT&amp;gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"args"&lt;/span&gt;: [
        {
          &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;docstring_text&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"description"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&amp;lt;SLOT&amp;gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"annotation"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;str&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"default"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;
        },
        {
          &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;indent&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"description"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&amp;lt;SLOT&amp;gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"annotation"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;str&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"default"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;
        }
      ],
      &lt;span class="pl-ent"&gt;"ret"&lt;/span&gt;: {
        &lt;span class="pl-ent"&gt;"description"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&amp;lt;SLOT&amp;gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
        &lt;span class="pl-ent"&gt;"annotation"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;cst.BaseStatement&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
      }
    }
  ]
}&lt;/pre&gt;
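&lt;p&gt;The slot-filling mechanics are easy to illustrate. This is a toy sketch, not the plugin's actual code (in practice the model returns the whole filled template as JSON) - it just shows how a template of &lt;code&gt;&amp;lt;SLOT&amp;gt;&lt;/code&gt; markers maps to filled values:&lt;/p&gt;

```python
import json

SLOT = "<SLOT>"


def fill_slots(node, values):
    """Recursively replace "<SLOT>" strings with values drawn from an iterator."""
    if isinstance(node, dict):
        return {key: fill_slots(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_slots(item, values) for item in node]
    if node == SLOT:
        return next(values)
    return node


template = {
    "entries": [
        {
            "node_type": "function",
            "name": "create_docstring_node",
            "docstring": SLOT,
            "args": [{"name": "docstring_text", "description": SLOT}],
        }
    ]
}

# Fill the two slots in document order (depth-first, dict order).
filled = fill_slots(
    template,
    iter(["Create a docstring node.", "The docstring text."]),
)
print(json.dumps(filled, indent=2))
```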

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/pnmath/status/1909386592944292018"&gt;@pnmath&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pydantic"&gt;pydantic&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="python"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llm"/><category term="pydantic"/></entry><entry><title>llm-fragments-go</title><link href="https://simonwillison.net/2025/Apr/10/llm-fragments-go/#atom-tag" rel="alternate"/><published>2025-04-10T15:19:33+00:00</published><updated>2025-04-10T15:19:33+00:00</updated><id>https://simonwillison.net/2025/Apr/10/llm-fragments-go/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/FiloSottile/mostly-harmless/tree/main/llm-fragments-go"&gt;llm-fragments-go&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Filippo Valsorda released the first plugin by someone other than me that uses LLM's new &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#register-fragment-loaders-register"&gt;register_fragment_loaders()&lt;/a&gt; plugin hook I announced &lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/"&gt;the other day&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Install with &lt;code&gt;llm install llm-fragments-go&lt;/code&gt; and then:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can feed the docs of a Go package into LLM using the &lt;code&gt;go:&lt;/code&gt; &lt;a href="https://llm.datasette.io/en/stable/fragments.html"&gt;fragment&lt;/a&gt; with the package name, optionally followed by a version suffix.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;llm -f go:golang.org/x/mod/sumdb/note@v0.23.0 "Write a single file command that generates a key, prints the verifier key, signs an example message, and prints the signed note."&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The implementation is &lt;a href="https://github.com/FiloSottile/mostly-harmless/blob/44fb3e6e0b56decd72e893409e8085d88ad43e3d/llm-fragments-go/llm_fragments_go.py"&gt;just 33 lines of Python&lt;/a&gt; and works by running these commands in a temporary directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;go mod init llm_fragments_go
go get golang.org/x/mod/sumdb/note@v0.23.0
go doc -all golang.org/x/mod/sumdb/note
&lt;/code&gt;&lt;/pre&gt;
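&lt;p&gt;The real implementation is the 33 lines of Python linked above. As a rough sketch of the same flow (the helper names here are my own invention, not the plugin's), you could build and run that command sequence like this:&lt;/p&gt;

```python
import subprocess
import tempfile


def go_doc_commands(package, version=None):
    """The command sequence llm-fragments-go runs in a fresh temp directory."""
    spec = f"{package}@{version}" if version else package
    return [
        ["go", "mod", "init", "llm_fragments_go"],
        ["go", "get", spec],
        ["go", "doc", "-all", package],
    ]


def go_package_docs(package, version=None):
    """Hypothetical runner: executes the commands, returns the `go doc` output."""
    with tempfile.TemporaryDirectory() as tmp:
        output = ""
        for cmd in go_doc_commands(package, version):
            result = subprocess.run(cmd, cwd=tmp, capture_output=True, text=True)
            output = result.stdout  # keep the last command's output (go doc -all)
        return output
```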

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/filippo.abyssdomain.expert/post/3lmhhqccp2c2i"&gt;@filippo.abyssdomain.expert&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/filippo-valsorda"&gt;filippo-valsorda&lt;/a&gt;&lt;/p&gt;



</summary><category term="go"/><category term="plugins"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="filippo-valsorda"/></entry><entry><title>llm-hacker-news</title><link href="https://simonwillison.net/2025/Apr/8/llm-hacker-news/#atom-tag" rel="alternate"/><published>2025-04-08T00:11:30+00:00</published><updated>2025-04-08T00:11:30+00:00</updated><id>https://simonwillison.net/2025/Apr/8/llm-hacker-news/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-hacker-news"&gt;llm-hacker-news&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I built this new plugin to exercise the new &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#register-fragment-loaders-register"&gt;register_fragment_loaders()&lt;/a&gt; plugin hook I added to &lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/"&gt;LLM 0.24&lt;/a&gt;. It's the plugin equivalent of &lt;a href="https://til.simonwillison.net/llms/claude-hacker-news-themes"&gt;the Bash script&lt;/a&gt; I've been using to summarize &lt;a href="https://news.ycombinator.com/"&gt;Hacker News&lt;/a&gt; conversations for the past 18 months.&lt;/p&gt;
&lt;p&gt;You can use it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-hacker-news
llm -f hn:43615912 'summary with illustrative direct quotes'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see the output &lt;a href="https://github.com/simonw/llm-hacker-news/issues/1#issuecomment-2784887743"&gt;in this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The plugin registers a &lt;code&gt;hn:&lt;/code&gt; prefix - combine that with the ID of a Hacker News conversation to pull that conversation into the context.&lt;/p&gt;
&lt;p&gt;It uses the Algolia Hacker News API which returns &lt;a href="https://hn.algolia.com/api/v1/items/43615912"&gt;JSON like this&lt;/a&gt;. Rather than feed the JSON directly to the LLM it instead converts it to a hopefully more LLM-friendly format that looks like this example from &lt;a href="https://github.com/simonw/llm-hacker-news/blob/0.1/tests/test_hacker_news.py#L5-L18"&gt;the plugin's test&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[1] BeakMaster: Fish Spotting Techniques

[1.1] CoastalFlyer: The dive technique works best when hunting in shallow waters.

[1.1.1] PouchBill: Agreed. Have you tried the hover method near the pier?

[1.1.2] WingSpan22: My bill gets too wet with that approach.

[1.1.2.1] CoastalFlyer: Try tilting at a 40° angle like our Australian cousins.

[1.2] BrownFeathers: Anyone spotted those "silver fish" near the rocks?

[1.2.1] GulfGlider: Yes! They're best caught at dawn.
Just remember: swoop &amp;gt; grab &amp;gt; lift
&lt;/code&gt;&lt;/pre&gt;
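&lt;p&gt;Producing that numbering is a nice little recursive exercise. Here's a minimal sketch (assuming the Algolia item shape of &lt;code&gt;author&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;/&lt;code&gt;text&lt;/code&gt; and nested &lt;code&gt;children&lt;/code&gt; - the plugin's real code differs):&lt;/p&gt;

```python
def flatten_thread(item, path=(1,)):
    """Render a nested comment tree as numbered [1.1.2]-style lines."""
    label = ".".join(str(n) for n in path)
    body = item.get("title") or item.get("text") or ""
    lines = [f"[{label}] {item['author']}: {body}"]
    # Children get their parent's path plus a 1-based position.
    for i, child in enumerate(item.get("children", []), start=1):
        lines.extend(flatten_thread(child, path + (i,)))
    return lines


thread = {
    "author": "BeakMaster",
    "title": "Fish Spotting Techniques",
    "children": [
        {
            "author": "CoastalFlyer",
            "text": "The dive technique works best in shallow waters.",
            "children": [
                {"author": "PouchBill", "text": "Agreed.", "children": []},
            ],
        },
    ],
}
print("\n\n".join(flatten_thread(thread)))
```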
&lt;p&gt;That format was suggested by Claude, which then wrote most of the plugin implementation for me. Here's &lt;a href="https://claude.ai/share/6da6ec5a-b8b3-4572-ab1b-141bb37ef70b"&gt;that Claude transcript&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="hacker-news"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="anthropic"/><category term="claude"/></entry><entry><title>Long context support in LLM 0.24 using fragments and template plugins</title><link href="https://simonwillison.net/2025/Apr/7/long-context-llm/#atom-tag" rel="alternate"/><published>2025-04-07T17:45:11+00:00</published><updated>2025-04-07T17:45:11+00:00</updated><id>https://simonwillison.net/2025/Apr/7/long-context-llm/#atom-tag</id><summary type="html">
    &lt;p&gt;LLM 0.24 is &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-24"&gt;now available&lt;/a&gt; with new features to help take advantage of the increasingly long input context supported by modern LLMs.&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; is my command-line tool and &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;Python library&lt;/a&gt; for interacting with LLMs, supported by &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;20+ plugins&lt;/a&gt; adding support for both local and remote models from a bunch of different providers.)&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#trying-it-out"&gt;Trying it out&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#improving-llm-s-support-for-long-context-models"&gt;Improving LLM's support for long context models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#asking-questions-of-llm-s-documentation"&gt;Asking questions of LLM's documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#publishing-sharing-and-reusing-templates"&gt;Publishing, sharing and reusing templates&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#template-and-fragment-loader-plugins"&gt;Template and fragment loader plugins so far&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Apr/7/long-context-llm/#everything-else-in-llm-0-24"&gt;Everything else in LLM 0.24&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h4 id="trying-it-out"&gt;Trying it out&lt;/h4&gt;

&lt;p&gt;To install LLM with &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; (there are &lt;a href="https://llm.datasette.io/en/stable/setup.html#installation"&gt;several other options&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv tool install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You'll need to either provide &lt;a href="https://platform.openai.com/api-keys"&gt;an OpenAI API key&lt;/a&gt; or install &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;a plugin&lt;/a&gt; to use local models or models from other providers:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; openai
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Paste OpenAI API key here&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To &lt;a href="https://llm.datasette.io/en/stable/setup.html#upgrading-to-the-latest-version"&gt;upgrade LLM&lt;/a&gt; from a previous version:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install -U llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The biggest new feature is &lt;strong&gt;&lt;a href="https://llm.datasette.io/en/stable/fragments.html"&gt;fragments&lt;/a&gt;&lt;/strong&gt;. You can now use &lt;code&gt;-f filename&lt;/code&gt; or &lt;code&gt;-f url&lt;/code&gt; to add one or more fragments to your prompt, which means you can do things like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f https://simonwillison.net/2025/Apr/5/llama-4-notes/ &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;bullet point summary&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/7ffb19f647603558c575359e678185c0"&gt;the output&lt;/a&gt; from that prompt, exported using &lt;code&gt;llm logs -c --expand --usage&lt;/code&gt;. Token cost was 5,372 input, 374 output which works out as 0.103 cents (around 1/10th of a cent) using the default GPT-4o mini model.&lt;/p&gt;
&lt;p&gt;Plugins can implement custom fragment loaders with a prefix. The &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt; plugin adds a &lt;code&gt;github:&lt;/code&gt; prefix that can be used to load every text file in a GitHub repository as a list of fragments:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-fragments-github
llm -f github:simonw/s3-credentials &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Suggest new features for this tool&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/c9bbbc5f6560b01f4b7882ac0194fb25"&gt;the output&lt;/a&gt;. That took 49,856 input tokens for a total cost of 0.7843 cents - nearly a whole cent!&lt;/p&gt;
&lt;h4 id="improving-llm-s-support-for-long-context-models"&gt;Improving LLM's support for long context models&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/tags/long-context/"&gt;Long context&lt;/a&gt;&lt;/strong&gt; is one of the most exciting trends in LLMs over the past eighteen months. Saturday's &lt;a href="https://simonwillison.net/2025/Apr/5/llama-4-notes/"&gt;Llama 4 Scout release&lt;/a&gt; gave us the first model with a full 10 million token context. Google's &lt;a href="https://simonwillison.net/tags/gemini/"&gt;Gemini&lt;/a&gt; family has several 1-2 million token models, and the baseline for recent models from both OpenAI and Anthropic is 100 or 200 thousand.&lt;/p&gt;
&lt;p&gt;Two years ago most models capped out at 8,000 tokens of input. Long context opens up many new interesting ways to apply this class of technology.&lt;/p&gt;
&lt;p&gt;I've been using long context models via &lt;a href="https://simonwillison.net/tags/files-to-prompt/"&gt;my files-to-prompt tool&lt;/a&gt; to summarize large codebases, explain how they work and even to &lt;a href="https://simonwillison.net/2024/Sep/25/o1-preview-llm/"&gt;debug gnarly bugs&lt;/a&gt;. As demonstrated above, it's surprisingly inexpensive to drop tens of thousands of tokens into models like GPT-4o mini or most of the Google Gemini series, and the results are often very impressive.&lt;/p&gt;
&lt;p&gt;One of LLM's most useful features is that it &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;logs every prompt and response&lt;/a&gt; to a SQLite database. This is great for comparing the same prompt against different models and tracking experiments over time - my own database contained thousands of responses from hundreds of different models accumulated over the past couple of years.&lt;/p&gt;
&lt;p&gt;This is where long context prompts were starting to be a problem. Since LLM stores the full prompt and response in the database, asking five questions of the same source code could result in five duplicate copies of that text in the database!&lt;/p&gt;
&lt;p&gt;The new &lt;strong&gt;fragments&lt;/strong&gt; feature targets this problem head on. Each fragment is stored once in a &lt;a href="https://llm.datasette.io/en/stable/logging.html#sql-schema"&gt;fragments&lt;/a&gt; table, then de-duplicated in the future using a SHA256 hash of its content.&lt;/p&gt;
&lt;p&gt;This saves on storage, and also enables features like &lt;code&gt;llm logs -f X&lt;/code&gt; for seeing all logged responses &lt;a href="https://llm.datasette.io/en/stable/logging.html#filtering-by-prompts-that-used-a-specific-fragment"&gt;that use a particular fragment&lt;/a&gt;.&lt;/p&gt;
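&lt;p&gt;The de-duplication trick is easy to illustrate. This is a toy sketch, not LLM's actual schema: key the fragments table on the SHA256 of the content and let &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; collapse duplicates:&lt;/p&gt;

```python
import hashlib
import sqlite3


def fragment_id(content: str) -> str:
    # Identical content always hashes to the same ID.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fragments (hash TEXT PRIMARY KEY, content TEXT)")


def store_fragment(content: str) -> str:
    h = fragment_id(content)
    # INSERT OR IGNORE de-duplicates: the same content is stored once.
    db.execute("INSERT OR IGNORE INTO fragments VALUES (?, ?)", (h, content))
    return h


a = store_fragment("SELECT * FROM logs")
b = store_fragment("SELECT * FROM logs")  # same content, no new row
count = db.execute("SELECT COUNT(*) FROM fragments").fetchone()[0]
```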
&lt;p&gt;Fragments can be specified in several different ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a path to a file&lt;/li&gt;
&lt;li&gt;a URL to data online&lt;/li&gt;
&lt;li&gt;an alias that's been set against a previous fragment (see &lt;a href="https://llm.datasette.io/en/latest/help.html#llm-fragments-set-help"&gt;llm fragments set&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;a hash ID of the content of a fragment&lt;/li&gt;
&lt;li&gt;using &lt;code&gt;prefix:argument&lt;/code&gt; to specify fragments from a plugin&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="asking-questions-of-llm-s-documentation"&gt;Asking questions of LLM's documentation&lt;/h4&gt;
&lt;p&gt;Wouldn't it be neat if LLM could answer questions about its own documentation?&lt;/p&gt;
&lt;p&gt;The new &lt;a href="https://github.com/simonw/llm-docs"&gt;llm-docs&lt;/a&gt; plugin (built with the new &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#register-fragment-loaders-register"&gt;register_fragment_loaders() plugin hook&lt;/a&gt;) enables exactly that:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-docs
llm -f docs: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;How do I embed a binary file?&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;a href="https://gist.github.com/simonw/df115ffa63e101833a100f71555bf53d"&gt;output&lt;/a&gt; starts like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To embed a binary file using the LLM command-line interface, you can use the &lt;code&gt;llm embed&lt;/code&gt; command with the &lt;code&gt;--binary&lt;/code&gt; option. Here’s how you can do it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make sure you have the appropriate embedding model installed that supports binary input.&lt;/li&gt;
&lt;li&gt;Use the following command syntax:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm embed -m &lt;span class="pl-k"&gt;&amp;lt;&lt;/span&gt;model_id&lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt; --binary -i &lt;span class="pl-k"&gt;&amp;lt;&lt;/span&gt;path_to_your_binary_file&lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Replace &lt;code&gt;&amp;lt;model_id&amp;gt;&lt;/code&gt; with the identifier for the embedding model you want to use (e.g., &lt;code&gt;clip&lt;/code&gt; for the CLIP model) and &lt;code&gt;&amp;lt;path_to_your_binary_file&amp;gt;&lt;/code&gt; with the path to your actual binary file.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(74,570 input, 240 output = 1.1329 cents with GPT-4o mini)&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;-f docs:&lt;/code&gt; with just the prefix is the same as using &lt;code&gt;-f docs:llm&lt;/code&gt;. The plugin fetches the documentation for your current version of LLM from my new &lt;a href="https://github.com/simonw/docs-for-llms"&gt;simonw/docs-for-llms&lt;/a&gt; repo, which also provides packaged documentation files for my &lt;code&gt;datasette&lt;/code&gt;, &lt;code&gt;s3-credentials&lt;/code&gt;, &lt;code&gt;shot-scraper&lt;/code&gt; and &lt;code&gt;sqlite-utils&lt;/code&gt; projects.&lt;/p&gt;
&lt;p&gt;Datasette's documentation has got pretty long, so you might need to run that through a Gemini model instead (using the &lt;a href="https://github.com/simonw/llm-gemini"&gt;llm-gemini plugin&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f docs:datasette -m gemini-2.0-flash \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Build a render_cell plugin that detects and renders markdown&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/f08b3832c379ca89d8cff3c61ec5662e"&gt;Here's the output&lt;/a&gt;. 132,042 input, 1,129 output with Gemini 2.0 Flash = 1.3656 cents.&lt;/p&gt;
&lt;p&gt;You can browse the combined documentation files this uses in &lt;a href="https://github.com/simonw/docs-for-llms"&gt;docs-for-llms&lt;/a&gt;. They're built using GitHub Actions.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://llmstxt.org/"&gt;llms-txt&lt;/a&gt; is a project lead by Jeremy Howard that encourages projects to publish similar files to help LLMs ingest a succinct copy of their documentation.&lt;/p&gt;
&lt;h4 id="publishing-sharing-and-reusing-templates"&gt;Publishing, sharing and reusing templates&lt;/h4&gt;
&lt;p&gt;The new &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#register-template-loaders-register"&gt;register_template_loaders() plugin hook&lt;/a&gt; allows plugins to register &lt;code&gt;prefix:value&lt;/code&gt; custom template loaders, for use with the &lt;code&gt;llm -t&lt;/code&gt; option.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/llm-templates-github"&gt;llm-templates-github&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-templates-fabric"&gt;llm-templates-fabric&lt;/a&gt; are two new plugins that make use of that hook.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;llm-templates-github&lt;/code&gt; lets you share and use templates via a public GitHub repository. Here's how to run my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;Pelican riding a bicycle&lt;/a&gt; benchmark against a specific model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-templates-github
llm -t gh:simonw/pelican-svg -m o3-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This executes &lt;a href="https://github.com/simonw/llm-templates/blob/main/pelican-svg.yaml"&gt;this pelican-svg.yaml&lt;/a&gt; template stored in my &lt;a href="https://github.com/simonw/llm-templates"&gt;simonw/llm-templates&lt;/a&gt; repository, using a new repository naming convention.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;llm -t gh:simonw/pelican-svg&lt;/code&gt; will load that &lt;code&gt;pelican-svg.yaml&lt;/code&gt; file from the &lt;code&gt;simonw/llm-templates&lt;/code&gt; repo. You can also use &lt;code&gt;llm -t gh:simonw/name-of-repo/name-of-template&lt;/code&gt; to load a template from a repository that doesn't follow that convention.&lt;/p&gt;
&lt;p&gt;To share your own templates, create a repository on GitHub under your user account called &lt;code&gt;llm-templates&lt;/code&gt; and start saving &lt;code&gt;.yaml&lt;/code&gt; files to it.&lt;/p&gt;
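&lt;p&gt;One plausible way to picture the naming convention is as a resolver from a &lt;code&gt;gh:&lt;/code&gt; spec to a file in that repository. This is my own hypothetical sketch - the plugin's real lookup may work differently:&lt;/p&gt;

```python
def template_url(spec):
    """Map gh:user/template or gh:user/repo/template to a raw GitHub URL.

    Illustrative guess at the convention, not the plugin's actual code.
    """
    parts = spec.removeprefix("gh:").split("/")
    if len(parts) == 2:
        user, name = parts
        repo = "llm-templates"  # default repository name convention
    else:
        user, repo, name = parts
    return f"https://raw.githubusercontent.com/{user}/{repo}/main/{name}.yaml"
```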
&lt;p&gt;&lt;a href="https://github.com/simonw/llm-templates-fabric"&gt;llm-templates-fabric&lt;/a&gt; provides a similar mechanism for loading templates from  Daniel Miessler's extensive &lt;a href="https://github.com/danielmiessler/fabric"&gt;fabric collection&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-templates-fabric
curl https://simonwillison.net/2025/Apr/6/only-miffy/ &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  llm -t f:extract_main_idea&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A conversation with Daniel was the inspiration for this new plugin hook.&lt;/p&gt;

&lt;h4 id="template-and-fragment-loader-plugins"&gt;Template and fragment loader plugins so far&lt;/h4&gt;
&lt;p&gt;Here's the list of fragment and template loader plugins so far from the &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html#fragments-and-template-loaders"&gt;LLM plugins directory&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-templates-github"&gt;llm-templates-github&lt;/a&gt;&lt;/strong&gt; supports loading templates shared on GitHub, e.g. &lt;code&gt;llm -t gh:simonw/pelican-svg&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-templates-fabric"&gt;llm-templates-fabric&lt;/a&gt;&lt;/strong&gt; provides access to the &lt;a href="https://github.com/danielmiessler/fabric"&gt;Fabric&lt;/a&gt; collection of prompts: &lt;code&gt;cat setup.py | llm -t fabric:explain_code&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt;&lt;/strong&gt; can load entire GitHub repositories in a single operation: &lt;code&gt;llm -f github:simonw/files-to-prompt 'explain this code'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-hacker-news"&gt;llm-hacker-news&lt;/a&gt;&lt;/strong&gt; imports conversations from Hacker News as fragments: &lt;code&gt;llm -f hn:43615912 'summary with illustrative direct quotes'&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h4 id="everything-else-in-llm-0-24"&gt;Everything else in LLM 0.24&lt;/h4&gt;
&lt;p&gt;LLM 0.24 is a big release, spanning &lt;a href="https://github.com/simonw/llm/compare/0.23...0.24"&gt;51 commits&lt;/a&gt;. The &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-24"&gt;release notes&lt;/a&gt; cover everything that's new in full - here are a few of my highlights:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The new &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai plugin&lt;/a&gt; provides support for &lt;strong&gt;o1-pro&lt;/strong&gt; (which is not supported by the OpenAI mechanism used by LLM core). Future OpenAI features will migrate to this plugin instead of LLM core itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The problem with OpenAI models being handled by LLM core is that I have to release a whole new version of LLM every time OpenAI releases a new model or feature. Migrating this stuff out to a plugin means I can release new versions of that plugin independently of LLM itself - something I frequently do for &lt;a href="https://github.com/simonw/llm-anthropic"&gt;llm-anthropic&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-gemini"&gt;llm-gemini&lt;/a&gt; and others.&lt;/p&gt;
&lt;p&gt;The new &lt;code&gt;llm-openai&lt;/code&gt; plugin uses their Responses API, a new shape of API which &lt;a href="https://simonwillison.net/2025/Mar/11/responses-vs-chat-completions/"&gt;I covered last month&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm -t $URL&lt;/code&gt; option can now take a URL to a YAML template. &lt;a href="https://github.com/simonw/llm/issues/856"&gt;#856&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The new custom template loaders are fun, but being able to paste in a URL to a YAML file somewhere provides a simpler way to share templates.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Templates can now store default model options. &lt;a href="https://github.com/simonw/llm/issues/845"&gt;#845&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Attachments can now be stored in templates. &lt;a href="https://github.com/simonw/llm/issues/826"&gt;#826&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The quickest way to create your own template is with the &lt;code&gt;llm prompt ... --save name-of-template&lt;/code&gt; command. This now works with attachments, fragments and default model options, each of which is persisted in &lt;a href="https://llm.datasette.io/en/stable/templates.html#templates-as-yaml-files"&gt;the template YAML file&lt;/a&gt;.&lt;/p&gt;
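&lt;p&gt;Put together, a saved template might look something like this - the key names here follow the templates documentation, but the model, prompt and attachment values are invented for illustration:&lt;/p&gt;

```yaml
# Hypothetical saved template - key names per the LLM templates docs,
# values invented for illustration
model: gpt-4o-mini
system: You are a concise code reviewer
prompt: 'Review this code: $input'
options:
  temperature: 0.5
attachments:
- https://example.com/screenshot.png
```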
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href="https://llm.datasette.io/en/stable/usage.html#setting-default-options-for-models"&gt;llm models options&lt;/a&gt; family of commands for setting default options for particular models. &lt;a href="https://github.com/simonw/llm/issues/829"&gt;#829&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I built this when I &lt;a href="https://twitter.com/bindureddy/status/1900345517256958140"&gt;learned that&lt;/a&gt; Qwen's QwQ-32b model works best with temperature 0.7 and top p 0.95.&lt;/p&gt;
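&lt;p&gt;Based on the linked documentation, setting those defaults once looks something like this - the model ID here is a guess at how a local QwQ install might be registered, so substitute your own:&lt;/p&gt;

```shell
# Persist default options for a model (model ID is illustrative)
llm models options set qwq:32b temperature 0.7
llm models options set qwq:32b top_p 0.95
```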
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm prompt -d path-to-sqlite.db&lt;/code&gt; option can now be used to write logs to a custom SQLite database. &lt;a href="https://github.com/simonw/llm/issues/858"&gt;#858&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This proved extremely useful for testing fragments - it meant I could run a prompt and save the full response to a separate SQLite database which I could then upload to S3 and share &lt;a href="https://github.com/simonw/llm/issues/863#issuecomment-2781723372"&gt;as a link to Datasette Lite&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm similar -p/--plain&lt;/code&gt; option providing more human-readable output than the default JSON. &lt;a href="https://github.com/simonw/llm/issues/853"&gt;#853&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'd like this to be the default output, but I'm holding off on changing that until LLM 1.0 since it's a breaking change for people building automations against the JSON from &lt;code&gt;llm similar&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Set the &lt;code&gt;LLM_RAISE_ERRORS=1&lt;/code&gt; environment variable to raise errors during prompts rather than suppressing them, which means you can run &lt;code&gt;python -i -m llm 'prompt'&lt;/code&gt; and then drop into a debugger on errors with &lt;code&gt;import pdb; pdb.pm()&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/817"&gt;#817&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Really useful for debugging new model plugins.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm prompt -q gpt -q 4o&lt;/code&gt; option - pass &lt;code&gt;-q searchterm&lt;/code&gt; one or more times to execute a prompt against the first model that matches all of those strings - useful if you can't remember the full model ID. &lt;a href="https://github.com/simonw/llm/issues/841"&gt;#841&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pretty obscure, but I found myself needing this. Vendors love releasing models with names like &lt;code&gt;gemini-2.5-pro-exp-03-25&lt;/code&gt;, so now I can run &lt;code&gt;llm -q gem -q 2.5 -q exp 'say hi'&lt;/code&gt; instead of looking up the full model ID.&lt;/p&gt;
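&lt;p&gt;The matching logic is roughly "first model whose ID contains every search term" - here's a minimal sketch of that idea (illustrative only, not LLM's actual implementation):&lt;/p&gt;

```python
def first_matching_model(model_ids, terms):
    """Return the first model ID containing every search term (case-insensitive)."""
    terms = [t.lower() for t in terms]
    for model_id in model_ids:
        if all(t in model_id.lower() for t in terms):
            return model_id
    return None

models = ["gpt-4o", "gpt-4o-mini", "gemini-2.5-pro-exp-03-25"]
print(first_matching_model(models, ["gem", "2.5", "exp"]))
# prints gemini-2.5-pro-exp-03-25
```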
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;OpenAI compatible models&lt;/a&gt; configured using &lt;code&gt;extra-openai-models.yaml&lt;/code&gt; now support &lt;code&gt;supports_schema: true&lt;/code&gt;, &lt;code&gt;vision: true&lt;/code&gt; and &lt;code&gt;audio: true&lt;/code&gt; options. Thanks &lt;a href="https://github.com/adaitche"&gt;@adaitche&lt;/a&gt; and &lt;a href="https://github.com/giuli007"&gt;@giuli007&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/819"&gt;#819&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/pull/843"&gt;#843&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don't use this feature myself, but it's clearly popular: this isn't the first time I've had PRs with improvements from the wider community.&lt;/p&gt;
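&lt;p&gt;A configured entry might look something like this - &lt;code&gt;model_id&lt;/code&gt;, &lt;code&gt;model_name&lt;/code&gt; and &lt;code&gt;api_base&lt;/code&gt; come from the linked documentation, while the specific values here are hypothetical:&lt;/p&gt;

```yaml
# Hypothetical entry in extra-openai-models.yaml
- model_id: my-local-model
  model_name: qwen2.5-7b-instruct
  api_base: http://localhost:8000/v1
  supports_schema: true
  vision: true
  audio: true
```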
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/long-context"&gt;long-context&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="gemini"/><category term="long-context"/><category term="files-to-prompt"/></entry><entry><title>llm-openrouter 0.4</title><link href="https://simonwillison.net/2025/Mar/10/llm-openrouter-04/#atom-tag" rel="alternate"/><published>2025-03-10T21:40:56+00:00</published><updated>2025-03-10T21:40:56+00:00</updated><id>https://simonwillison.net/2025/Mar/10/llm-openrouter-04/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-openrouter/releases/tag/0.4"&gt;llm-openrouter 0.4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I found out this morning that &lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt; include support for a number of (rate-limited) &lt;a href="https://openrouter.ai/models?max_price=0"&gt;free API models&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I occasionally run workshops on top of LLMs (&lt;a href="https://simonwillison.net/2025/Mar/8/cutting-edge-web-scraping/"&gt;like this one&lt;/a&gt;) and being able to provide students with a quick way to obtain an API key against models where they don't have to set up billing is really valuable to me!&lt;/p&gt;
&lt;p&gt;This inspired me to upgrade my existing &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin, and in doing so I closed out a bunch of open feature requests.&lt;/p&gt;
&lt;p&gt;Consider this post the &lt;a href="https://simonwillison.net/tags/annotated-release-notes/"&gt;annotated release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;LLM &lt;a href="https://llm.datasette.io/en/stable/schemas.html"&gt;schema support&lt;/a&gt; for OpenRouter models that &lt;a href="https://openrouter.ai/models?order=newest&amp;amp;supported_parameters=structured_outputs"&gt;support structured output&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/23"&gt;#23&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm trying to get support for LLM's &lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/"&gt;new schema feature&lt;/a&gt; into as many plugins as possible.&lt;/p&gt;
&lt;p&gt;OpenRouter's OpenAI-compatible API includes support for the &lt;code&gt;response_format&lt;/code&gt; &lt;a href="https://openrouter.ai/docs/features/structured-outputs"&gt;structured content option&lt;/a&gt;, but with an important caveat: it only works for some models, and if you try to use it on others it is silently ignored.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/OpenRouterTeam/openrouter-examples/issues/20"&gt;filed an issue&lt;/a&gt; with OpenRouter requesting they include schema support in their machine-readable model index. For the moment LLM will let you specify schemas for unsupported models and will ignore them entirely, which isn't ideal.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm openrouter key&lt;/code&gt; command displays information about your current API key. &lt;a href="https://github.com/simonw/llm-openrouter/issues/24"&gt;#24&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Useful for debugging and checking the details of your key's rate limit.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm -m ... -o online 1&lt;/code&gt; enables &lt;a href="https://openrouter.ai/docs/features/web-search"&gt;web search grounding&lt;/a&gt; against any model, powered by &lt;a href="https://exa.ai/"&gt;Exa&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/25"&gt;#25&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenRouter apparently make this feature available for every one of their supported models! It's powered by &lt;a href="https://exa.ai/"&gt;Exa&lt;/a&gt;, a new-to-me AI-focused search engine startup who appear to have built their own index with their own crawlers (according to &lt;a href="https://docs.exa.ai/reference/faqs#how-often-is-the-index-updated"&gt;their FAQ&lt;/a&gt;). OpenRouter currently price the feature at $4 per 1,000 results, and since 5 results are returned for every prompt that works out to 2 cents per prompt.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm openrouter models&lt;/code&gt; command for listing details of the OpenRouter models, including a &lt;code&gt;--json&lt;/code&gt; option to get JSON and a &lt;code&gt;--free&lt;/code&gt; option to filter for just the free models. &lt;a href="https://github.com/simonw/llm-openrouter/issues/26"&gt;#26&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This offers a neat way to list the available models. There are examples of the output &lt;a href="https://github.com/simonw/llm-openrouter/issues/26#issuecomment-2711908704"&gt;in the comments on the issue&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New option to specify custom provider routing: &lt;code&gt;-o provider '{JSON here}'&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/17"&gt;#17&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Part of OpenRouter's USP is that it can route prompts to different providers depending on factors like latency, cost or as a fallback if your first choice is unavailable - great if you are using open weight models like Llama which are hosted by competing companies.&lt;/p&gt;
&lt;p&gt;The options they provide for routing are &lt;a href="https://openrouter.ai/docs/features/provider-routing"&gt;very thorough&lt;/a&gt; - I had initially hoped to provide a set of CLI options that covered all of these bases, but I decided instead to reuse their JSON format and forward those options directly on to the model.&lt;/p&gt;
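&lt;p&gt;For example, pinning a preferred provider with fallbacks disabled might look something like this - the &lt;code&gt;order&lt;/code&gt; and &lt;code&gt;allow_fallbacks&lt;/code&gt; fields come from OpenRouter's provider routing documentation, while the model ID here is illustrative:&lt;/p&gt;

```shell
# Hypothetical model ID; routing JSON fields per OpenRouter's docs
llm -m openrouter/meta-llama/llama-3.3-70b-instruct \
  -o provider '{"order": ["Together"], "allow_fallbacks": false}'
```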


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="openrouter"/><category term="ai-assisted-search"/></entry><entry><title>llm-ollama 0.9.0</title><link href="https://simonwillison.net/2025/Mar/4/llm-ollama-090/#atom-tag" rel="alternate"/><published>2025-03-04T07:17:52+00:00</published><updated>2025-03-04T07:17:52+00:00</updated><id>https://simonwillison.net/2025/Mar/4/llm-ollama-090/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/taketwo/llm-ollama/releases/tag/0.9.0"&gt;llm-ollama 0.9.0&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This release of the &lt;code&gt;llm-ollama&lt;/code&gt; plugin adds support for &lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/"&gt;schemas&lt;/a&gt;, thanks to a &lt;a href="https://github.com/taketwo/llm-ollama/pull/36"&gt;PR by Adam Compton&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ollama provides very robust support for this pattern thanks to their &lt;a href="https://ollama.com/blog/structured-outputs"&gt;structured outputs&lt;/a&gt; feature, which works across all of the models that they support by intercepting the logic that outputs the next token and restricting it to only tokens that would be valid in the context of the provided schema.&lt;/p&gt;
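&lt;p&gt;The idea can be sketched with a toy example: at each decoding step, drop any candidate token that would make the output invalid under the constraint, then take the best remaining one. This sketch uses a trivial "digits only" constraint rather than a real JSON schema, and a fake "model" that just returns a fixed preference order:&lt;/p&gt;

```python
def constrained_generate(propose_tokens, is_valid_prefix, max_steps=10):
    """Greedy decoding that only accepts tokens preserving constraint validity."""
    output = ""
    for _ in range(max_steps):
        # Candidates arrive in model-preference order; skip any token
        # that would break the constraint - the "token masking" step.
        for token in propose_tokens(output):
            if is_valid_prefix(output + token):
                output += token
                break
        else:
            break  # no valid continuation exists
    return output

# Toy "model": always prefers "a", then "1", then "2"
propose = lambda prefix: ["a", "1", "2"]
digits_only = lambda s: s == "" or s.isdigit()
print(constrained_generate(propose, digits_only, max_steps=3))
# prints 111 - the preferred "a" is masked out at every step
```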
&lt;p&gt;With Ollama and &lt;code&gt;llm-ollama&lt;/code&gt; installed you can even run structured schemas against vision prompts for local models. Here's one against Ollama's &lt;a href="https://ollama.com/library/llama3.2-vision"&gt;llama3.2-vision&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m llama3.2-vision:latest \
  'describe images' \
  --schema 'species,description,count int' \
  -a https://static.simonwillison.net/static/2025/two-pelicans.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I got back this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
    "species": "Pelicans",
    "description": "The image features a striking brown pelican with its distinctive orange beak, characterized by its large size and impressive wingspan.",
    "count": 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Actually a bit disappointing, as there are &lt;a href="https://static.simonwillison.net/static/2025/two-pelicans.jpg"&gt;two pelicans&lt;/a&gt; and their beaks are brown.)&lt;/p&gt;
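&lt;p&gt;That compact &lt;code&gt;--schema&lt;/code&gt; string is shorthand for a full JSON Schema. Here's a rough sketch of how such a string could expand - this is not LLM's actual parser, which handles more types and per-field descriptions:&lt;/p&gt;

```python
def concise_to_json_schema(spec):
    """Expand e.g. 'species,description,count int' into a JSON Schema dict."""
    type_map = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}
    properties = {}
    for field in spec.split(","):
        parts = field.strip().split()
        name = parts[0]
        # Fields default to string unless a type keyword follows the name
        json_type = type_map.get(parts[1], "string") if len(parts) > 1 else "string"
        properties[name] = {"type": json_type}
    return {"type": "object", "properties": properties, "required": list(properties)}

schema = concise_to_json_schema("species,description,count int")
print(schema["properties"]["count"])
# prints {'type': 'integer'}
```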


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="vision-llms"/><category term="ollama"/></entry><entry><title>llm-mistral 0.11</title><link href="https://simonwillison.net/2025/Mar/4/llm-mistral-011/#atom-tag" rel="alternate"/><published>2025-03-04T07:05:21+00:00</published><updated>2025-03-04T07:05:21+00:00</updated><id>https://simonwillison.net/2025/Mar/4/llm-mistral-011/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-mistral/releases/tag/0.11"&gt;llm-mistral 0.11&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I added &lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/"&gt;schema support&lt;/a&gt; to this plugin, which provides access to the &lt;a href="https://docs.mistral.ai/api/"&gt;Mistral API&lt;/a&gt; from LLM. Release notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Support for LLM &lt;a href="https://llm.datasette.io/en/stable/schemas.html"&gt;schemas&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-mistral/issues/19"&gt;#19&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-o prefix '{'&lt;/code&gt; option for forcing a response prefix. &lt;a href="https://github.com/simonw/llm-mistral/issues/18"&gt;#18&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Schemas now work with OpenAI, Anthropic, Gemini and Mistral hosted models, plus self-hosted models via &lt;a href="https://www.ollama.com/"&gt;Ollama&lt;/a&gt; and &lt;a href="https://github.com/taketwo/llm-ollama"&gt;llm-ollama&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="mistral"/><category term="ollama"/></entry><entry><title>Run LLMs on macOS using llm-mlx and Apple's MLX framework</title><link href="https://simonwillison.net/2025/Feb/15/llm-mlx/#atom-tag" rel="alternate"/><published>2025-02-15T19:13:29+00:00</published><updated>2025-02-15T19:13:29+00:00</updated><id>https://simonwillison.net/2025/Feb/15/llm-mlx/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; is a brand new plugin for my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; Python Library and CLI utility which builds on top of Apple's excellent &lt;a href="https://github.com/ml-explore/mlx"&gt;MLX array framework library&lt;/a&gt; and &lt;a href="https://pypi.org/project/mlx-lm/"&gt;mlx-lm&lt;/a&gt; package. If you're a terminal user or Python developer with a Mac this may be the new easiest way to start exploring local Large Language Models.&lt;/p&gt;
&lt;h4 id="running-llama-3-2-3b-using-llm-mlx"&gt;Running Llama 3.2 3B using llm-mlx&lt;/h4&gt;
&lt;p&gt;If you haven't already got LLM installed you'll need &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;to install it&lt;/a&gt; - you can do that in a bunch of different ways - in order of preference I like &lt;code&gt;uv tool install llm&lt;/code&gt; or &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;brew install llm&lt;/code&gt; or &lt;code&gt;pip install llm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, install the new plugin (macOS only):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-mlx&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now download and register a model. Llama 3.2 3B is an excellent first choice - it's pretty small (a 1.8GB download) but is a surprisingly capable starter model.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will download 1.8GB of model weights from &lt;a href="https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit"&gt;mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/a&gt; on Hugging Face and store them here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--Llama-3.2-3B-Instruct-4bit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can start running prompts:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m mlx-community/Llama-3.2-3B-Instruct-4bit &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Python code to traverse a tree, briefly&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which &lt;a href="https://gist.github.com/simonw/85ce6a0faf1a58a243f9e5f4b2daeb3d"&gt;output this&lt;/a&gt; for me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here's a brief example of how to traverse a tree in Python:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Node&lt;/span&gt;:
    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;__init__&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;value&lt;/span&gt;):
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;value&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;value&lt;/span&gt;
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; []

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;node&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;node&lt;/span&gt; &lt;span class="pl-c1"&gt;is&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt;
    
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;node&lt;/span&gt;.&lt;span class="pl-c1"&gt;value&lt;/span&gt;)
    
    &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;child&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;node&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;:
        &lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;child&lt;/span&gt;)

&lt;span class="pl-c"&gt;# Example usage:&lt;/span&gt;
&lt;span class="pl-s1"&gt;root&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"A"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"B"&lt;/span&gt;), &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"C"&lt;/span&gt;)]
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;[&lt;span class="pl-c1"&gt;0&lt;/span&gt;].&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"D"&lt;/span&gt;), &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"E"&lt;/span&gt;)]
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;[&lt;span class="pl-c1"&gt;1&lt;/span&gt;].&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"F"&lt;/span&gt;)]

&lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;root&lt;/span&gt;)  &lt;span class="pl-c"&gt;# Output: A, B, D, E, C, F&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;In this example, we define a &lt;code&gt;Node&lt;/code&gt; class to represent each node in the tree, with a &lt;code&gt;value&lt;/code&gt; attribute and a list of &lt;code&gt;children&lt;/code&gt;. The &lt;code&gt;traverse_tree&lt;/code&gt; function recursively visits each node in the tree, printing its value.&lt;/p&gt;
&lt;p&gt;This is a basic example of a tree traversal, and there are many variations and optimizations depending on the specific use case.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That generation ran at an impressive &lt;a href="https://gist.github.com/simonw/85ce6a0faf1a58a243f9e5f4b2daeb3d?permalink_comment_id=5440701#gistcomment-5440701"&gt;152 tokens per second&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;That command was a bit of a mouthful, so let's assign an alias to the model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm aliases &lt;span class="pl-c1"&gt;set&lt;/span&gt; l32 mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can use that shorter alias instead:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;a joke about a haggis buying a car&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(The &lt;a href="https://gist.github.com/simonw/af322b399ce448f766d9349d9dc1cb5d#response"&gt;joke&lt;/a&gt; isn't very good.)&lt;/p&gt;
&lt;p&gt;As with other models supported by LLM, you can also pipe things to it. Here's how to get it to explain a piece of Python code (in this case &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/llm_mlx.py"&gt;itself&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;cat llm_mlx.py &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;explain this code&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The response started like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This code is a part of a larger project that uses the Hugging Face Transformers library to create a text-to-text conversational AI model. The code defines a custom model class &lt;code&gt;MlxModel&lt;/code&gt; and a set of command-line interface (CLI) commands for working with MLX models. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/a577c820b18591d4825b70679cee3f9c"&gt;the rest of the response&lt;/a&gt;. I'm pretty amazed at how well it did for a tiny 1.8GB model!&lt;/p&gt;

&lt;p&gt;This plugin can only run models that have been converted to work with Apple's MLX framework. Thankfully the &lt;a href="https://huggingface.co/mlx-community"&gt;mlx-community&lt;/a&gt; organization on Hugging Face has published over 1,000 of these. A few I've tried successfully:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Qwen2.5-0.5B-Instruct-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit"&gt;278 MB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Mistral-7B-Instruct-v0.3-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit"&gt;4.08 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Mistral-Small-24B-Instruct-2501-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit"&gt;13.26 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit"&gt;18.5 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit"&gt;40 GB&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mistral-Small-24B really does feel GPT-4 quality despite only needing around 12GB of RAM to run - so it's a good default model if you want to leave space to run other apps.&lt;/p&gt;

&lt;p&gt;DeepSeek-R1-Distill-Qwen-32B-4bit is a great model if you want to try out R1 style "reasoning" which outputs a &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;...&lt;/code&gt; block at the start of each response.&lt;/p&gt;

&lt;p&gt;That last Llama 70B one needs a machine with 64GB of RAM to run but is &lt;em&gt;very&lt;/em&gt; capable - I wrote about trying out the Ollama version of Llama 3.3 70B in &lt;a href="https://simonwillison.net/2024/Dec/9/llama-33-70b/"&gt;I can now run a GPT-4 class model on my laptop&lt;/a&gt; back in December.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm mlx download-model mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/mlx-download-llama-33.jpg" alt="Screenshot of the download-model command running against the Llama 3.3 70B model, it shows fetching 13 files and displays progress bars for 8 different safetensors files each weighing between 2.6 and 3.3GB." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;For larger models like that I recommend using the &lt;code&gt;llm chat&lt;/code&gt; command as that ensures they stay in memory for multiple rounds of the conversation:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm chat -m mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Activity Monitor shows the process using 37.5GB of RAM while the model is running.&lt;/p&gt;
&lt;p&gt;Some of these jokes are actually pretty good!&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;Chatting with mlx-community/Llama-3.3-70B-Instruct-4bit
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&amp;gt; hi
How's your day going so far? Is there something I can help you with or would you like to chat?
&amp;gt; jokes about weird animals please
Here are a few:

1. Why did the axolotl go to therapy? Because it was struggling to come to terms with its identity crisis... and also because it had a lot of "regrowing" to do!
2. What did the aye-aye say to the other aye-aye when they were out on a date? "You've got a finger on the pulse of my heart!"
3. Why did the naked mole rat go to the party? Because it was a "hair-raising" experience and it wanted to get to the "root" of the fun!
4. What do you call a group of flamingos playing instruments? The Pink Floyd... of the avian world, of course!
5. Why did the blobfish go to the doctor? It had a "gut" feeling something was wrong... and also because it was feeling a little "deflated"!

Want more? I've got a whole zoo of 'em!
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;That ran at &lt;a href="https://gist.github.com/simonw/ec99b6cf65ae61256a34f3ac269951dc"&gt;8.8 tokens per second&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As with every LLM plugin, all prompts and responses are logged to &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;a SQLite database&lt;/a&gt; that can be accessed using the &lt;code&gt;llm logs&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;You can use LLM models from your Python code as well. The README includes &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2.1/README.md#using-models-from-python"&gt;notes on how to do this&lt;/a&gt;, as does the &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;LLM Python API documentation&lt;/a&gt;. TLDR:&lt;/p&gt;

&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;
&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_model&lt;/span&gt;(&lt;span class="pl-s"&gt;"mlx-community/Llama-3.2-3B-Instruct-4bit"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-c1"&gt;prompt&lt;/span&gt;(&lt;span class="pl-s"&gt;"hi"&lt;/span&gt;).&lt;span class="pl-c1"&gt;text&lt;/span&gt;())&lt;/pre&gt;

&lt;h4 id="mlx-and-mlx-lm"&gt;MLX and mlx-lm&lt;/h4&gt;
&lt;p&gt;The secret sauce behind this new plugin is Apple's &lt;a href="https://github.com/ml-explore/mlx"&gt;MLX library&lt;/a&gt;, which has been improving at an extraordinary pace over the past year - it first launched on GitHub in November 2023.&lt;/p&gt;
&lt;p&gt;The core library is C++, but the MLX team at Apple also maintains an &lt;a href="https://github.com/ml-explore/mlx-examples"&gt;mlx-examples&lt;/a&gt; repo with a wealth of Python examples, including the &lt;a href="https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md"&gt;mlx-lm&lt;/a&gt; and &lt;a href="https://github.com/ml-explore/mlx-examples/tree/main/whisper"&gt;mlx-whisper&lt;/a&gt; packages.&lt;/p&gt;
&lt;p&gt;The community around it is growing too. I'm particularly impressed by Prince Canuma's &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; library for accessing multi-modal vision LLMs - I'd love to get that working as an LLM plugin once I figure out &lt;a href="https://github.com/Blaizzy/mlx-vlm/issues/68"&gt;how to use it for conversations&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've used MLX for &lt;a href="https://simonwillison.net/tags/mlx/"&gt;a few experiments&lt;/a&gt; in the past, but &lt;a href="https://twitter.com/awnihannun/status/1890524526821126620"&gt;this tweet&lt;/a&gt; from MLX core developer Awni Hannun finally convinced me to wrap it up as an LLM plugin:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the latest MLX small LLMs are a lot faster.&lt;/p&gt;
&lt;p&gt;On M4 Max 4-bit Qwen 0.5B generates 1k tokens at a whopping 510 toks/sec. And runs at over 150 tok/sec on iPhone 16 pro.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is &lt;em&gt;really&lt;/em&gt; good software. This small team at Apple appears to be almost single-handedly giving NVIDIA's CUDA a run for its money!&lt;/p&gt;
&lt;h4 id="building-the-plugin"&gt;Building the plugin&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; plugin came together pretty quickly. The &lt;a href="https://github.com/simonw/llm-mlx/blob/0.1/llm_mlx.py"&gt;first version&lt;/a&gt; was ~100 lines of Python, much of it repurposed from my existing &lt;a href="https://github.com/simonw/llm-gguf"&gt;llm-gguf&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;The hardest problem was figuring out how to hide the Hugging Face progress bars that displayed every time it attempted to access a model!&lt;/p&gt;
&lt;p&gt;I eventually found the &lt;code&gt;from huggingface_hub.utils import disable_progress_bars&lt;/code&gt; utility function by piping library code through Gemini 2.0.&lt;/p&gt;
&lt;p&gt;I then added &lt;a href="https://github.com/simonw/llm-mlx/issues/3"&gt;model options support&lt;/a&gt; allowing you to pass options like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;a greeting&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -o temperature 1.0 -o seed 2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;So far using a fixed seed &lt;a href="https://github.com/simonw/llm-mlx/issues/3#issuecomment-2661025591"&gt;appears to produce repeatable results&lt;/a&gt;, which is exciting for writing more predictable tests in the future.&lt;/p&gt;

&lt;p&gt;For the &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/tests/test_mlx.py"&gt;automated tests&lt;/a&gt; that run in GitHub Actions I decided to use a small model - I went with the tiny 75MB &lt;a href="https://huggingface.co/mlx-community/SmolLM-135M-Instruct-4bit/tree/main"&gt;mlx-community/SmolLM-135M-Instruct-4bit&lt;/a&gt; (explored &lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/"&gt;previously&lt;/a&gt;). I configured GitHub Actions to cache the model in between CI runs by adding the following YAML to my &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/.github/workflows/test.yml"&gt;.github/workflows/test.yml&lt;/a&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Cache models&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v4&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/huggingface&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-huggingface-&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apple"&gt;apple&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apple"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="github-actions"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>llm-sort</title><link href="https://simonwillison.net/2025/Feb/11/llm-sort/#atom-tag" rel="alternate"/><published>2025-02-11T20:50:21+00:00</published><updated>2025-02-11T20:50:21+00:00</updated><id>https://simonwillison.net/2025/Feb/11/llm-sort/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/vagos/llm-sort"&gt;llm-sort&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Delightful &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; plugin by Evangelos Lamprou which adds the ability to perform "semantic sorting" - sorting the contents of a file using a prompt against an LLM to determine the sort order.&lt;/p&gt;
&lt;p&gt;Best illustrated by these examples from the README:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm sort --query "Which names is more suitable for a pet monkey?" names.txt

cat titles.txt | llm sort --query "Which book should I read to cook better?"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It works using this pairwise prompt, which is executed multiple times using Python's &lt;code&gt;sorted(documents, key=functools.cmp_to_key(compare_callback))&lt;/code&gt; mechanism:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Given the query:
{query}

Compare the following two lines:

Line A:
{docA}

Line B:
{docB}

Which line is more relevant to the query? Please answer with "Line A" or "Line B".
&lt;/code&gt;&lt;/pre&gt;
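&lt;p&gt;The sorting itself is plain Python comparator machinery. Here's a runnable sketch of the pattern - the LLM call is replaced by a deterministic fake (a length heuristic) so the example stands on its own; the real plugin sends the pairwise prompt above and parses the "Line A"/"Line B" answer:&lt;/p&gt;

```python
import functools


def compare_callback(doc_a, doc_b):
    # Hypothetical stand-in for llm-sort's LLM call: pretend the model
    # judged the longer line more relevant to the query.
    answer = "Line A" if len(doc_a) - len(doc_b) > 0 else "Line B"
    # "Line A" means doc_a ranks higher, i.e. should sort first.
    return -1 if answer == "Line A" else 1


documents = ["short", "a much longer line", "medium line"]
ranked = sorted(documents, key=functools.cmp_to_key(compare_callback))
print(ranked)  # longest line first, per our fake judgement
```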
&lt;p&gt;From &lt;a href="https://lobste.rs/s/yxlisx/llm_sort_sort_input_lines_semantically#c_enduz7"&gt;the lobste.rs comments&lt;/a&gt;, Cole Kurashige:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I'm not saying I'm prescient, but in The Before Times &lt;a href="https://github.com/cole-k/turksort"&gt;I did something similar&lt;/a&gt; with Mechanical Turk&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This made me realize that &lt;em&gt;so many&lt;/em&gt; of the patterns we were using against Mechanical Turk a decade+ ago can provide hints about potential ways to apply LLMs.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/yxlisx/llm_sort_sort_input_lines_semantically"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mechanical-turk"&gt;mechanical-turk&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="mechanical-turk"/><category term="plugins"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>Using pip to install a Large Language Model that's under 100MB</title><link href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag" rel="alternate"/><published>2025-02-07T06:34:59+00:00</published><updated>2025-02-07T06:34:59+00:00</updated><id>https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag</id><summary type="html">
    &lt;p&gt;I just released &lt;a href="https://github.com/simonw/llm-smollm2"&gt;llm-smollm2&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; that bundles a quantized copy of the &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt; LLM inside of the Python package.&lt;/p&gt;
&lt;p&gt;This means you can now &lt;code&gt;pip install&lt;/code&gt; a full LLM!&lt;/p&gt;

&lt;p&gt;If you're already using &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; you can install it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-smollm2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then run prompts like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m SmolLM2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Are dogs real?&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(New favourite test prompt for tiny models, courtesy of &lt;a href="https://bsky.app/profile/timfduffy.com/post/3lhknvwijps2o"&gt;Tim Duffy&lt;/a&gt;. &lt;a href="https://gist.github.com/simonw/a71d83cd8c6caa49cec4a9f296c267df"&gt;Here's the result&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;If you don't have LLM yet, first &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;follow these installation instructions&lt;/a&gt;, or use &lt;code&gt;brew install llm&lt;/code&gt;, &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;uv tool install llm&lt;/code&gt;, depending on your preferred way of getting your Python tools.&lt;/p&gt;
&lt;p&gt;If you have &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; set up you don't need to install anything at all! The following command will spin up an ephemeral environment, install the necessary packages and start a chat session with the model, all in one go:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --with llm-smollm2 llm chat -m SmolLM2&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/smol.gif" alt="$ uvx --with llm-smollm2 llm chat -m SmolLM2
Chatting with SmolLM2
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&gt; tell me a long story about a dog
The story of a dog is a tale of love, loss, and the unspoken rules of the universe. It begins with a small, timid, and timid dog named Max. Max was born into a loving family, but his life was forever changed by a cruel and cruel owner. He was left to fend for himself, and he was left to fend for himself.

Max's owner, a kind and gentle soul named Elena, was a cruel and cruel one. She had no patience, no love, and she would do anything to keep Max away from her. Max was left to fend for himself, and he was left to fend for himself.

Max's owner, Elena, was determined to make Max happy. She took Max to a beautiful and beautiful place, where Max was allowed to roam freely. Max was allowed to roam freely, and Max was allowed to roam freely. [Then repeats that sentence many times]" style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#finding-a-tiny-model"&gt;Finding a tiny model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#building-the-plugin"&gt;Building the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#packaging-the-plugin"&gt;Packaging the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#publishing-to-pypi"&gt;Publishing to PyPI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#is-the-model-any-good-"&gt;Is the model any good?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="finding-a-tiny-model"&gt;Finding a tiny model&lt;/h4&gt;
&lt;p&gt;The fact that the model is almost exactly 100MB is no coincidence: that's the &lt;a href="https://pypi.org/help/#file-size-limit"&gt;default size limit&lt;/a&gt; for a Python package that can be uploaded to the Python Package Index (PyPI).&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://bsky.app/profile/simonwillison.net/post/3lhklqd62jc2x"&gt;asked on Bluesky&lt;/a&gt; if anyone had seen a just-about-usable GGUF model that was under 100MB, and Artisan Loaf &lt;a href="https://bsky.app/profile/artisanloaf.bsky.social/post/3lhklumfhvs2r"&gt;pointed me&lt;/a&gt; to &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ended up using &lt;a href="https://huggingface.co/QuantFactory/SmolLM2-135M-Instruct-GGUF/tree/main"&gt;this quantization&lt;/a&gt; by &lt;a href="https://huggingface.co/QuantFactory"&gt;QuantFactory&lt;/a&gt; just because it was the first sub-100MB model I tried that worked.&lt;/p&gt;
&lt;p&gt;Trick for finding quantized models: Hugging Face has a neat "model tree" feature in the side panel of their model pages, which includes links to relevant quantized models. I find most of my GGUFs using that feature.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/hugging-face-model-tree.jpg" alt="Model tree for HuggingFaceTB/SmolLM2-135M-Instruct. 60 Quantizations, 6 adapters, 80 finetunes, 1 merge." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="building-the-plugin"&gt;Building the plugin&lt;/h4&gt;
&lt;p&gt;I first tried the model out using Python and the &lt;a href="https://github.com/abetlen/llama-cpp-python"&gt;llama-cpp-python&lt;/a&gt; library like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with llama-cpp-python python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;llama_cpp&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Llama&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt;
&lt;span class="pl-s1"&gt;llm&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"SmolLM2-135M-Instruct.Q4_1.gguf"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;output&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;create_chat_completion&lt;/span&gt;(&lt;span class="pl-s1"&gt;messages&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
    {&lt;span class="pl-s"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;"user"&lt;/span&gt;, &lt;span class="pl-s"&gt;"content"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Hi"&lt;/span&gt;}
])
&lt;span class="pl-en"&gt;pprint&lt;/span&gt;(&lt;span class="pl-s1"&gt;output&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;This gave me the output I was expecting:&lt;/p&gt;
&lt;pre&gt;{&lt;span class="pl-s"&gt;'choices'&lt;/span&gt;: [{&lt;span class="pl-s"&gt;'finish_reason'&lt;/span&gt;: &lt;span class="pl-s"&gt;'stop'&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'index'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'logprobs'&lt;/span&gt;: &lt;span class="pl-c1"&gt;None&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'message'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'content'&lt;/span&gt;: &lt;span class="pl-s"&gt;'Hello! How can I assist you today?'&lt;/span&gt;,
                          &lt;span class="pl-s"&gt;'role'&lt;/span&gt;: &lt;span class="pl-s"&gt;'assistant'&lt;/span&gt;}}],
 &lt;span class="pl-s"&gt;'created'&lt;/span&gt;: &lt;span class="pl-c1"&gt;1738903256&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'id'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chatcmpl-76ea1733-cc2f-46d4-9939-90efa2a05e7c'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'model'&lt;/span&gt;: &lt;span class="pl-s"&gt;'SmolLM2-135M-Instruct.Q4_1.gguf'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'object'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chat.completion'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'usage'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'completion_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;9&lt;/span&gt;, &lt;span class="pl-s"&gt;'prompt_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;31&lt;/span&gt;, &lt;span class="pl-s"&gt;'total_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;40&lt;/span&gt;}}&lt;/pre&gt;
&lt;p&gt;But it also &lt;em&gt;spammed&lt;/em&gt; my terminal with a huge volume of debugging output - which started like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llama_model_load_from_file_impl: using device Metal (Apple M2 Max) - 49151 MiB free
llama_model_loader: loaded meta data with 33 key-value pairs and 272 tensors from SmolLM2-135M-Instruct.Q4_1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then continued for more than &lt;a href="https://gist.github.com/simonw/9ef7acd836b1cc40c14686eae4dca340"&gt;500 lines&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;I've had this problem with &lt;code&gt;llama-cpp-python&lt;/code&gt; and &lt;code&gt;llama.cpp&lt;/code&gt; in the past, and was sad to find that the documentation still doesn't have a great answer for how to avoid this.&lt;/p&gt;
&lt;p&gt;So I turned to the just released &lt;a href="https://simonwillison.net/2025/Feb/5/gemini-2/"&gt;Gemini 2.0 Pro (Experimental)&lt;/a&gt;, because I know it's a strong model with a long input limit.&lt;/p&gt;
&lt;p&gt;I ran the entire &lt;code&gt;llama-cpp-python&lt;/code&gt; codebase through it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
git clone https://github.com/abetlen/llama-cpp-python
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; llama-cpp-python
files-to-prompt -e py &lt;span class="pl-c1"&gt;.&lt;/span&gt; -c &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m gemini-2.0-pro-exp-02-05 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;How can I prevent this library from logging any information at all while it is running - no stderr or anything like that&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/20476c2c6f7604df2994212cebfafef4#response"&gt;the answer I got back&lt;/a&gt;. It recommended setting the logger to &lt;code&gt;logging.CRITICAL&lt;/code&gt;, passing &lt;code&gt;verbose=False&lt;/code&gt; to the constructor and, most importantly, using the following context manager to suppress all output:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;contextlib&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stderr&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stdout&lt;/span&gt;

&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;    Suppresses all stdout and stderr output within the context.&lt;/span&gt;
&lt;span class="pl-s"&gt;    """&lt;/span&gt;
    &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;open&lt;/span&gt;(&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;devnull&lt;/span&gt;, &lt;span class="pl-s"&gt;"w"&lt;/span&gt;) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;devnull&lt;/span&gt;:
        &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;redirect_stdout&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;), &lt;span class="pl-en"&gt;redirect_stderr&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;):
            &lt;span class="pl-k"&gt;yield&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;This worked! It turned out most of the output came from initializing the &lt;code&gt;LLM&lt;/code&gt; class, so I wrapped that like so:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;model_path&lt;/span&gt;, &lt;span class="pl-s1"&gt;verbose&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Proof of concept in hand, I set about writing the plugin. I started with my &lt;a href="https://github.com/simonw/llm-plugin"&gt;simonw/llm-plugin&lt;/a&gt; cookiecutter template:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx cookiecutter gh:simonw/llm-plugin&lt;/pre&gt;&lt;/div&gt;
&lt;pre&gt;&lt;code&gt;  [1/6] plugin_name (): smollm2
  [2/6] description (): SmolLM2-135M-Instruct.Q4_1 for LLM
  [3/6] hyphenated (smollm2): 
  [4/6] underscored (smollm2): 
  [5/6] github_username (): simonw
  [6/6] author_name (): Simon Willison
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-smollm2/blob/0.1.1/llm_smollm2/__init__.py"&gt;rest of the plugin&lt;/a&gt; was mostly borrowed from my existing &lt;a href="https://github.com/simonw/llm-gguf/blob/0.2/llm_gguf.py"&gt;llm-gguf&lt;/a&gt; plugin, updated based on the latest README for the &lt;code&gt;llama-cpp-python&lt;/code&gt; project.&lt;/p&gt;
&lt;p&gt;There's more information on building plugins in &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;the tutorial on writing a plugin&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="packaging-the-plugin"&gt;Packaging the plugin&lt;/h4&gt;
&lt;p&gt;Once I had that working the last step was to figure out how to package it for PyPI. I'm never quite sure of the best way to bundle a binary file in a Python package, especially one that uses a &lt;code&gt;pyproject.toml&lt;/code&gt; file... so I dumped a copy of my existing &lt;code&gt;pyproject.toml&lt;/code&gt; file into o3-mini-high and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Modify this to bundle a SmolLM2-135M-Instruct.Q4_1.gguf file inside the package. I don't want to use hatch or a manifest or anything, I just want to use setuptools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://chatgpt.com/share/67a59122-67c8-8006-9be4-29f8419343ad"&gt;the shared transcript&lt;/a&gt; - it gave me exactly what I wanted. I bundled it by adding this to the end of the &lt;code&gt;toml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;[&lt;span class="pl-en"&gt;tool&lt;/span&gt;.&lt;span class="pl-en"&gt;setuptools&lt;/span&gt;.&lt;span class="pl-en"&gt;package-data&lt;/span&gt;]
&lt;span class="pl-smi"&gt;llm_smollm2&lt;/span&gt; = [&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;SmolLM2-135M-Instruct.Q4_1.gguf&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I dropped that &lt;code&gt;.gguf&lt;/code&gt; file into the &lt;code&gt;llm_smollm2/&lt;/code&gt; directory and put my plugin code in &lt;code&gt;llm_smollm2/__init__.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I tested it locally by running this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;python -m pip install build
python -m build&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I fired up a fresh virtual environment and ran &lt;code&gt;pip install ../path/to/llm-smollm2/dist/llm_smollm2-0.1-py3-none-any.whl&lt;/code&gt; to confirm that the package worked as expected.&lt;/p&gt;
&lt;h4 id="publishing-to-pypi"&gt;Publishing to PyPI&lt;/h4&gt;
&lt;p&gt;My cookiecutter template comes with &lt;a href="https://github.com/simonw/llm-smollm2/blob/main/.github/workflows/publish.yml"&gt;a GitHub Actions workflow&lt;/a&gt; that publishes the package to PyPI when a new release is created using the GitHub web interface. Here's the relevant YAML:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;  &lt;span class="pl-ent"&gt;deploy&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;needs&lt;/span&gt;: &lt;span class="pl-s"&gt;[test]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;environment&lt;/span&gt;: &lt;span class="pl-s"&gt;release&lt;/span&gt;
    &lt;span class="pl-ent"&gt;permissions&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;id-token&lt;/span&gt;: &lt;span class="pl-s"&gt;write&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v4&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v5&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3.13&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache&lt;/span&gt;: &lt;span class="pl-s"&gt;pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache-dependency-path&lt;/span&gt;: &lt;span class="pl-s"&gt;pyproject.toml&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install setuptools wheel build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Build&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        python -m build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Publish&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;pypa/gh-action-pypi-publish@release/v1&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This runs after the &lt;code&gt;test&lt;/code&gt; job has passed. It uses the &lt;a href="https://github.com/pypa/gh-action-pypi-publish"&gt;pypa/gh-action-pypi-publish&lt;/a&gt; Action to publish to PyPI - I wrote more about how that works &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="is-the-model-any-good-"&gt;Is the model any good?&lt;/h4&gt;
&lt;p&gt;This one really isn't! It's not surprising, but it turns out 94MB just isn't enough space for a model that can do anything useful.&lt;/p&gt;
&lt;p&gt;It's &lt;em&gt;super&lt;/em&gt; fun to play with, and I continue to maintain that small, weak models are a great way to help build a mental model of how this technology actually works.&lt;/p&gt;
&lt;p&gt;That's not to say SmolLM2 isn't a fantastic model family. I'm running the smallest, most restricted version here. &lt;a href="https://huggingface.co/blog/smollm"&gt;SmolLM - blazingly fast and remarkably powerful&lt;/a&gt; describes the full model family - which comes in 135M, 360M, and 1.7B sizes. The larger versions are a whole lot more capable.&lt;/p&gt;
&lt;p&gt;If anyone can figure out something genuinely useful to do with the 94MB version I'd love to hear about it.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pip"/><category term="plugins"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="ai"/><category term="github-actions"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="gemini"/><category term="uv"/><category term="smollm"/><category term="o3"/><category term="llama-cpp"/></entry><entry><title>llm-anthropic</title><link href="https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-tag" rel="alternate"/><published>2025-02-02T06:17:53+00:00</published><updated>2025-02-02T06:17:53+00:00</updated><id>https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-anthropic"&gt;llm-anthropic&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've renamed my &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3&lt;/a&gt; plugin to &lt;code&gt;llm-anthropic&lt;/code&gt;, on the basis that Claude 4 will probably happen at some point, so this is a better name for the plugin.&lt;/p&gt;
&lt;p&gt;If you're a previous user of &lt;code&gt;llm-claude-3&lt;/code&gt; you can upgrade to the new plugin like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install -U llm-claude-3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should remove the old plugin and install the new one, because the latest &lt;code&gt;llm-claude-3&lt;/code&gt; depends on &lt;code&gt;llm-anthropic&lt;/code&gt;. Just installing &lt;code&gt;llm-anthropic&lt;/code&gt; may leave you with both plugins installed at once.&lt;/p&gt;
&lt;p&gt;There is one extra manual step you'll need to take during this upgrade: creating a new &lt;code&gt;anthropic&lt;/code&gt; stored key with the same API token you previously stored under &lt;code&gt;claude&lt;/code&gt;. You can do that like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm keys set anthropic --value "$(llm keys get claude)"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I released &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.12"&gt;llm-anthropic 0.12&lt;/a&gt; yesterday with new features not previously included in &lt;code&gt;llm-claude-3&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Support for Claude's &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response"&gt;prefill&lt;/a&gt; feature, using the new &lt;code&gt;-o prefill '{'&lt;/code&gt; option and the accompanying &lt;code&gt;-o hide_prefill 1&lt;/code&gt; option to prevent the prefill from being included in the output text. &lt;a href="https://github.com/simonw/llm-anthropic/issues/2"&gt;#2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;-o stop_sequences '```'&lt;/code&gt; option for specifying one or more stop sequences. To specify multiple stop sequences pass a JSON array of strings: &lt;code&gt;-o stop_sequences '["end", "stop"]'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Model options are now documented in the README.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you install or upgrade &lt;code&gt;llm-claude-3&lt;/code&gt; you will now get &lt;code&gt;llm-anthropic&lt;/code&gt; instead, thanks to a tiny package on PyPI which depends on the new plugin name. I created that with my &lt;a href="https://github.com/simonw/pypi-rename"&gt;pypi-rename&lt;/a&gt; cookiecutter template.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://github.com/simonw/llm-claude-3/issues/31"&gt;issue for the rename&lt;/a&gt;. I archived the &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3 repository on GitHub&lt;/a&gt;, and got to use the brand new &lt;a href="https://simonwillison.net/2025/Jan/30/pypi-now-supports-project-archival/"&gt;PyPI archiving feature&lt;/a&gt; to archive the &lt;a href="https://pypi.org/project/llm-claude-3/"&gt;llm-claude-3 project on PyPI&lt;/a&gt; as well.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="pypi"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="anthropic"/><category term="claude"/></entry><entry><title>llm-openrouter 0.3</title><link href="https://simonwillison.net/2024/Dec/8/llm-openrouter-03/#atom-tag" rel="alternate"/><published>2024-12-08T23:56:14+00:00</published><updated>2024-12-08T23:56:14+00:00</updated><id>https://simonwillison.net/2024/Dec/8/llm-openrouter-03/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-openrouter/releases/tag/0.3"&gt;llm-openrouter 0.3&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New release of my &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin, which allows &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; to access models hosted by &lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Quoting the release notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Enable image attachments for models that support images. Thanks, &lt;a href="https://github.com/montasaurus"&gt;Adam Montgomery&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/12"&gt;#12&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Provide async model access. &lt;a href="https://github.com/simonw/llm-openrouter/issues/15"&gt;#15&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fix documentation to list correct &lt;code&gt;LLM_OPENROUTER_KEY&lt;/code&gt; environment variable. &lt;a href="https://github.com/simonw/llm-openrouter/issues/10"&gt;#10&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="releases"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="openrouter"/></entry><entry><title>datasette-enrichments-llm</title><link href="https://simonwillison.net/2024/Dec/5/datasette-enrichments-llm/#atom-tag" rel="alternate"/><published>2024-12-05T23:46:48+00:00</published><updated>2024-12-05T23:46:48+00:00</updated><id>https://simonwillison.net/2024/Dec/5/datasette-enrichments-llm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-llm"&gt;datasette-enrichments-llm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Today's new alpha release is &lt;strong&gt;datasette-enrichments-llm&lt;/strong&gt;, a plugin for Datasette 1.0a+ that provides an &lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;enrichment&lt;/a&gt; that lets you run prompts against data from one or more columns and store the result in another column.&lt;/p&gt;
&lt;p&gt;So far it's a light re-implementation of the existing &lt;a href="https://github.com/datasette/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt; plugin, now using the new &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-listing-models"&gt;llm.get_async_models()&lt;/a&gt; method to allow users to select any async-enabled model that has been registered by a plugin - so currently any of the models from OpenAI, Anthropic, Gemini or Mistral via their &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html#remote-apis"&gt;respective plugins&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Still plenty to do on this one. Next step is to integrate it with &lt;a href="https://simonwillison.net/2024/Dec/2/datasette-llm-usage/"&gt;datasette-llm-usage&lt;/a&gt; and use it to drive a design-complete stable version of that.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="releases"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="enrichments"/></entry><entry><title>datasette-queries</title><link href="https://simonwillison.net/2024/Dec/3/datasette-queries/#atom-tag" rel="alternate"/><published>2024-12-03T23:59:26+00:00</published><updated>2024-12-03T23:59:26+00:00</updated><id>https://simonwillison.net/2024/Dec/3/datasette-queries/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-queries"&gt;datasette-queries&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I released the first alpha of a new plugin to replace the crusty old &lt;a href="https://github.com/simonw/datasette-saved-queries"&gt;datasette-saved-queries&lt;/a&gt;. This one adds a new UI element to the top of the query results page with an expandable form for saving the query as a new &lt;a href="https://docs.datasette.io/en/stable/sql_queries.html#canned-queries"&gt;canned query&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animated demo. I start on the table page, run a search, click View and edit SQL, then on the SQL query page open a Save query dialog, click a Suggest title and description button, wait for that to suggest something and click save." src="https://static.simonwillison.net/static/2024/datasette-queries.gif" /&gt;&lt;/p&gt;
&lt;p&gt;It's my first plugin to depend on LLM and &lt;a href="https://simonwillison.net/2024/Dec/2/datasette-llm-usage/"&gt;datasette-llm-usage&lt;/a&gt; - it uses GPT-4o mini to power an optional "Suggest title and description" button, labeled with the becoming-standard ✨ sparkles emoji to indicate an LLM-powered feature.&lt;/p&gt;
&lt;p&gt;I intend to expand this to work across multiple models as I continue to iterate on &lt;code&gt;datasette-llm-usage&lt;/code&gt; to better support those kinds of patterns.&lt;/p&gt;
&lt;p&gt;For the moment though each suggested title and description call costs about 250 input tokens and 50 output tokens, which against GPT-4o mini adds up to 0.0067 cents.&lt;/p&gt;
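&lt;p&gt;Rough arithmetic behind that figure, assuming GPT-4o mini's list prices of $0.15 (15 cents) per million input tokens and $0.60 (60 cents) per million output tokens:&lt;/p&gt;

```python
# GPT-4o mini list prices (assumed): 15 cents per million input tokens,
# 60 cents per million output tokens.
total_cents = (250 * 15 + 50 * 60) / 1_000_000
print(total_cents)  # 0.00675 - roughly 0.0067 cents per call
```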


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="releases"/><category term="ai"/><category term="datasette"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>datasette-llm-usage</title><link href="https://simonwillison.net/2024/Dec/2/datasette-llm-usage/#atom-tag" rel="alternate"/><published>2024-12-02T21:33:05+00:00</published><updated>2024-12-02T21:33:05+00:00</updated><id>https://simonwillison.net/2024/Dec/2/datasette-llm-usage/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-llm-usage"&gt;datasette-llm-usage&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I released the first alpha of a Datasette plugin to help track LLM usage by other plugins, with the goal of supporting token allowances - both for things like free public apps that stop working after a daily allowance, and for free previews of AI features in paid-account-based projects such as Datasette Cloud.&lt;/p&gt;
&lt;p&gt;It's using the usage features I added in &lt;a href="https://simonwillison.net/2024/Dec/1/llm-019/"&gt;LLM 0.19&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The alpha doesn't do much yet - it will start getting interesting once I upgrade other plugins to depend on it.&lt;/p&gt;
&lt;p&gt;Design notes so far in &lt;a href="https://github.com/datasette/datasette-llm-usage/issues/1"&gt;issue #1&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="releases"/><category term="ai"/><category term="datasette"/><category term="datasette-cloud"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>Ask questions of SQLite databases and CSV/JSON files in your terminal</title><link href="https://simonwillison.net/2024/Nov/25/ask-questions-of-sqlite/#atom-tag" rel="alternate"/><published>2024-11-25T01:33:03+00:00</published><updated>2024-11-25T01:33:03+00:00</updated><id>https://simonwillison.net/2024/Nov/25/ask-questions-of-sqlite/#atom-tag</id><summary type="html">
    &lt;p&gt;I built a new plugin for my &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html"&gt;sqlite-utils CLI tool&lt;/a&gt; that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.&lt;/p&gt;
&lt;p&gt;It's called &lt;a href="https://github.com/simonw/sqlite-utils-ask"&gt;sqlite-utils-ask&lt;/a&gt;. Here's how you install it:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils install sqlite-utils-ask&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It picks up API keys from an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable, or you can &lt;a href="https://llm.datasette.io/"&gt;install LLM&lt;/a&gt; and use &lt;a href="https://llm.datasette.io/en/stable/setup.html#saving-and-using-stored-keys"&gt;llm keys set openai&lt;/a&gt; to store a key in a configuration file.&lt;/p&gt;
&lt;p&gt;Then you can use it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -O https://datasette.io/content.db
sqlite-utils ask content.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;how many sqlite-utils pypi downloads in 2024?&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command will extract the SQL schema for the provided database file, send that through an LLM along with your question, get back a SQL query and attempt to run it to derive a result.&lt;/p&gt;
&lt;p&gt;If all goes well it spits out an answer something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT SUM(downloads)
FROM stats
WHERE package = 'sqlite-utils' AND date &amp;gt;= '2024-01-01' AND date &amp;lt; '2025-01-01';

[
    {
        "SUM(downloads)": 4300221
    }
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the SQL query fails to execute (due to a syntax error of some kind) it passes that error back to the model for corrections and retries up to three times before giving up.&lt;/p&gt;
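&lt;p&gt;The generate-run-retry loop can be sketched in a few lines. This is my illustration rather than the plugin's actual code, with &lt;code&gt;ask_model&lt;/code&gt; standing in for the real LLM call:&lt;/p&gt;

```python
import sqlite3

def answer(db, question, ask_model, retries=3):
    # Ask the model for SQL, try to execute it, and on failure feed the
    # error message back in for a corrected attempt (up to `retries` tries).
    prompt = question
    for _ in range(retries):
        sql = ask_model(prompt)
        try:
            return sql, db.execute(sql).fetchall()
        except sqlite3.OperationalError as ex:
            prompt = f"{question}\nThat query failed with: {ex}\nTry again."
    raise RuntimeError(f"Gave up after {retries} attempts")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stats (package TEXT, downloads INTEGER)")
db.execute("INSERT INTO stats VALUES ('sqlite-utils', 4300221)")

# Simulate a model that gets the syntax wrong once, then corrects itself:
attempts = iter(["SELCT oops", "SELECT SUM(downloads) FROM stats"])
sql, rows = answer(db, "how many downloads?", lambda prompt: next(attempts))
print(rows)  # [(4300221,)]
```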
&lt;p&gt;Add &lt;code&gt;-v/--verbose&lt;/code&gt; to see the exact prompt it's using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;System prompt:
You will be given a SQLite schema followed by a question. Generate a single SQL
query to answer that question. Return that query in a ```sql ... ```
fenced code block.

Example: How many repos are there?
Answer:
```sql
select count(*) from repos
```

Prompt:
...
CREATE TABLE [stats] (
   [package] TEXT,
   [date] TEXT,
   [downloads] INTEGER,
   PRIMARY KEY ([package], [date])
);
...
how many sqlite-utils pypi downloads in 2024?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I've truncated the above to just the relevant table - it actually includes the full schema of every table in that database.&lt;/p&gt;
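&lt;p&gt;Since the system prompt asks for the answer inside a fenced SQL code block, getting the query back out is a small parsing step. A minimal sketch of that idea - my illustration, not the plugin's actual implementation:&lt;/p&gt;

```python
import re

FENCE = "`" * 3  # triple backtick

def extract_sql(text):
    # Pull the SQL out of a fenced sql code block in the model's reply,
    # falling back to the whole reply if no fence is found.
    match = re.search(FENCE + r"sql\s*(.*?)" + FENCE, text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

reply = f"Here you go:\n{FENCE}sql\nselect count(*) from repos\n{FENCE}\n"
print(extract_sql(reply))  # select count(*) from repos
```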
&lt;p&gt;By default, the tool sends just that database schema and your question to the LLM. If you add the &lt;code&gt;-e/--examples&lt;/code&gt; option it will also include five common values for each of the text columns in that schema with an average length of less than 32 characters. This can sometimes help get a better result - for example, sending the values "CA", "FL" and "TX" for a &lt;code&gt;state&lt;/code&gt; column can tip the model off that it should use state abbreviations rather than full names in its queries.&lt;/p&gt;
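&lt;p&gt;The selection logic behind &lt;code&gt;-e/--examples&lt;/code&gt; can be approximated like this - a sketch of the idea, not the plugin's actual code, with &lt;code&gt;common_values()&lt;/code&gt; as a hypothetical helper:&lt;/p&gt;

```python
import sqlite3

def common_values(conn, table, limit=5, max_avg_length=32):
    # For each TEXT column whose values average under max_avg_length
    # characters, collect its most common values.
    examples = {}
    for cid, name, ctype, *_ in conn.execute(f"PRAGMA table_info([{table}])"):
        if ctype.upper() != "TEXT":
            continue
        avg = conn.execute(
            f"SELECT AVG(LENGTH([{name}])) FROM [{table}]"
        ).fetchone()[0]
        if avg is None or avg >= max_avg_length:
            continue
        rows = conn.execute(
            f"SELECT [{name}], COUNT(*) AS n FROM [{table}] "
            f"GROUP BY [{name}] ORDER BY n DESC LIMIT ?",
            (limit,),
        ).fetchall()
        examples[name] = [value for value, _ in rows]
    return examples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (state TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("CA", "x" * 100), ("CA", "y" * 100), ("FL", "z" * 100), ("TX", "w" * 100)],
)
print(common_values(conn, "places"))  # long "notes" column is excluded
```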
&lt;h4 id="ask-files"&gt;Asking questions of CSV and JSON data&lt;/h4&gt;
&lt;p&gt;The core &lt;code&gt;sqlite-utils&lt;/code&gt; CLI usually works against SQLite files directly, but three years ago I added the ability to run SQL queries against CSV and JSON files directly with the &lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;sqlite-utils memory&lt;/a&gt; command. This works by loading that data into an in-memory SQLite database before executing a SQL query.&lt;/p&gt;
&lt;p&gt;I decided to reuse that mechanism to enable LLM prompts against CSV and JSON data directly as well.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sqlite-utils ask-files&lt;/code&gt; command looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils ask-files transactions.csv &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;total sales by year&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command accepts one or more files, and you can provide a mix of CSV, TSV and JSON. Each provided file will be imported into a different table, allowing the model to construct join queries where necessary.&lt;/p&gt;
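&lt;p&gt;The underlying mechanism - each file loaded into its own table in an in-memory SQLite database - can be sketched like this. This is a simplified illustration with a hypothetical &lt;code&gt;load_csv()&lt;/code&gt; helper; the real implementation also handles type detection, TSV and JSON:&lt;/p&gt;

```python
import csv
import io
import sqlite3

def load_csv(conn, table, text):
    # Load one CSV "file" into its own table, so SQL (including joins
    # across several such tables) can run against it.
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f"[{col}]" for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE [{table}] ({cols})")
    conn.executemany(f"INSERT INTO [{table}] VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "transactions", "year,amount\n2023,10\n2024,20\n2024,5")
totals = conn.execute(
    "SELECT year, SUM(amount) FROM transactions GROUP BY year ORDER BY year"
).fetchall()
print(totals)  # [('2023', 10), ('2024', 25)]
```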
&lt;h4 id="implementation-notes"&gt;Implementation notes&lt;/h4&gt;
&lt;p&gt;The core of the plugin is implemented as around &lt;a href="https://github.com/simonw/sqlite-utils-ask/blob/0.2/sqlite_utils_ask.py"&gt;250 lines of Python&lt;/a&gt;, using the &lt;code&gt;sqlite-utils&lt;/code&gt; &lt;a href="https://sqlite-utils.datasette.io/en/stable/plugins.html#register-commands-cli"&gt;register_commands()&lt;/a&gt; plugin hook to add the &lt;code&gt;ask&lt;/code&gt; and &lt;code&gt;ask-files&lt;/code&gt; commands.&lt;/p&gt;
&lt;p&gt;It adds &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; as a dependency, and takes advantage of LLM's &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;Python API&lt;/a&gt; to abstract over the details of talking to the models. This means &lt;code&gt;sqlite-utils-ask&lt;/code&gt; can use any of the models supported by LLM or its plugins - if you want to run your prompt through Claude 3.5 Sonnet you can do this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils install llm-claude-3
sqlite-utils ask content.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;count rows in news table&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m claude-3.5-sonnet&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The plugin defaults to &lt;a href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/"&gt;gpt-4o-mini&lt;/a&gt; initially to take advantage of that model's automatic prompt caching: if you run multiple questions against the same schema you'll end up sending the same lengthy prompt prefix multiple times, and OpenAI's prompt caching should automatically kick in and provide a 50% discount on those input tokens.&lt;/p&gt;
&lt;p&gt;Then I ran the actual numbers and found that &lt;code&gt;gpt-4o-mini&lt;/code&gt; is cheap enough that even without caching a 4,000 token prompt (that's a pretty large SQL schema) should cost less than a tenth of a cent. So those caching savings aren't worth anything at all!&lt;/p&gt;
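&lt;p&gt;The arithmetic, again assuming GPT-4o mini's list price of $0.15 (15 cents) per million input tokens:&lt;/p&gt;

```python
# 4,000 input tokens at 15 cents per million tokens
cents = 4_000 * 15 / 1_000_000
print(cents)  # 0.06 - well under a tenth of a cent, even before caching
```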
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="sqlite"/><category term="ai"/><category term="sqlite-utils"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/></entry></feed>