<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: cli</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/cli.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-09-02T11:05:23+00:00</updated><author><name>Simon Willison</name></author><entry><title>Rich Pixels</title><link href="https://simonwillison.net/2025/Sep/2/rich-pixels/#atom-tag" rel="alternate"/><published>2025-09-02T11:05:23+00:00</published><updated>2025-09-02T11:05:23+00:00</updated><id>https://simonwillison.net/2025/Sep/2/rich-pixels/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/darrenburns/rich-pixels"&gt;Rich Pixels&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using the trick of rendering images with full or half-height colored blocks.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/darrenburns/rich-pixels/blob/a0745ebcc26b966d9dbac5875720364ee5c6a1d3/rich_pixels/_renderer.py#L123C25-L123C26"&gt;the key trick&lt;/a&gt; - it renders Unicode ▄ (U+2584, "lower half block") characters after setting a foreground and background color for the two pixels it needs to display.&lt;/p&gt;
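That half-block trick is easy to sketch in plain Python with 24-bit ANSI escape codes - a hedged illustration of the general technique, not Rich Pixels' actual code:

```python
# Each terminal cell shows two vertical pixels: the background color
# paints the top pixel and the foreground color paints the bottom one,
# via the U+2584 "lower half block" character.
def half_block(top_rgb, bottom_rgb):
    tr, tg, tb = top_rgb
    br, bg, bb = bottom_rgb
    return (
        f"\x1b[48;2;{tr};{tg};{tb}m"  # background = top pixel
        f"\x1b[38;2;{br};{bg};{bb}m"  # foreground = bottom pixel
        "\u2584"                      # lower half block
        "\x1b[0m"                     # reset
    )

# Two cells side by side: red over blue, then green over white
print(half_block((255, 0, 0), (0, 0, 255)) + half_block((0, 255, 0), (255, 255, 255)))
```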
&lt;p&gt;I got GPT-5 to &lt;a href="https://chatgpt.com/share/68b6c443-2408-8006-8f4a-6862755cd1e4"&gt;vibe code up&lt;/a&gt; a &lt;code&gt;show_image.py&lt;/code&gt; terminal command which resizes the provided image to fit the width and height of the current terminal and displays it using Rich Pixels. That &lt;a href="https://github.com/simonw/tools/blob/main/python/show_image.py"&gt;script is here&lt;/a&gt;, you can run it with &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run https://tools.simonwillison.net/python/show_image.py \
  image.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's what I got when I ran it against my V&amp;amp;A East Storehouse photo from &lt;a href="https://simonwillison.net/2025/Aug/27/london-culture/"&gt;this post&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Terminal window. I ran that command and it spat out quite a pleasing and recognizable pixel art version of the photograph." src="https://static.simonwillison.net/static/2025/pixel-storehouse.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ascii-art"&gt;ascii-art&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rich"&gt;rich&lt;/a&gt;&lt;/p&gt;



</summary><category term="ascii-art"/><category term="cli"/><category term="python"/><category term="unicode"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="uv"/><category term="vibe-coding"/><category term="gpt-5"/><category term="rich"/></entry><entry><title>f2</title><link href="https://simonwillison.net/2025/May/24/f2/#atom-tag" rel="alternate"/><published>2025-05-24T19:20:48+00:00</published><updated>2025-05-24T19:20:48+00:00</updated><id>https://simonwillison.net/2025/May/24/f2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/ayoisaiah/f2"&gt;f2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Really neat CLI tool for bulk renaming of files and directories by Ayooluwa Isaiah, written in Go and designed to work cross-platform.&lt;/p&gt;
&lt;p&gt;There's a &lt;em&gt;lot&lt;/em&gt; of great design in this. &lt;a href="https://f2.freshman.tech/guide/tutorial"&gt;Basic usage&lt;/a&gt; is intuitive - here's how to rename all &lt;code&gt;.svg&lt;/code&gt; files to &lt;code&gt;.tmp.svg&lt;/code&gt; in the current directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;f2 -f '.svg' -r '.tmp.svg' path/to/dir
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;f2 defaults to a dry run which looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;*————————————————————*————————————————————————*————————*
|      ORIGINAL      |        RENAMED         | STATUS |
*————————————————————*————————————————————————*————————*
| claude-pelican.svg | claude-pelican.tmp.svg | ok     |
| gemini-pelican.svg | gemini-pelican.tmp.svg | ok     |
*————————————————————*————————————————————————*————————*
dry run: commit the above changes with the -x/--exec flag
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;-x&lt;/code&gt; executes the rename.&lt;/p&gt;
&lt;p&gt;The really cool stuff is the advanced features - Ayooluwa has thought of &lt;em&gt;everything&lt;/em&gt;. The EXIF integration is particularly clever - here's an example &lt;a href="https://f2.freshman.tech/guide/organizing-image-library"&gt;from the advanced tutorial&lt;/a&gt; which renames a library of photos to use their EXIF creation date as part of the file path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;f2 -r '{x.cdt.YYYY}/{x.cdt.MM}-{x.cdt.MMM}/{x.cdt.YYYY}-{x.cdt.MM}-{x.cdt.DD}/{f}{ext}' -R
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-R&lt;/code&gt; flag means "recursive". The small &lt;code&gt;-r&lt;/code&gt; uses variable syntax &lt;a href="https://f2.freshman.tech/guide/exif-variables"&gt;for EXIF data&lt;/a&gt;. There are plenty of others too, including &lt;a href="https://f2.freshman.tech/guide/file-hash-variables"&gt;hash variables&lt;/a&gt; that use the hash of the file contents.&lt;/p&gt;
&lt;h4 id="f2-installation"&gt;Installation notes&lt;/h4&gt;

&lt;p&gt;I had Go 1.23.2 installed on my Mac via Homebrew. I ran this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;go install github.com/ayoisaiah/f2/v2/cmd/f2@latest
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And got an error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;requires go &amp;gt;= 1.24.2 (running go 1.23.2; GOTOOLCHAIN=local)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So I upgraded Go using Homebrew:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew upgrade go
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which took me to 1.24.3 - then the &lt;code&gt;go install&lt;/code&gt; command worked. It put the binary in &lt;code&gt;~/go/bin/f2&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There's also &lt;a href="https://www.npmjs.com/package/@ayoisaiah/f2"&gt;an npm package&lt;/a&gt;, similar to the pattern I wrote about a while ago of people &lt;a href="https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/"&gt;Bundling binary tools in Python wheels&lt;/a&gt;.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44081850"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="go"/></entry><entry><title>OpenAI Codex</title><link href="https://simonwillison.net/2025/May/16/openai-codex/#atom-tag" rel="alternate"/><published>2025-05-16T19:12:06+00:00</published><updated>2025-05-16T19:12:06+00:00</updated><id>https://simonwillison.net/2025/May/16/openai-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/docs/codex"&gt;OpenAI Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;a href="https://openai.com/index/introducing-codex/"&gt;Announced today&lt;/a&gt;, here's the documentation for OpenAI's "cloud-based software engineering agent". It's not yet available for us $20/month Plus customers ("coming soon") but if you're a $200/month Pro user you can try it out now.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At a high level, you specify a prompt, and the agent goes to work in its own environment. After about 8–10 minutes, the agent gives you back a diff.&lt;/p&gt;
&lt;p&gt;You can execute prompts in either &lt;em&gt;ask&lt;/em&gt; mode or &lt;em&gt;code&lt;/em&gt; mode. When you select &lt;em&gt;ask&lt;/em&gt;, Codex clones a read-only version of your repo, booting faster and giving you follow-up tasks. &lt;em&gt;Code&lt;/em&gt; mode, however, creates a full-fledged environment that the agent can run and test against.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This &lt;a href="https://twitter.com/openaidevs/status/1923492740526112819"&gt;4 minute demo video&lt;/a&gt; is a useful overview. One note that caught my eye is that the setup phase for an environment can pull from the internet (to install necessary dependencies) but the agent loop itself still runs in a network disconnected sandbox.&lt;/p&gt;
&lt;p&gt;It sounds similar to GitHub's own &lt;a href="https://githubnext.com/projects/copilot-workspace"&gt;Copilot Workspace&lt;/a&gt; project, which can compose PRs against your code based on a prompt. The big difference is that Codex incorporates a full Code Interpreter style environment, allowing it to build and run the code it's creating and execute tests in a loop.&lt;/p&gt;
&lt;p&gt;Copilot Workspace has a level of integration with Codespaces but still requires manual intervention to help exercise the code.&lt;/p&gt;
&lt;p&gt;Also similar to Copilot Workspace is a confusing name. OpenAI now have &lt;em&gt;four&lt;/em&gt; products called Codex:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/codex/"&gt;OpenAI Codex&lt;/a&gt;, announced today.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt;, a completely different coding assistant tool they released a few weeks ago that is the same kind of shape as &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview"&gt;Claude Code&lt;/a&gt;. This one owns the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; namespace on GitHub.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/models/codex-mini-latest"&gt;codex-mini&lt;/a&gt;, a brand new model released today that is used by their Codex product. It's a fine-tuned o4-mini variant. I released &lt;a href="https://github.com/simonw/llm-openai-plugin/releases/tag/0.4"&gt;llm-openai-plugin 0.4&lt;/a&gt; adding support for that model.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://web.archive.org/web/20230203201912/https://openai.com/blog/openai-codex/"&gt;OpenAI Codex (2021)&lt;/a&gt; - Internet Archive link, OpenAI's first specialist coding model from the GPT-3 era. This was used by the original GitHub Copilot and is still the current topic of Wikipedia's &lt;a href="https://en.m.wikipedia.org/wiki/OpenAI_Codex"&gt;OpenAI Codex&lt;/a&gt; page.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My favorite thing about this most recent Codex product is that OpenAI shared &lt;a href="https://github.com/openai/codex-universal/blob/main/Dockerfile"&gt;the full Dockerfile&lt;/a&gt; for the environment that the system uses to run code - in &lt;code&gt;openai/codex-universal&lt;/code&gt; on GitHub because &lt;code&gt;openai/codex&lt;/code&gt; was taken already.&lt;/p&gt;
&lt;p&gt;This is extremely useful documentation for figuring out how to use this thing - I'm glad they're making this as transparent as possible.&lt;/p&gt;
&lt;p&gt;And to be fair, if you ignore its previous history Codex is a good name for this product. I'm just glad they didn't call it &lt;a href="https://twitter.com/simonw/status/1730259398990385355"&gt;Ada&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="github"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="ai-agents"/><category term="llm-release"/><category term="coding-agents"/><category term="async-coding-agents"/></entry><entry><title>sqlite-utils 4.0a0</title><link href="https://simonwillison.net/2025/May/9/sqlite-utils-40a0/#atom-tag" rel="alternate"/><published>2025-05-09T04:02:31+00:00</published><updated>2025-05-09T04:02:31+00:00</updated><id>https://simonwillison.net/2025/May/9/sqlite-utils-40a0/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/4.0a0"&gt;sqlite-utils 4.0a0&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New alpha release of &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt;, my Python library and CLI tool for manipulating SQLite databases.&lt;/p&gt;
&lt;p&gt;It's the first 4.0 alpha because there's a (minor) backwards-incompatible change: I've upgraded the &lt;code&gt;.upsert()&lt;/code&gt; and &lt;code&gt;.upsert_all()&lt;/code&gt; methods to use SQLIte's &lt;a href="https://www.sqlite.org/lang_upsert.html"&gt;UPSERT&lt;/a&gt; mechanism, &lt;code&gt;INSERT INTO ... ON CONFLICT DO UPDATE&lt;/code&gt;. Details in &lt;a href="https://github.com/simonw/sqlite-utils/issues/652"&gt;this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That feature was added to SQLite in version 3.24.0, released 2018-06-04. I'm pretty cautious about my SQLite version support since the underlying library can be difficult to upgrade, depending on your platform and operating system.&lt;/p&gt;
&lt;p&gt;I'm going to leave the new alpha to bake for a little while before pushing a stable release. Since this is a major version bump I'm going to &lt;a href="https://github.com/simonw/sqlite-utils/issues/656"&gt;take the opportunity&lt;/a&gt; to see if there are any other minor API warts that I can clean up at the same time.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="sqlite"/><category term="sqlite-utils"/></entry><entry><title>Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25)</title><link href="https://simonwillison.net/2025/May/5/llm-video-frames/#atom-tag" rel="alternate"/><published>2025-05-05T17:38:25+00:00</published><updated>2025-05-05T17:38:25+00:00</updated><id>https://simonwillison.net/2025/May/5/llm-video-frames/#atom-tag</id><summary type="html">
    &lt;p&gt;The new &lt;strong&gt;&lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt;&lt;/strong&gt; plugin can turn a video file into a sequence of JPEG frames and feed them directly into a long context vision LLM such as GPT-4.1, even when that LLM doesn't directly support video input. It depends on a plugin feature I added to &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-25"&gt;LLM 0.25&lt;/a&gt;, which I released last night.&lt;/p&gt;
&lt;p&gt;Here's how to try it out:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;brew install ffmpeg &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or apt-get or your package manager of choice&lt;/span&gt;
uv tool install llm &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or pipx install llm or pip install llm&lt;/span&gt;
llm install llm-video-frames
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; openai
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Paste your OpenAI API key here&lt;/span&gt;

llm -f video-frames:video.mp4 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe the key scenes in this video&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;video-frames:filepath.mp4&lt;/code&gt; syntax is provided by the new plugin. It takes that video, converts it to a JPEG for every second of the video and then turns those into &lt;a href="https://llm.datasette.io/en/stable/usage.html#attachments"&gt;attachments&lt;/a&gt; that can be passed to the LLM.&lt;/p&gt;
&lt;p&gt;Here's a demo, against &lt;a href="https://static.simonwillison.net/static/2025/cleo.mp4"&gt;this video&lt;/a&gt; of Cleo:&lt;/p&gt;

&lt;div style="max-width: 100%; margin-bottom: 0.4em"&gt;
    &lt;video controls="controls" preload="none" aria-label="Cleo " poster="https://static.simonwillison.net/static/2025/cleo-still.jpg" loop="loop" style="width: 100%; height: auto;" muted="muted"&gt;
        &lt;source src="https://static.simonwillison.net/static/2025/cleo.mp4" type="video/mp4" /&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f video-frames:cleo.mp4 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe key moments&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the output from the model (&lt;a href="https://gist.github.com/simonw/a4e26166a524f9c07b4cf32b2f8da6c9"&gt;transcript here&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The sequence of images captures the key moments of a dog being offered and then enjoying a small treat:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In the first image, a hand is holding a small cupcake with purple frosting close to a black dog's face. The dog looks eager and is focused intently on the treat.&lt;/li&gt;
&lt;li&gt;The second image shows the dog beginning to take a bite of the cupcake from the person's fingers. The dog's mouth is open, gently nibbling on the treat.&lt;/li&gt;
&lt;li&gt;In the third image, the dog has finished or is almost done with the treat and looks content, with a slight smile and a relaxed posture. The treat is no longer in the person's hand, indicating that the dog has consumed it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This progression highlights the dog's anticipation, enjoyment, and satisfaction with the treat.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Total cost: 7,072 input tokens, 156 output tokens - for GPT-4.1 mini that's 0.3078 cents (less than a third of a cent).&lt;/p&gt;
&lt;p&gt;In this case the plugin converted the video into three images: &lt;a href="https://static.simonwillison.net/static/2025/frame_00001.jpg"&gt;frame_00001.jpg&lt;/a&gt;, &lt;a href="https://static.simonwillison.net/static/2025/frame_00002.jpg"&gt;frame_00002.jpg&lt;/a&gt; and &lt;a href="https://static.simonwillison.net/static/2025/frame_00003.jpg"&gt;frame_00003.jpg&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The plugin accepts additional arguments. You can increase the frames-per-second using &lt;code&gt;?fps=2&lt;/code&gt; - for example:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;video-frames:video.mp4?fps=2&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;summarize this video&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or you can add &lt;code&gt;?timestamps=1&lt;/code&gt; to cause &lt;code&gt;ffmpeg&lt;/code&gt; to overlay a timestamp in the bottom right corner of each frame. This gives the model a chance to return timestamps in its output.&lt;/p&gt;
&lt;p&gt;Let's try that with the Cleo video:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;video-frames:cleo.mp4?timestamps=1&amp;amp;fps=5&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;key moments, include timestamps&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-4.1-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's the output (&lt;a href="https://gist.github.com/simonw/371719849a7d2260f1eb0f422ea63187"&gt;transcript here&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here are the key moments from the video "cleo.mp4" with timestamps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;00:00:00.000 - A dog on leash looks at a small cupcake with purple frosting being held by a person.&lt;/li&gt;
&lt;li&gt;00:00:00.800 - The dog closely sniffs the cupcake.&lt;/li&gt;
&lt;li&gt;00:00:01.400 - The person puts a small amount of the cupcake frosting on their finger.&lt;/li&gt;
&lt;li&gt;00:00:01.600 - The dog starts licking the frosting from the person's finger.&lt;/li&gt;
&lt;li&gt;00:00:02.600 - The dog continues licking enthusiastically.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me know if you need more details or a description of any specific part.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That one sent 14 images to the API, at a total cost of 32,968 input, 141 output = 1.3413 cents.&lt;/p&gt;
&lt;p&gt;It sent 5.9MB of image data to OpenAI's API, encoded as base64 in the JSON API call.&lt;/p&gt;
&lt;p&gt;The GPT-4.1 model family accepts up to 1,047,576 tokens. Aside from a 20MB size limit per image I haven't seen any documentation of limits on the number of images. You can fit a whole lot of JPEG frames in a million tokens!&lt;/p&gt;
&lt;p&gt;Here's what one of those frames looks like with the timestamp overlaid in the corner:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cleo-finger.jpg" alt="Cleo taking a treat from my fingers, in the bottom right corner is an overlay that says cleo.mp4 00:00:01.600" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="how-i-built-the-plugin-with-o4-mini"&gt;How I built the plugin with o4-mini&lt;/h4&gt;
&lt;p&gt;This is a great example of how rapid prototyping with an LLM can help demonstrate the value of a feature.&lt;/p&gt;
&lt;p&gt;I was considering whether it would make sense for fragment plugins to return images in &lt;a href="https://github.com/simonw/llm/issues/972#issuecomment-2849342103"&gt;issue 972&lt;/a&gt; when I had the idea to use &lt;code&gt;ffmpeg&lt;/code&gt; to split a video into frames.&lt;/p&gt;
&lt;p&gt;I know &lt;a href="https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/"&gt;from past experience&lt;/a&gt; that a good model can write an entire plugin for LLM if you feed it the right example, so I started with this (reformatted here for readability):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m o4-mini -f github:simonw/llm-hacker-news -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;write a new plugin called llm_video_frames.py which takes video:path-to-video.mp4 and creates a temporary directory which it then populates with one frame per second of that video using ffmpeg - then it returns a list of [llm.Attachment(path="path-to-frame1.jpg"), ...] - it should also support passing video:video.mp4?fps=2 to increase to two frames per second, and if you pass ?timestamps=1 or &amp;amp;timestamps=1 then it should add a text timestamp to the bottom right conner of each image with the mm:ss timestamp of that frame (or hh:mm:ss if more than one hour in) and the filename of the video without the path as well.&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -o reasoning_effort high&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/4f545ecb347884d1d923dbc49550b8b0#response"&gt;the transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The new attachment mechanism went from vague idea to "I should build that" as a direct result of having an LLM-built proof-of-concept that demonstrated the feasibility of the new feature.&lt;/p&gt;
&lt;p&gt;The code it produced was about 90% of the code I shipped in the finished plugin. Total cost 5,018 input, 2,208 output = 1.5235 cents.&lt;/p&gt;
&lt;h4 id="annotated-release-notes-for-everything-else-in-llm-0-25"&gt;Annotated release notes for everything else in LLM 0.25&lt;/h4&gt;
&lt;p&gt;Here are the annotated release notes for everything else:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New plugin feature: &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html#plugin-hooks-register-fragment-loaders"&gt;register_fragment_loaders(register)&lt;/a&gt; plugins can now return a mixture of fragments and attachments. The &lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt; plugin is the first to take advantage of this mechanism. &lt;a href="https://github.com/simonw/llm/issues/972"&gt;#972&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As described above. The inspiration for this feature came from the &lt;a href="https://github.com/agustif/llm-arxiv"&gt;llm-arxiv&lt;/a&gt; plugin by &lt;a href="https://github.com/agustif"&gt;agustif&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New OpenAI models: &lt;code&gt;gpt-4.1&lt;/code&gt;, &lt;code&gt;gpt-4.1-mini&lt;/code&gt;, &lt;code&gt;gpt-4.1-nano&lt;/code&gt;, &lt;code&gt;o3&lt;/code&gt;, &lt;code&gt;o4-mini&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/945"&gt;#945&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/issues/965"&gt;#965&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/issues/976"&gt;#976&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;My original plan was to leave these models exclusively to the new &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai&lt;/a&gt; plugin, since that allows me to add support for new models without a full LLM release. I'm going to punt on that until I'm ready to entirely remove the OpenAI models from LLM core.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New environment variables: &lt;code&gt;LLM_MODEL&lt;/code&gt; and &lt;code&gt;LLM_EMBEDDING_MODEL&lt;/code&gt; for setting the model to use without needing to specify &lt;code&gt;-m model_id&lt;/code&gt; every time. &lt;a href="https://github.com/simonw/llm/issues/932"&gt;#932&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;A convenience feature for when you want to set the default model for a terminal session with LLM without using the global &lt;a href="https://llm.datasette.io/en/stable/setup.html#setting-a-custom-default-model"&gt;"default model" mechanism&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New command: &lt;code&gt;llm fragments loaders&lt;/code&gt;, to list all currently available fragment loader prefixes provided by plugins. &lt;a href="https://github.com/simonw/llm/issues/941"&gt;#941&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mainly for consistency with the existing &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-templates-loaders-help"&gt;llm templates loaders&lt;/a&gt; command. Here's the output when I run &lt;code&gt;llm fragments loaders&lt;/code&gt; on my machine:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docs:
  Fetch the latest documentation for the specified package from
  https://github.com/simonw/docs-for-llms

  Use '-f docs:' for the documentation of your current version of LLM.

docs-preview:
  Similar to docs: but fetches the latest docs including alpha/beta releases.

symbex:
  Walk the given directory, parse every .py file, and for every
  top-level function or class-method produce its signature and
  docstring plus an import line.

github:
  Load files from a GitHub repository as fragments

  Argument is a GitHub repository URL or username/repository

issue:
  Fetch GitHub issue/pull and comments as Markdown

  Argument is either "owner/repo/NUMBER" or URL to an issue

pr:
  Fetch GitHub pull request with comments and diff as Markdown

  Argument is either "owner/repo/NUMBER" or URL to a pull request

hn:
  Given a Hacker News article ID returns the full nested conversation.

  For example: -f hn:43875136

video-frames:
  Fragment loader "video-frames:&amp;lt;path&amp;gt;?fps=N&amp;amp;timestamps=1"
  - extracts frames at `fps` per second (default 1)
  - if `timestamps=1`, overlays "filename hh:mm:ss" at bottom-right
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That's from &lt;a href="https://github.com/simonw/llm-docs"&gt;llm-docs&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-fragments-symbex"&gt;llm-fragments-symbex&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-hacker-news"&gt;llm-hacker-news&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-video-frames"&gt;llm-video-frames&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm fragments&lt;/code&gt; command now shows fragments ordered by the date they were first used. &lt;a href="https://github.com/simonw/llm/issues/973"&gt;#973&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This makes it easier to quickly debug a new fragment plugin - you can run &lt;code&gt;llm fragments&lt;/code&gt; and glance at the bottom few entries.&lt;/p&gt;
&lt;p&gt;I've also been using the new &lt;a href="https://github.com/simonw/llm-echo"&gt;llm-echo&lt;/a&gt; debugging plugin for this - it adds a new fake model called "echo" which simply outputs whatever the prompt, system prompt, fragments and attachments are that were passed to the model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -f docs:sqlite-utils -m &lt;span class="pl-c1"&gt;echo&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Show me the context&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/cb3249856887379759515022c76d0d9e"&gt;Output here&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm chat&lt;/code&gt; now includes a &lt;code&gt;!edit&lt;/code&gt; command for editing a prompt using your default terminal text editor. Thanks, &lt;a href="https://github.com/Hopiu"&gt;Benedikt Willi&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/969"&gt;#969&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a really nice enhancement to &lt;code&gt;llm chat&lt;/code&gt;, making it much more convenient to edit longer prompts.&lt;/p&gt;
&lt;p&gt;And the rest:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Allow &lt;code&gt;-t&lt;/code&gt; and &lt;code&gt;--system&lt;/code&gt; to be used at the same time. &lt;a href="https://github.com/simonw/llm/issues/916"&gt;#916&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a bug where accessing a model via its alias would fail to respect any default options set for that model. &lt;a href="https://github.com/simonw/llm/issues/968"&gt;#968&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Improved documentation for &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;extra-openai-models.yaml&lt;/a&gt;. Thanks, &lt;a href="https://github.com/rahimnathwani"&gt;Rahim Nathwani&lt;/a&gt; and &lt;a href="https://github.com/dguido"&gt;Dan Guido&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/950"&gt;#950&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm/pull/957"&gt;#957&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llm -c/--continue&lt;/code&gt; now works correctly with the &lt;code&gt;-d/--database&lt;/code&gt; option. &lt;code&gt;llm chat&lt;/code&gt; now accepts that &lt;code&gt;-d/--database&lt;/code&gt; option. Thanks, &lt;a href="https://github.com/sukhbinder"&gt;Sukhbinder Singh&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/933"&gt;#933&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ffmpeg"&gt;ffmpeg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="ffmpeg"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="vision-llms"/></entry><entry><title>llm-fragment-symbex</title><link href="https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/#atom-tag" rel="alternate"/><published>2025-04-23T14:25:38+00:00</published><updated>2025-04-23T14:25:38+00:00</updated><id>https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-fragments-symbex"&gt;llm-fragment-symbex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I released a new LLM &lt;a href="https://llm.datasette.io/en/stable/fragments.html#using-fragments-from-plugins"&gt;fragment loader plugin&lt;/a&gt; that builds on top of my &lt;a href="https://simonwillison.net/2023/Jun/18/symbex/"&gt;Symbex&lt;/a&gt; project.&lt;/p&gt;
&lt;p&gt;Symbex is a CLI tool I wrote that can run against a folder full of Python code and output functions, classes, methods or just their docstrings and signatures, using the Python AST module to parse the code.&lt;/p&gt;
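&lt;p&gt;The core AST trick is straightforward. This is a simplified sketch of the idea, not Symbex's actual implementation - it walks the parse tree and emits each function's signature line plus its docstring:&lt;/p&gt;

```python
import ast
import textwrap

def signatures_with_docstrings(source):
    """Extract 'def name(args):' lines plus docstrings from Python source -
    a simplified sketch of the kind of output Symbex's --docs mode produces."""
    tree = ast.parse(source)
    out = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            out.append(f"def {node.name}({args}):")
            doc = ast.get_docstring(node)
            if doc:
                out.append(f'    """{doc}"""')
    return "\n".join(out)

example = textwrap.dedent('''
    def resolve_attachment(value):
        "Resolve an attachment from a string value."
        ...
''')
print(signatures_with_docstrings(example))
```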
&lt;p&gt;&lt;code&gt;llm-fragments-symbex&lt;/code&gt; brings that ability directly to LLM. It lets you do things like this:&lt;/p&gt;
&lt;pre&gt;llm install llm-fragments-symbex
llm -f symbex:path/to/project -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Describe this codebase&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;I just ran that against my LLM project itself like this:&lt;/p&gt;
&lt;pre&gt;cd llm
llm -f symbex:. -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;guess what this code does&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/b43d5b3ea897900f5c7de7173cc51c82#response"&gt;the full output&lt;/a&gt;, which starts like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This code listing appears to be an index or dump of Python functions, classes, and methods primarily belonging to a codebase related to large language models (LLMs). It covers a broad functionality set related to managing LLMs, embeddings, templates, plugins, logging, and command-line interface (CLI) utilities for interaction with language models. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That page also &lt;a href="https://gist.github.com/simonw/b43d5b3ea897900f5c7de7173cc51c82#prompt-fragments"&gt;shows the input generated by the fragment&lt;/a&gt; - here's a representative extract:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;# from llm.cli import resolve_attachment&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;resolve_attachment&lt;/span&gt;(&lt;span class="pl-s1"&gt;value&lt;/span&gt;):
    &lt;span class="pl-s"&gt;"""Resolve an attachment from a string value which could be:&lt;/span&gt;
&lt;span class="pl-s"&gt;    - "-" for stdin&lt;/span&gt;
&lt;span class="pl-s"&gt;    - A URL&lt;/span&gt;
&lt;span class="pl-s"&gt;    - A file path&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    Returns an Attachment object.&lt;/span&gt;
&lt;span class="pl-s"&gt;    Raises AttachmentError if the attachment cannot be resolved."""&lt;/span&gt;

&lt;span class="pl-c"&gt;# from llm.cli import AttachmentType&lt;/span&gt;
&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;AttachmentType&lt;/span&gt;:

    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;convert&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;value&lt;/span&gt;, &lt;span class="pl-s1"&gt;param&lt;/span&gt;, &lt;span class="pl-s1"&gt;ctx&lt;/span&gt;):

&lt;span class="pl-c"&gt;# from llm.cli import resolve_attachment_with_type&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;resolve_attachment_with_type&lt;/span&gt;(&lt;span class="pl-s1"&gt;value&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;, &lt;span class="pl-s1"&gt;mimetype&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;) &lt;span class="pl-c1"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="pl-smi"&gt;Attachment&lt;/span&gt;:&lt;/pre&gt;

&lt;p&gt;If your Python code has good docstrings and type annotations, this should hopefully be a shortcut for providing full API documentation to a model without needing to dump in the entire codebase.&lt;/p&gt;
&lt;p&gt;The above example used 13,471 input tokens and 781 output tokens, using &lt;code&gt;openai/gpt-4.1-mini&lt;/code&gt;. That model is extremely cheap, so the total cost was 0.6638 cents - less than a cent.&lt;/p&gt;
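&lt;p&gt;That figure is easy to sanity-check, assuming gpt-4.1-mini's list pricing of $0.40 per million input tokens and $1.60 per million output tokens:&lt;/p&gt;

```python
# Back-of-the-envelope check on the cost quoted above, assuming
# gpt-4.1-mini list pricing: $0.40/M input tokens, $1.60/M output tokens.
input_tokens, output_tokens = 13_471, 781
cost_dollars = input_tokens * 0.40 / 1e6 + output_tokens * 1.60 / 1e6
print(f"{cost_dollars * 100:.4f} cents")  # 0.6638 cents
```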
&lt;p&gt;The plugin itself was mostly written by o4-mini using the &lt;a href="https://github.com/simonw/llm-fragments-github"&gt;llm-fragments-github&lt;/a&gt; plugin to load the &lt;a href="https://github.com/simonw/symbex"&gt;simonw/symbex&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-hacker-news"&gt;simonw/llm-hacker-news&lt;/a&gt; repositories as example code:&lt;/p&gt;
&lt;pre&gt;llm \
  -f github:simonw/symbex \
  -f github:simonw/llm-hacker-news \
  -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Write a new plugin as a single llm_fragments_symbex.py file which&lt;/span&gt;
&lt;span class="pl-s"&gt;   provides a custom loader which can be used like this:&lt;/span&gt;
&lt;span class="pl-s"&gt;   llm -f symbex:path/to/folder - it then loads in all of the python&lt;/span&gt;
&lt;span class="pl-s"&gt;   function signatures with their docstrings from that folder using&lt;/span&gt;
&lt;span class="pl-s"&gt;   the same trick that symbex uses, effectively the same as running&lt;/span&gt;
&lt;span class="pl-s"&gt;   symbex . '*' '*.*' --docs --imports -n&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
   -m openai/o4-mini -o reasoning_effort high&lt;/pre&gt;

&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/c46390522bc839daab6c08bad3f87b39#response"&gt;the response&lt;/a&gt;. 27,819 input tokens, 2,918 output tokens = 4.344 cents.&lt;/p&gt;
&lt;p&gt;In working on this project I identified and fixed &lt;a href="https://github.com/simonw/symbex/issues/46"&gt;a minor cosmetic defect&lt;/a&gt; in Symbex itself. Technically this is a breaking change (it changes the output) so I shipped that as &lt;a href="https://github.com/simonw/symbex/releases/tag/2.0"&gt;Symbex 2.0&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/symbex"&gt;symbex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="symbex"/><category term="llm"/></entry><entry><title>Claude Code: Best practices for agentic coding</title><link href="https://simonwillison.net/2025/Apr/19/claude-code-best-practices/#atom-tag" rel="alternate"/><published>2025-04-19T22:17:38+00:00</published><updated>2025-04-19T22:17:38+00:00</updated><id>https://simonwillison.net/2025/Apr/19/claude-code-best-practices/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/engineering/claude-code-best-practices"&gt;Claude Code: Best practices for agentic coding&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Extensive new documentation from Anthropic on how to get the best results out of their &lt;a href="https://github.com/anthropics/claude-code"&gt;Claude Code&lt;/a&gt; CLI coding agent tool, which includes this fascinating tip:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We recommend using the word "think" to trigger extended thinking mode, which gives Claude additional computation time to evaluate alternatives more thoroughly. These specific phrases are mapped directly to increasing levels of thinking budget in the system: "think" &amp;lt; "think hard" &amp;lt; "think harder" &amp;lt; "ultrathink." Each level allocates progressively more thinking budget for Claude to use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Apparently &lt;strong&gt;ultrathink&lt;/strong&gt; is a magic word!&lt;/p&gt;
&lt;p&gt;I was curious if this was a feature of the Claude model itself or Claude Code in particular. Claude Code isn't open source but you can view the obfuscated JavaScript for it, and make it a tiny bit less obfuscated by running it through &lt;a href="https://prettier.io/"&gt;Prettier&lt;/a&gt;. With &lt;a href="https://claude.ai/share/77c398ec-6a8b-4390-91d3-6e9f0403916e"&gt;Claude's help&lt;/a&gt; I used this recipe:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p /tmp/claude-code-examine
cd /tmp/claude-code-examine
npm init -y
npm install @anthropic-ai/claude-code
cd node_modules/@anthropic-ai/claude-code
npx prettier --write cli.js
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then used &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;ripgrep&lt;/a&gt; to search for "ultrathink":&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rg ultrathink -C 30
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And found this chunk of code:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-v"&gt;B&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;W&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;message&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;content&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;toLowerCase&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think harder"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think intensely"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think longer"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think really hard"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think super hard"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think very hard"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"ultrathink"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-en"&gt;l1&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"tengu_thinking"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;tokenCount&lt;/span&gt;: &lt;span class="pl-c1"&gt;31999&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;messageId&lt;/span&gt;: &lt;span class="pl-v"&gt;Z&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;provider&lt;/span&gt;: &lt;span class="pl-v"&gt;G&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;31999&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think about it"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think a lot"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think deeply"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think hard"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think more"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt;
  &lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"megathink"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-en"&gt;l1&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"tengu_thinking"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;tokenCount&lt;/span&gt;: &lt;span class="pl-c1"&gt;1e4&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;messageId&lt;/span&gt;: &lt;span class="pl-v"&gt;Z&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;provider&lt;/span&gt;: &lt;span class="pl-v"&gt;G&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;1e4&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-v"&gt;B&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"think"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-en"&gt;l1&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"tengu_thinking"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;tokenCount&lt;/span&gt;: &lt;span class="pl-c1"&gt;4000&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;messageId&lt;/span&gt;: &lt;span class="pl-v"&gt;Z&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;provider&lt;/span&gt;: &lt;span class="pl-v"&gt;G&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;4000&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;So yeah, it looks like "ultrathink" is a Claude Code feature - presumably that 31999 sets the token &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#implementing-extended-thinking"&gt;thinking budget&lt;/a&gt;, especially since "megathink" maps to 1e4 tokens (10,000) and plain "think" maps to 4,000.&lt;/p&gt;
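&lt;p&gt;Deobfuscated, the logic amounts to a tiered keyword lookup, checked from the largest budget down. Here's a readable Python paraphrase - a sketch of the behaviour, not Claude Code's actual source:&lt;/p&gt;

```python
# Readable paraphrase of the deobfuscated tiers: phrases are checked from
# the largest thinking budget down, so "think harder" wins over "think hard".
HIGH = ["think harder", "think intensely", "think longer",
        "think really hard", "think super hard", "think very hard",
        "ultrathink"]
MEDIUM = ["think about it", "think a lot", "think deeply",
          "think hard", "think more", "megathink"]

def thinking_budget(message):
    text = message.lower()
    if any(phrase in text for phrase in HIGH):
        return 31999
    if any(phrase in text for phrase in MEDIUM):
        return 10_000
    if "think" in text:
        return 4000
    return 0

print(thinking_budget("ultrathink about this bug"))  # 31999
```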

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/HamelHusain/status/1913702157108592719"&gt;@HamelHusain&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="claude"/><category term="llm-reasoning"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>openai/codex</title><link href="https://simonwillison.net/2025/Apr/16/openai-codex/#atom-tag" rel="alternate"/><published>2025-04-16T17:25:39+00:00</published><updated>2025-04-16T17:25:39+00:00</updated><id>https://simonwillison.net/2025/Apr/16/openai-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Just released by OpenAI, a "lightweight coding agent that runs in your terminal". Looks like their version of &lt;a href="https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview"&gt;Claude Code&lt;/a&gt;, though unlike Claude Code Codex is released under an open source (Apache 2) license.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/openai/codex/blob/9b733fc48fb81b3f3460c1fdda111ba9b861f81f/codex-cli/src/utils/agent/agent-loop.ts#L1001-L1046"&gt;the main prompt&lt;/a&gt; that runs in a loop, which starts like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;You are operating as and within the Codex CLI, a terminal-based agentic coding assistant built by OpenAI. It wraps OpenAI models to enable natural language interaction with a local codebase. You are expected to be precise, safe, and helpful.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;You can:&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- Receive user prompts, project context, and files.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- Stream responses and emit function calls (e.g., shell commands, code edits).&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- Apply patches, run commands, and manage user approvals based on policy.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- Work inside a sandboxed, git-backed workspace with rollback support.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- Log telemetry so sessions can be replayed or inspected later.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;- More details on your functionality are available at codex --help&lt;/code&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;The Codex CLI is open-sourced. Don't confuse yourself with the old Codex language model built by OpenAI many moons ago (this is understandably top of mind for you!). Within this context, Codex refers to the open-source agentic coding interface. [...]&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like that the prompt describes OpenAI's previous Codex language model as being from "many moons ago". Prompt engineering is so weird.&lt;/p&gt;
&lt;p&gt;Since the prompt says that it works "inside a sandboxed, git-backed workspace" I went looking for the sandbox. On macOS &lt;a href="https://github.com/openai/codex/blob/9b733fc48fb81b3f3460c1fdda111ba9b861f81f/codex-cli/src/utils/agent/sandbox/macos-seatbelt.ts"&gt;it uses&lt;/a&gt; the little-known &lt;code&gt;sandbox-exec&lt;/code&gt; process, part of the OS but grossly under-documented. The best information I've found about it is &lt;a href="https://www.karltarvas.com/macos-app-sandboxing-via-sandbox-exec.html"&gt;this article from 2020&lt;/a&gt;, which notes that &lt;code&gt;man sandbox-exec&lt;/code&gt; lists it as deprecated. I didn't spot evidence in the Codex code of sandboxes for other platforms.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/macos"&gt;macos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="macos"/><category term="open-source"/><category term="sandboxing"/><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/></entry><entry><title>llm-openrouter 0.4</title><link href="https://simonwillison.net/2025/Mar/10/llm-openrouter-04/#atom-tag" rel="alternate"/><published>2025-03-10T21:40:56+00:00</published><updated>2025-03-10T21:40:56+00:00</updated><id>https://simonwillison.net/2025/Mar/10/llm-openrouter-04/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-openrouter/releases/tag/0.4"&gt;llm-openrouter 0.4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I found out this morning that &lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt; include support for a number of (rate-limited) &lt;a href="https://openrouter.ai/models?max_price=0"&gt;free API models&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I occasionally run workshops on top of LLMs (&lt;a href="https://simonwillison.net/2025/Mar/8/cutting-edge-web-scraping/"&gt;like this one&lt;/a&gt;) and being able to provide students with a quick way to obtain an API key against models where they don't have to setup billing is really valuable to me!&lt;/p&gt;
&lt;p&gt;This inspired me to upgrade my existing &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin, and in doing so I closed out a bunch of open feature requests.&lt;/p&gt;
&lt;p&gt;Consider this post the &lt;a href="https://simonwillison.net/tags/annotated-release-notes/"&gt;annotated release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;LLM &lt;a href="https://llm.datasette.io/en/stable/schemas.html"&gt;schema support&lt;/a&gt; for OpenRouter models that &lt;a href="https://openrouter.ai/models?order=newest&amp;amp;supported_parameters=structured_outputs"&gt;support structured output&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/23"&gt;#23&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm trying to get support for LLM's &lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/"&gt;new schema feature&lt;/a&gt; into as many plugins as possible.&lt;/p&gt;
&lt;p&gt;OpenRouter's OpenAI-compatible API includes support for the &lt;code&gt;response_format&lt;/code&gt; &lt;a href="https://openrouter.ai/docs/features/structured-outputs"&gt;structured content option&lt;/a&gt;, but with an important caveat: it only works for some models, and if you try to use it on others it is silently ignored.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/OpenRouterTeam/openrouter-examples/issues/20"&gt;filed an issue&lt;/a&gt; with OpenRouter requesting they include schema support in their machine-readable model index. For the moment LLM will let you specify schemas for unsupported models and will ignore them entirely, which isn't ideal.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm openrouter key&lt;/code&gt; command displays information about your current API key. &lt;a href="https://github.com/simonw/llm-openrouter/issues/24"&gt;#24&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Useful for debugging and checking the details of your key's rate limit.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm -m ... -o online 1&lt;/code&gt; enables &lt;a href="https://openrouter.ai/docs/features/web-search"&gt;web search grounding&lt;/a&gt; against any model, powered by &lt;a href="https://exa.ai/"&gt;Exa&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/25"&gt;#25&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenRouter apparently make this feature available to every one of their supported models! They're using new-to-me &lt;a href="https://exa.ai/"&gt;Exa&lt;/a&gt; to power this feature, an AI-focused search engine startup who appear to have built their own index with their own crawlers (according to &lt;a href="https://docs.exa.ai/reference/faqs#how-often-is-the-index-updated"&gt;their FAQ&lt;/a&gt;). This feature is currently priced by OpenRouter at $4 per 1000 results, and since 5 results are returned for every prompt that's 2 cents per prompt.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm openrouter models&lt;/code&gt; command for listing details of the OpenRouter models, including a &lt;code&gt;--json&lt;/code&gt; option to get JSON and a &lt;code&gt;--free&lt;/code&gt; option to filter for just the free models. &lt;a href="https://github.com/simonw/llm-openrouter/issues/26"&gt;#26&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This offers a neat way to list the available models. There are examples of the output &lt;a href="https://github.com/simonw/llm-openrouter/issues/26#issuecomment-2711908704"&gt;in the comments on the issue&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New option to specify custom provider routing: &lt;code&gt;-o provider '{JSON here}'&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm-openrouter/issues/17"&gt;#17&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Part of OpenRouter's USP is that it can route prompts to different providers depending on factors like latency, cost or as a fallback if your first choice is unavailable - great for if you are using open weight models like Llama which are hosted by competing companies.&lt;/p&gt;
&lt;p&gt;The options they provide for routing are &lt;a href="https://openrouter.ai/docs/features/provider-routing"&gt;very thorough&lt;/a&gt; - I had initially hoped to provide a set of CLI options that covered all of these bases, but I decided instead to reuse their JSON format and forward those options directly on to the model.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="openrouter"/><category term="ai-assisted-search"/></entry><entry><title>Mistral OCR</title><link href="https://simonwillison.net/2025/Mar/7/mistral-ocr/#atom-tag" rel="alternate"/><published>2025-03-07T01:39:26+00:00</published><updated>2025-03-07T01:39:26+00:00</updated><id>https://simonwillison.net/2025/Mar/7/mistral-ocr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mistral.ai/fr/news/mistral-ocr"&gt;Mistral OCR&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New closed-source specialist OCR model by Mistral - you can feed it images or a PDF and it produces Markdown with optional embedded images.&lt;/p&gt;
&lt;p&gt;It's available &lt;a href="https://docs.mistral.ai/api/#tag/ocr"&gt;via their API&lt;/a&gt;, or it's "available to self-host on a selective basis" for people with stringent privacy requirements who are willing to talk to their sales team.&lt;/p&gt;
&lt;p&gt;I decided to try out their API, so I copied and pasted example code &lt;a href="https://colab.research.google.com/drive/11NdqWVwC_TtJyKT6cmuap4l9SryAeeVt?usp=sharing"&gt;from their notebook&lt;/a&gt; into my &lt;a href="https://simonwillison.net/2024/Dec/19/one-shot-python-tools/"&gt;custom Claude project&lt;/a&gt; and &lt;a href="https://claude.ai/share/153d8eb8-82dd-4f8c-a3d0-6c23b4dc21a2"&gt;told it&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Turn this into a CLI app, depends on mistralai - it should take a file path and an optional API key defauling to env vironment called MISTRAL_API_KEY&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After &lt;a href="https://claude.ai/share/b746cab4-293b-4e04-b662-858bb164ab78"&gt;some further&lt;/a&gt; iteration / vibe coding I got to something that worked, which I then tidied up and shared as &lt;a href="https://github.com/simonw/tools/blob/main/python/mistral_ocr.py"&gt;mistral_ocr.py&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try it out like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export MISTRAL_API_KEY='...'
uv run http://tools.simonwillison.net/python/mistral_ocr.py \
  mixtral.pdf --html --inline-images &amp;gt; mixtral.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I fed in &lt;a href="https://arxiv.org/abs/2401.04088"&gt;the Mixtral paper&lt;/a&gt; as a PDF. The API returns Markdown, but my &lt;code&gt;--html&lt;/code&gt; option renders that Markdown as HTML and the &lt;code&gt;--inline-images&lt;/code&gt; option takes any images and inlines them as base64 URIs (inspired &lt;a href="https://simonwillison.net/2025/Mar/6/monolith/"&gt;by monolith&lt;/a&gt;). The result is &lt;a href="https://static.simonwillison.net/static/2025/mixtral.html"&gt;mixtral.html&lt;/a&gt;, a 972KB HTML file with images and text bundled together.&lt;/p&gt;
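The base64 inlining trick is simple enough to sketch in a few lines - this is just the core idea of turning image bytes into a data: URI, not the actual mistral_ocr.py code:

```python
import base64

def data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data: URI."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")

# The first three bytes of any JPEG file, as a stand-in for a real image
uri = data_uri(b"\xff\xd8\xff")
```

The resulting string can be dropped straight into an img src attribute, which is how everything ends up bundled into a single HTML file.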
&lt;p&gt;This did a pretty great job!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of part of the document, it has a heading, some text, an image and the start of a table. The table contains some unrendered MathML syntax." src="https://static.simonwillison.net/static/2025/mixtral-as-html.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;My script renders Markdown tables but I haven't figured out how to render inline Markdown MathML yet. I ran the command a second time and requested Markdown output (the default) like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run http://tools.simonwillison.net/python/mistral_ocr.py \
  mixtral.pdf &amp;gt; mixtral.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/023d1cf403c1cd9f41801c85510aef21"&gt;that Markdown rendered as a Gist&lt;/a&gt; - there are a few MathML glitches so clearly the Mistral OCR MathML dialect and the GitHub Formatted Markdown dialect don't quite line up.&lt;/p&gt;
&lt;p&gt;My tool can also output raw JSON as an alternative to Markdown or HTML - full details &lt;a href="https://tools.simonwillison.net/python/#mistral_ocrpy"&gt;in the documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Mistral API is priced at roughly 1000 pages per dollar, with a 50% discount for batch usage.&lt;/p&gt;
&lt;p&gt;The big question with LLM-based OCR is always how well it copes with accidental instructions in the text (can you safely OCR a document full of prompting examples?) and how well it handles text it can't read.&lt;/p&gt;
&lt;p&gt;Mistral's Sophia Yang says it &lt;a href="https://x.com/sophiamyang/status/1897719199595720722"&gt;"should be robust"&lt;/a&gt; against following instructions in the text, and invited people to try and find counter-examples.&lt;/p&gt;
&lt;p&gt;Alexander Doria noted that &lt;a href="https://twitter.com/Dorialexander/status/1897702264543875535"&gt;Mistral OCR can hallucinate text&lt;/a&gt; when faced with handwriting that it cannot understand.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/sophiamyang/status/1897713370029068381"&gt;@sophiamyang&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="ocr"/><category term="pdf"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="claude"/><category term="mistral"/><category term="vision-llms"/><category term="uv"/></entry><entry><title>monolith</title><link href="https://simonwillison.net/2025/Mar/6/monolith/#atom-tag" rel="alternate"/><published>2025-03-06T15:37:48+00:00</published><updated>2025-03-06T15:37:48+00:00</updated><id>https://simonwillison.net/2025/Mar/6/monolith/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Y2Z/monolith"&gt;monolith&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Neat CLI tool built in Rust that can create a single packaged HTML file of a web page plus all of its dependencies.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cargo install monolith  # or: brew install monolith
monolith https://simonwillison.net/ &amp;gt; simonwillison.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That command produced &lt;a href="https://static.simonwillison.net/static/2025/simonwillison.html"&gt;this 1.5MB single file result&lt;/a&gt;. All of the linked images, CSS and JavaScript assets have had their contents inlined into base64 URIs in their &lt;code&gt;src=&lt;/code&gt; and &lt;code&gt;href=&lt;/code&gt; attributes.&lt;/p&gt;
&lt;p&gt;I was intrigued as to how it works, so I dumped the whole repository into Gemini 2.0 Pro and asked for an architectural summary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /tmp
git clone https://github.com/Y2Z/monolith
cd monolith
files-to-prompt . -c | llm -m gemini-2.0-pro-exp-02-05 \
  -s 'architectural overview as markdown'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/2c80749935ae3339d6f7175dc7cf325b"&gt;what I got&lt;/a&gt;. Short version: it uses the &lt;code&gt;reqwest&lt;/code&gt;, &lt;code&gt;html5ever&lt;/code&gt;, &lt;code&gt;markup5ever_rcdom&lt;/code&gt; and &lt;code&gt;cssparser&lt;/code&gt; crates to fetch and parse HTML and CSS and extract, combine and rewrite the assets. It doesn't currently attempt to run any JavaScript.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=42933383#42935115"&gt;Comment on Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="scraping"/><category term="ai"/><category term="rust"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="files-to-prompt"/></entry><entry><title>Aider: Using uv as an installer</title><link href="https://simonwillison.net/2025/Mar/6/aider-using-uv-as-an-installer/#atom-tag" rel="alternate"/><published>2025-03-06T01:47:20+00:00</published><updated>2025-03-06T01:47:20+00:00</updated><id>https://simonwillison.net/2025/Mar/6/aider-using-uv-as-an-installer/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://aider.chat/2025/01/15/uv.html"&gt;Aider: Using uv as an installer&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Paul Gauthier has an innovative solution for the challenge of helping end users get a copy of his Aider CLI Python utility installed in an isolated virtual environment without first needing to teach them what an "isolated virtual environment" is.&lt;/p&gt;
&lt;p&gt;Provided you already have a Python install of version 3.8 or higher you can run this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install aider-install &amp;amp;&amp;amp; aider-install
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://pypi.org/project/aider-install/"&gt;aider-install&lt;/a&gt; package itself depends on &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt;. When you run &lt;code&gt;aider-install&lt;/code&gt; it executes the following &lt;a href="https://github.com/Aider-AI/aider-install/blob/main/aider_install/main.py"&gt;Python code&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;install_aider&lt;/span&gt;():
    &lt;span class="pl-k"&gt;try&lt;/span&gt;:
        &lt;span class="pl-s1"&gt;uv_bin&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;uv&lt;/span&gt;.&lt;span class="pl-c1"&gt;find_uv_bin&lt;/span&gt;()
        &lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-c1"&gt;check_call&lt;/span&gt;([
            &lt;span class="pl-s1"&gt;uv_bin&lt;/span&gt;, &lt;span class="pl-s"&gt;"tool"&lt;/span&gt;, &lt;span class="pl-s"&gt;"install"&lt;/span&gt;, &lt;span class="pl-s"&gt;"--force"&lt;/span&gt;, &lt;span class="pl-s"&gt;"--python"&lt;/span&gt;, &lt;span class="pl-s"&gt;"python3.12"&lt;/span&gt;, &lt;span class="pl-s"&gt;"aider-chat@latest"&lt;/span&gt;
        ])
        &lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-c1"&gt;check_call&lt;/span&gt;([&lt;span class="pl-s1"&gt;uv_bin&lt;/span&gt;, &lt;span class="pl-s"&gt;"tool"&lt;/span&gt;, &lt;span class="pl-s"&gt;"update-shell"&lt;/span&gt;])
    &lt;span class="pl-k"&gt;except&lt;/span&gt; &lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-c1"&gt;CalledProcessError&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;e&lt;/span&gt;:
        &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"Failed to install aider: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;"&lt;/span&gt;)
        &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;exit&lt;/span&gt;(&lt;span class="pl-c1"&gt;1&lt;/span&gt;)&lt;/pre&gt;

&lt;p&gt;This first figures out the location of the &lt;code&gt;uv&lt;/code&gt; Rust binary, then uses it to install his &lt;a href="https://pypi.org/project/aider-chat/"&gt;aider-chat&lt;/a&gt; package by running the equivalent of this command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv tool install --force --python python3.12 aider-chat@latest
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will in turn install a brand new standalone copy of Python 3.12 and tuck it away in uv's own managed directory structure where it shouldn't hurt anything else.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;aider-chat&lt;/code&gt; script defaults to being dropped in the XDG standard directory, which is probably &lt;code&gt;~/.local/bin&lt;/code&gt; - see &lt;a href="https://docs.astral.sh/uv/concepts/tools/#the-bin-directory"&gt;uv's documentation&lt;/a&gt;. The &lt;a href="https://docs.astral.sh/uv/concepts/tools/#overwriting-executables"&gt;--force flag&lt;/a&gt; ensures that &lt;code&gt;uv&lt;/code&gt; will overwrite any previous attempts at installing &lt;code&gt;aider-chat&lt;/code&gt; in that location with the new one.&lt;/p&gt;
&lt;p&gt;Finally, running &lt;code&gt;uv tool update-shell&lt;/code&gt; ensures that bin directory is &lt;a href="https://docs.astral.sh/uv/concepts/tools/#the-path"&gt;on the user's PATH&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I &lt;em&gt;think&lt;/em&gt; I like this. There is a LOT of stuff going on here, and experienced users may well opt for an &lt;a href="https://aider.chat/docs/install.html"&gt;alternative installation mechanism&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But for non-expert Python users who just want to start using Aider, I think this pattern represents quite a tasteful way of getting everything working with minimal risk of breaking the user's system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Paul &lt;a href="https://twitter.com/paulgauthier/status/1897486573857595877"&gt;adds&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Offering this install method dramatically reduced the number of GitHub issues from users with conflicted/broken python environments.&lt;/p&gt;
&lt;p&gt;I also really like the "curl | sh" aider installer based on uv. Even users who don't have python installed can use it.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/aider"&gt;aider&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paul-gauthier"&gt;paul-gauthier&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="python"/><category term="aider"/><category term="uv"/><category term="paul-gauthier"/></entry><entry><title>strip-tags 0.6</title><link href="https://simonwillison.net/2025/Feb/28/strip-tags/#atom-tag" rel="alternate"/><published>2025-02-28T22:02:16+00:00</published><updated>2025-02-28T22:02:16+00:00</updated><id>https://simonwillison.net/2025/Feb/28/strip-tags/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/strip-tags/releases/tag/0.6"&gt;strip-tags 0.6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It's been a while since I updated this tool, but in investigating &lt;a href="https://github.com/simonw/llm/issues/808"&gt;a tricky mistake&lt;/a&gt; in my tutorial for LLM schemas I discovered &lt;a href="https://github.com/simonw/strip-tags/issues/32"&gt;a bug&lt;/a&gt; that I needed to fix.&lt;/p&gt;
&lt;p&gt;Those release notes in full:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Fixed a bug where &lt;code&gt;strip-tags -t meta&lt;/code&gt; still removed &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tags from the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; because the entire &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; element was removed first. &lt;a href="https://github.com/simonw/strip-tags/issues/32"&gt;#32&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kept &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tags now default to keeping their &lt;code&gt;content&lt;/code&gt; and &lt;code&gt;property&lt;/code&gt; attributes.&lt;/li&gt;
&lt;li&gt;The CLI &lt;code&gt;-m/--minify&lt;/code&gt; option now also removes any remaining blank lines. &lt;a href="https://github.com/simonw/strip-tags/issues/33"&gt;#33&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A new &lt;code&gt;strip_tags(remove_blank_lines=True)&lt;/code&gt; option can be used to achieve the same thing with the Python library function.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
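The effect of that remove_blank_lines option is easy to sketch - this is an illustration of the behaviour, not the library's actual implementation:

```python
def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())

stripped = "Headline\n\n   \nFirst paragraph\n\nSecond paragraph"
minified = remove_blank_lines(stripped)
```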
&lt;p&gt;Now I can do this and persist the &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tags for the article along with the stripped text content:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -s 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  strip-tags -t meta --minify
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/22902a75e2e73ca513231e1d8d0dac6e"&gt;the output from that command&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="html"/><category term="projects"/></entry><entry><title>Structured data extraction from unstructured content using LLM schemas</title><link href="https://simonwillison.net/2025/Feb/28/llm-schemas/#atom-tag" rel="alternate"/><published>2025-02-28T17:07:07+00:00</published><updated>2025-02-28T17:07:07+00:00</updated><id>https://simonwillison.net/2025/Feb/28/llm-schemas/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-23"&gt;LLM 0.23&lt;/a&gt; is out today, and the signature feature is support for &lt;strong&gt;&lt;a href="https://llm.datasette.io/en/stable/schemas.html"&gt;schemas&lt;/a&gt;&lt;/strong&gt; - a new way of providing structured output from a model that matches a specification provided by the user. I've also upgraded both the &lt;a href="https://github.com/simonw/llm-anthropic"&gt;llm-anthropic&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-gemini"&gt;llm-gemini&lt;/a&gt; plugins to add support for  schemas.&lt;/p&gt;
&lt;p&gt;TLDR: you can now do things like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm --schema &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;name,age int,short_bio&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;invent a cool dog&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And get back:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zylo&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"short_bio"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zylo is a unique hybrid breed, a mix between a Siberian Husky and a Corgi. With striking blue eyes and a fluffy, colorful coat that changes shades with the seasons, Zylo embodies the spirit of winter and summer alike. Known for his playful personality and intelligence, Zylo can perform a variety of tricks and loves to fetch his favorite frisbee. Always ready for an adventure, he's just as happy hiking in the mountains as he is cuddling on the couch after a long day of play.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;More details &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-23"&gt;in the release notes&lt;/a&gt; and &lt;a href="https://llm.datasette.io/en/stable/schemas.html#schemas-tutorial"&gt;LLM schemas tutorial&lt;/a&gt;, which includes an example (extracting people from news articles) that's even more useful than inventing dogs!&lt;/p&gt;



&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#structured-data-extraction-is-a-killer-app-for-llms"&gt;Structured data extraction is a killer app for LLMs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#designing-this-feature-for-llm"&gt;Designing this feature for LLM&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#reusing-schemas-and-creating-templates"&gt;Reusing schemas and creating templates&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#doing-more-with-the-logged-structured-data"&gt;Doing more with the logged structured data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#using-schemas-from-llm-s-python-library"&gt;Using schemas from LLM's Python library&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/28/llm-schemas/#what-s-next-for-llm-schemas-"&gt;What's next for LLM schemas?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="structured-data-extraction-is-a-killer-app-for-llms"&gt;Structured data extraction is a killer app for LLMs&lt;/h4&gt;
&lt;p&gt;I've suspected for a while that the single most commercially valuable application of LLMs is turning unstructured content into structured data. That's the trick where you feed an LLM an article, or a PDF, or a screenshot and use it to turn that into JSON or CSV or some other structured format.&lt;/p&gt;
&lt;p&gt;It's possible to achieve strong results on this with prompting alone: feed data into an LLM, give it an example of the output you would like and let it figure out the details.&lt;/p&gt;
&lt;p&gt;Many of the leading LLM providers now bake this in as a feature. OpenAI, Anthropic, Gemini and Mistral all offer variants of "structured output" as additional options through their API:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/structured-outputs"&gt;Structured Outputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gemini: &lt;a href="https://ai.google.dev/gemini-api/docs/structured-output?lang=rest"&gt;Generate structured output with the Gemini API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Mistral: &lt;a href="https://docs.mistral.ai/capabilities/structured-output/custom_structured_output/"&gt;Custom Structured Outputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic's &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview"&gt;tool use&lt;/a&gt; can be used for this, as shown in their &lt;a href="https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb"&gt;Extracting Structured JSON using Claude and Tool Use&lt;/a&gt; cookbook example.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These mechanisms are all very similar: you pass a &lt;a href="https://json-schema.org/"&gt;JSON schema&lt;/a&gt; to the model defining the shape that you would like, they then use that schema to guide the output of the model.&lt;/p&gt;
&lt;p&gt;How reliable that is can vary! Some providers use tricks along the lines of &lt;a href="https://github.com/1rgs/jsonformer"&gt;Jsonformer&lt;/a&gt;, compiling the JSON schema into code that interacts with the model's next-token generation at runtime, limiting it to only generate tokens that are valid in the context of the schema.&lt;/p&gt;
&lt;p&gt;Other providers YOLO it - they trust that their model is "good enough" that showing it the schema will produce the right results!&lt;/p&gt;
&lt;p&gt;In practice, this means that you need to be aware that sometimes this stuff will go wrong. As with anything LLM, 100% reliability is never guaranteed.&lt;/p&gt;
&lt;p&gt;From my experiments so far, and depending on the model you choose, these mistakes are rare. If you're using a top tier model it will almost certainly do the right thing.&lt;/p&gt;
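Given that, a cheap defensive habit is to check the parsed response against the expected shape before using it - a minimal sketch of my own, not something built into LLM:

```python
import json

def check_shape(raw: str, expected: dict) -> dict:
    """Parse a model response and verify each expected key has the right type."""
    data = json.loads(raw)
    for key, expected_type in expected.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"Bad or missing field: {key}")
    return data

response = '{"name": "Zylo", "age": 4, "short_bio": "A cool dog"}'
record = check_shape(response, {"name": str, "age": int, "short_bio": str})
```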
&lt;h4 id="designing-this-feature-for-llm"&gt;Designing this feature for LLM&lt;/h4&gt;
&lt;p&gt;I've wanted this feature for ages. I see it as an important step on the way to full tool usage, which is something I'm very excited to bring to the CLI tool and Python library.&lt;/p&gt;
&lt;p&gt;LLM is designed as an abstraction layer over different models. This makes building new features &lt;em&gt;much harder&lt;/em&gt;, because I need to figure out a common denominator and then build an abstraction that captures as much value as possible while still being general enough to work across multiple models.&lt;/p&gt;
&lt;p&gt;Support for structured output across multiple vendors has matured now to the point that I'm ready to commit to a design.&lt;/p&gt;
&lt;p&gt;My first version of this feature worked exclusively with JSON schemas. An earlier version of the tutorial started with this example:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl https://www.nytimes.com/ &lt;span class="pl-k"&gt;|&lt;/span&gt; uvx strip-tags &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  llm --schema &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;{&lt;/span&gt;
&lt;span class="pl-s"&gt;  "type": "object",&lt;/span&gt;
&lt;span class="pl-s"&gt;  "properties": {&lt;/span&gt;
&lt;span class="pl-s"&gt;    "items": {&lt;/span&gt;
&lt;span class="pl-s"&gt;      "type": "array",&lt;/span&gt;
&lt;span class="pl-s"&gt;      "items": {&lt;/span&gt;
&lt;span class="pl-s"&gt;        "type": "object",&lt;/span&gt;
&lt;span class="pl-s"&gt;        "properties": {&lt;/span&gt;
&lt;span class="pl-s"&gt;          "headline": {&lt;/span&gt;
&lt;span class="pl-s"&gt;            "type": "string"&lt;/span&gt;
&lt;span class="pl-s"&gt;          },&lt;/span&gt;
&lt;span class="pl-s"&gt;          "short_summary": {&lt;/span&gt;
&lt;span class="pl-s"&gt;            "type": "string"&lt;/span&gt;
&lt;span class="pl-s"&gt;          },&lt;/span&gt;
&lt;span class="pl-s"&gt;          "key_points": {&lt;/span&gt;
&lt;span class="pl-s"&gt;            "type": "array",&lt;/span&gt;
&lt;span class="pl-s"&gt;            "items": {&lt;/span&gt;
&lt;span class="pl-s"&gt;              "type": "string"&lt;/span&gt;
&lt;span class="pl-s"&gt;            }&lt;/span&gt;
&lt;span class="pl-s"&gt;          }&lt;/span&gt;
&lt;span class="pl-s"&gt;        },&lt;/span&gt;
&lt;span class="pl-s"&gt;        "required": ["headline", "short_summary", "key_points"]&lt;/span&gt;
&lt;span class="pl-s"&gt;      }&lt;/span&gt;
&lt;span class="pl-s"&gt;    }&lt;/span&gt;
&lt;span class="pl-s"&gt;  },&lt;/span&gt;
&lt;span class="pl-s"&gt;  "required": ["items"]&lt;/span&gt;
&lt;span class="pl-s"&gt;}&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;|&lt;/span&gt; jq&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here we're feeding a full JSON schema document to the new &lt;code&gt;llm --schema&lt;/code&gt; option, then piping in the homepage of the New York Times (after running it through &lt;a href="https://github.com/simonw/strip-tags"&gt;strip-tags&lt;/a&gt;) and asking for &lt;code&gt;headline&lt;/code&gt;, &lt;code&gt;short_summary&lt;/code&gt; and &lt;code&gt;key_points&lt;/code&gt; for multiple items on the page.&lt;/p&gt;
&lt;p&gt;This example still works with the finished feature - you can see &lt;a href="https://gist.github.com/simonw/372d11e2729a9745654740ff3f5669ab"&gt;example JSON output here&lt;/a&gt; - but constructing those long-form schemas by hand was a big pain.&lt;/p&gt;
&lt;p&gt;So... I invented my own shortcut syntax.&lt;/p&gt;
&lt;p&gt;That earlier example is a simple illustration:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm --schema &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;name,age int,short_bio&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;invent a cool dog&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here the schema is a comma-separated list of field names, with an optional space-separated type.&lt;/p&gt;
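A toy parser shows how little is going on in that shortcut - this is an illustration of the syntax, not LLM's actual parser (which also handles descriptions and more types):

```python
def parse_concise_schema(spec: str) -> dict:
    """Expand 'name,age int,short_bio' into a JSON schema object."""
    type_map = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}
    properties = {}
    for field in spec.split(","):
        parts = field.strip().split()
        name = parts[0]
        # Default to string when no type annotation is given
        json_type = type_map.get(parts[1], "string") if len(parts) > 1 else "string"
        properties[name] = {"type": json_type}
    return {"type": "object", "properties": properties, "required": list(properties)}

schema = parse_concise_schema("name,age int,short_bio")
```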
&lt;p&gt;The full concise schema syntax &lt;a href="https://llm.datasette.io/en/stable/schemas.html#concise-llm-schema-syntax"&gt;is described here&lt;/a&gt;. There's a more complex example &lt;a href="https://llm.datasette.io/en/latest/schemas.html#extracting-people-from-a-news-articles"&gt;in the tutorial&lt;/a&gt;, which uses the newline-delimited form to extract information about people who are mentioned in a news article:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  uvx strip-tags &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  llm --schema-multi &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;name: the person's name&lt;/span&gt;
&lt;span class="pl-s"&gt;organization: who they represent&lt;/span&gt;
&lt;span class="pl-s"&gt;role: their job title or role&lt;/span&gt;
&lt;span class="pl-s"&gt;learned: what we learned about them from this story&lt;/span&gt;
&lt;span class="pl-s"&gt;article_headline: the headline of the story&lt;/span&gt;
&lt;span class="pl-s"&gt;article_date: the publication date in YYYY-MM-DD&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --system &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;extract people mentioned in this article&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;--schema-multi&lt;/code&gt; option here tells LLM to take that schema for a single object and upgrade it to an array of those objects (actually an object with a single &lt;code&gt;"items"&lt;/code&gt; property that's an array of objects), which is a quick way to request that the same schema be returned multiple times against a single input.&lt;/p&gt;
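That wrapping step is mechanical - roughly this, sketching what --schema-multi produces from a single-object schema:

```python
def to_multi(schema: dict) -> dict:
    """Wrap a single-object schema in an object with an 'items' array property."""
    return {
        "type": "object",
        "properties": {"items": {"type": "array", "items": schema}},
        "required": ["items"],
    }

single = {"type": "object", "properties": {"name": {"type": "string"}}}
multi = to_multi(single)
```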
&lt;h4 id="reusing-schemas-and-creating-templates"&gt;Reusing schemas and creating templates&lt;/h4&gt;
&lt;p&gt;My original plan with schemas was to provide a separate &lt;code&gt;llm extract&lt;/code&gt; command for running these kinds of operations. I ended up going in a different direction - I realized that adding &lt;code&gt;--schema&lt;/code&gt; to the default &lt;code&gt;llm prompt&lt;/code&gt; command would make it interoperable with other existing features (like &lt;a href="https://llm.datasette.io/en/stable/usage.html#attachments"&gt;attachments&lt;/a&gt; for feeding in images and PDFs).&lt;/p&gt;
&lt;p&gt;The most valuable way to apply schemas is across many different prompts, in order to gather the same structure of information from many different sources.&lt;/p&gt;
&lt;p&gt;I put a bunch of thought into the &lt;code&gt;--schema&lt;/code&gt; option. It takes a variety of different values - quoting &lt;a href="https://llm.datasette.io/en/latest/schemas.html#ways-to-specify-a-schema"&gt;the documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This option can take multiple forms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A string providing a JSON schema: &lt;code&gt;--schema '{"type": "object", ...}'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://llm.datasette.io/en/stable/schemas.html#schemas-dsl"&gt;condensed schema definition&lt;/a&gt;: &lt;code&gt;--schema 'name,age int'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The name or path of a file on disk containing a JSON schema: &lt;code&gt;--schema dogs.schema.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The hexadecimal ID of a previously logged schema: &lt;code&gt;--schema 520f7aabb121afd14d0c6c237b39ba2d&lt;/code&gt; - these IDs can be found using the &lt;code&gt;llm schemas&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;A schema that has been &lt;a href="https://llm.datasette.io/en/latest/templates.html#prompt-templates-save"&gt;saved in a template&lt;/a&gt;: &lt;code&gt;--schema t:name-of-template&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
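&lt;p&gt;To illustrate the condensed DSL, here's a rough sketch of how a spec like &lt;code&gt;name,age int&lt;/code&gt; could expand into a JSON schema - a simplified approximation for illustration only, not LLM's actual parser:&lt;/p&gt;

```python
# Illustrative sketch: expand a condensed spec like "name,age int"
# into a JSON schema. This approximates the idea only - it is NOT
# LLM's real implementation of the condensed schema DSL.
def expand_condensed_schema(spec: str) -> dict:
    type_map = {"int": "integer", "float": "number", "bool": "boolean"}
    properties = {}
    for field in spec.split(","):
        parts = field.strip().split()
        name = parts[0]
        # Fields default to string unless a type hint follows the name
        json_type = type_map.get(parts[1], "string") if len(parts) > 1 else "string"
        properties[name] = {"type": json_type}
    return {"type": "object", "properties": properties, "required": list(properties)}

print(expand_condensed_schema("name,age int"))
# {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}}, 'required': ['name', 'age']}
```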
&lt;p&gt;The &lt;a href="https://llm.datasette.io/en/latest/schemas.html#extracting-people-from-a-news-articles"&gt;tutorial&lt;/a&gt; demonstrates using a schema once, obtaining its ID through the new &lt;code&gt;llm schemas&lt;/code&gt; command, and then saving it to a &lt;a href="https://llm.datasette.io/en/stable/templates.html"&gt;template&lt;/a&gt; (along with the system prompt) like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm --schema 3b7702e71da3dd791d9e17b76c88730e \
  --system &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;extract people mentioned in this article&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  --save people&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And now we can feed in new articles using the &lt;code&gt;llm -t people&lt;/code&gt; shortcut to apply that newly saved template:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl https://www.theguardian.com/commentisfree/2025/feb/27/billy-mcfarland-new-fyre-festival-fantasist &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  strip-tags &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -t people&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="doing-more-with-the-logged-structured-data"&gt;Doing more with the logged structured data&lt;/h4&gt;
&lt;p&gt;Once you've run a few prompts that use the same schema, an obvious next step is to do something with the data that has been collected.&lt;/p&gt;
&lt;p&gt;I ended up implementing this on top of the existing &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;llm logs&lt;/a&gt; mechanism.&lt;/p&gt;
&lt;p&gt;LLM already defaults to logging every prompt and response it makes to a SQLite database - mine contains 4,747 of these records now, according to this query:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite3 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm logs path&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;select count(*) from responses&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With schemas, an increasing portion of those are valid JSON.&lt;/p&gt;
&lt;p&gt;Since LLM records the schema that was used for each response - using the schema ID, which is derived from a content hash of the expanded JSON schema - it's now possible to ask LLM for all responses that used a particular schema:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --schema 3b7702e71da3dd791d9e17b76c88730e --short&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I got back:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gpt-4o-mini&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-28T07:37:18&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jn5qt397aaxskf1vjp6zxw2a&lt;/span&gt;
  &lt;span class="pl-ent"&gt;system&lt;/span&gt;: &lt;span class="pl-s"&gt;extract people mentioned in this article&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;Menu AP Logo Menu World U.S. Politics Sports Entertainment Business Science&lt;/span&gt;
    &lt;span class="pl-s"&gt;Fact Check Oddities Be Well Newsletters N...&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gpt-4o-mini&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-28T07:38:58&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jn5qx4q5he7yq803rnexp28p&lt;/span&gt;
  &lt;span class="pl-ent"&gt;system&lt;/span&gt;: &lt;span class="pl-s"&gt;extract people mentioned in this article&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;Skip to main contentSkip to navigationSkip to navigationPrint subscriptionsNewsletters&lt;/span&gt;
    &lt;span class="pl-s"&gt;Sign inUSUS editionUK editionA...&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gpt-4o&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-28T07:39:07&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jn5qxh20tksb85tf3bx2m3bd&lt;/span&gt;
  &lt;span class="pl-ent"&gt;system&lt;/span&gt;: &lt;span class="pl-s"&gt;extract people mentioned in this article&lt;/span&gt;
  &lt;span class="pl-ent"&gt;attachments&lt;/span&gt;:
  - &lt;span class="pl-ent"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;image/jpeg&lt;/span&gt;
    &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://static.simonwillison.net/static/2025/onion-zuck.jpg&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, I've run that example schema three times (while constructing the tutorial) - twice using GPT-4o mini against text content from &lt;code&gt;curl ... | strip-tags&lt;/code&gt; and once using GPT-4o against &lt;a href="https://static.simonwillison.net/static/2025/onion-zuck.jpg"&gt;a screenshot JPEG&lt;/a&gt; to demonstrate attachment support.&lt;/p&gt;
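&lt;p&gt;Those schema IDs are stable because they're content hashes: hashing a canonical serialization of the expanded JSON schema produces the same ID every time. Here's a minimal sketch of the idea - the choice of hash function and the canonicalization details are assumptions here, not necessarily what LLM itself does:&lt;/p&gt;

```python
import hashlib
import json

def schema_id(schema: dict) -> str:
    # Serialize with sorted keys so logically identical schemas
    # always produce the same hash. (The canonicalization and use
    # of MD5 are assumptions - LLM's implementation may differ.)
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Key order doesn't matter - both produce the same 32-character hex ID
a = schema_id({"type": "object", "properties": {"name": {"type": "string"}}})
b = schema_id({"properties": {"name": {"type": "string"}}, "type": "object"})
print(a == b, len(a))  # True 32
```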
&lt;p&gt;Extracting gathered JSON from the logs is clearly a useful next step... so I added several options to &lt;code&gt;llm logs&lt;/code&gt; to support that use case.&lt;/p&gt;
&lt;p&gt;The first is &lt;code&gt;--data&lt;/code&gt; - adding that will cause &lt;code&gt;llm logs&lt;/code&gt; to output just the data that was gathered using a schema. Mix that with &lt;code&gt;-c&lt;/code&gt; to see the JSON from the most recent response:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs -c --data&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Outputs:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{&lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zap&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: &lt;span class="pl-c1"&gt;5&lt;/span&gt;, &lt;span class="pl-ent"&gt;"short_bio"&lt;/span&gt;: ...&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Combining that with the &lt;code&gt;--schema&lt;/code&gt; option is where things get really interesting. You can specify a schema using any of the mechanisms described earlier, which means you can see ALL of the data gathered using that schema by combining &lt;code&gt;--data&lt;/code&gt; with &lt;code&gt;--schema X&lt;/code&gt; (and &lt;code&gt;-n 0&lt;/code&gt; for everything).&lt;/p&gt;
&lt;p&gt;Here are all of the dogs I've invented:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --schema &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;name,age int,short_bio&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --data -n 0&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Output (here truncated):&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{&lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zap&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: &lt;span class="pl-c1"&gt;5&lt;/span&gt;, &lt;span class="pl-ent"&gt;"short_bio"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zap is a futuristic ...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zephyr&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: &lt;span class="pl-c1"&gt;3&lt;/span&gt;, &lt;span class="pl-ent"&gt;"short_bio"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zephyr is an adventurous...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zylo&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: &lt;span class="pl-c1"&gt;4&lt;/span&gt;, &lt;span class="pl-ent"&gt;"short_bio"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Zylo is a unique ...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some schemas gather multiple items, producing output that looks like this (from the tutorial):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{"items": [{"name": "Mark Zuckerberg", "organization": "...
{"items": [{"name": "Billy McFarland", "organization": "...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can get back the individual objects by adding &lt;code&gt;--data-key items&lt;/code&gt;. Here I'm also using the &lt;code&gt;--schema t:people&lt;/code&gt; shortcut to specify the schema that was saved to the &lt;code&gt;people&lt;/code&gt; template earlier on.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --schema t:people --data-key items&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{"name": "Katy Perry", "organization": ...
{"name": "Gayle King", "organization": ...
{"name": "Lauren Sanchez", "organization": ...
&lt;/code&gt;&lt;/pre&gt;
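&lt;p&gt;Conceptually, &lt;code&gt;--data-key items&lt;/code&gt; just pulls the nested objects out of each newline-delimited record. A quick Python sketch of that transformation (illustrative only, not LLM's implementation):&lt;/p&gt;

```python
import json

# Two newline-delimited responses, each with an "items" key
# (names taken from the example output above)
ndjson = (
    '{"items": [{"name": "Katy Perry"}, {"name": "Gayle King"}]}\n'
    '{"items": [{"name": "Lauren Sanchez"}]}'
)

# Equivalent of --data-key items: flatten to one object per line
records = []
for line in ndjson.splitlines():
    records.extend(json.loads(line)["items"])

for record in records:
    print(json.dumps(record))
# {"name": "Katy Perry"}
# {"name": "Gayle King"}
# {"name": "Lauren Sanchez"}
```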
&lt;p&gt;This feature defaults to outputting newline-delimited JSON, but you can add the &lt;code&gt;--data-array&lt;/code&gt; flag to get back a JSON array of objects instead.&lt;/p&gt;
&lt;p&gt;... which means you can pipe it into &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data"&gt;sqlite-utils insert&lt;/a&gt; to create a SQLite database!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --schema t:people --data-key items --data-array &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  sqlite-utils insert data.db people -&lt;/pre&gt;&lt;/div&gt;
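&lt;p&gt;If you don't have &lt;code&gt;sqlite-utils&lt;/code&gt; to hand, the insert step can be approximated with Python's standard library &lt;code&gt;sqlite3&lt;/code&gt; module - though you have to create the table yourself, which &lt;code&gt;sqlite-utils&lt;/code&gt; handles automatically. A sketch using made-up dog records shaped like the earlier schema:&lt;/p&gt;

```python
import sqlite3

# Hypothetical records shaped like the "name,age int,short_bio" schema
dogs = [
    {"name": "Zap", "age": 5, "short_bio": "A futuristic dog"},
    {"name": "Zephyr", "age": 3, "short_bio": "An adventurous dog"},
]

conn = sqlite3.connect(":memory:")  # use "data.db" for a file on disk
conn.execute("create table dogs (name text, age integer, short_bio text)")
conn.executemany(
    "insert into dogs values (:name, :age, :short_bio)", dogs
)
rows = conn.execute("select name, age from dogs order by age").fetchall()
print(rows)  # [('Zephyr', 3), ('Zap', 5)]
```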
&lt;p&gt;Add all of this together and we can construct a schema, run it against a bunch of sources and dump the resulting structured data into SQLite where we can explore it using SQL queries (and &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;). It's a really powerful combination.&lt;/p&gt;
&lt;h4 id="using-schemas-from-llm-s-python-library"&gt;Using schemas from LLM's Python library&lt;/h4&gt;
&lt;p&gt;The most popular way to work with schemas in Python these days is with &lt;a href="https://docs.pydantic.dev/"&gt;Pydantic&lt;/a&gt;, to the point that many of the official API libraries for models directly incorporate Pydantic for this purpose.&lt;/p&gt;
&lt;p&gt;LLM depended on Pydantic already, and for this project I finally dropped my dual support for Pydantic v1 and v2 and &lt;a href="https://github.com/simonw/llm/pull/775"&gt;committed to v2 only&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A key reason Pydantic is popular for this is that it's trivial to use it to build a JSON schema document:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pydantic&lt;/span&gt;, &lt;span class="pl-s1"&gt;json&lt;/span&gt;

&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Dog&lt;/span&gt;(&lt;span class="pl-s1"&gt;pydantic&lt;/span&gt;.&lt;span class="pl-c1"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;age&lt;/span&gt;: &lt;span class="pl-smi"&gt;int&lt;/span&gt;
    &lt;span class="pl-s1"&gt;bio&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;

&lt;span class="pl-s1"&gt;schema&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Dog&lt;/span&gt;.&lt;span class="pl-c1"&gt;model_json_schema&lt;/span&gt;()
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;json&lt;/span&gt;.&lt;span class="pl-c1"&gt;dumps&lt;/span&gt;(&lt;span class="pl-s1"&gt;schema&lt;/span&gt;, &lt;span class="pl-s1"&gt;indent&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;2&lt;/span&gt;))&lt;/pre&gt;
&lt;p&gt;Outputs:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"properties"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"title"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Name&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;string&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    &lt;span class="pl-ent"&gt;"age"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"title"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Age&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;integer&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    &lt;span class="pl-ent"&gt;"bio"&lt;/span&gt;: {
      &lt;span class="pl-ent"&gt;"title"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Bio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;string&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    }
  },
  &lt;span class="pl-ent"&gt;"required"&lt;/span&gt;: [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;name&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;age&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;bio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  &lt;span class="pl-ent"&gt;"title"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Dog&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;object&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
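&lt;p&gt;One nice property of having the schema as plain JSON is that you can sanity-check model output against it without extra dependencies. Here's a deliberately minimal validator sketch for flat object schemas like the one above - real code should use a library such as &lt;code&gt;jsonschema&lt;/code&gt; instead:&lt;/p&gt;

```python
# Minimal validation sketch for flat object schemas like the Dog
# schema above. Deliberately incomplete - use the jsonschema
# library for real validation.
TYPE_CHECKS = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate(data: dict, schema: dict) -> list:
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required property: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPE_CHECKS[spec["type"]]):
            errors.append(f"{key} should be {spec['type']}")
    return errors

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "bio": {"type": "string"},
    },
    "required": ["name", "age", "bio"],
}
print(validate({"name": "Cleo", "age": "eleven", "bio": "A good dog"}, schema))
# ['age should be integer']
```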
&lt;p&gt;LLM's Python library doesn't require you to use Pydantic, but it supports passing either a Pydantic &lt;code&gt;BaseModel&lt;/code&gt; subclass or a full JSON schema to the new &lt;code&gt;model.prompt(schema=)&lt;/code&gt; parameter. Here's &lt;a href="https://llm.datasette.io/en/latest/python-api.html#schemas"&gt;the usage example&lt;/a&gt; from the documentation:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;, &lt;span class="pl-s1"&gt;json&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pydantic&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;

&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Dog&lt;/span&gt;(&lt;span class="pl-v"&gt;BaseModel&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;: &lt;span class="pl-smi"&gt;str&lt;/span&gt;
    &lt;span class="pl-s1"&gt;age&lt;/span&gt;: &lt;span class="pl-smi"&gt;int&lt;/span&gt;

&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_model&lt;/span&gt;(&lt;span class="pl-s"&gt;"gpt-4o-mini"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-c1"&gt;prompt&lt;/span&gt;(&lt;span class="pl-s"&gt;"Describe a nice dog"&lt;/span&gt;, &lt;span class="pl-s1"&gt;schema&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-v"&gt;Dog&lt;/span&gt;)
&lt;span class="pl-s1"&gt;dog&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;json&lt;/span&gt;.&lt;span class="pl-c1"&gt;loads&lt;/span&gt;(&lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-c1"&gt;text&lt;/span&gt;())
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;dog&lt;/span&gt;)
&lt;span class="pl-c"&gt;# {"name":"Buddy","age":3}&lt;/span&gt;&lt;/pre&gt;
&lt;h4 id="what-s-next-for-llm-schemas-"&gt;What's next for LLM schemas?&lt;/h4&gt;
&lt;p&gt;So far I've implemented schema support for models from OpenAI, Anthropic and Gemini. The &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#supporting-schemas"&gt;plugin author documentation&lt;/a&gt; includes details on how to add this to further plugins - I'd love to see one of the local model plugins implement this pattern as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: &lt;a href="https://github.com/taketwo/llm-ollama"&gt;llm-ollama&lt;/a&gt; now supports schemas thanks to &lt;a href="https://github.com/taketwo/llm-ollama/pull/36"&gt;this PR&lt;/a&gt; by Adam Compton. And I've added support &lt;a href="https://simonwillison.net/2025/Mar/4/llm-mistral-011/"&gt;to llm-mistral&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm presenting a workshop at the &lt;a href="https://www.ire.org/training/conferences/nicar-2025/"&gt;NICAR 2025&lt;/a&gt; data journalism conference next week about &lt;a href="https://github.com/simonw/nicar-2025-scraping/"&gt;Cutting-edge web scraping techniques&lt;/a&gt;. LLM schemas is a great example of NDD - NICAR-Driven Development - where I'm churning out features I need for that conference (see also shot-scraper's new &lt;a href="https://shot-scraper.datasette.io/en/stable/har.html"&gt;HAR support&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I expect the workshop will be a great opportunity to further refine the design and implementation of this feature!&lt;/p&gt;
&lt;p&gt;I'm also going to be using this new feature to add multiple model support to my &lt;a href="https://www.datasette.cloud/blog/2024/datasette-extract/"&gt;datasette-extract plugin&lt;/a&gt;, which provides a web UI for structured data extraction that writes the resulting records directly to a SQLite database table.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="data-journalism"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="mistral"/><category term="gemini"/><category term="ollama"/><category term="structured-extraction"/></entry><entry><title>Claude 3.7 Sonnet and Claude Code</title><link href="https://simonwillison.net/2025/Feb/24/claude-37-sonnet-and-claude-code/#atom-tag" rel="alternate"/><published>2025-02-24T20:25:39+00:00</published><updated>2025-02-24T20:25:39+00:00</updated><id>https://simonwillison.net/2025/Feb/24/claude-37-sonnet-and-claude-code/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/news/claude-3-7-sonnet"&gt;Claude 3.7 Sonnet and Claude Code&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Anthropic released &lt;strong&gt;Claude 3.7 Sonnet&lt;/strong&gt; today - skipping the name "Claude 3.6" because the Anthropic user community had already started using that as the unofficial name for their &lt;a href="https://www.anthropic.com/news/3-5-models-and-computer-use"&gt;October update to 3.5 Sonnet&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you may expect, 3.7 Sonnet is an improvement over 3.5 Sonnet - and is priced the same, at $3/million tokens for input and $15/million for output.&lt;/p&gt;
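&lt;p&gt;A quick back-of-envelope calculation at those prices:&lt;/p&gt;

```python
# Back-of-envelope cost for Claude 3.7 Sonnet at $3/million input
# tokens and $15/million output tokens (the token counts below are
# made-up example values).
def cost_usd(input_tokens, output_tokens):
    return input_tokens / 1_000_000 * 3 + output_tokens / 1_000_000 * 15

# e.g. a 10,000-token prompt producing a 2,000-token response:
print(f"${cost_usd(10_000, 2_000):.3f}")  # $0.060
```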
&lt;p&gt;The big difference is that this is Anthropic's first "reasoning" model - applying the same trick that we've now seen from OpenAI o1 and o3, Grok 3, Google Gemini 2.0 Thinking, DeepSeek R1 and Qwen's QwQ and QvQ. The only big model families without an official reasoning model now are Mistral and Meta's Llama.&lt;/p&gt;
&lt;p&gt;I'm still working on &lt;a href="https://github.com/simonw/llm-anthropic/pull/15"&gt;adding support to my llm-anthropic plugin&lt;/a&gt; but I've got enough working code that I was able to get it to draw me a pelican riding a bicycle. Here's the non-reasoning model:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://static.simonwillison.net/static/2025/pelican-claude-3.7-sonnet.svg" alt="A very good attempt"&gt;&lt;/p&gt;

&lt;p&gt;And here's that same prompt but with "thinking mode" enabled:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://static.simonwillison.net/static/2025/pelican-claude-3.7-sonnet-thinking.svg" alt="A very good attempt"&gt;&lt;/p&gt;

&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/9c2d119f815b4a6c3802ab591857bf40"&gt;the transcript&lt;/a&gt; for that second one, which mixes together the thinking and the output tokens. I'm still working through how best to differentiate between those two types of token.&lt;/p&gt;
&lt;p&gt;Claude 3.7 Sonnet has a training cut-off date of Oct 2024 - an improvement on 3.5 Haiku's July 2024 - and can output up to 64,000 tokens in thinking mode (some of which are used for thinking tokens) and up to 128,000 if you enable &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#extended-output-capabilities-beta"&gt;a special header&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude 3.7 Sonnet can produce substantially longer responses than previous models with support for up to 128K output tokens (beta)---more than 15x longer than other Claude models. This expanded capability is particularly effective for extended thinking use cases involving complex reasoning, rich code generation, and comprehensive content creation.&lt;/p&gt;
&lt;p&gt;This feature can be enabled by passing an &lt;code&gt;anthropic-beta&lt;/code&gt; header of &lt;code&gt;output-128k-2025-02-19&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Anthropic's other big release today is a preview of &lt;strong&gt;Claude Code&lt;/strong&gt; - a CLI tool for interacting with Claude that includes the ability to prompt Claude in a terminal chat, have it read and modify files, and execute commands. This means it can both iterate on code and execute tests, making it an extremely powerful "agent" for coding assistance.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview"&gt;Anthropic's documentation&lt;/a&gt; on getting started with Claude Code, which uses OAuth (a first for Anthropic's API) to authenticate against your API account, so you'll need to configure billing.&lt;/p&gt;
&lt;p&gt;Short version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
claude
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It can burn a lot of tokens, so don't be surprised if a lengthy session with it adds up to single-digit dollars of API spend.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/oauth"&gt;oauth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="oauth"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="anthropic"/><category term="claude"/><category term="ai-agents"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>files-to-prompt 0.6</title><link href="https://simonwillison.net/2025/Feb/19/files-to-prompt/#atom-tag" rel="alternate"/><published>2025-02-19T06:12:12+00:00</published><updated>2025-02-19T06:12:12+00:00</updated><id>https://simonwillison.net/2025/Feb/19/files-to-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/files-to-prompt/releases/tag/0.6"&gt;files-to-prompt 0.6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New release of my CLI tool for turning a whole directory of code into a single prompt ready to pipe or paste into an LLM.&lt;/p&gt;
&lt;p&gt;Here are the full release notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;ul&gt;&lt;li&gt;New &lt;code&gt;-m/--markdown&lt;/code&gt; option for outputting results as Markdown with each file in a fenced code block. &lt;a href="https://github.com/simonw/files-to-prompt/issues/42"&gt;#42&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Support for reading a list of files from standard input. Thanks, &lt;a href="https://github.com/thelastnode"&gt;Ankit Shankar&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/issues/44"&gt;#44&lt;/a&gt;&lt;br&gt;
  Here's how to process just files modified within the last day:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;find . -mtime -1 | files-to-prompt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also use the &lt;code&gt;-0/--null&lt;/code&gt; flag to accept lists of file paths separated by null delimiters, which is useful for handling file names with spaces in them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;find . -name "*.txt" -print0 | files-to-prompt -0
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;I also have a potential fix for a reported bug concerning nested &lt;code&gt;.gitignore&lt;/code&gt; files that's currently &lt;a href="https://github.com/simonw/files-to-prompt/pull/45"&gt;sitting in a PR&lt;/a&gt;. I'm waiting for someone else to confirm that it behaves as they would expect. I've left &lt;a href="https://github.com/simonw/files-to-prompt/issues/40#issuecomment-2667571418"&gt;details in this issue comment&lt;/a&gt;, but the short version is that you can try out the version from the PR using this &lt;code&gt;uvx&lt;/code&gt; incantation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx --with git+https://github.com/simonw/files-to-prompt@nested-gitignore files-to-prompt
&lt;/code&gt;&lt;/pre&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="annotated-release-notes"/><category term="llms"/><category term="uv"/><category term="files-to-prompt"/></entry><entry><title>LLM 0.22, the annotated release notes</title><link href="https://simonwillison.net/2025/Feb/17/llm/#atom-tag" rel="alternate"/><published>2025-02-17T06:19:00+00:00</published><updated>2025-02-17T06:19:00+00:00</updated><id>https://simonwillison.net/2025/Feb/17/llm/#atom-tag</id><summary type="html">
    &lt;p&gt;I released &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-22"&gt;LLM 0.22&lt;/a&gt; this evening. Here are the &lt;a href="https://simonwillison.net/tags/annotated-release-notes/"&gt;annotated release notes&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="#model-prompt-key-for-api-keys"&gt;model.prompt(..., key=) for API keys&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#chatgpt-4o-latest"&gt;chatgpt-4o-latest&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#llm-logs-s-short"&gt;llm logs -s/--short&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#llm-models-q-gemini-q-exp"&gt;llm models -q gemini -q exp&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#llm-embed-multi-prepend-x"&gt;llm embed-multi --prepend X&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#everything-else"&gt;Everything else&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="model-prompt-key-for-api-keys"&gt;model.prompt(..., key=) for API keys&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Plugins that provide models that use API keys can now subclass the new &lt;code&gt;llm.KeyModel&lt;/code&gt; and &lt;code&gt;llm.AsyncKeyModel&lt;/code&gt; classes. This results in the API key being passed as a new &lt;code&gt;key&lt;/code&gt; parameter to their &lt;code&gt;.execute()&lt;/code&gt; methods, and means that Python users can pass a key as the &lt;code&gt;model.prompt(..., key=)&lt;/code&gt; - see &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-models-api-keys"&gt;Passing an API key&lt;/a&gt;. Plugin developers should consult the new documentation on writing &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#advanced-model-plugins-api-keys"&gt;Models that accept API keys&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/744"&gt;#744&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the big change. It's only relevant to you if you use LLM as a Python library &lt;em&gt;and&lt;/em&gt; you need to pass in API keys for OpenAI, Anthropic, Gemini etc. yourself in Python code rather than setting them as environment variables.&lt;/p&gt;
&lt;p&gt;It turns out I need to do that for Datasette Cloud, where API keys are retrieved from individual customers' secret stores!&lt;/p&gt;
&lt;p&gt;Thanks to this change, it's now possible to do things like this - the &lt;code&gt;key=&lt;/code&gt; parameter to &lt;code&gt;model.prompt()&lt;/code&gt; is new:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;
&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_model&lt;/span&gt;(&lt;span class="pl-s"&gt;"gpt-4o-mini"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-c1"&gt;prompt&lt;/span&gt;(&lt;span class="pl-s"&gt;"Surprise me!"&lt;/span&gt;, &lt;span class="pl-s1"&gt;key&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"my-api-key"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-c1"&gt;text&lt;/span&gt;())&lt;/pre&gt;
&lt;p&gt;Other plugins need to be updated to take advantage of this new feature. Here's &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#models-that-accept-api-keys"&gt;the documentation for plugin developers&lt;/a&gt; - I've released &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.13"&gt;llm-anthropic 0.13&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-gemini/releases/tag/0.11"&gt;llm-gemini 0.11&lt;/a&gt; implementing the new pattern.&lt;/p&gt;
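The plugin-side mechanics are covered by the linked documentation; as a rough self-contained illustration of the pattern (these are toy classes and hypothetical names, not the real `llm.KeyModel` API), the idea is that a key resolved from the `key=` argument, falling back to an environment variable, gets handed straight through to `.execute()`:

```python
# Simplified sketch of the "key model" pattern - not the real llm classes.
# The explicit key= argument wins; otherwise an environment variable is used.
import os


class ToyKeyModel:
    needs_key = "example"           # hypothetical key name
    key_env_var = "EXAMPLE_API_KEY"  # hypothetical environment variable

    def prompt(self, text, key=None):
        # Resolve the key: explicit argument first, then the environment
        resolved = key or os.environ.get(self.key_env_var)
        if resolved is None:
            raise ValueError("No API key provided")
        return self.execute(text, key=resolved)

    def execute(self, text, key):
        # A real plugin would call the provider's API here using `key`
        return f"echo({text}) using key ending ...{key[-4:]}"


model = ToyKeyModel()
print(model.prompt("Surprise me!", key="my-api-key"))
```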
&lt;h4 id="chatgpt-4o-latest"&gt;chatgpt-4o-latest&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New OpenAI model: &lt;code&gt;chatgpt-4o-latest&lt;/code&gt;. This model ID accesses the current model being used to power ChatGPT, which can change without warning. &lt;a href="https://github.com/simonw/llm/issues/752"&gt;#752&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This model has actually been around since &lt;a href="https://twitter.com/openaidevs/status/1823510395619000525"&gt;August 2024&lt;/a&gt; but I had somehow missed it. &lt;code&gt;chatgpt-4o-latest&lt;/code&gt; is a model alias that provides access to the current model that is being used for GPT-4o running on ChatGPT, which is &lt;em&gt;not&lt;/em&gt; the same as the GPT-4o models usually available via the API. It got &lt;a href="https://twitter.com/edwinarbus/status/1890841371675619728"&gt;an upgrade&lt;/a&gt; last week so it's currently the alias that provides access to the most recently released OpenAI model.&lt;/p&gt;
&lt;p&gt;Most OpenAI models such as &lt;code&gt;gpt-4o&lt;/code&gt; provide stable date-based aliases like &lt;code&gt;gpt-4o-2024-08-06&lt;/code&gt; which effectively let you "pin" to that exact model version. OpenAI technical staff &lt;a href="https://twitter.com/zedlander/status/1890937885848715443"&gt;have confirmed&lt;/a&gt; that they don't change the model without updating that name.&lt;/p&gt;
&lt;p&gt;The one exception is &lt;code&gt;chatgpt-4o-latest&lt;/code&gt; - that one can change without warning and doesn't appear to have release notes at all.&lt;/p&gt;
&lt;p&gt;It's also a little more expensive than &lt;code&gt;gpt-4o&lt;/code&gt; - currently priced at $5/million tokens for input and $15/million for output, compared to GPT-4o's $2.50/$10.&lt;/p&gt;
&lt;p&gt;It's a fun model to play with though! As of last week it appears to be very chatty and keen on &lt;a href="https://github.com/simonw/llm/issues/752#issuecomment-2661184024"&gt;using emoji&lt;/a&gt;. It also claims that it has a July 2024 training cut-off.&lt;/p&gt;
&lt;h4 id="llm-logs-s-short"&gt;llm logs -s/--short&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;llm logs -s/--short&lt;/code&gt; flag, which returns a greatly shortened version of the matching log entries in YAML format with a truncated prompt and without including the response. &lt;a href="https://github.com/simonw/llm/issues/737"&gt;#737&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;code&gt;llm logs&lt;/code&gt; command lets you search through logged prompt-response pairs - I have 4,419 of them in my database, according to this command:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils tables &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm logs path&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --counts  &lt;span class="pl-k"&gt;|&lt;/span&gt; grep responses&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By default it outputs the full prompts and responses as Markdown - and since I've started leaning more into long context models (&lt;a href="https://simonwillison.net/2025/Feb/14/files-to-prompt/"&gt;some recent examples&lt;/a&gt;) my logs have been getting pretty hard to navigate.&lt;/p&gt;
&lt;p&gt;The new &lt;code&gt;-s/--short&lt;/code&gt; flag provides a much more concise YAML format. Here are some of my recent prompts that I've run using Google's Gemini 2.0 Pro experimental model - the &lt;code&gt;-u&lt;/code&gt; flag includes usage statistics, and &lt;code&gt;-n 4&lt;/code&gt; limits the output to the most recent 4 entries:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --short -m gemini-2.0-pro-exp-02-05 -u -n 4&lt;/pre&gt;&lt;/div&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gemini-2.0-pro-exp-02-05&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-13T22:30:48&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jm0q045fqp5xy5pn4j1bfbxs&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;documents&amp;gt; &amp;lt;document index="1"&amp;gt; &amp;lt;source&amp;gt;./index.md&amp;lt;/source&amp;gt; &amp;lt;document_content&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    # uv An extremely fast Python package...&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;usage&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;input&lt;/span&gt;: &lt;span class="pl-c1"&gt;281812&lt;/span&gt;
    &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-c1"&gt;1521&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gemini-2.0-pro-exp-02-05&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-13T22:32:29&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jm0q045fqp5xy5pn4j1bfbxs&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;I want to set it globally so if I run uv run python anywhere on my computer&lt;/span&gt;
    &lt;span class="pl-s"&gt;I always get 3.13&lt;/span&gt;
  &lt;span class="pl-ent"&gt;usage&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;input&lt;/span&gt;: &lt;span class="pl-c1"&gt;283369&lt;/span&gt;
    &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-c1"&gt;1540&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gemini-2.0-pro-exp-02-05&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-14T23:23:57&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jm3cek8eb4z8tkqhf4trk98b&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;documents&amp;gt; &amp;lt;document index="1"&amp;gt; &amp;lt;source&amp;gt;./LORA.md&amp;lt;/source&amp;gt; &amp;lt;document_content&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    # Fine-Tuning with LoRA or QLoRA You c...&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;usage&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;input&lt;/span&gt;: &lt;span class="pl-c1"&gt;162885&lt;/span&gt;
    &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-c1"&gt;2558&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gemini-2.0-pro-exp-02-05&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2025-02-14T23:30:13&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;conversation&lt;/span&gt;: &lt;span class="pl-s"&gt;01jm3csstrfygp35rk0y1w3rfc&lt;/span&gt;
  &lt;span class="pl-ent"&gt;prompt&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;documents&amp;gt; &amp;lt;document index="1"&amp;gt; &amp;lt;source&amp;gt;huggingface_hub/__init__.py&amp;lt;/source&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    &amp;lt;document_content&amp;gt; # Copyright 2020 The...&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;usage&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;input&lt;/span&gt;: &lt;span class="pl-c1"&gt;480216&lt;/span&gt;
    &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-c1"&gt;1791&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="llm-models-q-gemini-q-exp"&gt;llm models -q gemini -q exp&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Both &lt;code&gt;llm models&lt;/code&gt; and &lt;code&gt;llm embed-models&lt;/code&gt; now take multiple &lt;code&gt;-q&lt;/code&gt; search fragments. You can now search for all models matching "gemini" and "exp" using &lt;code&gt;llm models -q gemini -q exp&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/748"&gt;#748&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I have over 100 models installed in LLM now across a bunch of different plugins. I added the &lt;code&gt;-q&lt;/code&gt; option to help search through them a few months ago, and now I've upgraded it so you can pass it multiple times.&lt;/p&gt;
&lt;p&gt;Want to see all the Gemini experimental models?&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm models -q gemini -q exp&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Outputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GeminiPro: gemini-exp-1114
GeminiPro: gemini-exp-1121
GeminiPro: gemini-exp-1206
GeminiPro: gemini-2.0-flash-exp
GeminiPro: learnlm-1.5-pro-experimental
GeminiPro: gemini-2.0-flash-thinking-exp-1219
GeminiPro: gemini-2.0-flash-thinking-exp-01-21
GeminiPro: gemini-2.0-pro-exp-02-05 (aliases: g2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For consistency I added the same options to the &lt;code&gt;llm embed-models&lt;/code&gt; command, which lists available &lt;a href="https://llm.datasette.io/en/stable/embeddings/cli.html"&gt;embedding models&lt;/a&gt;.&lt;/p&gt;
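The multi-fragment search behaves like a case-insensitive AND across fragments. Here's a sketch of that filtering logic (an illustration of the behavior, not LLM's actual implementation; the model names come from the listing above):

```python
def match(name, fragments):
    # A model matches only if every -q fragment appears in its name,
    # case-insensitively
    lowered = name.lower()
    return all(f.lower() in lowered for f in fragments)


models = [
    "gemini-exp-1206",
    "gemini-2.0-flash-exp",
    "gemini-2.0-pro-exp-02-05",
    "gpt-4o-mini",
]
matched = [m for m in models if match(m, ["gemini", "exp"])]
print(matched)  # the gpt-4o-mini entry is filtered out
```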
&lt;h4 id="llm-embed-multi-prepend-x"&gt;llm embed-multi --prepend X&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;llm embed-multi --prepend X&lt;/code&gt; option for prepending a string to each value before it is embedded - useful for models such as &lt;a href="https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe"&gt;nomic-embed-text-v2-moe&lt;/a&gt; that require passages to start with a string like &lt;code&gt;"search_document: "&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/745"&gt;#745&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was inspired by my initial experiments with &lt;a href="https://simonwillison.net/2025/Feb/12/nomic-embed-text-v2/"&gt;Nomic Embed Text V2 last week&lt;/a&gt;.&lt;/p&gt;
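Conceptually, &lt;code&gt;--prepend&lt;/code&gt; just concatenates the string onto each value before it reaches the embedding model - the stored value itself is unchanged. A minimal sketch of that idea (the embedding function here is a stand-in, not the real API):

```python
def embed_with_prepend(values, embed_fn, prepend=""):
    # Mirrors the idea behind llm embed-multi --prepend: the prefix is
    # added to the text that gets embedded, not to the stored value
    return [embed_fn(prepend + value) for value in values]


# Stand-in embedding function: just echoes the text it was asked to embed
fake_embed = lambda text: f"<vector for {text!r}>"

results = embed_with_prepend(
    ["first passage", "second passage"],
    fake_embed,
    prepend="search_document: ",
)
print(results[0])
```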
&lt;h4 id="everything-else"&gt;Everything else&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;response.json()&lt;/code&gt; and &lt;code&gt;response.usage()&lt;/code&gt; methods are &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-underlying-json"&gt;now documented&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Someone asked a question about these methods online, which made me realize they weren't documented. I enjoy promptly turning questions like this into documentation!&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Fixed a bug where conversations that were loaded from the database could not be continued using &lt;code&gt;asyncio&lt;/code&gt; prompts. &lt;a href="https://github.com/simonw/llm/issues/742"&gt;#742&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This bug was reported by Romain Gehrig. It turned out not to be possible to execute a follow-up prompt in async mode if the previous conversation had been loaded from the database.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% llm 'hi' --async
Hello! How can I assist you today?
% llm 'now in french' --async -c
Error: 'async for' requires an object with __aiter__ method, got Response
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I fixed the bug for the moment, but I'd like to make the whole mechanism of persisting and loading conversations from SQLite part of the documented and supported Python API - it's currently tucked away in CLI-specific internals which aren't safe for people to use in their own code.&lt;/p&gt;
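That error message comes from Python itself: &lt;code&gt;async for&lt;/code&gt; requires the target object to implement &lt;code&gt;__aiter__&lt;/code&gt;. A self-contained sketch of the failure mode (illustrating the underlying mechanics, not LLM's internals):

```python
import asyncio


class SyncResponse:
    # No __aiter__ - like the loaded-from-database Response in the bug
    chunks = ["Hello", "!"]


class AsyncResponse:
    chunks = ["Bonjour", "!"]

    def __aiter__(self):
        self._iter = iter(self.chunks)
        return self

    async def __anext__(self):
        try:
            return next(self._iter)
        except StopIteration:
            raise StopAsyncIteration


async def consume(response):
    parts = []
    async for chunk in response:  # raises TypeError without __aiter__
        parts.append(chunk)
    return "".join(parts)


async def main():
    print(await consume(AsyncResponse()))  # works
    try:
        await consume(SyncResponse())
    except TypeError as e:
        print(e)  # mirrors the error shown above


asyncio.run(main())
```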
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New plugin for macOS users: &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt;, which provides &lt;a href="https://simonwillison.net/2025/Feb/15/llm-mlx/"&gt;extremely high performance access&lt;/a&gt; to a wide range of local models using Apple's MLX framework.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Technically not part of the LLM 0.22 release, but I like using the release notes to help highlight significant new plugins, and &lt;strong&gt;llm-mlx&lt;/strong&gt; is fast becoming my new favorite way to run models on my own machine.&lt;/p&gt;


&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;llm-claude-3&lt;/code&gt; plugin has been renamed to &lt;a href="https://github.com/simonw/llm-anthropic"&gt;llm-anthropic&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wrote about this previously when I &lt;a href="https://simonwillison.net/2025/Feb/2/llm-anthropic/"&gt;announced llm-anthropic&lt;/a&gt;. The new name prepares me for a world in which Anthropic release models that aren't called Claude 3 or Claude 3.5!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm"/><category term="anthropic"/><category term="gemini"/></entry><entry><title>shot-scraper 1.6 with support for HTTP Archives</title><link href="https://simonwillison.net/2025/Feb/13/shot-scraper/#atom-tag" rel="alternate"/><published>2025-02-13T21:02:37+00:00</published><updated>2025-02-13T21:02:37+00:00</updated><id>https://simonwillison.net/2025/Feb/13/shot-scraper/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.6"&gt;shot-scraper 1.6 with support for HTTP Archives&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New release of my &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; CLI tool for taking screenshots and scraping web pages.&lt;/p&gt;
&lt;p&gt;The big new feature is &lt;a href="https://en.wikipedia.org/wiki/HAR_(file_format)"&gt;HTTP Archive (HAR)&lt;/a&gt; support. The new &lt;a href="https://shot-scraper.datasette.io/en/stable/har.html"&gt;shot-scraper har command&lt;/a&gt; can now create an archive of a page and all of its dependents like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper har https://datasette.io/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produces a &lt;code&gt;datasette-io.har&lt;/code&gt; file (currently 163KB) which is JSON representing the full set of requests used to render that page. Here's &lt;a href="https://gist.github.com/simonw/b1fdf434e460814efdb89c95c354f794"&gt;a copy of that file&lt;/a&gt;. You can visualize that &lt;a href="https://ericduran.github.io/chromeHAR/?url=https://gist.githubusercontent.com/simonw/b1fdf434e460814efdb89c95c354f794/raw/924c1eb12b940ff02cefa2cc068f23c9d3cc5895/datasette.har.json"&gt;here using ericduran.github.io/chromeHAR&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The HAR viewer shows a line for each of the loaded resources, with options to view timing information" src="https://static.simonwillison.net/static/2025/har-viewer.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;That JSON includes full copies of all of the responses, base64 encoded if they are binary files such as images.&lt;/p&gt;
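Since HAR is plain JSON, pulling the response bodies back out is straightforward. Here's a sketch that decodes the base64-encoded entries (the structure follows the HAR format; the sample data is made up):

```python
import base64
import json

# A made-up, minimal HAR document - real files written by shot-scraper har
# carry many more fields per entry
har_json = json.dumps({
    "log": {"entries": [
        {"request": {"url": "https://datasette.io/"},
         "response": {"content": {"mimeType": "text/html",
                                  "text": "<h1>hi</h1>"}}},
        {"request": {"url": "https://datasette.io/logo.png"},
         "response": {"content": {"mimeType": "image/png",
                                  "encoding": "base64",
                                  "text": base64.b64encode(b"\x89PNG...").decode()}}},
    ]}
})

bodies = []
for entry in json.loads(har_json)["log"]["entries"]:
    content = entry["response"]["content"]
    body = content.get("text", "")
    if content.get("encoding") == "base64":
        body = base64.b64decode(body)  # binary responses come back as bytes
    bodies.append((entry["request"]["url"], body))

print(bodies[1])
```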
&lt;p&gt;You can add the &lt;code&gt;--zip&lt;/code&gt; flag to instead get a &lt;code&gt;datasette-io.har.zip&lt;/code&gt; file, containing JSON data in &lt;code&gt;har.har&lt;/code&gt; but with the response bodies saved as separate files in that archive.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;shot-scraper multi&lt;/code&gt; command lets you run &lt;code&gt;shot-scraper&lt;/code&gt; against multiple URLs in sequence, specified using a YAML file. That command now takes a &lt;code&gt;--har&lt;/code&gt; option (or &lt;code&gt;--har-zip&lt;/code&gt; or &lt;code&gt;--har-file name-of-file&lt;/code&gt;), &lt;a href="https://shot-scraper.datasette.io/en/stable/multi.html#recording-to-an-http-archive"&gt;described in the documentation&lt;/a&gt;, which will produce a HAR at the same time as taking the screenshots.&lt;/p&gt;
&lt;p&gt;Shots are usually defined in YAML that looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;example.com.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;http://www.example.com/&lt;/span&gt;
- &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;w3c.org.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://www.w3.org/&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can now omit the &lt;code&gt;output:&lt;/code&gt; keys and generate a HAR file without taking any screenshots at all:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;http://www.example.com/&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://www.w3.org/&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Run like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper multi shots.yml --har
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which outputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Skipping screenshot of 'https://www.example.com/'
Skipping screenshot of 'https://www.w3.org/'
Wrote to HAR file: trace.har
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is built on top of Playwright, and the new features use the &lt;a href="https://playwright.dev/python/docs/next/api/class-browser#browser-new-context-option-record-har-path"&gt;browser.new_context(record_har_path=...)&lt;/a&gt; parameter.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/playwright"&gt;playwright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="python"/><category term="scraping"/><category term="playwright"/><category term="shot-scraper"/></entry><entry><title>LLM 0.20</title><link href="https://simonwillison.net/2025/Jan/23/llm-020/#atom-tag" rel="alternate"/><published>2025-01-23T04:55:16+00:00</published><updated>2025-01-23T04:55:16+00:00</updated><id>https://simonwillison.net/2025/Jan/23/llm-020/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm/releases/tag/0.20"&gt;LLM 0.20&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New release of my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; CLI tool and Python library. A bunch of accumulated fixes and features since the start of December, most notably:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for OpenAI's &lt;a href="https://platform.openai.com/docs/models#o1"&gt;o1 model&lt;/a&gt; - a significant upgrade from &lt;code&gt;o1-preview&lt;/code&gt; given its 200,000 input and 100,000 output tokens (&lt;code&gt;o1-preview&lt;/code&gt; was 128,000/32,768). &lt;a href="https://github.com/simonw/llm/issues/676"&gt;#676&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Support for the &lt;code&gt;gpt-4o-audio-preview&lt;/code&gt; and &lt;code&gt;gpt-4o-mini-audio-preview&lt;/code&gt; models, which can accept audio input: &lt;code&gt;llm -m gpt-4o-audio-preview -a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3&lt;/code&gt; &lt;a href="https://github.com/simonw/llm/issues/677"&gt;#677&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A new &lt;code&gt;llm -x/--extract&lt;/code&gt; option which extracts and returns the contents of the first fenced code block in the response. This is useful for prompts that generate code. &lt;a href="https://github.com/simonw/llm/issues/681"&gt;#681&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A new &lt;code&gt;llm models -q 'search'&lt;/code&gt; option for searching available models - useful if you've installed a lot of plugins. Searches are case insensitive. &lt;a href="https://github.com/simonw/llm/issues/700"&gt;#700&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
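The &lt;code&gt;-x/--extract&lt;/code&gt; behavior - grab the contents of the first fenced code block in a response - can be sketched with a regular expression. This is an illustration of the idea, not LLM's actual implementation:

```python
import re


def extract_first_fenced_block(text):
    # Capture the body of the first ``` fenced block; the language tag
    # after the opening fence is optional
    match = re.search(r"```[^\n]*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else None


fence = "`" * 3  # avoids nesting literal fences in this example
response = f"Here you go:\n\n{fence}python\nprint('hello')\n{fence}\n"
print(extract_first_fenced_block(response))
```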


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o1"&gt;o1&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="o1"/></entry><entry><title>Building Python tools with a one-shot prompt using uv run and Claude Projects</title><link href="https://simonwillison.net/2024/Dec/19/one-shot-python-tools/#atom-tag" rel="alternate"/><published>2024-12-19T07:00:37+00:00</published><updated>2024-12-19T07:00:37+00:00</updated><id>https://simonwillison.net/2024/Dec/19/one-shot-python-tools/#atom-tag</id><summary type="html">
    &lt;p&gt;I've written a lot about how I've been using Claude to build one-shot HTML+JavaScript applications &lt;a href="https://simonwillison.net/tags/claude-artifacts/"&gt;via Claude Artifacts&lt;/a&gt;. I recently started using a similar pattern to create one-shot Python utilities, using a custom Claude Project combined with the dependency management capabilities of &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(In LLM jargon a "one-shot" prompt is a prompt that produces the complete desired result on the first attempt. Confusingly it also sometimes means a prompt that includes a single example of the desired output format. Here I'm using the first of those two definitions.)&lt;/p&gt;
&lt;p&gt;I'll start with an example of a tool I built that way.&lt;/p&gt;
&lt;p&gt;I had another round of battle with Amazon S3 today trying to figure out why a file in one of my buckets couldn't be accessed via a public URL.&lt;/p&gt;
&lt;p&gt;Out of frustration I prompted Claude with a variant of the following (&lt;a href="https://gist.github.com/simonw/9f69cf35889b0445b80eeed691d44504"&gt;full transcript here&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;I can't access the file at EXAMPLE_S3_URL. Write me a Python CLI tool using Click and boto3 which takes a URL of that form and then uses EVERY single boto3 trick in the book to try and debug why the file is returning a 404&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It wrote me &lt;a href="https://github.com/simonw/tools/blob/main/python/debug_s3_access.py"&gt;this script&lt;/a&gt;, which gave me exactly what I needed. I ran it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run debug_s3_access.py \
  https://test-public-bucket-simonw.s3.us-east-1.amazonaws.com/0f550b7b28264d7ea2b3d360e3381a95.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/debug-s3.jpg" alt="Terminal screenshot showing S3 access analysis results. Command: '$ uv run http://tools.simonwillison.net/python/debug_s3_access.py url-to-image' followed by detailed output showing bucket exists (Yes), region (default), key exists (Yes), bucket policy (AllowAllGetObject), bucket owner (swillison), versioning (Not enabled), content type (image/jpeg), size (71683 bytes), last modified (2024-12-19 03:43:30+00:00) and public access settings (all False)" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can &lt;a href="https://github.com/simonw/tools/tree/main/python#debug_s3_accesspy"&gt;see the text output here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="inline-dependencies-and-uv-run"&gt;Inline dependencies and uv run&lt;/h4&gt;
&lt;p&gt;Crucially, I didn't have to take any extra steps to install any of the dependencies that the script needed. That's because the script starts with this magic comment:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;# /// script&lt;/span&gt;
&lt;span class="pl-c"&gt;# requires-python = "&amp;gt;=3.12"&lt;/span&gt;
&lt;span class="pl-c"&gt;# dependencies = [&lt;/span&gt;
&lt;span class="pl-c"&gt;#     "click",&lt;/span&gt;
&lt;span class="pl-c"&gt;#     "boto3",&lt;/span&gt;
&lt;span class="pl-c"&gt;#     "urllib3",&lt;/span&gt;
&lt;span class="pl-c"&gt;#     "rich",&lt;/span&gt;
&lt;span class="pl-c"&gt;# ]&lt;/span&gt;
&lt;span class="pl-c"&gt;# ///&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;This is an example of &lt;a href="https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies"&gt;inline script dependencies&lt;/a&gt;, a feature described in &lt;a href="https://peps.python.org/pep-0723/"&gt;PEP 723&lt;/a&gt; and implemented by &lt;code&gt;uv run&lt;/code&gt;. Running the script causes &lt;code&gt;uv&lt;/code&gt; to create a temporary virtual environment with those dependencies installed, a process that takes just a few milliseconds once the &lt;code&gt;uv&lt;/code&gt; cache has been populated.&lt;/p&gt;
&lt;p&gt;This even works if the script is specified by a URL! Anyone with &lt;code&gt;uv&lt;/code&gt; installed can run the following command (provided you trust me not to have replaced the script with something malicious) to debug one of their own S3 buckets:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run http://tools.simonwillison.net/python/debug_s3_access.py \
  https://test-public-bucket-simonw.s3.us-east-1.amazonaws.com/0f550b7b28264d7ea2b3d360e3381a95.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="writing-these-with-the-help-of-a-claude-project"&gt;Writing these with the help of a Claude Project&lt;/h4&gt;
&lt;p&gt;The reason I can one-shot scripts like this now is that I've set up a &lt;a href="https://www.anthropic.com/news/projects"&gt;Claude Project&lt;/a&gt; called "Python app". Projects can have custom instructions, and I used those to "teach" Claude how to take advantage of inline script dependencies:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You write Python tools as single files. They always start with this comment:&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;# /// script&lt;/span&gt;
&lt;span&gt;# requires-python = "&amp;gt;=3.12"&lt;/span&gt;
&lt;span&gt;# ///&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;These files can include dependencies on libraries such as Click. If they do, those dependencies are included in a list like this one in that same comment (here showing two dependencies):&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;# /// script&lt;/span&gt;
&lt;span&gt;# requires-python = "&amp;gt;=3.12"&lt;/span&gt;
&lt;span&gt;# dependencies = [&lt;/span&gt;
&lt;span&gt;#     "click",&lt;/span&gt;
&lt;span&gt;#     "sqlite-utils",&lt;/span&gt;
&lt;span&gt;# ]&lt;/span&gt;
&lt;span&gt;# ///&lt;/span&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's everything Claude needs to reliably knock out full-featured Python tools as single scripts which can be run directly using whatever dependencies Claude chose to include.&lt;/p&gt;
&lt;p&gt;I didn't suggest that Claude use &lt;a href="https://github.com/Textualize/rich"&gt;rich&lt;/a&gt; for the &lt;code&gt;debug_s3_access.py&lt;/code&gt; script earlier but it decided to use it anyway!&lt;/p&gt;
&lt;p&gt;I've only recently started experimenting with this pattern but it seems to work &lt;em&gt;really&lt;/em&gt; well. Here's another example - my prompt was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Starlette web app that provides an API where you pass in ?url= and it strips all HTML tags and returns just the text, using beautifulsoup&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/08957a1490ebde1ea38b4a8374989cf8"&gt;the chat transcript&lt;/a&gt; and &lt;a href="https://gist.githubusercontent.com/simonw/08957a1490ebde1ea38b4a8374989cf8/raw/143ee24dc65ca109b094b72e8b8c494369e763d6/strip_html.py"&gt;the raw code it produced&lt;/a&gt;. You can run that server directly on your machine (it uses port 8000) like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run https://gist.githubusercontent.com/simonw/08957a1490ebde1ea38b4a8374989cf8/raw/143ee24dc65ca109b094b72e8b8c494369e763d6/strip_html.py&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then visit &lt;code&gt;http://127.0.0.1:8000/?url=https://simonwillison.net/&lt;/code&gt; to see it in action.&lt;/p&gt;
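The generated app uses BeautifulSoup for the tag stripping; the core idea can also be sketched with the standard library's &lt;code&gt;html.parser&lt;/code&gt; (this is an illustration of the technique, not the code Claude produced):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, ignoring tags (and script/style bodies)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b></p><script>alert(1)</script>"))
```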
&lt;h4 id="custom-instructions"&gt;Custom instructions&lt;/h4&gt;
&lt;p&gt;The pattern here that's most interesting to me is using custom instructions or system prompts to show LLMs how to implement new patterns that may not exist in their training data. &lt;code&gt;uv run&lt;/code&gt; is less than a year old, but providing just a short example is enough to get the models to write code that takes advantage of its capabilities.&lt;/p&gt;
&lt;p&gt;I have a similar set of custom instructions I use for creating single page HTML and JavaScript tools, again running in a Claude Project:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never use React in artifacts - always plain HTML and vanilla JavaScript and CSS with minimal dependencies.&lt;/p&gt;
&lt;p&gt;CSS should be indented with two spaces and should start like this:&lt;/p&gt;
&lt;div class="highlight highlight-text-html-basic"&gt;&lt;pre&gt;&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;style&lt;/span&gt;&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;
* {
  box-sizing: border-box;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Inputs and textareas should be font size 16px. Font should always prefer Helvetica.&lt;/p&gt;
&lt;p&gt;JavaScript should be two space indents and start like this:&lt;/p&gt;
&lt;div class="highlight highlight-text-html-basic"&gt;&lt;pre&gt;&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;script&lt;/span&gt; &lt;span class="pl-c1"&gt;type&lt;/span&gt;="&lt;span class="pl-s"&gt;module&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;
// code in here should not be indented at the first level&lt;/pre&gt;&lt;/div&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most of the tools on my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; site were created using versions of this custom instructions prompt.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/aws"&gt;aws&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-artifacts"&gt;claude-artifacts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rich"&gt;rich&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-to-app"&gt;prompt-to-app&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/starlette"&gt;starlette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="aws"/><category term="cli"/><category term="python"/><category term="s3"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="claude-artifacts"/><category term="uv"/><category term="rich"/><category term="prompt-to-app"/><category term="starlette"/></entry><entry><title>"Rules" that terminal programs follow</title><link href="https://simonwillison.net/2024/Dec/12/rules-that-terminal-programs-follow/#atom-tag" rel="alternate"/><published>2024-12-12T20:37:07+00:00</published><updated>2024-12-12T20:37:07+00:00</updated><id>https://simonwillison.net/2024/Dec/12/rules-that-terminal-programs-follow/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jvns.ca/blog/2024/11/26/terminal-rules/"&gt;&amp;quot;Rules&amp;quot; that terminal programs follow&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Julia Evans wrote down the unwritten rules of terminal programs. Lots of details in here I hadn’t fully understood before, like REPL programs that exit only if you hit Ctrl+D on an empty line.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/julia-evans"&gt;julia-evans&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="julia-evans"/></entry><entry><title>LLM 0.19</title><link href="https://simonwillison.net/2024/Dec/1/llm-019/#atom-tag" rel="alternate"/><published>2024-12-01T23:59:45+00:00</published><updated>2024-12-01T23:59:45+00:00</updated><id>https://simonwillison.net/2024/Dec/1/llm-019/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-19"&gt;LLM 0.19&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I just released version 0.19 of &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;, my Python library and CLI utility for working with Large Language Models.&lt;/p&gt;
&lt;p&gt;I released 0.18 &lt;a href="https://simonwillison.net/2024/Nov/17/llm-018/"&gt;a couple of weeks ago&lt;/a&gt; adding support for calling models from Python &lt;code&gt;asyncio&lt;/code&gt; code. 0.19 improves on that, and also adds a new mechanism for models to report their token usage.&lt;/p&gt;
&lt;p&gt;LLM can log those usage numbers to a SQLite database, or make them available to custom Python code.&lt;/p&gt;
&lt;p&gt;My eventual goal with these features is to implement token accounting as a Datasette plugin so I can offer AI features in my SaaS platform without worrying about customers spending unlimited LLM tokens.&lt;/p&gt;
&lt;p&gt;Those 0.19 release notes in full:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Tokens used by a response are now logged to new &lt;code&gt;input_tokens&lt;/code&gt; and &lt;code&gt;output_tokens&lt;/code&gt; integer columns and a &lt;code&gt;token_details&lt;/code&gt; JSON string column, for the default OpenAI models and models from other plugins that &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#advanced-model-plugins-usage"&gt;implement this feature&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/610"&gt;#610&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm prompt&lt;/code&gt; now takes a &lt;code&gt;-u/--usage&lt;/code&gt; flag to display token usage at the end of the response.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm logs -u/--usage&lt;/code&gt; shows token usage information for logged responses.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm prompt ... --async&lt;/code&gt; responses are now logged to the database. &lt;a href="https://github.com/simonw/llm/issues/641"&gt;#641&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm.get_models()&lt;/code&gt; and &lt;code&gt;llm.get_async_models()&lt;/code&gt; functions, &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-listing-models"&gt;documented here&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/640"&gt;#640&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;response.usage()&lt;/code&gt; and async response &lt;code&gt;await response.usage()&lt;/code&gt; methods, returning a &lt;code&gt;Usage(input=2, output=1, details=None)&lt;/code&gt; dataclass. &lt;a href="https://github.com/simonw/llm/issues/644"&gt;#644&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;response.on_done(callback)&lt;/code&gt; and &lt;code&gt;await response.on_done(callback)&lt;/code&gt; methods for specifying a callback to be executed when a response has completed, &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-response-on-done"&gt;documented here&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/653"&gt;#653&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fix for bug running &lt;code&gt;llm chat&lt;/code&gt; on Windows 11. Thanks, &lt;a href="https://github.com/sukhbinder"&gt;Sukhbinder Singh&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/issues/495"&gt;#495&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
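&lt;p&gt;Since those token counts end up as plain columns in the SQLite log, aggregating them is a one-line query. A sketch - the &lt;code&gt;responses&lt;/code&gt; table name is my assumption here; only the &lt;code&gt;input_tokens&lt;/code&gt; and &lt;code&gt;output_tokens&lt;/code&gt; columns are confirmed by the release notes above:&lt;/p&gt;

```python
import sqlite3


def token_totals(db_path: str) -> tuple:
    """Sum logged token counts across all responses.

    Assumes a `responses` table with the new input_tokens and
    output_tokens columns described in the 0.19 release notes.
    """
    db = sqlite3.connect(db_path)
    try:
        return db.execute(
            "SELECT SUM(input_tokens), SUM(output_tokens) FROM responses"
        ).fetchone()
    finally:
        db.close()
```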
&lt;p&gt;I also released three new plugin versions that add support for the new usage tracking feature: &lt;a href="https://github.com/simonw/llm-gemini/releases/tag/0.5"&gt;llm-gemini 0.5&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-claude-3/releases/tag/0.10"&gt;llm-claude-3 0.10&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-mistral/releases/tag/0.9"&gt;llm-mistral 0.9&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releasenotes"&gt;releasenotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="releasenotes"/><category term="releases"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>Ask questions of SQLite databases and CSV/JSON files in your terminal</title><link href="https://simonwillison.net/2024/Nov/25/ask-questions-of-sqlite/#atom-tag" rel="alternate"/><published>2024-11-25T01:33:03+00:00</published><updated>2024-11-25T01:33:03+00:00</updated><id>https://simonwillison.net/2024/Nov/25/ask-questions-of-sqlite/#atom-tag</id><summary type="html">
    &lt;p&gt;I built a new plugin for my &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html"&gt;sqlite-utils CLI tool&lt;/a&gt; that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.&lt;/p&gt;
&lt;p&gt;It's called &lt;a href="https://github.com/simonw/sqlite-utils-ask"&gt;sqlite-utils-ask&lt;/a&gt;. Here's how you install it:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils install sqlite-utils-ask&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It picks up API keys from an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable, or you can &lt;a href="https://llm.datasette.io/"&gt;install LLM&lt;/a&gt; and use &lt;a href="https://llm.datasette.io/en/stable/setup.html#saving-and-using-stored-keys"&gt;llm keys set openai&lt;/a&gt; to store a key in a configuration file.&lt;/p&gt;
&lt;p&gt;Then you can use it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -O https://datasette.io/content.db
sqlite-utils ask content.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;how many sqlite-utils pypi downloads in 2024?&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command will extract the SQL schema for the provided database file, send that through an LLM along with your question, get back a SQL query and attempt to run it to derive a result.&lt;/p&gt;
&lt;p&gt;If all goes well it spits out an answer something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT SUM(downloads)
FROM stats
WHERE package = 'sqlite-utils' AND date &amp;gt;= '2024-01-01' AND date &amp;lt; '2025-01-01';

[
    {
        "SUM(downloads)": 4300221
    }
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the SQL query fails to execute (due to a syntax error of some kind) it passes that error back to the model for corrections and retries up to three times before giving up.&lt;/p&gt;
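&lt;p&gt;That schema-prompt-execute-retry loop is simple enough to sketch in a few lines. This is an illustration of the idea, not the plugin's actual code - the &lt;code&gt;ask()&lt;/code&gt; callable stands in for the LLM call, and the real implementation also parses the SQL out of a fenced code block in the model's response:&lt;/p&gt;

```python
import sqlite3


def run_with_retries(db: sqlite3.Connection, question: str, ask,
                     max_attempts: int = 3):
    """Sketch of the schema -> SQL -> execute -> retry loop.

    `ask(prompt)` stands in for the call to the language model and
    should return a SQL string.
    """
    schema = "\n".join(
        sql for (sql,) in db.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"
        )
    )
    prompt = f"{schema}\n{question}"
    for _ in range(max_attempts):
        candidate_sql = ask(prompt)
        try:
            return db.execute(candidate_sql).fetchall()
        except sqlite3.Error as ex:
            # Pass the failing query and its error back for a correction
            prompt = f"{schema}\n{question}\nThis failed: {candidate_sql}\nError: {ex}"
    raise RuntimeError(f"No working query after {max_attempts} attempts")
```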
&lt;p&gt;Add &lt;code&gt;-v/--verbose&lt;/code&gt; to see the exact prompt it's using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;System prompt:
You will be given a SQLite schema followed by a question. Generate a single SQL
query to answer that question. Return that query in a ```sql ... ```
fenced code block.

Example: How many repos are there?
Answer:
```sql
select count(*) from repos
```

Prompt:
...
CREATE TABLE [stats] (
   [package] TEXT,
   [date] TEXT,
   [downloads] INTEGER,
   PRIMARY KEY ([package], [date])
);
...
how many sqlite-utils pypi downloads in 2024?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I've truncated the above to just the relevant table - it actually includes the full schema of every table in that database.&lt;/p&gt;
&lt;p&gt;By default, the tool sends just that database schema and your question to the LLM. If you add the &lt;code&gt;-e/--examples&lt;/code&gt; option it will also include five common values for each of the text columns in that schema with an average length less than 32 characters. This can sometimes help get a better result - for example, sending the values "CA", "FL" and "TX" for a &lt;code&gt;state&lt;/code&gt; column can tip the model off that it should use state abbreviations rather than full names in its queries.&lt;/p&gt;
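&lt;p&gt;The examples-gathering step can be sketched like this - an illustrative reimplementation of the idea, not the plugin's own code:&lt;/p&gt;

```python
import sqlite3


def example_values(db: sqlite3.Connection, table: str,
                   limit: int = 5, max_avg_len: int = 32) -> dict:
    """Collect the most common values for each short text column.

    Columns whose values average under max_avg_len characters get
    their top `limit` values included alongside the schema.
    """
    examples = {}
    for _, name, coltype, *_ in db.execute(f"PRAGMA table_info([{table}])"):
        if coltype.upper() != "TEXT":
            continue
        avg = db.execute(
            f"SELECT AVG(LENGTH([{name}])) FROM [{table}]"
        ).fetchone()[0]
        if avg is None or avg >= max_avg_len:
            continue
        rows = db.execute(
            f"SELECT [{name}] FROM [{table}] GROUP BY [{name}] "
            f"ORDER BY COUNT(*) DESC, [{name}] LIMIT ?",
            (limit,),
        ).fetchall()
        examples[name] = [value for (value,) in rows]
    return examples
```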
&lt;h4 id="ask-files"&gt;Asking questions of CSV and JSON data&lt;/h4&gt;
&lt;p&gt;The core &lt;code&gt;sqlite-utils&lt;/code&gt; CLI usually works against SQLite files directly, but three years ago I added the ability to run SQL queries against CSV and JSON files directly with the &lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;sqlite-utils memory&lt;/a&gt; command. This works by loading that data into an in-memory SQLite database before executing a SQL query.&lt;/p&gt;
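&lt;p&gt;The load-into-memory step works something like this - a stdlib-only sketch of the idea, not the actual sqlite-utils code (which also detects numeric column types; here the columns are left untyped, so values stay as the strings the CSV parser produced):&lt;/p&gt;

```python
import csv
import io
import sqlite3


def load_csv_into_memory(table: str, text: str) -> sqlite3.Connection:
    """Load CSV text into a table in an in-memory SQLite database,
    so plain SQL (or an LLM-generated query) can run against it."""
    rows = list(csv.reader(io.StringIO(text)))
    headers, data = rows[0], rows[1:]
    db = sqlite3.connect(":memory:")
    columns = ", ".join(f"[{h}]" for h in headers)
    db.execute(f"CREATE TABLE [{table}] ({columns})")
    placeholders = ", ".join("?" for _ in headers)
    db.executemany(f"INSERT INTO [{table}] VALUES ({placeholders})", data)
    return db
```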
&lt;p&gt;I decided to reuse that mechanism to enable LLM prompts against CSV and JSON data directly as well.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sqlite-utils ask-files&lt;/code&gt; command looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils ask-files transactions.csv &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;total sales by year&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command accepts one or more files, and you can provide a mix of CSV, TSV and JSON. Each provided file will be imported into a different table, allowing the model to construct join queries where necessary.&lt;/p&gt;
&lt;h4 id="implementation-notes"&gt;Implementation notes&lt;/h4&gt;
&lt;p&gt;The core of the plugin is implemented as around &lt;a href="https://github.com/simonw/sqlite-utils-ask/blob/0.2/sqlite_utils_ask.py"&gt;250 lines of Python&lt;/a&gt;, using the &lt;code&gt;sqlite-utils&lt;/code&gt; &lt;a href="https://sqlite-utils.datasette.io/en/stable/plugins.html#register-commands-cli"&gt;register_commands()&lt;/a&gt; plugin hook to add the &lt;code&gt;ask&lt;/code&gt; and &lt;code&gt;ask-files&lt;/code&gt; commands.&lt;/p&gt;
&lt;p&gt;It adds &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; as a dependency, and takes advantage of LLM's &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;Python API&lt;/a&gt; to abstract over the details of talking to the models. This means &lt;code&gt;sqlite-utils-ask&lt;/code&gt; can use any of the models supported by LLM or its plugins - if you want to run your prompt through Claude 3.5 Sonnet you can do this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils install llm-claude-3
sqlite-utils ask content.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;count rows in news table&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m claude-3.5-sonnet&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The plugin defaults to &lt;a href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/"&gt;gpt-4o-mini&lt;/a&gt;, which I initially picked to take advantage of that model's automatic prompt caching: if you run multiple questions against the same schema you'll send the same lengthy prompt prefix each time, and OpenAI's prompt caching should automatically kick in and provide a 50% discount on those input tokens.&lt;/p&gt;
&lt;p&gt;Then I ran the actual numbers and found that &lt;code&gt;gpt-4o-mini&lt;/code&gt; is cheap enough that, even without caching, a 4,000 token prompt (that's a pretty large SQL schema) should cost less than a tenth of a cent. So those caching savings aren't worth anything at all!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="sqlite"/><category term="ai"/><category term="sqlite-utils"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/></entry><entry><title>Docling</title><link href="https://simonwillison.net/2024/Nov/3/docling/#atom-tag" rel="alternate"/><published>2024-11-03T04:57:56+00:00</published><updated>2024-11-03T04:57:56+00:00</updated><id>https://simonwillison.net/2024/Nov/3/docling/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ds4sd.github.io/docling/"&gt;Docling&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MIT licensed document extraction Python library from the Deep Search team at IBM, who released &lt;a href="https://ds4sd.github.io/docling/v2/#changes-in-docling-v2"&gt;Docling v2&lt;/a&gt; on October 16th.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://arxiv.org/abs/2408.09869"&gt;Docling Technical Report&lt;/a&gt; paper from August, which provides details of two custom models: a layout analysis model for figuring out the structure of the document (sections, figures, text, tables etc) and a TableFormer model specifically for extracting structured data from tables.&lt;/p&gt;
&lt;p&gt;Those models are &lt;a href="https://huggingface.co/ds4sd/docling-models"&gt;available on Hugging Face&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's how to try out the Docling CLI interface using &lt;code&gt;uvx&lt;/code&gt; (avoiding the need to install it first - though since it downloads models it will take a while to run the first time):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx docling mydoc.pdf --to json --to md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will output a &lt;code&gt;mydoc.json&lt;/code&gt; file with complex layout information and a &lt;code&gt;mydoc.md&lt;/code&gt; Markdown file which includes Markdown tables where appropriate.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://ds4sd.github.io/docling/usage/"&gt;Python API&lt;/a&gt; is a lot more comprehensive. It can even extract tables &lt;a href="https://ds4sd.github.io/docling/examples/export_tables/"&gt;as Pandas DataFrames&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;docling&lt;/span&gt;.&lt;span class="pl-s1"&gt;document_converter&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;DocumentConverter&lt;/span&gt;
&lt;span class="pl-s1"&gt;converter&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;DocumentConverter&lt;/span&gt;()
&lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;converter&lt;/span&gt;.&lt;span class="pl-en"&gt;convert&lt;/span&gt;(&lt;span class="pl-s"&gt;"document.pdf"&lt;/span&gt;)
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt;.&lt;span class="pl-s1"&gt;document&lt;/span&gt;.&lt;span class="pl-s1"&gt;tables&lt;/span&gt;:
    &lt;span class="pl-s1"&gt;df&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-en"&gt;export_to_dataframe&lt;/span&gt;()
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;df&lt;/span&gt;)&lt;/pre&gt;

&lt;p&gt;I ran that inside &lt;code&gt;uv run --with docling python&lt;/code&gt;. It took a little while to run, but it demonstrated that the library works.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ibm"&gt;ibm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="ibm"/><category term="ocr"/><category term="pdf"/><category term="python"/><category term="ai"/><category term="hugging-face"/><category term="uv"/></entry><entry><title>You can now run prompts against images, audio and video in your terminal using LLM</title><link href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/#atom-tag" rel="alternate"/><published>2024-10-29T15:09:38+00:00</published><updated>2024-10-29T15:09:38+00:00</updated><id>https://simonwillison.net/2024/Oct/29/llm-multi-modal/#atom-tag</id><summary type="html">
    &lt;p&gt;I released &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-17"&gt;LLM 0.17&lt;/a&gt; last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama, Claude and Gemini.&lt;/p&gt;
&lt;p&gt;The signature feature of 0.17 is that LLM can now be used to prompt &lt;strong&gt;multi-modal models&lt;/strong&gt; - which means you can now use it to send images, audio and video files to LLMs that can handle them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/#processing-an-image-with-gpt-4o-mini"&gt;Processing an image with gpt-4o-mini&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/#using-a-plugin-to-run-audio-and-video-against-gemini"&gt;Using a plugin to run audio and video against Gemini&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/#there-s-a-python-api-too"&gt;There's a Python API too&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/#what-can-we-do-with-this-"&gt;What can we do with this?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="processing-an-image-with-gpt-4o-mini"&gt;Processing an image with gpt-4o-mini&lt;/h4&gt;
&lt;p&gt;Here's an example. First, &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;install LLM&lt;/a&gt; - using &lt;code&gt;brew install llm&lt;/code&gt; or &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;uv tool install llm&lt;/code&gt;, pick your favourite. If you have it installed already you may need to upgrade to 0.17, e.g. with &lt;code&gt;brew upgrade llm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Obtain &lt;a href="https://platform.openai.com/api-keys"&gt;an OpenAI key&lt;/a&gt; (or an alternative, see below) and provide it to the tool:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; openai
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; paste key here&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And now you can start running prompts against images.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe this image&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -a https://static.simonwillison.net/static/2024/pelican.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;-a&lt;/code&gt; option stands for &lt;code&gt;--attachment&lt;/code&gt;. Attachments can be specified as URLs, as paths to files on disk or as &lt;code&gt;-&lt;/code&gt; to read from data piped into the tool.&lt;/p&gt;
&lt;p&gt;The above example uses the default model, &lt;code&gt;gpt-4o-mini&lt;/code&gt;. I got back this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image features a brown pelican standing on rocky terrain near a body of water. The pelican has a distinct coloration, with dark feathers on its body and a lighter-colored head. Its long bill is characteristic of the species, and it appears to be looking out towards the water. In the background, there are boats, suggesting a marina or coastal area. The lighting indicates it may be a sunny day, enhancing the scene's natural beauty.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's that image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/pelican.jpg" alt="A photograph of a fine looking pelican in the marina" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can run &lt;code&gt;llm logs --json -c&lt;/code&gt; for a hint of how much that cost:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;      &lt;span class="pl-ent"&gt;"usage"&lt;/span&gt;: {
        &lt;span class="pl-ent"&gt;"completion_tokens"&lt;/span&gt;: &lt;span class="pl-c1"&gt;89&lt;/span&gt;,
        &lt;span class="pl-ent"&gt;"prompt_tokens"&lt;/span&gt;: &lt;span class="pl-c1"&gt;14177&lt;/span&gt;,
        &lt;span class="pl-ent"&gt;"total_tokens"&lt;/span&gt;: &lt;span class="pl-c1"&gt;14266&lt;/span&gt;,&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using &lt;a href="https://tools.simonwillison.net/llm-prices"&gt;my LLM pricing calculator&lt;/a&gt; that came to 0.218 cents - less than a quarter of a cent.&lt;/p&gt;
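&lt;p&gt;The arithmetic is easy to check yourself, assuming gpt-4o-mini's pricing of $0.150 per million input tokens and $0.600 per million output tokens:&lt;/p&gt;

```python
def cost_in_cents(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Cost in cents given per-million-token dollar rates."""
    dollars = (input_tokens / 1_000_000) * input_per_m + (
        output_tokens / 1_000_000
    ) * output_per_m
    return dollars * 100


# 14,177 prompt tokens and 89 completion tokens at the assumed rates
print(round(cost_in_cents(14177, 89, 0.150, 0.600), 3))  # 0.218
```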
&lt;p&gt;Let's run that again with &lt;code&gt;gpt-4o&lt;/code&gt;. Add &lt;code&gt;-m gpt-4o&lt;/code&gt; to specify the model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe this image&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -a https://static.simonwillison.net/static/2024/pelican.jpg \
  -m gpt-4o&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;The image shows a pelican standing on rocks near a body of water. The bird has a large, long bill and predominantly gray feathers with a lighter head and neck. In the background, there is a docked boat, giving the impression of a marina or harbor setting. The lighting suggests it might be sunny, highlighting the pelican's features.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That time it cost 435 prompt tokens (GPT-4o mini charges more tokens per image than GPT-4o) and the total was 0.1787 cents.&lt;/p&gt;
&lt;h4 id="using-a-plugin-to-run-audio-and-video-against-gemini"&gt;Using a plugin to run audio and video against Gemini&lt;/h4&gt;
&lt;p&gt;Models in LLM are defined by &lt;a href="https://llm.datasette.io/en/stable/plugins/index.html"&gt;plugins&lt;/a&gt;. The application ships with a &lt;a href="https://github.com/simonw/llm/blob/0.17/llm/default_plugins/openai_models.py"&gt;default OpenAI plugin&lt;/a&gt; to get people started, but there are dozens of &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;other plugins&lt;/a&gt; providing access to different models, including models that can run directly on your own device.&lt;/p&gt;
&lt;p&gt;Plugins need to be upgraded to add support for multi-modal input - here's &lt;a href="https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html"&gt;documentation on how to do that&lt;/a&gt;. I've shipped three plugins with support for multi-modal attachments so far: &lt;a href="https://github.com/simonw/llm-gemini"&gt;llm-gemini&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3&lt;/a&gt; and &lt;a href="https://github.com/simonw/llm-mistral"&gt;llm-mistral&lt;/a&gt; (for Pixtral).&lt;/p&gt;
&lt;p&gt;So far these are all remote API plugins. It's definitely possible to build a plugin that runs attachments through local models but I haven't got one of those into good enough condition to release just yet.&lt;/p&gt;
&lt;p&gt;The Google Gemini series are my favourite multi-modal models right now due to the size and breadth of content they support. Gemini models can handle images, audio &lt;em&gt;and&lt;/em&gt; video!&lt;/p&gt;
&lt;p&gt;Let's try that out. Start by installing &lt;code&gt;llm-gemini&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-gemini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Obtain a &lt;a href="https://aistudio.google.com/app/apikey"&gt;Gemini API key&lt;/a&gt;. These include a &lt;em&gt;free tier&lt;/em&gt;, so you can get started without needing to spend any money. Paste that in here:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; gemini
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; paste key here&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The three Gemini 1.5 models are called Pro, Flash and Flash-8B. Let's try it with Pro:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;describe this image&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -a https://static.simonwillison.net/static/2024/pelican.jpg \
  -m gemini-1.5-pro-latest&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;A brown pelican stands on a rocky surface, likely a jetty or breakwater, with blurred boats in the background. The pelican is facing right, and its long beak curves downwards. Its plumage is primarily grayish-brown, with lighter feathers on its neck and breast. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/2f7ae62f37d99cf9588a6c36ba318be6"&gt;Very detailed&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;But let's do something a bit more interesting. I shared a 7m40s MP3 of a &lt;a href="https://simonwillison.net/2024/Oct/17/notebooklm-pelicans/"&gt;NotebookLM podcast&lt;/a&gt; a few weeks ago. Let's use Flash-8B - the cheapest Gemini model - to try and obtain a transcript.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;transcript&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -a https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3 \
  -m gemini-1.5-flash-8b-latest&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It worked!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hey everyone, welcome back. You ever find yourself wading through mountains of data, trying to pluck out the juicy bits? It's like hunting for a single shrimp in a whole kelp forest, am I right? Oh, tell me about it. I swear, sometimes I feel like I'm gonna go cross-eyed from staring at spreadsheets all day. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/ab05cf3464534a3442e771148defa8e1"&gt;Full output here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once again, &lt;code&gt;llm logs -c --json&lt;/code&gt; will show us the tokens used. Here it's 14754 prompt tokens and 1865 completion tokens. The pricing calculator says that adds up to... 0.0833 cents. Less than a tenth of a cent to transcribe a 7m40s audio clip.&lt;/p&gt;
&lt;h4 id="there-s-a-python-api-too"&gt;There's a Python API too&lt;/h4&gt;
&lt;p&gt;Here's what it looks like to execute multi-modal prompts with attachments using the &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;LLM Python library&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;

&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-en"&gt;get_model&lt;/span&gt;(&lt;span class="pl-s"&gt;"gpt-4o-mini"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-en"&gt;prompt&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"Describe these images"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;attachments&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
        &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-v"&gt;Attachment&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"pelican.jpg"&lt;/span&gt;),
        &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-v"&gt;Attachment&lt;/span&gt;(
            &lt;span class="pl-s1"&gt;url&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"https://static.simonwillison.net/static/2024/pelicans.jpg"&lt;/span&gt;
        ),
    ]
)&lt;/pre&gt;
&lt;p&gt;You can send multiple attachments with a single prompt, and both file paths and URLs are supported - or even binary content, using &lt;code&gt;llm.Attachment(content=b'binary goes here')&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Any model plugin becomes available to Python with the same interface, making this LLM library a useful abstraction layer to try out the same prompts against many different models, both local and remote.&lt;/p&gt;
&lt;h4 id="what-can-we-do-with-this-"&gt;What can we do with this?&lt;/h4&gt;
&lt;p&gt;I've only had this working for a couple of days and the potential applications are somewhat dizzying. It's trivial to spin up a Bash script that can do things like generate &lt;code&gt;alt=&lt;/code&gt; text for every image in a directory, for example. Here's one &lt;a href="https://gist.github.com/simonw/a26046b0f9d74c46ee5af6eb47b73db9"&gt;Claude wrote just now&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#!&lt;/span&gt;/bin/bash&lt;/span&gt;
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-smi"&gt;img&lt;/span&gt; &lt;span class="pl-k"&gt;in&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt;.{jpg,jpeg}&lt;span class="pl-k"&gt;;&lt;/span&gt; &lt;span class="pl-k"&gt;do&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; [ &lt;span class="pl-k"&gt;-f&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$img&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; ]&lt;span class="pl-k"&gt;;&lt;/span&gt; &lt;span class="pl-k"&gt;then&lt;/span&gt;
        output=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;${img&lt;span class="pl-k"&gt;%&lt;/span&gt;.&lt;span class="pl-k"&gt;*&lt;/span&gt;}&lt;/span&gt;.txt&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        llm -m gpt-4o-mini &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;return just the alt text for this image&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -a &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$img&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$output&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    &lt;span class="pl-k"&gt;fi&lt;/span&gt;
&lt;span class="pl-k"&gt;done&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On the &lt;a href="https://datasette.io/discord-llm"&gt;#llm Discord channel&lt;/a&gt; Drew Breunig suggested this one-liner:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm prompt -m gpt-4o &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;tell me if it's foggy in this image, reply on a scale from&lt;/span&gt;
&lt;span class="pl-s"&gt;1-10 with 10 being so foggy you can't see anything and 1&lt;/span&gt;
&lt;span class="pl-s"&gt;being clear enough to see the hills in the distance.&lt;/span&gt;
&lt;span class="pl-s"&gt;Only respond with a single number.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -a https://cameras.alertcalifornia.org/public-camera-data/Axis-Purisma1/latest-frame.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That URL is to &lt;a href="https://cameras.alertcalifornia.org/public-camera-data/Axis-Purisma1/latest-frame.jpg"&gt;a live webcam feed&lt;/a&gt;, so here's an instant GPT-4o vision powered weather report!&lt;/p&gt;
&lt;p&gt;We can have &lt;em&gt;so much fun&lt;/em&gt; with this stuff.&lt;/p&gt;
&lt;p&gt;All of the usual AI caveats apply: it can make mistakes, it can hallucinate, safety filters may kick in and refuse to transcribe audio based on the content. A &lt;em&gt;lot&lt;/em&gt; of work is needed to evaluate how well the models perform at different tasks. There's a lot still to explore here.&lt;/p&gt;
&lt;p&gt;But at 1/10th of a cent for 7 minutes of audio, at least those explorations can be plentiful and inexpensive!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 12th November 2024&lt;/strong&gt;: If you want to try running prompts against images using a local model that runs on your own machine you can now do so using &lt;a href="https://simonwillison.net/2024/Nov/13/ollama-llama-vision/"&gt;Ollama, llm-ollama and Llama 3.2 Vision&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="projects"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="anthropic"/><category term="claude"/><category term="mistral"/><category term="gemini"/><category term="vision-llms"/><category term="llm-pricing"/></entry><entry><title>python-imgcat</title><link href="https://simonwillison.net/2024/Oct/28/python-imgcat/#atom-tag" rel="alternate"/><published>2024-10-28T05:13:30+00:00</published><updated>2024-10-28T05:13:30+00:00</updated><id>https://simonwillison.net/2024/Oct/28/python-imgcat/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/wookayin/python-imgcat"&gt;python-imgcat&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I was &lt;a href="https://github.com/simonw/llm/issues/587#issuecomment-2440549543"&gt;investigating options&lt;/a&gt; for displaying images in a terminal window (for multi-modal logging output of &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;) and I found this neat Python library for displaying images using iTerm 2.&lt;/p&gt;
&lt;p&gt;It includes a CLI tool, which means you can run it without installation using &lt;code&gt;uvx&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx imgcat filename.png
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="Screenshot of an iTerm2 terminal window. I have run uvx imgcat output_4.png and an image is shown below that in the terminal of a slide from a FEMA deck about Tropical Storm Ian." src="https://static.simonwillison.net/static/2024/imgcat.jpg" /&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://github.com/Textualize/rich/discussions/384#discussioncomment-9821180"&gt;rich/discussions&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="python"/><category term="llm"/><category term="uv"/></entry><entry><title>Run a prompt to generate and execute jq programs using llm-jq</title><link href="https://simonwillison.net/2024/Oct/27/llm-jq/#atom-tag" rel="alternate"/><published>2024-10-27T04:26:36+00:00</published><updated>2024-10-27T04:26:36+00:00</updated><id>https://simonwillison.net/2024/Oct/27/llm-jq/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/llm-jq"&gt;llm-jq&lt;/a&gt; is a brand new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which lets you pipe JSON directly into the &lt;code&gt;llm jq&lt;/code&gt; command along with a human-language description of how you'd like to manipulate that JSON and have a &lt;a href="https://jqlang.github.io/jq/"&gt;jq&lt;/a&gt; program generated and executed for you on the fly.&lt;/p&gt;

&lt;p&gt;Thomas Ptacek &lt;a href="https://twitter.com/tqbf/status/1850350668965359801"&gt;on Twitter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The JQ CLI should just BE a ChatGPT client, so there's no pretense of actually understanding this syntax. Cut out the middleman, just look up what I'm trying to do, for me.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://xkcd.com/356/"&gt;couldn't resist&lt;/a&gt; writing a plugin. Here's an example of &lt;code&gt;llm-jq&lt;/code&gt; in action:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre class="notranslate"&gt;llm install llm-jq
curl -s https://api.github.com/repos/simonw/datasette/issues &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  llm jq &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;count by user login, top 3&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This outputs the following:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;[
  {
    &lt;span class="pl-ent"&gt;"login"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;simonw&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"count"&lt;/span&gt;: &lt;span class="pl-c1"&gt;11&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"login"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;king7532&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"count"&lt;/span&gt;: &lt;span class="pl-c1"&gt;5&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"login"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;dependabot[bot]&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"count"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;
  }
]
&lt;span style="color: blue"&gt;group_by(.user.login) | map({login: .[0].user.login, count: length}) | sort_by(-.count) | .[0:3]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The JSON result is sent to standard output, the &lt;code&gt;jq&lt;/code&gt; program it generated and executed is sent to standard error. Add the &lt;code&gt;-s/--silent&lt;/code&gt; option to tell it not to output the program, or the &lt;code&gt;-v/--verbose&lt;/code&gt; option for verbose output that shows the prompt it sent to the LLM as well.&lt;/p&gt;
&lt;p&gt;Under the hood it passes the first 1024 bytes of the JSON piped to it, plus the program description "count by user login, top 3", to the default LLM model (usually &lt;code&gt;gpt-4o-mini&lt;/code&gt; unless you set another with e.g. &lt;code&gt;llm models default claude-3.5-sonnet&lt;/code&gt;) along with a system prompt. It then runs &lt;code&gt;jq&lt;/code&gt; in a subprocess and pipes in the full JSON that was passed to it.&lt;/p&gt;
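The aggregation that generated jq program performs is straightforward to sanity-check in plain Python. A rough equivalent, using hypothetical sample records in place of the real GitHub API response:

```python
from collections import Counter

# Hypothetical sample of GitHub issue records - the real input is the
# full JSON piped into `llm jq`
issues = [
    {"user": {"login": "simonw"}},
    {"user": {"login": "simonw"}},
    {"user": {"login": "king7532"}},
]

# Equivalent of:
# group_by(.user.login) | map({login: .[0].user.login, count: length})
#   | sort_by(-.count) | .[0:3]
counts = Counter(issue["user"]["login"] for issue in issues)
top3 = [{"login": login, "count": n} for login, n in counts.most_common(3)]
print(top3)
```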
&lt;p&gt;Here's the system prompt it uses, adapted from my &lt;a href="https://github.com/simonw/llm-cmd"&gt;llm-cmd plugin&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code class="notranslate"&gt;Based on the example JSON snippet and the desired query, write a jq program&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code class="notranslate"&gt;Return only the jq program to be executed as a raw string, no string delimiters wrapping it, no yapping, no markdown, no fenced code blocks, what you return will be passed to subprocess.check_output('jq', [...]) directly. For example, if the user asks: extract the name of the first person You return only: .people[0].name&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://gist.github.com/simonw/484d878877f53537f38e48a7a3845df2"&gt;used Claude&lt;/a&gt; to figure out how to pipe content from the parent process to the child and detect and return the correct exit code.&lt;/p&gt;
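That plumbing boils down to a pattern like this sketch - not the plugin's actual code, and using `cat` as a stand-in child process so it runs anywhere, where the real plugin invokes `jq` with the generated program:

```python
import subprocess
import sys

def pipe_to_child(argv, stdin_text):
    # Send the full JSON to the child's stdin, relay both of its output
    # streams, and surface its exit code so the parent can exit with it
    proc = subprocess.run(argv, input=stdin_text, capture_output=True, text=True)
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode

exit_code = pipe_to_child(["cat"], '{"ok": true}')
```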

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm-jq-card.jpg" alt="Example terminal screenshot of llm jq with the verbose option." style="max-width: 100%" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jq"&gt;jq&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="thomas-ptacek"/><category term="ai"/><category term="jq"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/></entry><entry><title>TIL: Using uv to develop Python command-line applications</title><link href="https://simonwillison.net/2024/Oct/24/uv-cli/#atom-tag" rel="alternate"/><published>2024-10-24T05:56:21+00:00</published><updated>2024-10-24T05:56:21+00:00</updated><id>https://simonwillison.net/2024/Oct/24/uv-cli/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/python/uv-cli-apps"&gt;TIL: Using uv to develop Python command-line applications&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've been increasingly using &lt;a href="https://docs.astral.sh/uv/"&gt;uv&lt;/a&gt; to try out new software (via &lt;code&gt;uvx&lt;/code&gt;) and experiment with new ideas, but I hadn't quite figured out the right way to use it for developing my own projects.&lt;/p&gt;
&lt;p&gt;It turns out I was missing a few things - in particular that there's no need to use &lt;code&gt;uv pip&lt;/code&gt; at all when working with a local development environment: you can get by entirely on &lt;code&gt;uv run&lt;/code&gt; (plus maybe &lt;code&gt;uv sync --extra test&lt;/code&gt; to install test dependencies).&lt;/p&gt;
&lt;p&gt;I bounced &lt;a href="https://gist.github.com/simonw/975dfa41e9b03bca2513a986d9aa3dcf"&gt;a few questions&lt;/a&gt; off Charlie Marsh and filled in the missing gaps - this TIL shows my new uv-powered process for hacking on Python CLI apps built using Click and my &lt;a href="https://github.com/simonw/click-app"&gt;simonw/click-app&lt;/a&gt; cookiecutter template.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/charlie-marsh"&gt;charlie-marsh&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="packaging"/><category term="pip"/><category term="python"/><category term="til"/><category term="cookiecutter"/><category term="uv"/><category term="astral"/><category term="charlie-marsh"/></entry><entry><title>files-to-prompt 0.3</title><link href="https://simonwillison.net/2024/Sep/9/files-to-prompt-03/#atom-tag" rel="alternate"/><published>2024-09-09T05:57:35+00:00</published><updated>2024-09-09T05:57:35+00:00</updated><id>https://simonwillison.net/2024/Sep/9/files-to-prompt-03/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/files-to-prompt/releases/tag/0.3"&gt;files-to-prompt 0.3&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New version of my &lt;code&gt;files-to-prompt&lt;/code&gt; CLI tool for turning a bunch of files into a prompt suitable for piping to an LLM, &lt;a href="https://simonwillison.net/2024/Apr/8/files-to-prompt/"&gt;described here previously&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It now has a &lt;code&gt;-c/--cxml&lt;/code&gt; flag for outputting the files in Claude XML-ish notation (XML-ish because it's not actually valid XML) using the format Anthropic describe as &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips#essential-tips-for-long-context-prompts"&gt;recommended for long context&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;files-to-prompt llm-*/README.md --cxml | llm -m claude-3.5-sonnet \
  --system 'return an HTML page about these plugins with usage examples' \
  &amp;gt; /tmp/fancy.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2024/llm-cxml-demo.html"&gt;Here's what that gave me&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The format itself looks something like this:&lt;/p&gt;
&lt;div class="highlight highlight-text-xml"&gt;&lt;pre&gt;&amp;lt;&lt;span class="pl-ent"&gt;documents&lt;/span&gt;&amp;gt;
&amp;lt;&lt;span class="pl-ent"&gt;document&lt;/span&gt; &lt;span class="pl-e"&gt;index&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&amp;gt;
&amp;lt;&lt;span class="pl-ent"&gt;source&lt;/span&gt;&amp;gt;llm-anyscale-endpoints/README.md&amp;lt;/&lt;span class="pl-ent"&gt;source&lt;/span&gt;&amp;gt;
&amp;lt;&lt;span class="pl-ent"&gt;document_content&lt;/span&gt;&amp;gt;
# llm-anyscale-endpoints
...
&amp;lt;/&lt;span class="pl-ent"&gt;document_content&lt;/span&gt;&amp;gt;
&amp;lt;/&lt;span class="pl-ent"&gt;document&lt;/span&gt;&amp;gt;
&amp;lt;/&lt;span class="pl-ent"&gt;documents&lt;/span&gt;&amp;gt;&lt;/pre&gt;&lt;/div&gt;
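The format is simple enough to reproduce in a few lines of Python - an illustrative sketch, not the tool's actual implementation:

```python
# Illustrative re-implementation of the Claude XML-ish format shown above
# (the real files-to-prompt handles directory walking, filtering, etc.)
def to_cxml(files):
    parts = ["<documents>"]
    for index, (path, content) in enumerate(files, start=1):
        parts += [
            f'<document index="{index}">',
            f"<source>{path}</source>",
            "<document_content>",
            content,
            "</document_content>",
            "</document>",
        ]
    parts.append("</documents>")
    return "\n".join(parts)

example = to_cxml([("llm-anyscale-endpoints/README.md", "# llm-anyscale-endpoints")])
print(example)
```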


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="tools"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="files-to-prompt"/></entry><entry><title>LLM 0.15</title><link href="https://simonwillison.net/2024/Jul/18/llm-015/#atom-tag" rel="alternate"/><published>2024-07-18T19:44:24+00:00</published><updated>2024-07-18T19:44:24+00:00</updated><id>https://simonwillison.net/2024/Jul/18/llm-015/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-15"&gt;LLM 0.15&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A new release of my &lt;a href="https://llm.datasette.io/"&gt;LLM CLI tool&lt;/a&gt; for interacting with Large Language Models from the terminal (see &lt;a href="https://simonwillison.net/2024/Jun/17/cli-language-models/"&gt;this recent talk&lt;/a&gt; for plenty of demos).&lt;/p&gt;
&lt;p&gt;This release adds support for the brand new &lt;a href="https://simonwillison.net/2024/Jul/18/gpt-4o-mini/"&gt;GPT-4o mini&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-4o-mini "rave about pelicans in Spanish"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It also sets that model as the default used by the tool if no other model is specified. This replaces GPT-3.5 Turbo, the default since the first release of LLM. 4o-mini is both cheaper and &lt;em&gt;way&lt;/em&gt; more capable than 3.5 Turbo.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="projects"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry></feed>