<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: pytest</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/pytest.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-26T23:55:29+00:00</updated><author><name>Simon Willison</name></author><entry><title>Tips for getting coding agents to write good Python tests</title><link href="https://simonwillison.net/2026/Jan/26/tests/#atom-tag" rel="alternate"/><published>2026-01-26T23:55:29+00:00</published><updated>2026-01-26T23:55:29+00:00</updated><id>https://simonwillison.net/2026/Jan/26/tests/#atom-tag</id><summary type="html">
    &lt;p&gt;Someone &lt;a href="https://news.ycombinator.com/item?id=46765460#46765823"&gt;asked&lt;/a&gt; on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.&lt;/p&gt;
&lt;p&gt;Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.&lt;/p&gt;
&lt;p&gt;Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.&lt;/p&gt;
&lt;p&gt;"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.&lt;/p&gt;
&lt;p&gt;Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.&lt;/p&gt;
&lt;p&gt;I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers works - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.&lt;/p&gt;
&lt;p&gt;One last tip I use a lot is this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="hacker-news"/><category term="python"/><category term="testing"/><category term="ai"/><category term="pytest"/><category term="generative-ai"/><category term="llms"/><category term="coding-agents"/></entry><entry><title>TIL: Subtests in pytest 9.0.0+</title><link href="https://simonwillison.net/2025/Dec/5/til-pytest-subtests/#atom-tag" rel="alternate"/><published>2025-12-05T06:03:29+00:00</published><updated>2025-12-05T06:03:29+00:00</updated><id>https://simonwillison.net/2025/Dec/5/til-pytest-subtests/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/pytest/subtests"&gt;TIL: Subtests in pytest 9.0.0+&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I spotted an interesting new feature &lt;a href="https://docs.pytest.org/en/stable/changelog.html#pytest-9-0-0-2025-11-05"&gt;in the release notes for pytest 9.0.0&lt;/a&gt;: &lt;a href="https://docs.pytest.org/en/stable/how-to/subtests.html#subtests"&gt;subtests&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm a &lt;em&gt;big&lt;/em&gt; user of the &lt;a href="https://docs.pytest.org/en/stable/example/parametrize.html"&gt;pytest.mark.parametrize&lt;/a&gt; decorator - see &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt; from 2018 - so I thought it would be interesting to try out subtests and see if they're a useful alternative.&lt;/p&gt;
&lt;p&gt;Short version: this parameterized test:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;pytest&lt;/span&gt;.&lt;span class="pl-c1"&gt;mark&lt;/span&gt;.&lt;span class="pl-c1"&gt;parametrize&lt;/span&gt;(&lt;span class="pl-s"&gt;"setting"&lt;/span&gt;, &lt;span class="pl-s1"&gt;app&lt;/span&gt;.&lt;span class="pl-c1"&gt;SETTINGS&lt;/span&gt;)&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_settings_are_documented&lt;/span&gt;(&lt;span class="pl-s1"&gt;settings_headings&lt;/span&gt;, &lt;span class="pl-s1"&gt;setting&lt;/span&gt;):
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;setting&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;settings_headings&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;Becomes this using subtests instead:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_settings_are_documented&lt;/span&gt;(&lt;span class="pl-s1"&gt;settings_headings&lt;/span&gt;, &lt;span class="pl-s1"&gt;subtests&lt;/span&gt;):
    &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;setting&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;app&lt;/span&gt;.&lt;span class="pl-c1"&gt;SETTINGS&lt;/span&gt;:
        &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-s1"&gt;subtests&lt;/span&gt;.&lt;span class="pl-c1"&gt;test&lt;/span&gt;(&lt;span class="pl-s1"&gt;setting&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;setting&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt;):
            &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;setting&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;settings_headings&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;Why is this better? Two reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It appears to run a bit faster&lt;/li&gt;
&lt;li&gt;Subtests can be created programmatically after running some setup code first&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I &lt;a href="https://gistpreview.github.io/?0487e5bb12bcbed850790a6324788e1b"&gt;had Claude Code&lt;/a&gt; port &lt;a href="https://github.com/simonw/datasette/pull/2609/files"&gt;several tests&lt;/a&gt; to the new pattern. I like it.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="ai"/><category term="pytest"/><category term="til"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Setting up a codebase for working with coding agents</title><link href="https://simonwillison.net/2025/Oct/25/coding-agent-tips/#atom-tag" rel="alternate"/><published>2025-10-25T18:42:24+00:00</published><updated>2025-10-25T18:42:24+00:00</updated><id>https://simonwillison.net/2025/Oct/25/coding-agent-tips/#atom-tag</id><summary type="html">
    &lt;p&gt;Someone on Hacker News &lt;a href="https://news.ycombinator.com/item?id=45695621#45704966"&gt;asked for tips&lt;/a&gt; on setting up a codebase to be more productive with AI coding tools. Here's my reply:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Good automated tests which the coding agent can run. I love pytest for this - one of my projects has 1500 tests and Claude Code is really good at selectively executing just tests relevant to the change it is making, and then running the whole suite at the end.&lt;/li&gt;
&lt;li&gt;Give them the ability to interactively test the code they are writing too. Notes on how to start a development server (for web projects) are useful, then you can have them use Playwright or curl to try things out.&lt;/li&gt;
&lt;li&gt;I'm having great results from maintaining a GitHub issues collection for projects and pasting URLs to issues directly into Claude Code.&lt;/li&gt;
&lt;li&gt;I actually don't think documentation is too important: LLMs can read the code a lot faster than you can to figure out how to use it. I have comprehensive documentation across all of my projects but I don't think it's that helpful for the coding agents, though they are good at helping me spot if it needs updating.&lt;/li&gt;
&lt;li&gt;Linters, type checkers, auto-formatters - give coding agents helpful tools to run and they'll use them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the most part anything that makes a codebase easier for humans to maintain turns out to help agents as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Thought of another one: detailed error messages! If a manual or automated test fails the more information you can return back to the model the better, and stuffing extra data in the error message or assertion is a very inexpensive way to do that.&lt;/p&gt;
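&lt;p&gt;In pytest that can be as simple as packing the extra context into the assertion message itself - a minimal sketch, with the response shape invented for illustration:&lt;/p&gt;

```python
def check_response(response):
    # Put everything a model (or a human) needs to debug a failure
    # directly into the assertion message.
    assert response["status"] == 200, (
        f"expected status 200, got {response['status']}; "
        f"body={response['body']!r}"
    )
```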

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="hacker-news"/><category term="ai"/><category term="pytest"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/></entry><entry><title>TIL: Testing different Python versions with uv with-editable and uv-test</title><link href="https://simonwillison.net/2025/Oct/9/uv-test/#atom-tag" rel="alternate"/><published>2025-10-09T03:37:06+00:00</published><updated>2025-10-09T03:37:06+00:00</updated><id>https://simonwillison.net/2025/Oct/9/uv-test/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/python/uv-tests"&gt;TIL: Testing different Python versions with uv with-editable and uv-test&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While tinkering with upgrading various projects to handle Python 3.14 I finally figured out a universal &lt;code&gt;uv&lt;/code&gt; recipe for running the tests for the current project in any specified version of Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --python 3.14 --isolated --with-editable '.[test]' pytest
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should work in any directory with a &lt;code&gt;pyproject.toml&lt;/code&gt; (or even a &lt;code&gt;setup.py&lt;/code&gt;) that defines a &lt;code&gt;test&lt;/code&gt; set of extra dependencies and uses &lt;code&gt;pytest&lt;/code&gt;.&lt;/p&gt;
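&lt;p&gt;For reference, the &lt;code&gt;test&lt;/code&gt; extra that recipe relies on is declared like this in &lt;code&gt;pyproject.toml&lt;/code&gt; - a minimal example, with the dependency list illustrative:&lt;/p&gt;

```toml
[project.optional-dependencies]
test = ["pytest"]
```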
&lt;p&gt;The &lt;code&gt;--with-editable '.[test]'&lt;/code&gt; bit ensures that changes you make to that directory will be picked up by future test runs. The &lt;code&gt;--isolated&lt;/code&gt; flag ensures no other environments will affect your test run.&lt;/p&gt;
&lt;p&gt;I like this pattern so much I built a little shell script that uses it, &lt;a href="https://til.simonwillison.net/python/uv-tests#user-content-uv-test"&gt;shown here&lt;/a&gt;. Now I can change to any Python project directory and run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv-test
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or for a different Python version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv-test -p 3.11
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I can pass additional &lt;code&gt;pytest&lt;/code&gt; options too:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv-test -p 3.11 -k permissions
&lt;/code&gt;&lt;/pre&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="pytest"/><category term="til"/><category term="uv"/></entry><entry><title>Making PyPI's test suite 81% faster</title><link href="https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag" rel="alternate"/><published>2025-05-01T21:32:18+00:00</published><updated>2025-05-01T21:32:18+00:00</updated><id>https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/"&gt;Making PyPI&amp;#x27;s test suite 81% faster&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fantastic collection of tips from Alexis Challande on speeding up a Python CI workflow.&lt;/p&gt;
&lt;p&gt;I've used &lt;a href="https://github.com/pytest-dev/pytest-xdist"&gt;pytest-xdist&lt;/a&gt; to run tests in parallel (across multiple cores) before, but the following tips were new to me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;COVERAGE_CORE=sysmon pytest --cov=myproject&lt;/code&gt; tells &lt;a href="https://coverage.readthedocs.io/en/7.8.0/"&gt;coverage.py&lt;/a&gt; on Python 3.12 and higher to use the new &lt;a href="https://docs.python.org/3/library/sys.monitoring.html#module-sys.monitoring"&gt;sys.monitoring&lt;/a&gt; mechanism, which knocked their test execution time down from 58s to 27s.&lt;/li&gt;
&lt;li&gt;Setting &lt;code&gt;testpaths = ["tests/"]&lt;/code&gt; in your pytest configuration lets &lt;code&gt;pytest&lt;/code&gt; skip scanning other folders when trying to find tests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;python -X importtime ...&lt;/code&gt; shows a trace of exactly how long every package took to import. I could have done with this last week when I was trying to &lt;a href="https://github.com/simonw/llm/issues/949"&gt;debug slow LLM startup time&lt;/a&gt; which turned out to be caused by heavy imports.&lt;/li&gt;
&lt;/ul&gt;
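&lt;p&gt;That list syntax for &lt;code&gt;testpaths&lt;/code&gt; is the form used in the &lt;code&gt;[tool.pytest.ini_options]&lt;/code&gt; table if you configure pytest via &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/p&gt;

```toml
[tool.pytest.ini_options]
testpaths = ["tests/"]
```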

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/1jb4l7/making_pypi_s_test_suite_81_faster"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="pypi"/><category term="python"/><category term="pytest"/></entry><entry><title>Smoke test your Django admin site</title><link href="https://simonwillison.net/2025/Mar/13/smoke-test-your-django-admin/#atom-tag" rel="alternate"/><published>2025-03-13T15:02:09+00:00</published><updated>2025-03-13T15:02:09+00:00</updated><id>https://simonwillison.net/2025/Mar/13/smoke-test-your-django-admin/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jmduke.com/posts/post/django-admin-changelist-test/"&gt;Smoke test your Django admin site&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Justin Duke demonstrates a neat pattern for running simple tests against your internal Django admin site: introspect every admin route via &lt;code&gt;django.urls.get_resolver()&lt;/code&gt; and loop through them with &lt;code&gt;@pytest.mark.parametrize&lt;/code&gt; to check they all return a 200 HTTP status code.&lt;/p&gt;
&lt;p&gt;This catches simple mistakes with the admin configuration that trigger exceptions that might otherwise go undetected.&lt;/p&gt;
&lt;p&gt;I rarely write automated tests against my own admin sites and often feel guilty about it. I wrote &lt;a href="https://til.simonwillison.net/django/testing-django-admin-with-pytest"&gt;some notes&lt;/a&gt; on testing it with &lt;a href="https://pytest-django.readthedocs.io/en/latest/helpers.html#fixtures"&gt;pytest-django fixtures&lt;/a&gt; a few years ago.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django-admin"&gt;django-admin&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="django-admin"/><category term="python"/><category term="testing"/><category term="pytest"/></entry><entry><title>[red-knot] type inference/checking test framework</title><link href="https://simonwillison.net/2024/Oct/16/markdown-test-framework/#atom-tag" rel="alternate"/><published>2024-10-16T20:43:55+00:00</published><updated>2024-10-16T20:43:55+00:00</updated><id>https://simonwillison.net/2024/Oct/16/markdown-test-framework/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/astral-sh/ruff/pull/13636"&gt;[red-knot] type inference/checking test framework&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ruff maintainer Carl Meyer recently landed an interesting new design for a testing framework. It's based on Markdown, and could be described as a form of "literate testing" - the testing equivalent of Donald Knuth's &lt;a href="https://en.wikipedia.org/wiki/Literate_programming"&gt;literate programming&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A markdown test file is a suite of tests, each test can contain one or more Python files, with optionally specified path/name. The test writes all files to an in-memory file system, runs red-knot, and matches the resulting diagnostics against &lt;code&gt;Type:&lt;/code&gt; and &lt;code&gt;Error:&lt;/code&gt; assertions embedded in the Python source as comments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Test suites are Markdown documents with embedded fenced blocks that look &lt;a href="https://github.com/astral-sh/ruff/blob/2095ea83728d32959a435ab749acce48dfb76256/crates/red_knot_python_semantic/resources/mdtest/literal/float.md?plain=1#L5-L7"&gt;like this&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```py
reveal_type(1.0) # revealed: float
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tests can optionally include a &lt;code&gt;path=&lt;/code&gt; specifier, which can provide neater messages when reporting test failures:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```py path=branches_unify_to_non_union_type.py
def could_raise_returns_str() -&amp;gt; str:
    return 'foo'
...
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A larger example test suite can be browsed in the &lt;a href="https://github.com/astral-sh/ruff/tree/6282402a8cb44ac6362c6007fc911c3d75729648/crates/red_knot_python_semantic/resources/mdtest"&gt;red_knot_python_semantic/resources/mdtest&lt;/a&gt; directory.&lt;/p&gt;
&lt;p&gt;This document &lt;a href="https://github.com/astral-sh/ruff/blob/main/crates/red_knot_python_semantic/resources/mdtest/exception/control_flow.md"&gt;on control flow for exception handlers&lt;/a&gt; (from &lt;a href="https://github.com/astral-sh/ruff/pull/13729"&gt;this PR&lt;/a&gt;) is the best example I've found of detailed prose documentation to accompany the tests.&lt;/p&gt;
&lt;p&gt;The system is implemented in Rust, but it's easy to imagine an alternative version of this idea written in Python as a &lt;code&gt;pytest&lt;/code&gt; plugin. This feels like an evolution of the old Python &lt;a href="https://docs.python.org/3/library/doctest.html"&gt;doctest&lt;/a&gt; idea, except that tests are embedded directly in Markdown rather than being embedded in Python code docstrings.&lt;/p&gt;
&lt;p&gt;... and it looks like such plugins exist already. Here are two that I've found so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/modal-labs/pytest-markdown-docs"&gt;pytest-markdown-docs&lt;/a&gt; by Elias Freider and Modal Labs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html"&gt;sphinx.ext.doctest&lt;/a&gt; is a core Sphinx extension for running test snippets in documentation.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/scientific-python/pytest-doctestplus"&gt;pytest-doctestplus&lt;/a&gt; from the Scientific Python community, first released in 2011.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I tried &lt;code&gt;pytest-markdown-docs&lt;/code&gt; by creating a &lt;code&gt;doc.md&lt;/code&gt; file like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hello test doc

```py
assert 1 + 2 == 3
```

But this fails:

```py
assert 1 + 2 == 4
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then running it with &lt;a href="https://docs.astral.sh/uv/guides/tools/"&gt;uvx&lt;/a&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx --with pytest-markdown-docs pytest --markdown-docs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I got one pass and one fail:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;_______ docstring for /private/tmp/doc.md __________
Error in code block:
```
10   assert 1 + 2 == 4
11   
```
Traceback (most recent call last):
  File "/private/tmp/tt/doc.md", line 10, in &amp;lt;module&amp;gt;
    assert 1 + 2 == 4
AssertionError

============= short test summary info ==============
FAILED doc.md::/private/tmp/doc.md
=========== 1 failed, 1 passed in 0.02s ============
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I also &lt;a href="https://twitter.com/exhaze/status/1846675911225364742"&gt;just learned&lt;/a&gt; that the venerable Python &lt;code&gt;doctest&lt;/code&gt; standard library module has the ability to &lt;a href="https://docs.python.org/3/library/doctest.html#simple-usage-checking-examples-in-a-text-file"&gt;run tests in documentation files&lt;/a&gt; too, with &lt;code&gt;doctest.testfile("example.txt")&lt;/code&gt;: "The file content is treated as if it were a single giant docstring; the file doesn’t need to contain a Python program!"&lt;/p&gt;
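&lt;p&gt;That standard library feature is easy to try - a self-contained sketch, with the file contents invented:&lt;/p&gt;

```python
import doctest
import pathlib
import tempfile

# A prose file whose >>> examples doctest can execute directly.
content = """This whole file is treated as one giant docstring.

>>> 1 + 2
3
"""

with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d) / "example.txt"
    path.write_text(content)
    # module_relative=False lets us pass an absolute path.
    results = doctest.testfile(str(path), module_relative=False)

assert results.failed == 0 and results.attempted == 1
```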

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/charliermarsh/status/1846544708480168229"&gt;Charlie Marsh&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff"&gt;ruff&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/donald-knuth"&gt;donald-knuth&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="markdown"/><category term="rust"/><category term="pytest"/><category term="ruff"/><category term="uv"/><category term="astral"/><category term="donald-knuth"/></entry><entry><title>An LLM TDD loop</title><link href="https://simonwillison.net/2024/Oct/13/an-llm-tdd-loop/#atom-tag" rel="alternate"/><published>2024-10-13T19:37:47+00:00</published><updated>2024-10-13T19:37:47+00:00</updated><id>https://simonwillison.net/2024/Oct/13/an-llm-tdd-loop/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://codeinthehole.com/tips/llm-tdd-loop-script/"&gt;An LLM TDD loop&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Super neat demo by David Winterbottom, who wrapped my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and &lt;a href="https://github.com/simonw/files-to-prompt"&gt;files-to-prompt&lt;/a&gt; tools in &lt;a href="https://gist.github.com/codeinthehole/d12af317a76b43423b111fd6d508c4fc"&gt;a short Bash script&lt;/a&gt; that can be fed a file full of Python unit tests and an empty implementation file and will then iterate on that file in a loop until the tests pass.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/codeinthehole/status/1845541873651274144"&gt;@codeinthehole&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="ai"/><category term="pytest"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="files-to-prompt"/></entry><entry><title>Python Developers Survey 2023 Results</title><link href="https://simonwillison.net/2024/Sep/3/python-developers-survey-2023/#atom-tag" rel="alternate"/><published>2024-09-03T02:47:45+00:00</published><updated>2024-09-03T02:47:45+00:00</updated><id>https://simonwillison.net/2024/Sep/3/python-developers-survey-2023/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://lp.jetbrains.com/python-developers-survey-2023/"&gt;Python Developers Survey 2023 Results&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The seventh annual Python survey is out. Here are the things that caught my eye or that I found surprising:&lt;/p&gt;
&lt;p&gt;25% of survey respondents had been programming in Python for less than a year, and 33% had less than a year of professional experience.&lt;/p&gt;
&lt;p&gt;37% of Python developers reported contributing to open-source projects last year - a new question for the survey. This is delightfully high!&lt;/p&gt;
&lt;p&gt;6% of users are still using Python 2. The survey notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Almost half of Python 2 holdouts are under 21 years old and a third are students. Perhaps courses are still using Python 2?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In web frameworks, Flask and Django are neck and neck at 33% each, but &lt;a href="https://fastapi.tiangolo.com/"&gt;FastAPI&lt;/a&gt; is a close third at 29%! &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt; is at 6%, but that's an under-count because it's the basis for FastAPI.&lt;/p&gt;
&lt;p&gt;The most popular library in "other framework and libraries" was BeautifulSoup with 31%, then Pillow 28%, then &lt;a href="https://github.com/opencv/opencv-python"&gt;OpenCV-Python&lt;/a&gt; at 22% (wow!) and Pydantic at 22%. Tkinter had 17%. These numbers are all a surprise to me.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.pytest.org/en/stable/"&gt;pytest&lt;/a&gt; scores 52% for unit testing, &lt;code&gt;unittest&lt;/code&gt; from the standard library just 25%. I'm glad to see &lt;code&gt;pytest&lt;/code&gt; so widely used, it's my favourite testing tool across any programming language.&lt;/p&gt;
&lt;p&gt;The top cloud providers are AWS, then Google Cloud Platform, then Azure... but &lt;a href="https://www.pythonanywhere.com/"&gt;PythonAnywhere&lt;/a&gt; (11%) took fourth place just ahead of DigitalOcean (10%). And &lt;a href="https://www.alibabacloud.com/"&gt;Alibaba Cloud&lt;/a&gt; is a new entrant in sixth place (after Heroku) with 4%. Heroku's ending of its free plan dropped them from 14% in 2021 to 7% now.&lt;/p&gt;
&lt;p&gt;Linux and Windows equal at 55%, macOS is at 29%. This was one of many multiple-choice questions that could add up to more than 100%.&lt;/p&gt;
&lt;p&gt;In databases, SQLite usage was trending down - from 38% in 2021 to 34% in 2023 - but still in second place behind PostgreSQL, stable at 43%.&lt;/p&gt;
&lt;p&gt;The survey incorporates quotes from different Python experts responding to the numbers; it's worth &lt;a href="https://lp.jetbrains.com/python-developers-survey-2023/"&gt;reading through the whole thing&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://pyfound.blogspot.com/2024/08/python-developers-survey-2023-results.html"&gt;PSF news&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/surveys"&gt;surveys&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/psf"&gt;psf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pydantic"&gt;pydantic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/starlette"&gt;starlette&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="postgresql"/><category term="python"/><category term="sqlite"/><category term="surveys"/><category term="pytest"/><category term="psf"/><category term="pydantic"/><category term="starlette"/></entry><entry><title>Upgrading my cookiecutter templates to use python -m pytest</title><link href="https://simonwillison.net/2024/Aug/17/python-m-pytest/#atom-tag" rel="alternate"/><published>2024-08-17T05:12:47+00:00</published><updated>2024-08-17T05:12:47+00:00</updated><id>https://simonwillison.net/2024/Aug/17/python-m-pytest/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/python-lib/issues/9"&gt;Upgrading my cookiecutter templates to use python -m pytest&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every now and then I get caught out by weird test failures when I run &lt;code&gt;pytest&lt;/code&gt; and it turns out I'm running the wrong installation of that tool - that &lt;code&gt;pytest&lt;/code&gt; is executing in a different virtual environment from the one needed by the tests.&lt;/p&gt;
&lt;p&gt;The fix for this is easy: run &lt;code&gt;python -m pytest&lt;/code&gt; instead, which guarantees that you will run &lt;code&gt;pytest&lt;/code&gt; in the same environment as your currently active Python.&lt;/p&gt;
&lt;p&gt;Yesterday I went through and updated every one of my &lt;code&gt;cookiecutter&lt;/code&gt; templates (&lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt;, &lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt;, &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt;, &lt;a href="https://github.com/simonw/sqlite-utils-plugin"&gt;sqlite-utils-plugin&lt;/a&gt;, &lt;a href="https://github.com/simonw/llm-plugin"&gt;llm-plugin&lt;/a&gt;) to use this pattern in their READMEs and generated repositories instead, to help spread that better recipe a little bit further.
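&lt;p&gt;A quick sketch of why the &lt;code&gt;-m&lt;/code&gt; form is safer. The &lt;code&gt;pytest&lt;/code&gt; script on your PATH is tied to whichever Python it was installed into, while &lt;code&gt;python -m pytest&lt;/code&gt; always runs under the interpreter you just invoked - which &lt;code&gt;sys.executable&lt;/code&gt; can show you:&lt;/p&gt;

```python
# "python -m pytest" runs pytest under the interpreter you invoked,
# so the environment is never ambiguous. sys.executable reveals which
# interpreter that is - the one any "-m" module will run under.
import sys

print(sys.executable)  # path of the currently running interpreter
assert sys.executable  # always set for a normally-launched Python
```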


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;&lt;/p&gt;



</summary><category term="projects"/><category term="python"/><category term="pytest"/><category term="cookiecutter"/></entry><entry><title>inline-snapshot</title><link href="https://simonwillison.net/2024/Apr/16/inline-snapshot/#atom-tag" rel="alternate"/><published>2024-04-16T16:04:25+00:00</published><updated>2024-04-16T16:04:25+00:00</updated><id>https://simonwillison.net/2024/Apr/16/inline-snapshot/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://15r10nk.github.io/inline-snapshot/"&gt;inline-snapshot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I'm a big fan of snapshot testing, where expected values are captured the first time a test suite runs and then asserted against in future runs. It's a very productive way to build a robust test suite.&lt;/p&gt;
&lt;p&gt;inline-snapshot by Frank Hoffmann is a particularly neat implementation of the pattern. It defines a &lt;code&gt;snapshot()&lt;/code&gt; function which you can use in your tests:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;assert 1548 * 18489 == snapshot()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;When you run that test using &lt;code&gt;pytest --inline-snapshot=create&lt;/code&gt; the &lt;code&gt;snapshot()&lt;/code&gt; function will be replaced in your code (using AST manipulation) with itself wrapping the &lt;code&gt;repr()&lt;/code&gt; of the expected result:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;assert 1548 * 18489 == snapshot(28620972)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;If you modify the code and need to update the tests you can run &lt;code&gt;pytest --inline-snapshot=fix&lt;/code&gt; to regenerate the recorded snapshot values.
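&lt;p&gt;The underlying pattern can be sketched in a few lines of plain Python. This is just the concept - inline-snapshot's real implementation rewrites your test source via AST manipulation - and the class here is made up purely for illustration:&lt;/p&gt;

```python
# Minimal sketch of the snapshot idea: record the expected value the
# first time, compare against it on every later run. This is NOT how
# inline-snapshot works internally (it rewrites the test source) -
# it just illustrates the pattern.

class Snapshot:
    _MISSING = object()

    def __init__(self):
        self.value = self._MISSING  # no recorded value yet

    def __eq__(self, other):
        if self.value is self._MISSING:
            self.value = other  # "create" mode: capture on first use
            return True
        return self.value == other  # later runs: assert against it


snap = Snapshot()
assert 1548 * 18489 == snap   # first run: records 28620972
assert 1548 * 18489 == snap   # later runs: compares against it
assert not (999 == snap)      # a changed result would now fail
```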


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="pytest"/></entry><entry><title>pytest-icdiff</title><link href="https://simonwillison.net/2023/Jun/3/pytest-icdiff/#atom-tag" rel="alternate"/><published>2023-06-03T16:59:24+00:00</published><updated>2023-06-03T16:59:24+00:00</updated><id>https://simonwillison.net/2023/Jun/3/pytest-icdiff/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hjwp/pytest-icdiff"&gt;pytest-icdiff&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is neat: “pip install pytest-icdiff” provides an instant usability upgrade to the output of failed tests in pytest, especially if the assertions involve comparing larger strings or nested JSON objects.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://mastodon.social/@hynek/110479665200902390"&gt;@hynek&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="pytest"/></entry><entry><title>pyfakefs usage</title><link href="https://simonwillison.net/2023/Feb/1/pyfakefs-usage/#atom-tag" rel="alternate"/><published>2023-02-01T22:37:42+00:00</published><updated>2023-02-01T22:37:42+00:00</updated><id>https://simonwillison.net/2023/Feb/1/pyfakefs-usage/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pytest-pyfakefs.readthedocs.io/en/latest/usage.html"&gt;pyfakefs usage&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New-to-me pytest fixture library that provides a really easy way to mock Python’s filesystem functions—open(), os.listdir() and so on—so a test can run against a fake set of files. This looks incredibly useful.
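&lt;p&gt;A rough standard-library sketch of what filesystem faking buys you. pyfakefs itself goes much further - its &lt;code&gt;fs&lt;/code&gt; fixture transparently patches the filesystem APIs across a whole test - but the core idea is the same:&lt;/p&gt;

```python
# Standard-library sketch of the filesystem-faking idea - NOT pyfakefs
# itself, which patches open(), os.listdir() and friends wholesale.
# The path below is deliberately fake: it never touches the real disk.
from unittest import mock

def count_lines(path):
    # Code under test: opens a file and counts its lines.
    with open(path) as f:
        return len(f.read().splitlines())

fake = mock.mock_open(read_data="one\ntwo\nthree\n")
with mock.patch("builtins.open", fake):
    assert count_lines("/no/such/file.txt") == 3
```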

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/"&gt;Luke Plant&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/luke-plant"&gt;luke-plant&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="luke-plant"/><category term="python"/><category term="testing"/><category term="pytest"/></entry><entry><title>mitsuhiko/insta</title><link href="https://simonwillison.net/2022/Oct/31/insta/#atom-tag" rel="alternate"/><published>2022-10-31T01:06:44+00:00</published><updated>2022-10-31T01:06:44+00:00</updated><id>https://simonwillison.net/2022/Oct/31/insta/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/mitsuhiko/insta"&gt;mitsuhiko/insta&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I asked for recommendations on Twitter for testing libraries in other languages that would give me the same level of delight that I get from pytest. Two people pointed me to insta by Armin Ronacher, a Rust testing framework for “snapshot testing” which automatically records reference values to your repository, so future tests can spot if they change.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/david_raznick/status/1586885078457753602"&gt;@david_raznick&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/armin-ronacher"&gt;armin-ronacher&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="armin-ronacher"/><category term="testing"/><category term="rust"/><category term="pytest"/></entry><entry><title>Running C unit tests with pytest</title><link href="https://simonwillison.net/2022/Feb/12/running-c-unit-tests-with-pytest/#atom-tag" rel="alternate"/><published>2022-02-12T17:14:35+00:00</published><updated>2022-02-12T17:14:35+00:00</updated><id>https://simonwillison.net/2022/Feb/12/running-c-unit-tests-with-pytest/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://p403n1x87.github.io/running-c-unit-tests-with-pytest.html"&gt;Running C unit tests with pytest&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Brilliant, detailed tutorial by Gabriele Tornetta on testing C code using pytest, which also doubles up as a ctypes tutorial. There’s a lot of depth here—in addition to exercising C code through ctypes, Gabriele shows how to run each test in a separate process so that segmentation faults don’t fail the entire suite, then adds code to run the compiler as part of the pytest run, and then shows how to use gdb trickery to generate more useful stack traces.
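&lt;p&gt;For a self-contained taste of the ctypes side (borrowing libm's &lt;code&gt;sqrt()&lt;/code&gt; rather than compiling our own C, which the tutorial does properly):&lt;/p&gt;

```python
# A tiny taste of the ctypes approach - not from the tutorial itself.
# Load a shared C library, declare one function's signature, and call
# it from a pytest-style test. Gabriele's tutorial does this against
# your own freshly-compiled C code; libm's sqrt() keeps this runnable.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.sqrt.restype = ctypes.c_double      # declare the C return type
libm.sqrt.argtypes = [ctypes.c_double]   # and the argument types

def test_sqrt():
    assert libm.sqrt(16.0) == 4.0

test_sqrt()
```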

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=30301880"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ctypes"&gt;ctypes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="c"/><category term="ctypes"/><category term="testing"/><category term="pytest"/></entry><entry><title>How I build a feature</title><link href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#atom-tag" rel="alternate"/><published>2022-01-12T18:10:17+00:00</published><updated>2022-01-12T18:10:17+00:00</updated><id>https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm maintaining &lt;a href="https://github.com/simonw/simonw/blob/main/releases.md"&gt;a lot of different projects&lt;/a&gt; at the moment. I thought it would be useful to describe the process I use for adding a new feature to one of them, using the new &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#cli-create-database"&gt;sqlite-utils create-database&lt;/a&gt; command as an example.&lt;/p&gt;
&lt;p&gt;I like each feature to be represented by what I consider to be the &lt;strong&gt;perfect commit&lt;/strong&gt; - one that bundles together the implementation, the tests, the documentation and a link to an external issue thread.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 29th October 2022:&lt;/strong&gt; I wrote &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;more about the perfect commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sqlite-utils create-database&lt;/code&gt; command is very simple: it creates a new, empty SQLite database file. You use it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% sqlite-utils create-database empty.db
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#everything-starts-with-an-issue"&gt;Everything starts with an issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#development-environment"&gt;Development environment&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#automated-tests"&gt;Automated tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#implementing-the-feature"&gt;Implementing the feature&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#code-formatting-with-black"&gt;Code formatting with Black&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#linting"&gt;Linting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#documentation"&gt;Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#committing-the-change"&gt;Committing the change&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#branches-and-pull-requests"&gt;Branches and pull requests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#release-notes-and-a-release"&gt;Release notes, and a release&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#a-live-demo"&gt;A live demo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#tell-the-world-about-it"&gt;Tell the world about it&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#more-examples-of-this-pattern"&gt;More examples of this pattern&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="everything-starts-with-an-issue"&gt;Everything starts with an issue&lt;/h4&gt;
&lt;p&gt;Every piece of work I do has an associated issue. This acts as ongoing work-in-progress notes and lets me record decisions, reference any research, drop in code snippets and sometimes even add screenshots and video - stuff that is really helpful but doesn't necessarily fit in code comments or commit messages.&lt;/p&gt;
&lt;p&gt;Even if it's a tiny improvement that's only a few lines of code, I'll still open an issue for it - sometimes just a few minutes before closing it again as complete.&lt;/p&gt;
&lt;p&gt;Any commits that I create that relate to an issue reference the issue number in their commit message. GitHub does a great job of automatically linking these together, bidirectionally so I can navigate from the commit to the issue or from the issue to the commit.&lt;/p&gt;
&lt;p&gt;Having an issue also gives me something I can link to from my release notes.&lt;/p&gt;
&lt;p&gt;In the case of the &lt;code&gt;create-database&lt;/code&gt; command, I opened &lt;a href="https://github.com/simonw/sqlite-utils/issues/348"&gt;this issue&lt;/a&gt; in November when I had the idea for the feature.&lt;/p&gt;
&lt;p&gt;I didn't do the work until over a month later - but because I had designed the feature in the issue comments I could get started on the implementation really quickly.&lt;/p&gt;
&lt;h4 id="development-environment"&gt;Development environment&lt;/h4&gt;
&lt;p&gt;Being able to quickly spin up a development environment for a project is crucial. All of my projects have a section in the README or the documentation describing how to do this - here's &lt;a href="https://sqlite-utils.datasette.io/en/stable/contributing.html"&gt;that section for sqlite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On my own laptop each project gets a directory, and I use &lt;code&gt;pipenv shell&lt;/code&gt; in that directory to activate a directory-specific virtual environment, then &lt;code&gt;pip install -e '.[test]'&lt;/code&gt; to install the dependencies and test dependencies.&lt;/p&gt;
&lt;h4 id="automated-tests"&gt;Automated tests&lt;/h4&gt;
&lt;p&gt;All of my features are accompanied by automated tests. This gives me the confidence to boldly make changes to the software in the future without fear of breaking any existing features.&lt;/p&gt;
&lt;p&gt;This means that writing tests needs to be as quick and easy as possible - the less friction here the better.&lt;/p&gt;
&lt;p&gt;The best way to make writing tests easy is to have a great testing framework in place from the very beginning of the project. My cookiecutter templates (&lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt;, &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; and &lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt;) all configure &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; and add a &lt;code&gt;tests/&lt;/code&gt; folder with a single passing test, to give me something to start adding tests to.&lt;/p&gt;
&lt;p&gt;I can't say enough good things about pytest. Before I adopted it, writing tests was a chore. Now it's an activity I genuinely look forward to!&lt;/p&gt;
&lt;p&gt;I'm not a religious adherent to writing the tests first - see &lt;a href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/"&gt;How to cheat at unit tests with pytest and Black&lt;/a&gt; for more thoughts on that - but I'll write the test first if it's pragmatic to do so.&lt;/p&gt;
&lt;p&gt;In the case of &lt;code&gt;create-database&lt;/code&gt;, writing the test first felt like the right thing to do. Here's the test I started with:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_create_database&lt;/span&gt;(&lt;span class="pl-s1"&gt;tmpdir&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;db_path&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;tmpdir&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-s"&gt;"test.db"&lt;/span&gt;
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; &lt;span class="pl-s1"&gt;db_path&lt;/span&gt;.&lt;span class="pl-en"&gt;exists&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;CliRunner&lt;/span&gt;().&lt;span class="pl-en"&gt;invoke&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;cli&lt;/span&gt;.&lt;span class="pl-s1"&gt;cli&lt;/span&gt;, [&lt;span class="pl-s"&gt;"create-database"&lt;/span&gt;, &lt;span class="pl-en"&gt;str&lt;/span&gt;(&lt;span class="pl-s1"&gt;db_path&lt;/span&gt;)]
    )
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt;.&lt;span class="pl-s1"&gt;exit_code&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;0&lt;/span&gt;
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;db_path&lt;/span&gt;.&lt;span class="pl-en"&gt;exists&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;This test uses the &lt;a href="https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture"&gt;tmpdir pytest fixture&lt;/a&gt; to provide a temporary directory that will be automatically cleaned up by pytest after the test run finishes.&lt;/p&gt;
&lt;p&gt;It checks that the &lt;code&gt;test.db&lt;/code&gt; file doesn't exist yet, then uses the Click framework's &lt;a href="https://click.palletsprojects.com/en/8.0.x/testing/"&gt;CliRunner utility&lt;/a&gt; to execute the create-database command. Then it checks that the command didn't throw an error and that the file has been created.&lt;/p&gt;
&lt;p&gt;Then I run the test and watch it fail - because I haven't built the feature yet!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% pytest -k test_create_database

============ test session starts ============
platform darwin -- Python 3.8.2, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/simon/Dropbox/Development/sqlite-utils
plugins: cov-2.12.1, hypothesis-6.14.5
collected 808 items / 807 deselected / 1 selected                           

tests/test_cli.py F                                                   [100%]

================= FAILURES ==================
___________ test_create_database ____________

tmpdir = local('/private/var/folders/wr/hn3206rs1yzgq3r49bz8nvnh0000gn/T/pytest-of-simon/pytest-659/test_create_database0')

    def test_create_database(tmpdir):
        db_path = tmpdir / "test.db"
        assert not db_path.exists()
        result = CliRunner().invoke(
            cli.cli, ["create-database", str(db_path)]
        )
&amp;gt;       assert result.exit_code == 0
E       assert 1 == 0
E        +  where 1 = &amp;lt;Result SystemExit(1)&amp;gt;.exit_code

tests/test_cli.py:2097: AssertionError
========== short test summary info ==========
FAILED tests/test_cli.py::test_create_database - assert 1 == 0
===== 1 failed, 807 deselected in 0.99s ====
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-k&lt;/code&gt; option lets me run only the tests that match a search string, rather than running the full test suite. I use this all the time.&lt;/p&gt;
&lt;p&gt;Other pytest features I often use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pytest -x&lt;/code&gt;: runs the entire test suite but quits at the first test that fails&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pytest --lf&lt;/code&gt;: re-runs any tests that failed during the last test run&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pytest --pdb -x&lt;/code&gt;: open the Python debugger at the first failed test (omit the &lt;code&gt;-x&lt;/code&gt; to open it at every failed test). This is the main way I interact with the Python debugger. I often use this to help write the tests, since I can add &lt;code&gt;assert False&lt;/code&gt; and get a shell inside the test to interact with various objects and figure out how to best run assertions against them.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="implementing-the-feature"&gt;Implementing the feature&lt;/h4&gt;
&lt;p&gt;Test in place, it's time to implement the command. I added this code to my existing &lt;a href="https://github.com/simonw/sqlite-utils/blob/3.20/sqlite_utils/cli.py"&gt;cli.py module&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;cli&lt;/span&gt;.&lt;span class="pl-en"&gt;command&lt;/span&gt;(&lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"create-database"&lt;/span&gt;)&lt;/span&gt;
&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;click&lt;/span&gt;.&lt;span class="pl-en"&gt;argument&lt;/span&gt;(&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s"&gt;"path"&lt;/span&gt;,&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s1"&gt;type&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;click&lt;/span&gt;.&lt;span class="pl-v"&gt;Path&lt;/span&gt;(&lt;span class="pl-s1"&gt;file_okay&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;, &lt;span class="pl-s1"&gt;dir_okay&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;, &lt;span class="pl-s1"&gt;allow_dash&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s1"&gt;required&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;,&lt;/span&gt;
&lt;span class="pl-en"&gt;)&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;create_database&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;):
    &lt;span class="pl-s"&gt;"Create a new empty database file."&lt;/span&gt;
    &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite_utils&lt;/span&gt;.&lt;span class="pl-v"&gt;Database&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;)
    &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-en"&gt;vacuum&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;(I happen to know that the quickest way to create an empty SQLite database file is to run &lt;code&gt;VACUUM&lt;/code&gt; against it.)&lt;/p&gt;
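&lt;p&gt;A quick illustration of that trick, with a throwaway temporary path: open a connection to a brand new file and run &lt;code&gt;VACUUM&lt;/code&gt;, and SQLite writes out a valid (empty) database:&lt;/p&gt;

```python
# Sketch of the VACUUM trick: connect to a brand new path, run VACUUM,
# and SQLite writes out a valid empty database file. The temporary
# directory here is just so the example is self-contained.
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "empty.db"
conn = sqlite3.connect(db_path)
conn.execute("VACUUM")
conn.close()

assert db_path.exists()
# A real SQLite file starts with this 16-byte magic header:
assert db_path.read_bytes()[:16] == b"SQLite format 3\x00"
```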
&lt;p&gt;The test now passes!&lt;/p&gt;
&lt;p&gt;I iterated on this implementation a little bit more, to add the &lt;code&gt;--enable-wal&lt;/code&gt; option I had designed &lt;a href="https://github.com/simonw/sqlite-utils/issues/348#issuecomment-983120066"&gt;in the issue comments&lt;/a&gt; - and updated the test to match. You can see the final implementation in this commit: &lt;a href="https://github.com/simonw/sqlite-utils/commit/1d64cd2e5b402ff957f9be2d9bb490d313c73989"&gt;1d64cd2e5b402ff957f9be2d9bb490d313c73989&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If I add a new test and it passes the first time, I’m always suspicious of it. I’ll deliberately break the test (change a 1 to a 2 for example) and run it again to make sure it fails, then change it back again.&lt;/p&gt;
&lt;h4 id="code-formatting-with-black"&gt;Code formatting with Black&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; has increased my productivity as a Python developer by a material amount. I used to spend a whole bunch of brain cycles agonizing over how to indent my code, where to break up long function calls and suchlike. Thanks to Black I never think about this at all - I instinctively run &lt;code&gt;black .&lt;/code&gt; in the root of my project and accept whatever style decisions it applies for me.&lt;/p&gt;
&lt;h4 id="linting"&gt;Linting&lt;/h4&gt;
&lt;p&gt;I have a few linters set up to run on every commit. I can run these locally too - how to do that is &lt;a href="https://sqlite-utils.datasette.io/en/stable/contributing.html#linting-and-formatting"&gt;documented here&lt;/a&gt; - but I'm often a bit lazy and leave them to &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/.github/workflows/test.yml"&gt;run in CI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this case one of my linters failed! I accidentally called the new command function &lt;code&gt;create_table()&lt;/code&gt; when it should have been called &lt;code&gt;create_database()&lt;/code&gt;. The code worked fine due to how the &lt;code&gt;cli.command(name=...)&lt;/code&gt; decorator works but &lt;code&gt;mypy&lt;/code&gt; &lt;a href="https://github.com/simonw/sqlite-utils/runs/4754944593?check_suite_focus=true"&gt;complained about&lt;/a&gt; the redefined function name. I fixed that in &lt;a href="https://github.com/simonw/sqlite-utils/commit/2f8879235afc6a06a8ae25ded1b2fe289ad8c3a6#diff-76294b3d4afeb27e74e738daa01c26dd4dc9ccb6f4477451483a2ece1095902e"&gt;a separate commit&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="documentation"&gt;Documentation&lt;/h4&gt;
&lt;p&gt;My policy these days is that if a feature isn't documented it doesn't exist. Adding to documentation that already exists isn't much work at all, and over time these incremental improvements add up to something really comprehensive.&lt;/p&gt;
&lt;p&gt;For smaller projects I use a single &lt;code&gt;README.md&lt;/code&gt; which gets displayed on both GitHub and PyPI (and the Datasette website too, for example on &lt;a href="https://datasette.io/tools/git-history"&gt;datasette.io/tools/git-history&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;My larger projects, such as &lt;a href="https://docs.datasette.io/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt;, use &lt;a href="https://readthedocs.org/"&gt;Read the Docs&lt;/a&gt; and &lt;a href="https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html"&gt;reStructuredText&lt;/a&gt; with &lt;a href="https://www.sphinx-doc.org/"&gt;Sphinx&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;I like reStructuredText mainly because it has really good support for internal reference links - something that is missing from Markdown, though it can be enabled using &lt;a href="https://myst-parser.readthedocs.io"&gt;MyST&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sqlite-utils&lt;/code&gt; uses Sphinx. I have the &lt;a href="https://github.com/executablebooks/sphinx-autobuild"&gt;sphinx-autobuild&lt;/a&gt; extension configured, which means I can run a live reloading server with the documentation like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd docs
make livehtml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any time I'm working on the documentation I have that server running, so I can hit "save" in VS Code and see a preview in my browser a few seconds later.&lt;/p&gt;
&lt;p&gt;For Markdown documentation I use the VS Code preview pane directly.&lt;/p&gt;
&lt;p&gt;The moment the documentation is live online, I like to add a link to it in a comment on the issue thread.&lt;/p&gt;
&lt;h4 id="committing-the-change"&gt;Committing the change&lt;/h4&gt;
&lt;p&gt;I run &lt;code&gt;git diff&lt;/code&gt; a LOT while hacking on code, to make sure I haven’t accidentally changed something unrelated. This also helps spot things like rogue &lt;code&gt;print()&lt;/code&gt; debug statements I may have added.&lt;/p&gt;
&lt;p&gt;Before my final commit, I sometimes even run &lt;code&gt;git diff | grep print&lt;/code&gt; to check for those.&lt;/p&gt;
&lt;p&gt;My goal with the commit is to bundle the test, documentation and implementation. If those are the only files I've changed I do this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git commit -a -m "sqlite-utils create-database command, closes #348"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If this completes the work on the issue I use "&lt;code&gt;closes #N&lt;/code&gt;", which causes GitHub to close the issue for me. If it's not yet ready to close I use "&lt;code&gt;refs #N&lt;/code&gt;" instead.&lt;/p&gt;
&lt;p&gt;Sometimes there will be unrelated changes in my working directory. If so, I use &lt;code&gt;git add &amp;lt;files&amp;gt;&lt;/code&gt; and then commit just with &lt;code&gt;git commit -m message&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="branches-and-pull-requests"&gt;Branches and pull requests&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;create-database&lt;/code&gt; is a good example of a feature that can be implemented in a single commit, with no need to work in a branch.&lt;/p&gt;
&lt;p&gt;For larger features, I'll work in a feature branch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git checkout -b my-feature
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I'll make a commit (often just labelled "WIP prototype, refs #N") and then push that to GitHub and open a pull request for it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git push -u origin my-feature 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I ensure the new pull request links back to the issue in its description, then switch my ongoing commentary to comments on the pull request itself.&lt;/p&gt;
&lt;p&gt;I'll sometimes add a task checklist to the opening comment on the pull request, since tasks there get reflected in the GitHub UI anywhere that links to the PR. Then I'll check those off as I complete them.&lt;/p&gt;
&lt;p&gt;An example of a PR I used like this is &lt;a href="https://github.com/simonw/sqlite-utils/pull/361"&gt;#361: --lines and --text and --convert and --import&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I don't like merge commits - I much prefer to keep my &lt;code&gt;main&lt;/code&gt; branch history as linear as possible. I usually merge my PRs through the GitHub web interface using the squash feature, which results in a single, clean commit to main with the combined tests, documentation and implementation. Occasionally I will see value in keeping the individual commits, in which case I'll use a rebase merge instead.&lt;/p&gt;
&lt;p&gt;Another goal here is to keep the &lt;code&gt;main&lt;/code&gt; branch releasable at all times. Incomplete work should stay in a branch. This makes turning around and releasing quick bug fixes a lot less stressful!&lt;/p&gt;
&lt;h4 id="release-notes-and-a-release"&gt;Release notes, and a release&lt;/h4&gt;
&lt;p&gt;A feature isn't truly finished until it's been released to &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All of my projects are configured the same way: they use GitHub releases to trigger a GitHub Actions workflow which publishes the new release to PyPI. The &lt;code&gt;sqlite-utils&lt;/code&gt; workflow for that &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/.github/workflows/publish.yml"&gt;is here in publish.yml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; templates for new projects set up this workflow for me. I just need to create a PyPI token for the project and assign it as a repository secret. See the &lt;a href="https://github.com/simonw/python-lib"&gt;python-lib cookiecutter README&lt;/a&gt; for details.&lt;/p&gt;
&lt;p&gt;To push out a new release, I need to increment the version number in &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/setup.py"&gt;setup.py&lt;/a&gt; and write the release notes.&lt;/p&gt;
&lt;p&gt;I use &lt;a href="https://semver.org/"&gt;semantic versioning&lt;/a&gt; - a new feature is a minor version bump, a breaking change is a major version bump (I try very hard to avoid these) and a bug fix or documentation-only update is a patch increment.&lt;/p&gt;
&lt;p&gt;Since &lt;code&gt;create-database&lt;/code&gt; was a new feature, it went out in &lt;a href="https://github.com/simonw/sqlite-utils/releases/3.21"&gt;release 3.21&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My projects that use Sphinx for documentation have &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/docs/changelog.rst"&gt;changelog.rst&lt;/a&gt; files in their repositories. I add the release notes there, linking to the relevant issues and cross-referencing the new documentation. Then I ship a commit that bundles the release notes with the bumped version number, with a commit message that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git commit -m "Release 3.21

Refs #348, #364, #366, #368, #371, #372, #374, #375, #376, #379"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/sqlite-utils/commit/7c637b11805adc3d3970076a7ba6afe8e34b371e"&gt;the commit for release 3.21&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Referencing the issue numbers in the release automatically adds a note to their issue threads indicating the release that they went out in.&lt;/p&gt;
&lt;p&gt;I generate that list of issue numbers by pasting the release notes into an Observable notebook I built for the purpose: &lt;a href="https://observablehq.com/@simonw/extract-issue-numbers-from-pasted-text"&gt;Extract issue numbers from pasted text&lt;/a&gt;. Observable is really great for building this kind of tiny interactive utility.&lt;/p&gt;
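A minimal Python equivalent of that notebook takes just a few lines. The regex is an assumption about the notebook's behaviour, not its actual source code:

```python
import re

# Pull the distinct issue numbers out of a pasted block of release
# notes and format them as a "Refs" line for the commit message.
def extract_issue_refs(text):
    numbers = sorted({int(n) for n in re.findall(r"#(\d+)", text)})
    return "Refs " + ", ".join(f"#{n}" for n in numbers)

print(extract_issue_refs("Fixed #348 and #364, see #348"))
# Refs #348, #364
```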
&lt;p&gt;For projects that just have a README I write the release notes in Markdown and paste them directly into the GitHub "new release" form.&lt;/p&gt;
&lt;p&gt;I like to duplicate the release notes to GitHub releases for my Sphinx changelog projects too. This is mainly so the &lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt; website will display the release notes on its homepage, which is populated &lt;a href="https://simonwillison.net/2020/Dec/13/datasette-io/"&gt;at build time&lt;/a&gt; using the GitHub GraphQL API.&lt;/p&gt;
&lt;p&gt;To convert my reStructuredText to Markdown I copy and paste the rendered HTML into this brilliant &lt;a href="https://euangoddard.github.io/clipboard2markdown/"&gt;Paste to Markdown&lt;/a&gt; tool by &lt;a href="https://github.com/euangoddard"&gt;Euan Goddard&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="a-live-demo"&gt;A live demo&lt;/h4&gt;
&lt;p&gt;When possible, I like to have a live demo that I can link to.&lt;/p&gt;
&lt;p&gt;This is easiest for features in Datasette core. Datasette’s main branch gets &lt;a href="https://github.com/simonw/datasette/blob/0.60a1/.github/workflows/deploy-latest.yml#L51-L73"&gt;deployed automatically&lt;/a&gt; to &lt;a href="https://latest.datasette.io/"&gt;latest.datasette.io&lt;/a&gt; so I can often link to a demo there.&lt;/p&gt;
&lt;p&gt;For Datasette plugins, I’ll deploy a fresh instance with the plugin (e.g. &lt;a href="https://datasette-graphql-demo.datasette.io/"&gt;this one for datasette-graphql&lt;/a&gt;) or (more commonly) add it to my big &lt;a href="https://latest-with-plugins.datasette.io/"&gt;latest-with-plugins.datasette.io&lt;/a&gt; instance - which tries to demonstrate what happens to Datasette if you install dozens of plugins at once (so far it works OK).&lt;/p&gt;
&lt;p&gt;Here’s a demo of the &lt;a href="https://datasette.io/plugins/datasette-copyable"&gt;datasette-copyable plugin&lt;/a&gt; running there:  &lt;a href="https://latest-with-plugins.datasette.io/github/commits.copyable"&gt;https://latest-with-plugins.datasette.io/github/commits.copyable&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="tell-the-world-about-it"&gt;Tell the world about it&lt;/h4&gt;
&lt;p&gt;The last step is to tell the world (beyond the people who meticulously read the release notes) about the new feature.&lt;/p&gt;
&lt;p&gt;Depending on the size of the feature, I might do this with a tweet &lt;a href="https://twitter.com/simonw/status/1455266746701471746"&gt;like this one&lt;/a&gt; - usually with a screenshot and a link to the documentation. I often extend this into a short Twitter thread, which gives me a chance to link to related concepts and demos or add more screenshots.&lt;/p&gt;
&lt;p&gt;For larger or more interesting features I'll blog about them. I may save this for my weekly &lt;a href="https://simonwillison.net/tags/weeknotes/"&gt;weeknotes&lt;/a&gt;, but sometimes for particularly exciting features I'll write up a dedicated blog entry. Some examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-table/"&gt;Executing advanced ALTER TABLE operations in SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Jul/30/fun-binary-data-and-sqlite/"&gt;Fun with binary data and SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/"&gt;Refactoring databases with sqlite-utils extract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;Joining CSV and JSON data with an in-memory SQLite database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/"&gt;Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I may even assemble a full set of &lt;a href="https://simonwillison.net/tags/annotatedreleasenotes/"&gt;annotated release notes&lt;/a&gt; on my blog, where I quote each item from the release in turn and provide some fleshed out examples plus background information on why I built it.&lt;/p&gt;
&lt;p&gt;If it’s a new Datasette (or Datasette-adjacent) feature, I’ll try to remember to write about it in the next edition of the &lt;a href="https://datasette.substack.com/"&gt;Datasette Newsletter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, if I learned a new trick while building a feature I might extract that into &lt;a href="https://til.simonwillison.net/"&gt;a TIL&lt;/a&gt;. If I do that I'll link to the new TIL from the issue thread.&lt;/p&gt;
&lt;h4 id="more-examples-of-this-pattern"&gt;More examples of this pattern&lt;/h4&gt;
&lt;p&gt;Here are a bunch of examples of commits that implement this pattern, combining the tests, implementation and documentation into a single unit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sqlite-utils: &lt;a href="https://github.com/simonw/sqlite-utils/commit/324ebc31308752004fe5f7e4941fc83706c5539c"&gt;adding --limit and --offset to sqlite-utils rows&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;sqlite-utils: &lt;a href="https://github.com/simonw/sqlite-utils/commit/d83b2568131f2b1cc01228419bb08c96d843d65d"&gt;--where and -p options for sqlite-utils convert&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;s3-credentials: &lt;a href="https://github.com/simonw/s3-credentials/commit/905258379817e8b458528e4ccc5e6cc2c8cf4352"&gt;s3-credentials policy command&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette: &lt;a href="https://github.com/simonw/datasette/commit/5cadc244895fc47e0534c6e90df976d34293921e"&gt;db.execute_write_script() and db.execute_write_many()&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette: &lt;a href="https://github.com/simonw/datasette/commit/992496f2611a72bd51e94bfd0b17c1d84e732487"&gt;?_nosuggest=1 parameter for table views&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette-graphql: &lt;a href="https://github.com/simonw/datasette-graphql/commit/2d8c042e93e3429c5b187121d26f8817997073dd"&gt;GraphQL execution limits: time_limit_ms and num_queries_limit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/black"&gt;black&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="git"/><category term="github"/><category term="software-engineering"/><category term="testing"/><category term="pytest"/><category term="black"/><category term="read-the-docs"/><category term="github-issues"/></entry><entry><title>PAGNIs: Probably Are Gonna Need Its</title><link href="https://simonwillison.net/2021/Jul/1/pagnis/#atom-tag" rel="alternate"/><published>2021-07-01T19:13:58+00:00</published><updated>2021-07-01T19:13:58+00:00</updated><id>https://simonwillison.net/2021/Jul/1/pagnis/#atom-tag</id><summary type="html">
    &lt;p&gt;Luke Plant has a great post up with &lt;a href="https://lukeplant.me.uk/blog/posts/yagni-exceptions/"&gt;his list of YAGNI exceptions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;YAGNI - You Ain't Gonna Need It - is a rule that says you shouldn't add a feature just because it might be useful in the future - only write code when it solves a direct problem.&lt;/p&gt;
&lt;p&gt;When should you override YAGNI? When the cost of adding something later is so dramatically expensive compared with the cost of adding it early on that it's worth taking the risk. Or when you know from experience that an initial investment will pay off many times over.&lt;/p&gt;
&lt;p&gt;Luke's exceptions to YAGNI are well chosen: things like logging, API versioning, created_at timestamps and a bias towards "store multiple X for a user" (a many-to-many relationship) if there's any inkling that the system may need to support more than one.&lt;/p&gt;
&lt;p&gt;Because I like attempting to coin phrases, I propose we call these &lt;strong&gt;PAGNIs&lt;/strong&gt; - short for &lt;strong&gt;Probably Are Gonna Need Its&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Here are some of mine.&lt;/p&gt;
&lt;h4&gt;A kill-switch for your mobile apps&lt;/h4&gt;
&lt;p&gt;If you're building a mobile app that talks to your API, make sure to ship a kill-switch: a mechanism by which you can cause older versions of the application to show a "you must upgrade to continue using this application" screen when the app starts up.&lt;/p&gt;
&lt;p&gt;In an ideal world, you'll never use this ability: you'll continue to build new features to the app and make backwards-compatible changes to the API forever, such that ancient app versions keep working and new app versions get to do new things.&lt;/p&gt;
&lt;p&gt;But... sometimes that simply isn't possible. You might discover a security hole in the design of the application or API that can only be fixed by breaking backwards-compatibility - or maybe you're still maintaining a v1 API from five years ago to support a mobile application version that's only still installed by 30 users, and you'd like to not have to maintain double the amount of API code.&lt;/p&gt;
&lt;p&gt;You can't add a kill-switch retroactively to apps that have already been deployed!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://twitter.com/myunderpants/status/1410655652867809281"&gt;Apparently Firebase offers this&lt;/a&gt; to many Android apps, but if you're writing for iOS you need to provide this yourself.&lt;/p&gt;
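One way to sketch the client side of a kill-switch: on startup the app fetches the minimum supported version from the API and refuses to continue if it is too old. The endpoint response shape and field name here are hypothetical, not any specific product's API:

```python
# Parse a "major.minor.patch" string into a tuple so versions compare
# correctly ("1.10.0" is newer than "1.9.0").
def version_tuple(version):
    return tuple(int(part) for part in version.split("."))

# Return True if the running app version is below the minimum the API
# says it still supports, meaning the upgrade screen should be shown.
def must_upgrade(app_version, api_response):
    minimum = api_response["minimum_supported_version"]
    return version_tuple(app_version) < version_tuple(minimum)

print(must_upgrade("1.4.0", {"minimum_supported_version": "2.0.0"}))  # True
```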
&lt;h4&gt;Automated deploys&lt;/h4&gt;
&lt;p&gt;Nothing kills a side project like coming back to it in six months time and having to figure out how to deploy it again. Thanks to &lt;a href="https://simonwillison.net/tags/githubactions/"&gt;GitHub Actions&lt;/a&gt; and hosting providers like Google Cloud Run, Vercel, Heroku and Netlify setting up automated deployments is way easier now than it used to be. I have enough examples now that getting automated deployments working for a new project usually only takes a few minutes, and it pays off instantly.&lt;/p&gt;
&lt;h4&gt;Continuous Integration (and a test framework)&lt;/h4&gt;
&lt;p&gt;Similar to automated deployment in that GitHub Actions (and Circle CI and Travis before it) make this much less painful to set up than it used to be.&lt;/p&gt;
&lt;p&gt;Introducing a test framework to an existing project can be extremely painful. Introducing it at the very start is easy - and it sets a precedent that code should be tested from day one.&lt;/p&gt;
&lt;p&gt;These days I'm all about &lt;a href="https://simonwillison.net/tags/pytest/"&gt;pytest&lt;/a&gt;, and I have various cookiecutter templates (&lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt;, &lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt;, &lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt;) that configure it on my new projects (with a passing test) out of the box.&lt;/p&gt;
&lt;p&gt;(Honestly, at this point in my career I consider continuous integration a DAGNI - Definitely Are Gonna Need It.)&lt;/p&gt;
&lt;p&gt;One particularly worthwhile trick is making sure the tests can spin up their own isolated test databases - another thing which is pretty easy to setup early (Django does this for you) and harder to add later on. I extend that to other external data stores - I once put a significant amount of effort into setting up a mechanism for running tests against Elasticsearch and clearing out the data again afterwards, and it paid off multiple times over.&lt;/p&gt;
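A minimal sketch of the isolated-test-database idea using pytest and SQLite: a helper creates a fresh database, and a fixture hands each test its own copy via pytest's built-in tmp_path fixture. The schema here is a made-up example:

```python
import sqlite3

import pytest

# Create a brand new SQLite database at the given path with the
# (hypothetical) schema the tests need.
def make_test_db(path):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

# Each test that requests this fixture gets its own isolated database
# file in a temporary directory, so tests never share state.
@pytest.fixture
def db(tmp_path):
    conn = make_test_db(tmp_path / "test.db")
    yield conn
    conn.close()

def test_insert(db):
    db.execute("INSERT INTO items (name) VALUES ('one')")
    assert db.execute("SELECT count(*) FROM items").fetchone()[0] == 1
```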
&lt;p&gt;Even better: &lt;strong&gt;continuous deployment&lt;/strong&gt;! When the tests pass, deploy. If you have automated deployment set up already, adding this is pretty easy, and doing it from the very start of a project sets a strong cultural expectation that no-one will land code to the &lt;code&gt;main&lt;/code&gt; branch until it's in a production-ready state and covered by unit tests.&lt;/p&gt;
&lt;p&gt;(If continuous deployment to production is too scary for your project, a valuable middle-ground is continuous deployment to a staging environment. Having everyone on your team able to interact with a live demo of your current main branch is a huge group productivity boost.)&lt;/p&gt;
&lt;h4&gt;API pagination&lt;/h4&gt;
&lt;p&gt;Never build an API endpoint that isn't paginated. Any time you think "there will never be enough items in this list for it to be worth pagination" one of your users will prove you wrong.&lt;/p&gt;
&lt;p&gt;This can be as simple as shipping an API which, even though it only returns a single page, has hard-coded JSON that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "results": [
    {"id": 1, "name": "One"},
    {"id": 2, "name": "Two"},
    {"id": 3, "name": "Three"}
  ],
  "next_url": null
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But make sure you leave space for the pagination information! You'll regret it if you don't.&lt;/p&gt;
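A sketch of what "leave space for pagination" might look like in code: even while the first version only ever returns one page, the response envelope already has a next_url slot. The function and parameter names are assumptions modelled on the JSON above, not any specific framework's API:

```python
# Build a paginated response envelope. Even when everything fits on
# one page, next_url is present (as None), so clients can rely on it.
def paginated_response(items, page=1, page_size=20, base_url="/api/items"):
    start = (page - 1) * page_size
    page_items = items[start:start + page_size]
    has_more = start + page_size < len(items)
    next_url = f"{base_url}?page={page + 1}" if has_more else None
    return {"results": page_items, "next_url": next_url}

print(paginated_response([{"id": 1}, {"id": 2}, {"id": 3}]))
```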
&lt;h4&gt;Detailed API logs&lt;/h4&gt;
&lt;p&gt;This is a trick I learned &lt;a href="https://simonwillison.net/2021/Apr/12/porting-vaccinateca-to-django/#value-of-api-logs"&gt;while porting VaccinateCA to Django&lt;/a&gt;. If you are building an API, having a mechanism that provides detailed logs - including the POST bodies passed to the API - is invaluable.&lt;/p&gt;
&lt;p&gt;It's an inexpensive way of maintaining a complete record of what happened with your application - invaluable for debugging, but also for tricks like replaying past API traffic against a new implementation under test.&lt;/p&gt;
&lt;p&gt;Logs like these may become infeasible at scale, but for a new project they'll probably add up to just a few MBs a day - and they're easy to prune or switch off later on if you need to.&lt;/p&gt;
&lt;p&gt;VIAL uses &lt;a href="https://github.com/CAVaccineInventory/vial/blob/a0780e27c39018b66f95278ce18eda5968c325f8/vaccinate/api/utils.py#L86"&gt;a Django view decorator&lt;/a&gt; to log these directly to a PostgreSQL table. We've been running this for a few months and it's now our largest table, but it's still only around 2GB - easily worth it for the productivity boost it gives us.&lt;/p&gt;
&lt;p&gt;(Don't log any sensitive data that you wouldn't want your development team having access to while debugging a problem. This may require clever redaction, or you can avoid logging specific endpoints entirely. Also: don't log authentication tokens that could be used to imitate users: decode them and log the user identifier instead.)&lt;/p&gt;
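A rough sketch of the idea behind that view decorator: wrap each API view so that every call appends a record of the method, path, request body and response status to a log. VIAL writes these to a PostgreSQL table; an in-memory list stands in for that here, and the simplified request/response dictionaries are my own assumption rather than Django's real objects:

```python
import functools
import json

# Stand-in for the PostgreSQL api_logs table VIAL uses.
api_logs = []

# Decorator that records every call to the wrapped API view.
def log_api_call(view):
    @functools.wraps(view)
    def wrapper(request):
        response = view(request)
        api_logs.append({
            "method": request["method"],
            "path": request["path"],
            "body": request.get("body"),
            "status": response["status"],
        })
        return response
    return wrapper

@log_api_call
def create_item(request):
    return {"status": 201, "body": json.dumps({"ok": True})}

create_item({"method": "POST", "path": "/api/items", "body": '{"name": "x"}'})
print(api_logs[0]["status"])  # 201
```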
&lt;h4&gt;A bookmarkable interface for executing read-only SQL queries against your database&lt;/h4&gt;
&lt;p&gt;This one is very much exposing my biases (I just released &lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard 1.0&lt;/a&gt; which provides exactly this for Django+PostgreSQL projects) but having used this for the past few months I can't see myself going back. Using bookmarked SQL queries to inform the implementation of new features is an incredible productivity boost. Here's &lt;a href="https://github.com/CAVaccineInventory/vial/issues/528"&gt;an issue I worked on&lt;/a&gt; recently with 18 comments linking to illustrative SQL queries.&lt;/p&gt;
&lt;p&gt;(On further thought: this isn't actually a great example of a PAGNI because it's not particularly hard to add this to a project at a later date.)&lt;/p&gt;
&lt;h4&gt;Driving down the cost&lt;/h4&gt;
&lt;p&gt;One trick with all of these things is that while they may seem quite expensive to implement, they get dramatically cheaper as you gain experience and gather more tools for helping put them into practice.&lt;/p&gt;
&lt;p&gt;Any of the ideas I've shown here could take an engineering team weeks (if not months) to add to an existing project - but with the right tooling they can represent just an hour (or less) of work at the start of a project. And they'll pay themselves off many, many times over in the future.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/continuous-deployment"&gt;continuous-deployment&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/continuous-integration"&gt;continuous-integration&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django-sql-dashboard"&gt;django-sql-dashboard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/yagni"&gt;yagni&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pagni"&gt;pagni&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="continuous-deployment"/><category term="continuous-integration"/><category term="definitions"/><category term="software-engineering"/><category term="testing"/><category term="pytest"/><category term="github-actions"/><category term="django-sql-dashboard"/><category term="yagni"/><category term="pagni"/></entry><entry><title>Blazing fast CI with pytest-split and GitHub Actions</title><link href="https://simonwillison.net/2021/Feb/22/pytest-split/#atom-tag" rel="alternate"/><published>2021-02-22T19:06:40+00:00</published><updated>2021-02-22T19:06:40+00:00</updated><id>https://simonwillison.net/2021/Feb/22/pytest-split/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.jerrycodes.com/pytest-split-and-github-actions/"&gt;Blazing fast CI with pytest-split and GitHub Actions&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/djm_/status/1363915616831344647"&gt;@djm_&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;&lt;/p&gt;



</summary><category term="testing"/><category term="pytest"/><category term="github-actions"/></entry><entry><title>A cookiecutter template for writing Datasette plugins</title><link href="https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/#atom-tag" rel="alternate"/><published>2020-06-20T16:15:42+00:00</published><updated>2020-06-20T16:15:42+00:00</updated><id>https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;Datasette’s &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html"&gt;plugin system&lt;/a&gt; is one of the most interesting parts of the entire project. As I explained to Matt Asay in &lt;a href="https://thenewstack.io/datasette-a-developer-a-shower-and-a-data-inspired-moment/"&gt;this interview&lt;/a&gt;, the great thing about plugins is that Datasette can gain new functionality overnight without me even having to review a pull request. I just need to get more people to write them!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; is my most recent effort to help make that as easy as possible. It’s a &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; template that sets up the outline of a new plugin, combining various best patterns I’ve discovered over the past two years of writing my own plugins.&lt;/p&gt;
&lt;p&gt;Once you’ve &lt;a href="https://cookiecutter.readthedocs.io/en/1.7.2/installation.html"&gt;installed cookiecutter&lt;/a&gt; you can start building a new plugin by running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cookiecutter gh:simonw/datasette-plugin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Cookiecutter will run a quick interactive session asking for a few details. It will then use those details to generate a new directory structure ready for you to start work on the plugin.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/datasette-plugin/blob/main/README.md"&gt;datasette-plugin README&lt;/a&gt; describes the next steps. A couple of things are worth exploring in more detail.&lt;/p&gt;
&lt;h4 id="Writing_tests_for_plugins_14"&gt;Writing tests for plugins&lt;/h4&gt;
&lt;p&gt;I’m a big believer in automated testing: every single one of my plugins includes tests, and those tests are run against every commit and must pass before new packages are shipped to &lt;a href="https://pypi.org"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In my experience the hardest part of writing tests is getting them started: setting up an initial test harness and ensuring that new tests can be easily written.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;datasette-plugin&lt;/code&gt; adds &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; as a testing dependency and creates a &lt;code&gt;tests/&lt;/code&gt; folder with an initial, passing unit test in it.&lt;/p&gt;
&lt;p&gt;The test confirms that the new plugin has been correctly installed, by running a request through a configured Datasette instance and hitting the &lt;a href="https://datasette.readthedocs.io/en/stable/introspection.html#plugins"&gt;/-/plugins.json introspection endpoint&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In doing so, it demonstrates how to run tests that interact with Datasette’s HTTP API. This is a very productive way to write tests.&lt;/p&gt;
&lt;p&gt;The example test uses the &lt;a href="https://www.python-httpx.org"&gt;HTTPX&lt;/a&gt; Python library. HTTPX offers a requests-style API but with a couple of crucial improvements. Firstly, it’s been built with asyncio support as a top-level concern. Secondly, it &lt;a href="https://www.python-httpx.org/async/#calling-into-python-web-apps"&gt;understands the ASGI protocol&lt;/a&gt; and can be run directly against an ASGI Python interface without needing to spin up an actual HTTP server. Since Datasette &lt;a href="https://simonwillison.net/2019/Jun/23/datasette-asgi/"&gt;speaks ASGI&lt;/a&gt; this makes it the ideal tool for testing Datasette plugins.&lt;/p&gt;
&lt;p&gt;Here’s that first test that gets created by the cookiecutter template:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from datasette.app import Datasette
import pytest
import httpx

@pytest.mark.asyncio
async def test_plugin_is_installed():
    app = Datasette([], memory=True).app()
    async with httpx.AsyncClient(app=app) as client:
        response = await client.get(
            &amp;quot;http://localhost/-/plugins.json&amp;quot;
        )
        assert 200 == response.status_code
        installed_plugins = {
            p[&amp;quot;name&amp;quot;] for p in response.json()
        }
        assert &amp;quot;datasette-plugin-template-demo&amp;quot; in installed_plugins
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My hope is that including a passing test that demonstrates how to execute test requests will make it much easier for plugin authors to start building out their own custom test suite.&lt;/p&gt;
&lt;h4 id="Continuous_integration_with_GitHub_Actions_49"&gt;Continuous integration with GitHub Actions&lt;/h4&gt;
&lt;p&gt;My favourite thing about &lt;a href="https://simonwillison.net/tags/githubactions/"&gt;GitHub Actions&lt;/a&gt; is that they’re enabled on every GitHub repository for free, without any extra configuration necessary.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;datasette-plugin&lt;/code&gt; template takes advantage of this. Not only does every new project get a passing test - it also gets a GitHub Action - in &lt;code&gt;.github/workflows/test.yml&lt;/code&gt; - that executes the tests on every commit.&lt;/p&gt;
&lt;p&gt;It even &lt;a href="https://github.com/simonw/datasette-plugin/blob/8e4d5231bc276f19ccf630b18f075222e5afecb3/datasette-%7B%7Bcookiecutter.hyphenated%7D%7D/.github/workflows/test.yml#L8-L10"&gt;runs the test suite in parallel&lt;/a&gt; against Python 3.6, 3.7 and 3.8 - the versions currently supported by Datasette itself.&lt;/p&gt;
&lt;p&gt;A second action in &lt;code&gt;.github/workflows/publish.yml&lt;/code&gt; bakes in my opinions on the best way to manage plugin releases: it builds and ships a new package to PyPI every time a new tag (and corresponding GitHub release) is added to the repository.&lt;/p&gt;
&lt;p&gt;For this to work you’ll need to create a &lt;a href="https://pypi.org/help/#apitoken"&gt;PyPI API token&lt;/a&gt; and add it to your plugin’s GitHub repository as a &lt;code&gt;PYPI_TOKEN&lt;/code&gt; secret. This is &lt;a href="https://github.com/simonw/datasette-plugin/blob/main/README.md#publishing-your-plugin-as-a-package-to-pypi"&gt;explained in the README&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="Deploying_a_live_demo_of_the_template_with_GitHub_Actions_61"&gt;Deploying a live demo of the template with GitHub Actions&lt;/h4&gt;
&lt;p&gt;Whenever possible, I like to ship my projects with live demos. The Datasette repository &lt;a href="https://github.com/simonw/datasette/blob/master/.github/workflows/deploy-latest.yml"&gt;publishes a demo&lt;/a&gt; of the latest commit to &lt;a href="https://latest.datasette.io/"&gt;https://latest.datasette.io/&lt;/a&gt; on every commit. I try to do the same for my plugins, where it makes sense to do so.&lt;/p&gt;
&lt;p&gt;What could a live demo of a cookiecutter template look like?&lt;/p&gt;
&lt;p&gt;Ideally it would show a complete, generated project. I love GitHub’s code browsing interface, so a separate repository containing that generated project would be ideal.&lt;/p&gt;
&lt;p&gt;So that’s what &lt;a href="https://github.com/simonw/datasette-plugin-template-demo"&gt;https://github.com/simonw/datasette-plugin-template-demo&lt;/a&gt; is: it’s a repository showing the most recent output of the latest version of the cookiecutter template that lives in &lt;a href="https://github.com/simonw/datasette-plugin"&gt;https://github.com/simonw/datasette-plugin&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It’s powered by &lt;a href="https://github.com/simonw/datasette-plugin/blob/main/.github/workflows/push.yml"&gt;this GitHub Action&lt;/a&gt;, which runs on every push to the &lt;code&gt;datasette-plugin&lt;/code&gt; repo, installs cookiecutter, uses cookiecutter against some &lt;a href="https://github.com/simonw/datasette-plugin/blob/main/input-for-demo.txt"&gt;fixed inputs&lt;/a&gt; to re-generate the project and then pushes the results up to &lt;code&gt;datasette-plugin-template-demo&lt;/code&gt; as a new commit.&lt;/p&gt;
&lt;p&gt;As a fun final touch, it uses the GitHub commit comments API to add a comment to the commit to &lt;code&gt;datasette-plugin&lt;/code&gt; linking to the “browse” view on the resulting code in the &lt;code&gt;datasette-plugin-template-demo&lt;/code&gt; repository. Here’s &lt;a href="https://github.com/simonw/datasette-plugin/commit/8e4d5231bc276f19ccf630b18f075222e5afecb3"&gt;one of those commit comments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Figuring out how to build this took quite a bit of work. &lt;a href="https://github.com/simonw/datasette-plugin/issues/4"&gt;Issue #4&lt;/a&gt; has a blow-by-blow rundown of how I got it working.&lt;/p&gt;
&lt;p&gt;I couldn’t resist tweeting about it:&lt;/p&gt;
&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;Writing a GitHub Action for a repo that generates content for a second repo and pushes that content to the second repo and then posts a comment to the commit on the first repo that links to the newly committed code in the second repo&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1274118461896024064?ref_src=twsrc%5Etfw"&gt;June 19, 2020&lt;/a&gt;&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="plugins"/><category term="projects"/><category term="pypi"/><category term="datasette"/><category term="pytest"/><category term="github-actions"/><category term="cookiecutter"/></entry><entry><title>How to cheat at unit tests with pytest and Black</title><link href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/#atom-tag" rel="alternate"/><published>2020-02-11T06:56:55+00:00</published><updated>2020-02-11T06:56:55+00:00</updated><id>https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/#atom-tag</id><summary type="html">
    &lt;p&gt;I’ve been making a lot of progress on &lt;a href="https://simonwillison.net/tags/datasettecloud/"&gt;Datasette Cloud&lt;/a&gt; this week. Since the application provides private hosted Datasette instances (initially targeted at data journalists and newsrooms), the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.&lt;/p&gt;
&lt;p&gt;The one thing I’ve learned about permissions code over the years is that it absolutely warrants comprehensive unit tests. This is not code that can afford to have dumb bugs, or regressions caused by future development!&lt;/p&gt;
&lt;p&gt;I’ve become a big proponent of &lt;a href="https://docs.pytest.org/en"&gt;pytest&lt;/a&gt; over the past two years, but this is the first Django project that I’ve built using pytest from day one as opposed to relying on the Django test runner. It’s been a great opportunity to try out &lt;a href="https://pytest-django.readthedocs.io/"&gt;pytest-django&lt;/a&gt;, and I’m really impressed with it. It maintains my favourite things about Django’s test framework - smart usage of database transactions to reset the database and a handy &lt;a href="https://docs.djangoproject.com/en/3.0/topics/testing/tools/#the-test-client"&gt;test client object&lt;/a&gt; for sending fake HTTP requests - and adds all of that pytest magic that &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#Taking_advantage_of_pytest_78"&gt;I’ve grown to love&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It also means I get to use my favourite trick for productively writing unit tests: the combination of pytest and &lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt;, the “uncompromising Python code formatter”.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Cheating_at_unit_tests_10"&gt;&lt;/a&gt;Cheating at unit tests&lt;/h3&gt;
&lt;p&gt;In pure test-driven development you write the tests first, and don’t start on the implementation until you’ve watched them fail.&lt;/p&gt;
&lt;p&gt;Most of the time I find that this is a net loss on productivity. I tend to prototype my way to solutions, so I often find myself with rough running code before I’ve developed enough of a concrete implementation plan to be able to write the tests.&lt;/p&gt;
&lt;p&gt;So… I cheat. Once I’m happy with the implementation I write the tests to match it. Then once I have the tests in place and I know what needs to change I can switch to using changes to the tests to drive the implementation.&lt;/p&gt;
&lt;p&gt;In particular, I like using a rough initial implementation to help generate the tests in the first place.&lt;/p&gt;
&lt;p&gt;Here’s how I do that with pytest. I’ll write a test that looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-c1"&gt;False&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;Note that I’m using the pytest-django &lt;code&gt;client&lt;/code&gt; fixture here, which magically passes a fully configured Django test client object to my test function.&lt;/p&gt;
&lt;p&gt;I run this test, and it fails:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pytest -k test_some_api
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(&lt;code&gt;pytest -k blah&lt;/code&gt; runs just tests that contain &lt;code&gt;blah&lt;/code&gt; in their name)&lt;/p&gt;
&lt;p&gt;Now… I run the test again, but with the &lt;code&gt;--pdb&lt;/code&gt; option to cause pytest to drop me into a debugger at the failure point:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest -k test_some_api --pdb
=== test session starts ===
platform darwin -- Python 3.7.5, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
django: settings: config.test_settings (from ini)
...
client = &amp;lt;django.test.client.Client object at 0x10cfdb510&amp;gt;

    def test_some_api(client):
        response = client.get(&amp;quot;/some/api/&amp;quot;)
&amp;gt;       assert False == response.json()
E       assert False == {'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}
core/test_docs.py:27: AssertionError
&amp;gt;&amp;gt; entering PDB &amp;gt;&amp;gt;
&amp;gt;&amp;gt; PDB post_mortem (IO-capturing turned off) &amp;gt;&amp;gt;
&amp;gt; core/test_docs.py(27)test_some_api()
-&amp;gt; assert False == response.json()
(Pdb) response.json()
{'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}
(Pdb) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;response.json()&lt;/code&gt; in the debugger dumps out the actual value to the console.&lt;/p&gt;
&lt;p&gt;Then I copy that output - in this case &lt;code&gt;{'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}&lt;/code&gt; - and paste it into the test:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; {&lt;span class="pl-s"&gt;'this'&lt;/span&gt;: [&lt;span class="pl-s"&gt;'is'&lt;/span&gt;, &lt;span class="pl-s"&gt;'an'&lt;/span&gt;, &lt;span class="pl-s"&gt;'example'&lt;/span&gt;, &lt;span class="pl-s"&gt;'api'&lt;/span&gt;], &lt;span class="pl-s"&gt;'that_outputs'&lt;/span&gt;: &lt;span class="pl-s"&gt;'JSON'&lt;/span&gt;} &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;Finally, I run &lt;code&gt;black .&lt;/code&gt; in my project root to reformat the test:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; {
        &lt;span class="pl-s"&gt;"this"&lt;/span&gt;: [&lt;span class="pl-s"&gt;"is"&lt;/span&gt;, &lt;span class="pl-s"&gt;"an"&lt;/span&gt;, &lt;span class="pl-s"&gt;"example"&lt;/span&gt;, &lt;span class="pl-s"&gt;"api"&lt;/span&gt;],
        &lt;span class="pl-s"&gt;"that_outputs"&lt;/span&gt;: &lt;span class="pl-s"&gt;"JSON"&lt;/span&gt;,
    } &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;This last step means that no matter how giant and ugly the test comparison has become I’ll always get a neatly formatted test out of it.&lt;/p&gt;
&lt;p&gt;I always eyeball the generated test to make sure that it’s what I would have written by hand if I wasn’t so lazy - then I commit it along with the implementation and move on to the next task.&lt;/p&gt;
&lt;p&gt;I’ve used this technique to write many of the tests in both &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; and &lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, and those are by far the best tested pieces of software I’ve ever released.&lt;/p&gt;
&lt;p&gt;I started doing this around two years ago, and I’ve held off writing about it until I was confident I understood the downsides. I haven’t found any yet: I end up with a robust, comprehensive test suite, and writing the tests takes me less than half the time it would if I were hand-crafting all of those comparisons from scratch.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Also_this_week_86"&gt;&lt;/a&gt;Also this week&lt;/h3&gt;
&lt;p&gt;Working on Datasette Cloud has required a few minor releases to some of my open source projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Shipped &lt;a href="https://github.com/simonw/datasette-auth-existing-cookies/releases"&gt;datasette-auth-existing-cookies&lt;/a&gt; 0.6 and 0.6.1&lt;/li&gt;
&lt;li&gt;Shipped &lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;sqlite-utils&lt;/a&gt; 2.2, 2.2.1, 2.3 and 2.3.1&lt;/li&gt;
&lt;li&gt;Shipped &lt;a href="https://datasette.readthedocs.io/en/latest/changelog.html#v0-35"&gt;Datasette 0.35&lt;/a&gt; with a new utility method for plugins to render their own templates, which I’m now using in…&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.2a"&gt;datasette-upload-csvs 0.2a&lt;/a&gt; - still very alpha, but at least it looks slightly nicer now&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unrelated to Datasette Cloud, I also shipped &lt;a href="https://github.com/dogsheep/twitter-to-sqlite/releases/tag/0.16"&gt;twitter-to-sqlite 0.16&lt;/a&gt; with a new command for importing your Twitter friends (previously it only had a command for importing your followers).&lt;/p&gt;
&lt;p&gt;In bad personal motivation news… I missed my weekly update to &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; and lost my streak!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/black"&gt;black&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="python"/><category term="testing"/><category term="datasette"/><category term="pytest"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="black"/></entry><entry><title>Porting Datasette to ASGI, and Turtles all the way down</title><link href="https://simonwillison.net/2019/Jun/23/datasette-asgi/#atom-tag" rel="alternate"/><published>2019-06-23T21:39:00+00:00</published><updated>2019-06-23T21:39:00+00:00</updated><id>https://simonwillison.net/2019/Jun/23/datasette-asgi/#atom-tag</id><summary type="html">
    &lt;p&gt;This evening I finally closed a &lt;a href="https://simonwillison.net/tags/datasette/"&gt;Datasette&lt;/a&gt; issue that I opened more than 13 months ago: &lt;a href="https://github.com/simonw/datasette/issues/272"&gt;#272: Port Datasette to ASGI&lt;/a&gt;. A few notes on why this is such an important step for the project.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://asgi.readthedocs.io/"&gt;ASGI&lt;/a&gt; is the Asynchronous Server Gateway Interface standard. It’s been evolving steadily over the past few years under the guidance of Andrew Godwin. It’s intended as an asynchronous replacement for the venerable &lt;a href="https://wsgi.readthedocs.io/"&gt;WSGI&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Turtles_all_the_way_down_6"&gt;&lt;/a&gt;Turtles all the way down&lt;/h3&gt;
&lt;p&gt;Ten years ago at EuroDjangoCon 2009 in Prague I gave a talk entitled &lt;a href="https://www.slideshare.net/simon/django-heresies"&gt;Django Heresies&lt;/a&gt;. After discussing some of the design decisions in Django that I didn’t think had aged well, I spent the last part of the talk talking about &lt;em&gt;Turtles all the way down&lt;/em&gt;. I &lt;a href="https://simonwillison.net/2009/May/19/djng/?#turtles-all-the-way-down"&gt;wrote that idea up here&lt;/a&gt; on my blog (see also &lt;a href="https://www.slideshare.net/simon/django-heresies/65-The_Django_Contract_A_view"&gt;these slides&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The key idea was that Django would be more interesting if the core Django contract - a function that takes a request and returns a response - was extended to more places in the framework. The top level site, the reusable applications, middleware and URL routing could all share that same contract. Everything could be composed from the same raw building blocks.&lt;/p&gt;
&lt;p&gt;I’m excited about ASGI because it absolutely fits the &lt;em&gt;turtles all the way down&lt;/em&gt; model.&lt;/p&gt;
&lt;p&gt;The ASGI contract is an asynchronous function that takes three arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;async def application(scope, receive, send):
    ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;scope&lt;/code&gt; is a serializable dictionary providing the context for the current connection. &lt;code&gt;receive&lt;/code&gt; is an awaitable which can be used to receive incoming messages. &lt;code&gt;send&lt;/code&gt; is an awaitable that can be used to send replies.&lt;/p&gt;
&lt;p&gt;It’s a pretty low-level set of primitives (and less obvious than a simple request/response) - and that’s because ASGI is about more than just the standard HTTP request/response cycle. This contract works for HTTP, WebSockets and potentially any other protocol that needs to asynchronously send and receive data.&lt;/p&gt;
&lt;p&gt;It’s an extremely elegant piece of protocol design, informed by Andrew’s experience with Django Channels, SOA protocols (we are co-workers at Eventbrite where we’ve both been heavily involved in Eventbrite’s &lt;a href="https://github.com/eventbrite/pysoa"&gt;SOA mechanism&lt;/a&gt;) and Andrew’s extensive conversations with other maintainers in the Python web community.&lt;/p&gt;
&lt;p&gt;The ASGI protocol really is turtles all the way down - it’s a simple, well defined contract which can be composed together to implement all kinds of interesting web architectural patterns.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/asgi-cors/"&gt;asgi-cors library&lt;/a&gt; was my first attempt at building an ASGI turtle. &lt;a href="https://github.com/simonw/asgi-cors/blob/master/asgi_cors.py"&gt;The implementation&lt;/a&gt; is a simple Python decorator which, when applied to another ASGI callable, adds HTTP CORS headers based on the parameters you pass to the decorator. The library has zero installation dependencies (it has test dependencies on pytest and friends) and can be used on any HTTP ASGI project.&lt;/p&gt;
&lt;p&gt;Building &lt;code&gt;asgi-cors&lt;/code&gt; completely sold me on ASGI as the turtle pattern I had been desiring for over a decade!&lt;/p&gt;
&lt;h3&gt;&lt;a id="Datasette_plugins_and_ASGI_31"&gt;&lt;/a&gt;Datasette plugins and ASGI&lt;/h3&gt;
&lt;p&gt;Which brings me to Datasette.&lt;/p&gt;
&lt;p&gt;One of the most promising components of Datasette is its plugin mechanism. Based on &lt;a href="https://pluggy.readthedocs.io/en/latest/"&gt;pluggy&lt;/a&gt; (extracted from pytest), &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html"&gt;Datasette Plugins&lt;/a&gt; allow new features to be added to Datasette without needing to change the underlying code. This means new features can be built, packaged and shipped entirely independently of the core project. A list of currently available plugins &lt;a href="https://datasette.readthedocs.io/en/latest/ecosystem.html#datasette-plugins"&gt;can be found here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;WordPress is a very solid blogging engine. Add in the plugin ecosystem around it and it can be used to build literally any CMS you can possibly imagine.&lt;/p&gt;
&lt;p&gt;My dream for Datasette is to apply the same model: I want a strong core for publishing and exploring data that’s enhanced by plugins to solve a huge array of data analysis, visualization and API-backed problems.&lt;/p&gt;
&lt;p&gt;Datasette has &lt;a href="https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hooks"&gt;a range of plugin hooks already&lt;/a&gt;, but I’ve so far held back on implementing the most useful class of hooks: hooks that allow developers to add entirely new URL routes exposing completely custom functionality.&lt;/p&gt;
&lt;p&gt;The reason I held back is that I wanted to be confident that the contract I was offering was something I would continue to support moving forward. A plugin system isn’t much good if the core implementation keeps on changing in backwards-incompatible ways.&lt;/p&gt;
&lt;p&gt;ASGI is the exact contract I’ve been waiting for. It’s not quite ready yet, but you can follow &lt;a href="https://github.com/simonw/datasette/issues/520"&gt;#520: prepare_asgi plugin hook&lt;/a&gt; (thoughts and suggestions welcome!) to be the first to hear about this hook when it lands. I’m planning to use it to make my asgi-cors library available as a plugin, after which I’m excited to start exploring the idea of bringing authentication plugins to Datasette (and to the wider ASGI world in general).&lt;/p&gt;
&lt;p&gt;I’m hoping that many Datasette ASGI plugins will exist in a form that allows them to be used by other ASGI applications as well.&lt;/p&gt;
&lt;p&gt;I also plan to use ASGI to make components of Datasette itself available to other ASGI applications. If you just want a single instance of Datasette’s &lt;a href="https://datasette.readthedocs.io/en/stable/pages.html#table"&gt;table view&lt;/a&gt; to be embedded somewhere in your URL configuration you should be able to do that by routing traffic directly to the ASGI-compatible view class.&lt;/p&gt;
&lt;p&gt;I’m really excited about exploring the intersection of ASGI turtles-all-the-way-down and pluggy’s powerful mechanism for gluing components together. Both WSGI and Django’s reusable apps have attempted to create a reusable ecosystem in the past, with limited success. Let’s see if ASGI can finally make the turtle dream come true.&lt;/p&gt;

&lt;h3&gt;&lt;a id="Further_reading_53"&gt;&lt;/a&gt;Further reading&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.encode.io/articles/hello-asgi/"&gt;Hello ASGI&lt;/a&gt; by Tom Christie is the best introduction to ASGI I’ve seen. Tom is the author of the &lt;a href="https://www.uvicorn.org/"&gt;Uvicorn&lt;/a&gt; ASGI server (used by Datasette as-of this evening) and &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt;, a delightfully well-designd ASGI web framework. I’ve learned an enormous amount about ASGI by reading Tom’s code. Tom also gave &lt;a href="https://www.youtube.com/watch?v=u8GSFEg5lnU"&gt;a talk about ASGI&lt;/a&gt; at DjangoCon Europe a few months ago.&lt;/p&gt;
&lt;p&gt;If you haven’t read &lt;a href="https://www.aeracode.org/2018/06/04/django-async-roadmap/"&gt;A Django Async Roadmap&lt;/a&gt;, which Andrew Godwin published last year, you should absolutely catch up. More than just talking about ASGI, Andrew sketches out a detailed and actionable plan for bringing asyncio to Django core. Andrew landed &lt;a href="https://github.com/django/django/pull/11209"&gt;the first Django core ASGI code&lt;/a&gt; based on the plan just a few days ago.&lt;/p&gt;
&lt;p&gt;If you're interested in the details of Datasette's ASGI implementation, I posted &lt;a href="https://github.com/simonw/datasette/issues/272"&gt;detailed commentary on issue #272&lt;/a&gt; over the past thirteen months as I researched and finalized my approach. I added further commentary to &lt;a href="https://github.com/simonw/datasette/pull/518"&gt;the associated pull request&lt;/a&gt;, which gathers together the 34 commits it took to ship the feature (squashed into a single commit to master).&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrew-godwin"&gt;andrew-godwin&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tom-christie"&gt;tom-christie&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/starlette"&gt;starlette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="andrew-godwin"/><category term="projects"/><category term="datasette"/><category term="asgi"/><category term="kim-christie"/><category term="pytest"/><category term="starlette"/></entry><entry><title>parameterized</title><link href="https://simonwillison.net/2019/Feb/19/parameterized/#atom-tag" rel="alternate"/><published>2019-02-19T21:05:05+00:00</published><updated>2019-02-19T21:05:05+00:00</updated><id>https://simonwillison.net/2019/Feb/19/parameterized/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/wolever/parameterized"&gt;parameterized&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I love the @parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator in pytest doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized which works with unittest tests whether or not you are running them using the pytest test runner.
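&lt;p&gt;For comparison, the closest thing unittest offers out of the box is &lt;code&gt;subTest&lt;/code&gt;, which reports each case separately but still counts as a single test - the parameterized library goes further and generates genuinely independent tests. A stdlib-only sketch of the subTest approach:&lt;/p&gt;

```python
import unittest

class AdditionTests(unittest.TestCase):
    def test_add(self):
        # Each failing case is reported individually, but unlike the
        # parameterized library this still collects as one test
        for a, b, expected in [(3, 5, 8), (2, 4, 6), (10, 1, 11)]:
            with self.subTest(a=a, b=b):
                self.assertEqual(a + b, expected)
```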


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="testing"/><category term="pytest"/></entry><entry><title>Documentation unit tests</title><link href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#atom-tag" rel="alternate"/><published>2018-07-28T15:59:55+00:00</published><updated>2018-07-28T15:59:55+00:00</updated><id>https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;Or: Test-driven documentation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to get out-of-date as new features are added.&lt;/p&gt;
&lt;p&gt;One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.&lt;/p&gt;
&lt;p&gt;When combined with a code review system (like &lt;a href="https://www.phacility.com/phabricator/"&gt;Phabricator&lt;/a&gt; or &lt;a href="https://help.github.com/articles/about-pull-requests/"&gt;GitHub pull requests&lt;/a&gt;) this pattern lets you enforce documentation updates as part of the review process: if a change doesn’t update the relevant documentation, point that out in your review!&lt;/p&gt;
&lt;p&gt;Good code review systems also execute unit tests automatically and attach the results to the review. This provides an opportunity to have the tests enforce other aspects of the codebase: for example, running a linter so that no-one has to waste their time arguing over coding style.&lt;/p&gt;
&lt;p&gt;I’ve been experimenting with using unit tests to ensure that aspects of a project are covered by the documentation. I think it’s a very promising technique.&lt;/p&gt;
&lt;h4 id="Introspect_the_code_introspect_the_docs_12"&gt;Introspect the code, introspect the docs&lt;/h4&gt;
&lt;p&gt;The key to this trick is introspection: interrogating the code to figure out what needs to be documented, then parsing the documentation to see if each item has been covered.&lt;/p&gt;
&lt;p&gt;I’ll use my &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; project as an example. Datasette’s &lt;a href="https://github.com/simonw/datasette/blob/295d005ca48747faf046ed30c3c61e7563c61ed2/tests/test_docs.py"&gt;test_docs.py&lt;/a&gt; module contains three relevant tests:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;test_config_options_are_documented&lt;/code&gt; checks that every one of Datasette’s &lt;a href="http://datasette.readthedocs.io/en/latest/config.html"&gt;configuration options&lt;/a&gt; are documented.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;test_plugin_hooks_are_documented&lt;/code&gt; ensures all of the plugin hooks (powered by &lt;a href="https://pluggy.readthedocs.io/en/latest/"&gt;pluggy&lt;/a&gt;) are covered in the &lt;a href="http://datasette.readthedocs.io/en/latest/plugins.html#plugin-hooks"&gt;plugin documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;test_view_classes_are_documented&lt;/code&gt; iterates through all of the &lt;code&gt;*View&lt;/code&gt; classes (corresponding to pages in the Datasette user interface) and makes sure &lt;a href="http://datasette.readthedocs.io/en/latest/pages.html"&gt;they are covered&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each case, the test uses introspection against the relevant code areas to figure out what needs to be documented, then runs a regular expression against the documentation to make sure it is mentioned in the correct place.&lt;/p&gt;
&lt;p&gt;Obviously the tests can’t confirm the quality of the documentation, so they are easy to cheat: but they do at least protect against adding a new option but forgetting to document it.&lt;/p&gt;
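&lt;p&gt;The shape of such a test is simple. Here’s an illustrative sketch - the option names and documentation text below are invented for the example, not Datasette’s real ones:&lt;/p&gt;

```python
import re

# Invented option names and documentation text, for illustration only
CONFIG_OPTIONS = ["default_page_size", "sql_time_limit_ms", "max_returned_rows"]

CONFIG_DOCS = """
default_page_size
    The default number of rows returned per page.
sql_time_limit_ms
    Time limit for SQL queries, in milliseconds.
"""

def undocumented_options(options, docs):
    # Any option name that never appears in the docs needs documenting
    return [name for name in options if not re.search(re.escape(name), docs)]

# max_returned_rows is missing from the docs above, so a test asserting
# undocumented_options(CONFIG_OPTIONS, CONFIG_DOCS) == [] would fail here
```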
&lt;h4 id="Testing_that_Datasettes_view_classes_are_covered_26"&gt;Testing that Datasette’s view classes are covered&lt;/h4&gt;
&lt;p&gt;Datasette’s view classes use a naming convention: they all end in &lt;code&gt;View&lt;/code&gt;. The current list of view classes is &lt;code&gt;DatabaseView&lt;/code&gt;, &lt;code&gt;TableView&lt;/code&gt;, &lt;code&gt;RowView&lt;/code&gt;, &lt;code&gt;IndexView&lt;/code&gt; and &lt;code&gt;JsonDataView&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Since these classes are all imported into the &lt;a href="https://github.com/simonw/datasette/blob/295d005ca48747faf046ed30c3c61e7563c61ed2/datasette/app.py"&gt;datasette.app&lt;/a&gt; module (in order to be hooked up to URL routes) the easiest way to introspect them is to import that module, then run &lt;code&gt;dir(app)&lt;/code&gt; and grab any class names that end in &lt;code&gt;View&lt;/code&gt;. We can do that with a Python list comprehension:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from datasette import app
views = [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m using reStructuredText labels to mark the place in the documentation that addresses each of these classes. This also ensures that each documentation section can be linked to, for example:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://datasette.readthedocs.io/en/latest/pages.html#tableview"&gt;http://datasette.readthedocs.io/en/latest/pages.html#tableview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The reStructuredText syntax for that label looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;.. _TableView:

Table
=====

The table page is the heart of Datasette...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can extract these labels using a regular expression:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pathlib import Path
import re

docs_path = Path(__file__).parent.parent / 'docs'
label_re = re.compile(r'\.\. _([^\s:]+):')

def get_labels(filename):
    contents = (docs_path / filename).open().read()
    return set(label_re.findall(contents))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since Datasette’s documentation is spread across multiple &lt;code&gt;*.rst&lt;/code&gt; files, and I want the freedom to document a view class in any one of them, I iterate through every file to find the labels and pull out the ones ending in &lt;code&gt;View&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def documented_views():
    view_labels = set()
    for filename in docs_path.glob(&amp;quot;*.rst&amp;quot;):
        for label in get_labels(filename):
            first_word = label.split(&amp;quot;_&amp;quot;)[0]
            if first_word.endswith(&amp;quot;View&amp;quot;):
                view_labels.add(first_word)
    return view_labels
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have a list of class names and a list of labels across all of our documentation. Writing a basic unit test comparing the two lists is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def test_view_documentation():
    view_labels = documented_views()
    view_classes = set(v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;))
    assert view_labels == view_classes
&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id="Taking_advantage_of_pytest_78"&gt;Taking advantage of pytest&lt;/h4&gt;
&lt;p&gt;Datasette uses &lt;a href="https://pytest.org/"&gt;pytest&lt;/a&gt; for its unit tests, and documentation unit tests are a great opportunity to take advantage of some advanced pytest features.&lt;/p&gt;
&lt;h5 id="Parametrization_82"&gt;Parametrization&lt;/h5&gt;
&lt;p&gt;The first of these is &lt;a href="https://docs.pytest.org/en/6.2.x/parametrize.html"&gt;parametrization&lt;/a&gt;: pytest provides a decorator which can be used to execute a single test function multiple times, each time with different arguments.&lt;/p&gt;
&lt;p&gt;This example from the pytest documentation shows how parametrization works:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pytest
@pytest.mark.parametrize(&amp;quot;test_input,expected&amp;quot;, [
    (&amp;quot;3+5&amp;quot;, 8),
    (&amp;quot;2+4&amp;quot;, 6),
    (&amp;quot;6*9&amp;quot;, 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;pytest treats this as three separate unit tests, even though they share a single function definition.&lt;/p&gt;
&lt;p&gt;We can combine this pattern with our introspection to execute an independent unit test for each of our view classes. Here’s what that looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@pytest.mark.parametrize(&amp;quot;view&amp;quot;, [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)])
def test_view_classes_are_documented(view):
    assert view in documented_views()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here’s the output from pytest if we execute just this unit test (and one of our classes is undocumented):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest -k test_view_classes_are_documented -v
=== test session starts ===
collected 249 items / 244 deselected

tests/test_docs.py::test_view_classes_are_documented[DatabaseView] PASSED [ 20%]
tests/test_docs.py::test_view_classes_are_documented[IndexView] PASSED [ 40%]
tests/test_docs.py::test_view_classes_are_documented[JsonDataView] PASSED [ 60%]
tests/test_docs.py::test_view_classes_are_documented[RowView] PASSED [ 80%]
tests/test_docs.py::test_view_classes_are_documented[TableView] FAILED [100%]

=== FAILURES ===

view = 'TableView'

    @pytest.mark.parametrize(&amp;quot;view&amp;quot;, [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)])
    def test_view_classes_are_documented(view):
&amp;gt;       assert view in documented_views()
E       AssertionError: assert 'TableView' in {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'}
E        +  where {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'} = documented_views()

tests/test_docs.py:77: AssertionError
=== 1 failed, 4 passed, 244 deselected in 1.13 seconds ===
&lt;/code&gt;&lt;/pre&gt;
&lt;h5 id="Fixtures_130"&gt;Fixtures&lt;/h5&gt;
&lt;p&gt;There’s a subtle inefficiency in the above test: for every view class, it calls the &lt;code&gt;documented_views()&lt;/code&gt; function - and that function then iterates through every &lt;code&gt;*.rst&lt;/code&gt; file in the &lt;code&gt;docs/&lt;/code&gt; directory and uses a regular expression to extract the labels. With 5 view classes and 17 documentation files that’s 85 executions of &lt;code&gt;get_labels()&lt;/code&gt;, and that number will only increase as Datasette’s code and documentation grow larger.&lt;/p&gt;
&lt;p&gt;We can use pytest’s neat &lt;a href="https://docs.pytest.org/en/6.2.x/fixture.html"&gt;fixtures&lt;/a&gt; to reduce this to a single call to &lt;code&gt;documented_views()&lt;/code&gt; that is shared across all of the tests. Here’s what that looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@pytest.fixture(scope=&amp;quot;session&amp;quot;)
def documented_views():
    view_labels = set()
    for filename in docs_path.glob(&amp;quot;*.rst&amp;quot;):
        for label in get_labels(filename):
            first_word = label.split(&amp;quot;_&amp;quot;)[0]
            if first_word.endswith(&amp;quot;View&amp;quot;):
                view_labels.add(first_word)
    return view_labels

@pytest.mark.parametrize(&amp;quot;view_class&amp;quot;, [
    v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)
])
def test_view_classes_are_documented(documented_views, view_class):
    assert view_class in documented_views
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fixtures in pytest are an example of dependency injection: pytest introspects every &lt;code&gt;test_*&lt;/code&gt; function and checks whether it has an argument whose name matches a function decorated with &lt;code&gt;@pytest.fixture&lt;/code&gt;. For each matching argument, pytest executes the corresponding fixture function and passes its return value into the test function.&lt;/p&gt;
&lt;p&gt;By default, pytest executes the fixture function once for every test that uses it. In the above code the &lt;code&gt;scope=&amp;quot;session&amp;quot;&lt;/code&gt; argument tells pytest to execute this particular fixture only once per &lt;code&gt;pytest&lt;/code&gt; run, and to pass that single return value to every matching test.&lt;/p&gt;
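&lt;p&gt;As a minimal sketch of that injection mechanism (the names and data here are hypothetical, not Datasette’s actual code), pytest matches the &lt;code&gt;documented_views&lt;/code&gt; argument to the fixture of the same name:&lt;/p&gt;

```python
import pytest

def load_labels():
    # Stand-in for the expensive scan of the docs/*.rst files
    return {"DatabaseView", "IndexView", "TableView"}

@pytest.fixture(scope="session")
def documented_views():
    # With scope="session" this body runs once per pytest run;
    # the returned set is shared by every test that names it
    return load_labels()

def test_table_view(documented_views):
    assert "TableView" in documented_views

def test_index_view(documented_views):
    assert "IndexView" in documented_views
```

&lt;p&gt;Both test functions receive the same set object, so the documentation scan happens once no matter how many tests depend on it.&lt;/p&gt;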
&lt;h4 id="What_if_you_havent_documented_everything_yet_157"&gt;What if you haven’t documented everything yet?&lt;/h4&gt;
&lt;p&gt;Adding unit tests to your documentation in this way faces an obvious problem: when you first add the tests, you may have to write a whole lot of documentation before they can all pass.&lt;/p&gt;
&lt;p&gt;Tests that protect against future code being added without documentation are only useful once they are in the codebase - but blocking them on first documenting all of your existing features could delay that benefit indefinitely.&lt;/p&gt;
&lt;p&gt;Once again, pytest to the rescue. The &lt;code&gt;@pytest.mark.xfail&lt;/code&gt; decorator allows you to mark a test as “expected to fail” - if it fails, pytest will take note but will not fail the entire test suite.&lt;/p&gt;
&lt;p&gt;This means you can add deliberately failing tests to your codebase without breaking the build for everyone - perfect for tests that look for documentation that hasn’t yet been written!&lt;/p&gt;
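&lt;p&gt;One way to apply this to the parametrized documentation test (a sketch with hypothetical view names, not Datasette’s actual code) is &lt;code&gt;pytest.param&lt;/code&gt; with per-parameter marks, so only the not-yet-documented cases are expected to fail:&lt;/p&gt;

```python
import pytest

# Hypothetical list of view classes whose documentation is still pending
KNOWN_UNDOCUMENTED = {"TableView"}

def view_params():
    for view in ["DatabaseView", "IndexView", "TableView"]:
        # Attach an xfail mark only to the views we know lack docs
        marks = (
            [pytest.mark.xfail(reason="documentation not yet written")]
            if view in KNOWN_UNDOCUMENTED
            else []
        )
        yield pytest.param(view, marks=marks)

@pytest.mark.parametrize("view", view_params())
def test_view_classes_are_documented(view):
    assert view in {"DatabaseView", "IndexView"}
```

&lt;p&gt;Documented views still fail hard if their docs disappear, while the known gaps show up as &lt;code&gt;xfailed&lt;/code&gt; rather than breaking the build.&lt;/p&gt;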
&lt;p&gt;I used &lt;code&gt;xfail&lt;/code&gt; when I &lt;a href="https://github.com/simonw/datasette/commit/e8625695a3b7938f37b64dff09c14e47d9428fe5"&gt;first added view documentation tests&lt;/a&gt; to Datasette, then removed it once the documentation was all in place. Any future code in pull requests without documentation will cause a hard test failure.&lt;/p&gt;
&lt;p&gt;Here’s what the test output looks like when some of those tests are marked as “expected to fail”:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest tests/test_docs.py
collected 31 items

tests/test_docs.py ..........................XXXxx.                [100%]

============ 26 passed, 2 xfailed, 3 xpassed in 1.06 seconds ============
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since this reports both the xfailed &lt;em&gt;and&lt;/em&gt; the xpassed counts, it shows how much work is still left to be done before the &lt;code&gt;xfail&lt;/code&gt; decorator can be safely removed.&lt;/p&gt;
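&lt;p&gt;A related option (standard pytest behaviour, not something specific to this workflow) is &lt;code&gt;strict=True&lt;/code&gt;, which turns an unexpected pass into a hard failure - useful for making sure a stale &lt;code&gt;xfail&lt;/code&gt; marker gets removed as soon as the documentation lands:&lt;/p&gt;

```python
import pytest

@pytest.mark.xfail(strict=True, reason="documentation not yet written")
def test_table_view_is_documented():
    # Under strict=True, pytest reports an unexpected pass as FAILED
    # instead of XPASS, forcing the marker to be cleaned up
    assert "TableView" in {"DatabaseView", "IndexView"}
```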
&lt;h4 id="Structuring_code_for_testable_documentation_180"&gt;Structuring code for testable documentation&lt;/h4&gt;
&lt;p&gt;A benefit of comprehensive unit testing is that it encourages you to design your code in a way that is easy to test. In my experience this leads to much higher code quality in general: it encourages separation of concerns and cleanly decoupled components.&lt;/p&gt;
&lt;p&gt;My hope is that documentation unit tests will have a similar effect. I’m already starting to think about ways of restructuring my code such that I can cleanly introspect it for the areas that need to be documented. I’m looking forward to discovering code design patterns that help support this goal.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/design-patterns"&gt;design-patterns&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/restructuredtext"&gt;restructuredtext&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="design-patterns"/><category term="documentation"/><category term="restructuredtext"/><category term="testing"/><category term="datasette"/><category term="pytest"/></entry><entry><title>Datasette plugins, and building a clustered map visualization</title><link href="https://simonwillison.net/2018/Apr/20/datasette-plugins/#atom-tag" rel="alternate"/><published>2018-04-20T15:41:11+00:00</published><updated>2018-04-20T15:41:11+00:00</updated><id>https://simonwillison.net/2018/Apr/20/datasette-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; now supports plugins!&lt;/p&gt;
&lt;p&gt;Last Saturday &lt;a href="https://twitter.com/simonw/status/985377670388105216"&gt;I asked Twitter&lt;/a&gt; for examples of Python projects with successful plugin ecosystems. &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; was the clear winner: the &lt;a href="https://plugincompat.herokuapp.com/"&gt;pytest plugin compatibility table&lt;/a&gt; (an ingenious innovation that I would love to eventually copy for Datasette) lists 457 plugins, and even the core pytest system itself is built as a collection of default plugins that can be replaced or overridden.&lt;/p&gt;
&lt;p&gt;Best of all: pytest’s plugin mechanism is available as a separate package: &lt;a href="https://pluggy.readthedocs.io/"&gt;pluggy&lt;/a&gt;. And pluggy was exactly what I needed for Datasette.&lt;/p&gt;
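&lt;p&gt;The core pluggy pattern looks something like this (a generic sketch using pluggy’s documented markers and plugin manager - the &lt;code&gt;myapp&lt;/code&gt; project name and &lt;code&gt;get_greeting&lt;/code&gt; hook are illustrative, not Datasette’s actual hooks):&lt;/p&gt;

```python
import pluggy

# Markers are namespaced by project name
hookspec = pluggy.HookspecMarker("myapp")
hookimpl = pluggy.HookimplMarker("myapp")

class MyAppSpec:
    @hookspec
    def get_greeting(self, name):
        """Hook specification: plugins return a greeting string."""

class MyPlugin:
    @hookimpl
    def get_greeting(self, name):
        return "Hello, {}!".format(name)

pm = pluggy.PluginManager("myapp")
pm.add_hookspecs(MyAppSpec)
pm.register(MyPlugin())

# Calling the hook collects one result per registered plugin
greetings = pm.hook.get_greeting(name="pytest")
```

&lt;p&gt;The host application defines the specs and decides when to call each hook; plugins just supply implementations.&lt;/p&gt;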
&lt;p&gt;You can follow the ongoing development of the feature in issue &lt;a href="https://github.com/simonw/datasette/issues/14"&gt;#14&lt;/a&gt;. This morning I released &lt;a href="https://github.com/simonw/datasette/releases/tag/0.20"&gt;Datasette 0.20&lt;/a&gt; with support for a number of different plugin hooks: plugins can add custom template tags and SQL functions, and can also bundle their own static assets, JavaScript, CSS and templates. The hooks are described in some detail in the &lt;a href="https://datasette.readthedocs.io/en/latest/plugins.html"&gt;Datasette Plugins&lt;/a&gt; documentation.&lt;/p&gt;
&lt;h2&gt;&lt;a id="datasetteclustermap_10"&gt;&lt;/a&gt;datasette-cluster-map&lt;/h2&gt;
&lt;p&gt;I also released my first plugin: &lt;a href="https://pypi.org/project/datasette-cluster-map/"&gt;datasette-cluster-map&lt;/a&gt;. Once installed, it looks out for database tables that have a &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; column. When it finds them, it draws all of the points on an interactive map using &lt;a href="http://leafletjs.com/"&gt;Leaflet&lt;/a&gt; and &lt;a href="https://github.com/Leaflet/Leaflet.markercluster"&gt;Leaflet.markercluster&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s &lt;a href="https://datasette-cluster-map-demo.now.sh/polar-bears-455fe3a/USGS_WC_eartags_output_files_2009-2011-Status"&gt;try it out on some polar bears&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img style="width: 100%" src="https://static.simonwillison.net/static/2018/polar-bears-cluster-map.gif" alt="Polar Bears on a cluster map"/&gt;&lt;/p&gt;
&lt;p&gt;The USGS Alaska Science Center have released a delightful set of data entitled &lt;a href="https://alaska.usgs.gov/products/data.php?dataid=130"&gt;Sensor and Location data from Ear Tag PTTs Deployed on Polar Bears in the Southern Beaufort Sea 2009 to 2011&lt;/a&gt;. It’s a collection of CSV files, which means it’s &lt;a href="https://gist.github.com/simonw/9f8bf23b37a42d7628c4dcc4bba10253"&gt;trivial to convert it to SQLite&lt;/a&gt; using my &lt;a href="https://github.com/simonw/csvs-to-sqlite"&gt;csvs-to-sqlite&lt;/a&gt; tool.&lt;/p&gt;
&lt;p&gt;Having created the SQLite database, we can deploy it to a hosting account on &lt;a href="https://zeit.co/now"&gt;Zeit Now&lt;/a&gt; alongside the new plugin like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Make sure we have the latest datasette
pip3 install datasette --upgrade
# Deploy polar-bears.db to now with an increased default page_size
datasette publish now \
    --install=datasette-cluster-map \
    --extra-options &amp;quot;--page_size=500&amp;quot; \
    polar-bears.db
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;--install&lt;/code&gt; option is new in Datasette 0.20 (it works for &lt;code&gt;datasette publish heroku&lt;/code&gt; as well) - it tells the publishing provider to &lt;code&gt;pip install&lt;/code&gt; the specified package. You can use it more than once to install multiple plugins, and it accepts a path to a zip file in addition to the name of a PyPI package.&lt;/p&gt;
&lt;p&gt;Explore the full demo at &lt;a href="https://datasette-cluster-map-demo.now.sh/polar-bears"&gt;https://datasette-cluster-map-demo.now.sh/polar-bears&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;&lt;a id="Visualize_any_query_on_a_map_35"&gt;&lt;/a&gt;Visualize any query on a map&lt;/h2&gt;
&lt;p&gt;Since the plugin inserts itself at the top of any Datasette table view with &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; columns, there are all sorts of neat tricks you can do with it.&lt;/p&gt;
&lt;p&gt;I also loaded the San Francisco tree list (thanks, &lt;a href="https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq"&gt;Department of Public Works&lt;/a&gt;) into the demo. Impressively, you can click “load all” &lt;a href="https://datasette-cluster-map-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List"&gt;on this page&lt;/a&gt; and &lt;code&gt;Leaflet.markercluster&lt;/code&gt; will load in all 189,144 points and display them on the same map… and it works fine on my laptop and my phone. Computers in 2018 are pretty good!&lt;/p&gt;
&lt;p&gt;But since it’s a Datasette table, we can filter it. Here’s a map of &lt;a href="https://datasette-cluster-map-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?qSpecies=2"&gt;every New Zealand Xmas Tree&lt;/a&gt; in San Francisco (8,683 points). Here’s &lt;a href="https://datasette-cluster-map-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?qCareAssistant=1"&gt;every tree where the Caretaker is Friends of the Urban Forest&lt;/a&gt;. Here’s &lt;a href="https://datasette-cluster-map-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?PlantDate__contains=1990&amp;amp;_search=palm&amp;amp;_sort=qLegalStatus"&gt;every palm tree planted in 1990&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img style="width: 100%" src="https://static.simonwillison.net/static/2018/datasette-palm-trees-1990.png" alt="Palm trees planted in 1990" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; This is an incorrect example: there are 21 matches on "palm avenue" because the FTS search index covers the address field - they're not actually palm trees. Here's a corrected query for &lt;a href="https://datasette-cluster-map-demo.now.sh/sf-trees-02c8ef1?sql=select+PlantDate%2C+Street_Tree_List.rowid%2C+latitude%2C+longitude%2C+qSpecies.value%0D%0Afrom+Street_Tree_List+join+qSpecies+on+Street_Tree_List.qSpecies+%3D+qSpecies.id%0D%0Awhere+qSpecies.value+like+%22%25palm%25%22+and+PlantDate+like+%22%251990%25%22"&gt;palm trees planted in 1990&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The plugin currently only works against columns called &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt;… but if your columns are called something else, don’t worry: you can craft a custom SQL query that aliases your columns and everything will work as intended. Here’s an example &lt;a href="https://datasette-cluster-map-demo.now.sh/polar-bears-455fe3a?sql=select+*%2C+%22Capture+Latitude%22+as+latitude%2C+%22Capture+Longitude%22+as+longitude%0D%0Afrom+%5BUSGS_WC_eartag_deployments_2009-2011%5D"&gt;against some more polar bear data&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select *, &amp;quot;Capture Latitude&amp;quot; as latitude, &amp;quot;Capture Longitude&amp;quot; as longitude
from [USGS_WC_eartag_deployments_2009-2011]
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;a id="Writing_your_own_plugins_50"&gt;&lt;/a&gt;Writing your own plugins&lt;/h2&gt;
&lt;p&gt;I’m really excited to see what people invent. If you want to have a go, your first stop should be the &lt;a href="https://datasette.readthedocs.io/en/latest/plugins.html"&gt;Plugins documentation&lt;/a&gt;. If you want an example of a simple plugin (including the all-important mechanism for packaging it up using &lt;code&gt;setup.py&lt;/code&gt;) take a look at &lt;a href="https://github.com/simonw/datasette-cluster-map"&gt;datasette-cluster-map&lt;/a&gt; on GitHub.&lt;/p&gt;
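&lt;p&gt;For a flavour of what a hook implementation looks like, here’s a sketch in the spirit of the &lt;code&gt;prepare_connection&lt;/code&gt; hook from the plugin documentation - the &lt;code&gt;hello_world&lt;/code&gt; SQL function is illustrative, and the real plugin module needs &lt;code&gt;datasette&lt;/code&gt; installed:&lt;/p&gt;

```python
import sqlite3

# The Python function our hypothetical plugin exposes to SQL
def hello_world():
    return "Hello world!"

# In the real plugin module you would write (requires datasette):
#
#   from datasette import hookimpl
#
#   @hookimpl
#   def prepare_connection(conn):
#       conn.create_function("hello_world", 0, hello_world)
#
# The hook receives each SQLite connection as Datasette opens it.
# We can exercise the same registration against a plain sqlite3 connection:
conn = sqlite3.connect(":memory:")
conn.create_function("hello_world", 0, hello_world)
result = conn.execute("select hello_world()").fetchone()[0]
```

&lt;p&gt;Packaging is then a matter of a &lt;code&gt;setup.py&lt;/code&gt; that registers the module under the &lt;code&gt;datasette&lt;/code&gt; entry point group, as in the datasette-cluster-map repo.&lt;/p&gt;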
&lt;p&gt;And if you have any thoughts, ideas or suggestions on how the plugin mechanism can be further employed please join the conversation on &lt;a href="https://github.com/simonw/datasette/issues/14"&gt;issue #14&lt;/a&gt;. I’ve literally just got started with Datasette’s plugin hooks, and I’m very keen to hear about things people want to build that aren’t yet supported.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/maps"&gt;maps&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualization"&gt;visualization&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/leaflet"&gt;leaflet&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="maps"/><category term="plugins"/><category term="projects"/><category term="visualization"/><category term="datasette"/><category term="pytest"/><category term="leaflet"/></entry></feed>