Simon Willison's Weblog: My open source process

Your job is to deliver code you have proven to work

2025-12-18T14:49:38+00:00

In all of the debates about the value of AI-assistance in software development there's one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers - or open source maintainers - and expects the "code review" process to handle the rest.

This is rude, a waste of other people's time, and is honestly a dereliction of duty as a software developer.

Your job is to deliver code you have proven to work.

As software engineers we don't just crank out code - in fact these days you could argue that's what the LLMs are for. We need to deliver code that works - and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

How to prove it works

There are two steps to proving a piece of code works. Neither is optional.

The first is manual testing. If you haven't seen the code do the right thing yourself, that code doesn't work. If it does turn out to work, that's honestly just pure chance.

Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.

If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here's a recent example.

Some changes are harder to demonstrate. It's still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.

Once you've tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.

The second step in proving a change works is automated testing. This is so much easier now that we have LLM tooling, which means there's no excuse at all for skipping this step.

Your contribution should bundle the change with an automated test that proves the change works. That test should fail if you revert the implementation.

The process for writing a test mirrors that of manual testing: get the system into an initial known state, exercise the change, assert that it worked correctly. Setting up a test harness that makes this easy is another key skill worth investing in.
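A minimal sketch of that three-step shape, written as a pytest-style test. The `slugify()` helper and its behavior are hypothetical, invented purely for illustration:

```python
import re


def slugify(title):
    """Hypothetical implementation under test: lowercase, hyphen-separated."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_collapses_punctuation():
    # 1. Get into a known initial state
    title = "Hello,   World! 2025"
    # 2. Exercise the change
    slug = slugify(title)
    # 3. Assert that it worked correctly
    assert slug == "hello-world-2025"
```

If the implementation were reverted to something naive like `title.lower().replace(" ", "-")`, this test would fail, which is the property you want from a test that ships with a change.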

Don't be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I've done this myself I've quickly regretted it.

Make your coding agent prove it first

The most important trend in LLMs in 2025 has been the explosive growth of coding agents - tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.

To master these tools you need to learn how to get them to prove their changes work as well.

This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.

Since they're robots, automated tests and manual tests are effectively the same thing.

They do feel a little different though. When I'm working on CLI tools I'll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like Click's CliRunner.
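For reference, an automated test built on Click's `CliRunner` might look roughly like this. The `greet` command is a made-up example, not taken from any of the tools mentioned here:

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("name")
@click.option("--shout", is_flag=True, help="Upper-case the greeting")
def greet(name, shout):
    """Print a greeting for NAME."""
    message = f"Hello, {name}"
    click.echo(message.upper() if shout else message)


def test_greet_shout():
    # CliRunner invokes the command in-process and captures its output
    runner = CliRunner()
    result = runner.invoke(greet, ["world", "--shout"])
    assert result.exit_code == 0
    assert result.output == "HELLO, WORLD\n"
```

The one-off manual equivalent is simply running the installed command in a terminal and reading what it prints.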

When working on CSS changes I'll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.

The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They'll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.

Developing good taste in testing code is another of those skills that differentiates a senior engineer.

The human provides the accountability

A computer can never be held accountable. That's your job as the human in the loop.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing code that is proven to work.

Next time you submit a PR, make sure you've included your evidence that it works as it should.

Tags: programming, careers, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, coding-agents

A selfish personal argument for releasing code as Open Source

2025-01-24T21:46:03+00:00

I'm the guest for the most recent episode of the Real Python podcast with Christopher Bailey, talking about Using LLMs for Python Development. We covered a lot of other topics as well - most notably my relationship with Open Source development over the years.

At 5m32s I presented what I think is the best version yet of my selfish personal argument for why it makes sense to default to releasing code as Open Source:

I didn't really get heavily back into open source until about maybe six years ago when I'd been working for a big company in the US, and I got frustrated that all of the code I was writing, I'd never get to use again.

I realized that one of the best things about open source software is that you can solve a problem once and then you can slap an open source license on that solution and you will never have to solve that problem ever again, no matter who's employing you in the future.

It's a sneaky way of solving a problem permanently.

Once I realized that I started open sourcing everything, like pretty much every piece of code I've written in the past six years, I've open sourced purely as a defense against me losing access to that code in the future.

Because I've written loads of code for employers that I don't get to use anymore - and how many times do you want to reinvent things?

I like to say that my interest in open source is actually really selfish. I figured something out. I never want to have to do this work ever again.

If I slap a license on it, write documentation for it so that I can remember what it does and write unit tests for it so it's easy for me to keep it working in the future, that's entirely beneficial to me.

The rest of the episode was a really great conversation - other topics we covered included:

4m40s: My first ever significant open source project - a PHP XML-RPC library that ended up in WordPress twenty years ago
10m08s: Benefits I've gained from starting a blog 22+ years ago
22m14s: How to get started using LLMs to write Python
36m55s: My workflow for using LLMs for code - for both the experimental research work (I called this the "Mise en place phase") and the follow-up where I actually write the finished code
55m14s: Why an SVG of a pelican riding a bicycle?
57m48s: Why saying "do it better" actually works!
1h0m24s: Cooking with LLMs! How to get a weirdly tasty guacamole recipe
1h08m52s: My latest thoughts on local models

Tags: open-source, podcasts, python, ai, generative-ai, llms, podcast-appearances

Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions

2024-01-16T21:59:56+00:00

I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.

I made some major upgrades to my python-lib cookiecutter template today. Here's what it can now do to help you get started with a new Python library:

Create a pyproject.toml file configured for use with setuptools. In my opinion this is the pattern with the current lowest learning curve - I wrote about that in detail in this TIL.
Add a skeleton README and an Apache 2.0 LICENSE file.
Create your_package/__init__.py for your code to go in.
Create tests/test_your_package.py with a skeleton test.
Include pytest as a test dependency.
Configure GitHub Actions with two workflows in .github/workflows - one for running the tests against Python 3.8 through 3.12, and one for publishing releases of your package to PyPI.

The changes I made today are that I switched from setup.py to pyproject.toml, and I made a big improvement to how the publishing workflow authenticates with PyPI.

Publishing to PyPI with Trusted Publishing

My previous version of this template required you to jump through quite a few hoops to get PyPI publishing to work. You needed to create a PyPI token that could publish a new package, then paste that token into a GitHub Actions secret, then publish the package, and then disable that token and create a new one dedicated to just updating this package in the future.

The new version is much simpler, thanks to PyPI's relatively new Trusted Publishers mechanism.

To publish a new package, you need to sign into PyPI and create a new "pending publisher". Effectively you tell PyPI "My GitHub repository myname/name-of-repo should be allowed to publish packages with the name name-of-package".

Here's that form for my brand new datasette-test library, the first library I published using this updated template:

Then create a release on GitHub, with a name that matches the version number from your pyproject.toml. Everything else should Just Work.

I wrote more about Trusted Publishing in this TIL.

Creating a package using a GitHub repository template

The most time consuming part of this project was getting my GitHub repository template to work properly.

There are two ways to use my cookiecutter template. You can use the cookiecutter command-line tool like this:

pipx install cookiecutter
cookiecutter gh:simonw/python-lib
# Answer a few questions here

But a more fun and convenient option is to use my GitHub repository template, simonw/python-lib-template-repository.

This lets you fill in a form on GitHub to create a new repository which will then execute the cookiecutter template for you and update itself with the result.

You can see an example of a repository created using this template at datasette/datasette-test.

Adding it all together

There are quite a lot of moving parts behind the scenes here, but the end result is that anyone can now create a Python library with test coverage, GitHub CI and release automation by filling in a couple of forms and clicking some buttons.

For more details on how this all works, and how it's evolved over time:

A cookiecutter template for writing Datasette plugins from June 2020 describes my first experiments with cookiecutter
Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions from August 2021 describes my earliest attempts at using GitHub repository templates for this
How to build, test and publish an open source Python library is a ten minute talk I gave at PyGotham in November 2021. It describes setup.py in detail, which is no longer my preferred approach.

Tags: github, projects, pypi, python, github-actions, cookiecutter

Things I've learned about building CLI tools in Python

2023-09-30T00:12:19+00:00

I build a lot of command-line tools in Python. It’s become my favorite way of quickly turning a piece of code into something I can use myself and package up for other people to use too.

My biggest CLI projects are sqlite-utils, LLM, shot-scraper and Datasette - but I have dozens of others and I build new ones at the rate of at least one a month. A fun recent example is blip-caption, a tiny CLI wrapper around the Salesforce BLIP model that can generate usable captions for image files.

Here are some notes on what I’ve learned about designing and implementing CLI tools in Python so far.

Starting with a template

I build enough CLI apps that I developed my own Cookiecutter template for starting new ones.

That template is simonw/click-app. You can create a new application from that template directly on GitHub, too - I wrote more about that in Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions.

Arguments, options and conventions

Almost all of my tools are built using the Click Python library. Click encourages a specific way of designing CLI tools which I really like - I find myself annoyed at the various tools from other ecosystems that don’t stick to the conventions that Click encourages.

I’ll try to summarize those conventions here.

Commands have arguments and options. Arguments are positional - they are strings that you pass directly to the command, like data.db in datasette data.db. Arguments can be required or optional, and you can have commands which accept an unlimited number of arguments.
Options are, usually, optional. They are things like --port 8000. Options can also have a single character shortened version, such as -p 8000.
- Very occasionally I'll create an option that is required, usually because a command has so many positional arguments that forcing an option makes its usage easier to read.
Some options are flags - they don't take any additional parameters, they just switch something on. shot-scraper --retina is an example of this.
Flags with single character shortcuts can be easily combined - symbex -in fetch_data is short for symbex --imports --no-file fetch_data for example.
Some options take multiple parameters. datasette --setting sql_time_limit_ms 10000 is an example, taking both the name of the setting and the value it should be set to.
Commands can have sub-commands, each with their own family of commands. llm templates is an example of this, with llm templates list and llm templates show and several more.
Every command should have help text - the more detailed the better. This can be viewed by running llm --help - or for sub-commands, llm templates --help.
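The conventions above map onto Click decorators fairly directly. Here is a rough sketch using a hypothetical tool: the `cli` group, the `serve` sub-command and all of its parameters are invented for illustration and do not reproduce any of the real tools named above.

```python
import click


@click.group()
def cli():
    """Hypothetical tool demonstrating the conventions above."""


@cli.command()
@click.argument("files", nargs=-1)  # any number of positional arguments
@click.option("-p", "--port", default=8000, help="Port to listen on")
@click.option("--retina", is_flag=True, help="A flag: just switches something on")
@click.option(
    "--setting",
    "settings",
    type=(str, str),  # an option that takes two parameters: name and value
    multiple=True,
    help="Setting name and value, can be used more than once",
)
def serve(files, port, retina, settings):
    """Serve FILES. This docstring becomes the help for the sub-command."""
    click.echo(f"files={list(files)} port={port} retina={retina}")
    for name, value in settings:
        click.echo(f"{name}={value}")
```

Assuming the group were installed as a command called `cli`, running `cli serve data.db -p 8001 --retina --setting sql_time_limit_ms 10000` would exercise an argument, a short option, a flag and a two-parameter option in one go, and `cli serve --help` would show the help text.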

Click makes it absurdly easy and productive to build CLI tools that follow these conventions.

Consistency is everything

As CLI utilities get larger, they can end up with a growing number of commands and options.

The most important thing in designing these is consistency with other existing commands and options (example here) - and with related tools that your user may have used before.

I often turn to GPT-4 for help with this: I'll ask it for examples of existing CLI tools that do something similar to what I'm about to build, and see if there's anything in their option design that I can emulate.

Since my various projects are designed to complement each other I try to stay consistent between them as well - I'll often post an issue comment that says "similar to functionality in X", with a copy of the --help output for the tool I'm about to imitate.

CLI interfaces are an API - version appropriately

I try to stick to semantic versioning for my projects, bumping the major version number on breaking changes and the minor version number for new features.

The command-line interface to a tool is absolutely part of that documented API. If someone writes a Bash script or a GitHub Actions automation that uses one of my tools, I'm careful to avoid breaking that without bumping my major version number.
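As a small illustration of those rules, here is a sketch of a version bump helper. The `bump()` function and its change labels are assumptions made for this example; the patch bump for bug fixes follows the usual semantic versioning convention.

```python
def bump(version, change):
    """Return the next version for a "breaking", "feature" or "fix" change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # e.g. renaming or removing a CLI option that scripts may depend on
        return f"{major + 1}.0.0"
    if change == "feature":
        # e.g. adding a new option or sub-command
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")
```

Under this scheme, removing a command-line option from version 3.35.1 would call for `bump("3.35.1", "breaking")`, giving 4.0.0.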

Include usage examples in --help

A habit I've formed more recently is trying to always include a working example of the command in the --help for that command.

I find I use this a lot for tools I've developed myself. All of my tools have extensive online documentation, but I like to be able to consult --help without opening a browser for most of their functionality.

Here's one of my more involved examples - the help for the sqlite-utils convert command:

Usage: sqlite-utils convert [OPTIONS] DB_PATH TABLE COLUMNS... CODE

  Convert columns using Python code you supply. For example:

      sqlite-utils convert my.db mytable mycolumn \
          '"\n".join(textwrap.wrap(value, 10))' \
          --import=textwrap

  "value" is a variable with the column value to be converted.

  Use "-" for CODE to read Python code from standard input.

  The following common operations are available as recipe functions:

  r.jsonsplit(value, delimiter=',', type=<class 'str'>)

      Convert a string like a,b,c into a JSON array ["a", "b", "c"]

  r.parsedate(value, dayfirst=False, yearfirst=False, errors=None)

      Parse a date and convert it to ISO date format: yyyy-mm-dd
      
      - dayfirst=True: treat xx as the day in xx/yy/zz
      - yearfirst=True: treat xx as the year in xx/yy/zz
      - errors=r.IGNORE to ignore values that cannot be parsed
      - errors=r.SET_NULL to set values that cannot be parsed to null

  r.parsedatetime(value, dayfirst=False, yearfirst=False, errors=None)

      Parse a datetime and convert it to ISO datetime format: yyyy-mm-ddTHH:MM:SS
      
      - dayfirst=True: treat xx as the day in xx/yy/zz
      - yearfirst=True: treat xx as the year in xx/yy/zz
      - errors=r.IGNORE to ignore values that cannot be parsed
      - errors=r.SET_NULL to set values that cannot be parsed to null

  You can use these recipes like so:

      sqlite-utils convert my.db mytable mycolumn \
          'r.jsonsplit(value, delimiter=":")'

Options:
  --import TEXT                   Python modules to import
  --dry-run                       Show results of running this against first
                                  10 rows
  --multi                         Populate columns for keys in returned
                                  dictionary
  --where TEXT                    Optional where clause
  -p, --param <TEXT TEXT>...      Named :parameters for where clause
  --output TEXT                   Optional separate column to populate with
                                  the output
  --output-type [integer|float|blob|text]
                                  Column type to use for the output column
  --drop                          Drop original column afterwards
  --no-skip-false                 Don't skip falsey values
  -s, --silent                    Don't show a progress bar
  --pdb                           Open pdb debugger on first error
  -h, --help                      Show this message and exit.
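In Click, help text like this comes from the command's docstring. A minimal sketch of embedding a usage example follows; the `rows` command and the `mytool` name are hypothetical. A line containing only `\b` tells Click not to re-wrap the paragraph that follows, which keeps the example command on its own line.

```python
import click


@click.command()
@click.argument("db_path")
@click.argument("table")
def rows(db_path, table):
    """Count the rows in TABLE. For example:

    \b
        mytool rows my.db mytable
    """
    # Placeholder body: a real command would open the database here
    click.echo(f"Counting rows in {table} from {db_path}")
```

Running the command with `--help` then shows the example alongside the generated usage line and options.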

Including --help in the online documentation

My larger tools tend to have extensive documentation independently of their help output. I update this documentation at the same time as the implementation and the tests, as described in The Perfect Commit.

I like to include the --help output in my documentation sites as well. This is mainly for my own purposes - having the help visible on a web page makes it much easier to review it and spot anything that needs updating.

Here are some example pages from my documentation that list --help output:

sqlite-utils CLI reference
LLM CLI reference
Datasette CLI reference
shot-scraper embeds help output on the relevant pages, e.g. shot-scraper shot --help
s3-credentials command help

All of these pages are maintained automatically using Cog. I described the pattern I use for this in Using cog to update --help in a Markdown README file, or you can view source on the Datasette CLI reference for a more involved example.

Tags: cli, python

Coping strategies for the serial project hoarder

2022-11-26T15:47:02+00:00

I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled "Massively increase your productivity on personal projects with comprehensive documentation and automated tests".

The alternative title for the talk was Coping strategies for the serial project hoarder.

I'm maintaining a lot of different projects at the moment. Somewhat unintuitively, the way I'm handling this is by scaling down techniques that I've seen working for large engineering teams spread out across multiple continents.

The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I'm working on at the same time.

You can watch the talk on YouTube (25 minutes). Alternatively, I've included a detailed annotated version of the slides and notes below.

This was the title I originally submitted to the conference. But I realized a better title was probably...

Coping strategies for the serial project hoarder

This video is a neat representation of my approach to personal projects: I always have a few on the go, but I can never resist the temptation to add even more.

My PyPI profile (which is only five years old) lists 185 Python packages that I've released. Technically I'm actively maintaining all of them, in that if someone reports a bug I'll push out a fix. Many of them receive new releases at least once a year.

Aside: I took this screenshot using shot-scraper with a little bit of extra JavaScript to hide a notification bar at the top of the page:

shot-scraper 'https://pypi.org/user/simonw/' \
--javascript "
    document.body.style.paddingTop = 0;
    document.querySelector(
        '#sticky-notifications'
    ).style.display = 'none';
  " --height 1000

How can one individual maintain 185 projects?

Surprisingly, I'm using techniques that I've scaled down from working at a company with hundreds of engineers.

I spent seven years at Eventbrite, during which time the engineering team grew to span three different continents. We had major engineering centers in San Francisco, Nashville, Mendoza in Argentina and Madrid in Spain.

Consider timezones: engineers in Madrid and engineers in San Francisco had almost no overlap in their working hours. Good asynchronous communication was essential.

Over time, I noticed that the teams that were most effective at this scale were the teams that had a strong culture of documentation and automated testing.

As I started to work on my own array of smaller personal projects, I found that the same discipline that worked for large teams somehow sped me up, when intuitively I would have expected it to slow me down.

I wrote an extended description of this in The Perfect Commit.

I've started structuring the majority of my work in terms of what I think of as "the perfect commit" - a commit that combines implementation, tests, documentation and a link to an issue thread.

It's important to note that our job as software engineers generally isn't to write new software: it's to make changes to existing software.

As such, the commit is our unit of work. It's worth us paying attention to how we can make our commits as useful as possible.

Here's a recent example from one of my projects, Datasette.

It's a single commit which bundles together the implementation, some related documentation improvements and the tests that show it works. And it links back to an issue thread from the commit message.

Let's talk about each component in turn.

There's not much to be said about the implementation: your commit should change something!

It should only change one thing, but what that actually means varies on a case by case basis.

It should be a single change that can be documented, tested and explained independently of other changes.

(Being able to cleanly revert it is a useful property too.)

The goals of the tests that accompany a commit are to prove that the new implementation works.

If you apply the implementation the new tests should pass. If you revert it the tests should fail.

I often use git stash to try this out.

If you tell people they need to write tests for every single change they'll often push back that this is too much of a burden, and will harm their productivity.

But I find that the incremental cost of adding a test to an existing test suite keeps getting lower over time.

The hard bit of testing is getting a testing framework set up in the first place - with a test runner, and fixtures, and objects under test and suchlike.

Once that's in place, adding new tests becomes really easy.

So my personal rule is that every new project starts with a test. It doesn't really matter what that test does - what matters is that you can run pytest to run the tests, and you have an obvious place to start building more of them.

I maintain three cookiecutter templates to help with this, for the three kinds of projects I most frequently create:

simonw/python-lib for Python libraries
simonw/click-app for command line tools
simonw/datasette-plugin for Datasette plugins

Each of these templates creates a project with a setup.py file, a README, a test suite and GitHub Actions workflows to run those tests and ship tagged releases to PyPI.

I have a trick for running cookiecutter as part of creating a brand new repository on GitHub. I described that in Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions.

This is a hill that I will die on: your documentation must live in the same repository as your code!

You often see projects keep their documentation somewhere else, like in a wiki.

Inevitably it goes out of date. And my experience is that if your documentation is out of date people will lose trust in it, which means they'll stop reading it and stop contributing to it.

The gold standard of documentation has to be that it's reliably up to date with the code.

The only way you can do that is if the documentation and code are in the same repository.

This gives you versioned snapshots of the documentation that exactly match the code at that time.

More importantly, it means you can enforce it through code review. You can say in a PR "this is great, but don't forget to update this paragraph on this page of the documentation to reflect the change you're making".

If you do this you can finally get documentation that people learn to trust over time.

Another trick I like to use is something I call documentation unit tests.

The idea here is to use unit tests to enforce that concepts introspected from your code are at least mentioned in your documentation.

I wrote more about that in Documentation unit tests.

Here's an example. Datasette has a test that scans through each of the Datasette plugin hooks and checks that there is a heading for each one in the documentation.

The test itself is pretty simple: it uses pytest parametrization to look through every introspected plugin hook name, and for each one checks that it has a matching heading in the documentation.
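A simplified sketch of the idea, not Datasette's actual test: the hook names and documentation text are hard-coded here as assumptions, where a real project would introspect the names from the code and read the documentation from a file in the repository.

```python
import re

import pytest

# Assumption: stand-ins for names introspected from the code
HOOK_NAMES = ["prepare_connection", "extra_css_urls"]

# Assumption: stand-in for the contents of a documentation file
DOCS_TEXT = """
prepare_connection
------------------

extra_css_urls
--------------
"""


def documented_headings(text):
    """Return the set of one-word headings underlined with dashes."""
    return set(re.findall(r"^(\w+)\n-+$", text, flags=re.MULTILINE))


@pytest.mark.parametrize("hook_name", HOOK_NAMES)
def test_hook_is_documented(hook_name):
    # One test case per hook, so a failure names the undocumented hook
    assert hook_name in documented_headings(DOCS_TEXT)
```

Adding a new name to `HOOK_NAMES` without adding a matching heading makes exactly one parametrized case fail, pointing straight at the missing documentation.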

The final component of my perfect commit is this: every commit must link to an issue thread.

I'll usually have these open in advance but sometimes I'll open an issue thread just so I can close it with a commit a few seconds later!

Here's the issue for the commit I showed earlier. It has 11 comments, and every single one of those comments is by me.

I have literally thousands of issues on GitHub that look like this: issue threads that are effectively me talking to myself about the changes that I'm making.

It turns out this is a fantastic form of additional documentation.

What goes in an issue?

Background: the reasons for the change. In six months' time you'll want to know why you did this.
State of play before-hand: embed existing code, link to existing docs. I like to start my issues with "I'm going to change this code right here" - that way if I come back the next day I don't have to repeat that little piece of research.
Links to things! Documentation, inspiration, clues found on StackOverflow. The idea is to capture all of the loose information floating around that topic.
Code snippets illustrating potential designs and false-starts.
Decisions. What did you consider? What did you decide? As programmers we make decisions constantly, all day, about everything. That work doesn't have to be invisible. Writing them down also avoids having to re-litigate them several months later when you've forgotten your original reasoning.
Screenshots - of everything! Animated screenshots even better. I even take screenshots of things like the AWS console to remind me what I did there.
When you close it: a link to the updated documentation and demo

The reason I love issues is that they're a form of documentation that I think of as temporal documentation.

Regular documentation comes with a big commitment: you have to keep it up to date in the future.

Issue comments skip that commitment entirely. They're displayed with a timestamp, in the context of the work you were doing at the time.

No-one will be upset or confused if you fail to keep them updated to match future changes.

So it's a commitment-free form of documentation, which I for one find incredibly liberating.

I think of this approach as issue driven development.

Everything you are doing is issue-first, and from that you drive the rest of the development process.

This is how it relates back to maintaining 185 projects at the same time.

With issue driven development you don't have to remember anything about any of these projects at all.

I've had issues where I did a bunch of design work in issue comments, then dropped it, then came back 12 months later and implemented that design - without having to rethink it.

I've had projects where I forgot that the project existed entirely! But I've found it again, and there's been an open issue, and I've been able to pick up work again.

It's a way of working where you treat it like every project is going to be maintained by someone else, and it's the classic cliche here that the somebody else is you in the future.

It horizontally scales you and lets you tackle way more interesting problems.

Programmers always complain when you interrupt them - there's this idea of "flow state" and that interrupting a programmer for a moment costs them half an hour in getting back up to speed.

This fixes that! It's much easier to get back to what you are doing if you have an issue thread that records where you've got to.

Issue driven development is my key productivity hack for taking on much more ambitious projects in much larger quantities.

Another way to think about this is to compare it to laboratory notebooks.

Here's a page from one by Leonardo da Vinci.

Great scientists and great engineers have always kept detailed notes.

We can use GitHub issues as a really quick and easy way to do the same thing!

Another thing I like to use these for is deep research tasks.

Here's an example, from when I was trying to figure out how to run my Python web application in an AWS Lambda function:

Figure out how to deploy Datasette to AWS Lambda using function URLs and Mangum

This took me 65 comments over the course of a few days... but by the end of that thread I'd figured out how to do it!

Here's the follow-up, with another 77 comments, in which I figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain.

I will never have to figure this out ever again! That's a huge win.

https://github.com/simonw/public-notes is a public repository where I keep some of these issue threads, transferred from my private notes repos using this trick.

The last thing I want to encourage you to do is this: if you do a project, tell people what it is you did!

This counts for both personal and work projects. It's so easy to skip this step.

Once you've shipped a feature or built a project, it's so tempting to skip the step of spending half an hour or more writing about the work you have done.

But you are missing out on so much of the value of your work if you don't give other people a chance to understand what you did.

I wrote more about this here: What to blog about.

For projects with releases, release notes are a really good way to do this.

I like using GitHub releases for this - they're quick and easy to write, and I have automation set up for my projects such that creating release notes in GitHub triggers a build and release to PyPI.

I've done over 1,000 releases in this way. Having them automated is crucial, and having automation makes it really easy to ship releases more often.

Please make sure your release notes have dates on them. I need to know when your change went out, because if it's only a week old it's unlikely people will have upgraded to it yet, whereas a change from five years ago is probably safe to depend on.

I wrote more about writing better release notes here.

This is a mental trick which works really well for me. "No project of mine is finished until I've told people about it in some way" is a really useful habit to form.

Twitter threads are (or were) a great low-effort way to write about a project. Build a quick thread with some links and images, and maybe even a video.

Get a little unit about your project out into the world, and then you can stop thinking about it.

(I'm trying to do this on Mastodon now instead.)

Even better: get a blog! Having your own corner of the internet to write about the work that you are doing is a small investment that will pay off many times over.

("Nobody blogs anymore" I said in the talk... Phil Gyford disagrees with that meme so much that he launched a new blog directory to show how wrong it is.)

The enemy of projects, especially personal projects, is guilt.

The more projects you have, the more guilty you feel about working on any one of them - because you're not working on the others, and those projects haven't yet achieved their goals.

You have to overcome guilt if you're going to work on 185 projects at once!

This is the most important tip: avoid side projects with user accounts.

If you build something that people can sign into, that's not a side-project, it's an unpaid job. It's a very big responsibility: avoid it at all costs!

Almost all of my projects right now are open source things that people can run on their own machines, because that's about as far away from user accounts as I can get.

I still have a responsibility for shipping security updates and things like that, but at least I'm not holding onto other people's data for them.

I feel like if your project is tested and documented, you have nothing to feel guilty about.

You have put a thing out into the world, and it has tests to show that it works, and it has documentation that explains what it is.

This means I can step back and say that it's OK for me to work on other things. That thing there is a unit that makes sense to people.

That's what I tell myself anyway! It's OK to have 185 projects provided they all have documentation and they all have tests.

Do that and the guilt just disappears. You can live guilt free!

You can follow me on Mastodon at @simon@simonwillison.net or on GitHub at github.com/simonw. Or subscribe to my blog at simonwillison.net!

From the Q&A:

You've tweeted about using GitHub Projects. Could you talk about that?
- GitHub Projects V2 is the perfect TODO list for me, because it lets me bring together issues from different repositories. I use a project called "Everything" on a daily basis (it's my browser default window) - I add issues to it that I plan to work on, including personal TODO list items as well as issues from my various public and private repositories. It's kind of like a cross between Trello and Airtable and I absolutely love it.
How did you move notes from the private to the public repo?
- GitHub doesn't let you do this. But there's a trick I use involving a temp repo which I switch between public and private to help transfer notes. More in this TIL.
Question about the perfect commit: do you commit your failing tests?
- I don't: I try to keep the commits that land on my main branch always passing. I'll sometimes write the failing test before the implementation and then commit them together. For larger projects I'll work in a branch and then squash-merge the final result into a perfect commit to main later on.

Tags: djangocon, documentation, productivity, my-talks, testing, annotated-talks, github-issues

The Perfect Commit

2022-10-29T20:41:01+00:00

For the last few years I've been trying to center my work around creating what I consider to be the Perfect Commit. This is a single commit that contains all of the following:

The implementation: a single, focused change
Tests that demonstrate the implementation works
Updated documentation reflecting the change
A link to an issue thread providing further context

Our job as software engineers generally isn't to write new software from scratch: we spend the majority of our time adding features and fixing bugs in existing software.

The commit is our principal unit of work. It deserves to be treated thoughtfully and with care.

Update 26th November 2022: My 25 minute talk Massively increase your productivity on personal projects with comprehensive documentation and automated tests describes this approach to software development in detail.

Implementation

Each commit should change a single thing.

The definition of "thing" here is left deliberately vague!

The goal is to have something that can be easily reviewed, and that can be clearly understood in the future when revisited using tools like git blame or git bisect.

I like to keep my commit history linear, as I find that makes it much easier to comprehend later. This further reinforces the value of each commit being a single, focused change.

Atomic commits are also much easier to cleanly revert if something goes wrong - or to cherry-pick into other branches.

For things like web applications that can be deployed to production, a commit should be a unit that can be deployed. Aiming to keep the main branch in a deployable state is a good rule of thumb for deciding if a commit is a sensible atomic change or not.

Tests

The ultimate goal of tests is to increase your productivity. If your testing practices are slowing you down, you should consider ways to improve them.

In the longer term, this productivity improvement comes from gaining the freedom to make changes and stay confident that your change hasn't broken something else.

But tests can help increase productivity in the immediate short term as well.

How do you know when the change you have made is finished and ready to commit? It's ready when the new tests pass.

I find this reduces the time I spend second-guessing myself and questioning whether I've done enough and thought through all of the edge cases.

Without tests, there's a very strong possibility that your change will have broken some other, potentially unrelated feature. Your commit could be held up by hours of tedious manual testing. Or you could YOLO it and learn that you broke something important later!

Writing tests becomes far less time consuming if you already have good testing practices in place.

Adding a new test to a project with a lot of existing tests is easy: you can often find an existing test that has 90% of the pattern you need already worked out for you.

If your project has no tests at all, adding a test for your change will be a lot more work.

This is why I start every single one of my projects with a passing test. It doesn't matter what this test is - assert 1 + 1 == 2 is fine! The key thing is to get a testing framework in place, such that you can run a command (for me that's usually pytest) to execute the test suite - and you have an obvious place to add new tests in the future.
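
As a minimal sketch, assuming pytest and a hypothetical tests/test_smoke.py file, that first passing test might look something like this:

```python
# tests/test_smoke.py (hypothetical file name)
def test_arithmetic_still_works():
    # Deliberately trivial: the point is to prove the test runner is wired up
    # and to give future tests an obvious place to live.
    assert 1 + 1 == 2
```

Running pytest in the project root should then collect and pass this one test.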

I use these cookiecutter templates for almost all of my new projects. They configure a testing framework with a single passing test and GitHub Actions workflows to exercise it all from the very start.

I'm not a huge advocate of test-first development, where tests are written before the code itself. What I care about is tests-included development, where the final commit bundles the tests and the implementation together. I wrote more about my approach to testing in How to cheat at unit tests with pytest and Black.

Documentation

If your project defines APIs that are meant to be used outside of your project, they need to be documented. In my work these projects are usually one of the following:

Python APIs (modules, functions and classes) that provide code designed to be imported into other projects.
Web APIs - usually JSON over HTTP these days - that provide functionality to be consumed by other applications.
Command line interface tools, such as those implemented using Click or Typer or argparse.

It is critical that this documentation lives in the same repository as the code itself.

This is important for a number of reasons.

Documentation is only valuable if people trust it. People will only trust it if they know that it is kept up to date.

If your docs live in a separate wiki somewhere it's easy for them to get out of date - but more importantly it's hard for anyone to quickly confirm if the documentation is being updated in sync with the code or not.

Documentation should be versioned. People need to be able to find the docs for the specific version of your software that they are using. Keeping it in the same repository as the code gives you synchronized versioning for free.

Documentation changes should be reviewed in the same way as your code. If they live in the same repository you can catch changes that need to be reflected in the documentation as part of your code review process.

And ideally, documentation should be tested. I wrote about my approach to doing this using Documentation unit tests. Executing example code in the documentation using a testing framework is a great idea too.
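
Here's a rough sketch of what a documentation unit test could look like. The PluginHooks class, the DOCS string and the helper names are all made up for illustration. A real project would introspect its own code and read its own documentation files from the repository.

```python
import inspect
import re


class PluginHooks:
    """Hypothetical stand-in for code that defines plugin hooks."""

    def prepare_connection(self):
        pass

    def render_cell(self):
        pass


# Hypothetical documentation text. In a real project this would be read
# from a documentation file that lives in the same repository.
DOCS = """
prepare_connection
------------------
Called when a new database connection is created.
"""


def hook_names(cls):
    # Introspect the class for its public function names.
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    )


def documented_headings(docs_text):
    # Treat a word on its own line, underlined with dashes, as a heading.
    return set(re.findall(r"^(\w+)\n-+$", docs_text, flags=re.MULTILINE))


def undocumented_hooks(cls, docs_text):
    headings = documented_headings(docs_text)
    return [name for name in hook_names(cls) if name not in headings]
```

A test in the suite could then assert that undocumented_hooks(...) returns an empty list. With the sample data above it would fail and point at render_cell, which is exactly the nudge you want.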

As with tests, writing documentation from scratch is much more work than incrementally modifying existing documentation.

Many of my commits include documentation that is just a sentence or two. This doesn't take very long to write, but it adds up to something very comprehensive over time.

How about end-user facing documentation? I'm still figuring that out myself. I created my shot-scraper tool to help automate the process of keeping screenshots up-to-date, but I've not yet found personal habits and styles for end-user documentation that I'm confident in.

A link to an issue

Every perfect commit should include a link to an issue thread that accompanies that change.

Sometimes I'll even open an issue seconds before writing the commit message, just to give myself something I can link to from the commit itself!

The reason I like issue threads is that they provide effectively unlimited space for commentary and background for the change that is being made.

Most of my issue threads are me talking to myself - sometimes with dozens of issue comments, all written by me.

Things that can go in an issue thread include:

Background: the reason for the change. I try to include this in the opening comment.
State of play before the change. I'll often link to the current version of the code and documentation. This is great for if I return to an open issue a few days later, as it saves me from having to repeat that initial research.
Links to things. So many links! Inspiration for the change, relevant documentation, conversations on Slack or Discord, clues found on StackOverflow.
Code snippets illustrating potential designs and false-starts. Use ```python ... ``` blocks to get syntax highlighting in your issue comments.
Decisions. What did you consider? What did you decide? As programmers we make hundreds of tiny decisions a day. Write them down! Then you'll never find yourself relitigating them in the future having forgotten your original reasoning.
Screenshots. What it looked like before, what it looked like after. Animated screenshots are even better! I use LICEcap to generate quick GIF screen captures or QuickTime to capture videos - both of which can be dropped straight into a GitHub issue comment.
Prototypes. I'll often paste a few lines of code copied from a Python console session. Sometimes I'll even paste in a block of HTML and CSS, or add a screenshot of a UI prototype.

After I've closed my issues I like to add one last comment that links to the updated documentation and ideally a live demo of the new feature.

An issue is more valuable than a commit message

I went through a several year phase of writing essays in my commit messages, trying to capture as much of the background context and thinking as possible.

My commit messages grew a lot shorter when I started bundling the updated documentation in the commit - since often much of the material I'd previously included in the commit message was now in that documentation instead.

As I extended my practice of writing issue threads, I found that they were a better place for most of this context than the commit messages themselves. They supported embedded media, were more discoverable and I could continue to extend them even after the commit had landed.

Today many of my commit messages are a single line summary and a link to an issue!

The biggest benefit of lengthy commit messages is that they are guaranteed to survive for as long as the repository itself. If you're going to use issue threads in the way I describe here it is critical that you consider their long term archival value.

I expect this to be controversial! I'm advocating for abandoning one of the core ideas of Git here - that each repository should incorporate a full, decentralized record of its history that is copied in its entirety when someone clones a repo.

I understand that philosophy. All I'll say here is that my own experience has been that dropping that requirement has resulted in a net increase in my overall productivity. Other people may reach a different conclusion.

If this offends you too much, you're welcome to construct an even more perfect commit that incorporates background information and additional context in an extended commit message as well.

One of the reasons I like GitHub Issues is that it includes a comprehensive API, which can be used to extract all of that data. I use my github-to-sqlite tool to maintain an ongoing archive of my issues and issue comments as a SQLite database file.
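
To give a feel for how little is needed to archive this data, here's a standard-library sketch. It is not how github-to-sqlite works. The issue_comments table and the function names are assumptions for illustration, and it ignores pagination and authentication.

```python
import json
import sqlite3
import urllib.request


def comments_url(owner, repo, issue_number):
    # GitHub REST API endpoint that lists the comments on a single issue.
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/issues/{issue_number}/comments"
    )


def fetch_comments(url):
    # Unauthenticated request: works for public repos but is rate limited.
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_comments(db, comments):
    # Hypothetical table layout; keeps just enough to re-read a thread later.
    db.execute(
        "create table if not exists issue_comments "
        "(id integer primary key, author text, created_at text, body text)"
    )
    db.executemany(
        "insert or replace into issue_comments values (?, ?, ?, ?)",
        [
            (c["id"], c["user"]["login"], c["created_at"], c["body"])
            for c in comments
        ],
    )
    db.commit()
```

Calling save_comments(db, fetch_comments(comments_url(owner, repo, number))) on a schedule would keep a local copy of a thread that survives independently of GitHub.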

Not every commit needs to be "perfect"

I find that the vast majority of my work fits into this pattern, but there are exceptions.

Typo fix for some documentation or a comment? Just ship it, it's fine.

Bug fix that doesn't deserve documentation? Still bundle the implementation and the test plus a link to an issue, but no need to update the docs - especially if they already describe the expected bug-free behaviour.

Generally though, I find that aiming for implementation, tests, documentation and an issue link covers almost all of my work. It's a really good default model.

Write scrappy commits in a branch

If I'm writing more exploratory or experimental code it often doesn't make sense to work in this strict way. For those instances I'll usually work in a branch, where I can ship "WIP" commit messages and failing tests with abandon. I'll then squash-merge them into a single perfect commit (sometimes via a self-closed GitHub pull request) to keep my main branch as tidy as possible.

Some examples

Here are some examples of my commits that follow this pattern:

Upgrade Docker images to Python 3.11 for datasette #1853 - a pretty tiny change, but still includes tests, docs and an issue link.
sqlite-utils schema now takes optional tables for sqlite-utils #299
shot-scraper html command for shot-scraper #96
s3-credentials put-objects command for s3-credentials #68
Initial implementation for datasette-gunicorn #1 - this was the first commit to this repository, but I still bundled the tests, docs, implementation and a link to an issue.

Tags: code-review, definitions, documentation, git, github, software-engineering, testing, github-issues

Automating screenshots for the Datasette documentation using shot-scraper

2022-10-14T23:44:03+00:00

I released shot-scraper back in March as a tool for keeping screenshots in documentation up-to-date.

It's very easy for feature screenshots in documentation for a web application to drift out-of-date with the latest design of the software itself.

shot-scraper is a command-line tool that aims to solve this.

You can use it to take one-off screenshots like this:

shot-scraper https://latest.datasette.io/ --height 800

Or you can define multiple screenshots in a single YAML file - let's call this shots.yml:

- url: https://latest.datasette.io/
  height: 800
  output: index.png
- url: https://latest.datasette.io/fixtures
  height: 800
  output: database.png

And run them all at once like this:

shot-scraper multi shots.yml

This morning I used shot-scraper to replace all of the existing screenshots in the Datasette documentation with up-to-date, automated equivalents.

I decided to use this as an opportunity to create a more detailed tutorial for how to use shot-scraper for this kind of screenshot automation project.

Four screenshots to replace

Datasette's documentation included four screenshots that I wanted to replace with automated equivalents.

full_text_search.png illustrates the full-text search feature:

advanced_export.png displays Datasette's "advanced export" dialog:

binary_data.png displays just a small fragment of a table with binary download links:

facets.png demonstrates faceting against a table:

I'll walk through each screenshot in turn.

full_text_search.png

I decided to use a different example for the new screenshot, because I don't currently have a live instance for that table running against the most recent Datasette release.

I went with https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&_sort_desc=date - a search against the UK register of members interests for "hamper" (see Exploring the UK Register of Members Interests with SQL and Datasette).

The existing image in the documentation was 960 pixels wide, so I stuck with that and tried a few iterations until I found a height that I liked.

I installed shot-scraper and ran the following, in my /tmp directory:

shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&_sort_desc=date' \
  -h 585 \
  -w 960

This produced a register-of-members-interests-datasettes-com-regmem-items.png file which looked good when I opened it in Preview.

I turned that into the following YAML in my shots.yml file:

- url: https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&_sort_desc=date
  height: 585
  width: 960
  output: regmem-search.png

Running shot-scraper multi shots.yml against that file produced this regmem-search.png image:

advanced_export.png

This next image isn't a full page screenshot - it's just a small fragment of the page.

shot-scraper can take partial screenshots based on one or more CSS selectors. Given a CSS selector the tool draws a box around just that element and uses that to take the screenshot - adding optional padding.

Here's the recipe for the advanced export box - I used the same register-of-members-interests.datasettes.com example for it as this had enough rows to trigger all of the advanced options to be displayed:

shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper' \
  -s '#export' \
  -p 10

The -p 10 here specifies 10px of padding, needed to capture the drop shadow on the box.

Here's the equivalent YAML:

- url: https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper
  selector: "#export"
  output: advanced-export.png
  padding: 10

And the result:

binary_data.png

This screenshot required a different trick.

I wanted to take a screenshot of the table on this page.

The full table looks like this, with three rows:

I only wanted the first two of these to be shown in the screenshot though.

shot-scraper has the ability to execute JavaScript on the page before the screenshot is taken. This can be used to remove elements first.

Here's the JavaScript I came up with to remove all but the first two rows (actually the first three, because the table header counts as a row too):

Array.from(
  document.querySelectorAll('tr:nth-child(n+3)'),
  el => el.parentNode.removeChild(el)
);

I did it this way so that if I add any more rows to that test table in the future the code will still remove everything but the first two.

The CSS selector tr:nth-child(n+3) selects all rows that are not the first three (one header plus two content rows).

Here's how to run that from the command-line, and then take a 10 pixel padded screenshot of just the table on the page after it has been modified by the JavaScript:

shot-scraper 'https://latest.datasette.io/fixtures/binary_data' \
  -j 'Array.from(document.querySelectorAll("tr:nth-child(n+3)"), el => el.parentNode.removeChild(el));' \
  -s table -p 10

The YAML I added to shots.yml:

- url: https://latest.datasette.io/fixtures/binary_data
  selector: table
  javascript: |-
    Array.from(
      document.querySelectorAll('tr:nth-child(n+3)'),
      el => el.parentNode.removeChild(el)
    );
  padding: 10
  output: binary-data.png

And the resulting image:

facets.png

I left the most complex screenshot to last.

For the faceting screenshot, I wanted to include the "suggested facet" links at the top of the page, a set of active facets and then the first three rows of the following table.

But... the table has quite a lot of columns. For a neater screenshot I only wanted to include a subset of columns in the final shot.

Here's the screenshot I ended up taking:

And the YAML recipe:

- url: https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&_facet=party&_facet=state&_facet_size=10
  selectors_all:
  - .suggested-facets a
  - tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))
  padding: 10
  output: faceting-details.png

The key trick I'm using here is that selectors_all list.

The usual shot-scraper selector option finds the first element on the page matching the specified CSS selector and takes a screenshot of that.

--selector-all - or the YAML equivalent selectors_all - instead finds EVERY element that matches any of the specified selectors and draws a bounding box containing all of them.
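
The geometry behind that bounding box can be sketched in a few lines. This is an illustration of the idea, not shot-scraper's actual implementation. The (x, y, width, height) tuple format and the function name are assumptions.

```python
def bounding_box(rects, padding=0):
    # Each rect is (x, y, width, height) in pixels for one matched element.
    left = min(x for x, y, w, h in rects)
    top = min(y for x, y, w, h in rects)
    right = max(x + w for x, y, w, h in rects)
    bottom = max(y + h for x, y, w, h in rects)
    # Grow the box by the padding on every side.
    return (
        left - padding,
        top - padding,
        (right - left) + 2 * padding,
        (bottom - top) + 2 * padding,
    )
```

For example, bounding_box([(10, 10, 100, 20), (50, 40, 200, 30)], padding=10) returns (0, 0, 260, 80): a single rectangle that covers both elements plus 10 pixels of padding all round.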

I wanted that bounding box to surround a subset of the table cells on the page. I used this CSS selector to indicate that subset:

tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))

Here's what GPT-3 says if you ask it to explain the selector:

Explain this CSS selector:

tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))

This selector is selecting all table cells in rows that are not the fourth row or greater, and are not in columns that are the 11th column or greater.

(See also this TIL.)

Automating everything using GitHub Actions

Here's the full shots.yml YAML needed to generate all four of these screenshots:

- url: https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&_sort_desc=date
  height: 585
  width: 960
  output: regmem-search.png
- url: https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper
  selector: "#export"
  output: advanced-export.png
  padding: 10
- url: https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&_facet=party&_facet=state&_facet_size=10
  selectors_all:
  - .suggested-facets a
  - tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))
  padding: 10
  output: faceting-details.png
- url: https://latest.datasette.io/fixtures/binary_data
  selector: table
  javascript: |-
    Array.from(
      document.querySelectorAll('tr:nth-child(n+3)'),
      el => el.parentNode.removeChild(el)
    );
  padding: 10
  output: binary-data.png

Running shot-scraper multi shots.yml against this file takes all four screenshots.

But I want this to be fully automated! So I turned to GitHub Actions.

A while ago I created a template repository for setting up GitHub Actions to take screenshots using shot-scraper and write them back to the same repo. I wrote about that in Instantly create a GitHub repository to take screenshots of a web page.

I had previously used that recipe to create my datasette-screenshots repository - with its own shots.yml file.

So I added the new YAML to that existing file, committed the change, waited a minute and the result was all four images stored in that repository!

My datasette-screenshots workflow actually has two key changes from my default template. First, it takes every screenshot twice - once as a retina image and once as a regular image:

    - name: Take retina shots
      run: |
        shot-scraper multi shots.yml --retina
    - name: Take non-retina shots
      run: |
        mkdir -p non-retina
        cd non-retina
        shot-scraper multi ../shots.yml
        cd ..

This provides me with both a high quality image and a smaller, faster-loading image for each screenshot.

Secondly, it runs oxipng to optimize the PNGs before committing them to the repo:

    - name: Optimize PNGs
      run: |-
        oxipng -o 4 -i 0 --strip safe *.png
        oxipng -o 4 -i 0 --strip safe non-retina/*.png

The shot-scraper documentation describes this pattern in more detail.

With all of that in place, simply committing a change to the shots.yml file is enough to generate and store the new screenshots.

Linking to the images

One last problem to solve: I want to include these images in my documentation, which means I need a way to link to them.

I decided to use GitHub to host these directly, via the raw.githubusercontent.com domain - which is fronted by the Fastly CDN.

I care about up-to-date images, but I also want different versions of the Datasette documentation to reflect the corresponding design in their screenshots - so I needed a way to snapshot those screenshots to a known version.

Repository tags are one way to do this.

I tagged the datasette-screenshots repository with 0.62, since that's the version of Datasette that the screenshots were taken for.

This gave me the following URLs for the images:

To save on page loading time I decided to use the non-retina URLs for the two larger images.

Here's the commit that updated the Datasette documentation to link to these new images (and deleted the old images from the repo).

You can see the new images in the documentation on these pages:

Tags: documentation, datasette, github-actions, shot-scraper

Software engineering practices

2022-10-01T15:56:02+00:00

Gergely Orosz started a Twitter conversation asking about recommended "software engineering practices" for development teams.

(I really like his rejection of the term "best practices" here: I always feel it's prescriptive and misguiding to announce something as "best".)

I decided to flesh some of my replies out into a longer post.

Documentation in the same repo as the code

The most important characteristic of internal documentation is trust: do people trust that documentation both exists and is up-to-date?

If they don't, they won't read it or contribute to it.

The best trick I know of for improving the trustworthiness of documentation is to put it in the same repository as the code it documents, for a few reasons:

You can enforce documentation updates as part of your code review process. If a PR changes code in a way that requires documentation updates, the reviewer can ask for those updates to be included.
You get versioned documentation. If you're using an older version of a library you can consult the documentation for that version. If you're using the current main branch you can see documentation for that, without confusion over what corresponds to the most recent "stable" release.
You can integrate your documentation with your automated tests! I wrote about this in Documentation unit tests, which describes a pattern for introspecting code and then ensuring that the documentation at least has a section header that matches specific concepts, such as plugin hooks or configuration options.

Mechanisms for creating test data

When you work on large products, your customers will inevitably find surprising ways to stress or break your system. They might create an event with over a hundred different types of ticket for example, or an issue thread with a thousand comments.

These can expose performance issues that don't affect the majority of your users, but can still lead to service outages or other problems.

Your engineers need a way to replicate these situations in their own development environments.

One way to handle this is to provide tooling to import production data into local environments. This has privacy and security implications - what if a developer laptop gets stolen that happens to have a copy of your largest customer's data?

A better approach is to have a robust system in place for generating test data, that covers a variety of different scenarios.

You might have a button somewhere that creates an issue thread with a thousand fake comments, with a note referencing the bug that this helps emulate.

Any time a new edge case shows up, you can add a new recipe to that system. That way engineers can replicate problems locally without needing copies of production data.
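
One possible shape for such a system is a registry of named recipes. Everything in this sketch is hypothetical, including the recipe name, the description text and the shape of the generated data. It only shows the pattern of adding one small function per edge case.

```python
import random

# Maps recipe name -> (description, function that builds the fake data).
RECIPES = {}


def recipe(name, description):
    def decorator(fn):
        RECIPES[name] = (description, fn)
        return fn

    return decorator


@recipe(
    "thread-with-1000-comments",
    "Emulates a very long issue thread (hypothetical bug reference goes here)",
)
def thread_with_many_comments(seed=0, count=1000):
    # A fixed seed keeps the fake data identical on every developer machine.
    rng = random.Random(seed)
    words = ["alpha", "beta", "gamma", "delta"]
    return {
        "title": "Fake issue thread",
        "comments": [
            {
                "author": f"user{rng.randint(1, 50)}",
                "body": " ".join(rng.choices(words, k=8)),
            }
            for _ in range(count)
        ],
    }


def generate(name, **kwargs):
    description, fn = RECIPES[name]
    return fn(**kwargs)
```

A button in an admin interface, or a management command, could then call generate("thread-with-1000-comments") and insert the result into the local database.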

Rock solid database migrations

The hardest part of large-scale software maintenance is inevitably the bit where you need to change your database schema.

(I'm confident that one of the biggest reasons NoSQL databases became popular over the last decade was the pain people had associated with relational databases due to schema changes. Of course, NoSQL database schema modifications are still necessary, and often they're even more painful!)

So you need to invest in a really good, version-controlled mechanism for managing schema changes. And a way to run them in production without downtime.

If you do not have this your engineers will respond by being fearful of schema changes. Which means they'll come up with increasingly complex hacks to avoid them, which piles on technical debt.

This is a deep topic. I mostly use Django for large database-backed applications, and Django has the best migration system I've ever personally experienced. If I'm working without Django I try to replicate its approach as closely as possible:

The database knows which migrations have already been applied. This means when you run the "migrate" command it can run just the ones that are still needed - important for managing multiple databases, e.g. production, staging, test and development environments.
A single command that applies pending migrations, and updates the database rows that record which migrations have been run.
Optional: rollbacks. Django migrations can be rolled back, which is great for iterating in a development environment, but using that in production is actually quite rare: I'll often ship a new migration that reverses the change instead of using a rollback, partly to keep the record of the mistake in version control.
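
The first two points can be sketched with nothing more than Python's sqlite3 module. This is not Django's implementation. The _migrations table, the migration names and the SQL are all made up for illustration.

```python
import sqlite3

# Hypothetical migrations: ordered (name, SQL) pairs kept in version control.
MIGRATIONS = [
    ("0001_create_users", "create table users (id integer primary key, name text)"),
    ("0002_add_email", "alter table users add column email text"),
]


def applied_migrations(db):
    # The database itself records which migrations have been run.
    db.execute(
        "create table if not exists _migrations "
        "(name text primary key, applied_at text default current_timestamp)"
    )
    return {row[0] for row in db.execute("select name from _migrations")}


def migrate(db, migrations=MIGRATIONS):
    # Apply only the migrations this database has not seen yet, in order.
    already = applied_migrations(db)
    newly_applied = []
    for name, sql in migrations:
        if name in already:
            continue
        with db:  # commits the bookkeeping row if the block succeeds
            db.execute(sql)
            db.execute("insert into _migrations (name) values (?)", [name])
        newly_applied.append(name)
    return newly_applied
```

Running migrate(db) a second time does nothing, which is what makes the same command safe to run against production, staging, test and development databases.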

Even harder is the challenge of making schema changes without any downtime. I'm always interested in reading about new approaches for this - GitHub's gh-ost is a neat solution for MySQL.

An interesting consideration here is that it's rarely possible to have application code and database schema changes go out at the exact same instant in time. As a result, to avoid downtime you need to design every schema change with this in mind. The process needs to be:

Design a new schema change that can be applied without changing the application code that uses it.
Ship that change to production, upgrading your database while keeping the old code working.
Now ship new application code that uses the new schema.
Ship a new schema change that cleans up any remaining work - dropping columns that are no longer used, for example.
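
Here's a toy walk-through of those four steps using an in-memory SQLite database. The users table, the column names and the two "application code" functions are hypothetical. A real deployment would spread these steps across separate releases.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table users (id integer primary key, name text)")


def old_code_add_user(db, name):
    # Old application code: knows nothing about the new column.
    db.execute("insert into users (name) values (?)", [name])


# Steps 1 and 2: an additive schema change. The new column is nullable,
# so the old insert keeps working after the database is upgraded.
db.execute("alter table users add column display_name text")
old_code_add_user(db, "cleo")


def new_code_add_user(db, display_name):
    # Step 3: new application code that only uses the new column.
    db.execute("insert into users (display_name) values (?)", [display_name])


new_code_add_user(db, "Pancakes")

# Step 4: clean up. Backfill rows written by the old code, then drop the
# column nothing uses any more (DROP COLUMN needs SQLite 3.35 or later).
db.execute("update users set display_name = name where display_name is null")
if sqlite3.sqlite_version_info >= (3, 35, 0):
    db.execute("alter table users drop column name")
db.commit()
```

The important property is that at every point in the sequence, whichever version of the application code is currently deployed can still read and write the table.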

This process is a pain. It's difficult to get right. The only way to get good at it is to practice it a lot over time.

My rule is this: schema changes should be boring and common, as opposed to being exciting and rare.

Templates for new projects and components

If you're working with microservices, your team will inevitably need to build new ones.

If you're working in a monorepo, you'll still have elements of your codebase with similar structures - components and feature implementations of some sort.

Be sure to have really good templates in place for creating these "the right way" - with the right directory structure, a README and a test suite with a single, dumb passing test.

I like to use the Python cookiecutter tool for this. I've also used GitHub template repositories, and I even have a neat trick for combining the two.

These templates need to be maintained and kept up-to-date. The best way to do that is to make sure they are being used - every time a new project is created is a chance to revise the template and make sure it still reflects the recommended way to do things.

Automated code formatting

This one's easy. Pick a code formatting tool for your language - like Black for Python or Prettier for JavaScript (I'm so jealous of how Go has gofmt built in) - and run its "check" mode in your CI flow.

Don't argue with its defaults, just commit to them.

This saves an incredible amount of time in two places:

As an individual, you get back all of that mental energy you used to spend thinking about the best way to format your code and can spend it on something more interesting.
As a team, your code reviews can entirely skip the pedantic arguments about code formatting. Huge productivity win!

Tested, automated process for new development environments

The most painful part of any software project is inevitably setting up the initial development environment.

The moment your team grows beyond a couple of people, you should invest in making this work better.

At the very least, you need a documented process for creating a new environment - and it has to be known-to-work, so any time someone is onboarded using it they should be encouraged to fix any problems in the documentation or accompanying scripts as they encounter them.

Much better is an automated process: a single script that gets everything up and running. Tools like Docker have made this a LOT easier over the past decade.

I'm increasingly convinced that the best-in-class solution here is cloud-based development environments. The ability to click a button on a web page and have a fresh, working development environment running a few seconds later is a game-changer for large development teams.

Gitpod and Codespaces are two of the most promising tools I've tried in this space.

I've seen developers lose hours a week to issues with their development environment. Eliminating that across a large team is the equivalent of hiring several new full-time engineers!

Automated preview environments

Reviewing a pull request is a lot easier if you can actually try out the changes.

The best way to do this is with automated preview environments, directly linked to from the PR itself.

These are getting increasingly easy to offer. Vercel, Netlify, Render and Heroku all have features that can do this. Building a custom system on top of something like Google Cloud Run or Fly Machines is also possible with a bit of work.

This is another one of those things which requires some up-front investment but will pay itself off many times over through increased productivity and quality of reviews.

Tags: documentation, software-engineering, testing, version-control, zero-downtime, github-actions, technical-debt, gergely-orosz

Writing better release notes

2022-01-31T20:13:50+00:00

Release notes are an important part of the open source process. I've been thinking about these a lot recently, and I've assembled some thoughts on how to do a better job with them.

Write release notes. Seriously - if you want people to take advantage of the work you have been doing to improve your projects, you need to tell them about it!
Include the date. The date matters a lot, because I want to be able to determine how old a release is - especially important for the dependencies I am using.

It’s much more reasonable to assume people will be running as a minimum a version from 3 years ago than one that came out last week.
Make sure people can link to the release notes for a version. This can be a page-per-release or an anchor target on the releases page, but it needs to be possible and it needs to be discoverable.

There are some projects for which I have to view source on the HTML page to find the anchor links for the version headers - don’t make me do that!
For larger releases, break them up with headers. “New features” vs. “Bug fixes” is a useful distinction.
Emphasize the highlights. It can be easy for the highlights of a larger release to get lost in a sea of bullet points. I’ll sometimes include an introductory paragraph highlighting the major themes.
Link to relevant documentation from the release notes. If I want to know more about a new feature that should be the best place for me to start.
Flesh them out with examples and screenshots. Most release notes are pretty dry - there’s no space limit on these things, so feel free to use all of the tools at your disposal to best answer the question “what has changed and why does this matter to me?”
Link to the associated issue thread. If I want to know more about a feature, and you’ve been using issues effectively, the issue link can tell me the entire story of the new feature in more detail than anything else.
Credit your contributors! If someone helped build a feature the release notes are a great place to give them a shout out.
Once shipped, let people know. I mainly use Twitter for this, but I also write about releases on my blog (in my weeknotes) and push out news about major releases through my project newsletter.

When I tweet my release notes (recent example) I include both a link to them and, if they’re short, a screenshot of the release notes with an alt= attribute duplicating their content.

GitHub Releases and GitHub Actions

I really like the GitHub releases feature. You can easily create new releases attached to tags, and each release gets its own linkable page. Release notes are written in Markdown and you can edit them later on, expanding them further and fixing any typos or errors.

Releases pages also automatically link to a zip or .tar.gz file of your repository at that tag, and you can attach binary builds to that page too.

Plus they have a good API, and they integrate well with GitHub Actions.

I manage all of my Python package releases using this - I have an actions workflow which triggers on a new GitHub release, builds and packages that tag and then uploads it to PyPI - so all I have to do is click “new release” and fill in the form and my automation does the rest of the work.

I use the GitHub API to show links to my most recent releases on both the Datasette homepage and my personal GitHub profile.
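
As a rough illustration of what that can look like, here's a sketch that uses the GitHub REST API's "list releases" endpoint and turns the result into Markdown links. The function names are hypothetical and this is an assumption about one way to do it, not the code behind either of those pages.

```python
import json
import urllib.request


def fetch_releases(repo, limit=5):
    """Fetch recent releases for a repo such as 'simonw/sqlite-utils'."""
    url = f"https://api.github.com/repos/{repo}/releases?per_page={limit}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def release_links(releases):
    """Turn the API's release objects into Markdown list items."""
    return [
        "- [{}]({}) - {}".format(
            release["tag_name"], release["html_url"], release["published_at"][:10]
        )
        for release in releases
        # Drafts have no published_at date, so skip them
        if not release.get("draft")
    ]
```

Keeping the formatting in its own function means it can be tested against a saved API response without touching the network.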

Annotated release notes

Something I’ve been trying with my own projects is publishing annotated release notes to accompany the official ones.

The idea here is that while the official release notes succinctly document “what changed”, the annotated ones are a place where I can provide personal notes about the new features - the background on them, what I learned along the way and my own opinions on what they are useful for and why.

I enjoy writing these, but I’ve not yet got a great feel for if people find them useful or not - so I’m not ready to recommend them as a thing that other projects should aim to replicate. I like them though, so if you write them for your project I will look forward to reading them!

Some examples

I'm a huge fan of Django's release notes - they're some of the best I've ever seen. It's worth exploring both minor releases and major releases for an example of what this looks like when it's done really well.

For my own projects, I put the most effort into the release notes for Datasette and for sqlite-utils. Both of those projects have a dedicated page in their documentation.

sqlite-utils 3.10 and Datasette 0.44 are some of my better examples.

I also write about them on my blog. Here's the full series of annotated release notes for Datasette, and my write-ups of new features added to sqlite-utils.

My smaller projects use GitHub Releases rather than having a dedicated page in their documentation. datasette-graphql 0.10 and s3-credentials 0.9 are two good examples there.

Tags: open-source, releasenotes, writing

How I build a feature

2022-01-12T18:10:17+00:00

I'm maintaining a lot of different projects at the moment. I thought it would be useful to describe the process I use for adding a new feature to one of them, using the new sqlite-utils create-database command as an example.

I like each feature to be represented by what I consider to be the perfect commit - one that bundles together the implementation, the tests, the documentation and a link to an external issue thread.

Update 29th October 2022: I wrote more about the perfect commit.

The sqlite-utils create-database command is very simple: it creates a new, empty SQLite database file. You use it like this:

% sqlite-utils create-database empty.db

Everything starts with an issue

Every piece of work I do has an associated issue. This acts as ongoing work-in-progress notes and lets me record decisions, reference any research, drop in code snippets and sometimes even add screenshots and video - stuff that is really helpful but doesn't necessarily fit in code comments or commit messages.

Even if it's a tiny improvement that's only a few lines of code, I'll still open an issue for it - sometimes just a few minutes before closing it again as complete.

Any commits that I create that relate to an issue reference the issue number in their commit message. GitHub does a great job of automatically linking these together, bidirectionally so I can navigate from the commit to the issue or from the issue to the commit.

Having an issue also gives me something I can link to from my release notes.

In the case of the create-database command, I opened this issue in November when I had the idea for the feature.

I didn't do the work until over a month later - but because I had designed the feature in the issue comments I could get started on the implementation really quickly.

Development environment

Being able to quickly spin up a development environment for a project is crucial. All of my projects have a section in the README or the documentation describing how to do this - here's that section for sqlite-utils.

On my own laptop each project gets a directory, and I use pipenv shell in that directory to activate a directory-specific virtual environment, then pip install -e '.[test]' to install the dependencies and test dependencies.

Automated tests

All of my features are accompanied by automated tests. This gives me the confidence to boldly make changes to the software in the future without fear of breaking any existing features.

This means that writing tests needs to be as quick and easy as possible - the less friction here the better.

The best way to make writing tests easy is to have a great testing framework in place from the very beginning of the project. My cookiecutter templates (python-lib, datasette-plugin and click-app) all configure pytest and add a tests/ folder with a single passing test, to give me something to start adding tests to.

I can't say enough good things about pytest. Before I adopted it, writing tests was a chore. Now it's an activity I genuinely look forward to!

I'm not a religious adherent to writing the tests first - see How to cheat at unit tests with pytest and Black for more thoughts on that - but I'll write the test first if it's pragmatic to do so.

In the case of create-database, writing the test first felt like the right thing to do. Here's the test I started with:

def test_create_database(tmpdir):
    db_path = tmpdir / "test.db"
    assert not db_path.exists()
    result = CliRunner().invoke(
        cli.cli, ["create-database", str(db_path)]
    )
    assert result.exit_code == 0
    assert db_path.exists()

This test uses the tmpdir pytest fixture to provide a temporary directory that will be automatically cleaned up by pytest after the test run finishes.

It checks that the test.db file doesn't exist yet, then uses the Click framework's CliRunner utility to execute the create-database command. Then it checks that the command didn't throw an error and that the file has been created.

Then I run the test, and watch it fail - because I haven't built the feature yet!

% pytest -k test_create_database

============ test session starts ============
platform darwin -- Python 3.8.2, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/simon/Dropbox/Development/sqlite-utils
plugins: cov-2.12.1, hypothesis-6.14.5
collected 808 items / 807 deselected / 1 selected                           

tests/test_cli.py F                                                   [100%]

================= FAILURES ==================
___________ test_create_database ____________

tmpdir = local('/private/var/folders/wr/hn3206rs1yzgq3r49bz8nvnh0000gn/T/pytest-of-simon/pytest-659/test_create_database0')

    def test_create_database(tmpdir):
        db_path = tmpdir / "test.db"
        assert not db_path.exists()
        result = CliRunner().invoke(
            cli.cli, ["create-database", str(db_path)]
        )
>       assert result.exit_code == 0
E       assert 1 == 0
E        +  where 1 = <Result SystemExit(1)>.exit_code

tests/test_cli.py:2097: AssertionError
========== short test summary info ==========
FAILED tests/test_cli.py::test_create_database - assert 1 == 0
===== 1 failed, 807 deselected in 0.99s ====

The -k option lets me run any tests that match the search string, rather than running the full test suite. I use this all the time.

Other pytest features I often use:

pytest -x: runs the entire test suite but quits at the first test that fails
pytest --lf: re-runs any tests that failed during the last test run
pytest --pdb -x: open the Python debugger at the first failed test (omit the -x to open it at every failed test). This is the main way I interact with the Python debugger. I often use this to help write the tests, since I can add assert False and get a shell inside the test to interact with various objects and figure out how to best run assertions against them.

Implementing the feature

Test in place, it's time to implement the command. I added this code to my existing cli.py module:

@cli.command(name="create-database")
@click.argument(
    "path",
    type=click.Path(file_okay=True, dir_okay=False, allow_dash=False),
    required=True,
)
def create_database(path):
    "Create a new empty database file."
    db = sqlite_utils.Database(path)
    db.vacuum()

(I happen to know that the quickest way to create an empty SQLite database file is to run VACUUM against it.)
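
The same trick works without sqlite-utils, using only the standard library. This is a sketch: the `create_empty_database()` helper and the temporary path are hypothetical, and the header check at the end is just one way to confirm that SQLite really wrote a database file.

```python
import os
import sqlite3
import tempfile


def create_empty_database(path):
    """Create a valid, empty SQLite database file at path."""
    conn = sqlite3.connect(path)
    try:
        # VACUUM forces SQLite to write the file out, header included
        conn.execute("VACUUM")
    finally:
        conn.close()


path = os.path.join(tempfile.mkdtemp(), "empty.db")
create_empty_database(path)
with open(path, "rb") as fp:
    # A SQLite database file starts with a fixed 16-byte magic string
    header = fp.read(16)
```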

The test now passes!

I iterated on this implementation a little bit more, to add the --enable-wal option I had designed in the issue comments - and updated the test to match. You can see the final implementation in this commit: 1d64cd2e5b402ff957f9be2d9bb490d313c73989.

If I add a new test and it passes the first time, I’m always suspicious of it. I’ll deliberately break the test (change a 1 to a 2 for example) and run it again to make sure it fails, then change it back again.

Code formatting with Black

Black has increased my productivity as a Python developer by a material amount. I used to spend a whole bunch of brain cycles agonizing over how to indent my code, where to break up long function calls and suchlike. Thanks to Black I never think about this at all - I instinctively run black . in the root of my project and accept whatever style decisions it applies for me.

Linting

I have a few linters set up to run on every commit. I can run these locally too - how to do that is documented here - but I'm often a bit lazy and leave them to run in CI.

In this case one of my linters failed! I accidentally called the new command function create_table() when it should have been called create_database(). The code worked fine due to how the cli.command(name=...) decorator works but mypy complained about the redefined function name. I fixed that in a separate commit.

Documentation

My policy these days is that if a feature isn't documented it doesn't exist. Updating existing documentation isn't much work at all if the documentation already exists, and over time these incremental improvements add up to something really comprehensive.

For smaller projects I use a single README.md which gets displayed on both GitHub and PyPI (and the Datasette website too, for example on datasette.io/tools/git-history).

My larger projects, such as Datasette and sqlite-utils, use Read the Docs and reStructuredText with Sphinx instead.

I like reStructuredText mainly because it has really good support for internal reference links - something that is missing from Markdown, though it can be enabled using MyST.

sqlite-utils uses Sphinx. I have the sphinx-autobuild extension configured, which means I can run a live reloading server with the documentation like so:

cd docs
make livehtml

Any time I'm working on the documentation I have that server running, so I can hit "save" in VS Code and see a preview in my browser a few seconds later.

For Markdown documentation I use the VS Code preview pane directly.

The moment the documentation is live online, I like to add a link to it in a comment on the issue thread.

Committing the change

I run git diff a LOT while hacking on code, to make sure I haven’t accidentally changed something unrelated. This also helps spot things like rogue print() debug statements I may have added.

Before my final commit, I sometimes even run git diff | grep print to check for those.
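
A slightly stricter version of that check only looks at lines the diff adds, so print() calls you are deleting don't show up. Here's a sketch in Python with hypothetical function names.

```python
import subprocess


def added_print_lines(diff_text):
    """Return lines a unified diff adds that contain a print( call."""
    return [
        line[1:].strip()
        for line in diff_text.splitlines()
        if line.startswith("+")
        and not line.startswith("+++")  # skip the "+++ b/file" header
        and "print(" in line
    ]


def check_working_tree():
    # Runs plain `git diff`, so it only sees unstaged changes to tracked files
    diff = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    ).stdout
    return added_print_lines(diff)
```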

My goal with the commit is to bundle the test, documentation and implementation. If those are the only files I've changed I do this:

git commit -a -m "sqlite-utils create-database command, closes #348"

If this completes the work on the issue I use "closes #N", which causes GitHub to close the issue for me. If it's not yet ready to close I use "refs #N" instead.

Sometimes there will be unrelated changes in my working directory. If so, I use git add <files> and then commit just with git commit -m message.

Branches and pull requests

create-database is a good example of a feature that can be implemented in a single commit, with no need to work in a branch.

For larger features, I'll work in a feature branch:

git checkout -b my-feature

I'll make a commit (often just labelled "WIP prototype, refs #N") and then push that to GitHub and open a pull request for it:

git push -u origin my-feature

I ensure the new pull request links back to the issue in its description, then switch my ongoing commentary to comments on the pull request itself.

I'll sometimes add a task checklist to the opening comment on the pull request, since tasks there get reflected in the GitHub UI anywhere that links to the PR. Then I'll check those off as I complete them.

An example of a PR I used like this is #361: --lines and --text and --convert and --import.

I don't like merge commits - I much prefer to keep my main branch history as linear as possible. I usually merge my PRs through the GitHub web interface using the squash feature, which results in a single, clean commit to main with the combined tests, documentation and implementation. Occasionally I will see value in keeping the individual commits, in which case I will rebase merge them.

Another goal here is to keep the main branch releasable at all times. Incomplete work should stay in a branch. This makes turning around and releasing quick bug fixes a lot less stressful!

Release notes, and a release

A feature isn't truly finished until it's been released to PyPI.

All of my projects are configured the same way: they use GitHub releases to trigger a GitHub Actions workflow which publishes the new release to PyPI. The sqlite-utils workflow for that is here in publish.yml.

My cookiecutter templates for new projects set up this workflow for me. I just need to create a PyPI token for the project and assign it as a repository secret. See the python-lib cookiecutter README for details.

To push out a new release, I need to increment the version number in setup.py and write the release notes.

I use semantic versioning - a new feature is a minor version bump, a breaking change is a major version bump (I try very hard to avoid these) and a bug fix or documentation-only update is a patch increment.

Since create-database was a new feature, it went out in release 3.21.
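
Those rules are mechanical enough to express as a function. This is an illustrative sketch, not part of any release tooling described here: `bump()` is a hypothetical helper that always returns a three-part version.

```python
def bump(version, change):
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    parts = [int(part) for part in version.split(".")]
    # Pad short versions such as "3.21" out to major.minor.patch
    major, minor, patch = (parts + [0, 0, 0])[:3]
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")
```

With this, `bump("1.4.2", "minor")` gives `"1.5.0"`: a new feature resets the patch number, and a breaking change resets both minor and patch.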

My projects that use Sphinx for documentation have changelog.rst files in their repositories. I add the release notes there, linking to the relevant issues and cross-referencing the new documentation. Then I ship a commit that bundles the release notes with the bumped version number, with a commit message that looks like this:

git commit -m "Release 3.21

Refs #348, #364, #366, #368, #371, #372, #374, #375, #376, #379"

Here's the commit for release 3.21.

Referencing the issue numbers in the release automatically adds a note to their issue threads indicating the release that they went out in.

I generate that list of issue numbers by pasting the release notes into an Observable notebook I built for the purpose: Extract issue numbers from pasted text. Observable is really great for building this kind of tiny interactive utility.
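
The notebook's own code isn't shown here, but the same idea can be sketched in a few lines of Python. This assumes issues are written in the `#N` form somewhere in the pasted text. The function names are hypothetical.

```python
import re


def extract_issue_numbers(text):
    """Return the unique #N issue references in text, sorted numerically."""
    numbers = {int(match) for match in re.findall(r"#(\d+)", text)}
    return sorted(numbers)


def refs_line(text):
    """Build a 'Refs #1, #2' line suitable for a commit message."""
    issues = extract_issue_numbers(text)
    return "Refs " + ", ".join(f"#{n}" for n in issues)
```

Collecting the numbers into a set first means an issue mentioned several times in the release notes is only referenced once.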

For projects that just have a README I write the release notes in Markdown and paste them directly into the GitHub "new release" form.

I like to duplicate the release notes to GitHub releases for my Sphinx changelog projects too. This is mainly so the datasette.io website will display the release notes on its homepage, which is populated at build time using the GitHub GraphQL API.

To convert my reStructuredText to Markdown I copy and paste the rendered HTML into this brilliant Paste to Markdown tool by Euan Goddard.

A live demo

When possible, I like to have a live demo that I can link to.

This is easiest for features in Datasette core. Datasette’s main branch gets deployed automatically to latest.datasette.io so I can often link to a demo there.

For Datasette plugins, I’ll deploy a fresh instance with the plugin (e.g. this one for datasette-graphql) or (more commonly) add it to my big latest-with-plugins.datasette.io instance - which tries to demonstrate what happens to Datasette if you install dozens of plugins at once (so far it works OK).

Here’s a demo of the datasette-copyable plugin running there: https://latest-with-plugins.datasette.io/github/commits.copyable

Tell the world about it

The last step is to tell the world (beyond the people who meticulously read the release notes) about the new feature.

Depending on the size of the feature, I might do this with a tweet like this one - usually with a screenshot and a link to the documentation. I often extend this into a short Twitter thread, which gives me a chance to link to related concepts and demos or add more screenshots.

For larger or more interesting features I'll blog about them. I may save this for my weekly weeknotes, but sometimes for particularly exciting features I'll write up a dedicated blog entry. Some examples include:

I may even assemble a full set of annotated release notes on my blog, where I quote each item from the release in turn and provide some fleshed out examples plus background information on why I built it.

If it’s a new Datasette (or Datasette-adjacent) feature, I’ll try to remember to write about it in the next edition of the Datasette Newsletter.

Finally, if I learned a new trick while building a feature I might extract that into a TIL. If I do that I'll link to the new TIL from the issue thread.

More examples of this pattern

Here are a bunch of examples of commits that implement this pattern, combining the tests, implementation and documentation into a single unit:

sqlite-utils: adding --limit and --offset to sqlite-utils rows
sqlite-utils: --where and -p options for sqlite-utils convert
s3-credentials: s3-credentials policy command
datasette: db.execute_write_script() and db.execute_write_many()
datasette: ?_nosuggest=1 parameter for table views
datasette-graphql: GraphQL execution limits: time_limit_ms and num_queries_limit

Tags: git, github, software-engineering, testing, pytest, black, read-the-docs, github-issues

How to build, test and publish an open source Python library

2021-11-04T22:02:03+00:00

At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

The video

PyGotham arrange for sign language interpretation for all of their talks, which is really cool. Since those take up a portion of the screen (and YouTube don't yet have a way to apply them as a different layer) I've also made available a copy of the original video without the sign language.

Update 29th July 2022: There is a more modern way to create Python packages that uses pyproject.toml files instead of setup.py. This tutorial on Packaging Python Projects from the PyPA shows how to do this in detail.

Packaging a single module

I used this code which I've been copying and pasting between my own projects for over a decade.

BaseConverter is a simple class that can convert an integer to a shortened character string and back again, for example:

>>> pid = BaseConverter("bcdfghkmpqrtwxyz")
>>> pid.from_int(1234)
"gxd"
>>> pid.to_int("gxd")
1234
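
The original file isn't reproduced here, so what follows is a minimal sketch of a class with that behaviour, not the code from the talk. It treats the alphabet as the digits of a number base (16 characters means base 16), which is an assumption that happens to reproduce the example above.

```python
class BaseConverter:
    """Convert non-negative integers to and from strings in a custom alphabet."""

    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.base = len(alphabet)

    def from_int(self, number):
        if number < 0:
            raise ValueError("number must be non-negative")
        if number == 0:
            return self.alphabet[0]
        digits = []
        while number:
            number, remainder = divmod(number, self.base)
            digits.append(self.alphabet[remainder])
        # Digits were collected least significant first
        return "".join(reversed(digits))

    def to_int(self, string):
        number = 0
        for char in string:
            number = number * self.base + self.alphabet.index(char)
        return number


pid = BaseConverter("bcdfghkmpqrtwxyz")
```

The alphabet in the example leaves out vowels, which has the side effect that generated IDs can't accidentally spell words.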

To turn this into a library, first I created a pids/ directory (I've chosen that name because it's available on PyPI).

To turn that code into a package:

mkdir pids && cd pids
Create pids.py in that directory with the contents of this file

Create a new setup.py file in that folder containing the following:

from setuptools import setup

setup(
    name="pids",
    version="0.1",
    description="A tiny Python library for generating public IDs from integers",
    author="Simon Willison",
    url="https://github.com/simonw/...",
    license="Apache License, Version 2.0",    
    py_modules=["pids"],
)

Run python3 setup.py sdist to create the packaged source distribution, dist/pids-0.1.tar.gz

(I've since learned that it's better to run python3 -m build here instead, see Why you shouldn't invoke setup.py directly).

This is all it takes: a setup.py file with some metadata, then a single command to turn that into a packaged .tar.gz file.

Testing it in a Jupyter notebook

Having created that file, I demonstrated how it can be tested in a Jupyter notebook.

Jupyter has a %pip magic command which runs pip to install a package into the same environment as the current Jupyter kernel:

%pip install /Users/simon/Dropbox/Presentations/2021/pygotham/pids/dist/pids-0.1.tar.gz

Having done this, I could execute the library like so:

>>> import pids
>>> pids.pid.from_int(1234)
'gxd'

Uploading the package to PyPI

I used twine (pip install twine) to upload my new package to PyPI.

You need to create a PyPI account before running this command.

By default you need to paste in the PyPI account's username and password:

% twine upload dist/pids-0.1.tar.gz
Uploading distributions to https://upload.pypi.org/legacy/
Enter your username: simonw
Enter your password: 
Uploading pids-0.1.tar.gz
100%|██████████████████████████████████████| 4.16k/4.16k [00:00<00:00, 4.56kB/s]

View at:
https://pypi.org/project/pids/0.1/

The release is now live at https://pypi.org/project/pids/0.1/ - and anyone can run pip install pids to install it.

Adding documentation

If you visit the 0.1 release page you'll see the following message:

The author of this package has not provided a project description

To fix this I added a README.md file with some basic documentation, written in Markdown:

# pids

Create short public identifiers based on integer IDs.

## Installation

    pip install pids

## Usage

    from pids import pid
    public_id = pid.from_int(1234)
    # public_id is now "gxd"
    id = pid.to_int("gxd")
    # id is now 1234

Then I modified my setup.py file to look like this:

from setuptools import setup
import os

def get_long_description():
    with open(
        os.path.join(os.path.dirname(__file__), "README.md"),
        encoding="utf8",
    ) as fp:
        return fp.read()

setup(
    name="pids",
    version="0.1.1",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
    description="A tiny Python library for generating public IDs from integers",
    author="Simon Willison",
    url="https://github.com/simonw/...",
    license="Apache License, Version 2.0",    
    py_modules=["pids"],
)

The get_long_description() function reads that README.md file into a Python string.

The following two extra arguments to setup() add that as metadata visible to PyPI:

long_description=get_long_description(),
long_description_content_type="text/markdown",

I also updated the version number to 0.1.1 in preparation for a new release.

Running python3 setup.py sdist created a new file called dist/pids-0.1.1.tar.gz - I then uploaded that file using twine upload dist/pids-0.1.1.tar.gz which created a new release with a visible README at https://pypi.org/project/pids/0.1.1/

Adding some tests

I like using pytest for tests, so I added that as a test dependency by modifying setup.py to add the following line:

    extras_require={"test": ["pytest"]},

Next, I created a virtual environment and installed my package and its test dependencies in "editable" mode like so:

# Create and activate environment
python3 -m venv venv
source venv/bin/activate
# Install editable module, plus pytest
pip install -e '.[test]'

Now I can run the tests!

(venv) pids % pytest
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /Users/simon/Dropbox/Presentations/2021/pygotham/pids
collected 0 items                                                              

============================ no tests ran in 0.01s =============================

There aren't any tests yet. I created a tests/ folder and dropped in a test_pids.py file that looked like this:

import pytest
import pids

def test_from_int():
    assert pids.pid.from_int(1234) == "gxd"

def test_to_int():
    assert pids.pid.to_int("gxd") == 1234

Running pytest in the project directory now runs those tests:

(venv) pids % pytest
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /Users/simon/Dropbox/Presentations/2021/pygotham/pids
collected 2 items                                                              

tests/test_pids.py ..                                                    [100%]

============================== 2 passed in 0.01s ===============================

Creating a GitHub repository

Time to publish the source code on GitHub.

I created a repository using the form at https://github.com/new

Having created the simonw/pids repository, I ran the following commands locally to push my code to it (mostly copied and pasted from the GitHub example):

git init
git add README.md pids.py setup.py tests/test_pids.py
git commit -m "first commit"
git branch -M main
git remote add origin git@github.com:simonw/pids.git
git push -u origin main

Running the tests with GitHub Actions

I copied in a .github/workflows folder from another project with two files, test.yml and publish.yml. The .github/workflows/test.yml file contained this:

name: Test

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9]
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
        restore-keys: |
          ${{ runner.os }}-pip-
    - name: Install dependencies
      run: |
        pip install -e '.[test]'
    - name: Run tests
      run: |
        pytest

The matrix block there causes the job to run four times, on four different versions of Python.

The action steps do the following:

Checkout the current repository
Install the specified Python version
Configure GitHub's action caching mechanism for the ~/.cache/pip directory - this avoids installing the same files from PyPI over the internet the next time the workflow runs
Install the test dependencies
Run the tests
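
The caching step is the least obvious of these, so here is a simplified Python model of how the key and restore-keys settings interact. This is a sketch under stated assumptions, not GitHub's implementation: cache_key and restore are hypothetical helpers, a plain dict stands in for the cache service, and a single SHA-256 of one file's contents approximates hashFiles('**/setup.py').

```python
import hashlib


def cache_key(runner_os, setup_py_contents):
    """Approximate the key: <os>-pip-<hash of setup.py> (assumed hashing scheme)."""
    digest = hashlib.sha256(setup_py_contents).hexdigest()
    return f"{runner_os}-pip-{digest}"


def restore(cache, key, restore_prefixes):
    """Return (matched_key, value): an exact hit first, else the newest prefix match."""
    if key in cache:
        return key, cache[key]
    for prefix in restore_prefixes:
        # dicts preserve insertion order, so reversed() visits the newest entry first
        for existing in reversed(list(cache)):
            if existing.startswith(prefix):
                return existing, cache[existing]
    return None, None


cache = {}
old_key = cache_key("Linux", b"version='0.1'")
cache[old_key] = "pip downloads saved by an earlier run"

# Unchanged setup.py: the key is identical, so the cache is an exact hit
exact_hit = restore(cache, old_key, ["Linux-pip-"])

# Edited setup.py: the hash changes, so only the restore-keys prefix can match
new_key = cache_key("Linux", b"version='0.1.2'")
fallback_hit = restore(cache, new_key, ["Linux-pip-"])

# A different operating system shares no prefix, so nothing is restored
miss = restore(cache, cache_key("macOS", b"version='0.1'"), ["macOS-pip-"])
```

The point of the prefix fallback is that a run with a modified setup.py can still start from an older set of downloaded files rather than from nothing.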

I added and pushed these new files:

git add .github
git commit -m "GitHub Actions"
git push

The Actions tab in my repository instantly ran the test suite, and when it passed added a green checkmark to my commit.

Publishing a new release using GitHub

The .github/workflows/publish.yml workflow is triggered by new GitHub releases. It tests them and then, if the tests pass, publishes them up to PyPI using twine.

The workflow looks like this:

name: Publish Python Package

on:
  release:
    types: [created]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9]
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
        restore-keys: |
          ${{ runner.os }}-pip-
    - name: Install dependencies
      run: |
        pip install -e '.[test]'
    - name: Run tests
      run: |
        pytest
  deploy:
    runs-on: ubuntu-latest
    needs: [test]
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.9'
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-publish-pip-${{ hashFiles('**/setup.py') }}
        restore-keys: |
          ${{ runner.os }}-publish-pip-
    - name: Install dependencies
      run: |
        pip install setuptools wheel twine
    - name: Publish
      env:
        TWINE_USERNAME: __token__
        TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
      run: |
        python setup.py sdist bdist_wheel
        twine upload dist/*

It contains two jobs: the test job runs the tests again - we should never publish a package without first ensuring that the test suite passes - and then the deploy job runs python setup.py sdist bdist_wheel followed by twine upload dist/* to upload the resulting packages.

(My latest version of this uses python3 -m build here instead.)

Before publishing a package with this action, I needed to create a PYPI_TOKEN that the action could use to authenticate with my PyPI account.

I used the https://pypi.org/manage/account/token/ page to create that token:

I copied out the newly created token:

Then I used the "Settings -> Secrets" tab on the GitHub repository to add that as a secret called PYPI_TOKEN:

(I have since revoked the token that I used in the video, since it is visible on screen to anyone watching.)

I used the GitHub web interface to edit setup.py to bump the version number in that file up to 0.1.2, then I navigated to the releases tab in the repository, clicked "Draft new release" and created a release that would create a new 0.1.2 tag as part of the release process.

When I published the release, the publish.yml action started to run. After the tests had passed it pushed the new release to PyPI: https://pypi.org/project/pids/0.1.2/

Bonus: cookiecutter templates

I've published a lot of packages using this process - 143 and counting!

Rather than copy and paste in a setup.py each time, a couple of years ago I switched over to using cookiecutter templates.

I have three templates that I use today:

python-lib for standalone Python libraries
datasette-plugin for Datasette plugins
click-app for CLI applications built using the Click package

Back in August I figured out a way to make these available as GitHub repository templates, which I described in Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions. This means you can create a new GitHub repository that implements the test.yml and publish.yml pattern described in this talk with just a few clicks on the GitHub website.

Tags: github, open-source, pypi, python, my-talks, github-actions, annotated-talks

Open source projects: consider running office hours

2021-02-19T21:54:22+00:00

Back in December I decided to try something new for my Datasette open source project: Datasette Office Hours. The idea is simple: anyone can book a 25 minute conversation with me on a Friday to talk about the project. I’m interested in talking to people who are using Datasette, or who are considering using it, or who just want to have a chat.

I’ve now had 35 conversations and it’s been absolutely fantastic. I’ve talked to people in Iceland, Burundi, Finland, Singapore, Bulgaria and dozens of other places around the world. I’ve seen my software applied to applications ranging from historic cemetery records to library collections to open city data. It’s been thrilling.

I’d like to encourage more open source project maintainers to consider doing something similar.

("Office hours" is a term used at some universities for periods of time when you can drop in to talk with a lecturer. In Germany they use the term "Sprechstunde" for "speaking hours" which I think is better!)

Reasons to do this

A challenge of open source is that it's easy to be starved of feedback. People might file bug reports if something breaks, but other than that it can feel like publishing software into a void.

Hearing directly from people who are using your stuff is incredibly motivational. It’s also an amazing source of ideas and feedback on where the project should go next.

In the startup world “talk to your users and potential customers” is advice that becomes a constant drumbeat… because it’s really effective, but it’s also hard to work up the courage to do!

Talking to users in open source is similarly valuable. And it turns out, especially in these pandemic times, people really do want to talk to you. Office hours is an extremely low-friction way of putting up a sign that says “let’s have a conversation”.

How I do this: 25 minute slots, via Calendly

I’m using Calendly to make 20 minute slots available every Friday between 9am and 5pm Pacific Time (with 12:30-1:30 set aside for lunch), with a ten minute buffer between slots.

In practice, I treat these as 25 minute slots. This gives me a five minute break in between conversations, and also means it’s possible to stretch to 30 minutes if we get to a key topic just before the time slot ends.

I configured Calendly to allow a maximum of five bookings on any Friday. This feels right to me - conversations with five different people can be pretty mentally tiring, and cutting off after five still gives me a good chance to get other work done during the day.

I use Calendly’s Zoom integration, which automatically sends out a calendar invite to both myself and my conversation partner and schedules a Zoom room that’s linked to from the invite. All I have to do is click the link at the appropriate time.

About one in fifteen conversations ends up cancelled. That’s completely fine - I get half an hour of my day back and we can usually reschedule for another week.

How about making some money?

I’ve been having some fascinating conversations on Twitter recently about the challenges of taking an open source project and turning it into a full-time job, earning a salary good enough to avoid the siren call of working for a FAANG company.

People pointed me to a few good examples of open source maintainers who charge for video conference consulting sessions - Graphile’s Benjie and CSS Wizardry's Harry Roberts both let you book paid sessions with them directly.

I really like this as an opportunity for earning money against an open source project, and I think it could complement office hours nicely: 25 minutes on a Friday free on a first-come, first-served basis could then up-sell to a 1.5 hour paid consulting session, which could then lead to larger consulting contracts.

Try this yourself

If you're tempted to try office hours for your own project, getting started is easy. I'm using the Calendly free plan, but their paid plans (which include the ability to attach Stripe or PayPal payments to bookings) are reasonably priced. I've been promoting my sessions via Twitter, the datasette.io website and the Datasette Newsletter.

This is one of those ideas I wish I'd had sooner. It's quickly become a highlight of my week.

Update 5th March 2021: The original headline of this piece was "Open source projects should run office hours". This was being misinterpreted as yet another demand for free labor from open source maintainers, so I changed it to "Open source projects: consider running office hours" - less pithy, but it better reflects my actual message here.

Tags: open-source, datasette

How to cheat at unit tests with pytest and Black

2020-02-11T06:56:55+00:00

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

The one thing I’ve learned about permissions code over the years is that it absolutely warrants comprehensive unit tests. This is not code that can afford to have dumb bugs, or regressions caused by future development!

I’ve become a big proponent of pytest over the past two years, but this is the first Django project that I’ve built using pytest from day one as opposed to relying on the Django test runner. It’s been a great opportunity to try out pytest-django, and I’m really impressed with it. It maintains my favourite things about Django’s test framework - smart usage of database transactions to reset the database and a handy test client object for sending fake HTTP requests - and adds all of that pytest magic that I’ve grown to love.

It also means I get to use my favourite trick for productively writing unit tests: the combination of pytest and Black, the “uncompromising Python code formatter”.

Cheating at unit tests

In pure test-driven development you write the tests first, and don’t start on the implementation until you’ve watched them fail.

Most of the time I find that this is a net loss on productivity. I tend to prototype my way to solutions, so I often find myself with rough running code before I’ve developed enough of a concrete implementation plan to be able to write the tests.

So… I cheat. Once I’m happy with the implementation I write the tests to match it. Then once I have the tests in place and I know what needs to change I can switch to using changes to the tests to drive the implementation.

In particular, I like using a rough initial implementation to help generate the tests in the first place.

Here’s how I do that with pytest. I’ll write a test that looks something like this:

def test_some_api(client):
    response = client.get("/some/api/")
    assert False == response.json()

Note that I’m using the pytest-django client fixture here, which magically passes a fully configured Django test client object to my test function.

I run this test, and it fails:

pytest -k test_some_api

(pytest -k blah runs just tests that contain blah in their name)

Now… I run the test again, but with the --pdb option to cause pytest to drop me into a debugger at the failure point:

$ pytest -k test_some_api --pdb
== test session starts ===
platform darwin -- Python 3.7.5, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
django: settings: config.test_settings (from ini)
...
client = <django.test.client.Client object at 0x10cfdb510>

    def test_some_api(client):
        response = client.get("/some/api/")
>       assert False == response.json()
E       assert False == {'this': ['is', 'an', 'example', 'api']}
core/test_docs.py:27: AssertionError
>> entering PDB >>
>> PDB post_mortem (IO-capturing turned off) >>
> core/test_docs.py(27)test_some_api()
-> assert False == response.json()
(Pdb) response.json()
{'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}
(Pdb)

Running response.json() in the debugger dumps out the actual value to the console.

Then I copy that output - in this case {'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'} - and paste it into the test:

def test_some_api(client):
    response = client.get("/some/api/")
    assert {'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'} == response.json()

Finally, I run black . in my project root to reformat the test:

def test_some_api(client):
    response = client.get("/some/api/")
    assert {
        "this": ["is", "an", "example", "api"],
        "that_outputs": "JSON",
    } == response.json()

This last step means that no matter how giant and ugly the test comparison has become I’ll always get a neatly formatted test out of it.

I always eyeball the generated test to make sure that it’s what I would have written by hand if I wasn’t so lazy - then I commit it along with the implementation and move on to the next task.
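
The same two-step flow can be tried without Django or pytest. The sketch below rests on a few assumptions: some_api() is a hypothetical function standing in for client.get("/some/api/").json(), and the two checks are plain functions called through a small hypothetical passes() helper rather than tests collected by pytest.

```python
def some_api():
    # Hypothetical stand-in for client.get("/some/api/").json()
    return {"this": ["is", "an", "example", "api"], "that_outputs": "JSON"}


def placeholder_check():
    # Step 1: a deliberately wrong expectation, so the failure exposes the real value
    assert False == some_api()


def pinned_check():
    # Step 2: the value copied from the debugger, laid out the way Black formats it
    assert {
        "this": ["is", "an", "example", "api"],
        "that_outputs": "JSON",
    } == some_api()


def passes(check):
    """Run a check function and report whether its assertion held."""
    try:
        check()
    except AssertionError:
        return False
    return True


placeholder_passed = passes(placeholder_check)
pinned_passed = passes(pinned_check)
print(placeholder_passed, pinned_passed)  # False True
```

The placeholder always fails, which is the point: it exists only to surface the actual value so that it can be pinned in the second version.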

I’ve used this technique to write many of the tests in both Datasette and sqlite-utils, and those are by far the best tested pieces of software I’ve ever released.

I started doing this around two years ago, and I’ve held off writing about it until I was confident I understood the downsides. I haven’t found any yet: I end up with a robust, comprehensive test suite and writing the tests takes me less than half the time it would if I were hand-crafting all of those comparisons from scratch.

Also this week

Working on Datasette Cloud has required a few minor releases to some of my open source projects:

Shipped datasette-auth-existing-cookies 0.6 and 0.6.1
Shipped sqlite-utils 2.2, 2.2.1, 2.3 and 2.3.1
Shipped Datasette 0.35 with a new utility method for plugins to render their own templates, which I’m now using in…
datasette-upload-csvs 0.2a - still very alpha, but at least it looks slightly nicer now

Unrelated to Datasette Cloud, I also shipped twitter-to-sqlite 0.16 with a new command for importing your Twitter friends (previously it only had a command for importing your followers).

In bad personal motivation news… I missed my weekly update to Niche Museums and lost my streak!

Tags: projects, python, testing, datasette, pytest, weeknotes, datasette-cloud, black

Documentation unit tests

2018-07-28T15:59:55+00:00

Or: Test-driven documentation.

Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to get out-of-date as new features are added.

One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.

When combined with a code review system (like Phabricator or GitHub pull requests) this pattern lets you enforce documentation updates as part of the review process: if a change doesn’t update the relevant documentation, point that out in your review!

Good code review systems also execute unit tests automatically and attach the results to the review. This provides an opportunity to have the tests enforce other aspects of the codebase: for example, running a linter so that no-one has to waste their time arguing over coding style.

I’ve been experimenting with using unit tests to ensure that aspects of a project are covered by the documentation. I think it’s a very promising technique.

Introspect the code, introspect the docs

The key to this trick is introspection: interrogating the code to figure out what needs to be documented, then parsing the documentation to see if each item has been covered.

I’ll use my Datasette project as an example. Datasette’s test_docs.py module contains three relevant tests:

test_config_options_are_documented checks that every one of Datasette’s configuration options is documented.
test_plugin_hooks_are_documented ensures all of the plugin hooks (powered by pluggy) are covered in the plugin documentation.
test_view_classes_are_documented iterates through all of the *View classes (corresponding to pages in the Datasette user interface) and makes sure they are covered.

In each case, the test uses introspection against the relevant code areas to figure out what needs to be documented, then runs a regular expression against the documentation to make sure it is mentioned in the correct place.

Obviously the tests can’t confirm the quality of the documentation, so they are easy to cheat: but they do at least protect against adding a new option but forgetting to document it.

Testing that Datasette’s view classes are covered

Datasette’s view classes use a naming convention: they all end in View. The current list of view classes is DatabaseView, TableView, RowView, IndexView and JsonDataView.

Since these classes are all imported into the datasette.app module (in order to be hooked up to URL routes) the easiest way to introspect them is to import that module, then run dir(app) and grab any class names that end in View. We can do that with a Python list comprehension:

from datasette import app
views = [v for v in dir(app) if v.endswith("View")]
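
To see what that comprehension picks up without installing Datasette, here is a small sketch that applies it to a stand-in module. Everything in it is an assumption made for illustration: fake_app is a hypothetical module built with types.ModuleType, and the classes and function attached to it are empty placeholders.

```python
import types

# Hypothetical stand-in for the datasette.app module
fake_app = types.ModuleType("fake_app")


class TableView:
    pass


class DatabaseView:
    pass


def render_template():
    pass


# Attach the names in a deliberately non-alphabetical order
fake_app.TableView = TableView
fake_app.DatabaseView = DatabaseView
fake_app.render_template = render_template

views = [v for v in dir(fake_app) if v.endswith("View")]
print(views)  # ['DatabaseView', 'TableView']
```

dir() returns names sorted alphabetically, so the order of the result does not depend on the order of the imports. The filter is purely name-based: any attribute whose name ends in View would be included, whether or not it is a class.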

I’m using reStructuredText labels to mark the place in the documentation that addresses each of these classes. This also ensures that each documentation section can be linked to, for example:

http://datasette.readthedocs.io/en/latest/pages.html#tableview

The reStructuredText syntax for that label looks like this:

.. _TableView:

Table
=====

The table page is the heart of Datasette...

We can extract these labels using a regular expression:

from pathlib import Path
import re

docs_path = Path(__file__).parent.parent / 'docs'
label_re = re.compile(r'\.\. _([^\s:]+):')

def get_labels(filename):
    contents = (docs_path / filename).read_text()
    return set(label_re.findall(contents))
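
As a quick check of what that regular expression captures, this sketch runs it against an in-memory string instead of a file. The sample_rst text is made up for the example; only the pattern is taken from the code above.

```python
import re

label_re = re.compile(r'\.\. _([^\s:]+):')

# Made-up reStructuredText containing two labels
sample_rst = """\
.. _TableView:

Table
=====

The table page is the heart of Datasette...

.. _RowView:

Row
===
"""

labels = set(label_re.findall(sample_rst))
print(sorted(labels))  # ['RowView', 'TableView']
```

The capture group stops at the first colon or whitespace character, so only the label name itself is returned.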

Since Datasette’s documentation is spread across multiple *.rst files, and I want the freedom to document a view class in any one of them, I iterate through every file to find the labels and pull out the ones ending in View:

def documented_views():
    view_labels = set()
    for filename in docs_path.glob("*.rst"):
        for label in get_labels(filename):
            first_word = label.split("_")[0]
            if first_word.endswith("View"):
                view_labels.add(first_word)
    return view_labels

We now have a set of class names and a set of labels across all of our documentation. Writing a basic unit test comparing the two sets is trivial:

def test_view_documentation():
    view_labels = documented_views()
    view_classes = set(v for v in dir(app) if v.endswith("View"))
    assert view_labels == view_classes
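
A plain equality check says that the two sets differ, but not in which direction. As a hedged illustration, set difference can separate classes that lack documentation from labels that no longer match any class. The sample values below are invented and assume a section that was mislabelled as Table2View.

```python
# Invented sample data: one class is undocumented because its label is misspelled
view_classes = {"DatabaseView", "IndexView", "JsonDataView", "RowView", "TableView"}
view_labels = {"DatabaseView", "IndexView", "JsonDataView", "RowView", "Table2View"}

# Classes with no matching label in the documentation
undocumented = view_classes - view_labels

# Labels that do not correspond to any class
stale_labels = view_labels - view_classes

print(sorted(undocumented), sorted(stale_labels))  # ['TableView'] ['Table2View']
```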

Taking advantage of pytest

Datasette uses pytest for its unit tests, and documentation unit tests are a great opportunity to take advantage of some advanced pytest features.

Parametrization

The first of these is parametrization: pytest provides a decorator which can be used to execute a single test function multiple times, each time with different arguments.

This example from the pytest documentation shows how parametrization works:

import pytest
@pytest.mark.parametrize("test_input,expected", [
    ("3+5", 8),
    ("2+4", 6),
    ("6*9", 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected

pytest treats this as three separate unit tests, even though they share a single function definition.

We can combine this pattern with our introspection to execute an independent unit test for each of our view classes. Here’s what that looks like:

@pytest.mark.parametrize("view", [v for v in dir(app) if v.endswith("View")])
def test_view_classes_are_documented(view):
    assert view in documented_views()

Here’s the output from pytest if we execute just this unit test (and one of our classes is undocumented):

$ pytest -k test_view_classes_are_documented -v
=== test session starts ===
collected 249 items / 244 deselected

tests/test_docs.py::test_view_classes_are_documented[DatabaseView] PASSED [ 20%]
tests/test_docs.py::test_view_classes_are_documented[IndexView] PASSED [ 40%]
tests/test_docs.py::test_view_classes_are_documented[JsonDataView] PASSED [ 60%]
tests/test_docs.py::test_view_classes_are_documented[RowView] PASSED [ 80%]
tests/test_docs.py::test_view_classes_are_documented[TableView] FAILED [100%]

=== FAILURES ===

view = 'TableView'

    @pytest.mark.parametrize("view", [v for v in dir(app) if v.endswith("View")])
    def test_view_classes_are_documented(view):
>       assert view in documented_views()
E       AssertionError: assert 'TableView' in {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'}
E        +  where {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'} = documented_views()

tests/test_docs.py:77: AssertionError
=== 1 failed, 4 passed, 244 deselected in 1.13 seconds ===

Fixtures

There’s a subtle inefficiency in the above test: for every view class, it calls the documented_views() function - and that function then iterates through every *.rst file in the docs/ directory and uses a regular expression to extract the labels. With 5 view classes and 17 documentation files that’s 85 executions of get_labels(), and that number will only increase as Datasette’s code and documentation grow larger.

We can use pytest’s neat fixtures to reduce this to a single call to documented_views() that is shared across all of the tests. Here’s what that looks like:

@pytest.fixture(scope="session")
def documented_views():
    view_labels = set()
    for filename in docs_path.glob("*.rst"):
        for label in get_labels(filename):
            first_word = label.split("_")[0]
            if first_word.endswith("View"):
                view_labels.add(first_word)
    return view_labels

@pytest.mark.parametrize("view_class", [
    v for v in dir(app) if v.endswith("View")
])
def test_view_classes_are_documented(documented_views, view_class):
    assert view_class in documented_views

Fixtures in pytest are an example of dependency injection: pytest introspects every test_* function and checks if it has a function argument with a name matching something that has been annotated with the @pytest.fixture decorator. If it finds any matching arguments, it executes the matching fixture function and passes its return value in to the test function.

By default, pytest will execute the fixture function once for every test execution. In the above code we use the scope="session" argument to tell pytest that this particular fixture should be executed only once for every pytest command-line execution of the tests, and that single return value should be passed to every matching test.
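
The saving can be checked with a little arithmetic. The sketch below is an analogy, not pytest's mechanism: functools.lru_cache stands in for a session-scoped fixture, get_labels() is a stub that only counts its calls, and the 17 file names are hypothetical.

```python
from functools import lru_cache

VIEWS = ["DatabaseView", "IndexView", "JsonDataView", "RowView", "TableView"]
DOC_FILES = [f"page{i}.rst" for i in range(17)]  # hypothetical file names

calls = {"get_labels": 0}


def get_labels(filename):
    # Stub: count the call instead of reading and parsing a file
    calls["get_labels"] += 1
    return set()


def documented_views():
    labels = set()
    for filename in DOC_FILES:
        labels |= get_labels(filename)
    return labels


# Without sharing: every view triggers a full scan of the docs
for view in VIEWS:
    documented_views()
uncached_calls = calls["get_labels"]

# With sharing: the first call scans the docs, later calls reuse the result
calls["get_labels"] = 0
shared_documented_views = lru_cache(maxsize=None)(documented_views)
for view in VIEWS:
    shared_documented_views()
cached_calls = calls["get_labels"]

print(uncached_calls, cached_calls)  # 85 17
```

With sharing in place the cost depends only on the number of documentation files, not on the number of view classes being checked.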

What if you haven’t documented everything yet?

Adding unit tests to your documentation in this way faces an obvious problem: when you first add the tests, you may have to write a whole lot of documentation before they can all pass.

Having tests that protect against future code being added without documentation is only useful once you’ve added them to the codebase - but blocking that on documenting your existing features could prevent that benefit from ever manifesting itself.

Once again, pytest to the rescue. The @pytest.mark.xfail decorator allows you to mark a test as “expected to fail” - if it fails, pytest will take note but will not fail the entire test suite.

This means you can add deliberately failing tests to your codebase without breaking the build for everyone - perfect for tests that look for documentation that hasn’t yet been written!

I used xfail when I first added view documentation tests to Datasette, then removed it once the documentation was all in place. Any future code in pull requests without documentation will cause a hard test failure.

Here’s what the test output looks like when some of those tests are marked as “expected to fail”:

$ pytest tests/test_docs.py
collected 31 items

tests/test_docs.py ..........................XXXxx.                [100%]

============ 26 passed, 2 xfailed, 3 xpassed in 1.06 seconds ============

Since this reports both the xfailed and the xpassed counts, it shows how much work is still left to be done before the xfail decorator can be safely removed.

Structuring code for testable documentation

A benefit of comprehensive unit testing is that it encourages you to design your code in a way that is easy to test. In my experience this leads to much higher code quality in general: it encourages separation of concerns and cleanly decoupled components.

My hope is that documentation unit tests will have a similar effect. I’m already starting to think about ways of restructuring my code such that I can cleanly introspect it for the areas that need to be documented. I’m looking forward to discovering code design patterns that help support this goal.

Tags: design-patterns, documentation, restructuredtext, testing, datasette, pytest