<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: My open source process</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/series/open-source-process.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-12-18T14:49:38+00:00</updated><author><name>Simon Willison</name></author><entry><title>Your job is to deliver code you have proven to work</title><link href="https://simonwillison.net/2025/Dec/18/code-proven-to-work/#atom-series" rel="alternate"/><published>2025-12-18T14:49:38+00:00</published><updated>2025-12-18T14:49:38+00:00</updated><id>https://simonwillison.net/2025/Dec/18/code-proven-to-work/#atom-series</id><summary type="html">
    &lt;p&gt;In all of the debates about the value of AI-assistance in software development there's one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers - or open source maintainers - and expects the "code review" process to handle the rest.&lt;/p&gt;
&lt;p&gt;This is rude, a waste of other people's time, and is honestly a dereliction of duty as a software developer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Your job is to deliver code you have proven to work.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As software engineers we don't just crank out code - in fact these days you could argue that's what the LLMs are for. We need to deliver &lt;em&gt;code that works&lt;/em&gt; - and we need to include &lt;em&gt;proof&lt;/em&gt; that it works as well.  Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.&lt;/p&gt;
&lt;h4 id="how-to-prove-it-works"&gt;How to prove it works&lt;/h4&gt;
&lt;p&gt;There are two steps to proving a piece of code works. Neither is optional.&lt;/p&gt;
&lt;p&gt;The first is &lt;strong&gt;manual testing&lt;/strong&gt;. If you haven't seen the code do the right thing yourself, that code doesn't work. If it does turn out to work, that's honestly just pure chance.&lt;/p&gt;
&lt;p&gt;Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.&lt;/p&gt;
&lt;p&gt;If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here's a &lt;a href="https://github.com/simonw/llm-gemini/issues/116#issuecomment-3666551798"&gt;recent example&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some changes are harder to demonstrate. It's still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.&lt;/p&gt;
&lt;p&gt;Once you've tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.&lt;/p&gt;
&lt;p&gt;The second step in proving a change works is &lt;strong&gt;automated testing&lt;/strong&gt;. This is so much easier now that we have LLM tooling, which means there's no excuse at all for skipping this step.&lt;/p&gt;
&lt;p&gt;Your contribution should &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;bundle the change&lt;/a&gt; with an automated test that proves the change works. That test should fail if you revert the implementation.&lt;/p&gt;
&lt;p&gt;The process for writing a test mirrors that of manual testing: get the system into an initial known state, exercise the change, assert that it worked correctly. Setting up a test harness that makes this easy is another key skill worth investing in.&lt;/p&gt;
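&lt;p&gt;A minimal sketch of that shape, not taken from any real project: &lt;code&gt;rename_tag()&lt;/code&gt; and the &lt;code&gt;tags&lt;/code&gt; table are hypothetical stand-ins for a real change, and an in-memory SQLite database stands in for the real system.&lt;/p&gt;

```python
import sqlite3


def rename_tag(conn, old, new):
    """Hypothetical change under test: rename a tag everywhere it is used."""
    conn.execute("UPDATE tags SET name = ? WHERE name = ?", (new, old))
    conn.commit()


def test_rename_tag():
    # 1. Get the system into a known initial state
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO tags (name) VALUES (?)", [("pyhton",), ("ai",)])

    # 2. Exercise the change
    rename_tag(conn, "pyhton", "python")

    # 3. Assert that it had the desired effect and left other rows alone
    names = [row[0] for row in conn.execute("SELECT name FROM tags ORDER BY id")]
    assert names == ["python", "ai"]
```

&lt;p&gt;If the body of &lt;code&gt;rename_tag()&lt;/code&gt; is reverted to do nothing, the final assertion fails, which is the property a bundled test should have.&lt;/p&gt;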
&lt;p&gt;Don't be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I've done this myself I've quickly regretted it.&lt;/p&gt;
&lt;h4 id="make-your-coding-agent-prove-it-first"&gt;Make your coding agent prove it first&lt;/h4&gt;
&lt;p&gt;The most important trend in LLMs in 2025 has been the explosive growth of &lt;strong&gt;coding agents&lt;/strong&gt; - tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.&lt;/p&gt;
&lt;p&gt;To master these tools you need to learn how to get them to &lt;em&gt;prove their changes work&lt;/em&gt; as well.&lt;/p&gt;
&lt;p&gt;This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.&lt;/p&gt;
&lt;p&gt;Since they're robots, automated tests and manual tests are effectively the same thing.&lt;/p&gt;
&lt;p&gt;They do feel a little different though. When I'm working on CLI tools I'll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like &lt;a href="https://click.palletsprojects.com/en/stable/testing/"&gt;Click's CliRunner&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When working on CSS changes I'll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.&lt;/p&gt;
&lt;p&gt;The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They'll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.&lt;/p&gt;
&lt;p&gt;Developing good taste in testing code is another of those skills that differentiates a senior engineer.&lt;/p&gt;
&lt;h4 id="the-human-provides-the-accountability"&gt;The human provides the accountability&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/2025/Feb/3/a-computer-can-never-be-held-accountable/"&gt;A computer can never be held accountable&lt;/a&gt;. That's your job as the human in the loop.&lt;/p&gt;
&lt;p&gt;Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing &lt;em&gt;code that is proven to work&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Next time you submit a PR, make sure you've included your evidence that it works as it should.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/programming"&gt;programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="programming"/><category term="careers"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/><category term="vibe-coding"/><category term="coding-agents"/></entry><entry><title>A selfish personal argument for releasing code as Open Source</title><link href="https://simonwillison.net/2025/Jan/24/selfish-open-source/#atom-series" rel="alternate"/><published>2025-01-24T21:46:03+00:00</published><updated>2025-01-24T21:46:03+00:00</updated><id>https://simonwillison.net/2025/Jan/24/selfish-open-source/#atom-series</id><summary type="html">
    &lt;p&gt;I'm the guest for the most recent episode of the Real Python podcast with Christopher Bailey, talking about &lt;a href="https://realpython.com/podcasts/rpp/236/"&gt;Using LLMs for Python Development&lt;/a&gt;. We covered a &lt;em&gt;lot&lt;/em&gt; of other topics as well - most notably my relationship with Open Source development over the years.&lt;/p&gt;
&lt;p&gt;At &lt;a href="https://realpython.com/podcasts/rpp/236/#t=332"&gt;5m32s&lt;/a&gt; I presented what I think is the best version yet of my selfish personal argument for why it makes sense to default to releasing code as Open Source:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I didn't really get heavily back into open source until about maybe six years ago when I'd been working for a big company in the US, and I got frustrated that all of the code I was writing, I'd never get to use again.&lt;/p&gt;
&lt;p&gt;I realized that one of the best things about open source software is that you can solve a problem once and then you can slap an open source license on that solution and you will &lt;em&gt;never&lt;/em&gt; have to solve that problem ever again, no matter who's employing you in the future.&lt;/p&gt;
&lt;p&gt;It's a sneaky way of solving a problem permanently.&lt;/p&gt;
&lt;p&gt;Once I realized that I started open sourcing everything, like pretty much every piece of code I've written in the past six years, I've open sourced purely as a defense against me losing access to that code in the future.&lt;/p&gt;
&lt;p&gt;Because I've written loads of code for employers that I don't get to use anymore - and how many times do you want to reinvent things?&lt;/p&gt;
&lt;p&gt;I like to say that my interest in open source is actually really selfish. I figured something out. I never want to have to do this work ever again.&lt;/p&gt;
&lt;p&gt;If I slap a license on it, write documentation for it so that I can remember what it does and write unit tests for it so it's easy for me to keep it working in the future, that's entirely beneficial to me.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The rest of the episode was a really great conversation - other topics we covered included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=244"&gt;4m40s&lt;/a&gt;: My first ever significant open source project - a PHP XML-RPC library that ended up in WordPress twenty years ago&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=608"&gt;10m08s&lt;/a&gt;: Benefits I've gained from starting a blog 22+ years ago&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=1334"&gt;22m14s&lt;/a&gt;: How to get started using LLMs to write Python&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=2215"&gt;36m55s&lt;/a&gt;: My workflow for using LLMs for code - for both the experimental research work (I called this the "Mise en place phase") and the follow-up where I actually write the finished code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=3314"&gt;55m14s&lt;/a&gt;: Why an SVG of a pelican riding a bicycle?&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=3468"&gt;57m48s&lt;/a&gt;: Why saying "do it better" actually works!&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/podcasts/rpp/236/#t=3624"&gt;1h0m24s&lt;/a&gt;: Cooking with LLMs! How to get a weirdly tasty guacamole recipe&lt;/li&gt;
&lt;li&gt;&lt;a href="https://realpython.com/podcasts/rpp/236/#t=4132"&gt;1h08m52s&lt;/a&gt;: My latest thoughts on local models&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/podcasts"&gt;podcasts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/podcast-appearances"&gt;podcast-appearances&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-source"/><category term="podcasts"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="podcast-appearances"/></entry><entry><title>Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions</title><link href="https://simonwillison.net/2024/Jan/16/python-lib-pypi/#atom-series" rel="alternate"/><published>2024-01-16T21:59:56+00:00</published><updated>2024-01-16T21:59:56+00:00</updated><id>https://simonwillison.net/2024/Jan/16/python-lib-pypi/#atom-series</id><summary type="html">
    &lt;p&gt;I use &lt;a href="https://github.com/cookiecutter/cookiecutter"&gt;cookiecutter&lt;/a&gt; to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.&lt;/p&gt;
&lt;p&gt;I made some major upgrades to my &lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt; cookiecutter template today. Here's what it can now do to help you get started with a new Python library:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create a &lt;code&gt;pyproject.toml&lt;/code&gt; file configured for use with &lt;code&gt;setuptools&lt;/code&gt;. In my opinion this is the pattern with the current lowest learning curve - I wrote about that &lt;a href="https://til.simonwillison.net/python/pyproject"&gt;in detail in this TIL&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Add a skeleton &lt;code&gt;README&lt;/code&gt; and an Apache 2.0 &lt;code&gt;LICENSE&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Create &lt;code&gt;your_package/__init__.py&lt;/code&gt; for your code to go in.&lt;/li&gt;
&lt;li&gt;Create &lt;code&gt;tests/test_your_package.py&lt;/code&gt; with a skeleton test.&lt;/li&gt;
&lt;li&gt;Include &lt;code&gt;pytest&lt;/code&gt; as a test dependency.&lt;/li&gt;
&lt;li&gt;Configure GitHub Actions with two workflows in &lt;code&gt;.github/workflows&lt;/code&gt; - one for running the tests against Python 3.8 through 3.12, and one for publishing releases of your package to PyPI.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The changes I made today are that I switched from &lt;code&gt;setup.py&lt;/code&gt; to &lt;code&gt;pyproject.toml&lt;/code&gt;, and I made a big improvement to how the publishing workflow authenticates with PyPI.&lt;/p&gt;
&lt;h4 id="pypi-trusted-publishing"&gt;Publishing to PyPI with Trusted Publishing&lt;/h4&gt;
&lt;p&gt;My previous version of this template required you to jump through &lt;a href="https://github.com/simonw/python-lib/blob/c28bd8cf822455fd464c253daf4ef4b430758588/README.md#publishing-your-library-as-a-package-to-pypi"&gt;quite a few hoops&lt;/a&gt; to get PyPI publishing to work. You needed to create a PyPI token that could publish a new package, then paste that token into a GitHub Actions secret, then publish the package, and then disable that token and create a new one dedicated to just updating this package in the future.&lt;/p&gt;
&lt;p&gt;The new version is much simpler, thanks to PyPI's relatively new &lt;a href="https://docs.pypi.org/trusted-publishers/"&gt;Trusted Publishers&lt;/a&gt; mechanism.&lt;/p&gt;
&lt;p&gt;To publish a new package, you need to sign into PyPI and &lt;a href="https://pypi.org/manage/account/publishing/"&gt;create a new "pending publisher"&lt;/a&gt;. Effectively you tell PyPI "My GitHub repository &lt;code&gt;myname/name-of-repo&lt;/code&gt; should be allowed to publish packages with the name &lt;code&gt;name-of-package&lt;/code&gt;".&lt;/p&gt;
&lt;p&gt;Here's that form for my brand new &lt;a href="https://github.com/datasette/datasette-test"&gt;datasette-test&lt;/a&gt; library, the first library I published using this updated template:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-test.png" alt="Screenshot of the create pending publisher form on PyPI. PyPI Project Name is set to datasette-test. Owner is set to datasette. Repository name is datasette-test. Workflow name is publish.yml. Environment name is release." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Then create a release on GitHub, with a name that matches the version number from your &lt;code&gt;pyproject.toml&lt;/code&gt;. Everything else should Just Work.&lt;/p&gt;
&lt;p&gt;I wrote &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;more about Trusted Publishing in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="github-repository-template"&gt;Creating a package using a GitHub repository template&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/python-lib/issues/6"&gt;most time consuming part&lt;/a&gt; of this project was getting my GitHub repository template to work properly.&lt;/p&gt;
&lt;p&gt;There are two ways to use my cookiecutter template. You can use the cookiecutter command-line tool like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pipx install cookiecutter
cookiecutter gh:simonw/python-lib
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Answer a few questions here&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But a more fun and convenient option is to use my GitHub repository template, &lt;a href="https://github.com/simonw/python-lib-template-repository"&gt;simonw/python-lib-template-repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This lets you &lt;a href="https://github.com/new?template_name=python-lib-template-repository&amp;amp;template_owner=simonw"&gt;fill in a form&lt;/a&gt; on GitHub to create a new repository which will then execute the cookiecutter template for you and update itself with the result.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/template-repo-create.jpg" alt="Create a new repository form. I'm using the python-lib-template-repository template, and it asks for my repository name (my-new-python-library) and description." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can see an example of a repository created using this template at &lt;a href="https://github.com/datasette/datasette-test/tree/8d5f8262dc3a88f3c6d97f0cef3b55264cabc695"&gt;datasette/datasette-test&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="adding-it-all-together"&gt;Adding it all together&lt;/h4&gt;
&lt;p&gt;There are quite a lot of moving parts behind the scenes here, but the end result is that anyone can now create a Python library with test coverage, GitHub CI and release automation by filling in a couple of forms and clicking some buttons.&lt;/p&gt;
&lt;p&gt;For more details on how this all works, and how it's evolved over time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/"&gt;A cookiecutter template for writing Datasette plugins&lt;/a&gt; from June 2020 describes my first experiments with cookiecutter&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt; from August 2021 describes my earliest attempts at using GitHub repository templates for this&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2021/Nov/4/publish-open-source-python-library/"&gt;How to build, test and publish an open source Python library&lt;/a&gt; is a ten minute talk I gave at PyGotham in November 2021. It describes &lt;code&gt;setup.py&lt;/code&gt; in detail, which is no longer my preferred approach.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="projects"/><category term="pypi"/><category term="python"/><category term="github-actions"/><category term="cookiecutter"/></entry><entry><title>Things I've learned about building CLI tools in Python</title><link href="https://simonwillison.net/2023/Sep/30/cli-tools-python/#atom-series" rel="alternate"/><published>2023-09-30T00:12:19+00:00</published><updated>2023-09-30T00:12:19+00:00</updated><id>https://simonwillison.net/2023/Sep/30/cli-tools-python/#atom-series</id><summary type="html">
    &lt;p&gt;I build a lot of command-line tools in Python. It’s become my favorite way of quickly turning a piece of code into something I can use myself and package up for other people to use too.&lt;/p&gt;
&lt;p&gt;My biggest CLI projects are &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;, &lt;a href="https://shot-scraper.datasette.io/en/stable/"&gt;shot-scraper&lt;/a&gt; and &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; - but I have dozens of others and I build new ones at the rate of at least one a month. A fun recent example is &lt;a href="https://github.com/simonw/blip-caption"&gt;blip-caption&lt;/a&gt;, a tiny CLI wrapper around the Salesforce BLIP model that can generate usable captions for image files.&lt;/p&gt;
&lt;p&gt;Here are some notes on what I’ve learned about designing and implementing CLI tools in Python so far.&lt;/p&gt;
&lt;h4 id="starting-with-a-template"&gt;Starting with a template&lt;/h4&gt;
&lt;p&gt;I build enough CLI apps that I developed my own &lt;a href="https://github.com/cookiecutter/cookiecutter"&gt;Cookiecutter&lt;/a&gt; template for starting new ones.&lt;/p&gt;
&lt;p&gt;That template is &lt;a href="https://github.com/simonw/click-app"&gt;simonw/click-app&lt;/a&gt;. You can create a new application from that template directly on GitHub, too - I wrote more about that in &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="arguments-options-and-conventions"&gt;Arguments, options and conventions&lt;/h4&gt;
&lt;p&gt;Almost all of my tools are built using the &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; Python library. Click encourages a specific way of designing CLI tools which I really like - I find myself annoyed at the various tools from other ecosystems that don’t stick to the conventions that Click encourages.&lt;/p&gt;
&lt;p&gt;I’ll try to summarize those conventions here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commands have arguments and options. Arguments are positional - they are strings that you pass directly to the command, like &lt;code&gt;data.db&lt;/code&gt; in &lt;code&gt;datasette data.db&lt;/code&gt;. Arguments can be required or optional, and you can have commands which accept an unlimited number of arguments.&lt;/li&gt;
&lt;li&gt;Options are, usually, optional. They are things like &lt;code&gt;--port 8000&lt;/code&gt;. Options can also have a single character shortened version, such as &lt;code&gt;-p 8000&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;Very occasionally I'll create an option that is required, usually because a command has so many positional arguments that forcing an option makes its usage easier to read.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Some options are flags - they don't take any additional parameters, they just switch something on. &lt;code&gt;shot-scraper --retina&lt;/code&gt; is an example of this.&lt;/li&gt;
&lt;li&gt;Flags with single character shortcuts can be easily combined - &lt;code&gt;symbex -in fetch_data&lt;/code&gt; is short for &lt;code&gt;symbex --imports --no-file fetch_data&lt;/code&gt; &lt;a href="https://github.com/simonw/symbex/blob/1.4/README.md#usage"&gt;for example&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Some options take multiple parameters. &lt;code&gt;datasette --setting sql_time_limit_ms 10000&lt;/code&gt; is an example, taking both the name of the setting and the value it should be set to.&lt;/li&gt;
&lt;li&gt;Commands can have sub-commands, each with their own family of commands. &lt;a href="https://llm.datasette.io/en/stable/templates.html"&gt;llm templates&lt;/a&gt; is an example of this, with &lt;code&gt;llm templates list&lt;/code&gt; and &lt;code&gt;llm templates show&lt;/code&gt; and &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-templates-help"&gt;several more&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Every command should have help text - the more detailed the better. This can be viewed by running &lt;code&gt;llm --help&lt;/code&gt; - or for sub-commands, &lt;code&gt;llm templates --help&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
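&lt;p&gt;These conventions are not unique to Click. The sketch below expresses several of them with &lt;code&gt;argparse&lt;/code&gt; from the Python standard library, chosen here only so the example runs with no third-party dependencies. The &lt;code&gt;serve-demo&lt;/code&gt; command and all of its options are hypothetical, and sub-commands are left out to keep it short.&lt;/p&gt;

```python
import argparse


def build_parser():
    # "serve-demo" and every option below are hypothetical names for illustration
    parser = argparse.ArgumentParser(prog="serve-demo", description="Serve database files.")
    # Positional arguments: one or more database paths
    parser.add_argument("files", nargs="+", help="Database files to serve")
    # An option with a single-character shortened version
    parser.add_argument("-p", "--port", type=int, default=8001, help="Port to listen on")
    # Flags take no value, they just switch something on
    parser.add_argument("-i", "--imports", action="store_true", help="Show imports")
    parser.add_argument("-n", "--no-file", action="store_true", help="Hide file names")
    # An option that takes two parameters and can be passed more than once
    parser.add_argument(
        "--setting", nargs=2, action="append", default=[],
        metavar=("NAME", "VALUE"), help="Setting name and value",
    )
    return parser


# "-in" combines the two single-character flags -i and -n
args = build_parser().parse_args(
    ["data.db", "-in", "-p", "8000", "--setting", "sql_time_limit_ms", "10000"]
)
print(args)
```

&lt;p&gt;Running &lt;code&gt;build_parser().print_help()&lt;/code&gt; shows the generated help text, which covers the last convention in the list.&lt;/p&gt;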
&lt;p&gt;Click makes it absurdly easy and productive to build CLI tools that follow these conventions.&lt;/p&gt;
&lt;h4 id="consistency-is-everything"&gt;Consistency is everything&lt;/h4&gt;
&lt;p&gt;As CLI utilities get larger, they can end up with a growing number of commands and options.&lt;/p&gt;
&lt;p&gt;The most important thing in designing these is &lt;em&gt;consistency&lt;/em&gt; with other existing commands and options (&lt;a href="https://github.com/simonw/llm/issues/160#issuecomment-1682995315"&gt;example here&lt;/a&gt;) - and with related tools that your user may have used before.&lt;/p&gt;
&lt;p&gt;I often turn to GPT-4 for help with this: I'll ask it for examples of existing CLI tools that do something similar to what I'm about to build, and see if there's anything in their option design that I can emulate.&lt;/p&gt;
&lt;p&gt;Since my various projects are designed to complement each other I try to stay consistent between them as well - I'll often post an issue comment that says "similar to functionality in X", with a copy of the &lt;code&gt;--help&lt;/code&gt; output for the tool I'm about to imitate.&lt;/p&gt;
&lt;h4 id="cli-interfaces-are-an-api---version-appropriately"&gt;CLI interfaces are an API - version appropriately&lt;/h4&gt;
&lt;p&gt;I try to stick to &lt;a href="https://semver.org/"&gt;semantic versioning&lt;/a&gt; for my projects, bumping the major version number on breaking changes and the minor version number for new features.&lt;/p&gt;
&lt;p&gt;The command-line interface to a tool is absolutely part of that documented API. If someone writes a Bash script or a GitHub Actions automation that uses one of my tools, I'm cautious to avoid breaking that without bumping my major version number.&lt;/p&gt;
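&lt;p&gt;The bumping rule can be sketched as a small function. This is an illustration only: &lt;code&gt;bump()&lt;/code&gt; and its change labels are hypothetical, it assumes plain &lt;code&gt;MAJOR.MINOR.PATCH&lt;/code&gt; version strings, and the patch bump for fixes comes from the semantic versioning specification rather than from the text above.&lt;/p&gt;

```python
def bump(version, change):
    """Return the next version for a "breaking", "feature" or "fix" change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # e.g. removing or renaming a CLI option that scripts may rely on
        return f"{major + 1}.0.0"
    if change == "feature":
        # e.g. adding a new option or sub-command
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")


print(bump("3.35.1", "breaking"))
print(bump("3.35.1", "feature"))
```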
&lt;h4 id="include-usage-examples-in---help"&gt;Include usage examples in --help&lt;/h4&gt;
&lt;p&gt;A habit I've formed more recently is trying to always include a working example of the command in the &lt;code&gt;--help&lt;/code&gt; output for that command.&lt;/p&gt;
&lt;p&gt;I find I use this a lot for tools I've developed myself. All of my tools have extensive online documentation, but I like to be able to consult &lt;code&gt;--help&lt;/code&gt; without opening a browser for most of their functionality.&lt;/p&gt;
&lt;p&gt;Here's one of my more involved examples - the help for the &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#converting-data-in-columns"&gt;sqlite-utils convert&lt;/a&gt; command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Usage: sqlite-utils convert [OPTIONS] DB_PATH TABLE COLUMNS... CODE

  Convert columns using Python code you supply. For example:

      sqlite-utils convert my.db mytable mycolumn \
          '"\n".join(textwrap.wrap(value, 10))' \
          --import=textwrap

  "value" is a variable with the column value to be converted.

  Use "-" for CODE to read Python code from standard input.

  The following common operations are available as recipe functions:

  r.jsonsplit(value, delimiter=',', type=&amp;lt;class 'str'&amp;gt;)

      Convert a string like a,b,c into a JSON array ["a", "b", "c"]

  r.parsedate(value, dayfirst=False, yearfirst=False, errors=None)

      Parse a date and convert it to ISO date format: yyyy-mm-dd
      
      - dayfirst=True: treat xx as the day in xx/yy/zz
      - yearfirst=True: treat xx as the year in xx/yy/zz
      - errors=r.IGNORE to ignore values that cannot be parsed
      - errors=r.SET_NULL to set values that cannot be parsed to null

  r.parsedatetime(value, dayfirst=False, yearfirst=False, errors=None)

      Parse a datetime and convert it to ISO datetime format: yyyy-mm-ddTHH:MM:SS
      
      - dayfirst=True: treat xx as the day in xx/yy/zz
      - yearfirst=True: treat xx as the year in xx/yy/zz
      - errors=r.IGNORE to ignore values that cannot be parsed
      - errors=r.SET_NULL to set values that cannot be parsed to null

  You can use these recipes like so:

      sqlite-utils convert my.db mytable mycolumn \
          'r.jsonsplit(value, delimiter=":")'

Options:
  --import TEXT                   Python modules to import
  --dry-run                       Show results of running this against first
                                  10 rows
  --multi                         Populate columns for keys in returned
                                  dictionary
  --where TEXT                    Optional where clause
  -p, --param &amp;lt;TEXT TEXT&amp;gt;...      Named :parameters for where clause
  --output TEXT                   Optional separate column to populate with
                                  the output
  --output-type [integer|float|blob|text]
                                  Column type to use for the output column
  --drop                          Drop original column afterwards
  --no-skip-false                 Don't skip falsey values
  -s, --silent                    Don't show a progress bar
  --pdb                           Open pdb debugger on first error
  -h, --help                      Show this message and exit.
&lt;/code&gt;&lt;/pre&gt;
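&lt;p&gt;The same habit carries over to other argument parsers. As a rough sketch using the standard library's &lt;code&gt;argparse&lt;/code&gt; rather than Click, with a hypothetical &lt;code&gt;wrap-demo&lt;/code&gt; command, a usage example can be placed in the epilog and kept from being rewrapped:&lt;/p&gt;

```python
import argparse

# "wrap-demo" and its arguments are hypothetical, used only for illustration
parser = argparse.ArgumentParser(
    prog="wrap-demo",
    description="Wrap the text in a file to a fixed width.",
    # The epilog is printed after the option list in --help output
    epilog="Example:\n\n    wrap-demo notes.txt --width 40\n",
    # Keep the line breaks and indentation in description and epilog as written
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("path", help="File to wrap")
parser.add_argument("--width", type=int, default=80, help="Maximum line width")

help_text = parser.format_help()
print(help_text)
```

&lt;p&gt;Without &lt;code&gt;RawDescriptionHelpFormatter&lt;/code&gt; the default formatter would reflow the epilog into a single paragraph, which makes the example harder to copy and paste.&lt;/p&gt;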
&lt;h4 id="including---help-in-the-online-documentation"&gt;Including --help in the online documentation&lt;/h4&gt;
&lt;p&gt;My larger tools tend to have extensive documentation independently of their help output. I update this documentation at the same time as the implementation and the tests, as described in &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;The Perfect Commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I like to include the &lt;code&gt;--help&lt;/code&gt; output in my documentation sites as well. This is mainly for my own purposes - having the help visible on a web page makes it much easier to review it and spot anything that needs updating.&lt;/p&gt;
&lt;p&gt;Here are some example pages from my documentation that list &lt;code&gt;--help&lt;/code&gt; output:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sqlite-utils.datasette.io/en/stable/cli-reference.html"&gt;sqlite-utils CLI reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://llm.datasette.io/en/stable/help.html"&gt;LLM CLI reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/stable/cli-reference.html"&gt;Datasette CLI reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shot-scraper&lt;/code&gt; embeds help output on the relevant pages, e.g. &lt;a href="https://shot-scraper.datasette.io/en/stable/screenshots.html#shot-scraper-shot-help"&gt;shot-scraper shot --help&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://s3-credentials.readthedocs.io/en/stable/help.html"&gt;s3-credentials command help&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these pages are maintained automatically using &lt;a href="https://github.com/nedbat/cog"&gt;Cog&lt;/a&gt;. I described the pattern I use for this in &lt;a href="https://til.simonwillison.net/python/cog-to-update-help-in-readme"&gt;Using cog to update --help in a Markdown README file&lt;/a&gt;, or you can &lt;a href="https://github.com/simonw/datasette/blob/1.0a7/docs/cli-reference.rst?plain=1"&gt;view source&lt;/a&gt; on the Datasette CLI reference for a more involved example.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="python"/></entry><entry><title>Coping strategies for the serial project hoarder</title><link href="https://simonwillison.net/2022/Nov/26/productivity/#atom-series" rel="alternate"/><published>2022-11-26T15:47:02+00:00</published><updated>2022-11-26T15:47:02+00:00</updated><id>https://simonwillison.net/2022/Nov/26/productivity/#atom-series</id><summary type="html">
    &lt;p&gt;I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled "Massively increase your productivity on personal projects with comprehensive documentation and automated tests".&lt;/p&gt;
&lt;p&gt;The alternative title for the talk was &lt;em&gt;Coping strategies for the serial project hoarder&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I'm maintaining a &lt;em&gt;lot&lt;/em&gt; of different projects at the moment. Somewhat unintuitively, the way I'm handling this is by scaling down techniques that I've seen working for large engineering teams spread out across multiple continents.&lt;/p&gt;
&lt;p&gt;The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I'm working on at the same time.&lt;/p&gt;
&lt;p&gt;You can watch the talk &lt;a href="https://www.youtube.com/watch?v=GLkRK2rJGB0"&gt;on YouTube&lt;/a&gt; (25 minutes). Alternatively, I've included a detailed annotated version of the slides and notes below.&lt;/p&gt;
&lt;div class="resp-container"&gt;
  &lt;iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen" frameborder="0" height="315" src="https://www.youtube-nocookie.com/embed/GLkRK2rJGB0" width="560"&gt; &lt;/iframe&gt;
&lt;/div&gt;
&lt;!-- cutoff --&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.001.jpeg" alt="Title slide: Massively increase your productivity on personal projects with comprehensive documentation and automated tests - Simon Willison, DjangoCon US 2022" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This was the title I originally submitted to the conference. But I realized a better title was probably...&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.003.jpeg" alt="Same title slide, but the title has been replaced" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Coping strategies for the serial project hoarder&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.004.jpeg" alt="A static frame from a video: a monkey sits on some steps stuffing itself with several pastries. In the longer video the monkey is handed more and more pastries and can't resist trying to hold and eat all of them at once, no matter how many it receives." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://twitter.com/devisridhar/status/1576170527882121217"&gt;This video&lt;/a&gt; is a neat representation of my approach to personal projects: I always have a few on the go, but I can never resist the temptation to add even more.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.005.jpeg" alt="A screenshot of my profile on PyPI - my join date is Oct 26, 2017 and I have 185 projects listed." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://pypi.org/user/simonw/"&gt;My PyPI profile&lt;/a&gt; (which is only five years old) lists 185 Python packages that I've released. Technically I'm actively maintaining all of them, in that if someone reports a bug I'll push out a fix. Many of them receive new releases at least once a year.&lt;/p&gt;
&lt;p&gt;Aside: I took this screenshot using &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; with a little bit of extra JavaScript to hide a notification bar at the top of the page:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;shot-scraper &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://pypi.org/user/simonw/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
--javascript &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    document.body.style.paddingTop = 0;&lt;/span&gt;
&lt;span class="pl-s"&gt;    document.querySelector(&lt;/span&gt;
&lt;span class="pl-s"&gt;        '#sticky-notifications'&lt;/span&gt;
&lt;span class="pl-s"&gt;    ).style.display = 'none';&lt;/span&gt;
&lt;span class="pl-s"&gt;  &lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --height 1000&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.006.jpeg" alt="A map of the world with the Eventbrite logo overlaid on it. There are pins on San Francisco, Nashville, Mendoza and Madrid." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;How can one individual maintain 185 projects?&lt;/p&gt;
&lt;p&gt;Surprisingly, I'm using techniques that I've scaled down from working at a company with hundreds of engineers.&lt;/p&gt;
&lt;p&gt;I spent seven years at Eventbrite, during which time the engineering team grew to span three different continents. We had major engineering centers in San Francisco, Nashville, Mendoza in Argentina and Madrid in Spain.&lt;/p&gt;
&lt;p&gt;Consider timezones: engineers in Madrid and engineers in San Francisco had almost no overlap in their working hours. Good asynchronous communication was essential.&lt;/p&gt;
&lt;p&gt;Over time, I noticed that the teams that were most effective at this scale were the teams that had a strong culture of documentation and automated testing.&lt;/p&gt;
&lt;p&gt;As I started to work on my own array of smaller personal projects, I found that the same discipline that worked for large teams somehow sped me up, when intuitively I would have expected it to slow me down.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.007.jpeg" alt="The perfect commit: Implementation + tests + documentation and a link to an issue thread" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I wrote an extended description of this in &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;The Perfect Commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've started structuring the majority of my work in terms of what I think of as "the perfect commit" - a commit that combines implementation, tests, documentation and a link to an issue thread.&lt;/p&gt;
&lt;p&gt;As software engineers, it's important to note that our job generally isn't to write new software: it's to make changes to existing software.&lt;/p&gt;
&lt;p&gt;As such, the commit is our unit of work. It's worth us paying attention to how we can make our commits as useful as possible.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.008.jpeg" alt="Screenshot of a commit on GitHub: the title is Async support for prepare_jinja2_environment, closes #1809" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/commit/ddc999ad1296e8c69cffede3e367dda059b8adad"&gt;a recent example&lt;/a&gt; from one of my projects, Datasette.&lt;/p&gt;
&lt;p&gt;It's a single commit which bundles together the implementation, some related documentation improvements and the tests that show it works. And it links back to an issue thread from the commit message.&lt;/p&gt;
&lt;p&gt;Let's talk about each component in turn.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.009.jpeg" alt="Implementation: it should just do one thing (thing here is deliberately vague)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;There's not much to be said about the implementation: your commit should change something!&lt;/p&gt;
&lt;p&gt;It should only change one thing, but what that actually means varies on a case by case basis.&lt;/p&gt;
&lt;p&gt;It should be a single change that can be documented, tested and explained independently of other changes.&lt;/p&gt;
&lt;p&gt;(Being able to cleanly revert it is a useful property too.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.010.jpeg" alt="Tests: prove that the implementation works. Pass if the new implementation is correct, fail otherwise." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The goal of the tests that accompany a commit is to prove that the new implementation works.&lt;/p&gt;
&lt;p&gt;If you apply the implementation the new tests should pass. If you revert it the tests should fail.&lt;/p&gt;
&lt;p&gt;I often use &lt;code&gt;git stash&lt;/code&gt; to try this out.&lt;/p&gt;
&lt;p&gt;If you tell people they need to write tests for &lt;em&gt;every single change&lt;/em&gt; they'll often push back that this is too much of a burden, and will harm their productivity.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.011.jpeg" alt="Every project should start with a test. assert 1 + 1 == 2 is fine! Adding tests to an existing test suite is SO MUCH less work than starting a new test suite from scratch." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;But I find that the incremental cost of adding a test to an existing test suite keeps getting lower over time.&lt;/p&gt;
&lt;p&gt;The hard bit of testing is getting a testing framework set up in the first place - with a test runner, and fixtures, and objects under test and suchlike.&lt;/p&gt;
&lt;p&gt;Once that's in place, adding new tests becomes really easy.&lt;/p&gt;
&lt;p&gt;So my personal rule is that every new project starts with a test. It doesn't really matter what that test does - what matters is that you can run &lt;code&gt;pytest&lt;/code&gt; to run the tests, and you have an obvious place to start building more of them.&lt;/p&gt;
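&lt;p&gt;A minimal sketch of what that first test could look like. The file name &lt;code&gt;tests/test_placeholder.py&lt;/code&gt; and the function name are illustrative assumptions, not something the templates necessarily generate:&lt;/p&gt;

```python
# tests/test_placeholder.py - hypothetical first test for a brand new project.
# It checks nothing useful; it exists so that running pytest finds a test
# and there is an obvious file to add real tests to later.


def test_placeholder():
    assert 1 + 1 == 2
```

&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; in the project root should collect and pass this one test, which shows the test runner is wired up.&lt;/p&gt;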
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.012.jpeg" alt="Cookiecutter repo templates: simonw/python-lib, simonw/click-app, simonw/datasette-plugin" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I maintain three &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; templates to help with this, for the three kinds of projects I most frequently create:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/python-lib"&gt;simonw/python-lib&lt;/a&gt; for Python libraries&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/click-app"&gt;simonw/click-app&lt;/a&gt; for command line tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-plugin"&gt;simonw/datasette-plugin&lt;/a&gt; for Datasette plugins&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these templates creates a project with a &lt;code&gt;setup.py&lt;/code&gt; file, a README, a test suite and GitHub Actions workflows to run those tests and ship tagged releases to PyPI.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.013.jpeg" alt="Screenshot of the GitHub page to create a new repository from python-lib-template-repository, which asks for a repository name, a description string and if the new repo should be public or private." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I have a trick for running &lt;code&gt;cookiecutter&lt;/code&gt; as part of creating a brand new repository on GitHub. I described that in &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.014.jpeg" alt="Documentation: Same repository as the code! Document changes that impact external developers. Update the docs in the same commit as the change. Catch missing documentation updates in PR / code review" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is a hill that I will die on: your documentation must live in the same repository as your code!&lt;/p&gt;
&lt;p&gt;You often see projects keep their documentation somewhere else, like in a wiki.&lt;/p&gt;
&lt;p&gt;Inevitably it goes out of date. And my experience is that if your documentation is out of date people will lose trust in it, which means they'll stop reading it and stop contributing to it.&lt;/p&gt;
&lt;p&gt;The gold standard of documentation has to be that it's reliably up to date with the code.&lt;/p&gt;
&lt;p&gt;The only way you can do that is if the documentation and code are in the same repository.&lt;/p&gt;
&lt;p&gt;This gives you versioned snapshots of the documentation that exactly match the code at that time.&lt;/p&gt;
&lt;p&gt;More importantly, it means you can enforce it through code review. You can say in a PR "this is great, but don't forget to update this paragraph on this page of the documentation to reflect the change you're making".&lt;/p&gt;
&lt;p&gt;If you do this you can finally get documentation that people learn to trust over time.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.015.jpeg" alt="Bonus trick: documentation unit tests" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another trick I like to use is something I call documentation unit tests.&lt;/p&gt;
&lt;p&gt;The idea here is to use unit tests to enforce that concepts introspected from your code are at least mentioned in your documentation.&lt;/p&gt;
&lt;p&gt;I wrote more about that in &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.016.jpeg" alt="Screenshot showing pytest running 26 passing tests, each with a name like test_plugin_hook_are_documented[filters_from_request]" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's an example. Datasette has &lt;a href="https://github.com/simonw/datasette/blob/0.63.1/tests/test_docs.py#L41-L53"&gt;a test&lt;/a&gt; that scans through each of the Datasette plugin hooks and checks that there is a heading for each one in the documentation.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.017.jpeg" alt="Screenshot of the code linked to above" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The test itself is pretty simple: it uses &lt;code&gt;pytest&lt;/code&gt; parametrization to look through every introspected plugin hook name, and for each one checks that it has a matching heading in the documentation.&lt;/p&gt;
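&lt;p&gt;Here is a self-contained sketch of the same idea. It is not Datasette's actual test: the hook names, the sample documentation text and the heading format it looks for are all stand-ins chosen for illustration, and it uses a plain loop where the real test uses &lt;code&gt;pytest&lt;/code&gt; parametrization.&lt;/p&gt;

```python
import re

# Hypothetical stand-ins for illustration. A real project would introspect
# these names from its code and read the text from its documentation files.
HOOK_NAMES = ["prepare_connection", "render_cell", "extra_css_urls"]

DOCS_TEXT = """
prepare_connection(conn)
------------------------

render_cell(value)
------------------

extra_css_urls(datasette)
-------------------------
"""


def documented_headings(text):
    """Return names that appear as underlined 'name(arguments)' headings."""
    pattern = re.compile(r"^(\w+)\(.*\)\n-+$", re.MULTILINE)
    return set(pattern.findall(text))


def undocumented(names, text):
    """Return the names that have no matching heading, in their given order."""
    headings = documented_headings(text)
    return [name for name in names if name not in headings]


# With pytest, each name could instead become its own parametrized test case
# via pytest.mark.parametrize("hook_name", HOOK_NAMES).
def test_hooks_are_documented():
    assert undocumented(HOOK_NAMES, DOCS_TEXT) == []
```

&lt;p&gt;Adding a new name to &lt;code&gt;HOOK_NAMES&lt;/code&gt; without adding a matching heading makes the test fail, which is the reminder to write the documentation.&lt;/p&gt;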
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="issue-thread"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.018.jpeg" alt="Everything links to an issue thread" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The final component of my perfect commit is this: every commit must link to an issue thread.&lt;/p&gt;
&lt;p&gt;I'll usually have these open in advance, but sometimes I'll open an issue thread just so I can close it with a commit a few seconds later!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.019.jpeg" alt="A screenshot of the issue titled prepare_jinja2_environment() hook should take datasette argument - it has 11 comments" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/issues/1809"&gt;the issue&lt;/a&gt; for the commit I showed earlier. It has 11 comments, and every single one of those comments is by me.&lt;/p&gt;
&lt;p&gt;I have literally thousands of issues on GitHub that look like this: issue threads that are effectively me talking to myself about the changes that I'm making.&lt;/p&gt;
&lt;p&gt;It turns out this is a fantastic form of additional documentation.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.020.jpeg" alt="What goes in an issue?" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;What goes in an issue?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Background: the reasons for the change. In six months' time you'll want to know why you did this.&lt;/li&gt;
&lt;li&gt;State of play before-hand: embed existing code, link to existing docs. I like to start my issues with "I'm going to change this code right here" - that way if I come back the next day I don't have to repeat that little piece of research.&lt;/li&gt;
&lt;li&gt;Links to things! Documentation, inspiration, clues found on StackOverflow. The idea is to capture all of the loose information floating around that topic.&lt;/li&gt;
&lt;li&gt;Code snippets illustrating potential designs and false-starts.&lt;/li&gt;
&lt;li&gt;Decisions. What did you consider? What did you decide? As programmers we make decisions constantly, all day, about everything. That work doesn't have to be invisible. Writing them down also avoids having to re-litigate them several months later when you've forgotten your original reasoning.&lt;/li&gt;
&lt;li&gt;Screenshots - of everything! Animated screenshots are even better. I even take screenshots of things like the AWS console to remind me what I did there.&lt;/li&gt;
&lt;li&gt;When you close it: a link to the updated documentation and demo&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.021.jpeg" alt="Temporal documentation. It's timestamped and contextual. You don't need to commit to keeping it up-to-date in the future (but you can add more comments if you like)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The reason I love issues is that they're a form of documentation that I think of as &lt;em&gt;temporal documentation&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Regular documentation comes with a big commitment: you have to keep it up to date in the future.&lt;/p&gt;
&lt;p&gt;Issue comments skip that commitment entirely. They're displayed with a timestamp, in the context of the work you were doing at the time.&lt;/p&gt;
&lt;p&gt;No-one will be upset or confused if you fail to keep them updated to match future changes.&lt;/p&gt;
&lt;p&gt;So it's a commitment-free form of documentation, which I for one find incredibly liberating.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.022.jpeg" alt="Issue driven development" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I think of this approach as &lt;em&gt;issue driven development&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Everything you are doing is issue-first, and from that you drive the rest of the development process.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.023.jpeg" alt="Don't remember anything: you can go back to a project in six months and pick up right where you left off" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is how it relates back to maintaining 185 projects at the same time.&lt;/p&gt;
&lt;p&gt;With issue driven development you &lt;em&gt;don't have to remember anything&lt;/em&gt; about any of these projects at all.&lt;/p&gt;
&lt;p&gt;I've had issues where I did a bunch of design work in issue comments, then dropped it, then came back 12 months later and implemented that design - without having to rethink it.&lt;/p&gt;
&lt;p&gt;I've had projects where I forgot that the project existed entirely! But I've found it again, and there's been an open issue, and I've been able to pick up work again.&lt;/p&gt;
&lt;p&gt;It's a way of working where you treat every project as if it's going to be maintained by someone else - and the classic cliche applies here: that someone else is you in the future.&lt;/p&gt;
&lt;p&gt;It horizontally scales you and lets you tackle way more interesting problems.&lt;/p&gt;
&lt;p&gt;Programmers always complain when you interrupt them - there's this idea of "flow state" and that interrupting a programmer for a moment costs them half an hour in getting back up to speed.&lt;/p&gt;
&lt;p&gt;This fixes that! It's much easier to get back to what you are doing if you have an issue thread that records where you've got to.&lt;/p&gt;
&lt;p&gt;Issue driven development is my key productivity hack for taking on much more ambitious projects in much larger quantities.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.024.jpeg" alt="Laboratory notebooks - and a picture of a page from one by Leonardo da Vinci" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another way to think about this is to compare it to laboratory notebooks.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://en.wikipedia.org/wiki/Studies_of_the_Fetus_in_the_Womb"&gt;a page&lt;/a&gt; from one by Leonardo da Vinci.&lt;/p&gt;
&lt;p&gt;Great scientists and great engineers have always kept detailed notes.&lt;/p&gt;
&lt;p&gt;We can use GitHub issues as a really quick and easy way to do the same thing!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.025.jpeg" alt="Issue: Figure out how to deploy Datasette to AWS lambda using function URLs and Mangum" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another thing I like to use these for is deep research tasks.&lt;/p&gt;
&lt;p&gt;Here's an example, from when I was trying to figure out how to run my Python web application in an AWS Lambda function:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/public-notes/issues/6"&gt;Figure out how to deploy Datasette to AWS Lambda using function URLs and Mangum&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This took me 65 comments over the course of a few days... but by the end of that thread I'd figured out how to do it!&lt;/p&gt;
&lt;p&gt;Here's the follow-up, with another 77 comments, in which I &lt;a href="https://github.com/simonw/public-notes/issues/1"&gt;figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I will never have to figure this out ever again! That's a huge win.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.026.jpeg" alt="simonw/public-notes/issues" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/public-notes"&gt;https://github.com/simonw/public-notes&lt;/a&gt; is a public repository where I keep some of these issue threads, transferred from my private notes repos &lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;using this trick&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.027.jpeg" alt="Tell people what you did! (It's so easy to skip this step)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The last thing I want to encourage you to do is this: if you do a project, tell people what it is you did!&lt;/p&gt;
&lt;p&gt;This counts for both personal and work projects. It's so easy to skip this step.&lt;/p&gt;
&lt;p&gt;Once you've shipped a feature or built a project, it's so tempting to skip the step of spending half an hour or more writing about the work you have done.&lt;/p&gt;
&lt;p&gt;But you are missing out on &lt;em&gt;so much&lt;/em&gt; of the value of your work if you don't give other people a chance to understand what you did.&lt;/p&gt;
&lt;p&gt;I wrote more about this here: &lt;a href="https://simonwillison.net/2022/Nov/6/what-to-blog-about/"&gt;What to blog about&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.028.jpeg" alt="Release notes (with dates)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;For projects with releases, release notes are a really good way to do this.&lt;/p&gt;
&lt;p&gt;I like using GitHub releases for this - they're quick and easy to write, and I have automation set up for my projects such that creating release notes in GitHub triggers a build and release to PyPI.&lt;/p&gt;
&lt;p&gt;I've done over 1,000 releases in this way. Having them automated is crucial, and having automation makes it really easy to ship releases more often.&lt;/p&gt;
&lt;p&gt;Please make sure your release notes have dates on them. I need to know when your change went out, because if it's only a week old it's unlikely people will have upgraded to it yet, whereas a change from five years ago is probably safe to depend on.&lt;/p&gt;
&lt;p&gt;I wrote more about &lt;a href="https://simonwillison.net/2022/Jan/31/release-notes/"&gt;writing better release notes&lt;/a&gt; here.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.029.jpeg" alt="Expand your definition of done to include writing about what you did" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is a mental trick which works really well for me. "No project of mine is finished until I've told people about it in some way" is a really useful habit to form.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.030.jpeg" alt="Twitter threads (embed images + links + videos)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Twitter threads are (or were) a great low-effort way to write about a project. Build a quick thread with some links and images, and maybe even a video.&lt;/p&gt;
&lt;p&gt;Get a little unit about your project out into the world, and then you can stop thinking about it.&lt;/p&gt;
&lt;p&gt;(I'm trying to do this &lt;a href="https://simonwillison.net/2022/Nov/5/mastodon/"&gt;on Mastodon now&lt;/a&gt; instead.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.031.jpeg" alt="Get a blog" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Even better: get a blog! Having your own corner of the internet to write about the work that you are doing is a small investment that will pay off many times over.&lt;/p&gt;
&lt;p&gt;("Nobody blogs anymore" I said in the talk... Phil Gyford disagrees with that meme so much that he launched &lt;a href="https://ooh.directory/blog/2022/welcome/"&gt;a new blog directory&lt;/a&gt; to show how wrong it is.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.032.jpeg" alt="GUILT is the enemy of projects" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The enemy of projects, especially personal projects, is &lt;em&gt;guilt&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The more projects you have, the more guilty you feel about working on any one of them - because you're not working on the others, and those projects haven't yet achieved their goals.&lt;/p&gt;
&lt;p&gt;You have to overcome guilt if you're going to work on 185 projects at once!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="avoid-user-accounts"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.033.jpeg" alt="Avoid side projects with user accounts. If it has user accounts it's not a side-project, it's an unpaid job." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is the most important tip: avoid side projects with user accounts.&lt;/p&gt;
&lt;p&gt;If you build something that people can sign into, that's not a side-project, it's an unpaid job. It's a very big responsibility: avoid it at all costs!&lt;/p&gt;
&lt;p&gt;Almost all of my projects right now are open source things that people can run on their own machines, because that's about as far away from user accounts as I can get.&lt;/p&gt;
&lt;p&gt;I still have a responsibility for shipping security updates and things like that, but at least I'm not holding onto other people's data for them.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.034.jpeg" alt="If your project is tested and documented, you have nothing to feel guilty about. That's what I tell myself anyway!" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I feel like if your project is tested and documented, &lt;em&gt;you have nothing to feel guilty about&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;You have put a thing out into the world, and it has tests to show that it works, and it has documentation that explains what it is.&lt;/p&gt;
&lt;p&gt;This means I can step back and say that it's OK for me to work on other things. That thing there is a unit that makes sense to people.&lt;/p&gt;
&lt;p&gt;That's what I tell myself anyway! It's OK to have 185 projects provided they all have documentation and they all have tests.&lt;/p&gt;
&lt;p&gt;Do that and the guilt just disappears. You can live guilt free!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.035.jpeg" alt="Thank you - simonwillison.net - twitter.com/simonw / github.com/simonw" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;You can follow me on Mastodon at &lt;a href="https://fedi.simonwillison.net/@simon"&gt;@simon@simonwillison.net&lt;/a&gt; or on GitHub at &lt;a href="https://github.com/simonw"&gt;github.com/simonw&lt;/a&gt;. Or subscribe to my blog at &lt;a href="https://simonwillison.net/"&gt;simonwillison.net&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;From the Q&amp;amp;A:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You've tweeted about using GitHub Projects. Could you talk about that?
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects"&gt;GitHub Projects V2&lt;/a&gt; is the perfect TODO list for me, because it lets me bring together issues from different repositories. I use a project called "Everything" on a daily basis (it's my browser default window) - I add issues to it that I plan to work on, including personal TODO list items as well as issues from my various public and private repositories. It's kind of like a cross between Trello and Airtable and I absolutely love it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;How did you move notes from the private to the public repo?
&lt;ul&gt;
&lt;li&gt;GitHub doesn't let you do this. But there's a trick I use involving a &lt;code&gt;temp&lt;/code&gt; repo which I switch between public and private to help transfer notes. More in &lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;this TIL&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Question about the perfect commit: do you commit your failing tests?
&lt;ul&gt;
&lt;li&gt;I don't: I try to keep the commits that land on my &lt;code&gt;main&lt;/code&gt; branch always passing. I'll sometimes write the failing test before the implementation and then commit them together. For larger projects I'll work in a branch and then squash-merge the final result into a perfect commit to main later on.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/div&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/djangocon"&gt;djangocon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/productivity"&gt;productivity&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="djangocon"/><category term="documentation"/><category term="productivity"/><category term="my-talks"/><category term="testing"/><category term="annotated-talks"/><category term="github-issues"/></entry><entry><title>The Perfect Commit</title><link href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#atom-series" rel="alternate"/><published>2022-10-29T20:41:01+00:00</published><updated>2022-10-29T20:41:01+00:00</updated><id>https://simonwillison.net/2022/Oct/29/the-perfect-commit/#atom-series</id><summary type="html">
    &lt;p&gt;For the last few years I've been trying to center my work around creating what I consider to be the &lt;em&gt;Perfect Commit&lt;/em&gt;. This is a single commit that contains all of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;implementation&lt;/strong&gt;: a single, focused change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests&lt;/strong&gt; that demonstrate the implementation works&lt;/li&gt;
&lt;li&gt;Updated &lt;strong&gt;documentation&lt;/strong&gt; reflecting the change&lt;/li&gt;
&lt;li&gt;A link to an &lt;strong&gt;issue thread&lt;/strong&gt; providing further context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our job as software engineers generally isn't to write new software from scratch: we spend the majority of our time adding features and fixing bugs in existing software.&lt;/p&gt;
&lt;p&gt;The commit is our principal unit of work. It deserves to be treated thoughtfully and with care.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 26th November 2022&lt;/strong&gt;: My 25 minute talk &lt;a href="https://simonwillison.net/2022/Nov/26/productivity/"&gt;Massively increase your productivity on personal projects with comprehensive documentation and automated tests&lt;/a&gt; describes this approach to software development in detail.&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#implementation"&gt;Implementation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#tests"&gt;Tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#documentation"&gt;Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#link-to-an-issue"&gt;A link to an issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#issue-over-commit-message"&gt;An issue is more valuable than a commit message&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#not-all-perfect"&gt;Not every commit needs to be "perfect"&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#scrappy-branches"&gt;Write scrappy commits in a branch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#examples"&gt;Some examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="implementation"&gt;Implementation&lt;/h4&gt;
&lt;p&gt;Each commit should change a single thing.&lt;/p&gt;
&lt;p&gt;The definition of "thing" here is left deliberately vague!&lt;/p&gt;
&lt;p&gt;The goal is to have something that can be easily reviewed, and that can be clearly understood in the future when revisited using tools like &lt;code&gt;git blame&lt;/code&gt; or &lt;a href="https://til.simonwillison.net/git/git-bisect"&gt;git bisect&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I like to keep my commit history linear, as I find that makes it much easier to comprehend later. This further reinforces the value of each commit being a single, focused change.&lt;/p&gt;
&lt;p&gt;Atomic commits are also much easier to cleanly revert if something goes wrong - or to cherry-pick into other branches.&lt;/p&gt;
&lt;p&gt;For things like web applications that can be deployed to production, a commit should be a unit that can be deployed. Aiming to keep the main branch in a deployable state is a good rule of thumb for deciding if a commit is a sensible atomic change or not.&lt;/p&gt;
&lt;h4 id="tests"&gt;Tests&lt;/h4&gt;
&lt;p&gt;The ultimate goal of tests is to &lt;em&gt;increase&lt;/em&gt; your productivity. If your testing practices are slowing you down, you should consider ways to improve them.&lt;/p&gt;
&lt;p&gt;In the longer term, this productivity improvement comes from gaining the freedom to make changes and stay confident that your change hasn't broken something else.&lt;/p&gt;
&lt;p&gt;But tests can help increase productivity in the immediate short term as well.&lt;/p&gt;
&lt;p&gt;How do you know when the change you have made is finished and ready to commit? It's ready when the new tests pass.&lt;/p&gt;
&lt;p&gt;I find this reduces the time I spend second-guessing myself and questioning whether I've done enough and thought through all of the edge cases.&lt;/p&gt;
&lt;p&gt;Without tests, there's a very strong possibility that your change will have broken some other, potentially unrelated feature. Your commit could be held up by hours of tedious manual testing. Or you could &lt;abbr title="You Only Live Once"&gt;YOLO&lt;/abbr&gt; it and learn that you broke something important later!&lt;/p&gt;
&lt;p&gt;Writing tests becomes far less time consuming if you already have good testing practices in place.&lt;/p&gt;
&lt;p&gt;Adding a new test to a project with a lot of existing tests is easy: you can often find an existing test that has 90% of the pattern you need already worked out for you.&lt;/p&gt;
&lt;p&gt;If your project has no tests at all, adding a test for your change will be a lot more work.&lt;/p&gt;
&lt;p&gt;This is why I start every single one of my projects with a passing test. It doesn't matter what this test is - &lt;code&gt;assert 1 + 1 == 2&lt;/code&gt; is fine! The key thing is to get a testing framework in place, such that you can run a command (for me that's usually &lt;code&gt;pytest&lt;/code&gt;) to execute the test suite - and you have an obvious place to add new tests in the future.&lt;/p&gt;
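&lt;p&gt;As a minimal sketch of that starting point, assuming pytest and a hypothetical file called &lt;code&gt;tests/test_placeholder.py&lt;/code&gt; (the file and function names are illustrative, not taken from any particular template):&lt;/p&gt;

```python
# tests/test_placeholder.py (hypothetical file name)
# pytest collects functions whose names start with "test_".


def test_placeholder():
    # Deliberately trivial: it only proves the test runner is wired up
    # and gives future tests an obvious place to live.
    assert 1 + 1 == 2
```

&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; from the project root should discover and run this test. It can be deleted once real tests exist.&lt;/p&gt;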
&lt;p&gt;I use &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;these cookiecutter templates&lt;/a&gt; for almost all of my new projects. They configure a testing framework with a single passing test and GitHub Actions workflows to exercise it all from the very start.&lt;/p&gt;
&lt;p&gt;I'm not a huge advocate of test-first development, where tests are written before the code itself. What I care about is tests-included development, where the final commit bundles the tests and the implementation together. I wrote more about my approach to testing in &lt;a href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/"&gt;How to cheat at unit tests with pytest and Black&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="documentation"&gt;Documentation&lt;/h4&gt;
&lt;p&gt;If your project defines APIs that are meant to be used outside of your project, they need to be documented. In my work these projects are usually one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python APIs (modules, functions and classes) that provide code designed to be imported into other projects.&lt;/li&gt;
&lt;li&gt;Web APIs - usually JSON over HTTP these days - that provide functionality to be consumed by other applications.&lt;/li&gt;
&lt;li&gt;Command line interface tools, such as those implemented using &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; or &lt;a href="https://typer.tiangolo.com/"&gt;Typer&lt;/a&gt; or &lt;a href="https://docs.python.org/3/library/argparse.html"&gt;argparse&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is critical that this documentation &lt;strong&gt;lives in the same repository as the code itself&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is important for a number of reasons.&lt;/p&gt;
&lt;p&gt;Documentation is only valuable &lt;strong&gt;if people trust it&lt;/strong&gt;. People will only trust it if they know that it is kept up to date.&lt;/p&gt;
&lt;p&gt;If your docs live in a separate wiki somewhere it's easy for them to get out of date - but more importantly it's hard for anyone to quickly confirm if the documentation is being updated in sync with the code or not.&lt;/p&gt;
&lt;p&gt;Documentation should be &lt;strong&gt;versioned&lt;/strong&gt;. People need to be able to find the docs for the specific version of your software that they are using. Keeping it in the same repository as the code gives you synchronized versioning for free.&lt;/p&gt;
&lt;p&gt;Documentation changes should be &lt;strong&gt;reviewed&lt;/strong&gt; in the same way as your code. If they live in the same repository you can catch changes that need to be reflected in the documentation as part of your code review process.&lt;/p&gt;
&lt;p&gt;And ideally, documentation should be &lt;strong&gt;tested&lt;/strong&gt;. I wrote about my approach to doing this using &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;. Executing example code in the documentation using a testing framework is a great idea too.&lt;/p&gt;
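&lt;p&gt;One possible shape for a documentation test, sketched here under loud assumptions: the &lt;code&gt;DOCS&lt;/code&gt; string and &lt;code&gt;COMMANDS&lt;/code&gt; list are hypothetical stand-ins, where a real project would read its documentation files from disk and introspect its own code. The test fails whenever something exists in the code without a matching heading in the docs.&lt;/p&gt;

```python
import re

# Hypothetical stand-ins for a project's documentation and its commands.
DOCS = """
## insert
Insert rows into a table.

## query
Run a SQL query and return the results.
"""

COMMANDS = ["insert", "query"]


def documented_headings(markdown):
    # Collect the text of every "## heading" line in the documentation.
    return set(re.findall(r"^## (.+)$", markdown, flags=re.MULTILINE))


def undocumented(commands, markdown):
    # Return the commands that have no matching heading, in sorted order.
    return sorted(set(commands) - documented_headings(markdown))


def test_every_command_is_documented():
    assert undocumented(COMMANDS, DOCS) == []
```

&lt;p&gt;Adding a new command to &lt;code&gt;COMMANDS&lt;/code&gt; without a heading in the docs makes the test fail, which is a useful nudge to write that sentence or two of documentation in the same commit.&lt;/p&gt;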
&lt;p&gt;As with tests, writing documentation from scratch is much more work than incrementally modifying existing documentation.&lt;/p&gt;
&lt;p&gt;Many of my commits include documentation that is just a sentence or two. This doesn't take very long to write, but it adds up to something very comprehensive over time.&lt;/p&gt;
&lt;p&gt;How about end-user facing documentation? I'm still figuring that out myself. I created my &lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;shot-scraper tool&lt;/a&gt; to help automate the process of keeping screenshots up-to-date, but I've not yet found personal habits and styles for end-user documentation that I'm confident in.&lt;/p&gt;
&lt;h4 id="link-to-an-issue"&gt;A link to an issue&lt;/h4&gt;
&lt;p&gt;Every perfect commit should include a link to an issue thread that accompanies that change.&lt;/p&gt;
&lt;p&gt;Sometimes I'll even open an issue seconds before writing the commit message, just to give myself something I can link to from the commit itself!&lt;/p&gt;
&lt;p&gt;The reason I like issue threads is that they provide effectively unlimited space for commentary and background for the change that is being made.&lt;/p&gt;
&lt;p&gt;Most of my issue threads are me talking to myself - sometimes with dozens of issue comments, all written by me.&lt;/p&gt;
&lt;p&gt;Things that can go in an issue thread include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Background&lt;/strong&gt;: the reason for the change. I try to include this in the opening comment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State of play&lt;/strong&gt; before the change. I'll often link to the current version of the code and documentation. This is great if I return to an open issue a few days later, as it saves me from having to repeat that initial research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Links to things&lt;/strong&gt;. So many links! Inspiration for the change, relevant documentation, conversations on Slack or Discord, clues found on StackOverflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code snippets&lt;/strong&gt; illustrating potential designs and false-starts. Use &lt;code&gt;```python ... ```&lt;/code&gt; blocks to get syntax highlighting in your issue comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decisions&lt;/strong&gt;. What did you consider? What did you decide? As programmers we make hundreds of tiny decisions a day. Write them down! Then you'll never find yourself relitigating them in the future having forgotten your original reasoning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screenshots&lt;/strong&gt;. What it looked like before, what it looked like after. Animated screenshots are even better! I use &lt;a href="https://www.cockos.com/licecap/"&gt;LICEcap&lt;/a&gt; to generate quick GIF screen captures or QuickTime to capture videos - both of which can be dropped straight into a GitHub issue comment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototypes&lt;/strong&gt;. I'll often paste a few lines of code copied from a Python console session. Sometimes I'll even paste in a block of HTML and CSS, or add a screenshot of a UI prototype.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After I've closed my issues I like to add one last comment that links to the updated documentation and ideally a live demo of the new feature.&lt;/p&gt;
&lt;h4 id="issue-over-commit-message"&gt;An issue is more valuable than a commit message&lt;/h4&gt;
&lt;p&gt;I went through a phase, lasting several years, of writing essays in my commit messages, trying to capture as much of the background context and thinking as possible.&lt;/p&gt;
&lt;p&gt;My commit messages grew a lot shorter when I started bundling the updated documentation in the commit - since often much of the material I'd previously included in the commit message was now in that documentation instead.&lt;/p&gt;
&lt;p&gt;As I extended my practice of writing issue threads, I found that they were a better place for most of this context than the commit messages themselves. They supported embedded media, were more discoverable and I could continue to extend them even after the commit had landed.&lt;/p&gt;
&lt;p&gt;Today many of my commit messages are a single line summary and a link to an issue!&lt;/p&gt;
&lt;p&gt;The biggest benefit of lengthy commit messages is that they are guaranteed to survive for as long as the repository itself. If you're going to use issue threads in the way I describe here it is critical that you consider their long term archival value.&lt;/p&gt;
&lt;p&gt;I expect this to be controversial! I'm advocating for abandoning one of the core ideas of Git here - that each repository should incorporate a full, decentralized record of its history that is copied in its entirety when someone clones a repo.&lt;/p&gt;
&lt;p&gt;I understand that philosophy. All I'll say here is that my own experience has been that dropping that requirement has resulted in a net increase in my overall productivity. Other people may reach a different conclusion.&lt;/p&gt;
&lt;p&gt;If this offends you too much, you're welcome to construct an &lt;em&gt;even more perfect commit&lt;/em&gt; that incorporates background information and additional context in an extended commit message as well.&lt;/p&gt;
&lt;p&gt;One of the reasons I like GitHub Issues is that it includes a comprehensive API, which can be used to extract all of that data. I use my &lt;a href="https://github.com/dogsheep/github-to-sqlite"&gt;github-to-sqlite tool&lt;/a&gt; to maintain an ongoing archive of my issues and issue comments as a SQLite database file.&lt;/p&gt;
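&lt;p&gt;To illustrate the general idea of an issue archive (this is not how &lt;code&gt;github-to-sqlite&lt;/code&gt; itself works, and the table layout and comment data below are hypothetical), here is a rough sketch using Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module:&lt;/p&gt;

```python
import sqlite3

# Hypothetical issue comments, invented purely for illustration.
comments = [
    {"id": 1, "issue": 1, "body": "Background: why this change is needed."},
    {"id": 2, "issue": 1, "body": "Decision: went with the simpler option."},
]

db = sqlite3.connect(":memory:")  # use a file path for a real archive
db.execute(
    "create table if not exists issue_comments "
    "(id integer primary key, issue integer, body text)"
)
# "insert or replace" means re-running the archive step does not
# create duplicate rows for comments that were already stored.
db.executemany(
    "insert or replace into issue_comments (id, issue, body) "
    "values (:id, :issue, :body)",
    comments,
)
db.commit()

# Read back every archived comment for one issue, oldest first.
rows = db.execute(
    "select body from issue_comments where issue = ? order by id", (1,)
).fetchall()
print(rows)
```

&lt;p&gt;Keeping the archive in a single SQLite file means the context stays queryable even if the hosted issue tracker is unavailable.&lt;/p&gt;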
&lt;h4 id="not-all-perfect"&gt;Not every commit needs to be "perfect"&lt;/h4&gt;
&lt;p&gt;I find that the vast majority of my work fits into this pattern, but there are exceptions.&lt;/p&gt;
&lt;p&gt;Typo fix for some documentation or a comment? Just ship it, it's fine.&lt;/p&gt;
&lt;p&gt;Bug fix that doesn't deserve documentation? Still bundle the implementation and the test plus a link to an issue, but no need to update the docs - especially if they already describe the expected bug-free behaviour.&lt;/p&gt;
&lt;p&gt;Generally though, I find that aiming for implementation, tests, documentation and an issue link covers almost all of my work. It's a really good default model.&lt;/p&gt;
&lt;h4 id="scrappy-branches"&gt;Write scrappy commits in a branch&lt;/h4&gt;
&lt;p&gt;If I'm writing more exploratory or experimental code it often doesn't make sense to work in this strict way. For those instances I'll usually work in a branch, where I can ship "WIP" commit messages and failing tests with abandon. I'll then squash-merge them into a single perfect commit (sometimes via a self-closed GitHub pull request) to keep my main branch as tidy as possible.&lt;/p&gt;
&lt;h4 id="examples"&gt;Some examples&lt;/h4&gt;
&lt;p&gt;Here are some examples of my commits that follow this pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/commit/9676b2deb07cff20247ba91dad3e84a4ab0b00d1"&gt;Upgrade Docker images to Python 3.11&lt;/a&gt; for &lt;a href="https://github.com/simonw/datasette/issues/1853"&gt;datasette #1853&lt;/a&gt; - a pretty tiny change, but still includes tests, docs and an issue link.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/sqlite-utils/commit/ab8d4aad0c42f905640981f6f24bc1e37205ae62"&gt;sqlite-utils schema now takes optional tables&lt;/a&gt; for &lt;a href="https://github.com/simonw/sqlite-utils/issues/299"&gt;sqlite-utils #299&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/shot-scraper/commit/5048e21a1ca5accedfeca6ac25a16a38dc240b81"&gt;shot-scraper html command&lt;/a&gt; for &lt;a href="https://github.com/simonw/shot-scraper/issues/96"&gt;shot-scraper #96&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/s3-credentials/commit/c7bb7268c4a124349bb511f7ec3ee3f28f9581ad"&gt;s3-credentials put-objects command&lt;/a&gt; for &lt;a href="https://github.com/simonw/s3-credentials/issues/68"&gt;s3-credentials #68&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-gunicorn/commit/0d561d7a94f76079b1eb7779b3e944c163d2539e"&gt;Initial implementation&lt;/a&gt; for &lt;a href="https://github.com/simonw/datasette-gunicorn/issues/1"&gt;datasette-gunicorn #1&lt;/a&gt; - this was the first commit to this repository, but I still bundled the tests, docs, implementation and a link to an issue.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/code-review"&gt;code-review&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="code-review"/><category term="definitions"/><category term="documentation"/><category term="git"/><category term="github"/><category term="software-engineering"/><category term="testing"/><category term="github-issues"/></entry><entry><title>Automating screenshots for the Datasette documentation using shot-scraper</title><link href="https://simonwillison.net/2022/Oct/14/automating-screenshots/#atom-series" rel="alternate"/><published>2022-10-14T23:44:03+00:00</published><updated>2022-10-14T23:44:03+00:00</updated><id>https://simonwillison.net/2022/Oct/14/automating-screenshots/#atom-series</id><summary type="html">
    &lt;p&gt;I released &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; back &lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;in March&lt;/a&gt; as a tool for keeping screenshots in documentation up-to-date.&lt;/p&gt;
&lt;p&gt;It's very easy for feature screenshots in documentation for a web application to drift out-of-date with the latest design of the software itself.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is a command-line tool that aims to solve this.&lt;/p&gt;
&lt;p&gt;You can use it to take one-off screenshots like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper https://latest.datasette.io/ --height 800
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or you can define multiple screenshots in a single YAML file - let's call this &lt;code&gt;shots.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;index.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;database.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And run them all at once like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper multi shots.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This morning I used &lt;code&gt;shot-scraper&lt;/code&gt; to replace all of the existing screenshots in the &lt;a href="https://docs.datasette.io/en/latest/"&gt;Datasette documentation&lt;/a&gt; with up-to-date, automated equivalents.&lt;/p&gt;
&lt;p&gt;I decided to use this as an opportunity to create a more detailed tutorial for how to use &lt;code&gt;shot-scraper&lt;/code&gt; for this kind of screenshot automation project.&lt;/p&gt;
&lt;h4&gt;Four screenshots to replace&lt;/h4&gt;
&lt;p&gt;Datasette's documentation included four screenshots that I wanted to replace with automated equivalents.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/full_text_search.png"&gt;full_text_search.png&lt;/a&gt; illustrates the full-text search feature:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/full_text_search.png" alt="A search for cherry running against the Street_Tree_List table, returning 14,663 rows" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/advanced_export.png"&gt;advanced_export.png&lt;/a&gt; displays Datasette's "advanced export" dialog:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/advanced_export.png" alt="Advanced export dialog, with four links 3 checkboxes and an Export CSV button" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/binary_data.png"&gt;binary_data.png&lt;/a&gt; displays just a small fragment of a table with binary download links:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/binary_data.png" alt="A small screenshot showing binary data download links" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/facets.png"&gt;facets.png&lt;/a&gt; demonstrates faceting against a table:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://github.com/simonw/datasette/raw/0.62/docs/facets.png?raw=true" alt="Datasette's facet interface, showing one suggested facet and three facet lists" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'll walk through each screenshot in turn.&lt;/p&gt;
&lt;h4&gt;full_text_search.png&lt;/h4&gt;
&lt;p&gt;I decided to use a different example for the new screenshot, because I don't currently have a live instance for that table running against the most recent Datasette release.&lt;/p&gt;
&lt;p&gt;I went with &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/a&gt; - a search against the UK register of members interests for "hamper" (see &lt;a href="https://simonwillison.net/2018/Apr/25/register-members-interests/"&gt;Exploring the UK Register of Members Interests with SQL and Datasette&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The existing image in the documentation was 960 pixels wide, so I stuck with that and tried a few iterations until I found a height that I liked.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://shot-scraper.datasette.io/en/stable/installation.html"&gt;installed shot-scraper&lt;/a&gt; and ran the following, in my &lt;code&gt;/tmp&lt;/code&gt; directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date' \
  -h 585 \
  -w 960
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produced a &lt;code&gt;register-of-members-interests-datasettes-com-regmem-items.png&lt;/code&gt; file which looked good when I opened it in Preview.&lt;/p&gt;
&lt;p&gt;I turned that into the following YAML in my &lt;code&gt;shots.yml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;585&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;960&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;regmem-search.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;shot-scraper multi shots.yml&lt;/code&gt; against that file produced this &lt;code&gt;regmem-search.png&lt;/code&gt; image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png" alt="A screenshot of that search, with the most recent design for Datasette" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;advanced_export.png&lt;/h4&gt;
&lt;p&gt;This next image isn't a full page screenshot - it's just a small fragment of the page.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; can take partial screenshots based on one or more CSS selectors. Given a CSS selector the tool draws a box around just that element and uses that to take the screenshot - adding optional padding.&lt;/p&gt;
&lt;p&gt;Here's the recipe for the advanced export box - I used the same &lt;code&gt;register-of-members-interests.datasettes.com&lt;/code&gt; example for it as this had enough rows to trigger all of the advanced options to be displayed:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper' \
  -s '#export' \
  -p 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-p 10&lt;/code&gt; here specifies 10px of padding, needed to capture the drop shadow on the box.&lt;/p&gt;
&lt;p&gt;Here's the equivalent YAML:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#export&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;advanced-export.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the result:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png" alt="A screenshot of the advanced export box" style="max-width: 100%;" /&gt;&lt;/p&gt;
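&lt;p&gt;The padding arithmetic can be sketched as follows. This is only an illustration of growing a bounding box on every side: the &lt;code&gt;Box&lt;/code&gt; class, the &lt;code&gt;pad_box&lt;/code&gt; function and the example coordinates are all hypothetical, and &lt;code&gt;shot-scraper&lt;/code&gt;'s own implementation may differ.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float


def pad_box(box, padding):
    # Grow the box by `padding` pixels on every side: the origin moves
    # up and left, and each dimension gains the padding twice.
    return Box(
        x=box.x - padding,
        y=box.y - padding,
        width=box.width + 2 * padding,
        height=box.height + 2 * padding,
    )


# Hypothetical bounding box for the #export element, in CSS pixels.
export_box = Box(x=20, y=300, width=400, height=120)
print(pad_box(export_box, 10))
```

&lt;p&gt;With 10 pixels of padding the captured region is 20 pixels wider and taller than the element itself, which is what leaves room for a drop shadow.&lt;/p&gt;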
&lt;h4&gt;binary_data.png&lt;/h4&gt;
&lt;p&gt;This screenshot required a different trick.&lt;/p&gt;
&lt;p&gt;I wanted to take a screenshot of the table &lt;a href="https://latest.datasette.io/fixtures/binary_data"&gt;on this page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The full table looks like this, with three rows:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/shot-scraper-binary-table.png" alt="A table with three rows - two containing binary data and one that is empty" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I only wanted the first two of these to be shown in the screenshot though.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; has the ability to execute JavaScript on the page before the screenshot is taken. This can be used to remove elements first.&lt;/p&gt;
&lt;p&gt;Here's the JavaScript I came up with to remove all but the first two rows (actually the first three, because the table header counts as a row too):&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Array&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;from&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;querySelectorAll&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'tr:nth-child(n+3)'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-s1"&gt;el&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;el&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;parentNode&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;el&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I did it this way so that if I add any more rows to that test table in the future the code will still remove everything but the first two.&lt;/p&gt;
&lt;p&gt;The CSS selector &lt;code&gt;tr:nth-child(n+3)&lt;/code&gt; matches every row that is the third or later child of its parent, which leaves the header row and the first two content rows in place.&lt;/p&gt;
&lt;p&gt;Here's how to run that from the command-line, and then take a 10 pixel padded screenshot of just the table on the page after it has been modified by the JavaScript:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://latest.datasette.io/fixtures/binary_data' \
  -j 'Array.from(document.querySelectorAll("tr:nth-child(n+3)"), el =&amp;gt; el.parentNode.removeChild(el));' \
  -s table -p 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The YAML I added to &lt;code&gt;shots.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures/binary_data&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;table&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;    Array.from(&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelectorAll('tr:nth-child(n+3)'),&lt;/span&gt;
&lt;span class="pl-s"&gt;      el =&amp;gt; el.parentNode.removeChild(el)&lt;/span&gt;
&lt;span class="pl-s"&gt;    );&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;binary-data.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the resulting image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png" alt="A screenshot of the binary data table, with just the first two rows" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;facets.png&lt;/h4&gt;
&lt;p&gt;I left the most complex screenshot to last.&lt;/p&gt;
&lt;p&gt;For the faceting screenshot, I wanted to include the "suggested facet" links at the top of the page, a set of active facets and then the first three rows of the following table.&lt;/p&gt;
&lt;p&gt;But... the table has quite a lot of columns. For a neater screenshot I only wanted to include a subset of columns in the final shot.&lt;/p&gt;
&lt;p&gt;Here's the screenshot I ended up taking:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png" alt="A screenshot of the suggested facets, active facets and the first three rows and ten columns of the following table" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And the YAML recipe:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&amp;amp;_facet=party&amp;amp;_facet=state&amp;amp;_facet_size=10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selectors_all&lt;/span&gt;:
  - &lt;span class="pl-s"&gt;.suggested-facets a&lt;/span&gt;
  - &lt;span class="pl-s"&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;faceting-details.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The key trick I'm using here is that &lt;code&gt;selectors_all&lt;/code&gt; list.&lt;/p&gt;
&lt;p&gt;The usual &lt;code&gt;shot-scraper&lt;/code&gt; selector option finds the first element on the page matching the specified CSS selector and takes a screenshot of that.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;--selector-all&lt;/code&gt; - or the YAML equivalent &lt;code&gt;selectors_all&lt;/code&gt; - instead finds EVERY element that matches any of the specified selectors and draws a bounding box containing all of them.&lt;/p&gt;
&lt;p&gt;I wanted that bounding box to surround a subset of the table cells on the page. I used this CSS selector to indicate that subset:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Here's what GPT-3 says if you &lt;a href="https://simonwillison.net/2022/Jul/9/gpt-3-explain-code/"&gt;ask it to explain&lt;/a&gt; the selector:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Explain this CSS selector:&lt;/p&gt;
&lt;p&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This selector is selecting all table cells in rows that are not the fourth row or greater, and are not in columns that are the 11th column or greater.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(See also &lt;a href="https://til.simonwillison.net/shot-scraper/subset-of-table-columns"&gt;this TIL&lt;/a&gt;.)&lt;/p&gt;
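&lt;p&gt;The bounding box idea can be sketched in a few lines of JavaScript. This only illustrates the geometry and is not how &lt;code&gt;shot-scraper&lt;/code&gt; itself is implemented: the &lt;code&gt;unionBox&lt;/code&gt; name and the plain rectangle objects are assumptions made up for this example.&lt;/p&gt;

```javascript
// Hypothetical helper: compute one box that contains every rectangle,
// expanded by `padding` pixels on each side.
// Each rect is {x, y, width, height}, the shape of a DOMRect.
function unionBox(rects, padding = 0) {
  if (rects.length === 0) {
    return null; // nothing matched, so there is no box to draw
  }
  const left = Math.min(...rects.map(r => r.x));
  const top = Math.min(...rects.map(r => r.y));
  const right = Math.max(...rects.map(r => r.x + r.width));
  const bottom = Math.max(...rects.map(r => r.y + r.height));
  return {
    x: left - padding,
    y: top - padding,
    width: right - left + 2 * padding,
    height: bottom - top + 2 * padding,
  };
}
```

&lt;p&gt;In a browser the rectangles could come from calling &lt;code&gt;getBoundingClientRect()&lt;/code&gt; on each element matched by any of the selectors.&lt;/p&gt;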
&lt;h4&gt;Automating everything using GitHub Actions&lt;/h4&gt;
&lt;p&gt;Here's the full &lt;code&gt;shots.yml&lt;/code&gt; YAML needed to generate all four of these screenshots:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;585&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;960&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;regmem-search.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#export&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;advanced-export.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&amp;amp;_facet=party&amp;amp;_facet=state&amp;amp;_facet_size=10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selectors_all&lt;/span&gt;:
  - &lt;span class="pl-s"&gt;.suggested-facets a&lt;/span&gt;
  - &lt;span class="pl-s"&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;faceting-details.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures/binary_data&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;table&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;    Array.from(&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelectorAll('tr:nth-child(n+3)'),&lt;/span&gt;
&lt;span class="pl-s"&gt;      el =&amp;gt; el.parentNode.removeChild(el)&lt;/span&gt;
&lt;span class="pl-s"&gt;    );&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;binary-data.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;shot-scraper multi shots.yml&lt;/code&gt; against this file takes all four screenshots.&lt;/p&gt;
&lt;p&gt;But I want this to be fully automated! So I turned to &lt;a href="https://github.com/features/actions"&gt;GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A while ago I created a template repository for setting up GitHub Actions to take screenshots using &lt;code&gt;shot-scraper&lt;/code&gt; and write them back to the same repo. I wrote about that in &lt;a href="https://simonwillison.net/2022/Mar/14/shot-scraper-template/"&gt;Instantly create a GitHub repository to take screenshots of a web page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had previously used that recipe to create my &lt;a href="https://github.com/simonw/datasette-screenshots"&gt;datasette-screenshots&lt;/a&gt; repository - with its own &lt;code&gt;shots.yml&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;So I added the new YAML to that existing file, committed the change, waited a minute and the result was all four images stored in that repository!&lt;/p&gt;
&lt;p&gt;My &lt;code&gt;datasette-screenshots&lt;/code&gt; &lt;a href="https://github.com/simonw/datasette-screenshots/blob/main/.github/workflows/shots.yml"&gt;workflow&lt;/a&gt; actually has two key changes from my default template. First, it takes every screenshot twice - once as a retina image and once as a regular image:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take retina shots&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper multi shots.yml --retina&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take non-retina shots&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        mkdir -p non-retina&lt;/span&gt;
&lt;span class="pl-s"&gt;        cd non-retina&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper multi ../shots.yml&lt;/span&gt;
&lt;span class="pl-s"&gt;        cd ..&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This provides me with both a high quality image and a smaller, faster-loading image for each screenshot.&lt;/p&gt;
&lt;p&gt;Secondly, it runs &lt;code&gt;oxipng&lt;/code&gt; to optimize the PNGs before committing them to the repo:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Optimize PNGs&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;        oxipng -o 4 -i 0 --strip safe *.png&lt;/span&gt;
&lt;span class="pl-s"&gt;        oxipng -o 4 -i 0 --strip safe non-retina/*.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;a href="https://shot-scraper.datasette.io/en/stable/github-actions.html#optimizing-pngs-using-oxipng"&gt;shot-scraper documentation&lt;/a&gt; describes this pattern in more detail.&lt;/p&gt;
&lt;p&gt;With all of that in place, simply committing a change to the &lt;code&gt;shots.yml&lt;/code&gt; file is enough to generate and store the new screenshots.&lt;/p&gt;
&lt;h4&gt;Linking to the images&lt;/h4&gt;
&lt;p&gt;One last problem to solve: I want to include these images in my documentation, which means I need a way to link to them.&lt;/p&gt;
&lt;p&gt;I decided to use GitHub to host these directly, via the &lt;code&gt;raw.githubusercontent.com&lt;/code&gt; domain - which is fronted by the Fastly CDN.&lt;/p&gt;
&lt;p&gt;I care about up-to-date images, but I also want different versions of the Datasette documentation to reflect the corresponding design in their screenshots - so I needed a way to snapshot those screenshots to a known version.&lt;/p&gt;
&lt;p&gt;Repository tags are one way to do this.&lt;/p&gt;
&lt;p&gt;I tagged the &lt;code&gt;datasette-screenshots&lt;/code&gt; repository with &lt;code&gt;0.62&lt;/code&gt;, since that's the version of Datasette that the screenshots were taken for.&lt;/p&gt;
&lt;p&gt;This gave me the following URLs for the images:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png&lt;/a&gt; (retina)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png&lt;/a&gt; (retina)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To save on page loading time I decided to use the non-retina URLs for the two larger images.&lt;/p&gt;
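&lt;p&gt;The tagged URLs follow a regular pattern, so they can be built with a small helper. This is a sketch rather than anything the project ships: &lt;code&gt;rawImageUrl&lt;/code&gt; and its &lt;code&gt;retina&lt;/code&gt; flag are hypothetical names used only for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: build a raw.githubusercontent.com URL pinned to a tag.
function rawImageUrl(tag, filename, retina = true) {
  const base = "https://raw.githubusercontent.com/simonw/datasette-screenshots";
  // Non-retina copies live in the non-retina/ directory of the same repository
  const path = retina ? filename : `non-retina/${filename}`;
  return `${base}/${tag}/${path}`;
}
```

&lt;p&gt;Pinning the tag in the URL is what keeps an older version of the documentation pointing at screenshots of the matching design.&lt;/p&gt;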
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/commit/fdf9891c3f0313af9244778574c7ebaac9c3a438"&gt;the commit&lt;/a&gt; that updated the Datasette documentation to link to these new images (and deleted the old images from the repo).&lt;/p&gt;
&lt;p&gt;You can see the new images in the documentation on these pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/csv_export.html"&gt;https://docs.datasette.io/en/latest/csv_export.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/binary_data.html"&gt;https://docs.datasette.io/en/latest/binary_data.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/facets.html"&gt;https://docs.datasette.io/en/latest/facets.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/full_text_search.html"&gt;https://docs.datasette.io/en/latest/full_text_search.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="datasette"/><category term="github-actions"/><category term="shot-scraper"/></entry><entry><title>Software engineering practices</title><link href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#atom-series" rel="alternate"/><published>2022-10-01T15:56:02+00:00</published><updated>2022-10-01T15:56:02+00:00</updated><id>https://simonwillison.net/2022/Oct/1/software-engineering-practices/#atom-series</id><summary type="html">
    &lt;p&gt;Gergely Orosz &lt;a href="https://twitter.com/GergelyOrosz/status/1576161504260657152"&gt;started a Twitter conversation&lt;/a&gt; asking about recommended "software engineering practices" for development teams.&lt;/p&gt;
&lt;p&gt;(I really like his rejection of the term "best practices" here: I always feel it's prescriptive and misguiding to announce something as "best".)&lt;/p&gt;
&lt;p&gt;I decided to flesh some of my replies out into a longer post.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#docs-same-repo"&gt;Documentation in the same repo as the code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#create-test-data"&gt;Mechanisms for creating test data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#rock-solid-migrations"&gt;Rock solid database migrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#new-project-templates"&gt;Templates for new projects and components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#auto-formatting"&gt;Automated code formatting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#tested-dev-environments"&gt;Tested, automated process for new development environments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#automated-previews"&gt;Automated preview environments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="docs-same-repo"&gt;Documentation in the same repo as the code&lt;/h4&gt;
&lt;p&gt;The most important characteristic of internal documentation is trust: do people trust that documentation both exists and is up-to-date?&lt;/p&gt;
&lt;p&gt;If they don't, they won't read it or contribute to it.&lt;/p&gt;
&lt;p&gt;The best trick I know of for improving the trustworthiness of documentation is to put it in the same repository as the code it documents, for a few reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You can enforce documentation updates as part of your code review process. If a PR changes code in a way that requires documentation updates, the reviewer can ask for those updates to be included.&lt;/li&gt;
&lt;li&gt;You get versioned documentation. If you're using an older version of a library you can consult the documentation for that version. If you're using the current main branch you can see documentation for that, without confusion over what corresponds to the most recent "stable" release.&lt;/li&gt;
&lt;li&gt;You can integrate your documentation with your automated tests! I wrote about this in &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;, which describes a pattern for introspecting code and then ensuring that the documentation at least has a section header that matches specific concepts, such as plugin hooks or configuration options.&lt;/li&gt;
&lt;/ol&gt;
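&lt;p&gt;The third point can be sketched as a tiny check. This is an illustration under assumptions, not the code from that post: &lt;code&gt;missingSections&lt;/code&gt; is a hypothetical name, the hook names in the usage note are invented, and it assumes Markdown-style headings that start with &lt;code&gt;#&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical check: which names have no matching heading in the docs?
// Assumes Markdown-style headings ("## name") purely for illustration.
function missingSections(names, docsText) {
  const headings = new Set(
    docsText
      .split("\n")
      .filter(line => line.startsWith("#"))
      .map(line => line.replace(/^#+\s*/, "").trim())
  );
  // Anything introspected from the code but absent from the headings is undocumented
  return names.filter(name => !headings.has(name));
}
```

&lt;p&gt;A test could then assert that the returned list is empty, for example after passing in names like &lt;code&gt;example_hook_one&lt;/code&gt; and &lt;code&gt;example_hook_two&lt;/code&gt; collected from the code.&lt;/p&gt;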
&lt;h4 id="create-test-data"&gt;Mechanisms for creating test data&lt;/h4&gt;
&lt;p&gt;When you work on large products, your customers will inevitably find surprising ways to stress or break your system. They might create an event with over a hundred different types of ticket for example, or an issue thread with a thousand comments.&lt;/p&gt;
&lt;p&gt;These can expose performance issues that don't affect the majority of your users, but can still lead to service outages or other problems.&lt;/p&gt;
&lt;p&gt;Your engineers need a way to replicate these situations in their own development environments.&lt;/p&gt;
&lt;p&gt;One way to handle this is to provide tooling to import production data into local environments. This has privacy and security implications - what if a developer laptop gets stolen that happens to have a copy of your largest customer's data?&lt;/p&gt;
&lt;p&gt;A better approach is to have a robust system in place for generating test data, that covers a variety of different scenarios.&lt;/p&gt;
&lt;p&gt;You might have a button somewhere that creates an issue thread with a thousand fake comments, with a note referencing the bug that this helps emulate.&lt;/p&gt;
&lt;p&gt;Any time a new edge case shows up, you can add a new recipe to that system. That way engineers can replicate problems locally without needing copies of production data.&lt;/p&gt;
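&lt;p&gt;A recipe in such a system could be as small as this. It is a minimal sketch: &lt;code&gt;makeIssueThread&lt;/code&gt;, the field names and the bug reference are all hypothetical, chosen only to illustrate the idea.&lt;/p&gt;

```javascript
// Hypothetical recipe: build an issue thread with many fake comments.
function makeIssueThread(commentCount, bugReference) {
  return {
    title: `Test thread with ${commentCount} comments`,
    // Note the bug this recipe helps emulate, so others know why it exists
    note: `Generated to reproduce ${bugReference}`,
    comments: Array.from({ length: commentCount }, (_, i) => ({
      id: i + 1,
      author: `fake-user-${(i % 10) + 1}`, // cycle through ten fake authors
      body: `Fake comment number ${i + 1}`,
    })),
  };
}
```

&lt;p&gt;Calling &lt;code&gt;makeIssueThread(1000, "issue #123")&lt;/code&gt; would produce the thousand-comment thread described above without touching any real customer data.&lt;/p&gt;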
&lt;h4 id="rock-solid-migrations"&gt;Rock solid database migrations&lt;/h4&gt;
&lt;p&gt;The hardest part of large-scale software maintenance is inevitably the bit where you need to change your database schema.&lt;/p&gt;
&lt;p&gt;(I'm confident that one of the biggest reasons NoSQL databases became popular over the last decade was the pain people had associated with relational databases due to schema changes. Of course, NoSQL database schema modifications are still necessary, and often they're even more painful!)&lt;/p&gt;
&lt;p&gt;So you need to invest in a really good, version-controlled mechanism for managing schema changes. And a way to run them in production without downtime.&lt;/p&gt;
&lt;p&gt;If you do not have this your engineers will respond by being fearful of schema changes. Which means they'll come up with increasingly complex hacks to avoid them, which piles on technical debt.&lt;/p&gt;
&lt;p&gt;This is a deep topic. I mostly use Django for large database-backed applications, and Django has the best &lt;a href="https://docs.djangoproject.com/en/4.1/topics/migrations/"&gt;migration system&lt;/a&gt; I've ever personally experienced. If I'm working without Django I try to replicate its approach as closely as possible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The database knows which migrations have already been applied. This means when you run the "migrate" command it can run just the ones that are still needed - important for managing multiple databases, e.g. production, staging, test and development environments.&lt;/li&gt;
&lt;li&gt;A single command that applies pending migrations, and updates the database rows that record which migrations have been run.&lt;/li&gt;
&lt;li&gt;Optional: rollbacks. Django migrations can be rolled back, which is great for iterating in a development environment, but using that in production is actually quite rare: I'll often ship a new migration that reverses the change rather than using a rollback, partly to keep the record of the mistake in version control.&lt;/li&gt;
&lt;/ul&gt;
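&lt;p&gt;The first two points can be sketched with an in-memory stand-in for the database. This is not Django's implementation: &lt;code&gt;applyPending&lt;/code&gt;, the &lt;code&gt;applied&lt;/code&gt; list and the migration objects are hypothetical, and a real system would store the record in a database table.&lt;/p&gt;

```javascript
// Hypothetical in-memory sketch: `db.applied` stands in for the database
// table that records which migrations have already been run.
function applyPending(db, migrations) {
  const ran = [];
  for (const migration of migrations) {
    if (db.applied.includes(migration.name)) {
      continue; // already applied to this database, so skip it
    }
    migration.up(db); // make the schema change
    db.applied.push(migration.name); // record it so it never runs twice
    ran.push(migration.name);
  }
  return ran; // names of the migrations applied by this call
}
```

&lt;p&gt;Because the record lives with each database, the same command can be run against production, staging, test and development and each one only receives the migrations it is missing.&lt;/p&gt;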
&lt;p&gt;Even harder is the challenge of making schema changes without any downtime. I'm always interested in reading about new approaches for this - GitHub's &lt;a href="https://github.com/github/gh-ost"&gt;gh-ost&lt;/a&gt; is a neat solution for MySQL.&lt;/p&gt;
&lt;p&gt;An interesting consideration here is that it's rarely possible to have application code and database schema changes go out at the exact same instant in time. As a result, to avoid downtime you need to design every schema change with this in mind. The process needs to be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Design a new schema change that can be applied without changing the application code that uses it.&lt;/li&gt;
&lt;li&gt;Ship that change to production, upgrading your database while keeping the old code working.&lt;/li&gt;
&lt;li&gt;Now ship new application code that uses the new schema.&lt;/li&gt;
&lt;li&gt;Ship a new schema change that cleans up any remaining work - dropping columns that are no longer used, for example.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This process is a pain. It's difficult to get right. The only way to get good at it is to practice it a lot over time.&lt;/p&gt;
&lt;p&gt;My rule is this: &lt;strong&gt;schema changes should be boring and common&lt;/strong&gt;, as opposed to being exciting and rare.&lt;/p&gt;
&lt;h4 id="new-project-templates"&gt;Templates for new projects and components&lt;/h4&gt;
&lt;p&gt;If you're working with microservices, your team will inevitably need to build new ones.&lt;/p&gt;
&lt;p&gt;If you're working in a monorepo, you'll still have elements of your codebase with similar structures - components and feature implementations of some sort.&lt;/p&gt;
&lt;p&gt;Be sure to have really good templates in place for creating these "the right way" - with the right directory structure, a README and a test suite with a single, dumb passing test.&lt;/p&gt;
&lt;p&gt;I like to use the Python &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; tool for this. I've also used GitHub template repositories, and I even have a neat trick for &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;combining the two&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These templates need to be maintained and kept up-to-date. The best way to do that is to make sure they are being used - every time a new project is created is a chance to revise the template and make sure it still reflects the recommended way to do things.&lt;/p&gt;
&lt;h4 id="auto-formatting"&gt;Automated code formatting&lt;/h4&gt;
&lt;p&gt;This one's easy. Pick a code formatting tool for your language - like &lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; for Python or &lt;a href="https://prettier.io/"&gt;Prettier&lt;/a&gt; for JavaScript (I'm so jealous of how Go has &lt;a href="https://pkg.go.dev/cmd/gofmt"&gt;gofmt&lt;/a&gt; built in) - and run its "check" mode in your CI flow.&lt;/p&gt;
&lt;p&gt;Don't argue with its defaults, just commit to them.&lt;/p&gt;
&lt;p&gt;This saves an incredible amount of time in two places:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As an individual, you get back all of that mental energy you used to spend thinking about the best way to format your code and can spend it on something more interesting.&lt;/li&gt;
&lt;li&gt;As a team, your code reviews can entirely skip the pedantic arguments about code formatting. Huge productivity win!&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="tested-dev-environments"&gt;Tested, automated process for new development environments&lt;/h4&gt;
&lt;p&gt;The most painful part of any software project is inevitably setting up the initial development environment.&lt;/p&gt;
&lt;p&gt;The moment your team grows beyond a couple of people, you should invest in making this work better.&lt;/p&gt;
&lt;p&gt;At the very least, you need a documented process for creating a new environment - and it has to be known-to-work, so any time someone is onboarded using it they should be encouraged to fix any problems in the documentation or accompanying scripts as they encounter them.&lt;/p&gt;
&lt;p&gt;Much better is an automated process: a single script that gets everything up and running. Tools like Docker have made this a LOT easier over the past decade.&lt;/p&gt;
&lt;p&gt;I'm increasingly convinced that the best-in-class solution here is cloud-based development environments. The ability to click a button on a web page and have a fresh, working development environment running a few seconds later is a game-changer for large development teams.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.gitpod.io/"&gt;Gitpod&lt;/a&gt; and &lt;a href="https://github.com/features/codespaces"&gt;Codespaces&lt;/a&gt; are two of the most promising tools I've tried in this space.&lt;/p&gt;
&lt;p&gt;I've seen developers lose hours a week to issues with their development environment. Eliminating that across a large team is the equivalent of hiring several new full-time engineers!&lt;/p&gt;
&lt;h4 id="automated-previews"&gt;Automated preview environments&lt;/h4&gt;
&lt;p&gt;Reviewing a pull request is a lot easier if you can actually try out the changes.&lt;/p&gt;
&lt;p&gt;The best way to do this is with automated preview environments, directly linked to from the PR itself.&lt;/p&gt;
&lt;p&gt;These are getting increasingly easy to offer. &lt;a href="https://vercel.com/features/previews"&gt;Vercel&lt;/a&gt;, &lt;a href="https://www.netlify.com/products/deploy-previews/"&gt;Netlify&lt;/a&gt;, &lt;a href="https://render.com/docs/pull-request-previews"&gt;Render&lt;/a&gt; and &lt;a href="https://devcenter.heroku.com/articles/github-integration-review-apps"&gt;Heroku&lt;/a&gt; all have features that can do this. Building a custom system on top of something like &lt;a href="https://cloud.google.com/run"&gt;Google Cloud Run&lt;/a&gt; or &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt; is also possible with a bit of work.&lt;/p&gt;
&lt;p&gt;This is another one of those things which requires some up-front investment but will pay for itself many times over through increased productivity and quality of reviews.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/version-control"&gt;version-control&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/zero-downtime"&gt;zero-downtime&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/technical-debt"&gt;technical-debt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gergely-orosz"&gt;gergely-orosz&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="software-engineering"/><category term="testing"/><category term="version-control"/><category term="zero-downtime"/><category term="github-actions"/><category term="technical-debt"/><category term="gergely-orosz"/></entry><entry><title>Writing better release notes</title><link href="https://simonwillison.net/2022/Jan/31/release-notes/#atom-series" rel="alternate"/><published>2022-01-31T20:13:50+00:00</published><updated>2022-01-31T20:13:50+00:00</updated><id>https://simonwillison.net/2022/Jan/31/release-notes/#atom-series</id><summary type="html">
    &lt;p&gt;Release notes are an important part of the open source process. I've been thinking about these a lot recently, and I've assembled some thoughts on how to do a better job with them.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Write release notes&lt;/strong&gt;. Seriously - if you want people to take advantage of the work you have been doing to improve your projects, you need to tell them about it!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Include the date&lt;/strong&gt;. The date matters a lot, because I want to be able to determine how old a release is - especially important for the dependencies I am using.&lt;/p&gt;
&lt;p&gt;It’s much more reasonable to assume people are running at least a version from 3 years ago than one that came out last week.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Make sure people can link to the release notes&lt;/strong&gt; for a version. This can be a page-per-release or an anchor target on the releases page, but it needs to be possible and it needs to be discoverable.&lt;/p&gt;
&lt;p&gt;There are some projects for which I have to view source on the HTML page to find the anchor links for the version headers - don’t make me do that!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For larger releases, &lt;strong&gt;break them up with headers&lt;/strong&gt;. “New features” vs. “Bug fixes” is a useful distinction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Emphasize the highlights&lt;/strong&gt;. It can be easy for the highlights of a larger release to get lost in a sea of bullet points. I’ll sometimes include an introductory paragraph highlighting the major themes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Link to relevant documentation&lt;/strong&gt; from the release notes. If I want to know more about a new feature that should be the best place for me to start.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flesh them out with &lt;strong&gt;examples and screenshots&lt;/strong&gt;. Most release notes are pretty dry - there’s no space limit on these things, so feel free to use all of the tools at your disposal to best answer the question “what has changed and why does this matter to me?”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Link to the associated issue thread&lt;/strong&gt;. If I want to know more about a feature, and you’ve been using issues effectively, the issue link can tell me the entire story of the new feature in more detail than anything else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Credit your contributors!&lt;/strong&gt; If someone helped build a feature the release notes are a great place to give them a shout out.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Once shipped, let people know&lt;/strong&gt;. I mainly use Twitter for this, but I also write about releases on my blog (in &lt;a href="https://simonwillison.net/tags/weeknotes/"&gt;my weeknotes&lt;/a&gt;) and push out news about major releases through &lt;a href="https://datasette.substack.com/"&gt;my project newsletter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When I tweet my release notes (&lt;a href="https://twitter.com/datasetteproj/status/1461070941862039552"&gt;recent example&lt;/a&gt;) I include both a link to them and, if they’re short, a screenshot of the release notes with an alt= attribute duplicating their content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;GitHub Releases and GitHub Actions&lt;/h4&gt;
&lt;p&gt;I really like the &lt;a href="https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository"&gt;GitHub releases feature&lt;/a&gt;. You can easily create new releases attached to tags, and each release gets its own linkable page. Release notes are written in Markdown and you can edit them later on, expanding them further and fixing any typos or errors.&lt;/p&gt;
&lt;p&gt;Releases pages also automatically link to a zip or .tar.gz file of your repository at that tag, and you can attach binary builds to that page too.&lt;/p&gt;
&lt;p&gt;Plus they have a good API, and they integrate well with GitHub Actions.&lt;/p&gt;
&lt;p&gt;I manage all of my Python package releases using this - I have &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/.github/workflows/publish.yml"&gt;an actions workflow&lt;/a&gt; which triggers on a new GitHub release, builds and packages that tag and then uploads it to PyPI - so all I have to do is click “new release” and fill in the form and my automation does the rest of the work.&lt;/p&gt;
&lt;p&gt;I use the GitHub API to show links to my most recent releases on both &lt;a href="https://datasette.io/"&gt;the Datasette homepage&lt;/a&gt; and my &lt;a href="https://github.com/simonw"&gt;personal GitHub profile&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Annotated release notes&lt;/h4&gt;
&lt;p&gt;Something I’ve been trying with my own projects is publishing &lt;a href="https://simonwillison.net/tags/annotatedreleasenotes/"&gt;annotated release notes&lt;/a&gt; to accompany the official ones.&lt;/p&gt;
&lt;p&gt;The idea here is that while the official release notes succinctly document “what changed”, the annotated ones are a place where I can provide personal notes about the new features - the background on them, what I learned along the way and my own opinions on what they are useful for and why.&lt;/p&gt;
&lt;p&gt;I enjoy writing these, but I’ve not yet got a great feel for whether people find them useful - so I’m not ready to recommend them as a thing that other projects should aim to replicate. I like them though, so if you write them for your project I will look forward to reading them!&lt;/p&gt;
&lt;h4&gt;Some examples&lt;/h4&gt;
&lt;p&gt;I'm a huge fan of &lt;a href="https://docs.djangoproject.com/en/4.0/releases/"&gt;Django's release notes&lt;/a&gt; - they're some of the best I've ever seen. It's worth exploring both &lt;a href="https://docs.djangoproject.com/en/4.0/releases/4.0.2/"&gt;minor releases&lt;/a&gt; and &lt;a href="https://docs.djangoproject.com/en/4.0/releases/4.0/"&gt;major releases&lt;/a&gt; for an example of what this looks like when it's done really well.&lt;/p&gt;
&lt;p&gt;For my own projects, I put the most effort into the release notes &lt;a href="https://docs.datasette.io/en/stable/changelog.html"&gt;for Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html"&gt;for sqlite-utils&lt;/a&gt;. Both of those projects have a dedicated page in their documentation.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-10"&gt;sqlite-utils 3.10&lt;/a&gt; and &lt;a href="https://docs.datasette.io/en/stable/changelog.html#v0-44"&gt;Datasette 0.44&lt;/a&gt; are some of my better examples.&lt;/p&gt;
&lt;p&gt;I also write about them on my blog. Here's the full series of &lt;a href="https://simonwillison.net/series/datasette-release-notes/"&gt;annotated release notes for Datasette&lt;/a&gt;, and my write-ups of &lt;a href="https://simonwillison.net/series/sqlite-utils-features/"&gt;new features added to sqlite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My smaller projects use GitHub Releases rather than having a dedicated page in their documentation. &lt;a href="https://github.com/simonw/datasette-graphql/releases/tag/0.10"&gt;datasette-graphql 0.10&lt;/a&gt; and &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.9"&gt;s3-credentials 0.9&lt;/a&gt; are two good examples there.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releasenotes"&gt;releasenotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/writing"&gt;writing&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-source"/><category term="releasenotes"/><category term="writing"/></entry><entry><title>How I build a feature</title><link href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#atom-series" rel="alternate"/><published>2022-01-12T18:10:17+00:00</published><updated>2022-01-12T18:10:17+00:00</updated><id>https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#atom-series</id><summary type="html">
    &lt;p&gt;I'm maintaining &lt;a href="https://github.com/simonw/simonw/blob/main/releases.md"&gt;a lot of different projects&lt;/a&gt; at the moment. I thought it would be useful to describe the process I use for adding a new feature to one of them, using the new &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#cli-create-database"&gt;sqlite-utils create-database&lt;/a&gt; command as an example.&lt;/p&gt;
&lt;p&gt;I like each feature to be represented by what I consider to be the &lt;strong&gt;perfect commit&lt;/strong&gt; - one that bundles together the implementation, the tests, the documentation and a link to an external issue thread.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 29th October 2022:&lt;/strong&gt; I wrote &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;more about the perfect commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sqlite-utils create-database&lt;/code&gt; command is very simple: it creates a new, empty SQLite database file. You use it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% sqlite-utils create-database empty.db
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#everything-starts-with-an-issue"&gt;Everything starts with an issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#development-environment"&gt;Development environment&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#automated-tests"&gt;Automated tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#implementing-the-feature"&gt;Implementing the feature&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#code-formatting-with-black"&gt;Code formatting with Black&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#linting"&gt;Linting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#documentation"&gt;Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#committing-the-change"&gt;Committing the change&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#branches-and-pull-requests"&gt;Branches and pull requests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#release-notes-and-a-release"&gt;Release notes, and a release&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#a-live-demo"&gt;A live demo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#tell-the-world-about-it"&gt;Tell the world about it&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/#more-examples-of-this-pattern"&gt;More examples of this pattern&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="everything-starts-with-an-issue"&gt;Everything starts with an issue&lt;/h4&gt;
&lt;p&gt;Every piece of work I do has an associated issue. This acts as ongoing work-in-progress notes and lets me record decisions, reference any research, drop in code snippets and sometimes even add screenshots and video - stuff that is really helpful but doesn't necessarily fit in code comments or commit messages.&lt;/p&gt;
&lt;p&gt;Even if it's a tiny improvement that's only a few lines of code, I'll still open an issue for it - sometimes just a few minutes before closing it again as complete.&lt;/p&gt;
&lt;p&gt;Any commits that I create that relate to an issue reference the issue number in their commit message. GitHub does a great job of automatically linking these together, bidirectionally so I can navigate from the commit to the issue or from the issue to the commit.&lt;/p&gt;
&lt;p&gt;Having an issue also gives me something I can link to from my release notes.&lt;/p&gt;
&lt;p&gt;In the case of the &lt;code&gt;create-database&lt;/code&gt; command, I opened &lt;a href="https://github.com/simonw/sqlite-utils/issues/348"&gt;this issue&lt;/a&gt; in November when I had the idea for the feature.&lt;/p&gt;
&lt;p&gt;I didn't do the work until over a month later - but because I had designed the feature in the issue comments I could get started on the implementation really quickly.&lt;/p&gt;
&lt;h4 id="development-environment"&gt;Development environment&lt;/h4&gt;
&lt;p&gt;Being able to quickly spin up a development environment for a project is crucial. All of my projects have a section in the README or the documentation describing how to do this - here's &lt;a href="https://sqlite-utils.datasette.io/en/stable/contributing.html"&gt;that section for sqlite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On my own laptop each project gets a directory, and I use &lt;code&gt;pipenv shell&lt;/code&gt; in that directory to activate a directory-specific virtual environment, then &lt;code&gt;pip install -e '.[test]'&lt;/code&gt; to install the dependencies and test dependencies.&lt;/p&gt;
&lt;h4 id="automated-tests"&gt;Automated tests&lt;/h4&gt;
&lt;p&gt;All of my features are accompanied by automated tests. This gives me the confidence to boldly make changes to the software in the future without fear of breaking any existing features.&lt;/p&gt;
&lt;p&gt;This means that writing tests needs to be as quick and easy as possible - the less friction here the better.&lt;/p&gt;
&lt;p&gt;The best way to make writing tests easy is to have a great testing framework in place from the very beginning of the project. My cookiecutter templates (&lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt;, &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; and &lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt;) all configure &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; and add a &lt;code&gt;tests/&lt;/code&gt; folder with a single passing test, to give me something to start adding tests to.&lt;/p&gt;
&lt;p&gt;I can't say enough good things about pytest. Before I adopted it, writing tests was a chore. Now it's an activity I genuinely look forward to!&lt;/p&gt;
&lt;p&gt;I'm not a religious adherent to writing the tests first - see &lt;a href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/"&gt;How to cheat at unit tests with pytest and Black&lt;/a&gt; for more thoughts on that - but I'll write the test first if it's pragmatic to do so.&lt;/p&gt;
&lt;p&gt;In the case of &lt;code&gt;create-database&lt;/code&gt;, writing the test first felt like the right thing to do. Here's the test I started with:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_create_database&lt;/span&gt;(&lt;span class="pl-s1"&gt;tmpdir&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;db_path&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;tmpdir&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-s"&gt;"test.db"&lt;/span&gt;
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; &lt;span class="pl-s1"&gt;db_path&lt;/span&gt;.&lt;span class="pl-en"&gt;exists&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;CliRunner&lt;/span&gt;().&lt;span class="pl-en"&gt;invoke&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;cli&lt;/span&gt;.&lt;span class="pl-s1"&gt;cli&lt;/span&gt;, [&lt;span class="pl-s"&gt;"create-database"&lt;/span&gt;, &lt;span class="pl-en"&gt;str&lt;/span&gt;(&lt;span class="pl-s1"&gt;db_path&lt;/span&gt;)]
    )
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt;.&lt;span class="pl-s1"&gt;exit_code&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;0&lt;/span&gt;
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;db_path&lt;/span&gt;.&lt;span class="pl-en"&gt;exists&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;This test uses the &lt;a href="https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture"&gt;tmpdir pytest fixture&lt;/a&gt; to provide a temporary directory that will be automatically cleaned up by pytest after the test run finishes.&lt;/p&gt;
&lt;p&gt;It checks that the &lt;code&gt;test.db&lt;/code&gt; file doesn't exist yet, then uses the Click framework's &lt;a href="https://click.palletsprojects.com/en/8.0.x/testing/"&gt;CliRunner utility&lt;/a&gt; to execute the create-database command. Then it checks that the command didn't throw an error and that the file has been created.&lt;/p&gt;
&lt;p&gt;Then I run the test, and watch it fail - because I haven't built the feature yet!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% pytest -k test_create_database

============ test session starts ============
platform darwin -- Python 3.8.2, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/simon/Dropbox/Development/sqlite-utils
plugins: cov-2.12.1, hypothesis-6.14.5
collected 808 items / 807 deselected / 1 selected                           

tests/test_cli.py F                                                   [100%]

================= FAILURES ==================
___________ test_create_database ____________

tmpdir = local('/private/var/folders/wr/hn3206rs1yzgq3r49bz8nvnh0000gn/T/pytest-of-simon/pytest-659/test_create_database0')

    def test_create_database(tmpdir):
        db_path = tmpdir / "test.db"
        assert not db_path.exists()
        result = CliRunner().invoke(
            cli.cli, ["create-database", str(db_path)]
        )
&amp;gt;       assert result.exit_code == 0
E       assert 1 == 0
E        +  where 1 = &amp;lt;Result SystemExit(1)&amp;gt;.exit_code

tests/test_cli.py:2097: AssertionError
========== short test summary info ==========
FAILED tests/test_cli.py::test_create_database - assert 1 == 0
===== 1 failed, 807 deselected in 0.99s ====
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-k&lt;/code&gt; option lets me run any tests that match the search string, rather than running the full test suite. I use this all the time.&lt;/p&gt;
&lt;p&gt;Other pytest features I often use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pytest -x&lt;/code&gt;: runs the entire test suite but quits at the first test that fails&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pytest --lf&lt;/code&gt;: re-runs any tests that failed during the last test run&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pytest --pdb -x&lt;/code&gt;: opens the Python debugger at the first failed test (omit the &lt;code&gt;-x&lt;/code&gt; to open it at every failed test). This is the main way I interact with the Python debugger. I often use this to help write the tests, since I can add &lt;code&gt;assert False&lt;/code&gt; and get a shell inside the test to interact with various objects and figure out how to best run assertions against them.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="implementing-the-feature"&gt;Implementing the feature&lt;/h4&gt;
&lt;p&gt;Test in place, it's time to implement the command. I added this code to my existing &lt;a href="https://github.com/simonw/sqlite-utils/blob/3.20/sqlite_utils/cli.py"&gt;cli.py module&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;cli&lt;/span&gt;.&lt;span class="pl-en"&gt;command&lt;/span&gt;(&lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"create-database"&lt;/span&gt;)&lt;/span&gt;
&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;click&lt;/span&gt;.&lt;span class="pl-en"&gt;argument&lt;/span&gt;(&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s"&gt;"path"&lt;/span&gt;,&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s1"&gt;type&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;click&lt;/span&gt;.&lt;span class="pl-v"&gt;Path&lt;/span&gt;(&lt;span class="pl-s1"&gt;file_okay&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;, &lt;span class="pl-s1"&gt;dir_okay&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;, &lt;span class="pl-s1"&gt;allow_dash&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;    &lt;span class="pl-s1"&gt;required&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;,&lt;/span&gt;
&lt;span class="pl-en"&gt;)&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;create_database&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;):
    &lt;span class="pl-s"&gt;"Create a new empty database file."&lt;/span&gt;
    &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite_utils&lt;/span&gt;.&lt;span class="pl-v"&gt;Database&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;)
    &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-en"&gt;vacuum&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;(I happen to know that the quickest way to create an empty SQLite database file is to run &lt;code&gt;VACUUM&lt;/code&gt; against it.)&lt;/p&gt;
&lt;p&gt;The test now passes!&lt;/p&gt;
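&lt;p&gt;The &lt;code&gt;VACUUM&lt;/code&gt; trick can be tried without sqlite-utils. The following is a minimal sketch using only Python's standard library &lt;code&gt;sqlite3&lt;/code&gt; module; the function names &lt;code&gt;create_empty_database()&lt;/code&gt; and &lt;code&gt;read_header()&lt;/code&gt; are made up for illustration and are not part of sqlite-utils.&lt;/p&gt;

```python
import sqlite3


def create_empty_database(path):
    """Create a new, empty SQLite database file at path (hypothetical helper)."""
    conn = sqlite3.connect(path)
    try:
        # VACUUM rebuilds the database file, which writes a valid
        # SQLite header to disk even though no tables exist yet
        conn.execute("VACUUM")
    finally:
        conn.close()


def read_header(path):
    """Return the first 16 bytes of the file, where SQLite stores its magic string."""
    with open(path, "rb") as fp:
        return fp.read(16)
```

&lt;p&gt;Calling &lt;code&gt;create_empty_database("empty.db")&lt;/code&gt; and then inspecting &lt;code&gt;read_header("empty.db")&lt;/code&gt; is one way to confirm that the file is a real SQLite database rather than a zero-byte placeholder.&lt;/p&gt;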
&lt;p&gt;I iterated on this implementation a little bit more, to add the &lt;code&gt;--enable-wal&lt;/code&gt; option I had designed &lt;a href="https://github.com/simonw/sqlite-utils/issues/348#issuecomment-983120066"&gt;in the issue comments&lt;/a&gt; - and updated the test to match. You can see the final implementation in this commit: &lt;a href="https://github.com/simonw/sqlite-utils/commit/1d64cd2e5b402ff957f9be2d9bb490d313c73989"&gt;1d64cd2e5b402ff957f9be2d9bb490d313c73989&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If I add a new test and it passes the first time, I’m always suspicious of it. I’ll deliberately break the test (change a 1 to a 2 for example) and run it again to make sure it fails, then change it back again.&lt;/p&gt;
&lt;h4 id="code-formatting-with-black"&gt;Code formatting with Black&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; has increased my productivity as a Python developer by a material amount. I used to spend a whole bunch of brain cycles agonizing over how to indent my code, where to break up long function calls and suchlike. Thanks to Black I never think about this at all - I instinctively run &lt;code&gt;black .&lt;/code&gt; in the root of my project and accept whatever style decisions it applies for me.&lt;/p&gt;
&lt;h4 id="linting"&gt;Linting&lt;/h4&gt;
&lt;p&gt;I have a few linters set up to run on every commit. I can run these locally too - how to do that is &lt;a href="https://sqlite-utils.datasette.io/en/stable/contributing.html#linting-and-formatting"&gt;documented here&lt;/a&gt; - but I'm often a bit lazy and leave them to &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/.github/workflows/test.yml"&gt;run in CI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this case one of my linters failed! I accidentally called the new command function &lt;code&gt;create_table()&lt;/code&gt; when it should have been called &lt;code&gt;create_database()&lt;/code&gt;. The code worked fine due to how the &lt;code&gt;cli.command(name=...)&lt;/code&gt; decorator works but &lt;code&gt;mypy&lt;/code&gt; &lt;a href="https://github.com/simonw/sqlite-utils/runs/4754944593?check_suite_focus=true"&gt;complained about&lt;/a&gt; the redefined function name. I fixed that in &lt;a href="https://github.com/simonw/sqlite-utils/commit/2f8879235afc6a06a8ae25ded1b2fe289ad8c3a6#diff-76294b3d4afeb27e74e738daa01c26dd4dc9ccb6f4477451483a2ece1095902e"&gt;a separate commit&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="documentation"&gt;Documentation&lt;/h4&gt;
&lt;p&gt;My policy these days is that if a feature isn't documented it doesn't exist. Adding to documentation isn't much work at all when the documentation already exists, and over time these incremental improvements add up to something really comprehensive.&lt;/p&gt;
&lt;p&gt;For smaller projects I use a single &lt;code&gt;README.md&lt;/code&gt; which gets displayed on both GitHub and PyPI (and the Datasette website too, for example on &lt;a href="https://datasette.io/tools/git-history"&gt;datasette.io/tools/git-history&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;My larger projects, such as &lt;a href="https://docs.datasette.io/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt;, use &lt;a href="https://readthedocs.org/"&gt;Read the Docs&lt;/a&gt; and &lt;a href="https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html"&gt;reStructuredText&lt;/a&gt; with &lt;a href="https://www.sphinx-doc.org/"&gt;Sphinx&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;I like reStructuredText mainly because it has really good support for internal reference links - something that is missing from Markdown, though it can be enabled using &lt;a href="https://myst-parser.readthedocs.io"&gt;MyST&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sqlite-utils&lt;/code&gt; uses Sphinx. I have the &lt;a href="https://github.com/executablebooks/sphinx-autobuild"&gt;sphinx-autobuild&lt;/a&gt; extension configured, which means I can run a live reloading server with the documentation like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd docs
make livehtml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any time I'm working on the documentation I have that server running, so I can hit "save" in VS Code and see a preview in my browser a few seconds later.&lt;/p&gt;
&lt;p&gt;For Markdown documentation I use the VS Code preview pane directly.&lt;/p&gt;
&lt;p&gt;The moment the documentation is live online, I like to add a link to it in a comment on the issue thread.&lt;/p&gt;
&lt;h4 id="committing-the-change"&gt;Committing the change&lt;/h4&gt;
&lt;p&gt;I run &lt;code&gt;git diff&lt;/code&gt; a LOT while hacking on code, to make sure I haven’t accidentally changed something unrelated. This also helps spot things like rogue &lt;code&gt;print()&lt;/code&gt; debug statements I may have added.&lt;/p&gt;
&lt;p&gt;Before my final commit, I sometimes even run &lt;code&gt;git diff | grep print&lt;/code&gt; to check for those.&lt;/p&gt;
&lt;p&gt;My goal with the commit is to bundle the test, documentation and implementation. If those are the only files I've changed I do this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git commit -a -m "sqlite-utils create-database command, closes #348"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If this completes the work on the issue I use "&lt;code&gt;closes #N&lt;/code&gt;", which causes GitHub to close the issue for me. If it's not yet ready to close I use "&lt;code&gt;refs #N&lt;/code&gt;" instead.&lt;/p&gt;
&lt;p&gt;Sometimes there will be unrelated changes in my working directory. If so, I use &lt;code&gt;git add &amp;lt;files&amp;gt;&lt;/code&gt; and then commit just with &lt;code&gt;git commit -m message&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="branches-and-pull-requests"&gt;Branches and pull requests&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;create-database&lt;/code&gt; is a good example of a feature that can be implemented in a single commit, with no need to work in a branch.&lt;/p&gt;
&lt;p&gt;For larger features, I'll work in a feature branch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git checkout -b my-feature
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I'll make a commit (often just labelled "WIP prototype, refs #N") and then push that to GitHub and open a pull request for it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git push -u origin my-feature 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I ensure the new pull request links back to the issue in its description, then switch my ongoing commentary to comments on the pull request itself.&lt;/p&gt;
&lt;p&gt;I'll sometimes add a task checklist to the opening comment on the pull request, since tasks there get reflected in the GitHub UI anywhere that links to the PR. Then I'll check those off as I complete them.&lt;/p&gt;
&lt;p&gt;An example of a PR I used like this is &lt;a href="https://github.com/simonw/sqlite-utils/pull/361"&gt;#361: --lines and --text and --convert and --import&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I don't like merge commits - I much prefer to keep my &lt;code&gt;main&lt;/code&gt; branch history as linear as possible. I usually merge my PRs through the GitHub web interface using the squash feature, which results in a single, clean commit to main with the combined tests, documentation and implementation. Occasionally I will see value in keeping the individual commits, in which case I will rebase merge them.&lt;/p&gt;
&lt;p&gt;Another goal here is to keep the &lt;code&gt;main&lt;/code&gt; branch releasable at all times. Incomplete work should stay in a branch. This makes turning around and releasing quick bug fixes a lot less stressful!&lt;/p&gt;
&lt;h4 id="release-notes-and-a-release"&gt;Release notes, and a release&lt;/h4&gt;
&lt;p&gt;A feature isn't truly finished until it's been released to &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All of my projects are configured the same way: they use GitHub releases to trigger a GitHub Actions workflow which publishes the new release to PyPI. The &lt;code&gt;sqlite-utils&lt;/code&gt; workflow for that &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/.github/workflows/publish.yml"&gt;is here in publish.yml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; templates for new projects set up this workflow for me. I just need to create a PyPI token for the project and assign it as a repository secret. See the &lt;a href="https://github.com/simonw/python-lib"&gt;python-lib cookiecutter README&lt;/a&gt; for details.&lt;/p&gt;
&lt;p&gt;To push out a new release, I need to increment the version number in &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/setup.py"&gt;setup.py&lt;/a&gt; and write the release notes.&lt;/p&gt;
&lt;p&gt;I use &lt;a href="https://semver.org/"&gt;semantic versioning&lt;/a&gt; - a new feature is a minor version bump, a breaking change is a major version bump (I try very hard to avoid these) and a bug fix or documentation-only update is a patch increment.&lt;/p&gt;
&lt;p&gt;Since &lt;code&gt;create-database&lt;/code&gt; was a new feature, it went out in &lt;a href="https://github.com/simonw/sqlite-utils/releases/3.21"&gt;release 3.21&lt;/a&gt;.&lt;/p&gt;
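&lt;p&gt;Those semantic versioning rules can be written down as a small function. This is an illustrative sketch only: &lt;code&gt;bump_version()&lt;/code&gt; and its &lt;code&gt;change&lt;/code&gt; labels are hypothetical names, and it always returns a full three-part version string, so it would produce &lt;code&gt;3.21.0&lt;/code&gt; where the release above is written as 3.21.&lt;/p&gt;

```python
def bump_version(version, change):
    """Return the next version for a kind of change (hypothetical helper).

    change is one of "breaking", "feature", "bugfix" or "docs".
    """
    parts = [int(part) for part in version.split(".")]
    # Pad short versions such as "3.20" out to major.minor.patch
    major, minor, patch = (parts + [0, 0])[:3]
    if change == "breaking":
        # Breaking change: major version bump
        return "{}.0.0".format(major + 1)
    if change == "feature":
        # New feature: minor version bump
        return "{}.{}.0".format(major, minor + 1)
    if change in ("bugfix", "docs"):
        # Bug fix or documentation-only update: patch increment
        return "{}.{}.{}".format(major, minor, patch + 1)
    raise ValueError("Unknown kind of change: {}".format(change))
```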
&lt;p&gt;My projects that use Sphinx for documentation have &lt;a href="https://github.com/simonw/sqlite-utils/blob/main/docs/changelog.rst"&gt;changelog.rst&lt;/a&gt; files in their repositories. I add the release notes there, linking to the relevant issues and cross-referencing the new documentation. Then I ship a commit that bundles the release notes with the bumped version number, with a commit message that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git commit -m "Release 3.21

Refs #348, #364, #366, #368, #371, #372, #374, #375, #376, #379"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/sqlite-utils/commit/7c637b11805adc3d3970076a7ba6afe8e34b371e"&gt;the commit for release 3.21&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Referencing the issue numbers in the release automatically adds a note to their issue threads indicating the release that they went out in.&lt;/p&gt;
&lt;p&gt;I generate that list of issue numbers by pasting the release notes into an Observable notebook I built for the purpose: &lt;a href="https://observablehq.com/@simonw/extract-issue-numbers-from-pasted-text"&gt;Extract issue numbers from pasted text&lt;/a&gt;. Observable is really great for building this kind of tiny interactive utility.&lt;/p&gt;
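&lt;p&gt;The same idea can be sketched in a few lines of Python. This is not the notebook's actual code: it assumes issue references appear in the pasted text either as &lt;code&gt;#N&lt;/code&gt; or as an &lt;code&gt;issues/N&lt;/code&gt; URL fragment, and the names &lt;code&gt;extract_issue_numbers()&lt;/code&gt; and &lt;code&gt;refs_line()&lt;/code&gt; are invented for this example.&lt;/p&gt;

```python
import re

# Assumption: issues are referenced as "#123" or ".../issues/123"
ISSUE_RE = re.compile(r"(?:#|issues/)(\d+)")


def extract_issue_numbers(text):
    """Return the unique issue numbers found in text, sorted numerically."""
    numbers = {int(match) for match in ISSUE_RE.findall(text)}
    return sorted(numbers)


def refs_line(text):
    """Build a 'Refs #1, #2' line suitable for a commit message."""
    numbers = extract_issue_numbers(text)
    return "Refs " + ", ".join("#{}".format(n) for n in numbers)
```

&lt;p&gt;For example, &lt;code&gt;refs_line("Fixed #364, see issues/348 and #364")&lt;/code&gt; returns &lt;code&gt;Refs #348, #364&lt;/code&gt;: duplicates are dropped and the numbers come out in ascending order.&lt;/p&gt;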
&lt;p&gt;For projects that just have a README I write the release notes in Markdown and paste them directly into the GitHub "new release" form.&lt;/p&gt;
&lt;p&gt;I like to duplicate the release notes to GitHub releases for my Sphinx changelog projects too. This is mainly so the &lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt; website will display the release notes on its homepage, which is populated &lt;a href="https://simonwillison.net/2020/Dec/13/datasette-io/"&gt;at build time&lt;/a&gt; using the GitHub GraphQL API.&lt;/p&gt;
&lt;p&gt;To convert my reStructuredText to Markdown I copy and paste the rendered HTML into this brilliant &lt;a href="https://euangoddard.github.io/clipboard2markdown/"&gt;Paste to Markdown&lt;/a&gt; tool by &lt;a href="https://github.com/euangoddard"&gt;Euan Goddard&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="a-live-demo"&gt;A live demo&lt;/h4&gt;
&lt;p&gt;When possible, I like to have a live demo that I can link to.&lt;/p&gt;
&lt;p&gt;This is easiest for features in Datasette core. Datasette’s main branch gets &lt;a href="https://github.com/simonw/datasette/blob/0.60a1/.github/workflows/deploy-latest.yml#L51-L73"&gt;deployed automatically&lt;/a&gt; to &lt;a href="https://latest.datasette.io/"&gt;latest.datasette.io&lt;/a&gt; so I can often link to a demo there.&lt;/p&gt;
&lt;p&gt;For Datasette plugins, I’ll deploy a fresh instance with the plugin (e.g. &lt;a href="https://datasette-graphql-demo.datasette.io/"&gt;this one for datasette-graphql&lt;/a&gt;) or (more commonly) add it to my big &lt;a href="https://latest-with-plugins.datasette.io/"&gt;latest-with-plugins.datasette.io&lt;/a&gt; instance - which tries to demonstrate what happens to Datasette if you install dozens of plugins at once (so far it works OK).&lt;/p&gt;
&lt;p&gt;Here’s a demo of the &lt;a href="https://datasette.io/plugins/datasette-copyable"&gt;datasette-copyable plugin&lt;/a&gt; running there:  &lt;a href="https://latest-with-plugins.datasette.io/github/commits.copyable"&gt;https://latest-with-plugins.datasette.io/github/commits.copyable&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="tell-the-world-about-it"&gt;Tell the world about it&lt;/h4&gt;
&lt;p&gt;The last step is to tell the world (beyond the people who meticulously read the release notes) about the new feature.&lt;/p&gt;
&lt;p&gt;Depending on the size of the feature, I might do this with a tweet &lt;a href="https://twitter.com/simonw/status/1455266746701471746"&gt;like this one&lt;/a&gt; - usually with a screenshot and a link to the documentation. I often extend this into a short Twitter thread, which gives me a chance to link to related concepts and demos or add more screenshots.&lt;/p&gt;
&lt;p&gt;For larger or more interesting features I'll blog about them. I may save this for my weekly &lt;a href="https://simonwillison.net/tags/weeknotes/"&gt;weeknotes&lt;/a&gt;, but sometimes for particularly exciting features I'll write up a dedicated blog entry. Some examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-table/"&gt;Executing advanced ALTER TABLE operations in SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Jul/30/fun-binary-data-and-sqlite/"&gt;Fun with binary data and SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/"&gt;Refactoring databases with sqlite-utils extract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;Joining CSV and JSON data with an in-memory SQLite database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/"&gt;Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I may even assemble a full set of &lt;a href="https://simonwillison.net/tags/annotatedreleasenotes/"&gt;annotated release notes&lt;/a&gt; on my blog, where I quote each item from the release in turn and provide some fleshed out examples plus background information on why I built it.&lt;/p&gt;
&lt;p&gt;If it’s a new Datasette (or Datasette-adjacent) feature, I’ll try to remember to write about it in the next edition of the &lt;a href="https://datasette.substack.com/"&gt;Datasette Newsletter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, if I learned a new trick while building a feature I might extract that into &lt;a href="https://til.simonwillison.net/"&gt;a TIL&lt;/a&gt;. If I do that I'll link to the new TIL from the issue thread.&lt;/p&gt;
&lt;h4 id="more-examples-of-this-pattern"&gt;More examples of this pattern&lt;/h4&gt;
&lt;p&gt;Here are a bunch of examples of commits that implement this pattern, combining the tests, implementation and documentation into a single unit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sqlite-utils: &lt;a href="https://github.com/simonw/sqlite-utils/commit/324ebc31308752004fe5f7e4941fc83706c5539c"&gt;adding --limit and --offset to sqlite-utils rows&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;sqlite-utils: &lt;a href="https://github.com/simonw/sqlite-utils/commit/d83b2568131f2b1cc01228419bb08c96d843d65d"&gt;--where and -p options for sqlite-utils convert&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;s3-credentials: &lt;a href="https://github.com/simonw/s3-credentials/commit/905258379817e8b458528e4ccc5e6cc2c8cf4352"&gt;s3-credentials policy command&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette: &lt;a href="https://github.com/simonw/datasette/commit/5cadc244895fc47e0534c6e90df976d34293921e"&gt;db.execute_write_script() and db.execute_write_many()&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette: &lt;a href="https://github.com/simonw/datasette/commit/992496f2611a72bd51e94bfd0b17c1d84e732487"&gt;?_nosuggest=1 parameter for table views&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;datasette-graphql: &lt;a href="https://github.com/simonw/datasette-graphql/commit/2d8c042e93e3429c5b187121d26f8817997073dd"&gt;GraphQL execution limits: time_limit_ms and num_queries_limit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/black"&gt;black&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="git"/><category term="github"/><category term="software-engineering"/><category term="testing"/><category term="pytest"/><category term="black"/><category term="read-the-docs"/><category term="github-issues"/></entry><entry><title>How to build, test and publish an open source Python library</title><link href="https://simonwillison.net/2021/Nov/4/publish-open-source-python-library/#atom-series" rel="alternate"/><published>2021-11-04T22:02:03+00:00</published><updated>2021-11-04T22:02:03+00:00</updated><id>https://simonwillison.net/2021/Nov/4/publish-open-source-python-library/#atom-series</id><summary type="html">
    &lt;p&gt;At &lt;a href="https://2021.pygotham.tv/talks/how-to-build-test-and-publish-an-open-source-python-library/"&gt;PyGotham&lt;/a&gt; this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to &lt;a href="https://pypi.org/"&gt;the Python Package Index&lt;/a&gt;. Here is &lt;a href="https://www.youtube.com/watch?v=VMnLXynUqys"&gt;the video&lt;/a&gt; and accompanying notes, which should make sense even without watching the talk.&lt;/p&gt;
&lt;h4&gt;The video&lt;/h4&gt;
&lt;iframe style="max-width: 100%" width="560" height="315" src="https://www.youtube-nocookie.com/embed/VMnLXynUqys" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;
&lt;p&gt;PyGotham arrange for sign language interpretation for all of their talks, which is &lt;em&gt;really&lt;/em&gt; cool. Since those take up a portion of the screen (and YouTube don't yet have a way to apply them as a different layer) I've also made available a copy of &lt;a href="https://www.youtube.com/watch?v=0SPqMR08VWU"&gt;the original video&lt;/a&gt; without the sign language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 29th July 2022:&lt;/strong&gt; There is a more modern way to create Python packages that uses &lt;code&gt;pyproject.toml&lt;/code&gt; files instead of &lt;code&gt;setup.py&lt;/code&gt;. This &lt;a href="https://packaging.python.org/en/latest/tutorials/packaging-projects/"&gt;tutorial on Packaging Python Projects&lt;/a&gt; from the PyPA shows how to do this in detail.&lt;/p&gt;
&lt;h4&gt;Packaging a single module&lt;/h4&gt;
&lt;p&gt;I used &lt;a href="https://github.com/CAVaccineInventory/vial/blob/5c5f7eb344d28afb91e947243a6f96b337ea0ce2/vaccinate/core/baseconverter.py"&gt;this code&lt;/a&gt; which I've been copying and pasting between my own projects for over a decade.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;BaseConverter&lt;/code&gt; is a simple class that can convert an integer to a shortened character string and back again, for example:&lt;/p&gt;
&lt;div class="highlight highlight-text-python-console"&gt;&lt;pre&gt;&amp;gt;&amp;gt;&amp;gt; pid &lt;span class="pl-k"&gt;=&lt;/span&gt; BaseConverter(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;bcdfghkmpqrtwxyz&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;)
&amp;gt;&amp;gt;&amp;gt; pid.from_int(&lt;span class="pl-c1"&gt;1234&lt;/span&gt;)
"gxd"
&amp;gt;&amp;gt;&amp;gt; pid.to_int(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;gxd&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;)
1234&lt;/pre&gt;&lt;/div&gt;
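&lt;p&gt;For readers who don't want to open the linked file, here is a minimal sketch of how a class like this can work. It is an illustration of plain positional base conversion, not a copy of the linked implementation, and it assumes only non-negative integers need to be supported:&lt;/p&gt;
&lt;pre&gt;
```python
class BaseConverter:
    """Convert non-negative integers to short strings over a custom alphabet."""

    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.base = len(alphabet)

    def from_int(self, number):
        # Assumption for this sketch: negative numbers are rejected.
        if number != abs(number):
            raise ValueError("only non-negative integers are supported")
        if number == 0:
            return self.alphabet[0]
        chars = []
        while number:
            # Peel off the least significant digit each time round the loop.
            number, remainder = divmod(number, self.base)
            chars.append(self.alphabet[remainder])
        # Digits were collected least significant first, so reverse them.
        return "".join(reversed(chars))

    def to_int(self, text):
        number = 0
        for char in text:
            number = number * self.base + self.alphabet.index(char)
        return number


pid = BaseConverter("bcdfghkmpqrtwxyz")
print(pid.from_int(1234))  # gxd
print(pid.to_int("gxd"))  # 1234
```
&lt;/pre&gt;
&lt;p&gt;With a 16 character alphabet this is base 16 with custom digits: 1234 is 4, 13, 2 in base 16, and those positions in the alphabet are &lt;code&gt;g&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;d&lt;/code&gt;.&lt;/p&gt;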
&lt;p&gt;To turn this into a library, first I created a &lt;code&gt;pids/&lt;/code&gt; directory (I chose that name because it was available on PyPI).&lt;/p&gt;
&lt;p&gt;To turn that code into a package:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkdir pids &amp;amp;&amp;amp; cd pids&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create &lt;code&gt;pids.py&lt;/code&gt; in that directory with the contents of &lt;a href="https://github.com/CAVaccineInventory/vial/blob/main/vaccinate/core/baseconverter.py"&gt;this file&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a new &lt;code&gt;setup.py&lt;/code&gt; file in that folder containing the following:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;setuptools&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;setup&lt;/span&gt;

&lt;span class="pl-en"&gt;setup&lt;/span&gt;(
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"pids"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;version&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"0.1"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;description&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"A tiny Python library for generating public IDs from integers"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;author&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Simon Willison"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;url&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"https://github.com/simonw/..."&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;license&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Apache License, Version 2.0"&lt;/span&gt;,    
    &lt;span class="pl-s1"&gt;py_modules&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[&lt;span class="pl-s"&gt;"pids"&lt;/span&gt;],
)&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;python3 setup.py sdist&lt;/code&gt; to create the packaged source distribution, &lt;code&gt;dist/pids-0.1.tar.gz&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;(I've since learned that it's better to run &lt;code&gt;python3 -m build&lt;/code&gt; here instead, see &lt;a href="https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html"&gt;Why you shouldn't invoke setup.py directly&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is all it takes: a &lt;code&gt;setup.py&lt;/code&gt; file with some metadata, then a single command to turn that into a packaged &lt;code&gt;.tar.gz&lt;/code&gt; file.&lt;/p&gt;
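&lt;p&gt;A packaged &lt;code&gt;.tar.gz&lt;/code&gt; is an ordinary gzipped tar archive, so it can be inspected with Python's standard &lt;code&gt;tarfile&lt;/code&gt; module. The sketch below is an optional sanity check, not part of the workshop: the &lt;code&gt;list_sdist_files()&lt;/code&gt; helper is a hypothetical name, and the demo builds a throwaway archive with placeholder file names rather than reading a real sdist:&lt;/p&gt;
&lt;pre&gt;
```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_sdist_files(archive_path):
    """Return the sorted member names stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Demo: build a throwaway archive, then list it.
# The member names are illustrative, not the exact layout of a real sdist.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = Path(tmp) / "demo-0.1.tar.gz"
    with tarfile.open(demo_archive, "w:gz") as archive:
        for name in ("demo-0.1/setup.py", "demo-0.1/demo.py"):
            data = b"# placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    demo_files = list_sdist_files(demo_archive)

print(demo_files)
```
&lt;/pre&gt;
&lt;p&gt;Pointing the same helper at &lt;code&gt;dist/pids-0.1.tar.gz&lt;/code&gt; should show which files made it into the package.&lt;/p&gt;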
&lt;h4&gt;Testing it in a Jupyter notebook&lt;/h4&gt;
&lt;p&gt;Having created that file, I demonstrated how it can be tested in a Jupyter notebook.&lt;/p&gt;
&lt;p&gt;Jupyter has a &lt;code&gt;%pip&lt;/code&gt; magic command which runs &lt;code&gt;pip&lt;/code&gt; to install a package into the same environment as the current Jupyter kernel:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;%pip install /Users/simon/Dropbox/Presentations/2021/pygotham/pids/dist/pids-0.1.tar.gz
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Having done this, I could import and use the library like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import pids
&amp;gt;&amp;gt;&amp;gt; pids.pid.from_int(1234)
'gxd'
&lt;/code&gt;&lt;/pre&gt;
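&lt;p&gt;To double check which version actually got installed into the environment, the standard library's &lt;code&gt;importlib.metadata&lt;/code&gt; module (Python 3.8 and later) can look it up. This is a small sketch rather than something shown in the talk, and &lt;code&gt;installed_version()&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;
&lt;pre&gt;
```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if pids is installed in this environment, else None.
print(installed_version("pids"))
```
&lt;/pre&gt;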
&lt;h4&gt;Uploading the package to PyPI&lt;/h4&gt;
&lt;p&gt;I used &lt;a href="https://pypi.org/project/twine/"&gt;twine&lt;/a&gt; (&lt;code&gt;pip install twine&lt;/code&gt;) to upload my new package to &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You need to create a PyPI account before running this command.&lt;/p&gt;
&lt;p&gt;By default you need to paste in the PyPI account's username and password:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% twine upload dist/pids-0.1.tar.gz
Uploading distributions to https://upload.pypi.org/legacy/
Enter your username: simonw
Enter your password: 
Uploading pids-0.1.tar.gz
100%|██████████████████████████████████████| 4.16k/4.16k [00:00&amp;lt;00:00, 4.56kB/s]

View at:
https://pypi.org/project/pids/0.1/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The release is now live at &lt;a href="https://pypi.org/project/pids/0.1/"&gt;https://pypi.org/project/pids/0.1/&lt;/a&gt; - and anyone can run &lt;code&gt;pip install pids&lt;/code&gt; to install it.&lt;/p&gt;
&lt;h4&gt;Adding documentation&lt;/h4&gt;
&lt;p&gt;If you visit the &lt;a href="https://pypi.org/project/pids/0.1/"&gt;0.1 release page&lt;/a&gt; you'll see the following message:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The author of this package has not provided a project description&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To fix this I added a &lt;code&gt;README.md&lt;/code&gt; file with some basic documentation, written in Markdown:&lt;/p&gt;
&lt;div class="highlight highlight-source-gfm"&gt;&lt;pre&gt;&lt;span class="pl-mh"&gt;&lt;span class="pl-mh"&gt;#&lt;/span&gt;&lt;span class="pl-mh"&gt; &lt;/span&gt;pids&lt;/span&gt;

Create short public identifiers based on integer IDs.

&lt;span class="pl-mh"&gt;&lt;span class="pl-mh"&gt;##&lt;/span&gt;&lt;span class="pl-mh"&gt; &lt;/span&gt;Installation&lt;/span&gt;

    pip install pids

&lt;span class="pl-mh"&gt;&lt;span class="pl-mh"&gt;##&lt;/span&gt;&lt;span class="pl-mh"&gt; &lt;/span&gt;Usage&lt;/span&gt;

    from pids import pid
    public_id = pid.from_int(1234)
    # public_id is now "gxd"
    id = pid.to_int("gxd")
    # id is now 1234&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I modified my &lt;code&gt;setup.py&lt;/code&gt; file to look like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;setuptools&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;setup&lt;/span&gt;
&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;os&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;get_long_description&lt;/span&gt;():
    &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;open&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-s1"&gt;path&lt;/span&gt;.&lt;span class="pl-en"&gt;join&lt;/span&gt;(&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-s1"&gt;path&lt;/span&gt;.&lt;span class="pl-en"&gt;dirname&lt;/span&gt;(&lt;span class="pl-s1"&gt;__file__&lt;/span&gt;), &lt;span class="pl-s"&gt;"README.md"&lt;/span&gt;),
        &lt;span class="pl-s1"&gt;encoding&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"utf8"&lt;/span&gt;,
    ) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;fp&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;fp&lt;/span&gt;.&lt;span class="pl-en"&gt;read&lt;/span&gt;()

&lt;span class="pl-en"&gt;setup&lt;/span&gt;(
    &lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"pids"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;version&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"0.1.1"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;long_description&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-en"&gt;get_long_description&lt;/span&gt;(),
    &lt;span class="pl-s1"&gt;long_description_content_type&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"text/markdown"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;description&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"A tiny Python library for generating public IDs from integers"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;author&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Simon Willison"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;url&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"https://github.com/simonw/..."&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;license&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Apache License, Version 2.0"&lt;/span&gt;,    
    &lt;span class="pl-s1"&gt;py_modules&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[&lt;span class="pl-s"&gt;"pids"&lt;/span&gt;],
)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;get_long_description()&lt;/code&gt; function reads that &lt;code&gt;README.md&lt;/code&gt; file into a Python string.&lt;/p&gt;
&lt;p&gt;The following two extra arguments to &lt;code&gt;setup()&lt;/code&gt; add that as metadata visible to PyPI:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;long_description&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-en"&gt;get_long_description&lt;/span&gt;(),
&lt;span class="pl-s1"&gt;long_description_content_type&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"text/markdown"&lt;/span&gt;,&lt;/pre&gt;
&lt;p&gt;I also updated the version number to &lt;code&gt;0.1.1&lt;/code&gt; in preparation for a new release.&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;python3 setup.py sdist&lt;/code&gt; created a new file called &lt;code&gt;dist/pids-0.1.1.tar.gz&lt;/code&gt; - I then uploaded that file using &lt;code&gt;twine upload dist/pids-0.1.1.tar.gz&lt;/code&gt; which created a new release with a visible README at &lt;a href="https://pypi.org/project/pids/0.1.1/"&gt;https://pypi.org/project/pids/0.1.1/&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Adding some tests&lt;/h4&gt;
&lt;p&gt;I like using &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; for tests, so I added that as a test dependency by modifying &lt;code&gt;setup.py&lt;/code&gt; to add the following line:&lt;/p&gt;
&lt;pre&gt;    &lt;span class="pl-s1"&gt;extras_require&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;{&lt;span class="pl-s"&gt;"test"&lt;/span&gt;: [&lt;span class="pl-s"&gt;"pytest"&lt;/span&gt;]},&lt;/pre&gt;
&lt;p&gt;Next, I created a virtual environment and installed my package and its test dependencies in "editable" mode like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create and activate environment
python3 -m venv venv
source venv/bin/activate
# Install editable module, plus pytest
pip install -e '.[test]'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can run the tests!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(venv) pids % pytest
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /Users/simon/Dropbox/Presentations/2021/pygotham/pids
collected 0 items                                                              

============================ no tests ran in 0.01s =============================
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There aren't any tests yet. I created a &lt;code&gt;tests/&lt;/code&gt; folder and dropped in a &lt;code&gt;test_pids.py&lt;/code&gt; file that looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pytest&lt;/span&gt;
&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pids&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_from_int&lt;/span&gt;():
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;pids&lt;/span&gt;.&lt;span class="pl-s1"&gt;pid&lt;/span&gt;.&lt;span class="pl-en"&gt;from_int&lt;/span&gt;(&lt;span class="pl-c1"&gt;1234&lt;/span&gt;) &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s"&gt;"gxd"&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_to_int&lt;/span&gt;():
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;pids&lt;/span&gt;.&lt;span class="pl-s1"&gt;pid&lt;/span&gt;.&lt;span class="pl-en"&gt;to_int&lt;/span&gt;(&lt;span class="pl-s"&gt;"gxd"&lt;/span&gt;) &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;1234&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; in the project directory now runs those tests:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(venv) pids % pytest
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /Users/simon/Dropbox/Presentations/2021/pygotham/pids
collected 2 items                                                              

tests/test_pids.py ..                                                    [100%]

============================== 2 passed in 0.01s ===============================
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Creating a GitHub repository&lt;/h4&gt;
&lt;p&gt;Time to publish the source code on GitHub.&lt;/p&gt;
&lt;p&gt;I created a repository using the form at &lt;a href="https://github.com/new"&gt;https://github.com/new&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Having created the &lt;a href="https://github.com/simonw/pids"&gt;simonw/pids&lt;/a&gt; repository, I ran the following commands locally to push my code to it (mostly copy and pasted from the GitHub example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git init
git add README.md pids.py setup.py tests/test_pids.py
git commit -m "first commit"
git branch -M main
git remote add origin git@github.com:simonw/pids.git
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Running the tests with GitHub Actions&lt;/h4&gt;
&lt;p&gt;I copied in a &lt;code&gt;.github/workflows&lt;/code&gt; folder from &lt;a href="https://github.com/simonw/sqlite-explain/tree/main/.github/workflows"&gt;another project&lt;/a&gt; with two files, &lt;code&gt;test.yml&lt;/code&gt; and &lt;code&gt;publish.yml&lt;/code&gt;. The &lt;code&gt;.github/workflows/test.yml&lt;/code&gt; file contained this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Test&lt;/span&gt;

&lt;span class="pl-ent"&gt;on&lt;/span&gt;: &lt;span class="pl-s"&gt;[push]&lt;/span&gt;

&lt;span class="pl-ent"&gt;jobs&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;test&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;strategy&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;matrix&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;[3.6, 3.7, 3.8, 3.9]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v2&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python ${{ matrix.python-version }}&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ matrix.python-version }}&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Configure pip caching&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}&lt;/span&gt;
        &lt;span class="pl-ent"&gt;restore-keys&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          ${{ runner.os }}-pip-&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install -e '.[test]'&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Run tests&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pytest&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;matrix&lt;/code&gt; block there causes the job to run four times, on four different versions of Python.&lt;/p&gt;
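&lt;p&gt;Conceptually, a matrix expands into one job per combination of its values. The following sketch models that expansion in Python. It is an illustration of the idea, not how GitHub Actions is implemented, and &lt;code&gt;expand_matrix()&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;
&lt;pre&gt;
```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of the matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


jobs = expand_matrix({"python-version": ["3.6", "3.7", "3.8", "3.9"]})
for job in jobs:
    print(job)
print(len(jobs))  # 4
```
&lt;/pre&gt;
&lt;p&gt;With a single key the result is just one job per listed version. Adding a second key, such as a list of operating systems, would multiply the number of jobs.&lt;/p&gt;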
&lt;p&gt;The action steps do the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check out the current repository&lt;/li&gt;
&lt;li&gt;Install the specified Python version&lt;/li&gt;
&lt;li&gt;Configure GitHub's action caching mechanism for the &lt;code&gt;~/.cache/pip&lt;/code&gt; directory - this avoids installing the same files from PyPI over the internet the next time the workflow runs&lt;/li&gt;
&lt;li&gt;Install the test dependencies&lt;/li&gt;
&lt;li&gt;Run the tests&lt;/li&gt;
&lt;/ol&gt;
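&lt;p&gt;The cache key in that workflow combines the runner's operating system with a hash of &lt;code&gt;setup.py&lt;/code&gt;, so editing &lt;code&gt;setup.py&lt;/code&gt; (for example to add a dependency) produces a new key. The sketch below shows the general idea in Python. It is only an analogy: it uses a plain SHA-256 of one file, which is an assumption and not necessarily the exact value GitHub's &lt;code&gt;hashFiles()&lt;/code&gt; produces, and &lt;code&gt;pip_cache_key()&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib
import tempfile
from pathlib import Path


def pip_cache_key(runner_os, setup_py_path):
    """Build a key like 'Linux-pip-HASH' from the contents of setup.py."""
    digest = hashlib.sha256(Path(setup_py_path).read_bytes()).hexdigest()
    return f"{runner_os}-pip-{digest}"


with tempfile.TemporaryDirectory() as tmp:
    setup_py = Path(tmp) / "setup.py"
    setup_py.write_text('version="0.1"\n')
    key_before = pip_cache_key("Linux", setup_py)
    # Any change to the file contents changes the hash, and so the key.
    setup_py.write_text('version="0.1.1"\n')
    key_after = pip_cache_key("Linux", setup_py)

print(key_before != key_after)  # True
```
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;restore-keys&lt;/code&gt; entry acts as a fallback prefix, so an older cache can still be reused as a starting point when the exact key does not match.&lt;/p&gt;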
&lt;p&gt;I added and pushed these new files:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git add .github
git commit -m "GitHub Actions"
git push
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/pids/actions"&gt;Actions tab&lt;/a&gt; in my repository instantly ran the test suite, and when it passed added a green checkmark to &lt;a href="https://github.com/simonw/pids/commit/93029ab191c00ab8274a1b88548d279b45411c3a"&gt;my commit&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Publishing a new release using GitHub&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;.github/workflows/publish.yml&lt;/code&gt; workflow is triggered by new GitHub releases. It tests them and then, if the tests pass, publishes them up to PyPI using &lt;code&gt;twine&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The workflow looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Publish Python Package&lt;/span&gt;

&lt;span class="pl-ent"&gt;on&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;release&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;types&lt;/span&gt;: &lt;span class="pl-s"&gt;[created]&lt;/span&gt;

&lt;span class="pl-ent"&gt;jobs&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;test&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;strategy&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;matrix&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;[3.6, 3.7, 3.8, 3.9]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v2&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python ${{ matrix.python-version }}&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ matrix.python-version }}&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Configure pip caching&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}&lt;/span&gt;
        &lt;span class="pl-ent"&gt;restore-keys&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          ${{ runner.os }}-pip-&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install -e '.[test]'&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Run tests&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pytest&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;deploy&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;needs&lt;/span&gt;: &lt;span class="pl-s"&gt;[test]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v2&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;3.9&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Configure pip caching&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-publish-pip-${{ hashFiles('**/setup.py') }}&lt;/span&gt;
        &lt;span class="pl-ent"&gt;restore-keys&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          ${{ runner.os }}-publish-pip-&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install setuptools wheel twine&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Publish&lt;/span&gt;
      &lt;span class="pl-ent"&gt;env&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;TWINE_USERNAME&lt;/span&gt;: &lt;span class="pl-s"&gt;__token__&lt;/span&gt;
        &lt;span class="pl-ent"&gt;TWINE_PASSWORD&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ secrets.PYPI_TOKEN }}&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        python setup.py sdist bdist_wheel&lt;/span&gt;
&lt;span class="pl-s"&gt;        twine upload dist/*&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It contains two jobs: the &lt;code&gt;test&lt;/code&gt; job runs the tests again - we should never publish a package without first ensuring that the test suite passes - and then the &lt;code&gt;deploy&lt;/code&gt; job runs &lt;code&gt;python setup.py sdist bdist_wheel&lt;/code&gt; followed by &lt;code&gt;twine upload dist/*&lt;/code&gt; to upload the resulting packages.&lt;/p&gt;
&lt;p&gt;(My &lt;a href="https://github.com/simonw/python-lib/blob/6ca0a8fea6986f076d88cfcf9cb6dce782b53098/%7B%7Bcookiecutter.hyphenated%7D%7D/.github/workflows/publish.yml"&gt;latest version of this&lt;/a&gt; uses &lt;code&gt;python3 -m build&lt;/code&gt; here instead.)&lt;/p&gt;
&lt;p&gt;Before publishing a package with this action, I needed to create a &lt;code&gt;PYPI_TOKEN&lt;/code&gt; that the action could use to authenticate with my PyPI account.&lt;/p&gt;
&lt;p&gt;I used the &lt;a href="https://pypi.org/manage/account/token/"&gt;https://pypi.org/manage/account/token/&lt;/a&gt; page to create that token:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the PyPI interface for creating an API token" src="https://static.simonwillison.net/static/2021/pygotham-add-api-token.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I copied out the newly created token:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the PyPI interface for copying an API token" src="https://static.simonwillison.net/static/2021/pygotham-api-token.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Then I used the "Settings -&amp;gt; Secrets" tab on the GitHub repository to add that as a secret called &lt;code&gt;PYPI_TOKEN&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the GitHub interface for adding a repository secret" src="https://static.simonwillison.net/static/2021/pygotham-new-secret.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;(I have since revoked the token that I used in the video, since it is visible on screen to anyone watching.)&lt;/p&gt;
&lt;p&gt;I used the GitHub web interface to edit &lt;code&gt;setup.py&lt;/code&gt; to bump the version number in that file up to &lt;code&gt;0.1.2&lt;/code&gt;, then I navigated to the &lt;a href="https://github.com/simonw/pids/releases"&gt;releases tab&lt;/a&gt; in the repository, clicked "Draft new release" and created a release that would create a new &lt;code&gt;0.1.2&lt;/code&gt; tag as part of the release process.&lt;/p&gt;
&lt;p&gt;When I published the release, the &lt;code&gt;publish.yml&lt;/code&gt; action started to run. After the tests had passed it pushed the new release to PyPI: &lt;a href="https://pypi.org/project/pids/0.1.2/"&gt;https://pypi.org/project/pids/0.1.2/&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="bonus-cookiecutter"&gt;Bonus: cookiecutter templates&lt;/h4&gt;
&lt;p&gt;I've published a &lt;em&gt;lot&lt;/em&gt; of packages using this process - &lt;a href="https://pypi.org/user/simonw/"&gt;143 and counting&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Rather than copy and paste in a &lt;code&gt;setup.py&lt;/code&gt; each time, a couple of years ago I switched over to using &lt;a href="https://github.com/cookiecutter/cookiecutter"&gt;cookiecutter&lt;/a&gt; templates.&lt;/p&gt;
&lt;p&gt;I have three templates that I use today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/python-lib"&gt;python-lib&lt;/a&gt; for standalone Python libraries&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; for &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; plugins&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt; for CLI applications built using the &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; package&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Back in August I figured out a way to make these available as GitHub repository templates, which I described in &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt;. This means you can create a new GitHub repository that implements the &lt;code&gt;test.yml&lt;/code&gt; and &lt;code&gt;publish.yml&lt;/code&gt; pattern described in this talk with just a few clicks on the GitHub website.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="open-source"/><category term="pypi"/><category term="python"/><category term="my-talks"/><category term="github-actions"/><category term="annotated-talks"/></entry><entry><title>Open source projects: consider running office hours</title><link href="https://simonwillison.net/2021/Feb/19/office-hours/#atom-series" rel="alternate"/><published>2021-02-19T21:54:22+00:00</published><updated>2021-02-19T21:54:22+00:00</updated><id>https://simonwillison.net/2021/Feb/19/office-hours/#atom-series</id><summary type="html">
    &lt;p&gt;Back in December I decided to try something new for my &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; open source project: &lt;a href="https://calendly.com/swillison/datasette-office-hours"&gt;Datasette Office Hours&lt;/a&gt;. The idea is simple: anyone can book a 25 minute conversation with me on a Friday to talk about the project. I’m interested in talking to people who are using Datasette, or who are considering using it, or who just want to have a chat.&lt;/p&gt;
&lt;p&gt;I’ve now had 35 conversations and it’s been absolutely fantastic. I’ve talked to people in Iceland, Burundi, Finland, Singapore, Bulgaria and dozens of other places around the world. I’ve seen my software applied to applications ranging from historic cemetery records to library collections to open city data. It’s been thrilling.&lt;/p&gt;
&lt;p&gt;I’d like to encourage more open source project maintainers to consider doing something similar.&lt;/p&gt;

&lt;p&gt;("Office hours" is a term used at some universities for periods of time when you can drop in to talk with a lecturer. In Germany they use the term "Sprechstunde" for "speaking hours" which I think is better!)&lt;/p&gt;

&lt;h4&gt;Reasons to do this&lt;/h4&gt;
&lt;p&gt;A challenge of open source is that it's easy to be starved of feedback. People might file bug reports if something breaks, but other than that it can feel like publishing software into a void.&lt;/p&gt;
&lt;p&gt;Hearing directly from people who are using your stuff is incredibly motivational. It’s also an amazing source of ideas and feedback on where the project should go next.&lt;/p&gt;
&lt;p&gt;In the startup world “talk to your users and potential customers” is advice that becomes a constant drumbeat… because it’s really effective, but it’s also hard to work up the courage to do!&lt;/p&gt;
&lt;p&gt;Talking to users in open source is similarly valuable. And it turns out, especially in these pandemic times, people really do want to talk to you. Office hours is an extremely low-friction way of putting up a sign that says “let’s have a conversation”.&lt;/p&gt;
&lt;h4&gt;How I do this: 25 minute slots, via Calendly&lt;/h4&gt;
&lt;p&gt;I’m using &lt;a href="https://calendly.com/"&gt;Calendly&lt;/a&gt; to make 20 minute slots available every Friday between 9am and 5pm Pacific Time (with 12:30-1:30 set aside for lunch), with a ten minute buffer between slots.&lt;/p&gt;
&lt;p&gt;In practice, I treat these as 25 minute slots. This gives me a five minute break in between conversations, and also means it’s possible to stretch to 30 minutes if we get to a key topic just before the time slot ends.&lt;/p&gt;
&lt;p&gt;I configured Calendly to allow a maximum of five bookings on any Friday. This feels right to me - conversations with five different people can be pretty mentally tiring, and cutting off after five still gives me a good chance to get other work done during the day.&lt;/p&gt;
&lt;p&gt;I use Calendly’s Zoom integration, which automatically sends out a calendar invite to both myself and my conversation partner and schedules a Zoom room that’s linked to from the invite. All I have to do is click the link at the appropriate time.&lt;/p&gt;
&lt;p&gt;About one in fifteen conversations ends up cancelled. That’s completely fine - I get half an hour of my day back and we can usually reschedule for another week.&lt;/p&gt;
&lt;h4&gt;How about making some money?&lt;/h4&gt;
&lt;p&gt;I’ve been having some &lt;a href="https://twitter.com/simonw/status/1361752269700489216"&gt;fascinating conversations&lt;/a&gt; on Twitter recently about the challenges of taking an open source project and turning it into a full-time job, earning a salary good enough to avoid the siren call of working for a FAANG company.&lt;/p&gt;
&lt;p&gt;People pointed me to a few good examples of open source maintainers who charge for video conference consulting sessions - &lt;a href="https://benjie.dev/"&gt;Graphile’s Benjie&lt;/a&gt; and &lt;a href="https://calendly.com/csswizardry/consultation"&gt;CSS Wizardry's Harry Roberts&lt;/a&gt; both let you book paid sessions with them directly.&lt;/p&gt;
&lt;p&gt;I really like this as an opportunity for earning money against an open source project, and I think it could complement office hours nicely: 25 free minutes on a Friday, on a first-come, first-served basis, could then up-sell to a 1.5 hour paid consulting session, which could then lead to larger consulting contracts.&lt;/p&gt;
&lt;h4&gt;Try this yourself&lt;/h4&gt;
&lt;p&gt;If you're tempted to try office hours for your own project, getting started is easy. I'm using the Calendly free plan, but their paid plans (which include the ability to attach Stripe or PayPal payments to bookings) are reasonably priced. I've been promoting my sessions via Twitter, the &lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt; website and the &lt;a href="https://datasette.substack.com/"&gt;Datasette Newsletter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is one of those ideas I wish I'd had sooner. It's quickly become a highlight of my week.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 5th March 2021&lt;/strong&gt;: The original headline of this piece was "Open source projects should run office hours". This was &lt;a href="https://news.ycombinator.com/item?id=26351053"&gt;being misinterpreted&lt;/a&gt; as yet another demand for free labor from open source maintainers, so I changed it to "Open source projects: consider running office hours" - less pithy, but it better reflects my actual message here.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-source"/><category term="datasette"/></entry><entry><title>How to cheat at unit tests with pytest and Black</title><link href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/#atom-series" rel="alternate"/><published>2020-02-11T06:56:55+00:00</published><updated>2020-02-11T06:56:55+00:00</updated><id>https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/#atom-series</id><summary type="html">
    &lt;p&gt;I’ve been making a lot of progress on &lt;a href="https://simonwillison.net/tags/datasettecloud/"&gt;Datasette Cloud&lt;/a&gt; this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.&lt;/p&gt;
&lt;p&gt;The one thing I’ve learned about permissions code over the years is that it absolutely warrants comprehensive unit tests. This is not code that can afford to have dumb bugs, or regressions caused by future development!&lt;/p&gt;
&lt;p&gt;I’ve become a big proponent of &lt;a href="https://docs.pytest.org/en"&gt;pytest&lt;/a&gt; over the past two years, but this is the first Django project that I’ve built using pytest from day one as opposed to relying on the Django test runner. It’s been a great opportunity to try out &lt;a href="https://pytest-django.readthedocs.io/"&gt;pytest-django&lt;/a&gt;, and I’m really impressed with it. It maintains my favourite things about Django’s test framework - smart usage of database transactions to reset the database and a handy &lt;a href="https://docs.djangoproject.com/en/3.0/topics/testing/tools/#the-test-client"&gt;test client object&lt;/a&gt; for sending fake HTTP requests - and adds all of that pytest magic that &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#Taking_advantage_of_pytest_78"&gt;I’ve grown to love&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It also means I get to use my favourite trick for productively writing unit tests: the combination of pytest and &lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt;, the “uncompromising Python code formatter”.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Cheating_at_unit_tests_10"&gt;&lt;/a&gt;Cheating at unit tests&lt;/h3&gt;
&lt;p&gt;In pure test-driven development you write the tests first, and don’t start on the implementation until you’ve watched them fail.&lt;/p&gt;
&lt;p&gt;Most of the time I find that this is a net loss on productivity. I tend to prototype my way to solutions, so I often find myself with rough running code before I’ve developed enough of a concrete implementation plan to be able to write the tests.&lt;/p&gt;
&lt;p&gt;So… I cheat. Once I’m happy with the implementation I write the tests to match it. Then once I have the tests in place and I know what needs to change I can switch to using changes to the tests to drive the implementation.&lt;/p&gt;
&lt;p&gt;In particular, I like using a rough initial implementation to help generate the tests in the first place.&lt;/p&gt;
&lt;p&gt;Here’s how I do that with pytest. I’ll write a test that looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-c1"&gt;False&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;Note that I’m using the pytest-django &lt;code&gt;client&lt;/code&gt; fixture here, which magically passes a fully configured Django test client object to my test function.&lt;/p&gt;
&lt;p&gt;I run this test, and it fails:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pytest -k test_some_api
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(&lt;code&gt;pytest -k blah&lt;/code&gt; runs just tests that contain &lt;code&gt;blah&lt;/code&gt; in their name)&lt;/p&gt;
&lt;p&gt;Now… I run the test again, but with the &lt;code&gt;--pdb&lt;/code&gt; option to cause pytest to drop me into a debugger at the failure point:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest -k test_some_api --pdb
=== test session starts ===
platform darwin -- Python 3.7.5, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
django: settings: config.test_settings (from ini)
...
client = &amp;lt;django.test.client.Client object at 0x10cfdb510&amp;gt;

    def test_some_api(client):
        response = client.get(&amp;quot;/some/api/&amp;quot;)
&amp;gt;       assert False == response.json()
E       assert False == {'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}
core/test_docs.py:27: AssertionError
&amp;gt;&amp;gt; entering PDB &amp;gt;&amp;gt;
&amp;gt;&amp;gt; PDB post_mortem (IO-capturing turned off) &amp;gt;&amp;gt;
&amp;gt; core/test_docs.py(27)test_some_api()
-&amp;gt; assert False == response.json()
(Pdb) response.json()
{'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}
(Pdb) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;response.json()&lt;/code&gt; in the debugger dumps out the actual value to the console.&lt;/p&gt;
&lt;p&gt;Then I copy that output - in this case &lt;code&gt;{'this': ['is', 'an', 'example', 'api'], 'that_outputs': 'JSON'}&lt;/code&gt; - and paste it into the test:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; {&lt;span class="pl-s"&gt;'this'&lt;/span&gt;: [&lt;span class="pl-s"&gt;'is'&lt;/span&gt;, &lt;span class="pl-s"&gt;'an'&lt;/span&gt;, &lt;span class="pl-s"&gt;'example'&lt;/span&gt;, &lt;span class="pl-s"&gt;'api'&lt;/span&gt;], &lt;span class="pl-s"&gt;'that_outputs'&lt;/span&gt;: &lt;span class="pl-s"&gt;'JSON'&lt;/span&gt;} &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;Finally, I run &lt;code&gt;black .&lt;/code&gt; in my project root to reformat the test:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_some_api&lt;/span&gt;(&lt;span class="pl-s1"&gt;client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/some/api/"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; {
        &lt;span class="pl-s"&gt;"this"&lt;/span&gt;: [&lt;span class="pl-s"&gt;"is"&lt;/span&gt;, &lt;span class="pl-s"&gt;"an"&lt;/span&gt;, &lt;span class="pl-s"&gt;"example"&lt;/span&gt;, &lt;span class="pl-s"&gt;"api"&lt;/span&gt;],
        &lt;span class="pl-s"&gt;"that_outputs"&lt;/span&gt;: &lt;span class="pl-s"&gt;"JSON"&lt;/span&gt;,
    } &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;This last step means that no matter how giant and ugly the test comparison has become I’ll always get a neatly formatted test out of it.&lt;/p&gt;
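&lt;p&gt;If Black isn’t installed, the paste-and-reformat step can be roughly approximated with the standard library. This is only a sketch, not the workflow described above: the &lt;code&gt;assertion_source&lt;/code&gt; helper is a hypothetical name, and &lt;code&gt;pprint&lt;/code&gt; will not match Black’s formatting exactly (it sorts dictionary keys and keeps single quotes).&lt;/p&gt;

```python
import ast
from pprint import pformat

# Value copied from the debugger session (the example response from this post).
actual = {"this": ["is", "an", "example", "api"], "that_outputs": "JSON"}


def assertion_source(value, expression="response.json()"):
    # Hypothetical helper: build the text of an assert statement
    # that can be pasted into a test function.
    return f"assert {pformat(value, width=60)} == {expression}"


line = assertion_source(actual)
print(line)

# Sanity check: the generated literal evaluates back to the same value.
literal = line[len("assert "):line.rindex(" == ")]
roundtrip = ast.literal_eval(literal)
```

&lt;p&gt;The round-trip check at the end is a cheap way to confirm that the pasted literal really is the value the debugger printed.&lt;/p&gt;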
&lt;p&gt;I always eyeball the generated test to make sure that it’s what I would have written by hand if I wasn’t so lazy - then I commit it along with the implementation and move on to the next task.&lt;/p&gt;
&lt;p&gt;I’ve used this technique to write many of the tests in both &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; and &lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, and those are by far the best tested pieces of software I’ve ever released.&lt;/p&gt;
&lt;p&gt;I started doing this around two years ago, and I’ve held off writing about it until I was confident I understood the downsides. I haven’t found any yet: I end up with a robust, comprehensive test suite and it takes me less than half the time to write the tests than if I’d been hand-crafting all of those comparisons from scratch.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Also_this_week_86"&gt;&lt;/a&gt;Also this week&lt;/h3&gt;
&lt;p&gt;Working on Datasette Cloud has required a few minor releases to some of my open source projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Shipped &lt;a href="https://github.com/simonw/datasette-auth-existing-cookies/releases"&gt;datasette-auth-existing-cookies&lt;/a&gt; 0.6 and 0.6.1&lt;/li&gt;
&lt;li&gt;Shipped &lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;sqlite-utils&lt;/a&gt; 2.2, 2.2.1, 2.3 and 2.3.1&lt;/li&gt;
&lt;li&gt;Shipped &lt;a href="https://datasette.readthedocs.io/en/latest/changelog.html#v0-35"&gt;Datasette 0.35&lt;/a&gt; with a new utility method for plugins to render their own templates, which I’m now using in…&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.2a"&gt;datasette-upload-csvs 0.2a&lt;/a&gt; - still very alpha, but at least it looks slightly nicer now&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unrelated to Datasette Cloud, I also shipped &lt;a href="https://github.com/dogsheep/twitter-to-sqlite/releases/tag/0.16"&gt;twitter-to-sqlite 0.16&lt;/a&gt; with a new command for importing your Twitter friends (previously it only had a command for importing your followers).&lt;/p&gt;
&lt;p&gt;In bad personal motivation news… I missed my weekly update to &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; and lost my streak!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/black"&gt;black&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="python"/><category term="testing"/><category term="datasette"/><category term="pytest"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="black"/></entry><entry><title>Documentation unit tests</title><link href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#atom-series" rel="alternate"/><published>2018-07-28T15:59:55+00:00</published><updated>2018-07-28T15:59:55+00:00</updated><id>https://simonwillison.net/2018/Jul/28/documentation-unit-tests/#atom-series</id><summary type="html">
    &lt;p&gt;&lt;em&gt;Or: Test-driven documentation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to get out-of-date as new features are added.&lt;/p&gt;
&lt;p&gt;One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.&lt;/p&gt;
&lt;p&gt;When combined with a code review system (like &lt;a href="https://www.phacility.com/phabricator/"&gt;Phabricator&lt;/a&gt; or &lt;a href="https://help.github.com/articles/about-pull-requests/"&gt;GitHub pull requests&lt;/a&gt;) this pattern lets you enforce documentation updates as part of the review process: if a change doesn’t update the relevant documentation, point that out in your review!&lt;/p&gt;
&lt;p&gt;Good code review systems also execute unit tests automatically and attach the results to the review. This provides an opportunity to have the tests enforce other aspects of the codebase: for example, running a linter so that no-one has to waste their time arguing over coding style.&lt;/p&gt;
&lt;p&gt;I’ve been experimenting with using unit tests to ensure that aspects of a project are covered by the documentation. I think it’s a very promising technique.&lt;/p&gt;
&lt;h4 id="Introspect_the_code_introspect_the_docs_12"&gt;Introspect the code, introspect the docs&lt;/h4&gt;
&lt;p&gt;The key to this trick is introspection: interrogating the code to figure out what needs to be documented, then parsing the documentation to see if each item has been covered.&lt;/p&gt;
&lt;p&gt;I’ll use my &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; project as an example. Datasette’s &lt;a href="https://github.com/simonw/datasette/blob/295d005ca48747faf046ed30c3c61e7563c61ed2/tests/test_docs.py"&gt;test_docs.py&lt;/a&gt; module contains three relevant tests:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;test_config_options_are_documented&lt;/code&gt; checks that every one of Datasette’s &lt;a href="http://datasette.readthedocs.io/en/latest/config.html"&gt;configuration options&lt;/a&gt; is documented.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;test_plugin_hooks_are_documented&lt;/code&gt; ensures all of the plugin hooks (powered by &lt;a href="https://pluggy.readthedocs.io/en/latest/"&gt;pluggy&lt;/a&gt;) are covered in the &lt;a href="http://datasette.readthedocs.io/en/latest/plugins.html#plugin-hooks"&gt;plugin documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;test_view_classes_are_documented&lt;/code&gt; iterates through all of the &lt;code&gt;*View&lt;/code&gt; classes (corresponding to pages in the Datasette user interface) and makes sure &lt;a href="http://datasette.readthedocs.io/en/latest/pages.html"&gt;they are covered&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each case, the test uses introspection against the relevant code areas to figure out what needs to be documented, then runs a regular expression against the documentation to make sure it is mentioned in the correct place.&lt;/p&gt;
&lt;p&gt;Obviously the tests can’t confirm the quality of the documentation, so they are easy to cheat: but they do at least protect against adding a new option but forgetting to document it.&lt;/p&gt;
&lt;h4 id="Testing_that_Datasettes_view_classes_are_covered_26"&gt;Testing that Datasette’s view classes are covered&lt;/h4&gt;
&lt;p&gt;Datasette’s view classes use a naming convention: they all end in &lt;code&gt;View&lt;/code&gt;. The current list of view classes is &lt;code&gt;DatabaseView&lt;/code&gt;, &lt;code&gt;TableView&lt;/code&gt;, &lt;code&gt;RowView&lt;/code&gt;, &lt;code&gt;IndexView&lt;/code&gt; and &lt;code&gt;JsonDataView&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Since these classes are all imported into the &lt;a href="https://github.com/simonw/datasette/blob/295d005ca48747faf046ed30c3c61e7563c61ed2/datasette/app.py"&gt;datasette.app&lt;/a&gt; module (in order to be hooked up to URL routes) the easiest way to introspect them is to import that module, then run &lt;code&gt;dir(app)&lt;/code&gt; and grab any class names that end in &lt;code&gt;View&lt;/code&gt;. We can do that with a Python list comprehension:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from datasette import app
views = [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)]
&lt;/code&gt;&lt;/pre&gt;
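&lt;p&gt;The comprehension above matches on names alone. A slightly stricter sketch also checks that each match really is a class. The &lt;code&gt;fake_app&lt;/code&gt; namespace below is a made-up stand-in for the &lt;code&gt;datasette.app&lt;/code&gt; module, so the snippet runs without Datasette installed:&lt;/p&gt;

```python
import inspect
from types import SimpleNamespace


class TableView:
    pass


class RowView:
    pass


# Hypothetical stand-in for the datasette.app module. DEFAULT_View is
# deliberately not a class, to show what the extra check filters out.
fake_app = SimpleNamespace(
    TableView=TableView, RowView=RowView, DEFAULT_View="not a class"
)

views = [
    name
    for name in dir(fake_app)
    if name.endswith("View") and inspect.isclass(getattr(fake_app, name))
]
print(views)
```

&lt;p&gt;&lt;code&gt;dir()&lt;/code&gt; returns names in sorted order, so the resulting list is stable from run to run.&lt;/p&gt;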
&lt;p&gt;I’m using reStructuredText labels to mark the place in the documentation that addresses each of these classes. This also ensures that each documentation section can be linked to, for example:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://datasette.readthedocs.io/en/latest/pages.html#tableview"&gt;http://datasette.readthedocs.io/en/latest/pages.html#tableview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The reStructuredText syntax for that label looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;.. _TableView:

Table
=====

The table page is the heart of Datasette...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can extract these labels using a regular expression:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pathlib import Path
import re

docs_path = Path(__file__).parent.parent / 'docs'
label_re = re.compile(r'\.\. _([^\s:]+):')

def get_labels(filename):
    contents = (docs_path / filename).read_text()
    return set(label_re.findall(contents))
&lt;/code&gt;&lt;/pre&gt;
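&lt;p&gt;To see what that pattern captures, here is a small sketch that runs the same regular expression against an inline string instead of a file. The sample reStructuredText, including the &lt;code&gt;config_dir&lt;/code&gt; label, is made up for illustration:&lt;/p&gt;

```python
import re

label_re = re.compile(r'\.\. _([^\s:]+):')

# Made-up reStructuredText, used only to exercise the pattern.
sample = """
.. _TableView:

Table
=====

.. _DatabaseView:

Database
========

.. _config_dir:
"""

# findall returns just the captured group: the label name between "_" and ":".
labels = set(label_re.findall(sample))
print(sorted(labels))
```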
&lt;p&gt;Since Datasette’s documentation is spread across multiple &lt;code&gt;*.rst&lt;/code&gt; files, and I want the freedom to document a view class in any one of them, I iterate through every file to find the labels and pull out the ones ending in &lt;code&gt;View&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def documented_views():
    view_labels = set()
    for filename in docs_path.glob(&amp;quot;*.rst&amp;quot;):
        for label in get_labels(filename):
            first_word = label.split(&amp;quot;_&amp;quot;)[0]
            if first_word.endswith(&amp;quot;View&amp;quot;):
                view_labels.add(first_word)
    return view_labels
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have a list of class names and a list of labels across all of our documentation. Writing a basic unit test comparing the two lists is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def test_view_documentation():
    view_labels = documented_views()
    view_classes = set(v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;))
    assert view_labels == view_classes
&lt;/code&gt;&lt;/pre&gt;
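&lt;p&gt;A plain equality check reports that the two sets differ, but not which side is at fault. One possible refinement, sketched here with hard-coded example sets and a hypothetical &lt;code&gt;undocumented&lt;/code&gt; helper, is to compute the set difference so the missing names are explicit:&lt;/p&gt;

```python
def undocumented(view_classes, view_labels):
    # Names present in the code but missing from the docs,
    # sorted so the output is stable.
    return sorted(set(view_classes) - set(view_labels))


# Hypothetical inputs: one class (TableView) has been mislabelled in the docs.
classes = {"DatabaseView", "IndexView", "JsonDataView", "RowView", "TableView"}
labels = {"DatabaseView", "IndexView", "JsonDataView", "RowView", "Table2View"}

missing = undocumented(classes, labels)
stale = sorted(labels - classes)  # labels that no longer match any class
print(missing, stale)
```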
&lt;h4 id="Taking_advantage_of_pytest_78"&gt;Taking advantage of pytest&lt;/h4&gt;
&lt;p&gt;Datasette uses &lt;a href="https://pytest.org/"&gt;pytest&lt;/a&gt; for its unit tests, and documentation unit tests are a great opportunity to take advantage of some advanced pytest features.&lt;/p&gt;
&lt;h5 id="Parametrization_82"&gt;Parametrization&lt;/h5&gt;
&lt;p&gt;The first of these is &lt;a href="https://docs.pytest.org/en/6.2.x/parametrize.html"&gt;parametrization&lt;/a&gt;: pytest provides a decorator which can be used to execute a single test function multiple times, each time with different arguments.&lt;/p&gt;
&lt;p&gt;This example from the pytest documentation shows how parametrization works:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pytest
@pytest.mark.parametrize(&amp;quot;test_input,expected&amp;quot;, [
    (&amp;quot;3+5&amp;quot;, 8),
    (&amp;quot;2+4&amp;quot;, 6),
    (&amp;quot;6*9&amp;quot;, 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;pytest treats this as three separate unit tests, even though they share a single function definition.&lt;/p&gt;
&lt;p&gt;We can combine this pattern with our introspection to execute an independent unit test for each of our view classes. Here’s what that looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@pytest.mark.parametrize(&amp;quot;view&amp;quot;, [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)])
def test_view_classes_are_documented(view):
    assert view in documented_views()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here’s the output from pytest if we execute just this unit test (and one of our classes is undocumented):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest -k test_view_classes_are_documented -v
=== test session starts ===
collected 249 items / 244 deselected

tests/test_docs.py::test_view_classes_are_documented[DatabaseView] PASSED [ 20%]
tests/test_docs.py::test_view_classes_are_documented[IndexView] PASSED [ 40%]
tests/test_docs.py::test_view_classes_are_documented[JsonDataView] PASSED [ 60%]
tests/test_docs.py::test_view_classes_are_documented[RowView] PASSED [ 80%]
tests/test_docs.py::test_view_classes_are_documented[TableView] FAILED [100%]

=== FAILURES ===

view = 'TableView'

    @pytest.mark.parametrize(&amp;quot;view&amp;quot;, [v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)])
    def test_view_classes_are_documented(view):
&amp;gt;       assert view in documented_views()
E       AssertionError: assert 'TableView' in {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'}
E        +  where {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'} = documented_views()

tests/test_docs.py:77: AssertionError
=== 1 failed, 4 passed, 244 deselected in 1.13 seconds ===
&lt;/code&gt;&lt;/pre&gt;
&lt;h5 id="Fixtures_130"&gt;Fixtures&lt;/h5&gt;
&lt;p&gt;There’s a subtle inefficiency in the above test: for every view class, it calls the &lt;code&gt;documented_views()&lt;/code&gt; function - and that function then iterates through every &lt;code&gt;*.rst&lt;/code&gt; file in the &lt;code&gt;docs/&lt;/code&gt; directory and uses a regular expression to extract the labels. With 5 view classes and 17 documentation files that’s 85 executions of &lt;code&gt;get_labels()&lt;/code&gt;, and that number will only increase as Datasette’s code and documentation grow larger.&lt;/p&gt;
&lt;p&gt;We can use pytest’s neat &lt;a href="https://docs.pytest.org/en/6.2.x/fixture.html"&gt;fixtures&lt;/a&gt; to reduce this to a single call to &lt;code&gt;documented_views()&lt;/code&gt; that is shared across all of the tests. Here’s what that looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@pytest.fixture(scope=&amp;quot;session&amp;quot;)
def documented_views():
    view_labels = set()
    for filename in docs_path.glob(&amp;quot;*.rst&amp;quot;):
        for label in get_labels(filename):
            first_word = label.split(&amp;quot;_&amp;quot;)[0]
            if first_word.endswith(&amp;quot;View&amp;quot;):
                view_labels.add(first_word)
    return view_labels

@pytest.mark.parametrize(&amp;quot;view_class&amp;quot;, [
    v for v in dir(app) if v.endswith(&amp;quot;View&amp;quot;)
])
def test_view_classes_are_documented(documented_views, view_class):
    assert view_class in documented_views
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fixtures in pytest are an example of dependency injection: pytest introspects every &lt;code&gt;test_*&lt;/code&gt; function and checks if it has a function argument with a name matching something that has been annotated with the &lt;code&gt;@pytest.fixture&lt;/code&gt; decorator. If it finds any matching arguments, it executes the matching fixture function and passes its return value in to the test function.&lt;/p&gt;
&lt;p&gt;By default, pytest will execute the fixture function once for every test that requests it. In the above code we use the &lt;code&gt;scope=&amp;quot;session&amp;quot;&lt;/code&gt; argument to tell pytest that this particular fixture should be executed only once per &lt;code&gt;pytest&lt;/code&gt; command-line run, and that single return value should be passed to every matching test.&lt;/p&gt;
&lt;h4 id="What_if_you_havent_documented_everything_yet_157"&gt;What if you haven’t documented everything yet?&lt;/h4&gt;
&lt;p&gt;Adding unit tests to your documentation in this way faces an obvious problem: when you first add the tests, you may have to write a whole lot of documentation before they can all pass.&lt;/p&gt;
&lt;p&gt;Tests that guard against future code being added without documentation only help once they are in the codebase - but if landing them is blocked on documenting every existing feature, that benefit may never arrive.&lt;/p&gt;
&lt;p&gt;Once again, pytest to the rescue. The &lt;code&gt;@pytest.mark.xfail&lt;/code&gt; decorator allows you to mark a test as “expected to fail” - if it fails, pytest will take note but will not fail the entire test suite.&lt;/p&gt;
&lt;p&gt;This means you can add deliberately failing tests to your codebase without breaking the build for everyone - perfect for tests that look for documentation that hasn’t yet been written!&lt;/p&gt;
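&lt;p&gt;As a minimal sketch of what that looks like, here is a single test marked as expected to fail. The &lt;code&gt;DOCUMENTED&lt;/code&gt; set and the test name are hypothetical stand-ins rather than code from Datasette:&lt;/p&gt;

```python
import pytest

# Hypothetical stand-in for the labels found in the documentation so far.
DOCUMENTED = {"IndexView", "RowView"}


@pytest.mark.xfail(reason="TableView documentation has not been written yet")
def test_table_view_is_documented():
    # Fails today; pytest reports it as xfailed instead of failing the run.
    assert "TableView" in DOCUMENTED
```

&lt;p&gt;Once the missing documentation exists the assertion starts passing and pytest reports the test as xpassed, which is the cue to remove the decorator.&lt;/p&gt;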
&lt;p&gt;I used &lt;code&gt;xfail&lt;/code&gt; when I &lt;a href="https://github.com/simonw/datasette/commit/e8625695a3b7938f37b64dff09c14e47d9428fe5"&gt;first added view documentation tests&lt;/a&gt; to Datasette, then removed it once the documentation was all in place. Any future code in pull requests without documentation will cause a hard test failure.&lt;/p&gt;
&lt;p&gt;Here’s what the test output looks like when some of those tests are marked as “expected to fail”:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ pytest tests/test_docs.py
collected 31 items

tests/test_docs.py ..........................XXXxx.                [100%]

============ 26 passed, 2 xfailed, 3 xpassed in 1.06 seconds ============
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since this reports both the xfailed &lt;em&gt;and&lt;/em&gt; the xpassed counts, it shows how much work is still left to be done before the &lt;code&gt;xfail&lt;/code&gt; decorator can be safely removed.&lt;/p&gt;
&lt;h4 id="Structuring_code_for_testable_documentation_180"&gt;Structuring code for testable documentation&lt;/h4&gt;
&lt;p&gt;A benefit of comprehensive unit testing is that it encourages you to design your code in a way that is easy to test. In my experience this leads to much higher code quality in general: it encourages separation of concerns and cleanly decoupled components.&lt;/p&gt;
&lt;p&gt;My hope is that documentation unit tests will have a similar effect. I’m already starting to think about ways of restructuring my code such that I can cleanly introspect it for the areas that need to be documented. I’m looking forward to discovering code design patterns that help support this goal.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/design-patterns"&gt;design-patterns&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/restructuredtext"&gt;restructuredtext&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="design-patterns"/><category term="documentation"/><category term="restructuredtext"/><category term="testing"/><category term="datasette"/><category term="pytest"/></entry></feed>