Simon Willison's Weblog: playwright

Agentic manual testing

2026-03-06T05:43:54+00:00

The defining characteristic of a coding agent is that it can execute the code that it writes. This is what makes coding agents so much more useful than LLMs that simply spit out code without any way to verify it.

Never assume that code generated by an LLM works until that code has been executed.

Coding agents have the ability to confirm that the code they have produced works as intended, or iterate further on that code until it does.

Getting agents to write unit tests, especially using test-first TDD, is a powerful way to ensure they have exercised the code they are writing.

That's not the only worthwhile approach, though.

Just because code passes tests doesn't mean it works as intended. Anyone who's worked with automated tests will have seen cases where the tests all pass but the code itself fails in some obvious way - it might crash the server on startup, fail to display a crucial UI element, or miss some detail that the tests failed to cover.

Automated tests are no replacement for manual testing. I like to see a feature working with my own eyes before I land it in a release.

I've found that getting agents to manually test code is valuable as well, frequently revealing issues that weren't spotted by the automated tests.

Mechanisms for agentic manual testing

How an agent should "manually" test a piece of code varies depending on what that code is.

For Python libraries a useful pattern is python -c "... code ...". You can pass a string (or multiline string) of Python code directly to the Python interpreter, including code that imports other modules.
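To make the pattern concrete, here is a minimal sketch of the same idea driven from Python itself: it hands a multiline string of code to a fresh interpreter using the -c flag and captures whatever that code prints. The helper name run_snippet and the snippet contents are illustrative assumptions, not taken from any particular project.

```python
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run a string of Python code in a fresh interpreter, like `python -c`."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the snippet exits with a non-zero status
    )
    return result.stdout


# A multiline snippet that imports a module and prints a result
snippet = "import json\nprint(json.dumps({'ok': True}))"
output = run_snippet(snippet)
print(output)
```

Because check=True is set, a snippet that crashes raises subprocess.CalledProcessError rather than failing silently, which is the behaviour you want from a quick manual test.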

The coding agents are all familiar with this trick and will sometimes use it without prompting. Reminding them to test using python -c can often be effective though:

Other languages may have similar mechanisms, and if they don't it's still quick for an agent to write out a demo file and then compile and run it. I sometimes encourage it to use /tmp purely to avoid those files being accidentally committed to the repository later on.

Many of my projects involve building web applications with JSON APIs. For these I tell the agent to exercise them using curl:

Telling an agent to "explore" often results in it trying out a bunch of different aspects of a new API, which can quickly cover a whole lot of ground.
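The same kind of exploration can be sketched in Python for situations where curl isn't available. This self-contained example starts a throwaway JSON endpoint on a random local port and then exercises it. The handler, the /api/items path and the response shape are all hypothetical stand-ins for a real application.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical JSON API: echoes the requested path back as JSON."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_json(url):
    """Rough equivalent of `curl URL`: return the status code and parsed body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status, payload = fetch_json(f"http://127.0.0.1:{port}/api/items")
print(status, payload)

server.shutdown()
server.server_close()
```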

If an agent finds something that doesn't work through its manual testing, I like to tell it to fix the problem with red/green TDD. This ensures the new case ends up covered by the permanent automated tests.

Using browser automation for web UIs

Having a manual testing procedure in place becomes even more valuable if a project involves an interactive web UI.

Historically these have been difficult to test from code, but the past decade has seen notable improvements in systems for automating real web browsers. Running a real Chrome or Firefox or Safari browser against an application can uncover all sorts of interesting problems in a realistic setting.

Coding agents know how to use these tools extremely well.

The most powerful of these today is Playwright, an open source library developed by Microsoft. Playwright offers a full-featured API with bindings in multiple popular programming languages and can automate any of the popular browser engines.

Simply telling your agent to "test that with Playwright" may be enough. The agent can then select the language binding that makes the most sense, or use Playwright's playwright-cli tool.
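For a sense of what an agent might write in response, here is a minimal sketch using Playwright's Python sync API. It assumes Playwright and a Chromium build are already installed, and the function name check_page, the default screenshot path and the returned dictionary are illustrative choices rather than anything prescribed by Playwright.

```python
def check_page(url, screenshot_path="/tmp/page.png"):
    """Load a page in headless Chromium, then report its status, title and console errors."""
    # Imported lazily so this module can be loaded without Playwright installed
    from playwright.sync_api import sync_playwright

    console_errors = []

    def on_console(message):
        # Collect only messages logged at the "error" level
        if message.type == "error":
            console_errors.append(message.text)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("console", on_console)
        response = page.goto(url)
        title = page.title()
        # Save a full-page screenshot that the agent can inspect afterwards
        page.screenshot(path=screenshot_path, full_page=True)
        browser.close()

    return {
        "status": response.status if response else None,
        "title": title,
        "console_errors": console_errors,
    }


# Example usage (assumes something is serving on this hypothetical local URL):
# print(check_page("http://localhost:8000/"))
```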

Coding agents work really well with dedicated CLIs. agent-browser by Vercel is a comprehensive CLI wrapper around Playwright specially designed for coding agents to use.

My own project Rodney serves a similar purpose, albeit using the Chrome DevTools Protocol to directly control an instance of Chrome.

Here's an example prompt I use to test things with Rodney:

There are three tricks in this prompt:

Saying "use uvx rodney --help" causes the agent to run rodney --help via the uvx package management tool, which automatically installs Rodney the first time it is called.
The rodney --help command is specifically designed to give agents everything they need to know to both understand and use the tool. Here's that help text.
Saying "look at screenshots" hints to the agent that it should use the rodney screenshot command and reminds it that it can use its own vision abilities against the resulting image files to evaluate the visual appearance of the page.

That's a whole lot of manual testing baked into a short prompt!

Rodney and tools like it offer a wide array of capabilities, from running JavaScript on the loaded site to scrolling, clicking, typing, and even reading the accessibility tree of the page.

As with other forms of manual tests, issues found and fixed via browser automation can then be added to permanent automated tests as well.

Many developers have avoided too many automated browser tests in the past due to their reputation for flakiness - the smallest tweak to the HTML of a page can result in frustrating waves of test breaks.

Having coding agents maintain those tests over time greatly reduces the friction involved in keeping them up-to-date in the face of design changes to the web interfaces.

Have them take notes with Showboat

Having agents manually test code can catch extra problems, but it can also be used to create artifacts that can help document the code and demonstrate how it has been tested.

I'm fascinated by the challenge of having agents show their work. Being able to see demos or documented experiments is a really useful way of confirming that the agent has comprehensively solved the challenge it was given.

I built Showboat to facilitate building documents that capture the agentic manual testing flow.

Here's a prompt I frequently use:

As with Rodney above, the showboat --help command teaches the agent what Showboat is and how to use it. Here's that help text in full.

The three key Showboat commands are note, exec, and image.

note appends a Markdown note to the Showboat document. exec records a command, then runs that command and records its output. image adds an image to the document - useful for screenshots of web applications taken using Rodney.

The exec command is the most important of these, because it captures a command along with the resulting output. This shows you what the agent did and what the result was, and is designed to discourage the agent from cheating and writing what it hoped had happened into the document.
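The idea behind exec can be illustrated with a short sketch. This is not Showboat's implementation or its document format, just a hedged illustration of the principle: the tool runs the command itself and records the real output, so the document can't contain output the agent merely imagined. The function name record_exec and the Markdown layout are assumptions made for this example.

```python
import shlex
import subprocess
import sys

FENCE = "`" * 3  # build the Markdown code fence without nesting one in this example


def record_exec(document, command):
    """Run a command and append both the command and its real output to the document."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    document.append(f"{FENCE}bash\n{shlex.join(command)}\n{FENCE}")
    document.append(f"{FENCE}output\n{output.rstrip()}\n{FENCE}")
    return result.returncode


document = ["# Demo of manual testing"]
exit_code = record_exec(document, [sys.executable, "-c", "print(2 + 2)"])
markdown = "\n\n".join(document)
print(markdown)
```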

I've been finding the Showboat pattern to work really well for documenting the work that has been achieved during my agent sessions. I'm hoping to see similar patterns adopted across a wider set of tools.

Tags: playwright, testing, agentic-engineering, ai, llms, coding-agents, ai-assisted-programming, rodney, showboat

Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone

2025-07-17T19:38:50+00:00

This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex and Claude Artifacts.

This weekend is Open Sauce 2025, the third edition of the Bay Area conference for YouTube creators in the science and engineering space. I have a couple of friends going and they were complaining that the official schedule was difficult to navigate on a phone - it's not even linked from the homepage on mobile, and once you do find the agenda it isn't particularly mobile-friendly.

We were out for coffee this morning so I only had my phone, but I decided to see if I could fix it anyway.

TLDR: Working entirely on my iPhone, using a combination of OpenAI Codex in the ChatGPT mobile app and Claude Artifacts via the Claude app, I was able to scrape the full schedule and then build and deploy this: tools.simonwillison.net/open-sauce-2025

The site offers a faster loading and more useful agenda view, but more importantly it includes an option to "Download Calendar (ICS)" which allows mobile phone users (Android and iOS) to easily import the schedule events directly into their calendar app of choice.

Here are some detailed notes on how I built it.

Scraping the schedule

Step one was to get that schedule in a structured format. I don't have good tools for viewing source on my iPhone, so I took a different approach to turning the schedule site into structured data.

My first thought was to screenshot the schedule on my phone and then dump the images into a vision LLM - but the schedule was long enough that I didn't feel like scrolling through several different pages and stitching together dozens of images.

If I was working on a laptop I'd turn to scraping: I'd dig around in the site itself and figure out where the data came from, then write code to extract it out.

How could I do the same thing working on my phone?

I decided to use OpenAI Codex - the hosted tool, not the confusingly named CLI utility.

Codex recently grew the ability to interact with the internet while attempting to resolve a task. I have a dedicated Codex "environment" configured against a GitHub repository that doesn't do anything else, purely so I can run internet-enabled sessions there that can execute arbitrary network-enabled commands.

I started a new task there (using the Codex interface inside the ChatGPT iPhone app) and prompted:

Install playwright and use it to visit https://opensauce.com/agenda/ and grab the full details of all three day schedules from the tabs - Friday and Saturday and Sunday - then save and on Data in as much detail as possible in a JSON file and submit that as a PR

Codex is frustrating in that you only get one shot: it can go away and work autonomously on a task for a long time, but while it's working you can't give it follow-up prompts. You can wait for it to finish entirely and then tell it to try again in a new session, but ideally the instructions you give it are enough for it to get to the finish state where it submits a pull request against your repo with the results.

I got lucky: my above prompt worked exactly as intended.

Codex churned for 13 minutes! I was sat chatting in a coffee shop, occasionally checking the logs to see what it was up to.

It tried a whole bunch of approaches, all involving running the Playwright Python library to interact with the site. You can see the full transcript here. It includes notes like "Looks like xxd isn't installed. I'll grab "vim-common" or "xxd" to fix it.".

Eventually it downloaded an enormous obfuscated chunk of JavaScript called schedule-overview-main-1752724893152.js (316KB) and then ran a complex sequence of grep, grep, sed, strings, xxd and dd commands against it to figure out the location of the raw schedule data in order to extract it out.

Here's the eventual extract_schedule.py Python script it wrote, which uses Playwright to save that schedule-overview-main-1752724893152.js file and then extracts the raw data using the following code (which calls Node.js inside Python, just so it can use the JavaScript eval() function):

node_script = (
    "const fs=require('fs');"
    f"const d=fs.readFileSync('{tmp_path}','utf8');"
    "const m=d.match(/var oo=(\\{.*?\\});/s);"
    "if(!m){throw new Error('not found');}"
    "const obj=eval('(' + m[1] + ')');"
    f"fs.writeFileSync('{OUTPUT_FILE}', JSON.stringify(obj, null, 2));"
)
subprocess.run(['node', '-e', node_script], check=True)
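The regular expression step in that script can be sketched in pure Python. This version assumes the object literal happens to be strict JSON, which is not guaranteed for minified JavaScript and is presumably why the generated script reached for eval() in Node.js instead. The function name and the sample input are made up for illustration. Like the original pattern, the non-greedy match stops at the first "};" it finds, so it would truncate a literal that contained that sequence internally.

```python
import json
import re


def extract_object_literal(js_source, variable="oo"):
    """Find `var <variable>={...};` in JavaScript source and parse the braces as JSON."""
    pattern = re.compile(
        r"var " + re.escape(variable) + r"=(\{.*?\});",
        re.DOTALL,  # let "." match newlines, like the /s flag in the JavaScript regex
    )
    match = pattern.search(js_source)
    if match is None:
        raise ValueError("not found")
    # Only works when the literal is valid JSON (quoted keys, no trailing commas)
    return json.loads(match.group(1))


# Hypothetical stand-in for the much larger minified bundle
sample = 'var nn=1;var oo={"friday":[{"title":"Opening"}],"sunday":[]};var pp=2;'
schedule = extract_object_literal(sample)
print(schedule)
```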

As instructed, it then filed a PR against my repo. It included the Python Playwright script, but more importantly it also included that full extracted schedule.json file. That meant I now had the schedule data, with a raw.githubusercontent.com URL with open CORS headers that could be fetched by a web app!

Building the web app

Now that I had the data, the next step was to build a web application to preview it and serve it up in a more useful format.

I decided I wanted two things: a nice mobile friendly interface for browsing the schedule, and a mechanism for importing that schedule into a calendar application, such as Apple or Google Calendar.

It took me several false starts to get this to work. The biggest challenge was getting that 63KB of schedule JSON data into the app. I tried a few approaches here, all on my iPhone while sitting in a coffee shop and later while driving with a friend to drop them off at the closest BART station.

Using ChatGPT Canvas and o3, since unlike Claude Artifacts a Canvas can fetch data from remote URLs if you allow-list that domain. I later found out that this had worked when I viewed it on my laptop, but on my phone it threw errors so I gave up on it.
Uploading the JSON to Claude and telling it to build an artifact that read the file directly - this failed with an error "undefined is not an object (evaluating 'window.fs.readFile')". The Claude 4 system prompt had led me to expect this to work; I'm not sure why it didn't.
Having Claude copy the full JSON into the artifact. This took too long - typing out 63KB of JSON is not a sensible use of LLM tokens, and it flaked out on me when my connection went intermittent driving through a tunnel.
Telling Claude to fetch from the URL to that schedule JSON instead. This was my last resort because the Claude Artifacts UI blocks access to external URLs, so you have to copy and paste the code out to a separate interface (on an iPhone, which still lacks a "select all" button) making for a frustrating process.

That final option worked! Here's the full sequence of prompts I used with Claude to get to a working implementation - full transcript here:

Use your analyst tool to read this JSON file and show me the top level keys

This was to prime Claude - I wanted to remind it about its window.fs.readFile function and have it read enough of the JSON to understand the structure.

Build an artifact with no react that turns the schedule into a nice mobile friendly webpage - there are three days Friday, Saturday and Sunday, which corresponded to the 25th and 26th and 27th of July 2025

Don’t copy the raw JSON over to the artifact - use your fs function to read it instead

Also include a button to download ICS at the top of the page which downloads a ICS version of the schedule

I had noticed that the schedule data had keys for "friday" and "saturday" and "sunday" but no indication of the dates, so I told it those. It turned out later I'd got these wrong!

This got me a version of the page that failed with an error, because that fs.readFile() couldn't load the data from the artifact for some reason. So I fixed that with:

Change it so instead of using the readFile thing it fetches the same JSON from https://raw.githubusercontent.com/simonw/.github/f671bf57f7c20a4a7a5b0642837811e37c557499/schedule.json

... then copied the HTML out to a Gist and previewed it with gistpreview.github.io - here's that preview.

Then we spot-checked it, since there are so many ways this could have gone wrong. Thankfully the schedule JSON itself never round-tripped through an LLM so we didn't need to worry about hallucinated session details, but this was almost pure vibe coding so there was a big risk of a mistake sneaking through.

I'd set myself a deadline of "by the time we drop my friend at the BART station" and I hit that deadline with just seconds to spare. I pasted the resulting HTML into my simonw/tools GitHub repo using the GitHub mobile web interface which deployed it to that final tools.simonwillison.net/open-sauce-2025 URL.

... then we noticed that we had missed a bug: I had given it the dates of "25th and 26th and 27th of July 2025" but actually that was a week too late, the correct dates were July 18th-20th.

Thankfully I have Codex configured against my simonw/tools repo as well, so fixing that was a case of prompting a new Codex session with:

The open sauce schedule got the dates wrong - Friday is 18 July 2025 and Saturday is 19 and Sunday is 20 - fix it

Here's that Codex transcript, which resulted in this PR which I landed and deployed, again using the GitHub mobile web interface.
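The day-to-date mapping at the heart of that bug, and the ICS export that depends on it, can be sketched in Python. The deployed app is JavaScript, so this is an illustration rather than its actual code. The time, length and where fields appear in the app's session template, but the title field, the "HH:MM" 24-hour time format, the UID scheme and the PRODID value are all assumptions. Event times are written as floating local times with no timezone.

```python
from datetime import date, datetime, time, timedelta

# Corrected dates for the three schedule keys
DAY_DATES = {
    "friday": date(2025, 7, 18),
    "saturday": date(2025, 7, 19),
    "sunday": date(2025, 7, 20),
}

ICS_TIME = "%Y%m%dT%H%M%S"


def escape_text(value):
    """Escape characters that have special meaning in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def session_to_vevent(day, session, uid, stamp):
    """Build the lines of one VEVENT; assumes session["time"] looks like "10:30"."""
    hour, minute = (int(part) for part in session["time"].split(":"))
    start = datetime.combine(DAY_DATES[day], time(hour, minute))
    end = start + timedelta(minutes=session["length"])
    return [
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp.strftime(ICS_TIME)}Z",  # stamp is assumed to be in UTC
        f"DTSTART:{start.strftime(ICS_TIME)}",
        f"DTEND:{end.strftime(ICS_TIME)}",
        f"SUMMARY:{escape_text(session['title'])}",
        f"LOCATION:{escape_text(session['where'])}",
        "END:VEVENT",
    ]


def build_ics(schedule, stamp):
    """Turn {"friday": [...], ...} into iCalendar text with CRLF line endings."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//schedule sketch//EN"]
    for day, sessions in schedule.items():
        for index, session in enumerate(sessions):
            uid = f"{day}-{index}@example.invalid"
            lines += session_to_vevent(day, session, uid, stamp)
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


# Hypothetical session data for illustration only
schedule = {
    "friday": [
        {"title": "Opening, day one", "time": "10:30", "length": 45, "where": "Main Stage"}
    ]
}
ics_text = build_ics(schedule, stamp=datetime(2025, 7, 17, 12, 0, 0))
print(ics_text)
```

Keeping the dates in a single dictionary means a mistake like the one above is a three-line fix, and a quick weekday check on each value would have caught it before deployment.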

What this all demonstrates

So, to recap: I was able to scrape a website (without even a view source tool), turn the resulting JSON data into a mobile-friendly website, add an ICS export feature and deploy the results to a static hosting platform (GitHub Pages) working entirely on my phone.

If I'd had a laptop this project would have been faster, but honestly aside from a little bit more hands-on debugging I wouldn't have gone about it in a particularly different way.

I was able to do other stuff at the same time - the Codex scraping project ran entirely autonomously, and the app build itself was more involved only because I had to work around the limitations of the tools I was using in terms of fetching data from external sources.

As usual with this stuff, my 25+ years of previous web development experience was critical to being able to execute the project. I knew about Codex, and Artifacts, and GitHub, and Playwright, and CORS headers, and Artifacts sandbox limitations, and the capabilities of ICS files on mobile phones.

This whole thing was so much fun! Being able to spin up multiple coding agents directly from my phone and have them solve quite complex problems while only paying partial attention to the details is a solid demonstration of why I continue to enjoy exploring the edges of AI-assisted programming.

Update: I removed the speaker avatars

Here's a beautiful cautionary tale about the dangers of vibe-coding on a phone with no access to performance profiling tools. A commenter on Hacker News pointed out:

The web app makes 176 requests and downloads 130 megabytes.

And yeah, it did! Turns out those speaker avatar images weren't optimized, and there were over 170 of them.

I told a fresh Codex instance "Remove the speaker avatar images from open-sauce-2025.html" and now the page weighs 93.58 KB - about 1,400 times smaller!

Update 2: Improved accessibility

That same commenter on Hacker News:

It's also <div> soup and largely inaccessible.

Yeah, this HTML isn't great:

dayContainer.innerHTML = sessions.map(session => `
    <div class="session-card">
        <div class="session-header">
            <div>
                <span class="session-time">${session.time}</span>
                <span class="length-badge">${session.length} min</span>
            </div>
            <div class="session-location">${session.where}</div>
        </div>

I opened an issue and had both Claude Code and Codex look at it. Claude Code failed to submit a PR for some reason, but Codex opened one with a fix that sounded good to me when I tried it with VoiceOver on iOS (using a Cloudflare Pages preview) so I landed that. Here's the diff, which added a hidden "skip to content" link, some aria- attributes on buttons and upgraded the HTML to use <h3> for the session titles.

Next time I'll remember to specify accessibility as a requirement in the initial prompt. I'm disappointed that Claude didn't consider that without me having to ask.

Tags: definitions, github, icalendar, mobile, scraping, tools, ai, playwright, openai, generative-ai, chatgpt, llms, ai-assisted-programming, claude, claude-artifacts, ai-agents, vibe-coding, coding-agents, async-coding-agents, prompt-to-app

TIL: Using Playwright MCP with Claude Code

2025-07-01T23:55:09+00:00

Inspired by Armin ("I personally use only one MCP - I only use Playwright") I decided to figure out how to use the official Playwright MCP server with Claude Code.

It turns out it's easy:

claude mcp add playwright npx '@playwright/mcp@latest'
claude

The claude mcp add command only affects the current directory by default - it gets persisted in the ~/.claude.json file.

Now Claude can use Playwright to automate a Chrome browser! Tell it to "Use playwright mcp to open a browser to example.com" and watch it go - it can navigate pages, submit forms, execute custom JavaScript and take screenshots to feed back into the LLM.

The browser window stays visible which means you can interact with it too, including signing into websites so Claude can act on your behalf.

Tags: armin-ronacher, til, playwright, ai-assisted-programming, anthropic, claude, claude-code

shot-scraper 1.8

2025-03-25T01:59:38+00:00

I've added a new feature to shot-scraper that makes it easier to share scripts for other people to use with the shot-scraper javascript command.

shot-scraper javascript lets you load up a web page in an invisible Chrome browser (via Playwright), execute some JavaScript against that page and output the results to your terminal. It's a fun way of running complex screen-scraping routines as part of a terminal session, or even chained together with other commands using pipes.

The -i/--input option lets you load that JavaScript from a file on disk - but now you can also use a gh: prefix to specify loading code from GitHub instead.

To quote the release notes:

shot-scraper javascript can now optionally load scripts hosted on GitHub via the new gh: prefix to the shot-scraper javascript -i/--input option. #173

Scripts can be referenced as gh:username/repo/path/to/script.js or, if the GitHub user has created a dedicated shot-scraper-scripts repository and placed scripts in the root of it, using gh:username/name-of-script.

For example, to run this readability.js script against any web page you can use the following:
shot-scraper javascript --input gh:simonw/readability \
  https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/

The output from that example starts like this:

{
    "title": "Qwen2.5-VL-32B: Smarter and Lighter",
    "byline": "Simon Willison",
    "dir": null,
    "lang": "en-gb",
    "content": "<div id=\"readability-page-1\"...

My simonw/shot-scraper-scripts repo only has that one file in it so far, but I'm looking forward to growing that collection and hopefully seeing other people create and share their own shot-scraper-scripts repos as well.
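The two reference forms described in the release notes could be resolved along these lines. This is a sketch of the naming convention only, not shot-scraper's actual code. The raw.githubusercontent.com URL layout is standard GitHub behaviour, but the "main" branch default and the automatic ".js" suffix for the short form are assumptions inferred from the readability example above.

```python
def resolve_gh_reference(reference, branch="main"):
    """Turn a gh: reference into a raw.githubusercontent.com URL (branch is assumed)."""
    prefix = "gh:"
    if not reference.startswith(prefix):
        raise ValueError("expected a reference starting with gh:")
    parts = reference[len(prefix):].split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError("expected gh:username/name or gh:username/repo/path")

    user = parts[0]
    if len(parts) == 2:
        # Short form: script lives in the root of the user's shot-scraper-scripts repo
        repo = "shot-scraper-scripts"
        name = parts[1]
        path = name if name.endswith(".js") else name + ".js"
    else:
        # Long form: explicit repository followed by a path inside it
        repo = parts[1]
        path = "/".join(parts[2:])
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


short_url = resolve_gh_reference("gh:simonw/readability")
long_url = resolve_gh_reference("gh:username/repo/path/to/script.js")
print(short_url)
print(long_url)
```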

This feature is an imitation of a similar feature that's coming in the next release of LLM.

Tags: github, javascript, projects, scraping, annotated-release-notes, playwright, shot-scraper

microsoft/playwright-mcp

2025-03-25T01:40:05+00:00

The Playwright team at Microsoft have released an MCP (Model Context Protocol) server wrapping Playwright, and it's pretty fascinating.

They implemented it on top of the Chrome accessibility tree, so MCP clients (such as the Claude Desktop app) can use it to drive an automated browser and use the accessibility tree to read and navigate pages that they visit.

Trying it out is quite easy if you have Claude Desktop and Node.js installed already. Edit your claude_desktop_config.json file:

code ~/Library/Application\ Support/Claude/claude_desktop_config.json

And add this:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}

Now when you launch Claude Desktop various new browser automation tools will be available to it, and you can tell Claude to navigate to a website and interact with it.

I ran the following to get a list of the available tools:

cd /tmp
git clone https://github.com/microsoft/playwright-mcp
cd playwright-mcp/src/tools
files-to-prompt . | llm -m claude-3.7-sonnet \
  'Output a detailed description of these tools'

The full output is here, but here's the truncated tool list:

Navigation Tools (common.ts)

browser_navigate: Navigate to a specific URL

browser_go_back: Navigate back in browser history

browser_go_forward: Navigate forward in browser history

browser_wait: Wait for a specified time in seconds

browser_press_key: Press a keyboard key

browser_save_as_pdf: Save current page as PDF

browser_close: Close the current page

Screenshot and Mouse Tools (screenshot.ts)

browser_screenshot: Take a screenshot of the current page

browser_move_mouse: Move mouse to specific coordinates

browser_click (coordinate-based): Click at specific x,y coordinates

browser_drag (coordinate-based): Drag mouse from one position to another

browser_type (keyboard): Type text and optionally submit

Accessibility Snapshot Tools (snapshot.ts)

browser_snapshot: Capture accessibility structure of the page

browser_click (element-based): Click on a specific element using accessibility reference

browser_drag (element-based): Drag between two elements

browser_hover: Hover over an element

browser_type (element-based): Type text into a specific element

Tags: ai, playwright, generative-ai, llms, anthropic, claude, llm-tool-use, model-context-protocol, files-to-prompt

shot-scraper 1.6 with support for HTTP Archives

2025-02-13T21:02:37+00:00

New release of my shot-scraper CLI tool for taking screenshots and scraping web pages.

The big new feature is HTTP Archive (HAR) support. The new shot-scraper har command can now create an archive of a page and all of its dependents like this:

shot-scraper har https://datasette.io/

This produces a datasette-io.har file (currently 163KB) which is JSON representing the full set of requests used to render that page. Here's a copy of that file. You can visualize that here using ericduran.github.io/chromeHAR.

That JSON includes full copies of all of the responses, base64 encoded if they are binary files such as images.
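Since a HAR file is plain JSON, it is easy to inspect with a few lines of Python. The sketch below walks the log.entries list defined by the HAR format and decodes any base64-encoded response bodies. The sample data is invented for illustration and is far smaller than a real capture.

```python
import base64


def summarize_har(har):
    """Return (url, mime type, body size in bytes) for every entry in a HAR dict."""
    summary = []
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        text = content.get("text", "")
        if content.get("encoding") == "base64":
            # Binary responses such as images are stored base64 encoded
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")
        summary.append((entry["request"]["url"], content.get("mimeType"), len(body)))
    return summary


# Tiny invented HAR structure, just enough to exercise both branches
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/"},
                "response": {
                    "status": 200,
                    "content": {"mimeType": "text/html", "text": "<h1>Hi</h1>"},
                },
            },
            {
                "request": {"method": "GET", "url": "https://example.com/pixel.png"},
                "response": {
                    "status": 200,
                    "content": {
                        "mimeType": "image/png",
                        "encoding": "base64",
                        "text": base64.b64encode(b"\x89PNG").decode("ascii"),
                    },
                },
            },
        ],
    }
}

summary = summarize_har(sample_har)
for row in summary:
    print(row)
```

For a real file you would load it first with json.load() and pass the resulting dictionary to summarize_har().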

You can add the --zip flag to instead get a datasette-io.har.zip file, containing JSON data in har.har but with the response bodies saved as separate files in that archive.

The shot-scraper multi command lets you run shot-scraper against multiple URLs in sequence, specified using a YAML file. That command now takes a --har option (or --har-zip or --har-file name-of-file), described in the documentation, which will produce a HAR at the same time as taking the screenshots.

Shots are usually defined in YAML that looks like this:

- output: example.com.png
  url: http://www.example.com/
- output: w3c.org.png
  url: https://www.w3.org/

You can now omit the output: keys and generate a HAR file without taking any screenshots at all:

- url: http://www.example.com/
- url: https://www.w3.org/

Run like this:

shot-scraper multi shots.yml --har

Which outputs:

Skipping screenshot of 'https://www.example.com/'
Skipping screenshot of 'https://www.w3.org/'
Wrote to HAR file: trace.har

shot-scraper is built on top of Playwright, and the new features use the browser.new_context(record_har_path=...) parameter.
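Used directly, that Playwright parameter looks roughly like the sketch below. It assumes Playwright and a Chromium build are installed. The function name record_har and the default output filename are illustrative, and this is not shot-scraper's own code.

```python
def record_har(url, har_path="trace.har"):
    """Visit a URL in headless Chromium while recording all requests to a HAR file."""
    # Imported lazily so this module can be loaded without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Every page opened in this context has its network traffic recorded
        context = browser.new_context(record_har_path=har_path)
        page = context.new_page()
        page.goto(url)
        # The HAR file is only written to disk when the context is closed
        context.close()
        browser.close()
    return har_path


# Example usage:
# record_har("https://datasette.io/", "datasette-io.har")
```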

Tags: cli, projects, python, scraping, playwright, shot-scraper

Guidepup

2024-03-14T04:07:49+00:00

I’ve been hoping to find something like this for years. Guidepup is “a screen reader driver for test automation”—you can use it to automate both VoiceOver on macOS and NVDA on Windows, and it can both drive the screen reader for automated tests and even produce a video at the end of the test.

Also available: @guidepup/playwright, providing integration with the Playwright browser automation testing framework.

I’d love to see open source JavaScript libraries both use something like this for their testing and publish videos of the tests to demonstrate how they work in these common screen readers.

Tags: accessibility, screen-readers, playwright

Weeknotes: datasette-test, datasette-build, PSF board retreat

2024-01-21T11:34:43+00:00

I wrote about Page caching and custom templates in my last weeknotes. This week I wrapped up that work, modifying datasette-edit-templates to be compatible with the jinja2_environment_from_request() plugin hook. This means you can edit templates directly in Datasette itself and have those served either for the full instance or just for the instance when served from a specific domain (the Datasette Cloud case).

Testing plugins with Playwright

As Datasette 1.0 draws closer, I've started thinking about plugin compatibility. This is heavily inspired by my work on Datasette Cloud, which has been running the latest Datasette alphas for several months.

I spotted that datasette-cluster-map wasn't working correctly on Datasette Cloud, as it hadn't been upgraded to account for JSON API changes in Datasette 1.0.

datasette-cluster-map 0.18 fixed that, while continuing to work with previous versions of Datasette. More importantly, it introduced Playwright tests to exercise the plugin in a real Chromium browser running in GitHub Actions.

I've been wanting to establish a good pattern for this for a while, since a lot of Datasette plugins include JavaScript behaviour that warrants browser automation testing.

Alex Garcia figured this out for datasette-comments - inspired by his code I wrote up a TIL on Writing Playwright tests for a Datasette Plugin which I've now also used in datasette-search-all.

datasette-test

datasette-test is a new library that provides testing utilities for Datasette plugins. So far it offers two:

from datasette_test import Datasette
import pytest

@pytest.mark.asyncio
async def test_datasette():
    ds = Datasette(plugin_config={"my-plugin": {"config": "goes here"}})

This datasette_test.Datasette class is a subclass of Datasette which helps write tests that work against both Datasette <1.0 and Datasette >=1.0a8 (releasing shortly). The way plugin configuration works is changing, and this plugin_config= parameter papers over that difference for plugin tests.

The other utility is a wait_until_responds("http://localhost:8001") function. This can be used to wait until a server has started, useful for testing with Playwright. I extracted this from Alex's datasette-comments tests.
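The general technique behind a helper like that is a polling loop with a deadline. The following is an independent sketch using only the standard library, not the datasette-test implementation. The function name wait_for_server, its timeout and interval parameters, and the demo server are all assumptions made for illustration.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def wait_for_server(url, timeout=5.0, interval=0.1):
    """Poll a URL until it answers, raising TimeoutError if it never does."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=1):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up
            return True
        except OSError:
            # Connection refused or timed out (URLError is a subclass of OSError)
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{url} did not respond within {timeout} seconds")
            time.sleep(interval)


class QuietHandler(BaseHTTPRequestHandler):
    """Minimal demo handler that answers every GET with an empty 200."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Demo: start a throwaway server on a free port, then wait for it
server = HTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

responded = wait_for_server(url)
print(responded)

server.shutdown()
server.server_close()
```

In a Playwright test this kind of helper typically sits in a fixture, between launching the server process and opening the first page.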

datasette-build

So far this is just the skeleton of a new tool. I plan for datasette-build to offer comprehensive support for converting a directory full of static data files - JSON, TSV, CSV and more - into a SQLite database, and eventually to other database backends as well.

So far it's pretty minimal, but my goal is to use plugins to provide optional support for further formats, such as GeoJSON or Parquet or even .xlsx.

I really like using GitHub to keep smaller (less than 1GB) datasets under version control. My plan is for datasette-build to support that pattern, making it easy to load version-controlled data files into a SQLite database you can then query directly.

PSF board in-person meeting

I spent the last two days of this week at the annual Python Software Foundation in-person board meeting. It's been fantastic catching up with the other board members over more than just a Zoom connection, and we had a very thorough two days figuring out strategy for the next year and beyond.

Releases

datasette-edit-templates 0.4.3 - 2024-01-17
Plugin allowing Datasette templates to be edited within Datasette
datasette-test 0.2 - 2024-01-16
Utilities to help write tests for Datasette plugins and applications
datasette-cluster-map 0.18.1 - 2024-01-16
Datasette plugin that shows a map for any data with latitude/longitude columns
datasette-build 0.1a0 - 2024-01-15
Build a directory full of files into a SQLite database
datasette-auth-tokens 0.4a7 - 2024-01-13
Datasette plugin for authenticating access using API tokens
datasette-search-all 1.1.2 - 2024-01-08
Datasette plugin for searching all searchable tables at once

TILs

Publish releases to PyPI from GitHub Actions without a password or token - 2024-01-15
Using pprint() to print dictionaries while preserving their key order - 2024-01-15
Using expect() to wait for a selector to match multiple items - 2024-01-13
literalinclude with markers for showing code in documentation - 2024-01-10
Writing Playwright tests for a Datasette Plugin - 2024-01-09
How to get Cloudflare to cache HTML - 2024-01-09
Running Varnish on Fly - 2024-01-08

Tags: projects, datasette, weeknotes, datasette-cloud, playwright, psf

nat/natbot

2022-09-30T01:01:30+00:00

Extremely devious hack by Nat Friedman: opens a browser using Playwright and then passes a DOM representation to GPT-3 in order to power a chat-style interface for driving the browser. Worth diving into the code to look at the prompt it uses, it’s fascinating.

Via @natfriedman

Tags: playwright, gpt-3, openai

Bundling binary tools in Python wheels

2022-05-23T15:06:04+00:00

I spotted a new (to me) pattern which I think is pretty interesting: projects are bundling compiled binary applications as part of their Python packaging wheels. I think it’s really neat.

pip install ziglang

Zig is a new programming language led by Andrew Kelley that sits somewhere near Rust: Wikipedia calls it an "imperative, general-purpose, statically typed, compiled system programming language".

One of its most notable features is that it bundles its own C/C++ compiler, as a “hermetic” compiler - it’s completely standalone, unaffected by the system that it is operating within. I learned about this usage of the word hermetic this morning from How Uber Uses Zig by Motiejus Jakštys.

The concept reminds me of Gregory Szorc's python-build-standalone, which provides redistributable Python builds and was key to getting my Datasette Desktop Electron application working with its own hermetic build of Python.

One of the options provided for installing Zig (and its bundled toolchain) is to use pip:

% pip install ziglang
...
% python -m ziglang cc --help
OVERVIEW: clang LLVM compiler

USAGE: zig [options] file...

OPTIONS:
  -###                    Print (but do not run) the commands to run for this compilation
  --amdgpu-arch-tool=<value>
                          Tool used for detecting AMD GPU arch in the system.
...

This means you can now pip install a full C compiler for your current platform!

The way this works is really simple. The ziglang package that you install has two key files: A zig binary (155MB on my system) containing the full Zig compiled implementation, and a __main__.py module containing the following:

import os, sys, subprocess
sys.exit(subprocess.call([
    os.path.join(os.path.dirname(__file__), "zig"),
    *sys.argv[1:]
]))

The package also bundles lib and doc folders with supporting files used by Zig itself, unrelated to Python.

The Zig project then bundles and ships eight different Python wheels targeting different platforms. Here's their code that does that, which lists the platforms that are supported:

for zig_platform, python_platform in {
    'windows-i386':   'win32',
    'windows-x86_64': 'win_amd64',
    'macos-x86_64':   'macosx_10_9_x86_64',
    'macos-aarch64':  'macosx_11_0_arm64',
    'linux-i386':     'manylinux_2_12_i686.manylinux2010_i686',
    'linux-x86_64':   'manylinux_2_12_x86_64.manylinux2010_x86_64',
    'linux-armv7a':   'manylinux_2_17_armv7l.manylinux2014_armv7l',
    'linux-aarch64':  'manylinux_2_17_aarch64.manylinux2014_aarch64',
}.items():
    # Build the wheel here...
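
To make the result of that loop more concrete, here is a rough sketch that turns each platform tag into a wheel filename using the standard wheel naming convention (name, version, Python tag, ABI tag, platform tag). The wheel_filename helper, the py3 and none tags and the 0.9.1 version number are assumptions made for illustration; this is not the Zig project's build script.

```python
# Hypothetical illustration only: not taken from the Zig project's build code.
PLATFORMS = {
    "windows-i386": "win32",
    "windows-x86_64": "win_amd64",
    "macos-x86_64": "macosx_10_9_x86_64",
    "macos-aarch64": "macosx_11_0_arm64",
    "linux-i386": "manylinux_2_12_i686.manylinux2010_i686",
    "linux-x86_64": "manylinux_2_12_x86_64.manylinux2010_x86_64",
    "linux-armv7a": "manylinux_2_17_armv7l.manylinux2014_armv7l",
    "linux-aarch64": "manylinux_2_17_aarch64.manylinux2014_aarch64",
}


def wheel_filename(name, version, platform_tag, python_tag="py3", abi_tag="none"):
    """Compose a wheel filename: name-version-python-abi-platform.whl"""
    return f"{name}-{version}-{python_tag}-{abi_tag}-{platform_tag}.whl"


if __name__ == "__main__":
    # The version number here is a placeholder, not a specific release.
    for zig_platform, python_platform in PLATFORMS.items():
        print(zig_platform, "->", wheel_filename("ziglang", "0.9.1", python_platform))
```

pip reads the platform tag in each filename to pick the wheel that matches the machine it is running on, which is why one pip install command can deliver a different binary to each platform.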

They suggest that if you want to run their tools from a Python program you do so like this, to ensure your script can find the installed binary:

import sys, subprocess

subprocess.call([sys.executable, "-m", "ziglang"])
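
Extending that suggestion slightly, here is a minimal sketch of compiling a C file from Python with the pip-installed compiler. The zig_cc_command and compile_c helpers and the hello.c filename are hypothetical; the cc subcommand comes from the help output shown earlier, and -o is the usual clang flag for naming the output file.

```python
import subprocess
import sys


def zig_cc_command(source, output):
    """Build the argument list for compiling a C file with the bundled compiler."""
    # sys.executable points at the interpreter (and virtual environment)
    # that has the ziglang package installed.
    return [sys.executable, "-m", "ziglang", "cc", source, "-o", output]


def compile_c(source, output):
    """Run the compiler, raising CalledProcessError if compilation fails."""
    subprocess.run(zig_cc_command(source, output), check=True)


if __name__ == "__main__":
    # hello.c is a hypothetical source file; this only prints the command.
    print(" ".join(zig_cc_command("hello.c", "hello")))
```

Building the argument list as a Python list, rather than a shell string, avoids quoting problems if a path contains spaces.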

I find this whole approach pretty fascinating. I really love the idea that I can add a full C/C++ compiler as a dependency to any of my Python projects, and thanks to Python wheels I'll automatically get a binary executable compiled for my current platform.

Playwright Python

I spotted another example of this pattern recently in Playwright Python. Playwright is Microsoft's open source browser automation and testing framework - a kind of modern Selenium. I used it recently to build my shot-scraper screenshot automation tool.

Playwright provides a full-featured API for controlling headless (and headful) browser instances, with implementations in Node.js, Python, Java and .NET.

I was intrigued as to how they had developed such a sophisticated API for four different platforms/languages at once, providing full equivalence for all of their features across all four.

So I dug around in their Python package (from pip install playwright) and found this:

77M ./venv/lib/python3.10/site-packages/playwright/driver/node

That's a full copy of the Node.js binary!

% ./venv/lib/python3.10/site-packages/playwright/driver/node --version
v16.13.0

Playwright Python works by providing a Python layer on top of the existing JavaScript API library. It runs a Node.js process which does the actual work, while the Python library just communicates with the JavaScript for you.
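
If you want to check this on your own machine, a small sketch along these lines should find the bundled binary. It assumes the driver/node layout shown in the listing above; the find_bundled_node helper is hypothetical, and other versions or platforms may lay things out differently (on Windows the file would presumably be node.exe).

```python
import importlib.util
from pathlib import Path


def find_bundled_node(package="playwright"):
    """Return the path to a bundled driver/node binary, or None if not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        # Package is not installed (or has no regular __init__.py).
        return None
    # Assumed layout: <package directory>/driver/node
    candidate = Path(spec.origin).parent / "driver" / "node"
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    print(find_bundled_node() or "No bundled node binary found")
```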

As with Zig, the Playwright team offer seven pre-compiled wheels for different platforms. The list today is:

playwright-1.22.0-py3-none-win_amd64.whl
playwright-1.22.0-py3-none-win32.whl
playwright-1.22.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
playwright-1.22.0-py3-none-manylinux1_x86_64.whl
playwright-1.22.0-py3-none-macosx_11_0_universal2.whl
playwright-1.22.0-py3-none-macosx_11_0_arm64.whl
playwright-1.22.0-py3-none-macosx_10_13_x86_64.whl

I wish I could say "you can now pip install a browser!" but Playwright doesn't actually bundle the browsers themselves - you need to run python -m playwright install to download those separately.

Pretty fascinating example of the same pattern though!

pip install a SQLite database

It's not quite the same thing, since it's not packaging an executable, but the one project I have that fits this mould if you squint a little is my datasette-basemap plugin.

It's a Datasette plugin which bundles a 23MB SQLite database file containing OpenStreetMap tiles for the first seven zoom levels of their world map - 5,461 tile images total.
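
That 5,461 figure follows from how map tiles work: each zoom level z splits the world into a 2^z by 2^z grid, so it contains 4^z tiles. A quick check of the arithmetic (the tile_count function name is mine):

```python
def tile_count(zoom_levels):
    """Total tiles across zoom levels 0 through zoom_levels - 1.

    Zoom level z is a 2**z by 2**z grid, which is 4**z tiles.
    """
    return sum(4 ** z for z in range(zoom_levels))


# 1 + 4 + 16 + 64 + 256 + 1024 + 4096
print(tile_count(7))  # 5461
```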

I built it so that people could use my datasette-cluster-map and datasette-leaflet-geojson entirely standalone, without needing to load tiles from a central tile server.

You can play with a demo here. I wrote more about that project in Serving map tiles from SQLite with MBTiles and datasette-tiles. It's pretty fun to be able to run pip install datasette-basemap to install a full map of the world.

Seen any other interesting examples of pip install being (ab)used in this way? Ping them to me on Twitter.

Update: Paul O'Leary McCann points out that PyPI has a default 60MB size limit for packages, though it can be raised on a case-by-case basis. He wrote about this in Distributing Large Files with PyPI Packages.

Tags: packaging, pypi, python, playwright, zig

@newshomepages

2022-03-12T19:21:34+00:00

@newshomepages

Ben Welsh used my shot-scraper tool and GitHub Actions to launch a Twitter bot which tweets screenshots of newspaper homepages on a scheduled basis. Ben says: “The tech is so easy, I was able to pull it off in a couple hours at zero cost. A decade ago I ran a similar project using the cloud resources of the day. [...] It costs thousands of dollars and the screenshots were of much lower quality. Incredible progress!”

Via @palewire

Tags: twitter, github-actions, playwright, shot-scraper, ben-welsh

Weeknotes: Distracted by Playwright

2022-03-12T00:30:26+00:00

My goal for this week was to unblock progress on Datasette by finally finishing the dash encoding implementation I described last week. I was getting close, and then I got very distracted by Playwright.

Dash encoding v2

In Why I invented “dash encoding”, a new encoding scheme for URL paths I described a new mechanism I had invented for handling the gnarly problem of including table names with / characters in the URL path on Datasette. The very short version: you can't use URL encoding in a path, because common proxies (including Apache and Nginx) will decode them before they get to your application.

Thanks to feedback on that post I actually changed my design: I'm now using a variant of percent encoding that uses - instead of %. More details in the issue - and I'll write this up fully once I've finished landing the change.
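
Here is a minimal sketch of what a scheme like that could look like. It is not the Datasette implementation: the dash_encode and dash_decode names and the choice of which characters are left unescaped (ASCII letters, digits and underscore) are assumptions for illustration.

```python
import string
import urllib.parse

# Assumption: only these characters pass through unescaped.
SAFE = set(string.ascii_letters + string.digits + "_")


def dash_encode(text):
    """Percent-style encoding that uses - as the escape character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # Everything else, including - itself, becomes -XX (hex byte)
            out.append("-{:02X}".format(byte))
    return "".join(out)


def dash_decode(encoded):
    """Reverse dash_encode by swapping - back to % and percent-decoding."""
    return urllib.parse.unquote(encoded.replace("-", "%"))


print(dash_encode("table/with/slashes"))  # table-2Fwith-2Fslashes
```

Because the - character is itself escaped as -2D, every - in the encoded output marks an escape sequence, so decoding is unambiguous. The output also never contains % or /, so a proxy that decodes URL paths has nothing to rewrite.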

shot-scraper and Playwright

I thoroughly nerd-sniped myself with this one. I started investigating possibilities for automatically generating screenshots for documentation, and realized that Playwright made this substantially easier than it has been in the past.

The result was shot-scraper - a new command-line utility for taking screenshots of web pages, or portions of web pages - and for running through a set of screenshots defined in a YAML file.

I still can't quite believe how quickly this came together.

Every now and then a tool comes along which adds a fundamental new set of capabilities to your toolbox, and can be multiplied against other tools to open up a huge range of possibilities.

Playwright feels like one of those tools.

A quick pip install playwright is all it takes to start writing robust browser automation tools, using dedicated standalone headless instances of multiple browsers that are installed for you using playwright install.

It's easy to run in CI - getting it working in GitHub Actions was trivial.

shot-scraper is my first project built on Playwright, but there will definitely be more.

shot-scraper accessibility

I started a Twitter conversation asking for ways to write automated tests that exercise screen readers - not just running audit rules, but actually simulating what happens when a screen reader user attempts to navigate through a specific flow within an application.

The most interesting answer I had was from Ben Mustill-Rose, who built a system for automating tests against an Android screen reader while working on BBC iPlayer - demo here.

@fardarter pointed me back to Playwright again, which turns out to have an Accessibility snapshot mechanism that can dump out the current state of the Chromium accessibility tree.

I couldn't resist adding that to shot-scraper - so now you can run the following to see the accessibility tree for a web page:

~ % shot-scraper accessibility https://datasette.io
{
    "role": "WebArea",
    "name": "Datasette: An open source multi-tool for exploring and publishing data",
    "children": [
        {
            "role": "link",
            "name": "Uses"
        },
        {
            "role": "link",
            "name": "Documentation"
        },

Full output here.

As a really fun bonus trick: since the output is JSON, you can pipe it into sqlite-utils insert to get a SQLite database:

shot-scraper accessibility https://datasette.io \
    | jq .children | sqlite-utils insert \
    /tmp/accessibility.db nodes - --alter

And then open it in Datasette Desktop and start faceting by role and heading level!
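
The same idea can be sketched in plain Python using only the standard library. Unlike the jq pipeline above, which inserts just the top-level children, this version walks the whole tree and stores one row per node. The flatten_tree, load_nodes and facet_by_role helpers, the table layout and the sample tree are all made up for illustration.

```python
import sqlite3


def flatten_tree(node, rows=None, parent=None):
    """Walk the tree depth-first, returning (id, parent, role, name, level) rows."""
    if rows is None:
        rows = []
    node_id = len(rows) + 1
    rows.append((node_id, parent, node.get("role"), node.get("name"), node.get("level")))
    for child in node.get("children", []):
        flatten_tree(child, rows, node_id)
    return rows


def load_nodes(conn, tree):
    """Create the nodes table if needed and insert one row per tree node."""
    conn.execute(
        "create table if not exists nodes "
        "(id integer primary key, parent integer, role text, name text, level integer)"
    )
    conn.executemany("insert into nodes values (?, ?, ?, ?, ?)", flatten_tree(tree))
    conn.commit()


def facet_by_role(conn):
    """Count nodes per role, most common first."""
    return conn.execute(
        "select role, count(*) from nodes group by role order by count(*) desc, role"
    ).fetchall()


if __name__ == "__main__":
    # Tiny hand-written sample in the shape of the output shown above
    sample = {
        "role": "WebArea",
        "name": "Example",
        "children": [
            {"role": "link", "name": "Uses"},
            {"role": "link", "name": "Documentation"},
            {"role": "heading", "name": "Welcome", "level": 1},
        ],
    }
    conn = sqlite3.connect(":memory:")
    load_nodes(conn, sample)
    print(facet_by_role(conn))
```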

sqlite-utils documentation improvements

I complained on Twitter that the way type information was displayed in the Sphinx sqlite-utils API reference documentation was ugly:

Adam Johnson pointed me to the autodoc_typehints = "description" option which fixes this. I spent a while tidying up the documentation to work better with this, mainly by adding a whole bunch of :param name: description tags that I had previously omitted. That work happened in this issue. I think it looks much better now:

Releases this week

image-diff: 0.2.1 - (3 releases total) - 2022-03-11
CLI tool for comparing images
sqlite-utils: 3.25.1 - (98 releases total) - 2022-03-11
Python CLI utility and library for manipulating SQLite databases
shot-scraper: 0.4 - (5 releases total) - 2022-03-10
Automated website screenshots using GitHub Actions
django-sql-dashboard: 1.0.2 - (34 releases total) - 2022-03-08
Django app for building dashboards using raw SQL queries
geojson-to-sqlite: 1.0 - (8 releases total) - 2022-03-04
CLI tool for converting GeoJSON files to SQLite (with SpatiaLite)
xml-analyser: 1.3 - (4 releases total) - 2022-03-01
Simple command line tool for quickly analysing the structure of an arbitrary XML file
datasette-dateutil: 0.3 - (4 releases total) - 2022-03-01
dateutil functions for Datasette

Tags: accessibility, documentation, datasette, weeknotes, sphinx-docs, playwright, shot-scraper

shot-scraper: automated screenshots for documentation, built on Playwright

2022-03-10T00:13:30+00:00

shot-scraper is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool - hence the name - which I picked as a complement to my git scraping and help scraping techniques.

Update 13th March 2022: The new shot-scraper javascript command can now be used to scrape web pages from the command line.

Update 14th October 2022: Automating screenshots for the Datasette documentation using shot-scraper offers a tutorial introduction to using the tool.

The problem

I like to include screenshots in documentation. I recently started writing end-user tutorials for Datasette, which are particularly image heavy (for example).

As software changes over time, screenshots get out-of-date. I don't like the idea of stale screenshots, but I also don't want to have to manually recreate them every time I make the tiniest tweak to the visual appearance of my software.

Introducing shot-scraper

shot-scraper is a tool for automating this process. You can install it using pip like this:

pip install shot-scraper
shot-scraper install

That second shot-scraper install line will install the browser it needs to do its job - more on that later.

You can use it in two ways. To take a one-off screenshot, you can run it like this:

shot-scraper https://simonwillison.net/ -o simonwillison.png

Or if you want to take a set of screenshots in a repeatable way, you can define them in a YAML file that looks like this:

- url: https://simonwillison.net/
  output: simonwillison.png
- url: https://www.example.com/
  width: 400
  height: 400
  quality: 80
  output: example.jpg

And then use shot-scraper multi to execute every screenshot in one go:

% shot-scraper multi shots.yml 
Screenshot of 'https://simonwillison.net/' written to 'simonwillison.png'
Screenshot of 'https://www.example.com/' written to 'example.jpg'

The documentation describes all of the available options you can use when taking a screenshot.

Each option can be provided to the shot-scraper one-off tool, or can be embedded in the YAML file for use with shot-scraper multi.
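
To illustrate that equivalence, here is a rough sketch that converts one entry from the YAML file into the matching one-off command. It assumes every key has a long command-line option of the same name (--width, --quality and so on), which is worth checking against the documentation; the shot_to_args helper is hypothetical.

```python
def shot_to_args(shot):
    """Convert one YAML-style shot definition (a dict) into a command list."""
    args = ["shot-scraper", shot["url"]]
    # Assumption: each key maps to a long option with the same name.
    for key in ("output", "width", "height", "quality", "selector", "javascript"):
        if key in shot:
            args.extend([f"--{key}", str(shot[key])])
    return args


if __name__ == "__main__":
    shot = {
        "url": "https://www.example.com/",
        "width": 400,
        "height": 400,
        "quality": 80,
        "output": "example.jpg",
    }
    print(" ".join(shot_to_args(shot)))
```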

JavaScript and CSS selectors

The default behaviour for shot-scraper is to take a full page screenshot, using a browser width of 1280px.

For documentation screenshots you probably don't want the whole page though - you likely want to create an image of one specific part of the interface.

The --selector option allows you to specify an area of the page by CSS selector. The resulting image will consist just of that part of the page.

What if you want to modify the page in addition to selecting a specific area?

The --javascript option lets you pass in a block of JavaScript code which will be injected into the page and executed after the page has loaded, but before the screenshot is taken.

The combination of these two options - also available as javascript: and selector: keys in the YAML file - should be flexible enough to cover the custom screenshot case for documentation.
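
For a sense of how those two options might map onto Playwright itself, here is a minimal sketch using the Playwright Python sync API. It is not the shot-scraper source code: the take_screenshot function, its parameters and the 720px default height are assumptions, and it needs pip install playwright followed by playwright install before it will run.

```python
def take_screenshot(url, output, selector=None, javascript=None, width=1280, height=720):
    """Load a page, optionally run JavaScript, then screenshot it or one element."""
    # Imported here so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        if javascript:
            # If the expression evaluates to a Promise, Playwright waits for it
            page.evaluate(javascript)
        if selector:
            # Screenshot just the first element matching the CSS selector
            page.locator(selector).screenshot(path=output)
        else:
            page.screenshot(path=output, full_page=True)
        browser.close()
```

Running the JavaScript before taking the screenshot matters: the script gets a chance to hide, resize or annotate parts of the page, and the selector then crops the image to the element that is left.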

A complex example

To prove to myself that the tool works, I decided to try replicating this screenshot from my tutorial.

I made the original using CleanShot X, manually adding the two pink arrows:

This is pretty tricky!

It's not this whole page, just a subset of the page
The cog menu for one of the columns is open, which means the cog icon needs to be clicked before taking the screenshot
There are two pink arrows superimposed on the image

I decided to use just one arrow for the moment, which should hopefully result in a clearer image.

I started by creating my own pink arrow SVG using Figma:

I then fiddled around in the Firefox developer console for quite a while, working out the JavaScript needed to trim the page down to the bit I wanted, open the menu and position the arrow.

With the JavaScript figured out, I pasted it into a YAML file called shot.yml:

- url: https://congress-legislators.datasettes.com/legislators/executive_terms?start__startswith=18&type=prez
  javascript: |
    new Promise(resolve => {
      // Run in a promise so we can sleep 1s at the end
      function remove(el) { el.parentNode.removeChild(el);}
      // Remove header and footer
      remove(document.querySelector('header'));
      remove(document.querySelector('footer'));
      // Remove most of the children of .content
      Array.from(document.querySelectorAll('.content > *:not(.table-wrapper,.suggested-facets)')).map(remove)
      // Bit of breathing room for the screenshot
      document.body.style.marginTop = '10px';
      // Add a bit of padding to .content
      var content = document.querySelector('.content');
      content.style.width = '820px';
      content.style.padding = '10px';
      // Open the menu - it's an SVG so we need to use dispatchEvent here
      document.querySelector('th.col-executive_id svg').dispatchEvent(new Event('click'));
      // Remove all but table header and first 11 rows
      Array.from(document.querySelectorAll('tr')).slice(12).map(remove);
      // Add a pink SVG arrow
      let div = document.createElement('div');
      div.innerHTML = `<svg width="104" height="60" fill="none" xmlns="http://www.w3.org/2000/svg">
        <g filter="url(#a)">
          <path fill-rule="evenodd" clip-rule="evenodd" d="m76.7 1 2 2 .2-.1.1.4 20 20a3.5 3.5 0 0 1 0 5l-20 20-.1.4-.3-.1-1.9 2a3.5 3.5 0 0 1-5.4-4.4l3.2-14.4H4v-12h70.6L71.3 5.4A3.5 3.5 0 0 1 76.7 1Z" fill="#FF31A0"/>
        </g>
        <defs>
          <filter id="a" x="0" y="0" width="104" height="59.5" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB">
              <feFlood flood-opacity="0" result="BackgroundImageFix"/>
              <feColorMatrix in="SourceAlpha" values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 0" result="hardAlpha"/>
              <feOffset dy="4"/>
              <feGaussianBlur stdDeviation="2"/>
              <feComposite in2="hardAlpha" operator="out"/>
              <feColorMatrix values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.25 0"/>
              <feBlend in2="BackgroundImageFix" result="effect1_dropShadow_2_26"/>
              <feBlend in="SourceGraphic" in2="effect1_dropShadow_2_26" result="shape"/>
          </filter>
        </defs>
      </svg>`;
      let svg = div.firstChild;
      content.appendChild(svg);
      content.style.position = 'relative';
      svg.style.position = 'absolute';
      // Give the menu time to finish fading in
      setTimeout(() => {
        // Position arrow pointing to the 'facet by this' menu item
        var pos = document.querySelector('.dropdown-facet').getBoundingClientRect();
        svg.style.left = (pos.left - pos.width) + 'px';
        svg.style.top = (pos.top - 20) + 'px';
        resolve();
      }, 1000);
    });
  output: annotated-screenshot.png
  selector: .content

And ran this command to generate the screenshot:

shot-scraper multi shot.yml

The generated annotated-screenshot.png image looks like this:

I'm pretty happy with this! I think it works very well as a proof of concept for the process.

How it works: Playwright

I built the first prototype of shot-scraper using Puppeteer, because I had used that before.

Then I noticed that the puppeteer-cli package I was using hadn't had an update in two years, which reminded me to check out Playwright.

I've been looking for an excuse to learn Playwright for a while now, and this project turned out to be ideal.

Playwright is Microsoft's open source browser automation framework. They promote it as a testing tool, but it has plenty of applications outside of testing - screenshot automation and screen scraping being two of the most obvious.

Playwright is comprehensive: it downloads its own custom browser builds, and can run tests across multiple different rendering engines.

The second prototype used the Playwright CLI utility instead, executed via npx:

subprocess.run(
    [
        "npx",
        "playwright",
        "screenshot",
        "--full-page",
        url,
        output,
    ],
    capture_output=True,
)

This could take a full page screenshot, but that CLI tool wasn't flexible enough to take screenshots of specific elements. So I needed to switch to the Playwright programmatic API.

I started out trying to get Python to generate and pass JavaScript to the Node.js library... and then I spotted the official Playwright for Python package.

pip install playwright

It's amazing! It has the exact same functionality as the JavaScript library - the same classes, the same methods. Everything just works, in both languages.

I was curious how they pulled this off, so I dug inside the playwright Python package in my site-packages folder... and found it bundles a full Node.js binary executable and uses it to bridge the two worlds! What a wild hack.

Thanks to Playwright, the entire implementation of shot-scraper is currently just 181 lines of Python code - it's all glue code tying together a Click CLI interface with some code that calls Playwright to do the actual work.

I couldn't be more impressed with Playwright. I'll definitely be using it for other projects - for one thing, I think I'll finally be able to add automated tests to my Datasette Desktop Electron application.

Hooking shot-scraper up to GitHub Actions

I built shot-scraper very much with GitHub Actions in mind.

My shot-scraper-demo repository is my first live demo of the tool.

Once a day, it runs this shots.yml file, generates two screenshots and commits them back to the repository.

One of them is the tutorial screenshot described above.

The other is a screenshot of the list of "recently spotted owls" from this page on owlsnearme.com. I wanted a page that would change on an occasional basis, to demonstrate GitHub's neat image diffing interface.

I may need to change that demo though! That page includes "spotted 5 hours ago" text, which means that there's almost always a tiny pixel difference, like this one (use the "swipe" comparison tool to watch 6 hours ago change to 7 hours ago under the top left photo).

Storing image files that change frequently in a free repository on GitHub feels rude to me, so please use this tool cautiously there!

What's next?

I had ambitious plans to add utilities to the tool that would help with annotations, such as adding pink arrows and drawing circles around different elements on the page.

I've shelved those plans for the moment: as the demo above shows, the JavaScript hook is good enough. I may revisit this later once common patterns have started to emerge.

So really, my next step is to start using this tool for my own projects - to generate screenshots for my documentation.

I'm also very interested to see what kinds of things other people use this for.

Tags: cli, documentation, projects, scraping, github-actions, git-scraping, puppeteer, playwright, shot-scraper