<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: puppeteer</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/puppeteer.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-06-16T19:33:58+00:00</updated><author><name>Simon Willison</name></author><entry><title>Jina AI Reader</title><link href="https://simonwillison.net/2024/Jun/16/jina-ai-reader/#atom-tag" rel="alternate"/><published>2024-06-16T19:33:58+00:00</published><updated>2024-06-16T19:33:58+00:00</updated><id>https://simonwillison.net/2024/Jun/16/jina-ai-reader/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/reader/"&gt;Jina AI Reader&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Jina AI provide a number of different AI-related platform products, including an excellent &lt;a href="https://huggingface.co/collections/jinaai/jina-embeddings-v2-65708e3ec4993b8fb968e744"&gt;family of embedding models&lt;/a&gt;, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.&lt;/p&gt;
&lt;p&gt;Add &lt;code&gt;r.jina.ai&lt;/code&gt; to the front of a URL to get back Markdown of that page, for example &lt;a href="https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/"&gt;https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/&lt;/a&gt; - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.&lt;/p&gt;
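&lt;p&gt;As a rough illustration, here's a minimal Python sketch of calling that prefix-style API with only the standard library. The helper names &lt;code&gt;reader_url&lt;/code&gt; and &lt;code&gt;fetch_markdown&lt;/code&gt; are hypothetical, and the sketch assumes the service returns the Markdown as the plain response body:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import urllib.request

# Hypothetical helpers; the only behaviour relied on is the URL prefix
READER_PREFIX = "https://r.jina.ai/"


def reader_url(url):
    # Put the Reader endpoint in front of the target URL, leaving it unchanged
    return READER_PREFIX + url


def fetch_markdown(url, timeout=30):
    # Performs a real network request and decodes the body as UTF-8 text
    with urllib.request.urlopen(reader_url(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(reader_url("https://simonwillison.net/2024/Jun/16/jina-ai-reader/"))
&lt;/code&gt;&lt;/pre&gt;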
&lt;p&gt;The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 requests per minute with a free API key, and you can pay to increase your allowance beyond that.&lt;/p&gt;
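&lt;p&gt;At 20 requests per minute, a client that spaces its calls evenly needs to leave 60 / 20 = 3 seconds between them. Here's a hypothetical client-side throttle sketch (the class name is an assumption; the limit itself is enforced by the service regardless of what the client does):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import time


class MinuteThrottle:
    # Hypothetical helper: spaces request starts at least
    # 60 / per_minute seconds apart (3 seconds for 20 per minute)
    def __init__(self, per_minute=20, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        # Call before each request; only sleeps when called too soon
        now = self.clock()
        if self.next_allowed is None:
            self.next_allowed = now
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        self.next_allowed = max(now, self.next_allowed) + self.interval
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a free API key the same class would be created with &lt;code&gt;per_minute=200&lt;/code&gt;, shrinking the gap to 0.3 seconds. The &lt;code&gt;clock&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; parameters exist so the timing can be tested without actually waiting.&lt;/p&gt;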
&lt;p&gt;The Apache 2 licensed source code for the hosted service is &lt;a href="https://github.com/jina-ai/reader"&gt;on GitHub&lt;/a&gt; - it's written in TypeScript and &lt;a href="https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/puppeteer.ts"&gt;uses Puppeteer&lt;/a&gt; to run &lt;a href="https://github.com/mozilla/readability"&gt;Readability.js&lt;/a&gt; and &lt;a href="https://github.com/mixmark-io/turndown"&gt;Turndown&lt;/a&gt; against the scraped page.&lt;/p&gt;
&lt;p&gt;It can also handle PDFs, which have their contents extracted &lt;a href="https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/pdf-extract.ts"&gt;using PDF.js&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's also a search feature, &lt;code&gt;s.jina.ai/search+term+goes+here&lt;/code&gt;, which &lt;a href="https://github.com/jina-ai/reader/blob/ed80c9a4a2c340fb7c874347d3f25501e42ca251/backend/functions/src/services/brave-search.ts"&gt;uses the Brave Search API&lt;/a&gt;.&lt;/p&gt;
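&lt;p&gt;Building one of those search URLs from a free-text query is mostly a matter of encoding. A small sketch, assuming the query goes in the URL path with spaces written as &lt;code&gt;+&lt;/code&gt; as in the example above (the &lt;code&gt;search_url&lt;/code&gt; name is hypothetical):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from urllib.parse import quote_plus

# Assumption: the query is placed directly in the path
SEARCH_PREFIX = "https://s.jina.ai/"


def search_url(query):
    # quote_plus turns spaces into "+" and percent-encodes reserved characters
    return SEARCH_PREFIX + quote_plus(query)


print(search_url("search term goes here"))
# prints https://s.jina.ai/search+term+goes+here
&lt;/code&gt;&lt;/pre&gt;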


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/brave"&gt;brave&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="markdown"/><category term="ai"/><category term="puppeteer"/><category term="llms"/><category term="jina"/><category term="brave"/></entry><entry><title>shot-scraper: automated screenshots for documentation, built on Playwright</title><link href="https://simonwillison.net/2022/Mar/10/shot-scraper/#atom-tag" rel="alternate"/><published>2022-03-10T00:13:30+00:00</published><updated>2022-03-10T00:13:30+00:00</updated><id>https://simonwillison.net/2022/Mar/10/shot-scraper/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt; is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool - hence the name - which I picked as a complement to my &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;git scraping&lt;/a&gt; and &lt;a href="https://simonwillison.net/2022/Feb/2/help-scraping/"&gt;help scraping&lt;/a&gt; techniques.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 13th March 2022:&lt;/strong&gt; The new &lt;code&gt;shot-scraper javascript&lt;/code&gt; command can now be used to &lt;a href="https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/"&gt;scrape web pages from the command line&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 14th October 2022:&lt;/strong&gt; &lt;a href="https://simonwillison.net/2022/Oct/14/automating-screenshots/"&gt;Automating screenshots for the Datasette documentation using shot-scraper&lt;/a&gt; offers a tutorial introduction to using the tool.&lt;/p&gt;
&lt;h4&gt;The problem&lt;/h4&gt;
&lt;p&gt;I like to include screenshots in documentation. I recently &lt;a href="https://simonwillison.net/2022/Feb/27/datasette-tutorials/"&gt;started writing end-user tutorials&lt;/a&gt; for Datasette, which are particularly image-heavy (&lt;a href="https://datasette.io/tutorials/explore"&gt;for example&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;As software changes over time, screenshots get out-of-date. I don't like the idea of stale screenshots, but I also don't want to have to manually recreate them every time I make the tiniest tweak to the visual appearance of my software.&lt;/p&gt;
&lt;h4&gt;Introducing shot-scraper&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is a tool for automating this process. You can install it using &lt;code&gt;pip&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install shot-scraper
shot-scraper install
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That second &lt;code&gt;shot-scraper install&lt;/code&gt; line will install the browser it needs to do its job - more on that later.&lt;/p&gt;
&lt;p&gt;You can use it in two ways. To take a one-off screenshot, you can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper https://simonwillison.net/ -o simonwillison.png
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or if you want to take a set of screenshots in a repeatable way, you can define them in a YAML file that looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://simonwillison.net/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;simonwillison.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://www.example.com/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;400&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;400&lt;/span&gt;
  &lt;span class="pl-ent"&gt;quality&lt;/span&gt;: &lt;span class="pl-c1"&gt;80&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;example.jpg&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And then use &lt;code&gt;shot-scraper multi&lt;/code&gt; to execute every screenshot in one go:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% shot-scraper multi shots.yml 
Screenshot of 'https://simonwillison.net/' written to 'simonwillison.png'
Screenshot of 'https://www.example.com/' written to 'example.jpg'
&lt;/code&gt;&lt;/pre&gt;
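&lt;p&gt;Conceptually, &lt;code&gt;multi&lt;/code&gt; is a loop over the parsed YAML list. A minimal sketch, assuming the file has already been parsed into a list of dictionaries (for example with PyYAML) and using a caller-supplied &lt;code&gt;take_shot&lt;/code&gt; function as a hypothetical stand-in for the actual browser work:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def run_multi(shots, take_shot):
    # Hypothetical sketch: one screenshot per YAML entry, in file order
    messages = []
    for shot in shots:
        take_shot(shot)
        messages.append(
            "Screenshot of '{}' written to '{}'".format(shot["url"], shot["output"])
        )
    return messages


# The same two entries as the YAML file above, as parsed dictionaries
shots = [
    {"url": "https://simonwillison.net/", "output": "simonwillison.png"},
    {"url": "https://www.example.com/", "width": 400, "height": 400,
     "quality": 80, "output": "example.jpg"},
]

# A no-op take_shot, so this only prints the progress messages
for line in run_multi(shots, take_shot=lambda shot: None):
    print(line)
&lt;/code&gt;&lt;/pre&gt;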
&lt;p&gt;&lt;a href="https://shot-scraper.datasette.io/en/stable/screenshots.html"&gt;The documentation&lt;/a&gt; describes all of the available options you can use when taking a screenshot.&lt;/p&gt;
&lt;p&gt;Each option can be provided to the &lt;code&gt;shot-scraper&lt;/code&gt; one-off tool, or can be embedded in the YAML file for use with &lt;code&gt;shot-scraper multi&lt;/code&gt;.&lt;/p&gt;
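&lt;p&gt;One way to picture that equivalence is a hypothetical helper that turns a YAML entry into the matching one-off command. It assumes each key maps to a same-named long option, as &lt;code&gt;--selector&lt;/code&gt; and &lt;code&gt;--javascript&lt;/code&gt; do; check the documentation for the exact flag names:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def shot_to_args(shot):
    # Hypothetical mapping: each YAML key becomes a same-named long flag
    args = ["shot-scraper", shot["url"]]
    for key in ("width", "height", "quality", "selector", "javascript"):
        if key in shot:
            args += ["--" + key, str(shot[key])]
    if "output" in shot:
        # -o matches the one-off example earlier in this post
        args += ["-o", shot["output"]]
    return args


print(shot_to_args({"url": "https://www.example.com/", "width": 400,
                    "height": 400, "quality": 80, "output": "example.jpg"}))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting list could be handed to &lt;code&gt;subprocess.run()&lt;/code&gt; to take that screenshot as a one-off.&lt;/p&gt;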
&lt;h4&gt;JavaScript and CSS selectors&lt;/h4&gt;
&lt;p&gt;The default behaviour for &lt;code&gt;shot-scraper&lt;/code&gt; is to take a full page screenshot, using a browser width of 1280px.&lt;/p&gt;
&lt;p&gt;For documentation screenshots you probably don't want the whole page though - you likely want to create an image of one specific part of the interface.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--selector&lt;/code&gt; option allows you to specify an area of the page by CSS selector. The resulting image will consist just of that part of the page.&lt;/p&gt;
&lt;p&gt;What if you want to modify the page in addition to selecting a specific area?&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--javascript&lt;/code&gt; option lets you pass in a block of JavaScript code which will be injected into the page and executed after the page has loaded, but before the screenshot is taken.&lt;/p&gt;
&lt;p&gt;The combination of these two options - also available as &lt;code&gt;javascript:&lt;/code&gt; and &lt;code&gt;selector:&lt;/code&gt; keys in the YAML file - should be flexible enough to cover the custom screenshot case for documentation.&lt;/p&gt;
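&lt;p&gt;To make the selector and JavaScript steps concrete, here's a sketch of how they might map onto Playwright for Python's sync API. It is not shot-scraper's actual implementation: the &lt;code&gt;take_shot&lt;/code&gt; name and the 800px viewport height are assumptions, while the 1280px default width matches the behaviour described above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def take_shot(url, output, selector=None, javascript=None, width=1280, height=800):
    # Imported lazily so this module loads even without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        if javascript:
            # Runs after the page has loaded; a returned Promise is awaited
            page.evaluate(javascript)
        if selector:
            # Capture only the element matching the CSS selector
            page.locator(selector).screenshot(path=output)
        else:
            # Default: a full-page capture at the configured width
            page.screenshot(path=output, full_page=True)
        browser.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because Playwright waits for a returned Promise to resolve, JavaScript that returns one can delay the capture until it has finished rearranging the page.&lt;/p&gt;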
&lt;h4 id="a-complex-example"&gt;A complex example&lt;/h4&gt;
&lt;p&gt;To prove to myself that the tool works, I decided to try replicating this screenshot from &lt;a href="https://datasette.io/tutorials/explore"&gt;my tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I made the original using &lt;a href="https://cleanshot.com/"&gt;CleanShot X&lt;/a&gt;, manually adding the two pink arrows:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/select-facets-original.jpg" alt="A screenshot of a portion of the table interface in Datasette, with a menu open and two pink arrows pointing to menu items" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This is pretty tricky!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It's not &lt;a href="https://congress-legislators.datasettes.com/legislators/executive_terms?start__startswith=18&amp;amp;type=prez"&gt;this whole page&lt;/a&gt;, just a subset of the page&lt;/li&gt;
&lt;li&gt;The cog menu for one of the columns is open, which means the cog icon needs to be clicked before taking the screenshot&lt;/li&gt;
&lt;li&gt;There are two pink arrows superimposed on the image&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I decided to use just one arrow for the moment, which should hopefully result in a clearer image.&lt;/p&gt;
&lt;p&gt;I started by &lt;a href="https://github.com/simonw/shot-scraper/issues/9#issuecomment-1063314278"&gt;creating my own pink arrow SVG&lt;/a&gt; using Figma:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/pink-arrow.png" alt="A big pink arrow, with a drop shadow" style="width: 200px; max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I then fiddled around in the Firefox developer console for quite a while, working out the JavaScript needed to trim the page down to the bit I wanted, open the menu and position the arrow.&lt;/p&gt;
&lt;p&gt;With the JavaScript figured out, I pasted it into a YAML file called &lt;code&gt;shot.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/executive_terms?start__startswith=18&amp;amp;type=prez&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;    new Promise(resolve =&amp;gt; {&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Run in a promise so we can sleep 1s at the end&lt;/span&gt;
&lt;span class="pl-s"&gt;      function remove(el) { el.parentNode.removeChild(el);}&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove header and footer&lt;/span&gt;
&lt;span class="pl-s"&gt;      remove(document.querySelector('header'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      remove(document.querySelector('footer'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove most of the children of .content&lt;/span&gt;
&lt;span class="pl-s"&gt;      Array.from(document.querySelectorAll('.content &amp;gt; *:not(.table-wrapper,.suggested-facets)')).map(remove)&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Bit of breathing room for the screenshot&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.body.style.marginTop = '10px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Add a bit of padding to .content&lt;/span&gt;
&lt;span class="pl-s"&gt;      var content = document.querySelector('.content');&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.width = '820px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.padding = '10px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Open the menu - it's an SVG so we need to use dispatchEvent here&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelector('th.col-executive_id svg').dispatchEvent(new Event('click'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove all but table header and first 11 rows&lt;/span&gt;
&lt;span class="pl-s"&gt;      Array.from(document.querySelectorAll('tr')).slice(12).map(remove);&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Add a pink SVG arrow&lt;/span&gt;
&lt;span class="pl-s"&gt;      let div = document.createElement('div');&lt;/span&gt;
&lt;span class="pl-s"&gt;      div.innerHTML = `&amp;lt;svg width="104" height="60" fill="none" xmlns="http://www.w3.org/2000/svg"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;g filter="url(#a)"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;path fill-rule="evenodd" clip-rule="evenodd" d="m76.7 1 2 2 .2-.1.1.4 20 20a3.5 3.5 0 0 1 0 5l-20 20-.1.4-.3-.1-1.9 2a3.5 3.5 0 0 1-5.4-4.4l3.2-14.4H4v-12h70.6L71.3 5.4A3.5 3.5 0 0 1 76.7 1Z" fill="#FF31A0"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;/g&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;defs&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;filter id="a" x="0" y="0" width="104" height="59.5" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feFlood flood-opacity="0" result="BackgroundImageFix"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feColorMatrix in="SourceAlpha" values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 0" result="hardAlpha"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feOffset dy="4"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feGaussianBlur stdDeviation="2"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feComposite in2="hardAlpha" operator="out"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feColorMatrix values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.25 0"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feBlend in2="BackgroundImageFix" result="effect1_dropShadow_2_26"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feBlend in="SourceGraphic" in2="effect1_dropShadow_2_26" result="shape"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;/filter&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;/defs&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;      &amp;lt;/svg&amp;gt;`;&lt;/span&gt;
&lt;span class="pl-s"&gt;      let svg = div.firstChild;&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.appendChild(svg);&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.position = 'relative';&lt;/span&gt;
&lt;span class="pl-s"&gt;      svg.style.position = 'absolute';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Give the menu time to finish fading in&lt;/span&gt;
&lt;span class="pl-s"&gt;      setTimeout(() =&amp;gt; {&lt;/span&gt;
&lt;span class="pl-s"&gt;        // Position arrow pointing to the 'facet by this' menu item&lt;/span&gt;
&lt;span class="pl-s"&gt;        var pos = document.querySelector('.dropdown-facet').getBoundingClientRect();&lt;/span&gt;
&lt;span class="pl-s"&gt;        svg.style.left = (pos.left - pos.width) + 'px';&lt;/span&gt;
&lt;span class="pl-s"&gt;        svg.style.top = (pos.top - 20) + 'px';&lt;/span&gt;
&lt;span class="pl-s"&gt;        resolve();&lt;/span&gt;
&lt;span class="pl-s"&gt;      }, 1000);&lt;/span&gt;
&lt;span class="pl-s"&gt;    });&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;annotated-screenshot.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;.content&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And ran this command to generate the screenshot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper multi shot.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The generated &lt;code&gt;annotated-screenshot.png&lt;/code&gt; image looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/annotated-screenshot.png" alt="A screenshot of the table with the menu open and a single pink arrow pointing to the 'facet by this' menu item" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm pretty happy with this! I think it works very well as a proof of concept for the process.&lt;/p&gt;
&lt;h4 id="how-it-works-playwright"&gt;How it works: Playwright&lt;/h4&gt;
&lt;p&gt;I built the &lt;a href="https://github.com/simonw/shot-scraper/tree/44995cd45ca6c56d34c5c3d131217f7b9170f6f7"&gt;first prototype&lt;/a&gt; of &lt;code&gt;shot-scraper&lt;/code&gt; using Puppeteer, because I had &lt;a href="https://simonwillison.net/2020/Sep/3/weeknotes-airtable-screenshots-dogsheep/"&gt;used that before&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I noticed that the &lt;a href="https://www.npmjs.com/package/puppeteer-cli"&gt;puppeteer-cli&lt;/a&gt; package I was using hadn't had an update in two years, which reminded me to check out Playwright.&lt;/p&gt;
&lt;p&gt;I've been looking for an excuse to learn &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt; for a while now, and this project turned out to be ideal.&lt;/p&gt;
&lt;p&gt;Playwright is Microsoft's open source browser automation framework. They promote it as a testing tool, but it has plenty of applications outside of testing - screenshot automation and screen scraping being two of the most obvious.&lt;/p&gt;
&lt;p&gt;Playwright is comprehensive: it downloads its own custom browser builds, and can run tests across multiple different rendering engines.&lt;/p&gt;
&lt;p&gt;The second prototype used the &lt;a href="https://github.com/simonw/shot-scraper/tree/b3318b2f27ca1526d5a9f06de50cf9900dd4d8d0"&gt;Playwright CLI utility&lt;/a&gt; instead, &lt;a href="https://github.com/simonw/shot-scraper/blob/b3318b2f27ca1526d5a9f06de50cf9900dd4d8d0/shot_scraper/cli.py#L39-L50"&gt;executed via npx&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-en"&gt;run&lt;/span&gt;(
    [
        &lt;span class="pl-s"&gt;"npx"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"playwright"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"screenshot"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"--full-page"&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;url&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;output&lt;/span&gt;,
    ],
    &lt;span class="pl-s1"&gt;capture_output&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;,
)&lt;/pre&gt;
&lt;p&gt;This could take a full page screenshot, but that CLI tool wasn't flexible enough to take screenshots of specific elements. So I needed to switch to the Playwright programmatic API.&lt;/p&gt;
&lt;p&gt;I started out trying to get Python to generate and pass JavaScript to the Node.js library... and then I spotted the official &lt;a href="https://playwright.dev/python/docs/intro"&gt;Playwright for Python&lt;/a&gt; package.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install playwright
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It's amazing! It has the exact same functionality as the JavaScript library - the same classes, the same methods. Everything just works, in both languages.&lt;/p&gt;
&lt;p&gt;I was curious how they pulled this off, so I dug inside the &lt;code&gt;playwright&lt;/code&gt; Python package in my &lt;code&gt;site-packages&lt;/code&gt; folder... and found it bundles a full Node.js binary executable and uses it to bridge the two worlds! What a wild hack.&lt;/p&gt;
&lt;p&gt;Thanks to Playwright, the entire implementation of &lt;code&gt;shot-scraper&lt;/code&gt; is currently just &lt;a href="https://github.com/simonw/shot-scraper/blob/0.3/shot_scraper/cli.py"&gt;181 lines of Python code&lt;/a&gt; - it's all glue code tying together a &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; CLI interface with some code that calls Playwright to do the actual work.&lt;/p&gt;
&lt;p&gt;I couldn't be more impressed with Playwright. I'll definitely be using it for other projects - for one thing, I think I'll finally be able to add automated tests to my &lt;a href="https://datasette.io/desktop"&gt;Datasette Desktop&lt;/a&gt; Electron application.&lt;/p&gt;
&lt;h4&gt;Hooking shot-scraper up to GitHub Actions&lt;/h4&gt;
&lt;p&gt;I built &lt;code&gt;shot-scraper&lt;/code&gt; very much with GitHub Actions in mind.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/shot-scraper-demo"&gt;shot-scraper-demo&lt;/a&gt; repository is my first live demo of the tool.&lt;/p&gt;
&lt;p&gt;Once a day, it runs &lt;a href="https://github.com/simonw/shot-scraper-demo/blob/3fdd9d3e79f95d9d396aeefd5bf65e85a7700ef4/.github/workflows/shots.yml"&gt;this shots.yml&lt;/a&gt; file, generates two screenshots and commits them back to the repository.&lt;/p&gt;
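&lt;p&gt;A common pattern for this kind of scheduled job is to commit only when a screenshot actually changed. Here's a hypothetical Python helper illustrating that step (not the demo repository's actual workflow), assuming it runs inside a checked-out repository where git is already configured and paths are simple (no renames or quoted names):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import subprocess


def changed_paths(porcelain):
    # Each `git status --porcelain` line is "XY path"; the path starts at column 3
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def commit_screenshots(message="Latest screenshots"):
    # Hypothetical helper: returns True if a commit was made
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    paths = changed_paths(status)
    if not paths:
        return False
    subprocess.run(["git", "add", "--"] + paths, check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    return True
&lt;/code&gt;&lt;/pre&gt;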
&lt;p&gt;One of them is the tutorial screenshot described above.&lt;/p&gt;
&lt;p&gt;The other is a screenshot of the list of "recently spotted owls" from &lt;a href="https://www.owlsnearme.com/?place=127871"&gt;this page&lt;/a&gt; on &lt;a href="https://www.owlsnearme.com/"&gt;owlsnearme.com&lt;/a&gt;. I wanted a page that would change on an occasional basis, to demonstrate GitHub's neat image diffing interface.&lt;/p&gt;
&lt;p&gt;I may need to change that demo though! That page includes "spotted 5 hours ago" text, which means that there's almost always a tiny pixel difference, &lt;a href="https://github.com/simonw/shot-scraper-demo/commit/bc86510f49b6f8d6728c9f1880b999c83361dd5a#diff-897c3444fbbb2033cbba5840da4994d01c3f396e0cdf4b0613d7f410db9887e0"&gt;like this one&lt;/a&gt; (use the "swipe" comparison tool to watch "6 hours ago" change to "7 hours ago" under the top left photo).&lt;/p&gt;
&lt;p&gt;Storing image files that change frequently in a free repository on GitHub feels rude to me, so please use this tool cautiously there!&lt;/p&gt;
&lt;h4&gt;What's next?&lt;/h4&gt;
&lt;p&gt;I had ambitious plans to add utilities to the tool that would &lt;a href="https://github.com/simonw/shot-scraper/issues/9"&gt;help with annotations&lt;/a&gt;, such as adding pink arrows and drawing circles around different elements on the page.&lt;/p&gt;
&lt;p&gt;I've shelved those plans for the moment: as the demo above shows, the JavaScript hook is good enough. I may revisit this later once common patterns have started to emerge.&lt;/p&gt;
&lt;p&gt;So really, my next step is to start using this tool for my own projects - to generate screenshots for my documentation.&lt;/p&gt;
&lt;p&gt;I'm also very interested to see what kinds of things other people use this for.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git-scraping"&gt;git-scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/playwright"&gt;playwright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="documentation"/><category term="projects"/><category term="scraping"/><category term="github-actions"/><category term="git-scraping"/><category term="puppeteer"/><category term="playwright"/><category term="shot-scraper"/></entry><entry><title>A framework for building Open Graph images</title><link href="https://simonwillison.net/2021/Jun/22/open-graph-images/#atom-tag" rel="alternate"/><published>2021-06-22T21:25:13+00:00</published><updated>2021-06-22T21:25:13+00:00</updated><id>https://simonwillison.net/2021/Jun/22/open-graph-images/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.blog/2021-06-22-framework-building-open-graph-images/"&gt;A framework for building Open Graph images&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
GitHub’s new social preview images are generated by a Node.js script that fetches data from their GraphQL API, generates an HTML version of the card and then grabs a PNG snapshot of it using Puppeteer. It takes an average of 280ms to serve an image and generates around 2 million unique images a day. Interestingly, they found that bumping the available RAM from 512MB up to 513MB had a big effect on performance, because Chromium detects devices with 512MB of memory or less and switches some processes from parallel to sequential.
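&lt;p&gt;Those two figures give a sense of scale. A back-of-the-envelope calculation, assuming (unrealistically) that traffic is spread evenly across the day, and using Little's law (average requests in flight = arrival rate × average latency) to estimate concurrency:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
images_per_day = 2_000_000       # "around 2 million" from the post
avg_latency_s = 0.280            # 280ms average serve time

# Average arrival rate, assuming an even spread over 24 hours
rate = images_per_day / SECONDS_PER_DAY

# Little's law: in-flight requests = arrival rate * time in system
in_flight = rate * avg_latency_s

print(round(rate, 1), round(in_flight, 1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That works out to roughly 23 images per second, with six or seven renders in flight at any moment on average; real traffic peaks will be higher.&lt;/p&gt;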


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nodejs"&gt;nodejs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="nodejs"/><category term="puppeteer"/></entry><entry><title>Animating a commit based Sudoku game using Puppeteer</title><link href="https://simonwillison.net/2020/Oct/9/animating-using-puppeteer/#atom-tag" rel="alternate"/><published>2020-10-09T22:28:53+00:00</published><updated>2020-10-09T22:28:53+00:00</updated><id>https://simonwillison.net/2020/Oct/9/animating-using-puppeteer/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/anishkny/animating-a-commit-based-sudoku-game-using-puppeteer-185f"&gt;Animating a commit based Sudoku game using Puppeteer&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is really clever. There’s a GitHub repo that tracks progress in a game of Sudoku: Anish Karandikar wrote code which iterates through the game board state commit by commit, uses that state to generate an HTML table, passes that table to Puppeteer using a data: URI, renders a PNG of each stage and then concatenates those PNGs together into an animated GIF using the gifencoder Node.js library.
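&lt;p&gt;The data: URI step is the least obvious part. A minimal Python sketch of it (the original project is Node.js, and the helper name here is hypothetical), percent-encoding the generated HTML so a headless browser can load it directly as a URL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from urllib.parse import quote


def html_to_data_uri(page_html):
    # Percent-encode the markup so the whole page fits in a single URL
    return "data:text/html;charset=utf-8," + quote(page_html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting string can be passed to a browser automation call such as Puppeteer's &lt;code&gt;page.goto()&lt;/code&gt; to render one frame of the board.&lt;/p&gt;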

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/anishkny/status/1314692705969008640"&gt;@anishkny&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datauri"&gt;datauri&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gifs"&gt;gifs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;&lt;/p&gt;



</summary><category term="datauri"/><category term="gifs"/><category term="puppeteer"/></entry><entry><title>Weeknotes: airtable-export, generating screenshots in GitHub Actions, Dogsheep!</title><link href="https://simonwillison.net/2020/Sep/3/weeknotes-airtable-screenshots-dogsheep/#atom-tag" rel="alternate"/><published>2020-09-03T23:28:29+00:00</published><updated>2020-09-03T23:28:29+00:00</updated><id>https://simonwillison.net/2020/Sep/3/weeknotes-airtable-screenshots-dogsheep/#atom-tag</id><summary type="html">
    &lt;p&gt;This week I figured out how to populate Datasette from Airtable, wrote code to generate social media preview card page screenshots using Puppeteer, and made a big breakthrough with my Dogsheep project.&lt;/p&gt;
&lt;h4 id="weeknotes-2020-09-03-airtable-export"&gt;airtable-export&lt;/h4&gt;
&lt;p&gt;I wrote about &lt;a href="https://www.rockybeaches.com/"&gt;Rocky Beaches&lt;/a&gt; in my weeknotes &lt;a href="https://simonwillison.net/2020/Aug/21/weeknotes-rocky-beaches/"&gt;two weeks ago&lt;/a&gt;. It's a new website built by Natalie Downe that showcases great places to go rockpooling (tidepooling in American English), mixing in tide data from NOAA and species sighting data from iNaturalist.&lt;/p&gt;
&lt;p&gt;Rocky Beaches is powered by Datasette, using a GitHub Actions workflow that builds the site's underlying SQLite database using API calls and YAML data stored in the GitHub repository.&lt;/p&gt;
&lt;p&gt;Natalie wanted to use Airtable to maintain the structured data for the site, rather than hand-editing a YAML file. So I built &lt;a href="https://github.com/simonw/airtable-export"&gt;airtable-export&lt;/a&gt;, a command-line script for sucking down all of the data from an Airtable instance and writing it to disk as YAML or JSON.&lt;/p&gt;
&lt;p&gt;You run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;airtable-export out/ mybaseid table1 table2 --key=key
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will create a folder called &lt;code&gt;out/&lt;/code&gt; with a &lt;code&gt;.yml&lt;/code&gt; file for each of the tables.&lt;/p&gt;
&lt;p&gt;Sadly the Airtable API doesn't yet provide a mechanism to list all of the tables in a database (a &lt;a href="https://community.airtable.com/t/list-tables-given-api-key-and-baseid/1173"&gt;long-running feature request&lt;/a&gt;) so you have to list the tables yourself.&lt;/p&gt;
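&lt;p&gt;Because the tables have to be named explicitly, an exporter boils down to paging through each table in turn. Here's a hypothetical Python sketch using the Airtable REST API's &lt;code&gt;offset&lt;/code&gt; pagination cursor. It is not airtable-export's code, and it writes JSON only to stay within the standard library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.airtable.com/v0/"


def http_get_json(url, api_key):
    # One authenticated GET against the Airtable REST API
    request = urllib.request.Request(url, headers={"Authorization": "Bearer " + api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_records(base_id, table, api_key, get_json=http_get_json):
    # Follow the "offset" cursor until Airtable stops returning one
    records = []
    params = {}
    while True:
        url = API_ROOT + base_id + "/" + urllib.parse.quote(table)
        if params:
            url += "?" + urllib.parse.urlencode(params)
        page = get_json(url, api_key)
        records.extend(page["records"])
        if "offset" not in page:
            return records
        params = {"offset": page["offset"]}


def export_tables(out_dir, base_id, tables, api_key, get_json=http_get_json):
    # Hypothetical: one JSON file per explicitly listed table
    os.makedirs(out_dir, exist_ok=True)
    for table in tables:
        rows = fetch_all_records(base_id, table, api_key, get_json)
        with open(os.path.join(out_dir, table + ".json"), "w") as fp:
            json.dump(rows, fp, indent=2)
&lt;/code&gt;&lt;/pre&gt;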
&lt;p&gt;We're now &lt;a href="https://github.com/natbat/rockybeaches/blob/32a010292e7c1ba47db1a86523a61c666d977074/.github/workflows/deploy.yml#L31-L44"&gt;running that command&lt;/a&gt; as part of the Rocky Beaches build script, and committing the latest version of the YAML file back to the GitHub repo (thus gaining a &lt;a href="https://github.com/natbat/rockybeaches/commits/main/airtable"&gt;full change history&lt;/a&gt; for that data).&lt;/p&gt;
&lt;h4 id="weeknotes-2020-09-03-social-media-cards-tils"&gt;Social media cards for my TILs&lt;/h4&gt;
&lt;p&gt;I really like social media cards - &lt;code&gt;og:image&lt;/code&gt; HTML meta tags for Facebook and &lt;code&gt;twitter:image&lt;/code&gt; for Twitter. I wanted them for articles on my &lt;a href="https://til.simonwillison.net/"&gt;TIL website&lt;/a&gt; since I often share those via Twitter.&lt;/p&gt;
&lt;p&gt;One catch: my TILs aren't very image heavy. So I decided to generate screenshots of the pages and use those as the 2x1 social media card images.&lt;/p&gt;
&lt;p&gt;The best way I know of to generate screenshots programmatically is to use &lt;a href="https://developers.google.com/web/tools/puppeteer"&gt;Puppeteer&lt;/a&gt;, a Node.js library for automating a headless instance of the Chrome browser that is maintained by the Chrome DevTools team.&lt;/p&gt;
&lt;p&gt;My first attempt was to run Puppeteer in an AWS Lambda function on &lt;a href="https://vercel.com/"&gt;Vercel&lt;/a&gt;. I remembered seeing an example of how to do this in the Vercel documentation a few years ago. The example isn't there any more, but I found the &lt;a href="https://github.com/vercel/now-examples/pull/207"&gt;original pull request&lt;/a&gt; that introduced it.&lt;/p&gt;
&lt;p&gt;Since the example was MIT licensed I created my own fork at &lt;a href="https://github.com/simonw/puppeteer-screenshot"&gt;simonw/puppeteer-screenshot&lt;/a&gt; and updated it to work with the latest Chrome.&lt;/p&gt;
&lt;p&gt;It's pretty resource intensive, so I also added a secret &lt;code&gt;?key=&lt;/code&gt; mechanism so only my own automation code could call my instance running on Vercel.&lt;/p&gt;
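&lt;p&gt;The original function ran as Node.js on Vercel. As a Python sketch of the same idea (the function name and how the secret is supplied are hypothetical), the check can pull &lt;code&gt;key&lt;/code&gt; out of the query string and compare it in constant time:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hmac
from urllib.parse import parse_qs, urlparse


def is_authorized(request_url, secret):
    # Missing ?key= is treated as an empty string, which never matches
    supplied = parse_qs(urlparse(request_url).query).get("key", [""])[0]
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(supplied.encode(), secret.encode())
&lt;/code&gt;&lt;/pre&gt;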
&lt;p&gt;I needed to store the generated screenshots somewhere. They're pretty small - on the order of 60KB each - so I decided to store them in my SQLite database itself and use my &lt;a href="https://github.com/simonw/datasette-media"&gt;datasette-media&lt;/a&gt; plugin (see &lt;a href="https://simonwillison.net/2020/Jul/30/fun-binary-data-and-sqlite/"&gt;Fun with binary data and SQLite&lt;/a&gt;) to serve them up.&lt;/p&gt;
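&lt;p&gt;Storing PNGs directly in SQLite works because Python's &lt;code&gt;sqlite3&lt;/code&gt; module writes &lt;code&gt;bytes&lt;/code&gt; values as BLOBs and returns them as &lt;code&gt;bytes&lt;/code&gt;. Here is a minimal sketch. The &lt;code&gt;screenshots&lt;/code&gt; table and its columns are hypothetical and do not reflect the actual TIL schema.&lt;/p&gt;

```python
import sqlite3
from typing import Optional


def save_screenshot(conn: sqlite3.Connection, path: str, png: bytes) -> None:
    """Store (or overwrite) the PNG for a page, keyed by its URL path."""
    # Hypothetical schema: one row per page path, PNG stored as a BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS screenshots (path TEXT PRIMARY KEY, png BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO screenshots (path, png) VALUES (?, ?)",
        (path, png),
    )
    conn.commit()


def load_screenshot(conn: sqlite3.Connection, path: str) -> Optional[bytes]:
    """Return the stored PNG bytes for a path, or None if there is none."""
    row = conn.execute(
        "SELECT png FROM screenshots WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
save_screenshot(conn, "/til/til/python_debug-click-with-pdb.md", b"\x89PNG...")
print(load_screenshot(conn, "/til/til/python_debug-click-with-pdb.md"))
```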
&lt;p&gt;This worked! Until it didn't... I ran into a showstopper bug when I realized that the screenshot process relied on the page being live on the site. When a new article is added it isn't live yet while the build process is running, so the generated screenshot &lt;a href="https://github.com/simonw/til/issues/23"&gt;is of the 404 page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So I reworked it to generate the screenshots inside the GitHub Action as part of the build script, using &lt;a href="https://github.com/JarvusInnovations/puppeteer-cli"&gt;puppeteer-cli&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/til/blob/3fca996228ad54ee433b25840fcd3682e9f7bbfd/generate_screenshots.py"&gt;generate_screenshots.py&lt;/a&gt; script handles this, by first shelling out to &lt;code&gt;datasette --get&lt;/code&gt; to render the HTML for the page, then running &lt;code&gt;puppeteer&lt;/code&gt; to generate the screenshot. Relevant code:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;png_for_path&lt;/span&gt;(&lt;span class="pl-s1"&gt;path&lt;/span&gt;):
    &lt;span class="pl-c"&gt;# Path is e.g. /til/til/python_debug-click-with-pdb.md&lt;/span&gt;
    &lt;span class="pl-s1"&gt;page_html&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;str&lt;/span&gt;(&lt;span class="pl-v"&gt;TMP_PATH&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-s"&gt;"generate-screenshots-page.html"&lt;/span&gt;)
    &lt;span class="pl-c"&gt;# Use datasette to generate HTML&lt;/span&gt;
    &lt;span class="pl-s1"&gt;proc&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-en"&gt;run&lt;/span&gt;([&lt;span class="pl-s"&gt;"datasette"&lt;/span&gt;, &lt;span class="pl-s"&gt;"."&lt;/span&gt;, &lt;span class="pl-s"&gt;"--get"&lt;/span&gt;, &lt;span class="pl-s1"&gt;path&lt;/span&gt;], &lt;span class="pl-s1"&gt;capture_output&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;)
    &lt;span class="pl-en"&gt;open&lt;/span&gt;(&lt;span class="pl-s1"&gt;page_html&lt;/span&gt;, &lt;span class="pl-s"&gt;"wb"&lt;/span&gt;).&lt;span class="pl-en"&gt;write&lt;/span&gt;(&lt;span class="pl-s1"&gt;proc&lt;/span&gt;.&lt;span class="pl-s1"&gt;stdout&lt;/span&gt;)
    &lt;span class="pl-c"&gt;# Now use puppeteer screenshot to generate a PNG&lt;/span&gt;
    &lt;span class="pl-s1"&gt;proc2&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-en"&gt;run&lt;/span&gt;(
        [
            &lt;span class="pl-s"&gt;"puppeteer"&lt;/span&gt;,
            &lt;span class="pl-s"&gt;"screenshot"&lt;/span&gt;,
            &lt;span class="pl-s1"&gt;page_html&lt;/span&gt;,
            &lt;span class="pl-s"&gt;"--viewport"&lt;/span&gt;,
            &lt;span class="pl-s"&gt;"800x400"&lt;/span&gt;,
            &lt;span class="pl-s"&gt;"--full-page=false"&lt;/span&gt;,
        ],
        &lt;span class="pl-s1"&gt;capture_output&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;,
    )
    &lt;span class="pl-s1"&gt;png_bytes&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;proc2&lt;/span&gt;.&lt;span class="pl-s1"&gt;stdout&lt;/span&gt;
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;png_bytes&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;This worked great! Except for one thing... the site is hosted on Vercel, and Vercel has a 5MB &lt;a href="https://vercel.com/docs/platform/limits#serverless-function-payload-size-limit"&gt;response size limit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Every time my GitHub build script runs, it downloads the previous SQLite database file so it can avoid regenerating screenshots and HTML for pages that haven't changed.&lt;/p&gt;
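&lt;p&gt;One way to implement that kind of skip is to store a hash of each page's rendered HTML next to its screenshot, then regenerate only when the hash changes. The sketch below assumes a hypothetical &lt;code&gt;pages&lt;/code&gt; table with &lt;code&gt;path&lt;/code&gt; and &lt;code&gt;html_hash&lt;/code&gt; columns. It illustrates the technique and is not necessarily how the TIL build script decides what to regenerate.&lt;/p&gt;

```python
import hashlib
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table recording the hash of the HTML each screenshot came from.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, html_hash TEXT)"
    )


def content_hash(html: str) -> str:
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def needs_screenshot(conn: sqlite3.Connection, path: str, html: str) -> bool:
    """True if the page is new or its HTML changed since the last build."""
    row = conn.execute(
        "SELECT html_hash FROM pages WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != content_hash(html)


def record_page(conn: sqlite3.Connection, path: str, html: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO pages (path, html_hash) VALUES (?, ?)",
        (path, content_hash(html)),
    )
    conn.commit()
```

&lt;p&gt;This only helps if the previous database file is available at the start of each build.&lt;/p&gt;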
&lt;p&gt;The addition of the binary screenshots drove the size of the SQLite database over 5MB, so the part of my script that retrieved the previous database &lt;a href="https://github.com/simonw/til/issues/25"&gt;no longer worked&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I needed a reliable way to store that 5MB (and probably eventually 10-50MB) database file in between runs of my action.&lt;/p&gt;
&lt;p&gt;The best place to put this would be an S3 bucket, but I find the process of setting up IAM permissions for access to a new bucket so infuriating that I couldn't bring myself to do it.&lt;/p&gt;
&lt;p&gt;So... I created a new dedicated GitHub repository, &lt;a href="https://github.com/simonw/til-db"&gt;simonw/til-db&lt;/a&gt;, and updated my action to store the binary file in that repo - using &lt;a href="https://github.com/simonw/til/blob/1e29c3fe5e90c29b0e71d87dba805484ceb4393c/.github/workflows/build.yml#L80-L86"&gt;a force push&lt;/a&gt; so the repo doesn't need to maintain unnecessary version history of the binary asset.&lt;/p&gt;
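&lt;p&gt;The linked workflow lines are the real implementation. To illustrate the general trick, this Python sketch builds the git commands for publishing a file as a brand-new single-commit history and force-pushing it over the remote branch, so the branch never accumulates old copies of the binary. The file name, remote URL and branch are placeholders. The commit step assumes git's &lt;code&gt;user.name&lt;/code&gt; and &lt;code&gt;user.email&lt;/code&gt; are configured.&lt;/p&gt;

```python
import subprocess


def single_commit_push_commands(db_file: str, remote_url: str, branch: str = "main") -> list:
    """Commands that replace the remote branch with one commit holding db_file.

    Intended to run in a scratch directory containing only db_file.
    """
    return [
        ["git", "init"],
        ["git", "add", db_file],
        ["git", "commit", "-m", "Latest database"],
        # HEAD:branch works whatever the local default branch is called;
        # --force discards the previous history on the remote branch.
        ["git", "push", "--force", remote_url, "HEAD:" + branch],
    ]


def run_all(commands: list, cwd: str) -> None:
    # check=True stops at the first failing git command.
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```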
&lt;p&gt;This is an abomination of a hack, and it made me cackle a lot. I &lt;a href="https://twitter.com/simonw/status/1301029346614718465"&gt;tweeted about it&lt;/a&gt; and got the suggestion to try &lt;a href="https://git-lfs.github.com/"&gt;Git LFS&lt;/a&gt; instead, which would definitely be a more appropriate way to solve this problem.&lt;/p&gt;
&lt;h4 id="weeknotes-2020-09-03-rendering-markdown"&gt;Rendering Markdown&lt;/h4&gt;
&lt;p&gt;I write my blog entries in Markdown and transform them into HTML before I post them on my blog. Some day I'll teach my blog to render Markdown itself, but so far I've got by with copying and pasting into Markdown tools.&lt;/p&gt;
&lt;p&gt;My favourite Markdown flavour is GitHub's, which adds a bunch of useful capabilities - most notably the ability to apply syntax highlighting. GitHub &lt;a href="https://docs.github.com/en/rest/reference/markdown"&gt;expose an API&lt;/a&gt; that applies their Markdown formatter and returns the resulting HTML.&lt;/p&gt;
&lt;p&gt;I built myself &lt;a href="https://til.simonwillison.net/tools/render-markdown"&gt;a quick and scrappy tool&lt;/a&gt; in JavaScript that sends Markdown through their API and then applies a few DOM manipulations to clean up what comes back. It was a nice opportunity to write some modern vanilla JavaScript using &lt;code&gt;fetch()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-en"&gt;render&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;markdown&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'https://api.github.com/markdown'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
        &lt;span class="pl-c1"&gt;method&lt;/span&gt;: &lt;span class="pl-s"&gt;'POST'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
        &lt;span class="pl-c1"&gt;headers&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
            &lt;span class="pl-s"&gt;'Content-Type'&lt;/span&gt;: &lt;span class="pl-s"&gt;'application/json'&lt;/span&gt;
        &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
        &lt;span class="pl-c1"&gt;body&lt;/span&gt;: &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;stringify&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s"&gt;'mode'&lt;/span&gt;: &lt;span class="pl-s"&gt;'markdown'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'text'&lt;/span&gt;: &lt;span class="pl-s1"&gt;markdown&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;

&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;button&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getElementsByTagName&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'button'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;0&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;output&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getElementById&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'output'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;preview&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getElementById&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'preview'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

&lt;span class="pl-s1"&gt;button&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addEventListener&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'click'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;function&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;rendered&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;render&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;input&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;value&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-s1"&gt;output&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;value&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;rendered&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-s1"&gt;preview&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;rendered&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="weeknotes-2020-09-03-dogsheep-beta"&gt;Dogsheep Beta&lt;/h4&gt;
&lt;p&gt;My most exciting project this week was getting out the first working version of &lt;a href="https://github.com/dogsheep/beta"&gt;Dogsheep Beta&lt;/a&gt; - the search engine that ties together results from my &lt;a href="https://dogsheep.github.io/"&gt;Dogsheep&lt;/a&gt; family of tools for personal analytics.&lt;/p&gt;
&lt;p&gt;I'm giving a talk about this tonight at PyCon Australia: &lt;a href="https://2020.pycon.org.au/program/73uk8x/"&gt;Build your own data warehouse for personal analytics with SQLite and Datasette&lt;/a&gt;. I'll be writing up detailed notes in the next few days, so watch this space.&lt;/p&gt;
&lt;h4 id="weeknotes-2020-09-03-til-this-week"&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/jq_reformatting-airtable-json.md"&gt;Converting Airtable JSON for use with sqlite-utils using jq&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/javascript_minifying-uglify-npx.md"&gt;Minifying JavaScript with npx uglify-js&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/pytest_subprocess-server.md"&gt;Start a server in a subprocess during a pytest session&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/bash_loop-over-csv.md"&gt;Looping over comma-separated values in Bash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/cloudrun_gcloud-run-services-list.md"&gt;Using the gcloud run services list command&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/python_debug-click-with-pdb.md"&gt;Debugging a Click application using pdb&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-2020-09-03-releases-this-week"&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.4.1"&gt;dogsheep-beta 0.4.1&lt;/a&gt; - 2020-09-03&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.4"&gt;dogsheep-beta 0.4&lt;/a&gt; - 2020-09-03&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.4a1"&gt;dogsheep-beta 0.4a1&lt;/a&gt; - 2020-09-03&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.4a0"&gt;dogsheep-beta 0.4a0&lt;/a&gt; - 2020-09-03&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.3"&gt;dogsheep-beta 0.3&lt;/a&gt; - 2020-09-02&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.2"&gt;dogsheep-beta 0.2&lt;/a&gt; - 2020-09-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.1"&gt;dogsheep-beta 0.1&lt;/a&gt; - 2020-09-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.1a2"&gt;dogsheep-beta 0.1a2&lt;/a&gt; - 2020-09-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dogsheep/dogsheep-beta/releases/tag/0.1a"&gt;dogsheep-beta 0.1a&lt;/a&gt; - 2020-09-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.4"&gt;airtable-export 0.4&lt;/a&gt; - 2020-08-30&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-yaml/releases/tag/0.1a"&gt;datasette-yaml 0.1a&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.3.1"&gt;airtable-export 0.3.1&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.3"&gt;airtable-export 0.3&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.2"&gt;airtable-export 0.2&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.1.1"&gt;airtable-export 0.1.1&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/airtable-export/releases/tag/0.1"&gt;airtable-export 0.1&lt;/a&gt; - 2020-08-29&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/releases/tag/0.49a0"&gt;datasette 0.49a0&lt;/a&gt; - 2020-08-28&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/2.16.1"&gt;sqlite-utils 2.16.1&lt;/a&gt; - 2020-08-28&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/yaml"&gt;yaml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dogsheep"&gt;dogsheep&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/airtable"&gt;airtable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="yaml"/><category term="markdown"/><category term="dogsheep"/><category term="weeknotes"/><category term="github-actions"/><category term="airtable"/><category term="puppeteer"/></entry><entry><title>html-to-svg</title><link href="https://simonwillison.net/2020/May/7/html-svg/#atom-tag" rel="alternate"/><published>2020-05-07T06:01:44+00:00</published><updated>2020-05-07T06:01:44+00:00</updated><id>https://simonwillison.net/2020/May/7/html-svg/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/as-a-service/html-to-svg"&gt;html-to-svg&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is absolutely ingenious: 50 lines of JavaScript which uses Puppeteer to get headless Chrome to grab a PDF screenshot of a page, then shells out to Inkscape to convert the PDF to SVG. Wraps the whole thing up in a Docker container and ships it to Cloud Run as a web service you can call by passing it a URL.
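&lt;p&gt;The project itself is JavaScript. To illustrate just the Inkscape half of that pipeline, here is a Python sketch that assumes Inkscape 1.x is on the &lt;code&gt;PATH&lt;/code&gt;, using its &lt;code&gt;--export-type&lt;/code&gt; and &lt;code&gt;--export-filename&lt;/code&gt; options. Older 0.92 releases used different export flags. The function names are hypothetical.&lt;/p&gt;

```python
import subprocess


def inkscape_pdf_to_svg_command(pdf_path: str, svg_path: str) -> list:
    """Build an Inkscape 1.x command line that converts a PDF to SVG."""
    return [
        "inkscape",
        pdf_path,
        "--export-type=svg",
        "--export-filename=" + svg_path,
    ]


def pdf_to_svg(pdf_path: str, svg_path: str) -> None:
    # check=True raises CalledProcessError if Inkscape exits non-zero.
    subprocess.run(inkscape_pdf_to_svg_command(pdf_path, svg_path), check=True)
```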

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/steren/status/1258273345843290118"&gt;@steren&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/chrome"&gt;chrome&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/svg"&gt;svg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudrun"&gt;cloudrun&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;&lt;/p&gt;



</summary><category term="chrome"/><category term="svg"/><category term="cloudrun"/><category term="puppeteer"/></entry></feed>