<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: How I blog</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/series/blogging.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-02-20T23:47:10+00:00</updated><author><name>Simon Willison</name></author><entry><title>Adding TILs, releases, museums, tools and research to my blog</title><link href="https://simonwillison.net/2026/Feb/20/beats/#atom-series" rel="alternate"/><published>2026-02-20T23:47:10+00:00</published><updated>2026-02-20T23:47:10+00:00</updated><id>https://simonwillison.net/2026/Feb/20/beats/#atom-series</id><summary type="html">
    &lt;p&gt;I've been wanting to add indications of my various other online activities to my blog for a while now. I just turned on a new feature I'm calling "beats" (after story beats, naming this was hard!) which adds five new types of content to my site, all corresponding to activity elsewhere.&lt;/p&gt;
&lt;p&gt;Here's what beats look like:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/three-beats.jpg" alt="Screenshot of a fragment of a page showing three entries from 30th Dec 2025. First: [RELEASE] &amp;quot;datasette-turnstile 0.1a0 — Configurable CAPTCHAs for Datasette paths usin…&amp;quot; at 7:23 pm. Second: [TOOL] &amp;quot;Software Heritage Repository Retriever — Download archived Git repositories f…&amp;quot; at 11:41 pm. Third: [TIL] &amp;quot;Downloading archived Git repositories from archive.softwareheritage.org — …&amp;quot; at 11:43 pm." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Those three are from &lt;a href="https://simonwillison.net/2025/Dec/30/"&gt;the 30th December 2025&lt;/a&gt; archive page.&lt;/p&gt;
&lt;p&gt;Beats are little inline links with badges that fit into different content timeline views around my site, including the homepage, search and archive pages.&lt;/p&gt;
&lt;p&gt;There are currently five types of beats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/elsewhere/release/"&gt;Releases&lt;/a&gt; are GitHub releases of my many different open source projects, imported from &lt;a href="https://github.com/simonw/simonw/blob/main/releases_cache.json"&gt;this JSON file&lt;/a&gt; that was constructed &lt;a href="https://simonwillison.net/2020/Jul/10/self-updating-profile-readme/"&gt;by GitHub Actions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/elsewhere/til/"&gt;TILs&lt;/a&gt; are the posts from my &lt;a href="https://til.simonwillison.net/"&gt;TIL blog&lt;/a&gt;, imported using &lt;a href="https://github.com/simonw/simonwillisonblog/blob/f883b92be23892d082de39dbada571e406f5cfbf/blog/views.py#L1169"&gt;a SQL query over JSON and HTTP&lt;/a&gt; against the Datasette instance powering that site.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/elsewhere/museum/"&gt;Museums&lt;/a&gt; are new posts on my &lt;a href="https://www.niche-museums.com/"&gt;niche-museums.com&lt;/a&gt; blog, imported from &lt;a href="https://github.com/simonw/museums/blob/909bef71cc8d336bf4ac1f13574db67a6e1b3166/plugins/export.py"&gt;this custom JSON feed&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/elsewhere/tool/"&gt;Tools&lt;/a&gt; are HTML and JavaScript tools I've vibe-coded on my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; site, as described in &lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/"&gt;Useful patterns for building HTML tools&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/elsewhere/research/"&gt;Research&lt;/a&gt; is for AI-generated research projects, hosted in my &lt;a href="https://github.com/simonw/research"&gt;simonw/research repo&lt;/a&gt; and described in &lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/"&gt;Code research projects with async coding agents like Claude Code and Codex&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's five different custom integrations to pull in all of that data. The good news is that this kind of integration project is the kind of thing that coding agents &lt;em&gt;really&lt;/em&gt; excel at. I knocked most of the feature out in a single morning while working in parallel on various other things.&lt;/p&gt;
&lt;p&gt;I didn't have a useful structured feed of my Research projects, and it didn't matter because I gave Claude Code a link to &lt;a href="https://raw.githubusercontent.com/simonw/research/refs/heads/main/README.md"&gt;the raw Markdown README&lt;/a&gt; that lists them all and it &lt;a href="https://github.com/simonw/simonwillisonblog/blob/f883b92be23892d082de39dbada571e406f5cfbf/blog/importers.py#L77-L80"&gt;spun up a parser regex&lt;/a&gt;. Since I'm responsible for both the source and the destination I'm fine with a brittle solution that would be too risky against a source that I don't control myself.&lt;/p&gt;
&lt;p&gt;Claude also handled all of the potentially tedious UI integration work with my site, making sure the new content worked on all of my different page types and was handled correctly by my &lt;a href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/"&gt;faceted search engine&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="prototyping-with-claude-artifacts"&gt;Prototyping with Claude Artifacts&lt;/h4&gt;
&lt;p&gt;I actually prototyped the initial concept for beats in regular Claude - not Claude Code - taking advantage of the fact that it can clone public repos from GitHub these days. I started with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Clone simonw/simonwillisonblog and tell me about the models and views&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then later in the brainstorming session said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;use the templates and CSS in this repo to create a new artifact with all HTML and CSS inline that shows me my homepage with some of those inline content types mixed in&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After some iteration we got to &lt;a href="https://gisthost.github.io/?c3f443cc4451cf8ce03a2715a43581a4/preview.html"&gt;this artifact mockup&lt;/a&gt;, which was enough to convince me that the concept had legs and was worth handing over to full &lt;a href="https://code.claude.com/docs/en/claude-code-on-the-web"&gt;Claude Code for web&lt;/a&gt; to implement.&lt;/p&gt;
&lt;p&gt;If you want to see how the rest of the build played out the most interesting PRs are &lt;a href="https://github.com/simonw/simonwillisonblog/pull/592"&gt;Beats #592&lt;/a&gt; which implemented the core feature and &lt;a href="https://github.com/simonw/simonwillisonblog/pull/595/changes"&gt;Add Museums Beat importer #595&lt;/a&gt; which added the Museums content type.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/museums"&gt;museums&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-artifacts"&gt;claude-artifacts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="museums"/><category term="ai"/><category term="til"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude-artifacts"/><category term="claude-code"/><category term="site-upgrades"/></entry><entry><title>How I automate my Substack newsletter with content from my blog</title><link href="https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-series" rel="alternate"/><published>2025-11-19T22:00:34+00:00</published><updated>2025-11-19T22:00:34+00:00</updated><id>https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-series</id><summary type="html">
    &lt;p&gt;I sent out &lt;a href="https://simonw.substack.com/p/trying-out-gemini-3-pro-with-audio"&gt;my weekly-ish Substack newsletter&lt;/a&gt; this morning and took the opportunity to record &lt;a href="https://www.youtube.com/watch?v=BoPZltKDM-s"&gt;a YouTube video&lt;/a&gt; demonstrating my process and describing the different components that make it work. There's a &lt;em&gt;lot&lt;/em&gt; of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="BoPZltKDM-s" js-api="js-api"
  title="How I automate my Substack newsletter with content from my blog"
  playlabel="Play: How I automate my Substack newsletter with content from my blog"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;p&gt;The core process is the same as I described &lt;a href="https://simonwillison.net/2023/Apr/4/substack-observable/"&gt;back in 2023&lt;/a&gt;. I have an Observable notebook called &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;blog-to-newsletter&lt;/a&gt; which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/copy-to-newsletter.jpg" alt="Screenshot of the interface. An item in a list says 9080: Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark. A huge button reads Copy rich text newsletter to clipboard - below is a smaller button that says Copy just the links/quotes/TILs. A Last X days slider is set to 2. There are checkboxes for SKip content sent in prior newsletters and only include post content prior to the cutoff comment." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.&lt;/p&gt;
&lt;p&gt;I make very minor edits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.&lt;/li&gt;
&lt;li&gt;Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.&lt;/li&gt;
&lt;li&gt;Blocks of preformatted text often have an extra blank line at the end, which I remove.&lt;/li&gt;
&lt;li&gt;Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.&lt;/li&gt;
&lt;li&gt;I pick the featured image for the newsletter and add some tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's the whole process!&lt;/p&gt;
&lt;h4 id="the-observable-notebook"&gt;The Observable notebook&lt;/h4&gt;
&lt;p&gt;The most important cell in the Observable notebook is this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;raw_content&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c1"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;await&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
      &lt;span class="pl-s"&gt;`https://datasette.simonwillison.net/simonwillisonblog.json?sql=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-en"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;      &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;amp;_shape=array&amp;amp;numdays=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;numDays&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;
    &lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;json&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses the JavaScript &lt;code&gt;fetch()&lt;/code&gt; function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.&lt;/p&gt;
&lt;p&gt;Here's a link to &lt;a href="https://datasette.simonwillison.net/simonwillisonblog?sql=with+content+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27entry%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Ch3%3E%3Ca+href%3D%22%27+%7C%7C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27+%7C%7C+%27%22%3E%27+%0D%0A++++++%7C%7C+title+%7C%7C+%27%3C%2Fa%3E+-+%27+%7C%7C+date%28created%29+%7C%7C+%27%3C%2Fh3%3E%27+%7C%7C+body%0D%0A++++++as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_entry%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27blogmark%27+as+type%2C%0D%0A++++link_title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ELink%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+link_url+%7C%7C+%27%22%3E%27%0D%0A++++++%7C%7C+link_title+%7C%7C+%27%3C%2Fa%3E%3A%3C%2Fp%3E%3Cp%3E%27+%7C%7C+%27+%27+%7C%7C+replace%28commentary%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%27%3C%2Fp%3E%27%0D%0A++++++as+html%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+link_url%2C%0D%0A++++++%27link_title%27%2C+link_title%2C%0D%0A++++++%27commentary%27%2C+commentary%2C%0D%0A++++++%27use_markdown%27%2C+use_markdown%0D%0A++++%29+as+json%2C%0D%0A++link_url+as+external_url%0D%0A++from+blog_blogmark%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27quotation%27+as+type%2C%0D%0A++++source%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cstrong%3Equote%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C%0D%0A++++%27%3Cblockquote%3E%3Cp%3E%3Cem%3E%27+%7C%7C%0D%0A++++replace%28quotation%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%0D%0A++++%27%3C%2Fem%3E%3C%2Fp%3E%3C%2Fblockquote%3E%3Cp%3E%3Ca+href%3D%22%27+%7C%7C%0D%0A++++coalesce%28source_url%2C+%27%23%27%29+%7C%7C+%27%22%3E%27+%7C%7C+source+%7C%7C+%27%3C%2Fa%3E%27+%7C%7C%0D%0A++++case+%0D%0A++++++++when+nullif%28trim%28context%29%2C+%27%27%29+is+not+null+%0D%0A++++++++then+%27%2C+%27+%7C%7C+context+%0D%0A++++++++else+%27%27+%0D%0A++++end+%7C%7C%0D%0A++++%27%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++source_url+as+external_url%0D%0A++from+blog_quotation%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27note%27+as+type%2C%0D%0A++++case%0D%0A++++++when+title+is+not+null+and+title+%3C%3E+%27%27+then+title%0D%0A++++++else+%27Note+on+%27+%7C%7C+date%28created%29%0D%0A++++end%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27No+HTML%27%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%2C%0D%0A++++++%27link_title%27%2C+%27%27%2C%0D%0A++++++%27commentary%27%2C+body%2C%0D%0A++++++%27use_markdown%27%2C+1%0D%0A++++%29%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_note%0D%0A++union+all%0D%0A++select%0D%0A++++rowid%2C%0D%0A++++%27til%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++%27null%27+as+slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ETIL%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%22%3E%27+%7C%7C+title+%7C%7C+%27%3C%2Fa%3E%3A%27+%7C%7C+%27+%27+%7C%7C+substr%28html%2C+1%2C+instr%28html%2C+%27%3C%2Fp%3E%27%29+-+1%29+%7C%7C+%27+%26%238230%3B%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+as+external_url%0D%0A++from+til%0D%0A%29%2C%0D%0Acollected+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++type%2C%0D%0A++++title%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27til%27%0D%0A++++++then+external_url%0D%0A++++++else+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%7C%7C+%0D%0A++++++%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%0D%0A++++++end+as+url%2C%0D%0A++++created%2C%0D%0A++++html%2C%0D%0A++++json%2C%0D%0A++++external_url%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27entry%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_entry_tags+on+blog_tag.id+%3D+blog_entry_tags.tag_id%0D%0A++++++++where+blog_entry_tags.entry_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27blogmark%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_blogmark_tags+on+blog_tag.id+%3D+blog_blogmark_tags.tag_id%0D%0A++++++++where+blog_blogmark_tags.blogmark_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27quotation%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_quotation_tags+on+blog_tag.id+%3D+blog_quotation_tags.tag_id%0D%0A++++++++where+blog_quotation_tags.quotation_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++else+%27%5B%5D%27%0D%0A++++end+as+tags%0D%0A++from+content%0D%0A++where+created+%3E%3D+date%28%27now%27%2C+%27-%27+%7C%7C+%3Anumdays+%7C%7C+%27+days%27%29+++%0D%0A++order+by+created+desc%0D%0A%29%0D%0Aselect+id%2C+type%2C+title%2C+url%2C+created%2C+html%2C+json%2C+external_url%2C+tags%0D%0Afrom+collected+%0D%0Aorder+by+%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+0+%0D%0A++++else+1+%0D%0A++end%2C%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+created+%0D%0A++++else+-strftime%28%27%25s%27%2C+created%29+%0D%0A++end+desc%3B&amp;amp;numdays=7"&gt;see and execute that query&lt;/a&gt; directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;with content &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    id,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type,
    title,
    created,
    slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;h3&amp;gt;&amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
      &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;) 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; title &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt; - &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/h3&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; body
      &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;null&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; json,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_entry
  &lt;span class="pl-k"&gt;union all&lt;/span&gt;
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ...&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My blog's URLs look like &lt;code&gt;/2025/Nov/18/gemini-3/&lt;/code&gt; - this SQL constructs that three letter month abbreviation from the month number using a substring operation.&lt;/p&gt;
&lt;p&gt;This is a &lt;em&gt;terrible&lt;/em&gt; way to assemble HTML, but I've stuck with it because it amuses me.&lt;/p&gt;
&lt;p&gt;The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.&lt;/p&gt;
&lt;p&gt;Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out but it's very effective:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Object&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;assign&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-en"&gt;html&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;button&lt;/span&gt; &lt;span class="pl-c1"&gt;style&lt;/span&gt;="&lt;span class="pl-s"&gt;font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;Copy rich text newsletter to clipboard`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-en"&gt;onclick&lt;/span&gt;: &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;newsletterHTML&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Create a temporary element to hold the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createElement&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"div"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;appendChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Select the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;range&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;selectNode&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Copy the selected HTML content to the clipboard&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;selection&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;window&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getSelection&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;execCommand&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"copy"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="from-django-postgresql-to-datasette-sqlite"&gt;From Django+Postgresql to Datasette+SQLite&lt;/h4&gt;
&lt;p&gt;My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;the source code for that Django application&lt;/a&gt;. I use the Django admin as my CMS.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.&lt;/p&gt;
&lt;p&gt;My system for doing that lives in the &lt;a href="https://github.com/simonw/simonwillisonblog-backup"&gt;simonw/simonwillisonblog-backup&lt;/a&gt; GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/db-to-sqlite"&gt;db-to-sqlite&lt;/a&gt; tool is responsible for that conversion. I call it &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L44-L62"&gt;like this&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;db-to-sqlite \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;heroku config:get DATABASE_URL -a simonwillisonblog &lt;span class="pl-k"&gt;|&lt;/span&gt; sed s/postgres:/postgresql+psycopg2:/&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt; \
  simonwillisonblog.db \
  --table auth_permission \
  --table auth_user \
  --table blog_blogmark \
  --table blog_blogmark_tags \
  --table blog_entry \
  --table blog_entry_tags \
  --table blog_quotation \
  --table blog_quotation_tags \
  --table blog_note \
  --table blog_note_tags \
  --table blog_tag \
  --table blog_previoustagname \
  --table blog_series \
  --table django_content_type \
  --table redirects_redirect&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;heroku config:get DATABASE_URL&lt;/code&gt; command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;db-to-sqlite&lt;/code&gt; can then export that data and write it to a SQLite database file called &lt;code&gt;simonwillisonblog.db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--table&lt;/code&gt; options specify the tables that should be included in the export.&lt;/p&gt;
&lt;p&gt;The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a &lt;a href="https://github.com/simonw/simonwillisonblog-backup/commits/main/simonwillisonblog"&gt;commit history&lt;/a&gt; of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.&lt;/p&gt;
&lt;p&gt;At the &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L200-L204"&gt;end of my GitHub Actions workflow&lt;/a&gt; is this code that publishes the resulting database to Datasette running on &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; using the &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette publish fly&lt;/a&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly simonwillisonblog.db \
  -m metadata.yml \
  --app simonwillisonblog-backup \
  --branch 1.0a2 \
  --extra-options &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --install datasette-block-robots \
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ... more plugins&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/newsletter"&gt;newsletter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/substack"&gt;substack&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="django"/><category term="javascript"/><category term="postgresql"/><category term="sql"/><category term="sqlite"/><category term="youtube"/><category term="heroku"/><category term="datasette"/><category term="observable"/><category term="github-actions"/><category term="fly"/><category term="newsletter"/><category term="substack"/><category term="site-upgrades"/></entry><entry><title>My approach to running a link blog</title><link href="https://simonwillison.net/2024/Dec/22/link-blog/#atom-series" rel="alternate"/><published>2024-12-22T18:37:16+00:00</published><updated>2024-12-22T18:37:16+00:00</updated><id>https://simonwillison.net/2024/Dec/22/link-blog/#atom-series</id><summary type="html">
    &lt;p&gt;I started running a basic link blog on this domain &lt;a href="https://simonwillison.net/2003/Nov/24/blogmarks/"&gt;back in November 2003&lt;/a&gt; - publishing links (which I called "blogmarks") with a title, URL, short snippet of commentary and a "via" link where appropriate.&lt;/p&gt;
&lt;p&gt;So far I've published &lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;7,607 link blog posts&lt;/a&gt; and counting.&lt;/p&gt;
&lt;p&gt;In April of this year I finally &lt;a href="https://simonwillison.net/2024/Apr/25/blogmarks-that-use-markdown/"&gt;upgraded my link blog to support Markdown&lt;/a&gt;, allowing me to expand my link blog into something with a lot more room.&lt;/p&gt;
&lt;p&gt;The way I use my link blog has evolved substantially in the eight months since then. I'm going to describe the informal set of guidelines I've set myself for how I link blog, in the hope that it might encourage other people to give this a try themselves.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Dec/22/link-blog/#writing-about-things-i-ve-found"&gt;Writing about things I've found&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Dec/22/link-blog/#trying-to-add-something-extra"&gt;Trying to add something extra&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Dec/22/link-blog/#the-technology"&gt;The technology&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Dec/22/link-blog/#more-people-should-do-this"&gt;More people should do this&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="writing-about-things-i-ve-found"&gt;Writing about things I've found&lt;/h4&gt;
&lt;p&gt;Back in November 2022 I wrote &lt;a href="https://simonwillison.net/2022/Nov/6/what-to-blog-about/"&gt;What to blog about&lt;/a&gt;, which started with this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You should start a blog. Having your own little corner of the internet is good for the soul!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The point of that article was to emphasize that blogging doesn't have to be about unique insights. The value is in writing frequently and having something to show for it over time - worthwhile even if you don't attract much of an audience (or any audience at all).&lt;/p&gt;
&lt;p&gt;In that article I proposed two categories of content that are low stakes and high value: &lt;strong&gt;things I learned&lt;/strong&gt; and &lt;strong&gt;descriptions of my projects&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I realize now that link blogging deserves to be included a third category of low stakes, high value writing. We could think of that category as &lt;strong&gt;things I've found&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;That's the purpose of my link blog: it's an ongoing log of things I've found - effectively a combination of public bookmarks and my own thoughts and commentary on why those things are interesting.&lt;/p&gt;
&lt;h4 id="trying-to-add-something-extra"&gt;Trying to add something extra&lt;/h4&gt;
&lt;p&gt;When I first started link blogging I would often post a link with a one sentence summary of the linked content, and maybe a tiny piece of opinionated commentary.&lt;/p&gt;
&lt;p&gt;After I upgraded my link blog to support additional markup (links, images, quotations) I decided to be more ambitious. Here are some of the things I try to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I always include &lt;strong&gt;the names of the people&lt;/strong&gt; who created the content I am linking to, if I can figure that out. Credit is really important, and it's also useful for myself because I can later search for someone's name and find other interesting things they have created that I linked to in the past. If I've linked to someone's work three or more times I also try to notice and upgrade them to &lt;a href="https://simonwillison.net/tags/"&gt;a dedicated tag&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;I try to &lt;strong&gt;add something extra&lt;/strong&gt;. My goal with any link blog post is that if you read both my post and the source material you'll have an enhanced experience over if you read just the source material itself.
&lt;ul&gt;
&lt;li&gt;Ideally I'd like you to take something useful away even if you don't follow the link itself. This can be a slightly tricky balance: I don't want to steal attention from the authors and plagiarize their message. Generally I'll try to find some key idea that's worth emphasizing. Slightly cynically, I may try to capture that idea as backup against the original source vanishing from the internet. Link rot is real!&lt;/li&gt;
&lt;li&gt;My most basic version of this is trying to provide context as to why I think this particular thing is worth reading - especially important for longer content. A good recent example is my post about Anthropic's &lt;a href="https://simonwillison.net/2024/Dec/20/building-effective-agents/"&gt;Building effective agents&lt;/a&gt; essay the other day.&lt;/li&gt;
&lt;li&gt;I might tie it together to other similar concepts, including things I've written about in the past, for example linking &lt;a href="https://simonwillison.net/2024/Aug/14/prompt-caching-with-claude/"&gt;Prompt caching with Claude&lt;/a&gt; to my coverage of &lt;a href="https://simonwillison.net/2024/May/14/context-caching-for-google-gemini/"&gt;Context caching for Google Gemini&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If part of the material is a video, I might &lt;strong&gt;quote a snippet of the transcript&lt;/strong&gt; (often extracted using MacWhisper) like I did in &lt;a href="https://simonwillison.net/2024/Dec/12/clio/"&gt;this post about Anthropic's Clio&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A lot of stuff I link to involves programming. I'll often include a &lt;strong&gt;direct link to relevant code&lt;/strong&gt;, using the GitHub feature where I can link to a snippet as-of a particular commit. One example is the &lt;a href="https://simonwillison.net/2024/Oct/5/uv-with-github-actions-to-run-an-rss-to-readme-project/"&gt;fetch-rss.py link in this post&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;I'm liberal with &lt;strong&gt;quotations&lt;/strong&gt;. Finding and quoting a paragraph that captures the key theme of a post is a very quick and effective way to summarize it and help people decide if it's worth reading the whole thing. My post on &lt;a href="https://simonwillison.net/2024/Dec/20/openai-o3-breakthrough/"&gt;François Chollet's o3 ARC-AGI analysis&lt;/a&gt; is an example of that.&lt;/li&gt;
&lt;li&gt;If the original author reads my post, I want them to &lt;strong&gt;feel good about it&lt;/strong&gt;. I know from my own experience that often when you publish something online the silence can be deafening. Knowing that someone else read, appreciated, understood and then shared your work can be very pleasant.&lt;/li&gt;
&lt;li&gt;A slightly self-involved concern I have is that I like to &lt;strong&gt;prove that I've read it&lt;/strong&gt;. This is more for me than for anyone else: I don't like to recommend something if I've not read that thing myself, and sticking in a detail that shows I read past the first paragraph helps keep me honest about that.&lt;/li&gt;
&lt;li&gt;I've started leaning more into &lt;strong&gt;screenshots&lt;/strong&gt; and even short video or audio clips. A screenshot can be considered a visual quotation - I'll sometimes snap these from interesting frames in a YouTube video or live demo associated with the content I'm linking to. I used a screenshot of the Clay debugger in &lt;a href="https://simonwillison.net/2024/Dec/21/clay-ui-library/"&gt;my post about Clay&lt;/a&gt;.&lt;/li&gt;
&lt;p style="margin-top: 0.5em"&gt;There are a lot of great link blogs out there, but the one that has influenced me the most in how I approach my own is John Gruber's &lt;a href="https://daringfireball.net/"&gt;Daring Fireball&lt;/a&gt;. I really like the way he mixes commentary, quotations and value-added relevant information.&lt;/p&gt;
&lt;/ul&gt;
&lt;h4 id="the-technology"&gt;The technology&lt;/h4&gt;
&lt;p&gt;The technology behind my link blog is probably the least interesting thing about it. It's part of my &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;simonwillisonblog&lt;/a&gt; Django application - the main model is called &lt;a href="https://github.com/simonw/simonwillisonblog/blob/c781a1a42ab0a0237f75c7790f069bacc2d70d3f/blog/models.py#L328-L337"&gt;Blogmark&lt;/a&gt; and it inherits from a &lt;a href="https://github.com/simonw/simonwillisonblog/blob/c781a1a42ab0a0237f75c7790f069bacc2d70d3f/blog/models.py#L172-L203"&gt;BaseModel&lt;/a&gt; defining things like tags and draft modes that are shared across my other types of content (entries and quotations).&lt;/p&gt;
&lt;p&gt;I use the Django Admin to create and edit entries, &lt;a href="https://github.com/simonw/simonwillisonblog/blob/c781a1a42ab0a0237f75c7790f069bacc2d70d3f/blog/admin.py#L73-L76"&gt;configured here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The most cumbersome part of link blogging for me right now is images. I convert these into smaller JPEGs using a &lt;a href="https://tools.simonwillison.net/image-resize-quality"&gt;tiny custom tool&lt;/a&gt; I built (&lt;a href="https://gist.github.com/simonw/58a06a8028515999e5949a0166cd4c4f"&gt;with Claude&lt;/a&gt;), then upload them to my &lt;code&gt;static.simonwillison.net&lt;/code&gt; S3 bucket using Transmit and drop them into my posts using a Markdown image reference. I generate a first draft of the alt text using a Claude Project with &lt;a href="https://gist.github.com/simonw/1fa7e4e3dcb18fdeca2b3d6ac2c6c628"&gt;these custom instructions&lt;/a&gt;, then usually make a few changes  before including that in the markup. At some point I'll wire together a UI that makes this process a little smoother.&lt;/p&gt;
&lt;p&gt;That &lt;code&gt;static.simonwillison.net&lt;/code&gt; bucket is then served via Cloudflare's free tier, which means I effectively never have to think about the cost of serving up those image files.&lt;/p&gt;
&lt;p&gt;I wrote up a TIL about &lt;a href="https://til.simonwillison.net/django/building-a-blog-in-django"&gt;Building a blog in Django&lt;/a&gt; a while ago which describes a similar setup to the one I'm using for my link blog, including how the RSS feed works (using &lt;a href="https://docs.djangoproject.com/en/4.2/ref/contrib/syndication/"&gt;Django's syndication framework&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The most technically interesting component is my &lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;search feature&lt;/a&gt;. I wrote about how that works in &lt;a href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/"&gt;Implementing faceted search with Django and PostgreSQL&lt;/a&gt; - the most recent code for that can be found in &lt;a href="https://github.com/simonw/simonwillisonblog/blob/main/blog/search.py"&gt;blog/search.py&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;p&gt;One of the most useful small enhancements I added was &lt;a href="https://github.com/simonw/simonwillisonblog/issues/488"&gt;draft mode&lt;/a&gt;, which lets me assign a URL to an item and preview it in my browser without publishing it to the world. This really helps when I am editing posts on my mobile phone as it gives me a reliable preview so I can check for any markup mistakes.&lt;/p&gt;
&lt;p&gt;I also send out an approximately weekly &lt;a href="https://simonw.substack.com/"&gt;email newsletter&lt;/a&gt; version of my blog, for people who want to subscribe in their inbox. This is a straight copy of content from my blog - Substack doesn't have an API for this but their editor does accept copy and paste, so I have a delightful digital duct tape solution for assembling the newsletter which I described in &lt;a href="https://simonwillison.net/2023/Apr/4/substack-observable/"&gt;Semi-automating a Substack newsletter with an Observable notebook&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="more-people-should-do-this"&gt;More people should do this&lt;/h4&gt;
&lt;p&gt;I posted this on Bluesky &lt;a href="https://bsky.app/profile/simonwillison.net/post/3ldu6jywnos2j"&gt;last night&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I wish people would post more links to interesting things&lt;/p&gt;
&lt;p&gt;I feel like Twitter and LinkedIn and Instagram and TikTok have pushed a lot of people out of the habit of doing that, by penalizing shared links in the various "algorithms"&lt;/p&gt;
&lt;p&gt;Bluesky doesn't have that misfeature, thankfully!&lt;/p&gt;
&lt;p&gt;(In my ideal world everyone would get their own link blog too, but sharing links on Bluesky and Mastodon is almost as good)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sharing interesting links with commentary is a low effort, high value way to contribute to internet life at large.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django-admin"&gt;django-admin&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/john-gruber"&gt;john-gruber&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="django"/><category term="django-admin"/><category term="john-gruber"/></entry><entry><title>A homepage redesign for my blog's 22nd birthday</title><link href="https://simonwillison.net/2024/Jun/12/homepage-redesign/#atom-series" rel="alternate"/><published>2024-06-12T19:59:17+00:00</published><updated>2024-06-12T19:59:17+00:00</updated><id>https://simonwillison.net/2024/Jun/12/homepage-redesign/#atom-series</id><summary type="html">
    &lt;p&gt;This blog is 22 years old today! I wrote up &lt;a href="https://simonwillison.net/2022/Jun/12/twenty-years/"&gt;a whole bunch of higlights&lt;/a&gt; for the 20th birthday a couple of years ago. Today I'm celebrating with something a bit smaller: I finally redesigned the homepage.&lt;/p&gt;
&lt;p&gt;I publish three kinds of content on my blog: &lt;a href="https://simonwillison.net/search/?type=entry"&gt;entries&lt;/a&gt; (like this one), "&lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;blogmarks&lt;/a&gt;" (aka annotated links) and &lt;a href="https://simonwillison.net/search/?type=quotation"&gt;quotations&lt;/a&gt;. Until recently the entries were the main feature on the (desktop) homepage, with blogmarks and quotations relegated to the sidebar.&lt;/p&gt;
&lt;p&gt;Back in April I &lt;a href="https://simonwillison.net/2024/Apr/25/blogmarks-that-use-markdown/"&gt;implemented Markdown support&lt;/a&gt; for my blogmarks, allowing me to include additional links and quotations in the body of those descriptions.&lt;/p&gt;
&lt;p&gt;I was inspired in this by &lt;a href="https://daringfireball.net/"&gt;Daring Fireball&lt;/a&gt;, which has long published a combination of annotated links combined with longer essay style entries.&lt;/p&gt;
&lt;p&gt;It turns out I &lt;em&gt;really like&lt;/em&gt; posting longer-form content attached to links! Here's one from &lt;a href="https://simonwillison.net/2024/Jun/12/generative-ai-is-not-going-to-build-your-engineering-team/"&gt;earlier today&lt;/a&gt; which rivals my full entries in length.&lt;/p&gt;
&lt;p&gt;These were looking pretty cramped in the sidebar:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/homepage-old.jpg" alt="Screenshot of my blog with a big entry about Thoughts on the WWDC 2024 keynote on the left and a sidebar with a long blogmark description in the sidebar on the right" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;So I've done a small redesign. The left hand column on my homepage now displays entries, quotations and blogmarks as a combined list, reusing the format I already had in place for the &lt;a href="https://simonwillison.net/tags/blogging/"&gt;tag page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The right hand column is for "highlights", aka my longer form blog entries.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/homepage-new.jpg" alt="Screenshot of my blog with a blogmark on the left and a list of article headlines on the right" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The mobile version of my site was already serving content mixed together like this, so this change mainly brings the desktop version in line with the mobile one.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/simonwillisonblog/issues/438"&gt;the issue on GitHub&lt;/a&gt; and &lt;a href="https://github.com/simonw/simonwillisonblog/commit/8e38a3f51ec50501fcb6fcc19a26acde2fa5cd4b"&gt;the commit that implemented the change&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="site-upgrades"/></entry><entry><title>Semi-automating a Substack newsletter with an Observable notebook</title><link href="https://simonwillison.net/2023/Apr/4/substack-observable/#atom-series" rel="alternate"/><published>2023-04-04T17:55:28+00:00</published><updated>2023-04-04T17:55:28+00:00</updated><id>https://simonwillison.net/2023/Apr/4/substack-observable/#atom-series</id><summary type="html">
    &lt;p&gt;I recently started sending out &lt;a href="https://simonw.substack.com/"&gt;a weekly-ish email newsletter&lt;/a&gt; consisting of content from my blog. I've mostly automated that, using &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;an Observable Notebook&lt;/a&gt; to generate the HTML. Here's how that system works.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/substack-index.jpg" alt="Screenshot of Substack: Simon Willison' Newsletter, with a big podcast promo image next to Think of language models like GhatGPT as a calculator for words, followed by two other recent newsletter headlines." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;What goes in my newsletter&lt;/h4&gt;
&lt;p&gt;My blog has three types of content: &lt;a href="https://simonwillison.net/search/?type=entry"&gt;entries&lt;/a&gt;, &lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;blogmarks&lt;/a&gt; and &lt;a href="https://simonwillison.net/search/?type=quotation"&gt;quotations&lt;/a&gt;. "Blogmarks" is a name I came up with for bookmarks &lt;a href="https://simonwillison.net/2003/Nov/24/blogmarks/"&gt;in 2003&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Blogmarks and quotations show up in my blog's sidebar, entries get the main column - but on mobile the three are combined into a single flow.&lt;/p&gt;
&lt;p&gt;These live in a PostgreSQL database managed by Django. You can see them defined &lt;a href="https://github.com/simonw/simonwillisonblog/blob/main/blog/models.py"&gt;in models.py&lt;/a&gt; in my blog's open source repo.&lt;/p&gt;
&lt;p&gt;My newsletter consists of all of the new entries, blogmarks and quotations since I last sent it out. I include the entries first in reverse chronological order, since usually the entry I've just written is the one I want to use for the email subject. The blogmarks and quotations come in chronological order afterwards.&lt;/p&gt;
&lt;p&gt;I'm including the full HTML for everything: people don't need to click through back to my blog to read it, all of the content should be right there in their email client.&lt;/p&gt;
&lt;h4&gt;The Substack API: RSS and copy-and-paste&lt;/h4&gt;
&lt;p&gt;Substack doesn't yet offer an API, and &lt;a href="https://support.substack.com/hc/en-us/articles/360038433912-Does-Substack-have-an-API-"&gt;have no public plans&lt;/a&gt; to do so.&lt;/p&gt;
&lt;p&gt;They do offer an RSS feed of each newsletter though - add &lt;code&gt;/feed&lt;/code&gt; to the newsletter subdomain to get it. Mine is at &lt;a href="https://simonw.substack.com/feed"&gt;https://simonw.substack.com/feed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So we can get data back out again... but what about getting data in? I don't want to manually assemble a newsletter from all of these different sources of data.&lt;/p&gt;
&lt;p&gt;That's where copy-and-paste comes in.&lt;/p&gt;
&lt;p&gt;The Substack compose editor incorporates a well built rich-text editor. You can paste content into it and it will clean it up to fit the subset of HTML that Substack supports... but that's a pretty decent subset. Headings, paragraphs, lists, links, code blocks and images are all supported.&lt;/p&gt;
&lt;p&gt;The vast majority of content on my blog fits that subset neatly.&lt;/p&gt;
&lt;p&gt;Crucially, pasting in images as part of that rich text content Just Works: Substack automatically copies any images to their &lt;code&gt;substack-post-media&lt;/code&gt; S3 bucket and embeds links to their CDN in the body of the newsletter.&lt;/p&gt;
&lt;p&gt;So... if I can generate the intended rich-text HTML for my whole newsletter, I can copy and paste it directly into the Substack.&lt;/p&gt;
&lt;p&gt;That's exactly what my new Observable notebook does: &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;https://observablehq.com/@simonw/blog-to-newsletter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Generating HTML is a well trodden path, but I also wanted a "copy to clipboard" button that would copy the rich text version of that HTML such that pasting it into Substack would do the right thing.&lt;/p&gt;
&lt;p&gt;With a bit of help from &lt;a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Interact_with_the_clipboard"&gt;MDN&lt;/a&gt; and &lt;a href="https://til.simonwillison.net/javascript/copy-rich-text-to-clipboard"&gt;ChatGPT (my TIL)&lt;/a&gt; I figured out the following:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-en"&gt;copyRichText&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;html&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;html&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Create a temporary element to hold the HTML content&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createElement&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"div"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;appendChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Select the HTML content&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;range&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;selectNode&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Copy the selected HTML content to the clipboard&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;selection&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;window&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getSelection&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;execCommand&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"copy"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This works great! Set up a button that triggers that function and clicking that button will copy a rich text version of the HTML to the clipboard, such that pasting it directly into the Substack editor has the desired effect.&lt;/p&gt;
&lt;h4&gt;Assembling the HTML&lt;/h4&gt;
&lt;p&gt;I love using &lt;a href="https://observablehq.com/"&gt;Observable Notebooks&lt;/a&gt; for this kind of project: quick data integration tools that need a UI and will likely be incrementally improved over time.&lt;/p&gt;
&lt;p&gt;Using Observable for these means I don't need to host anything and I can iterate my way to the right solution really quickly.&lt;/p&gt;
&lt;p&gt;First, I needed to retrieve my entries, blogmarks and quotations.&lt;/p&gt;
&lt;p&gt;I never built an API for my Django blog directly, but a while ago I set up a mechanism that &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/main/.github/workflows/backup.yml"&gt;exports the contents&lt;/a&gt; of my blog to &lt;a href="https://github.com/simonw/simonwillisonblog-backup"&gt;my simonwillisonblog-backup&lt;/a&gt; GitHub repository for safety, and then deploys a Datasette/SQLite copy of that data to &lt;a href="https://datasette.simonwillison.net/"&gt;https://datasette.simonwillison.net/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; offers a JSON API for querying that data, and exposes open CORS headers which means JavaScript running in Observable can query it directly.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.simonwillison.net/simonwillisonblog?sql=select+*+from+blog_entry+order+by+id+desc+limit+5"&gt;Here's an example SQL query&lt;/a&gt; running against that Datasette instance - click the &lt;code&gt;.json&lt;/code&gt; link on that page to get that data back as JSON instead.&lt;/p&gt;
&lt;p&gt;My Observable notebook can then retrieve the exact data it needs to construct the HTML for the newsletter.&lt;/p&gt;
&lt;p&gt;The smart thing to do would have been to retrieve the data from the API and then use JavaScript inside Observable to compose that together into the HTML for the newsletter.&lt;/p&gt;
&lt;p&gt;I decided to challenge myself to doing most of the work in SQL instead, and came up with the following absolute monster of a query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;with content &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type, title, created, slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;h3&amp;gt;&amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
      &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;) 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; title &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt; - &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/h3&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; body
      &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_entry
  &lt;span class="pl-k"&gt;union all&lt;/span&gt;
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;blogmark&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type,
    link_title, created, slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Link&amp;lt;/strong&amp;gt; &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt; &amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-k"&gt;||&lt;/span&gt; link_url &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
      &lt;span class="pl-k"&gt;||&lt;/span&gt; link_title &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt;:&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt; &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; commentary &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/p&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
      &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
  link_url &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_blogmark
  &lt;span class="pl-k"&gt;union all&lt;/span&gt;
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;quotation&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type,
    source, created, slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;strong&amp;gt;Quote&amp;lt;/strong&amp;gt; &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;blockquote&amp;gt;&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
    &lt;span class="pl-k"&gt;||&lt;/span&gt; replace(quotation, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;br&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/blockquote&amp;gt;&amp;lt;p&amp;gt;&amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt;
    coalesce(source_url, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;#&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; source &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
    &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
    source_url &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_quotation
),
collected &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    type,
    title,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
      &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; 
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
      &lt;span class="pl-k"&gt;as&lt;/span&gt; url,
    created,
    html,
    external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; content
  &lt;span class="pl-k"&gt;where&lt;/span&gt; created &lt;span class="pl-k"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;now&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;-&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; :numdays &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt; days&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;)   
  &lt;span class="pl-k"&gt;order by&lt;/span&gt; created &lt;span class="pl-k"&gt;desc&lt;/span&gt;
)
&lt;span class="pl-k"&gt;select&lt;/span&gt; type, title, url, created, html, external_url
&lt;span class="pl-k"&gt;from&lt;/span&gt; collected 
&lt;span class="pl-k"&gt;order by&lt;/span&gt; 
  case type 
    when &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; then &lt;span class="pl-c1"&gt;0&lt;/span&gt; 
    else &lt;span class="pl-c1"&gt;1&lt;/span&gt; 
  end,
  case type 
    when &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; then created 
    else &lt;span class="pl-k"&gt;-&lt;/span&gt;strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%s&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) 
  end &lt;span class="pl-k"&gt;desc&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This logic really should be in the JavaScript instead! You can &lt;a href="https://datasette.simonwillison.net/simonwillisonblog?sql=with+content+as+%28%0D%0A++select%0D%0A++++%27entry%27+as+type%2C+title%2C+created%2C+slug%2C%0D%0A++++%27%3Ch3%3E%3Ca+href%3D%22%27+%7C%7C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27+%7C%7C+%27%22%3E%27+%0D%0A++++++%7C%7C+title+%7C%7C+%27%3C%2Fa%3E+-+%27+%7C%7C+date%28created%29+%7C%7C+%27%3C%2Fh3%3E%27+%7C%7C+body%0D%0A++++++as+html%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_entry%0D%0A++union+all%0D%0A++select%0D%0A++++%27blogmark%27+as+type%2C%0D%0A++++link_title%2C+created%2C+slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ELink%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+link_url+%7C%7C+%27%22%3E%27%0D%0A++++++%7C%7C+link_title+%7C%7C+%27%3C%2Fa%3E%3A%27+%7C%7C+%27+%27+%7C%7C+commentary+%7C%7C+%27%3C%2Fp%3E%27%0D%0A++++++as+html%2C%0D%0A++link_url+as+external_url%0D%0A++from+blog_blogmark%0D%0A++union+all%0D%0A++select%0D%0A++++%27quotation%27+as+type%2C%0D%0A++++source%2C+created%2C+slug%2C%0D%0A++++%27%3Cstrong%3EQuote%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27%3Cblockquote%3E%3Cp%3E%3Cem%3E%27%0D%0A++++%7C%7C+replace%28quotation%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%27%3C%2Fem%3E%3C%2Fp%3E%3C%2Fblockquote%3E%3Cp%3E%3Ca+href%3D%22%27+%7C%7C%0D%0A++++coalesce%28source_url%2C+%27%23%27%29+%7C%7C+%27%22%3E%27+%7C%7C+source+%7C%7C+%27%3C%2Fa%3E%3C%2Fp%3E%27%0D%0A++++as+html%2C%0D%0A++++source_url+as+external_url%0D%0A++from+blog_quotation%0D%0A%29%2C%0D%0Acollected+as+%28%0D%0A++select%0D%0A++++type%2C%0D%0A++++title%2C%0D%0A++++%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%7C%7C+%0D%0A++++++%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%0D%0A++++++as+url%2C%0D%0A++++created%2C%0D%0A++++html%2C%0D%0A++++external_url%0D%0A++from+content%0D%0A++where+created+%3E%3D+date%28%27now%27%2C+%27-%27+%7C%7C+%3Anumdays+%7C%7C+%27+days%27%29+++%0D%0A++order+by+created+desc%0D%0A%29%0D%0Aselect+type%2C+title%2C+url%2C+created%2C+html%2C+external_url%0D%0Afrom+collected+%0D%0Aorder+by+%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+0+%0D%0A++++else+1+%0D%0A++end%2C%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+created+%0D%0A++++else+-strftime%28%27%25s%27%2C+created%29+%0D%0A++end+desc&amp;amp;numdays=7"&gt;try that query in Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are a bunch of tricks in there, but my favourite is this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
  &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;,
    (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;
  ) &lt;span class="pl-k"&gt;||&lt;/span&gt;  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-k"&gt;as&lt;/span&gt; url&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is the trick I'm using to generate the URL for each entry, blogmark and quotation.&lt;/p&gt;
&lt;p&gt;These are stored as datetime values in the database, but the eventual URLs look like this:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/2023/Apr/2/calculator-for-words/"&gt;https://simonwillison.net/2023/Apr/2/calculator-for-words/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So I need to turn that date into a YYYY/Mon/DD URL component.&lt;/p&gt;
&lt;p&gt;One problem: SQLite doesn't have a date format string that produces a three letter month abbreviation. But... with cunning application of the &lt;code&gt;substr()&lt;/code&gt; function and a string of all the month abbreviations I can get what I need.&lt;/p&gt;
&lt;p&gt;The above SQL query plus a little bit of JavaScript provides almost everything I need to generate the HTML for my newsletter.&lt;/p&gt;
&lt;h4&gt;Excluding previously sent content&lt;/h4&gt;
&lt;p&gt;There's one last problem to solve: I want to send a newsletter containing everything that's new since my last edition - I don't want to send out the same content twice.&lt;/p&gt;
&lt;p&gt;I came up with a delightfully gnarly solution to that as well.&lt;/p&gt;
&lt;p&gt;As mentioned earlier, Substack provides an RSS feed of previous editions. I can use that data to avoid including content that's already been sent.&lt;/p&gt;
&lt;p&gt;One problem: the Substack RSS feed does't include CORS headers, which means I can't access it directly from my notebook.&lt;/p&gt;
&lt;p&gt;GitHub offers CORS headers for every file in every repository. I already had a repo that was backing up my blog... so why not set that to backup my RSS feed from Substack as well?&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/c42b3afd6bd8cb2a4e8fa928c77426ec71552194/.github/workflows/backup.yml#L70-L74"&gt;added this&lt;/a&gt; to my existing &lt;code&gt;backup.yml&lt;/code&gt; GitHub Actions workflow:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Backup Substack&lt;/span&gt;
  &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;    curl 'https://simonw.substack.com/feed' | \&lt;/span&gt;
&lt;span class="pl-s"&gt;      python -c "import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).toprettyxml(indent='  '))" \&lt;/span&gt;
&lt;span class="pl-s"&gt;      &amp;gt; simonw-substack-com.xml&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I'm piping it through a tiny Python script here to pretty-print the XML before saving it, because pretty-printed XML is easier to read diffs against later on.&lt;/p&gt;
&lt;p&gt;Now &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/c42b3afd6bd8cb2a4e8fa928c77426ec71552194/simonw-substack-com.xml"&gt;simonw-substack-com.xml&lt;/a&gt; is a copy of my RSS feed in a GitHub repo, which means I can access the data directly from JavaScript running on Observable.&lt;/p&gt;
&lt;p&gt;Here's the code I wrote there to fetch that RSS feed, parse it as XML and return a string containing just the HTML of all of the posts:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;previousNewsletters&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-s1"&gt;const&lt;/span&gt; response &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-s"&gt;"https://raw.githubusercontent.com/simonw/simonwillisonblog-backup/main/simonw-substack-com.xml"&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;rss&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;parser&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-v"&gt;DOMParser&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;xmlDoc&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;parser&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;parseFromString&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;rss&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"application/xml"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;xpathExpression&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"//content:encoded"&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-en"&gt;namespaceResolver&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;prefix&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;ns&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;content&lt;/span&gt;: &lt;span class="pl-s"&gt;"http://purl.org/rss/1.0/modules/content/"&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;ns&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s1"&gt;prefix&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt; &lt;span class="pl-c1"&gt;||&lt;/span&gt; &lt;span class="pl-c1"&gt;null&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;xmlDoc&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;evaluate&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-s1"&gt;xpathExpression&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-s1"&gt;xmlDoc&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-en"&gt;namespaceResolver&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-v"&gt;XPathResult&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;ANY_TYPE&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;null&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;node&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;text&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;while&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;node&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;iterateNext&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-s1"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;push&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;node&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;textContent&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;join&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"\n"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I span up a regular expression to extract all of the URLs from that HTML:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;previousLinks&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-s1"&gt;const&lt;/span&gt; regex &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-pds"&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-kos"&gt;(?:&lt;/span&gt;"&lt;span class="pl-c1"&gt;|&lt;/span&gt;&amp;amp;quot;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;https?:&lt;span class="pl-cce"&gt;\/&lt;/span&gt;&lt;span class="pl-cce"&gt;\/&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;^&lt;span class="pl-cce"&gt;\s&lt;/span&gt;"&amp;lt;&amp;gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-c1"&gt;+&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;(?:&lt;/span&gt;"&lt;span class="pl-c1"&gt;|&lt;/span&gt;&amp;amp;quot;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;g&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-v"&gt;Array&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;from&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;previousNewsletters&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;matchAll&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;regex&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;match&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;match&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;1&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Added a "skip existing" toggle checkbox to my notebook:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;viewof&lt;/span&gt; &lt;span class="pl-s1"&gt;skipExisting&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Inputs&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;toggle&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c1"&gt;label&lt;/span&gt;: &lt;span class="pl-s"&gt;"Skip content sent in prior newsletters"&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And added this code to filter the raw content based on whether or not the toggle was selected:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;content&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;skipExisting&lt;/span&gt;
  ? &lt;span class="pl-s1"&gt;raw_content&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;filter&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
      &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="pl-c1"&gt;!&lt;/span&gt;&lt;span class="pl-s1"&gt;previousLinks&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;url&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
        &lt;span class="pl-c1"&gt;!&lt;/span&gt;&lt;span class="pl-s1"&gt;previousLinks&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;includes&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;external_url&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
    &lt;span class="pl-kos"&gt;)&lt;/span&gt;
  : &lt;span class="pl-s1"&gt;raw_content&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;url&lt;/code&gt; is the URL to the post on my blog. &lt;code&gt;external_url&lt;/code&gt; is the URL to the original source of the blogmark or quotation. A match against ether of those should exclude the content from my next newsletter.&lt;/p&gt;
&lt;h4&gt;My workflow for sending a newsletter&lt;/h4&gt;
&lt;p&gt;Given all of the above, sending a newsletter out is hardly any work at all:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ensure the most recent backup of my blog has run, such that the Datasette instance contains my latest content. I do that by &lt;a href="https://github.com/simonw/simonwillisonblog-backup/actions/workflows/backup.yml"&gt;triggering this action&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Navigate to &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;https://observablehq.com/@simonw/blog-to-newsletter&lt;/a&gt; - select "Skip content sent in prior newsletters" and then click the "Copy rich text newsletter to clipboard" button.&lt;/li&gt;
&lt;li&gt;Navigate to the Substack "publish" interface and paste that content into the rich text editor.&lt;/li&gt;
&lt;li&gt;Pick a title and subheading, and maybe add a bit of introductory text.&lt;/li&gt;
&lt;li&gt;Preview it. If the preview looks good, hit "send".&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/newsletter-small.gif" alt="Animated screenshot showing the process of sending the newsletter as described above" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;h4&gt;Copy and paste APIs&lt;/h4&gt;
&lt;p&gt;I think copy and paste is under-rated as an API mechanism.&lt;/p&gt;
&lt;p&gt;There are no rate limits or API keys to worry about.&lt;/p&gt;
&lt;p&gt;It's supported by almost every application, even ones that are resistant to API integrations.&lt;/p&gt;
&lt;p&gt;It even works great on mobile phones, especially if you include a "copy to clipboard" button.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://datasette.io/plugins/datasette-copyable"&gt;datasette-copyable&lt;/a&gt; plugin for Datasette is one of my earlier explorations of this. It makes it easy to copy data out of Datasette in a variety of useful formats.&lt;/p&gt;
&lt;p&gt;This Observable newsletter project has further convinced me that the clipboard is an under-utilized mechanism for building tools to help integrate data together in creative ways.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/newsletter"&gt;newsletter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/substack"&gt;substack&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="projects"/><category term="datasette"/><category term="observable"/><category term="cors"/><category term="newsletter"/><category term="substack"/><category term="site-upgrades"/></entry><entry><title>What to blog about</title><link href="https://simonwillison.net/2022/Nov/6/what-to-blog-about/#atom-series" rel="alternate"/><published>2022-11-06T17:05:37+00:00</published><updated>2022-11-06T17:05:37+00:00</updated><id>https://simonwillison.net/2022/Nov/6/what-to-blog-about/#atom-series</id><summary type="html">
    &lt;p&gt;You should start a blog. Having your own little corner of the internet is good for the soul!&lt;/p&gt;
&lt;p&gt;But what should you write about?&lt;/p&gt;
&lt;p&gt;It's easy to get hung up on this. I've definitely felt the self-imposed pressure to only write something if it's new, and unique, and feels like it's never been said before. This is a mental trap that does nothing but hold you back.&lt;/p&gt;
&lt;p&gt;Here are two types of content that I guarantee you can produce and feel great about producing: TILs, and writing descriptions of your projects.&lt;/p&gt;
&lt;h4 id="tils"&gt;Today I Learned&lt;/h4&gt;
&lt;p&gt;A TIL - Today I Learned - is the most liberating form of content I know of.&lt;/p&gt;
&lt;p&gt;Did you just learn how to do something? Write about that.&lt;/p&gt;
&lt;p&gt;Call it a TIL - that way you're not promising anyone a revelation or an in-depth tutorial. You're saying "I just figured this out: here are my notes, you may find them useful too".&lt;/p&gt;
&lt;p&gt;I also like the humility of this kind of content. Part of the reason I publish them is to emphasize that even with 25 years of professional experience you should still celebrate learning even the most basic of things.&lt;/p&gt;
&lt;p&gt;I learned the "interact" command in &lt;code&gt;pdb&lt;/code&gt; the other day! &lt;a href="https://til.simonwillison.net/python/pdb-interact"&gt;Here's my TIL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I started publishing TILs &lt;a href="https://simonwillison.net/2020/Apr/20/self-rewriting-readme/"&gt;in April 2020&lt;/a&gt;. I'm up to 346 now, and most of them took less than 10 minutes to write. It's such a great format for quick and satisfying online writing.&lt;/p&gt;
&lt;p&gt;My collection lives at &lt;a href="https://til.simonwillison.net"&gt;https://til.simonwillison.net&lt;/a&gt; - which publishes content from my &lt;a href="https://github.com/simonw/til"&gt;simonw/til&lt;/a&gt; GitHub repository.&lt;/p&gt;
&lt;h4 id="projects"&gt;Write about your projects&lt;/h4&gt;
&lt;p&gt;If you do a project, you should write about it.&lt;/p&gt;
&lt;p&gt;I recommend adding "write about it" to your definition of "done" for anything that you build or create.&lt;/p&gt;
&lt;p&gt;Like with TILs, this takes away the pressure to be unique. It doesn't matter if your project overlaps with thousands of others: the experience of building it is unique to you. You deserve to have a few paragraphs and a screenshot out there explaining (and quietly celebrating) what you made.&lt;/p&gt;
&lt;p&gt;The screenshot is particularly important. Will your project still exist and work in a decade? I hope so, but we all know how quickly things succumb to bit-rot.&lt;/p&gt;
&lt;p&gt;Even better than a screenshot: an animated GIF screenshot! I capture these with &lt;a href="https://www.cockos.com/licecap/"&gt;LICEcap&lt;/a&gt;. And a video is even better than that, but those take a lot more effort to produce.&lt;/p&gt;
&lt;p&gt;It's &lt;em&gt;incredibly&lt;/em&gt; tempting to skip the step where you write about a project. But any time you do that you're leaving a huge amount of uncaptured value from that project on the table.&lt;/p&gt;
&lt;p&gt;These days I make myself do it: I tell myself that writing about something is the cost I have to pay for building it. And I &lt;em&gt;always&lt;/em&gt; end up feeling that the effort was more than worthwhile.&lt;/p&gt;
&lt;p&gt;Check out my &lt;a href="https://simonwillison.net/tags/projects/"&gt;projects tag&lt;/a&gt; for examples of this kind of content.&lt;/p&gt;
&lt;p&gt;So that's my advice for blogging: &lt;strong&gt;write about things you've learned, and write about things you've built!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update 22nd December 2024: I identified a third useful category: writing about &lt;a href="https://simonwillison.net/2024/Dec/22/link-blog/"&gt;things you've found&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/writing"&gt;writing&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="writing"/></entry><entry><title>Twenty years of my blog</title><link href="https://simonwillison.net/2022/Jun/12/twenty-years/#atom-series" rel="alternate"/><published>2022-06-12T22:59:31+00:00</published><updated>2022-06-12T22:59:31+00:00</updated><id>https://simonwillison.net/2022/Jun/12/twenty-years/#atom-series</id><summary type="html">
    &lt;p&gt;I started this blog on &lt;a href="https://simonwillison.net/2002/Jun/12/"&gt;June 12th 2002&lt;/a&gt; - twenty years ago today! To celebrate two decades of blogging, I decided to pull together some highlights and dive down a self-indulgent nostalgia hole.&lt;/p&gt;
&lt;h4&gt;Some highlights&lt;/h4&gt;
&lt;p&gt;Some of my more influential posts, in chronological order.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2002/Sep/2/aNewXMLRPCLibraryForPHP/"&gt;A new XML-RPC library for PHP&lt;/a&gt;&lt;/strong&gt; - 2nd September 2002&lt;/p&gt;
&lt;p&gt;I was really excited about XML-RPC, one of the earliest technologies for building Web APIs. IXR, the Incutio library for XML-RPC, was one of my earliest ever open source library releases. Here's &lt;a href="https://web.archive.org/web/20060316094837/http://scripts.incutio.com/xmlrpc/"&gt;a capture of the old site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/ixr-20060316094837.jpg" alt="Website: The Incutio XML-RPC Library for PHP. Version 1.6, pbulished May 25th 2003." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've not touched anything relating to this project in over 15 years now, but it has lived on in both &lt;a href="https://github.com/WordPress/WordPress/blob/270f2011f8ec7265c3f4ddce39c77ef5b496ed1c/wp-includes/class-IXR.php"&gt;WordPress&lt;/a&gt; and &lt;a href="https://github.com/d6lts/drupal/blob/6.x/includes/xmlrpc.inc"&gt;Drupal&lt;/a&gt; (now only in Drupal 6 LTS).&lt;/p&gt;
&lt;p&gt;It's also been responsible for &lt;a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-5266"&gt;at least one CVE vulnerability&lt;/a&gt; in those platforms!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2003/Mar/25/getElementsBySelector/"&gt;getElementsBySelector()&lt;/a&gt;&lt;/strong&gt; - 25th March 2003&lt;/p&gt;
&lt;p&gt;Andrew Hayward had posted a delightful snippet of JavaScript called &lt;a href="https://web.archive.org/web/20030402172546/http://blog.mooncalf.me.uk/archive/2003/03/25"&gt;document.getElementsByClassName()&lt;/a&gt; - like &lt;code&gt;document.getElementsByTagName()&lt;/code&gt; but for classes instead.&lt;/p&gt;
&lt;p&gt;Inspired by this, I built &lt;code&gt;document.getElementsBySelector()&lt;/code&gt; - a function that could take a CSS selector and return all of the matching elements.&lt;/p&gt;
&lt;p&gt;This ended up being very influential indeed! Paul Irish offers &lt;a href="https://www.paulirish.com/2008/javascript-css-selector-engine-timeline/"&gt;a timeline of JavaScript CSS selector engines&lt;/a&gt; which tracks some of what happens next. Most notably, &lt;code&gt;getElementsBySelector()&lt;/code&gt; was part of John Resig's &lt;a href="https://johnresig.com/blog/annotated-version-of-the-original-jquery-release/"&gt;inspiration&lt;/a&gt; in creating the first version of jQuery. To this day, the jQuery source includes &lt;a href="https://github.com/jquery/jquery/blob/a684e6ba836f7c553968d7d026ed7941e1a612d8/test/data/qunit-fixture.html"&gt;this testing fixture&lt;/a&gt; which is derived from &lt;a href="https://static.simonwillison.net/static/2003/getElementsBySelector.html"&gt;my original demo page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I guess you could call &lt;code&gt;document.getElementsBySelector()&lt;/code&gt; the original &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/Polyfill"&gt;polyfill&lt;/a&gt; for &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll"&gt;document.querySelectorAll()&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2003/Aug/27/kansas/"&gt;I'm in Kansas&lt;/a&gt;&lt;/strong&gt; - 27th August 2003&lt;/p&gt;
&lt;p&gt;In May 2003 Adrian Holovaty &lt;a href="https://www.holovaty.com/writing/211/"&gt;posted about a job opportunity&lt;/a&gt; for a web developer at at the Lawrence Journal-World newspaper in Lawrence, Kansas.&lt;/p&gt;
&lt;p&gt;This coincided with my UK university offering a "year in industry" placement, which meant I could work for a year anywhere in the world with a student visa program. I'd been reading Adrian's blog for a while and really liked the way he thought about building for the web - we were big fans of Web Standards and CSS and cleanly-designed URLs, all of which were very hot new things at the time!&lt;/p&gt;
&lt;p&gt;So I talked to Adrian about if this could work as a year-long opportunity, and we figured out how to make it work.&lt;/p&gt;
&lt;p&gt;At the Lawrence Journal-Word Adrian and I decided to start using Python instead of PHP, in order to build a CMS for that local newspaper...&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2005/Jul/17/django/"&gt;Introducing Django&lt;/a&gt;&lt;/strong&gt; - 17th July 2005&lt;/p&gt;
&lt;p&gt;... and this was the eventual outcome! Adrian and I didn't even know we were building a web framework at first - we called it "the CMS". But we kept having to solve new foundational problems: how should database routing work? What about templating? What's the best way to represent the incoming HTTP request?&lt;/p&gt;
&lt;p&gt;I had left the Lawrence Journal-World in 2004, but by 2005 the team there had grown what's now known as Django far beyond where it was when I had left, and they got the go-ahead from the company to release it as open source (partly thanks to the example set by Ruby on Rails, which first released in August 2004).&lt;/p&gt;
&lt;p&gt;In 2010 I wrote up &lt;a href="https://simonwillison.net/2010/Aug/24/what-is-the-history/"&gt;a more detailed history&lt;/a&gt; of Django in a Quora answer, now mirrored to my blog.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2006/Dec/15/upgrade/"&gt;Finally powered by Django&lt;/a&gt;&lt;/strong&gt; - 15th December 2006&lt;/p&gt;
&lt;p&gt;In which I replaced my duct-tape-and-mud PHP blogging engine with a new Django app. I sadly don't have the version history for this anymore (this was pre-git, I think I probably had it in Subversion or Mercurial somewhere) but today's implementation is still based on the same code, upgraded to Django 1.8 &lt;a href="https://github.com/simonw/simonwillisonblog/commit/e6d0327b37debdf820b5cfef4fb7d09a9624cea9"&gt;in 2015&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That 2006 version did include a very pleasing Flickr integration to import my photos (&lt;a href="https://web.archive.org/web/20080720032451/https://simonwillison.net/2005/Jan/6/"&gt;example on the Internet Archive&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of my blog's archive page for 6th January 2005 with my old design - it included photosets from Flickr mixed in among the links, as well as a set of photo thumbnails in the right hand navigation underneath the calendar widget" src="https://static.simonwillison.net/static/2022/web-archive-org-web-20080720032451-https--simonwillison-net-2005-Jan-6.jpg" style="max-width: 100%" /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2006/Dec/19/openid/"&gt;How to turn your blog in to an OpenID&lt;/a&gt;&lt;/strong&gt; - 19th December 2006&lt;/p&gt;
&lt;p&gt;In late 2006 I got very, very excited about &lt;a href="https://en.wikipedia.org/wiki/OpenID"&gt;OpenID&lt;/a&gt;. I was convinced that Microsoft Passport was going to take over SSO on the internet, and that the only way to stop that was to promote an open, decentralized solution. I wrote posts about it, &lt;a href="https://web.archive.org/web/20070227104926/http://simonwillison.net/2006/openid-screencast/"&gt;made screencasts&lt;/a&gt; (that one &lt;a href="https://web.archive.org/web/20070302124121/https://digg.com/programming/Screencast_How_to_use_OpenID"&gt;got 840 diggs&lt;/a&gt;! Annoyingly I was serving it from the Internet Archive who &lt;a href="http://archive.org/download/openid-howto/openid-howto.mp4"&gt;appear to have deleted it&lt;/a&gt;) and gave a whole bunch of conference talks about it too.&lt;/p&gt;
&lt;p&gt;I spent the next few years advocating for OpenID - in particular the URL-based OpenID mechanism where any website can be turned into an identifier. It didn't end up taking off, and with hindsight I think that's likely for the best: expecting people to take control of their own security by chosing their preferred authentication provider sounded great to me in 2006, but I can understand why companies chose to instead integrate with a smaller, tightly controlled set of SSO partners over time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2009/Mar/10/openplatform/"&gt;A few notes on the Guardian Open Platform&lt;/a&gt;&lt;/strong&gt; - 10th March 2009&lt;/p&gt;
&lt;p&gt;In 2009 I was working at &lt;a href="https://www.theguardian.co.uk/"&gt;the Guardian&lt;/a&gt; newspaper in London in my first proper data journalism role - my work at the Lawrence Journal-World had hinted towards that a little, but I spent the vast majority of my time there building out a CMS.&lt;/p&gt;
&lt;p&gt;In March we launched two major initiatives: the Datablog (also known as the Data Store) and the Guardian's Open Platform (an API that is &lt;a href="https://open-platform.theguardian.com/"&gt;still offered to this day&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The goal of the Datablog was to &lt;a href="https://www.theguardian.com/news/datablog/2009/mar/10/blogpost1"&gt;share the data behind the stories&lt;/a&gt;. Simon Rogers, the Guardian's data editor, had been collecting meticulous datasets about the world to help power infographics in the paper for years. The new plan was to share that raw data with the world.&lt;/p&gt;
&lt;p&gt;We started out using Google Sheets for this. I desperately wanted to come up with something less proprietary than that - I spent quite some time experimenting with &lt;a href="https://en.wikipedia.org/wiki/Apache_CouchDB"&gt;CouchDB&lt;/a&gt; - but Google Sheets was more than enough to get the project started.&lt;/p&gt;
&lt;p&gt;Many years later my continued mulling of this problem formed part of the inspiration for my creation of &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, a story I told in my 2018 PyBay talk &lt;a href="https://simonwillison.net/2018/Aug/19/instantly-publish-datasette/"&gt;How to Instantly Publish Data to the Internet with Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2009/Oct/22/redis/"&gt;Why I like Redis&lt;/a&gt;&lt;/strong&gt; -  22nd October 2009&lt;/p&gt;
&lt;p&gt;I got interested in NoSQL for a few years starting around 2009. I still think Redis was the most interesting new piece of technology to come out of that whole movement - an in-memory data structure server exposed over the network turns out to be a fantastic complement for other data stores, and even though I now default to PostgreSQL or SQLite for almost everything else I can still find problems for which Redis is a great solution.&lt;/p&gt;
&lt;p&gt;In April 2010 I gave a three hour Redis tutorial at NoSQL Europe which I wrote up in &lt;a href="https://simonwillison.net/2010/Apr/25/redis/"&gt;Comprehensive notes from my three hour Redis tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2009/Nov/23/node/"&gt;Node.js is genuinely exciting&lt;/a&gt;&lt;/strong&gt; -  23rd November 2009&lt;/p&gt;
&lt;p&gt;In December 2009 I found out about Node.js. As a Python web developer I had been following the evolution of &lt;a href="https://en.wikipedia.org/wiki/Twisted_(software)"&gt;Twisted&lt;/a&gt; with great interest, but I'd also run into the classic challenge that once you start using event-driven programming almost every library you might want to use likely doesn't work for you any more.&lt;/p&gt;
&lt;p&gt;Node.js had server-side event-driven programming baked into its very core. You couldn't accidentally make a blocking call and break your event loop because it didn't ever give you the option to do so!&lt;/p&gt;
&lt;p&gt;I liked it so much I switched out my talk for &lt;a href="https://2009.ffconf.org/"&gt;Full Frontal 2009&lt;/a&gt; at the last minute for one about Node.js instead.&lt;/p&gt;
&lt;p&gt;I think this was an influential decision. I won't say who they are (for fear of mis-representing or mis-quoting them), but I've talked to entrepreneurs who built significant products on top of server-side JavaScript who told me that they heard about Node.js from me first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2009/Dec/20/crowdsourcing/"&gt;Crowdsourced document analysis and MP expenses&lt;/a&gt;&lt;/strong&gt; - 20th December 2009&lt;/p&gt;
&lt;p&gt;This was my biggest data journalism project at the Guardian.&lt;/p&gt;
&lt;p&gt;The UK government had finally got around to releasing our Member of Parliament expense reports, and there was a giant scandal brewing about the expenses that had been claimed. We recruited our audience to help dig through 10,000s of pages of PDFs to help us find more stories.&lt;/p&gt;
&lt;p&gt;The first round of the MP's expenses crowdsourcing project launched &lt;a href="https://simonwillison.net/2009/Jun/18/"&gt;in June&lt;/a&gt;, but I was too busy working on it to properly write about it! Charles Arthur wrote about it for the Guardian in &lt;a href="https://www.theguardian.com/politics/2009/jun/18/mps-expenses-crowdsourcing-app"&gt;The breakneck race to build an application to crowdsource MPs' expenses&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In December we launched round two, and I took the time to &lt;a href="https://simonwillison.net/2009/Dec/20/crowdsourcing/"&gt;write about it properly&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's a &lt;a href="https://scholar.google.com/scholar?hl=en&amp;amp;as_sdt=0%2C5&amp;amp;q=guardian+mp+expenses&amp;amp;btnG="&gt;Google Scholar search for guardian mps expenses&lt;/a&gt; - I think it was pretty influential. It's definitely one of the projects I'm most proud of in my career so far.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2010/Jan/12/wildlifenearyou/"&gt;WildlifeNearYou: It began on a fort...&lt;/a&gt;&lt;/strong&gt; - 12th January 2010&lt;/p&gt;
&lt;p&gt;In October 2008 I participated in the first &lt;a href="https://devfort.com/"&gt;/dev/fort&lt;/a&gt; - a bunch of nerds rent a fortress (or similar historic building) for a week and hack on a project together.&lt;/p&gt;
&lt;p&gt;Following that week of work it took 14 months to add the "final touches" before putting the site we had built live (partly because I insisted on implementing OpenID for it) but in January 2010 we finally went live with WildlifeNearYou.com (sadly no longer available). It was a fabulous website, which crowdsourced places that people had seen animals in order to answer the crucial question "where is my nearest Llama?".&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://web.archive.org/web/20100324200252/http://www.wildlifenearyou.com:80/"&gt;what it looked like&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/wildlifenearyou-20100324200252.jpg" alt="Find and share places to see wildlife: WildlifeNearYou is a site for sharing your passion for wildlife. Search for animals or places near you, or register to add your own trips and photos." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Although it shipped after the Guardian MP's expenses project most of the work on WildlifeNearYou had come before that - building WildlifeNearYou (in Django) was the reason I was confident that the MP's expenses project was feasible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2010/Jun/21/married/"&gt;Getting married and going travelling&lt;/a&gt;&lt;/strong&gt; - 21st June 2010&lt;/p&gt;
&lt;p&gt;One June 5th 2010 I married Natalie Downe, and we both quit our jobs to set off travelling around the world and see how far we could get.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2010/me-nat-eagle.jpg" alt="Natalie is wearing a bridal gown. I am in a suit. I have a terrifying Golden Eagle perched on my arm." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;We got as far as Casablanca, Morocco before we accidentally launched a startup together: Lanyrd, launched &lt;a href="https://simonwillison.net/2010/Aug/31/lanyrd/"&gt;in August 2010&lt;/a&gt;. "Sign in with Twitter to see conferences that your friends are speaking at, attending or tracking, then add your own events."&lt;/p&gt;
&lt;p&gt;We ended up spending the next three years on this: we went through Y Combinator, raised a sizable seed round, moved to London, hired a team and shipped a LOT of features. We even managed to ship some features that made the company money!&lt;/p&gt;
&lt;p&gt;This also coincided with me putting the blog on the back-burner for a few years.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://web.archive.org/web/20101118055319/https://lanyrd.com/"&gt;an early snapshot&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/lanyrd-20101118055319.jpg" alt="Welcome to Lanyrd. The social conference directory. Get more out of conferences. Find great conferences to attend: See what your friends are going to or speaking at, find conferences near you or browse conferences by topic. Discover what's hot while it's on: Track what's going on during the conference, even if you aren't there. Who is tweeting what, what links are doing the rounds. Use our useful mobile version to decide what to go to next. Catch up on anything you missed: Easily discover slides, video and podcasts from conferences you attended or tracked. If you spoke at an event you can build up your speaker portfolio of talks you gave." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;In 2013 we sold Lanyrd to Eventbrite, and moved our entire team (and their families) from London to San Francisco. It had been a very wild ride.&lt;/p&gt;
&lt;p&gt;Sadly the site itself is no longer available: as Eventbrite grew it became impossible to justify the work needed to keep Lanyrd maintained, safe and secure. Especially as it started to attract overwhelming volumes of spam.&lt;/p&gt;
&lt;p&gt;Natalie told the full story of Lanyrd on her blog in September 2013: &lt;a href="https://blog.natbat.net/post/61658401806/lanyrd-from-idea-to-exit"&gt;Lanyrd: from idea to exit - the story of our startup&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2017/Sep/10/scraping-irma/"&gt;Scraping hurricane Irma&lt;/a&gt;&lt;/strong&gt; - 10th September 2017&lt;/p&gt;
&lt;p&gt;In 2017 &lt;a href="https://en.wikipedia.org/wiki/Hurricane_Irma"&gt;hurricane Irma&lt;/a&gt; devastated large areas of the Caribbean and the southern USA.&lt;/p&gt;
&lt;p&gt;I got involved with the Irma Response project, helping crowdsource and publish critical information for people affected by the storm.&lt;/p&gt;
&lt;p&gt;I came up with a trick to help with scraping: I ran scrapers against important information sources and recorded the results to a git repository, in order to cheaply track changes to those sources over time.&lt;/p&gt;
&lt;p&gt;I later coined the term "Git scraping" for this technique, see my &lt;a href="https://simonwillison.net/series/git-scraping/"&gt;series of posts&lt;/a&gt; about Git scraping over time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2017/Oct/1/ship/"&gt;Getting the blog back together&lt;/a&gt;&lt;/strong&gt; - 1st October 2017&lt;/p&gt;
&lt;p&gt;Running a startup, and then working at Eventbrite afterwards, had resulted in an almost 7 year gap in blogging for me. In October 2017 I decided to finally get my blog going again. I also back-filled content for the intervening years by scraping my content &lt;a href="https://simonwillison.net/tags/quora/"&gt;from Quora&lt;/a&gt; and &lt;a href="https://simonwillison.net/tags/askmetafilter/"&gt;from Ask Metafilter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you've been meaning to start a new blog or revive an old one this is a trick that I can thoroughly recommend: just because you initially wrote something elsewhere doesn't mean you shouldn't repost it on a site you own.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2017/Oct/8/missing-content/"&gt;Recovering missing content from the Internet Archive&lt;/a&gt;&lt;/strong&gt; - 8th October 2017&lt;/p&gt;
&lt;p&gt;The other step in recovering my old blog's content was picking up some content that was missing from my old database backup. Here's how I pulled in that content by scraping the Internet Archive.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/"&gt;Implementing faceted search with Django and PostgreSQL&lt;/a&gt;&lt;/strong&gt; -  5th October 2017&lt;/p&gt;
&lt;p&gt;I absolutely love building faceted search engines. I realized a while ago that most of my career has been spent applying the exact same trick - faceted search - to different problem spaces. WildlifeNearYou offered faceted search over animal sightings. MP's expenses had faceted search across crowdsourced expense analysis. Lanyrd was faceted search for conferences.&lt;/p&gt;
&lt;p&gt;I implemented &lt;a href="https://simonwillison.net/search/"&gt;faceted search&lt;/a&gt; for this blog on top of PostgreSQL, and wrote about how I did it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2017/Nov/13/datasette/"&gt;Datasette: instantly create and publish an API for your SQLite databases&lt;/a&gt;&lt;/strong&gt; - 13th November 2017&lt;/p&gt;
&lt;p&gt;I shipped the first release of &lt;a href="https://github.com/simonw/datasette"&gt;simonw/datasette&lt;/a&gt; in Nevember 2017. Nearly five years later it's now my number-one focus, and I don't see myself losing interest in it for many decades to come.&lt;/p&gt;
&lt;p&gt;Datasette was inspired by the Guardian Datablog, combined with my realization that Zeit Now (today called &lt;a href="https://vercel.com/"&gt;Vercel&lt;/a&gt;) meant you could bundle data up in a SQLite database and deploy it as part of an exploratory application almost for free.&lt;/p&gt;
&lt;p&gt;My blog has &lt;a href="https://simonwillison.net/search/?q=&amp;amp;tag=datasette"&gt;284 items tagged datasette&lt;/a&gt; at this point.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2018/May/20/datasette-facets/"&gt;Datasette Facets&lt;/a&gt;&lt;/strong&gt; - 20th May 2018&lt;/p&gt;
&lt;p&gt;Given how much I love faceted search, it's surprising it took me until May 2018 to realize that I could bake them into Datasette itself - turning it into a tool for building faceted search engines against any data. It turns out to be my ideal solution to my favourite problem!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;&lt;/strong&gt; - 28th July 2018&lt;/p&gt;
&lt;p&gt;I figured out a pattern for using unit tests to ensure that features of my projects were covered by the documentation. Four years later I can confirm that this technique works &lt;em&gt;really well&lt;/em&gt; - though I wish I'd called it Test-driven documentation instead!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2018/Sep/18/letterboxing-lundy/"&gt;Letterboxing on Lundy&lt;/a&gt;&lt;/strong&gt; - 18th September 2018&lt;/p&gt;
&lt;p&gt;A brief foray into travel writing: Natalie and I spent a few days staying in a small castle on the delightful island of Lundy off the coast of North Devon, and I used it as an opportunity to enthuse about letterboxing and the Landmark Trust.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2018/lundy/IMG_5999.JPG" alt="A small, battered looking castle on a beautiful, remote looking moor" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2019/Feb/25/sqlite-utils/"&gt;sqlite-utils: a Python library and CLI tool for building SQLite databases&lt;/a&gt;&lt;/strong&gt; - 25th February 2019&lt;/p&gt;
&lt;p&gt;Datasette helps you explore and publish data stored in SQLite, but how do you get data into SQLite in the first place?&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; is my answer to that question - a combined CLI tool and Python library with all sorts of utilites for working with and creating SQLite databases.&lt;/p&gt;
&lt;p&gt;It recently had its &lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;100th release&lt;/a&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2019/Mar/7/oil-painting/"&gt;I commissioned an oil painting of Barbra Streisand’s cloned dogs&lt;/a&gt;&lt;/strong&gt; - 7th March 2019&lt;/p&gt;
&lt;p&gt;Not much I can add that's not covered by the title. It's a really good painting!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2019/oil-painting-framed.jpg" alt="A framed oil painting showing two small fluffy white dogs in a stroller, gazing at the tombstone of the dog from which they were cloned." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2019/Sep/10/jsk-fellowship/"&gt;My JSK Fellowship: Building an open source ecosystem of tools for data journalism&lt;/a&gt;&lt;/strong&gt; - 10th September 2019&lt;/p&gt;
&lt;p&gt;In late 2019 I left Eventbrite to join the JSK fellowship program at Stanford. It was an opportunity to devote myself full-time to working on my growing collection of open source tools for data journalism, centered around Datasette.&lt;/p&gt;
&lt;p&gt;I jumped on that opportunity with both hands, and I've been mostly working full-time on Datasette and &lt;a href="https://github.com/simonw/simonw/blob/main/releases.md"&gt;associated projects&lt;/a&gt; (without being paid for it since the fellowship ended) ever since.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2019/Sep/13/weeknotestwitter-sqlite-datasette-rure/"&gt;Weeknotes: ONA19, twitter-to-sqlite, datasette-rure&lt;/a&gt;&lt;/strong&gt; - 13th September 2019&lt;/p&gt;
&lt;p&gt;At the start of my fellowship I decided to publish weeknotes, to keep myself accountable for what I was working on now that I didn't have the structure of a full-time job.&lt;/p&gt;
&lt;p&gt;I've managed to post them roughly once a week ever since - &lt;a href="https://simonwillison.net/tags/weeknotes/"&gt;128 posts and counting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I absolutely love weeknotes as a format. Even if no-one else ever reads them, I find them really useful as a way to keep track of my progress and ensure that I have motivation to get projects to a point where I can write about them at the end of the week!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2020/Apr/20/self-rewriting-readme/"&gt;Using a self-rewriting README powered by GitHub Actions to track TILs&lt;/a&gt;&lt;/strong&gt; - 20th April 2020&lt;/p&gt;
&lt;p&gt;In April 2020 I started publishing TILs - Today I Learneds - at &lt;a href="https://til.simonwillison.net/"&gt;til.simonwillison.net&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The idea behind TILs is to dramatically reduce the friction involved in writing a blog post. If I learned something that was useful to me, I'll write it up as a TIL. These often take less than ten minutes to throw together and I find myself referring back to them all the time.&lt;/p&gt;
&lt;p&gt;My main blog is a Django application, but my TILs run entirely using Datasette. You can see how that all works in the &lt;a href="https://github.com/simonw/til"&gt;simonw/til&lt;/a&gt; GitHub repository.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2020/May/21/dogsheep-photos/"&gt;Using SQL to find my best photo of a pelican according to Apple Photos&lt;/a&gt;&lt;/strong&gt; - 21st May 2020&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dogsheep.github.io/"&gt;Dogsheep&lt;/a&gt; is my ongoing side project in which I explore ways to analyze my own personal data using SQLite and Datasette.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/tools/dogsheep-photos"&gt;dogsheep-photos&lt;/a&gt; is my tool for extracting metadata about my photos from the undocumented Apple Photos SQLite database (building on &lt;a href="https://github.com/RhetTbull/osxphotos"&gt;osxphotos&lt;/a&gt; by Rhet Turnbull). I had been wanting to solve the photo problem for years and was delighted when osxphotos provided the capability I had been missing. And I really like pelicans, so I celebrated by using my photos of them for the demo.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://photos.simonwillison.net/i/a444857c4ac71ceae6af5192c8acc5ac35934ed589259136df0ed11295dbb085.jpeg?w=800" alt="A glorious pelican, wings oustretched" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraping: track changes over time by scraping to a Git repository&lt;/a&gt;&lt;/strong&gt; - 9th October 2020&lt;/p&gt;
&lt;p&gt;If you really want people to engage with a technique, it's helpful to give it a name. I defined Git scraping in this post, and I've been promoting it heavily ever since.&lt;/p&gt;
&lt;p&gt;There are now &lt;a href="https://github.com/topics/git-scraping?o=desc&amp;amp;s=updated"&gt;275 public repositories&lt;/a&gt; on GitHub with the &lt;code&gt;git-scraping&lt;/code&gt; topic, and if you &lt;a href="https://github.com/topics/git-scraping?o=desc&amp;amp;s=updated"&gt;sort them by recently updated&lt;/a&gt; you can see the scrapers on there that most recently captured some new data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2020/Nov/14/personal-data-warehouses/"&gt;Personal Data Warehouses: Reclaiming Your Data&lt;/a&gt;&lt;/strong&gt; - 14th November 2020&lt;/p&gt;
&lt;p&gt;I gave this talk for GitHub's OCTO (previously Office of the CTO, since rebranded to GitHub Next) speaker series.&lt;/p&gt;
&lt;p&gt;It's the Dogsheep talk, with a better title (thanks, &lt;a href="https://twitter.com/idangazit"&gt;Idan&lt;/a&gt;!) It includes a full video demo of my personal Dogsheep instance, including my dog's Foursquare checkins, my Twitter data, Apple Watch GPS trails and more.&lt;/p&gt;
&lt;p&gt;I also explain why I called it Dogsheep: it's a devastatingly terrible pun &lt;a href="https://simonwillison.net/2019/Feb/22/seeking-productive-life-some-details-my-personal-infrastructure/"&gt;on Wolfram&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm frustrated when information like this is only available in video format, so when I give particularly information-dense talks I like to turn them into full write-ups as well, providing extra notes and resources alongside screen captures from the talk.&lt;/p&gt;
&lt;p&gt;For this one &lt;a href="https://github.com/simonw/simonwillisonblog/commit/c4ecae7a17e6bbe8aee60a23d9f45ee874cadfbc"&gt;I added a custom template mechanism&lt;/a&gt; to my blog, to allow me to break out of my usual entry page design.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2021/Feb/28/vaccinateca/"&gt;Trying to end the pandemic a little earlier with VaccinateCA&lt;/a&gt;&lt;/strong&gt; - 28th February 2021&lt;/p&gt;
&lt;p&gt;In February 2021 I joined the VaccinateCA effort to try and help end the pandemic a little bit earlier by crowdsourcing information about the best places to get vaccinated. It was a classic match-up for my skills and interests: a huge crowdsourcing effort that needed to be spun up as a fresh Django application as quickly as possible.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard&lt;/a&gt; was one project that spun directly out of that effort.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/"&gt;The Baked Data architectural pattern&lt;/a&gt;&lt;/strong&gt; - 28th July 2021&lt;/p&gt;
&lt;p&gt;My second attempt at coining a new term, after Git scraping: Baked Data is the name I'm using for the architectural pattern embodied by Datasette where you bundle a read-only copy of your data alongside the code for your application, as part of the same deployment. I think it's a really good idea, and more people should be doing it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2022/Jan/12/how-i-build-a-feature/"&gt;How I build a feature&lt;/a&gt;&lt;/strong&gt; - 12th January 2022&lt;/p&gt;
&lt;p&gt;Over the years I’ve evolved a processes for feature development that works really well for me, and scales down to small personal projects as well as scaling up to much larger pieces of work. I described that in detail in this post.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Picking out these highlights wasn't easy. I ended up setting myself a time limit (to ensure I could put this post live within a minute of midnight UTC time on my blog's 20th birthday) so there's plenty more that I would have liked to dig up.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://simonwillison.net/tags/"&gt;tags index page&lt;/a&gt; includes a 2010s-style word cloud that you can visit if you want to explore the rest of my content. Or use the faceted search!&lt;/p&gt;
&lt;p&gt;A few more project release highlights:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2020/Aug/7/datasette-graphql/"&gt;GraphQL in Datasette with the new datasette-graphql plugin&lt;/a&gt; - 7th August 2020&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2021/Dec/7/git-history/"&gt;git-history: a tool for analyzing scraped data collected using Git and SQLite&lt;/a&gt; - 7th December 2021&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;shot-scraper: automated screenshots for documentation, built on Playwright&lt;/a&gt; - 10th March 2022&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2021/May/10/django-sql-dashboard/"&gt;Django SQL Dashboard&lt;/a&gt; - 10th May 2021&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2021/Sep/8/datasette-desktop/"&gt;Datasette Desktop—a macOS desktop application for Datasette&lt;/a&gt; - 8th September 2021&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;Datasette Lite: a server-side Python web application running in a browser&lt;/a&gt; - 4th May 2022&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Evolution over time&lt;/h4&gt;
&lt;p&gt;I started my blog in my first year of as a student studying computer science at the University of Bath.&lt;/p&gt;
&lt;p&gt;You can tell that Twitter wasn't a thing yet, because I wrote &lt;a href="https://simonwillison.net/2002/Jun/"&gt;107 posts in that first month&lt;/a&gt;. Lots of links to other people's blog posts (we did a lot of that back then) with extra commentary. Lots of blogging about blogging.&lt;/p&gt;
&lt;p&gt;That first version of the site was hosted at &lt;code&gt;http://www.bath.ac.uk/~cs1spw/blog/&lt;/code&gt; - on my university's student hosting. Sadly the Internet Archive doesn't have a capture of it there, since I moved it to &lt;code&gt;http://simon.incutio.com/&lt;/code&gt; (my part-time employer at the time) in September 2002. Here's &lt;a href="https://simonwillison.net/2002/Sep/9/newHosting/"&gt;my note&lt;/a&gt; from then about rewriting it to use MySQL instead of flat file storage.&lt;/p&gt;
&lt;p&gt;This is the earliest capture I could find &lt;a href="https://web.archive.org/web/20030610004652/http://simon.incutio.com/"&gt;on the Internet Archive&lt;/a&gt;, from June 2003:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/simonwillison-blog-20030610004652.jpg" alt="My blog in June 2003. The header and highlight colours were orange, the rest was black on white text. The tagline reads: PHP, PYthon, CSS, XML and general web development. The sidebar includes a &amp;quot;Blogs I read&amp;quot; section with notes as to when each one was last updated. My top post that day talks about Using boomarklets to experiment with CSS." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Full entry on &lt;a href="https://simonwillison.net/2003/Jun/3/bookmarkletsAndCSS/"&gt;Using bookmarklets to experiment with CSS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href="https://web.archive.org/web/20061113072435/http://simon.incutio.com/"&gt;November 2006&lt;/a&gt; I had redesigned from orange to green, and started writing Blogmarks - the name I used for small, bookmark-style link posts. I've collected &lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;6,304 of them&lt;/a&gt; over the years!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/simonwillison-blog-20061113072435.jpg" alt="My blog in June 2003. The header and highlight colours were orange, the rest was black on white text. The tagline reads: PHP, PYthon, CSS, XML and general web development. The sidebar includes a &amp;quot;Blogs I read&amp;quot; section with notes as to when each one was last updated. My top post that day talks about Using boomarklets to experiment with CSS." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://web.archive.org/web/20100726061941/http://simonwillison.net/"&gt;By 2010&lt;/a&gt; I'd reached more-or-less my current purple on white design, albeit with the ability to sign in with OpenID to post a comment. I dropped comments entirely when I relaunched in 2017 - constantly fighting against spam comments makes blogging much less fun.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/simonwillison-20100726061941.jpg" alt="My blog in July 2010. It's the same visual design as today, but with an option to sign in with OpenID and a little bubble next to each item showing the number of comments." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The source code for the current iteration of my blog is available &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="internet-archive-screenshots"&gt;Taking screenshots of the Internet Archive with shot-scraper&lt;/h4&gt;
&lt;p&gt;Here's how I generated the screenshots in this post, using &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; against the Internet Archive but with a line of JavaScript to hide the banner the display at the top of every archived page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://web.archive.org/web/20030610004652/http://simon.incutio.com/' \
  --javascript 'document.querySelector("#wm-ipp-base").style.display="none"' \
   --width 800 --height 600 --retina
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;mgdlbp on Hacker News &lt;a href="https://news.ycombinator.com/item?id=31729477"&gt;pointed out&lt;/a&gt; that you can instead add &lt;code&gt;if_&lt;/code&gt; to the date part of the archive URLs to hide the banner, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://web.archive.org/web/20030610004652if_/http://simon.incutio.com/' \
   --width 800 --height 600 --retina
&lt;/code&gt;&lt;/pre&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/adrian-holovaty"&gt;adrian-holovaty&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/personal-news"&gt;personal-news&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="adrian-holovaty"/><category term="blogging"/><category term="personal-news"/></entry><entry><title>One year of TILs</title><link href="https://simonwillison.net/2021/May/2/one-year-of-tils/#atom-series" rel="alternate"/><published>2021-05-02T18:01:44+00:00</published><updated>2021-05-02T18:01:44+00:00</updated><id>https://simonwillison.net/2021/May/2/one-year-of-tils/#atom-series</id><summary type="html">
    &lt;p&gt;Just over &lt;a href="https://simonwillison.net/2020/Apr/20/self-rewriting-readme/"&gt;a year ago&lt;/a&gt; I started tracking TILs, inspired by &lt;a href="https://github.com/jbranchaud/til"&gt;Josh Branchaud's collection&lt;/a&gt;. I've since published &lt;a href="https://til.simonwillison.net/"&gt;148 TILs&lt;/a&gt; across 43 different topics. It's a great format!&lt;/p&gt;
&lt;p&gt;TIL stands for Today I Learned. The thing I like most about TILs is that they drop the barrier to publishing something online to almost nothing.&lt;/p&gt;
&lt;p&gt;If I'm writing a blog entry, I feel like it needs to say something new. This pressure for originality leads to vast numbers of incomplete, draft posts and a sporadic publishing schedule that trends towards not publishing anything at all.&lt;/p&gt;
&lt;p&gt;(Establishing a &lt;a href="https://simonwillison.net/tags/weeknotes/"&gt;weeknotes habit&lt;/a&gt; has helped enormously here too.)&lt;/p&gt;
&lt;p&gt;The bar for a TIL is literally "did I just learn something?" - they effectively act as a public notebook.&lt;/p&gt;
&lt;p&gt;They also reflect my values as a software engineer. The thing I love most about this career is that the opportunities to learn new things never reduce - there will always be new sub-disciplines to explore, and I aspire to learn something new every single working day.&lt;/p&gt;
&lt;p&gt;My hope is that by publishing a constant stream of TILs I can reinforce the idea that even if you've been working in this industry for twenty years there will always be new things to learn, and learning any new trick - even the most basic thing - should be celebrated.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="til"/></entry><entry><title>Implementing faceted search with Django and PostgreSQL</title><link href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/#atom-series" rel="alternate"/><published>2017-10-05T14:12:27+00:00</published><updated>2017-10-05T14:12:27+00:00</updated><id>https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/#atom-series</id><summary type="html">
    &lt;p&gt;I’ve added &lt;a href="https://simonwillison.net/search/"&gt;a faceted search engine&lt;/a&gt; to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL"like" queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotation) and any combination of the above. Some example searches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=postgresql"&gt;All content matching “postgresql”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=django&amp;amp;type=quotation"&gt;Just quotations matching “django”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=python+javascript&amp;amp;tag=mozilla&amp;amp;year=2007"&gt;All content matching “python” and “javascript” with the tag “mozilla” posted in 2007&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also provides facet counts, so you can tell how many results you will get back before you apply one of these filters - and get a general feeling for the shape of the corpus as you navigate it.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2017/faceted-search.png" alt="Screenshot of my faceted search interface" style="width: 100%"/&gt;&lt;/p&gt;

&lt;p&gt;I love this kind of search interface, because the counts tell you so much more about the underlying data. Turns out I was &lt;a href="https://simonwillison.net/search/?q=javascript&amp;amp;type=quotation"&gt;most active in quoting people talking about JavaScript back in 2007&lt;/a&gt;, for example.&lt;/p&gt;
&lt;p&gt;I usually build faceted search engines using either &lt;a href="https://simonwillison.net/tags/solr/"&gt;Solr&lt;/a&gt; or &lt;a href="https://simonwillison.net/tags/solr/"&gt;Elasticsearch&lt;/a&gt; (though the first version of search on this blog was actually powered by &lt;a href="http://fallabs.com/hyperestraier/intro-en.html"&gt;Hyper Estraier&lt;/a&gt;) - but I’m hosting this blog as simply and inexpensively as possible on Heroku and I don’t want to shell out for a SaaS search solution or run an Elasticsearch instance somewhere myself. I thought I’d have to go back to using &lt;a href="https://developers.google.com/custom-search/"&gt;Google Custom Search&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I read &lt;a href="http://rachbelaid.com/postgres-full-text-search-is-good-enough/"&gt;Postgres full-text search is Good Enough!&lt;/a&gt; by Rachid Belaid - closely followed by &lt;a href="http://blog.lotech.org/postgres-full-text-search-with-django.html"&gt;Postgres Full-Text Search With Django&lt;/a&gt; by Nathan Shafer - and I decided to have a play with the new PostgreSQL search functionality that was &lt;a href="https://docs.djangoproject.com/en/1.11/releases/1.10/#full-text-search-for-postgresql"&gt;introduced in Django 1.10&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;… and wow! Full-text search is yet another example of a feature that’s been in PostgreSQL for &lt;a href="https://www.postgresql.org/docs/8.3/static/release-8-3.html"&gt;nearly a decade now&lt;/a&gt;, incrementally improving with every release to the point where it’s now  really, &lt;em&gt;really&lt;/em&gt; good.&lt;/p&gt;
&lt;p&gt;At its most basic level a search system needs to handle four things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It needs to take user input and find matching documents.&lt;/li&gt;
&lt;li&gt;It needs to understand and ignore stopwords (common words like “the” and “and”) and apply stemming - knowing that “ridicule” and “ridiculous” should be treated as the same root, for example. Both of these features need to be language-aware.&lt;/li&gt;
&lt;li&gt;It needs to be able to apply relevance ranking, calculating which documents are the best match for a search query.&lt;/li&gt;
&lt;li&gt;It needs to be &lt;em&gt;fast&lt;/em&gt; - working against some kind of index rather than scanning every available document in full.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Modern PostgreSQL &lt;a href="https://www.postgresql.org/docs/9.5/static/textsearch.html"&gt;ticks all of those boxes&lt;/a&gt;. Let’s put it to work.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Simple_search_without_an_index_29"&gt;&lt;/a&gt;Simple search without an index&lt;/h3&gt;
&lt;p&gt;Here’s how to execute a full-text search query against a simple text column:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; blog.models &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; Entry
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.search &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; SearchVector

results = Entry.objects.annotate(
    searchable=SearchVector(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;)
).filter(searchable=&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The generated SQL looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="hljs-operator"&gt;&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;, ...,
to_tsvector(&lt;span class="hljs-keyword"&gt;COALESCE&lt;/span&gt;(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"body"&lt;/span&gt;, %s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"searchable"&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; to_tsvector(&lt;span class="hljs-keyword"&gt;COALESCE&lt;/span&gt;(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"body"&lt;/span&gt;, &lt;span class="hljs-string"&gt;"django"&lt;/span&gt;))
    @@ (plainto_tsquery(&lt;span class="hljs-string"&gt;"django"&lt;/span&gt;)) = &lt;span class="hljs-literal"&gt;true&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;SearchVector&lt;/code&gt; class constructs a stemmed, stopword-removed representation of the &lt;code&gt;body&lt;/code&gt; column ready to be searched. The resulting queryset contains entries that are a match for “django”.&lt;/p&gt;
&lt;p&gt;My blog entries are stored as HTML, but I don’t want search to include those HTML tags. One (extremely un-performant) solution is to use Django’s &lt;code&gt;Func&lt;/code&gt; helper to apply a regular expression inside PostgreSQL to strip tags before they are considered for search:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.db.models &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; Value, F, Func

results = Entry.objects.annotate(
    searchable=SearchVector(
        Func(
            F(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'&amp;lt;.*?&amp;gt;'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;''&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'g'&lt;/span&gt;),
            function=&lt;span class="hljs-string"&gt;'regexp_replace'&lt;/span&gt;
        )
    )
).filter(searchable=&lt;span class="hljs-string"&gt;'http'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 6th October 8:23pm UTC&lt;/strong&gt; - it turns out this step is entirely unnecessary. &lt;a href="https://github.com/simonw/simonwillisonblog/issues/1#issuecomment-334770577"&gt;Paolo Melchiorre points out&lt;/a&gt; that the PostgreSQL ts_vector() function already handles tag removal. Sure enough, executing &lt;samp&gt;SELECT to_tsvector('&amp;lt;div&amp;gt;Hey look what happens to &amp;lt;blockquote&amp;gt;this tag&amp;lt;/blockquote&amp;gt;&amp;lt;/div&amp;gt;')&lt;/samp&gt; &lt;a href="http://sqlfiddle.com/#!17/9eecb/4552"&gt;using SQL Fiddle&lt;/a&gt; returns &lt;samp&gt;'happen':4 'hey':1 'look':2 'tag':7&lt;/samp&gt;, with the tags already stripped.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This works, but performance isn’t great. PostgreSQL ends up having to scan every row and construct a list of search vectors for each one every time you execute a query.&lt;/p&gt;
&lt;p&gt;If you want it to go fast, you need to add a special search vector column to your table and then create the appropriate index on it. As of Django 1.11 this is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.search &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; SearchVectorField
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.indexes &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; GinIndex

&lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Entry&lt;/span&gt;&lt;span class="hljs-params"&gt;(models.Model)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-comment"&gt;# ...&lt;/span&gt;
    search_document = SearchVectorField(null=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)

    &lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Meta&lt;/span&gt;:&lt;/span&gt;
        indexes = [
            GinIndex(fields=[&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;])
        ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Django’s migration system will automatically add both the field and the special &lt;a href="https://www.postgresql.org/docs/9.5/static/gin-intro.html"&gt;GIN index&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What’s trickier is populating that &lt;code&gt;search_document&lt;/code&gt; field. Django does not yet support a easy method to populate it directly in your initial INSERT call, instead recommending that you populated with a SQL UPDATE statement after the fact. Here is a one-liner that will populate the field for everything in that table (and strip tags at the same time):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;strip_tags_func&lt;/span&gt;&lt;span class="hljs-params"&gt;(field)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; Func(
        F(field), Value(&lt;span class="hljs-string"&gt;'&amp;lt;.*?&amp;gt;'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;''&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'g'&lt;/span&gt;),
        function=&lt;span class="hljs-string"&gt;'regexp_replace'&lt;/span&gt;
    )
 
Entry.objects.update(
    search_document=(
        SearchVector(&lt;span class="hljs-string"&gt;'title'&lt;/span&gt;, weight=&lt;span class="hljs-string"&gt;'A'&lt;/span&gt;) +
        SearchVector(strip_tags_func(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;), weight=&lt;span class="hljs-string"&gt;'C'&lt;/span&gt;)
    )
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m using a neat feature of the &lt;code&gt;SearchVector&lt;/code&gt; class here: it can be concatenated together using the &lt;code&gt;+&lt;/code&gt; operator, and each component can be assigned a weight of &lt;code&gt;A&lt;/code&gt;, &lt;code&gt;B&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;D&lt;/code&gt;. These weights affect ranking calculations later on.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Updates_using_signals"&gt;&lt;/a&gt;Updates using signals&lt;/h3&gt;
&lt;p&gt;We could just set this up to run periodically (as I did in my &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7e3a02178e3ca71c464ae68a3b68d70e5fa66692#diff-1cbcc518bc02f9495bba963e698143e0"&gt;initial implementation&lt;/a&gt;), but we can get better real-time results by ensuring this field gets updated automatically when the rest of the model is modified. Some people solve this with PostgreSQL triggers, but I’m still more comfortable handling this kind of thing in python code - so I opted to use Django’s &lt;a href="https://docs.djangoproject.com/en/1.11/topics/signals/"&gt;signals mechanism&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;Since I need to run search queries across three different types of blog content - Entries, Blogmarks and Quotations - I added a method to each model that returns the text fragments corresponding to each of the weight values. Here’s that method for my Quotation model:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Quotation&lt;/span&gt;&lt;span class="hljs-params"&gt;(models.Model)&lt;/span&gt;:&lt;/span&gt;
    quotation = models.TextField()
    source = models.CharField(max_length=&lt;span class="hljs-number"&gt;255&lt;/span&gt;)
    tags = models.ManyToManyField(Tag, blank=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)

    &lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;index_components&lt;/span&gt;&lt;span class="hljs-params"&gt;(self)&lt;/span&gt;:&lt;/span&gt;
        &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; {
            &lt;span class="hljs-string"&gt;'A'&lt;/span&gt;: self.quotation,
            &lt;span class="hljs-string"&gt;'B'&lt;/span&gt;: &lt;span class="hljs-string"&gt;' '&lt;/span&gt;.join(self.tags.values_list(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;, flat=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)),
            &lt;span class="hljs-string"&gt;'C'&lt;/span&gt;: self.source,
        }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, I’m including the tags that have been assigned to the quotation in the searchable document.&lt;/p&gt;
&lt;p&gt;Here are my signals - loaded once via an import statement in my blog application’s &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/apps.py"&gt;&lt;code&gt;AppConfig.ready()&lt;/code&gt; method&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-decorator"&gt;@receiver(post_save)&lt;/span&gt;
&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_save&lt;/span&gt;&lt;span class="hljs-params"&gt;(sender, **kwargs)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; &lt;span class="hljs-keyword"&gt;not&lt;/span&gt; issubclass(sender, BaseModel):
        &lt;span class="hljs-keyword"&gt;return&lt;/span&gt;
    transaction.on_commit(make_updater(kwargs[&lt;span class="hljs-string"&gt;'instance'&lt;/span&gt;]))

&lt;span class="hljs-decorator"&gt;@receiver(m2m_changed)&lt;/span&gt;
&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_m2m_changed&lt;/span&gt;&lt;span class="hljs-params"&gt;(sender, **kwargs)&lt;/span&gt;:&lt;/span&gt;
    instance = kwargs[&lt;span class="hljs-string"&gt;'instance'&lt;/span&gt;]
    model = kwargs[&lt;span class="hljs-string"&gt;'model'&lt;/span&gt;]
    &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; model &lt;span class="hljs-keyword"&gt;is&lt;/span&gt; Tag:
        transaction.on_commit(make_updater(instance))
    &lt;span class="hljs-keyword"&gt;elif&lt;/span&gt; isinstance(instance, Tag):
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; obj &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; model.objects.filter(pk__in=kwargs[&lt;span class="hljs-string"&gt;'pk_set'&lt;/span&gt;]):
            transaction.on_commit(make_updater(obj))

&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;make_updater&lt;/span&gt;&lt;span class="hljs-params"&gt;(instance)&lt;/span&gt;:&lt;/span&gt;
    components = instance.index_components()
    pk = instance.pk

    &lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_commit&lt;/span&gt;&lt;span class="hljs-params"&gt;()&lt;/span&gt;:&lt;/span&gt;
        search_vectors = []
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; weight, text &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; components.items():
            search_vectors.append(
                SearchVector(Value(text, output_field=models.TextField()), weight=weight)
            )
        instance.__class__.objects.filter(pk=pk).update(
            search_document=reduce(operator.add, search_vectors)
        )
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; on_commit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The full code can be &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/signals.py"&gt;found here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;on_save&lt;/code&gt; method is pretty straightforward - it checks if the model that was just saved has my &lt;code&gt;BaseModel&lt;/code&gt; as a base class, then it calls &lt;code&gt;make_updater&lt;/code&gt; to get a function to be executed by the &lt;code&gt;transaction.on_commit&lt;/code&gt; hook.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;on_m2m_changed&lt;/code&gt; handler is &lt;a href="https://docs.djangoproject.com/en/1.11/ref/signals/#m2m-changed"&gt;significantly more complicated&lt;/a&gt;. There are a number of scenarios in which this will be called - I’m reasonably confident that the idiom I use here will capture all of the modifications that should trigger a re-indexing operation.&lt;/p&gt;
&lt;p&gt;Running a search now looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;results = Entry.objects.filter(
    search_document=SearchQuery(&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We need one more thing though: we need to sort our search results by relevance. PostgreSQL has pretty good relevance built in, and sorting by the relevance score can be done by  applying a Django ORM annotation:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;query = SearchQuery(&lt;span class="hljs-string"&gt;'ibm'&lt;/span&gt;)

results = Entry.objects.filter(
    search_document=query
).annotate(
    rank=SearchRank(F(&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;), query)
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have basic full text search implemented against a single Django model, making use of a GIN index. This is lightning fast.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Searching_multiple_tables_using_querysetunion_192"&gt;&lt;/a&gt;Searching multiple tables using queryset.union()&lt;/h3&gt;
&lt;p&gt;My site has three types of content, represented in three different models and hence three different underlying database tables.&lt;/p&gt;
&lt;p&gt;I’m using &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/models.py#L78-L107"&gt;an abstract base model&lt;/a&gt; to define common fields shared by all three: the created date, the slug (used to construct permalink urls) and the &lt;code&gt;search_document&lt;/code&gt; field populated above.&lt;/p&gt;
&lt;p&gt;As of Django 1.11 It’s possible to combine queries across different tables &lt;a href="https://docs.djangoproject.com/en/1.11/releases/1.11/#models"&gt;using the SQL union operator&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here’s  what that looks like for running a search across three tables, all with the same &lt;code&gt;search_document&lt;/code&gt; search vector field. I need to use &lt;code&gt;.values()&lt;/code&gt; to restrict the querysets I am unioning to the same subset of fields:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;query = SearchQuery(&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
rank_annotation = SearchRank(F(&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;), query)
qs = Blogmark.objects.annotate(
    rank=rank_annotation,
).filter(
    search_document=query
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).union(
    Entry.objects.annotate(
        rank=rank_annotation,
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
    Quotation.objects.annotate(
        rank=rank_annotation,
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)[:&lt;span class="hljs-number"&gt;5&lt;/span&gt;]

&lt;span class="hljs-comment"&gt;# Output&lt;/span&gt;
&amp;lt;QuerySet [
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;186&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.875179&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2008&lt;/span&gt;, &lt;span class="hljs-number"&gt;4&lt;/span&gt;, &lt;span class="hljs-number"&gt;8&lt;/span&gt;, &lt;span class="hljs-number"&gt;13&lt;/span&gt;, &lt;span class="hljs-number"&gt;48&lt;/span&gt;, &lt;span class="hljs-number"&gt;18&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;134&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.842655&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2007&lt;/span&gt;, &lt;span class="hljs-number"&gt;10&lt;/span&gt;, &lt;span class="hljs-number"&gt;20&lt;/span&gt;, &lt;span class="hljs-number"&gt;13&lt;/span&gt;, &lt;span class="hljs-number"&gt;46&lt;/span&gt;, &lt;span class="hljs-number"&gt;56&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;1591&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.804502&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2009&lt;/span&gt;, &lt;span class="hljs-number"&gt;9&lt;/span&gt;, &lt;span class="hljs-number"&gt;28&lt;/span&gt;, &lt;span class="hljs-number"&gt;23&lt;/span&gt;, &lt;span class="hljs-number"&gt;32&lt;/span&gt;, &lt;span class="hljs-number"&gt;4&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;5093&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.788616&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2010&lt;/span&gt;, &lt;span class="hljs-number"&gt;2&lt;/span&gt;, &lt;span class="hljs-number"&gt;26&lt;/span&gt;, &lt;span class="hljs-number"&gt;19&lt;/span&gt;, &lt;span class="hljs-number"&gt;22&lt;/span&gt;, &lt;span class="hljs-number"&gt;47&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;2598&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.786928&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2007&lt;/span&gt;, &lt;span class="hljs-number"&gt;1&lt;/span&gt;, &lt;span class="hljs-number"&gt;26&lt;/span&gt;, &lt;span class="hljs-number"&gt;12&lt;/span&gt;, &lt;span class="hljs-number"&gt;38&lt;/span&gt;, &lt;span class="hljs-number"&gt;46&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)}
]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is not enough information though - I have the primary keys, but I don’t know which type of model they belong to. In order to retrieve the actual resulting objects from the database I need to know which type of content is represented by each of those results.&lt;/p&gt;
&lt;p&gt;I can achieve that using another annotation:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;qs = Blogmark.objects.annotate(
    rank=rank_annotation,
    type=models.Value(&lt;span class="hljs-string"&gt;'blogmark'&lt;/span&gt;, output_field=models.CharField())
).filter(
    search_document=query
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).union(
    Entry.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
    Quotation.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)[:&lt;span class="hljs-number"&gt;5&lt;/span&gt;]

&lt;span class="hljs-comment"&gt;# Output:&lt;/span&gt;
&amp;lt;QuerySet [
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;186&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'quotation'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.875179&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;134&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'quotation'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.842655&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;1591&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'entry'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.804502&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;5093&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'blogmark'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.788616&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;2598&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'blogmark'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.786928&lt;/span&gt;}
]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I just need to write function which can take a list of types and primary keys and return the full objects from the database:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;load_mixed_objects&lt;/span&gt;&lt;span class="hljs-params"&gt;(dicts)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-string"&gt;"""
    Takes a list of dictionaries, each of which must at least have a 'type'
    and a 'pk' key. Returns a list of ORM objects of those various types.
    Each returned ORM object has a .original_dict attribute populated.
    """&lt;/span&gt;
    to_fetch = {}
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; d &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; dicts:
        to_fetch.setdefault(d[&lt;span class="hljs-string"&gt;'type'&lt;/span&gt;], set()).add(d[&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;])
    fetched = {}
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; key, model &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; (
        (&lt;span class="hljs-string"&gt;'blogmark'&lt;/span&gt;, Blogmark),
        (&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;, Entry),
        (&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, Quotation),
    ):
        ids = to_fetch.get(key) &lt;span class="hljs-keyword"&gt;or&lt;/span&gt; []
        objects = model.objects.prefetch_related(&lt;span class="hljs-string"&gt;'tags'&lt;/span&gt;).filter(pk__in=ids)
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; obj &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; objects:
            fetched[(key, obj.pk)] = obj
    &lt;span class="hljs-comment"&gt;# Build list in same order as dicts argument&lt;/span&gt;
    to_return = []
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; d &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; dicts:
        item = fetched.get((d[&lt;span class="hljs-string"&gt;'type'&lt;/span&gt;], d[&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;])) &lt;span class="hljs-keyword"&gt;or&lt;/span&gt; &lt;span class="hljs-keyword"&gt;None&lt;/span&gt;
        &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; item:
            item.original_dict = d
        to_return.append(item)
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; to_return
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One last challenge: when I add filtering by type, I’m going to want to selectively union together only a subset of these querysets. I need a queryset to start unions against, but I don’t yet know which queryset I will be using. I can abuse Django’s &lt;code&gt;queryset.none()&lt;/code&gt; method to crate an empty &lt;code&gt;ValuesQuerySet&lt;/code&gt; in the correct shape like this&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;qs = Entry.objects.annotate(
    type=models.Value(&lt;span class="hljs-string"&gt;'empty'&lt;/span&gt;, output_field=models.CharField()),
    rank=rank_annotation
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).none()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can progressively build up my union in a loop like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; klass &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; (Entry, Blogmark, Quotation):
    qs = qs.union(klass.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Django ORM is smart enough to compile away the empty queryset when it constructs the SQL, which ends up looking something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;(((&lt;span class="hljs-operator"&gt;&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
            &lt;span class="hljs-string"&gt;"entry"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
            ts_rank(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;))
 &lt;span class="hljs-keyword"&gt;UNION&lt;/span&gt;
   (&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
           &lt;span class="hljs-string"&gt;"blogmark"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
           ts_rank(&lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;))
&lt;span class="hljs-keyword"&gt;UNION&lt;/span&gt;
  (&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
          &lt;span class="hljs-string"&gt;"quotation"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
          ts_rank(&lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Applying_filters_345"&gt;&lt;/a&gt;Applying filters&lt;/h3&gt;
&lt;p&gt;So far, our search engine can only handle user-entered query strings. If I am going to build a faceted search interface I need to be able to handle filtering as well. I want the ability to filter by year, tag and type.&lt;/p&gt;
&lt;p&gt;The key difference between filtering and querying (borrowing &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html"&gt;these definitions from Elasticsearch&lt;/a&gt;) is that querying is loose - it involves stemming and stopwords - while filtering is exact. Additionally, querying affects the calculated relevance score while filtering does not - a document either matches the filter or it doesn’t.&lt;/p&gt;
&lt;p&gt;Since PostgreSQL is a relational database, filtering can be handled by simply constructing extra SQL where clauses using the Django ORM.&lt;/p&gt;
&lt;p&gt;Each of the filters I need requires a slightly different approach. Filtering by type is easy - I just selectively include or exclude that model from my union queryset.&lt;/p&gt;
&lt;p&gt;Year and month work like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;selected_year = request.GET.get(&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;, &lt;span class="hljs-string"&gt;''&lt;/span&gt;)
selected_month = request.GET.get(&lt;span class="hljs-string"&gt;'month'&lt;/span&gt;, &lt;span class="hljs-string"&gt;''&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; selected_year:
    qs = qs.filter(created__year=int(selected_year))
&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; selected_month:
    qs = qs.filter(created__month=int(selected_month))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tags involve a join through a many-2-many relationship against the Tags table.  We want to be able to apply more than one tag, for example this search for &lt;a href="https://simonwillison.net/search/?tag=python&amp;amp;tag=javascript"&gt;all items tagged both python and javascript&lt;/a&gt;. Django’s ORM makes this easy:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;selected_tags = request.GET.getlist(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; tag &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; selected_tags:
    qs = qs.filter(tags__tag=tag)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Adding_facet_counts_374"&gt;&lt;/a&gt;Adding facet counts&lt;/h3&gt;
&lt;p&gt;There is just one more ingredient needed to complete our faceted search: facet counts!&lt;/p&gt;
&lt;p&gt;Again, the way we calculate these is different for each of our filters. For types, we need to call &lt;code&gt;.count()&lt;/code&gt; on a separate queryset for each of the types we are searching:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;queryset = make_queryset(Entry, &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;)
type_counts[&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;] = queryset.count()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(the make_queryset function is &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/views.py#L408-L423"&gt;defined here&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;For years we can do this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.db.models.functions &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; TruncYear

&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; row &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; queryset.order_by().annotate(
    year=TruncYear(&lt;span class="hljs-string"&gt;'created'&lt;/span&gt;)
).values(&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;).annotate(n=models.Count(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;)):
    year_counts[row[&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;]] = year_counts.get(
        row[&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;], &lt;span class="hljs-number"&gt;0&lt;/span&gt;
    ) + row[&lt;span class="hljs-string"&gt;'n'&lt;/span&gt;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tags are trickiest. Let’s take advantage of he fact that Django’s ORM knows how to construct sub-selects if you pass another queryset to the &lt;code&gt;__in&lt;/code&gt;  operator.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;tag_counts = {}
type_name = &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;
queryset = make_queryset(Entry, &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; tag, count &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; Tag.objects.filter(**{
    &lt;span class="hljs-string"&gt;'%s__in'&lt;/span&gt; % type_name: queryset
}).annotate(
    n=models.Count(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;)
).values_list(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'n'&lt;/span&gt;):
    tag_counts[tag] = tag_counts.get(tag, &lt;span class="hljs-number"&gt;0&lt;/span&gt;) + count
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Rendering_it_all_in_a_template_414"&gt;&lt;/a&gt;Rendering it all in a template&lt;/h3&gt;
&lt;p&gt;Having constructed the various facets counts in the view function, the template is really simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;{% if type_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Types&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in type_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;type"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.type&lt;/span&gt; %}"&amp;gt;&lt;/span&gt;{{ t.type }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
{% if year_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Years&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in year_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;year"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.year&lt;/span&gt;|&lt;span class="hljs-attribute"&gt;date:&lt;/span&gt;"&lt;span class="hljs-attribute"&gt;Y&lt;/span&gt;" %}"&amp;gt;&lt;/span&gt;{{ t.year|date:"Y" }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
{% if tag_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Tags&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in tag_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;tag"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.tag&lt;/span&gt; %}"&amp;gt;&lt;/span&gt;{{ t.tag }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I am using &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/templatetags/blog_tags.py#L54-L68"&gt;custom templates tags&lt;/a&gt; here to add arguments to the current URL. I’ve built systems like this in the past where the URLs are instead generated in the view logic, which I think I prefer. As always, perfect is the enemy of shipped.&lt;/p&gt;
&lt;p&gt;And because the results are just a Django queryset, we can use Django’s pagination helpers for the pagination links.&lt;/p&gt;
&lt;h3&gt;&lt;a id="The_final_implementation_449"&gt;&lt;/a&gt;The final implementation&lt;/h3&gt;
&lt;p&gt;The full current version of the code at time of writing &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/views.py#L388-L552"&gt;can be seen here&lt;/a&gt;. You can follow my initial implementation of this feature through the following commits: &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7e3a0217"&gt;7e3a0217&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/c7e7b30c"&gt;c7e7b30c&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7f6b524c"&gt;7f6b524c&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/a16ddb5e"&gt;a16ddb5e&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7055c7e1"&gt;7055c7e1&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/74c194d9"&gt;74c194d9&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/f3ffc100"&gt;f3ffc100&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/6c24d9fd"&gt;6c24d9fd&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/cb88c2d4"&gt;cb88c2d4&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/2c262c75"&gt;2c262c75&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/776a562a"&gt;776a562a&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/b8484c50"&gt;b8484c50&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/0b361c78"&gt;0b361c78&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/1322ada2"&gt;1322ada2&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/79b1b13d"&gt;79b1b13d&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/3955f41b"&gt;3955f41b&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/3f5ca052"&gt;3f5ca052&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And that’s how I built faceted search on top of PostgreSQL and Django! I don’t have my blog comments up and running yet, so please post any thoughts or feedback over on &lt;a href="https://github.com/simonw/simonwillisonblog/issues/1"&gt;this GitHub issue&lt;/a&gt; or &lt;a href="https://news.ycombinator.com/item?id=15409733"&gt;over on this thread on Hacker News&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 9th September 2021&lt;/strong&gt;: A few years after implementing this I started to notice performance issues with my blog, which turned out to be caused by search engine crawlers hitting every possible combination of facets, triggering a ton of expensive SQL queries. I excluded &lt;code&gt;/search&lt;/code&gt; from being crawled using &lt;code&gt;robots.txt&lt;/code&gt; which fixed the problem.&lt;/em&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/orm"&gt;orm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facetedsearch"&gt;facetedsearch&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="django"/><category term="full-text-search"/><category term="orm"/><category term="postgresql"/><category term="projects"/><category term="search"/><category term="facetedsearch"/></entry><entry><title>1000th Blogmark</title><link href="https://simonwillison.net/2004/Aug/26/milestone/#atom-series" rel="alternate"/><published>2004-08-26T00:30:54+00:00</published><updated>2004-08-26T00:30:54+00:00</updated><id>https://simonwillison.net/2004/Aug/26/milestone/#atom-series</id><summary type="html">
    &lt;p id="p-0"&gt;I just posted my 1000th &lt;a href="http://simon.incutio.com/blogmarks/"&gt;blogmark&lt;/a&gt;. I can't emphasize enough how much of an impact this &lt;a href="/2003/Nov/24/blogmarks/"&gt;15 minute hack&lt;/a&gt; has had on both my browsing and my blogging habits. While I still tend to leave browser windows open for days at a time, I now at least have a procedure for getting rid of the ones that still interest me. More importantly, having blogmarks has eliminated the temptation to write a full blog entry (with quotation) just to share a link. This has dramatically reduced my posting rate, but has meant that when I do post an entry I usually have something moderately interesting to say.&lt;/p&gt;

&lt;p id="p-1"&gt;To celebrate this personal milestone, I've linked up the rudimentary LIKE query search engine I've been using for a while on the blogmarks index page. My long term aim is still to integrate them with my main content and add comments in the style of &lt;a href="http://photomatt.net/"&gt;photomatt&lt;/a&gt;, but that would require more time spent hacking on my blogging system (or switching to &lt;a href="http://wordpress.org/"&gt;WordPress&lt;/a&gt;) than I have to spend right now.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="site-upgrades"/></entry><entry><title>Blogmarks</title><link href="https://simonwillison.net/2003/Nov/24/blogmarks/#atom-series" rel="alternate"/><published>2003-11-24T00:52:16+00:00</published><updated>2003-11-24T00:52:16+00:00</updated><id>https://simonwillison.net/2003/Nov/24/blogmarks/#atom-series</id><summary type="html">
    &lt;p&gt;This entry was going to be another list of links, together with a note about how much I really needed to set up a separate link blog. Then I realised that it would make more sense just to set one up so that's exactly what I've done. I still need to implement the archive but it's &lt;span class="in-joke" title="and I am likely to be eaten by a grue"&gt;getting dark&lt;/span&gt; so I'm posting this and heading home.&lt;/p&gt;

&lt;p&gt;My main points of inspiration were Paul Hammond's &lt;a href="http://www.paranoidfish.org/links/"&gt;bookmark store&lt;/a&gt;, Mark Pilgrim's &lt;a href="http://diveintomark.org/"&gt;b-links&lt;/a&gt;, Anil Dash's &lt;a href="http://www.dashes.com/links/"&gt;Daily Links&lt;/a&gt; and Jason Kottke's &lt;a href="http://www.kottke.org/remainder/"&gt;Remaindered Links&lt;/a&gt;. Since there didn't seem to be any naming convention I decided to call them blogmarks, which &lt;a href="http://www.google.com/search?q=blogmarks" title="Google Search: blogmarks"&gt;isn't a new term&lt;/a&gt; but doesn't seem to have a widely accepted meaning yet either.&lt;/p&gt;

&lt;p&gt;The system is powered by a simple bookmarklet. To make things a little more interesting I'm capturing the referral information and using it to automatically generate the 'via' link; since the title of the previous page isn't available in Javascript I extract is using a server side script instead. I swayed briefly between using page extracts a la Hammond or sarcastic commentary a la Pilgrim and decided that commentary would be far more fun.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anil-dash"&gt;anil-dash&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jason-kottke"&gt;jason-kottke&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mark-pilgrim"&gt;mark-pilgrim&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paul-hammond"&gt;paul-hammond&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="anil-dash"/><category term="blogging"/><category term="jason-kottke"/><category term="mark-pilgrim"/><category term="paul-hammond"/><category term="site-upgrades"/></entry><entry><title>One year of blogging</title><link href="https://simonwillison.net/2003/Jun/12/oneYearOfBlogging/#atom-series" rel="alternate"/><published>2003-06-12T23:59:37+00:00</published><updated>2003-06-12T23:59:37+00:00</updated><id>https://simonwillison.net/2003/Jun/12/oneYearOfBlogging/#atom-series</id><summary type="html">
    &lt;p&gt;Today marks the first anniversary of &lt;a href="/2002/Jun/12/" title="Archive: 12th June 2002"&gt;the start of my blog&lt;/a&gt; (and, by a slightly contrived coincidence, my thousandth blog entry). It's been a fun year. Here are my highlights - if you can't stand lengthy self-congratulatory bullet points, stop reading now.&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;My &lt;a href="/2002/Jun/12/webStandards/" title="WaSP Phase II"&gt;first post&lt;/a&gt; covered the launch of phase two of the Web Standards project. I can remember agonising over a first post for ages, before eventually copping out and going for something dull but unchallenging.&lt;/li&gt;
 &lt;li&gt;A few days later I had my first weblog driven discussion, a &lt;a href="/2002/Jun/13/hixieOnWaSP/" title="Hixie on WaSP"&gt;debate with Hixie&lt;/a&gt; about standards compliance. Unsurprisingly, I lost - but it took me nearly a year to properly understand &lt;a href="/2003/May/06/knifeEdge/" title="Living on a knife edge"&gt;the issues involved&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;The 24th of June saw my &lt;a href="/2002/Jun/24/glastonburyFlash/" title="Glastonbury Flash"&gt;first rant about website usability&lt;/a&gt; with respect to the Glastobury festival site: amazingly the rant drew a response from the creator in October of that year.&lt;/li&gt;
 &lt;li&gt;My &lt;a href="/2002/Jul/11/aMillionPoundsDownTheDrain/" title="A million pounds down the drain"&gt;second rant&lt;/a&gt; was aimed at a far more deserving target: &lt;a href="http://www.connected-earth.com/"&gt;Connected Earth&lt;/a&gt;, whose site was so terrible it ended up as an example of what &lt;em&gt;not&lt;/em&gt; to do in Jeffrey Zeldman's &lt;a href="http://www.zeldman.com/dwws/" title="Designing with web standards"&gt;new book&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;On the 6th of July I &lt;a href="/2002/Jul/06/betterBlogrolling/" title="Better blogrolling"&gt;discovered blo.gs&lt;/a&gt;; I've been using it to power my blogroll ever since.&lt;/li&gt;
 &lt;li&gt;The 10th of July saw my first published &lt;acronym title="Cascading Style Sheets"&gt;CSS&lt;/acronym&gt; experiment: &lt;a href="/2002/Jul/10/numberedCodeListing/"&gt;Numbered code listings&lt;/a&gt;. It's cropped up in a few different places since then.&lt;/li&gt;
 &lt;li&gt;Around the 14th, I &lt;a href="/2002/Jul/14//"&gt;discovered wikis&lt;/a&gt;, setting up the initial MACCAWS wiki and the &lt;a href="http://smarty.incutio.com/"&gt;Smarty wiki&lt;/a&gt;, which is still going strong.&lt;/li&gt;
 &lt;li&gt;Amazon launched their &lt;a href="http://associates.amazon.com/exec/panama/associates/join/developer/resources.html"&gt;Web Service API&lt;/a&gt; on the 17th, and I &lt;a href="/2002/Jul/17/amazonWebServices/" title="Amazon web services"&gt;released&lt;/a&gt; a PHP &lt;a href="http://scripts.incutio.com/amazon/" title="PHP Amazon Search"&gt;sample implementation&lt;/a&gt; on the same day.&lt;/li&gt;
 &lt;li&gt;I &lt;a href="/2002/Aug/12/archivistGoesLive/" title="Archivist goes live"&gt;launched Archivist in August&lt;/a&gt;, the mailing list archive system used for the &lt;a href="http://archivist.incutio.com/viewlist/css-discuss/"&gt;css-discuss archive&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;September 2nd was another active day: I released the &lt;a href="/2002/Sep/02/aNewXMLRPCLibraryForPHP/" title="A new XML-RPC library for PHP"&gt;initial version of my XML-RPC library&lt;/a&gt;, and used it to create the &lt;a href="/2002/Sep/02/pingBackImplemented/" title="Pingback implemented"&gt;first Pingback implementation&lt;/a&gt;, based on an idea by &lt;a href="http://www.keyogenix.org/"&gt;Stuart&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;Towards the end of September I started an ill-fated experiment in &lt;a href="/2002/Sep/23/bloggingMyLectureNotes/"&gt;blogging my lecture notes&lt;/a&gt;. I soon realised that lecture notes work better when confined to a &lt;a href="http://www.bath.ac.uk/~cs1spw/notes/"&gt;separate site&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;My big project for October was the &lt;a href="http://css-discuss.incutio.com/"&gt;css-discuss wiki&lt;/a&gt;, &lt;a href="/2002/Oct/11/cssWikiLive/" title="The css-discuss Wiki is now live"&gt;launched on the 11th&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;Later in October came my experimental &lt;a href="/2002/Oct/28/w3cValidatorWebService/"&gt;XML-RPC interface to the W3C HTML validator&lt;/a&gt;. I keep meaning to return to that and finish it off.&lt;/li&gt;
 &lt;li&gt;In November I &lt;a href="/2002/Nov/28/phpTrainingUpdate/" title="PHP training update"&gt;attempted to run&lt;/a&gt; a PHP training course, and found that it was harder than I thought it would be. The training material I wrote is &lt;a href="http://www.bath.ac.uk/~cs1spw/php-training/" title="PHP Training: Introduction"&gt;still available though&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;I released &lt;a href="/2002/Dec/20/blockquoteCitations/"&gt;Blockquote Citations&lt;/a&gt; in December, my first useful Javascript hack.&lt;/li&gt;
 &lt;li&gt;In January I &lt;a href="/2003/Jan/29/switched/" title="Switched"&gt;switched to Linux&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;The two big hacks for February were my &lt;a href="/2003/Feb/13/bookmarkletFixed/"&gt;Image Drag bookmarklet&lt;/a&gt; and &lt;a href="/2003/Feb/23/safeHtmlChecker/"&gt;Safe HTML Checker&lt;/a&gt; class for my comments system.&lt;/li&gt;
 &lt;li&gt;March was a month for playing with Javascript: I released a &lt;a href="/2003/Mar/18/phpAndJavascriptSpellChecker/"&gt;PHP/Javascript spell checker&lt;/a&gt; and &lt;a href="/2003/Mar/25/getElementsBySelector/"&gt;getElementsBySelector&lt;/a&gt;, my most ambitious piece of javascript to date.&lt;/li&gt;
 &lt;li&gt;Also in March, I &lt;a href="/2003/Mar/04/BCSS/"&gt;announced BCSS&lt;/a&gt; - the new Computer Science society at my University.&lt;/li&gt;
 &lt;li&gt;In April I released a bunch of PHP hacks, the most important being &lt;a href="/2003/Apr/06/httpClient/"&gt;HttpClient&lt;/a&gt;, followed by code for &lt;a href="/2003/Apr/23/conditionalGet/"&gt;supporting conditional GET&lt;/a&gt; and an &lt;a href="/2003/Apr/29/xmlWriter/"&gt;XMLWriter class&lt;/a&gt;.&lt;/li&gt;
 &lt;li&gt;Finally, in May a &lt;a href="/2003/May/04/structuralMarkup/"&gt;rant about structural markup&lt;/a&gt; lead to my &lt;a href="http://simon.incutio.com/categories/csstutorial/"&gt;CSS tutorial series&lt;/a&gt;, of which there is plenty more on the way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've gained a huge amount from the last year, thanks almost entirely to the many excellent bloggers who have inspired me along the way (most of whom are listed on my blogroll). Here's to another exciting year.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/php"&gt;php&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="php"/></entry></feed>