<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: upsert</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/upsert.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2022-12-15T17:58:19+00:00</updated><author><name>Simon Willison</name></author><entry><title>Datasette 1.0a2: Upserts and finely grained permissions</title><link href="https://simonwillison.net/2022/Dec/15/datasette-1a2/#atom-tag" rel="alternate"/><published>2022-12-15T17:58:19+00:00</published><updated>2022-12-15T17:58:19+00:00</updated><id>https://simonwillison.net/2022/Dec/15/datasette-1a2/#atom-tag</id><summary type="html">
    &lt;p&gt;I've released the third alpha of Datasette 1.0. The &lt;a href="https://docs.datasette.io/en/latest/changelog.html#a2-2022-12-14"&gt;1.0a2 release&lt;/a&gt; introduces upsert support to the new JSON API and makes some major improvements to the Datasette permissions system.&lt;/p&gt;
&lt;p&gt;Here are the annotated releases (&lt;a href="https://simonwillison.net/series/datasette-release-notes/"&gt;see previous&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;You can install and try out the alpha using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install datasette==1.0a2
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Upserts for the JSON API&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;New &lt;code&gt;/db/table/-/upsert&lt;/code&gt; API, &lt;a href="https://docs.datasette.io/en/latest/json_api.html#tableupsertview"&gt;documented here&lt;/a&gt;. Upsert is an update-or-insert: existing rows will have specified keys updated, but if no row matches the incoming primary key a brand new row will be inserted instead. (&lt;a href="https://github.com/simonw/datasette/issues/1878"&gt;#1878&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;wrote about the new JSON Write API&lt;/a&gt; when I released the first alpha a couple of weeks ago.&lt;/p&gt;
&lt;p&gt;The API can be used to create and drop tables, and to insert, update and delete rows in those tables.&lt;/p&gt;
&lt;p&gt;The new &lt;code&gt;/db/table/-/upsert&lt;/code&gt; API adds &lt;a href="https://docs.datasette.io/en/latest/json_api.html#tableupsertview"&gt;upsert support&lt;/a&gt; to Datasette.&lt;/p&gt;
&lt;p&gt;An &lt;em&gt;upsert&lt;/em&gt; is a update-or-insert. Consider the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;POST /books/authors/-/upsert
Authorization: Bearer $TOKEN
Content-Type: application/json
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"rows"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ursula K. Le Guin&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1929-10-21&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Terry Pratchett&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1948-04-28&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Neil Gaiman&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1960-11-10&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    }
  ]
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This table has a primary key of &lt;code&gt;id&lt;/code&gt;. The above API call will create three records if the table is empty. But if the table already has records matching any of those primary keys, their &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;born&lt;/code&gt; columns will be updated to match the incoming data.&lt;/p&gt;
&lt;p&gt;Upserts can be a really convenient way of synchronizing data with an external data source. I had a couple of enquiries about them when I published the first alpha, so I decided to make them a key feature for this release.&lt;/p&gt;
&lt;h4&gt;Ignore and replace for the create table API&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;/db/-/create&lt;/code&gt; API for &lt;a href="https://docs.datasette.io/en/latest/json_api.html#tablecreateview"&gt;creating a table&lt;/a&gt; now accepts &lt;code&gt;"ignore": true&lt;/code&gt; and &lt;code&gt;"replace": true&lt;/code&gt; options when called with the &lt;code&gt;"rows"&lt;/code&gt; property that creates a new table based on an example set of rows. This means the API can be called multiple times with different rows, setting rules for what should happen if a primary key collides with an existing row. (&lt;a href="https://github.com/simonw/datasette/issues/1927"&gt;#1927&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/db/-/create&lt;/code&gt; API now requires actor to have &lt;code&gt;insert-row&lt;/code&gt; permission in order to use the &lt;code&gt;"row"&lt;/code&gt; or &lt;code&gt;"rows"&lt;/code&gt; properties. (&lt;a href="https://github.com/simonw/datasette/issues/1937"&gt;#1937&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This feature is a little less obvious, but I think it's going to be really useful.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/db/-/create&lt;/code&gt; API can be used to create a new table. You can feed it &lt;a href="https://docs.datasette.io/en/latest/json_api.html#creating-a-table"&gt;an explicit list of columns&lt;/a&gt;, but you can also give it one or more rows and have it &lt;a href="https://docs.datasette.io/en/latest/json_api.html#creating-a-table-from-example-data"&gt;infer the correct schema&lt;/a&gt; based on those examples.&lt;/p&gt;
&lt;p&gt;Datasette inherits this feature &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data"&gt;from sqlite-utils&lt;/a&gt; - I've been finding this an incredibly productive way to work with SQLite databases for a few years now.&lt;/p&gt;
&lt;p&gt;The real magic of this feature is that you can pipe data into Datasette without even needing to first check that the appropriate table has been created. It's a really fast way of getting from data to a populated database and a working API.&lt;/p&gt;
&lt;p&gt;Prior to 1.0a2 you could call &lt;code&gt;/db/-/create&lt;/code&gt; with &lt;code&gt;"rows"&lt;/code&gt; more than once and it would &lt;em&gt;probably&lt;/em&gt; work... unless you attempted to insert rows with primary keys that were already in use - in which case you would get an error. This limited the utility of the feature.&lt;/p&gt;
&lt;p&gt;Now you can pass &lt;code&gt;"ignore": true&lt;/code&gt; or &lt;code&gt;"replace": true&lt;/code&gt; to the API call, to tell Datasette what to do if it encounters a primary key that already exists in the table.&lt;/p&gt;
&lt;p&gt;Heres an example using the author data from above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;POST /books/-/create
Authorization: Bearer $TOKEN
Content-Type: application/json
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"table"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;authors&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"pk"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;id&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"replace"&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"rows"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ursula K. Le Guin&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1929-10-21&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Terry Pratchett&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1948-04-28&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Neil Gaiman&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"born"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;1960-11-10&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    }
  ]
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create the &lt;code&gt;authors&lt;/code&gt; table if it does not exist and ensure that those three rows exist in it, in their exact state. If a row already exists it will be replaced.&lt;/p&gt;
&lt;p&gt;Note that this is subtly different from an upsert. An upsert will only update the columns that were provided in the incoming data, leaving any other columns unchanged. A replace will replace the entire row.&lt;/p&gt;
&lt;h4 id="finely-grained-permissions"&gt;Finely grained permissions&lt;/h4&gt;
&lt;p&gt;This is the most significant area of improvement in this release.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-register-permissions"&gt;register_permissions(datasette)&lt;/a&gt; plugin hook. Plugins can now register named permissions, which will then be listed in various interfaces that show available permissions. (&lt;a href="https://github.com/simonw/datasette/issues/1940"&gt;#1940&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Prior to this, permissions were just strings - things like &lt;code&gt;"view-instance"&lt;/code&gt; or &lt;code&gt;"view-table"&lt;/code&gt; or &lt;code&gt;"insert-row"&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Plugins can introduce their own permissions - many do already, like &lt;a href="https://datasette.io/plugins/datasette-edit-schema"&gt;datasette-edit-schema&lt;/a&gt; which adds a &lt;code&gt;"edit-schema"&lt;/code&gt; permission.&lt;/p&gt;
&lt;p&gt;In order to start building UIs for managing permissions, I needed Datasette to know what they were!&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-register-permissions"&gt;register_permissions() hook&lt;/a&gt; lets them do exactly that, and Datasette core uses it to register its own default set of permissions too.&lt;/p&gt;
&lt;p&gt;Permissions are registered using the following named tuple:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-v"&gt;Permission&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;collections&lt;/span&gt;.&lt;span class="pl-en"&gt;namedtuple&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"Permission"&lt;/span&gt;, (
        &lt;span class="pl-s"&gt;"name"&lt;/span&gt;, &lt;span class="pl-s"&gt;"abbr"&lt;/span&gt;, &lt;span class="pl-s"&gt;"description"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"takes_database"&lt;/span&gt;, &lt;span class="pl-s"&gt;"takes_resource"&lt;/span&gt;, &lt;span class="pl-s"&gt;"default"&lt;/span&gt;
    )
)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;abbr&lt;/code&gt; is an abbreviation - e.g. &lt;code&gt;insert-row&lt;/code&gt; can be abbreviated to &lt;code&gt;ir&lt;/code&gt;. This is useful for creating things like signed API tokens where space is at a premium.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;takes_database&lt;/code&gt; and &lt;code&gt;takes_resource&lt;/code&gt; are booleans that indicate whether the permission can optionally be applied to a specific database (e.g. &lt;code&gt;execute-sql&lt;/code&gt;) or to a "resource", which is the name I'm now using for something that could be a SQL table, a SQL view or a &lt;a href="https://docs.datasette.io/en/stable/sql_queries.html#canned-queries"&gt;canned query&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;insert-row&lt;/code&gt; permission for example can be granted to the whole of Datasette, or to all tables in a specific database, or to specific tables.&lt;/p&gt;
&lt;p&gt;Finally, the &lt;code&gt;default&lt;/code&gt; value is a boolean that indicates whether the permission should be default-allow (&lt;code&gt;view-instance&lt;/code&gt; for example) or default-deny (&lt;code&gt;create-table&lt;/code&gt; and suchlike).&lt;/p&gt;
&lt;p&gt;This next feature explains why I needed those permission names to be known to Datasette:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;/-/create-token&lt;/code&gt; page can now be used to create API tokens which are restricted to just a subset of actions, including against specific databases or resources. See &lt;a href="https://docs.datasette.io/en/latest/authentication.html#createtokenview"&gt;API Tokens&lt;/a&gt; for details. (&lt;a href="https://github.com/simonw/datasette/issues/1947"&gt;#1947&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Datasette now has finely grained permissions for API tokens!&lt;/p&gt;
&lt;p&gt;This is the feature I always want when I'm working with other APIs: the ability to create a token that can only perform a restricted subset of actions.&lt;/p&gt;
&lt;p&gt;When I'm working with the GitHub API for example I frequently find myself wanting to create a "personal access token" that only has the ability to read issues from a specific repository. It's infuriating how many APIs leave this ability out.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/-/create-token&lt;/code&gt; interface (which you can try out on &lt;a href="https://latest.datasette.io/"&gt;latest.datasette.io&lt;/a&gt; by first &lt;a href="https://latest.datasette.io/login-as-root"&gt;signing in as root&lt;/a&gt; and then &lt;a href="https://latest.datasette.io/-/create-token"&gt;visiting this page&lt;/a&gt;) lets you create an API token that can act on your behalf... and then optionally specify a subset of actions that the token is allowed to perform.&lt;/p&gt;
&lt;p&gt;Thanks to the new permissions registration system, the UI on that page knows which permissions can be applied to which entities within Datasette itself.&lt;/p&gt;
&lt;p&gt;Here's a partial screenshot of the UI:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/create-api-token.jpg" alt="Create an API token. This token will allow API access with the same abilities as your current user, root. Form lets you select Expires after X hours, 5 - and there's an expanded menu item for Restrict actions that can be performed using this token. Below that are lists of checkboxes - the first is for All databases and tables, with a list of every permission known to Datasette. Next is All tables in fixtures, which lists just permissions that make sense for a specific database. Finally is Specific tables, which lists fixtures: primary_key_multiple_columns with a much shorter list of permissions that can apply just to tables." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Select a subset of permissions, hit "Create token" and the result will be an access token you can copy and paste into another application, or use to call Datasette with &lt;code&gt;curl&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here's an example token I created that grants &lt;code&gt;view-instance&lt;/code&gt; permission against all of Datasette, and &lt;code&gt;view-database&lt;/code&gt; and &lt;code&gt;execute-sql&lt;/code&gt; permission against the &lt;code&gt;ephemeral&lt;/code&gt; database, which in that demo is hidden from anonymous Datasette users.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;dstok_eyJhIjoicm9vdCIsInQiOjE2NzEwODUzMDIsIl9yIjp7ImEiOlsidmkiXSwiZCI6eyJlcGhlbWVyYWwiOlsidmQiLCJlcyJdfX19.1uw0xyx8UND_Y_vTVg5kEF6m3GU&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Here's a screenshot of the screen I saw when I created it:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/token-created.jpg" alt="Create an API token. Your API token: a copy-to-clipboard box with a long token in it. The token details area is expanded to show JSON that represents the permissions granted with that token." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've expanded the "token details" section to show the JSON that is bundled inside the signed token. The &lt;code&gt;"_r"&lt;/code&gt; block records the specific permissions granted by the token:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"a": ["vi"]&lt;/code&gt; indicates that the &lt;code&gt;view-instance&lt;/code&gt; permission is granted against all of Datasette.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"d": {"ephemeral": ["vd", "es"]}&lt;/code&gt; indicates that the &lt;code&gt;view-database&lt;/code&gt; and &lt;code&gt;execute-sql&lt;/code&gt; permissions are granted against the &lt;code&gt;ephemeral&lt;/code&gt; database.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The token also contains the ID of the user who created it (&lt;code&gt;"a": "root"&lt;/code&gt;) and the time that the token was created (&lt;code&gt;"t": 1671085302&lt;/code&gt;). If the token was set to expire that expiry duration would be baked in here as well.&lt;/p&gt;
&lt;p&gt;You can see the effect this has on the command-line using &lt;code&gt;curl&lt;/code&gt; like so:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;curl 'https://latest.datasette.io/ephemeral.json?sql=select+3+*+5&amp;amp;_shape=array'&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This will return a forbidden error. But if you add the signed token:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;curl 'https://latest.datasette.io/ephemeral.json?sql=select+3+*+5&amp;amp;_shape=array' -H 'Authorization: Bearer dstok_eyJhIjoicm9vdCIsInQiOjE2NzEwODUzMDIsIl9yIjp7ImEiOlsidmkiXSwiZCI6eyJlcGhlbWVyYWwiOlsidmQiLCJlcyJdfX19.1uw0xyx8UND_Y_vTVg5kEF6m3GU'&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You'll get back a JSON response:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;[{&lt;span class="pl-ent"&gt;"3 * 5"&lt;/span&gt;: &lt;span class="pl-c1"&gt;15&lt;/span&gt;}]&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;The datasette create-token CLI tool&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Likewise, the &lt;code&gt;datasette create-token&lt;/code&gt; CLI command can now create tokens with &lt;a href="https://docs.datasette.io/en/latest/authentication.html#authentication-cli-create-token-restrict"&gt;a subset of permissions&lt;/a&gt;. (&lt;a href="https://github.com/simonw/datasette/issues/1855"&gt;#1855&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;New &lt;a href="https://docs.datasette.io/en/latest/internals.html#create-token-actor-id-expires-after-none-restrict-all-none-restrict-database-none-restrict-resource-none"&gt;datasette.create_token()&lt;/a&gt; API method for programmatically creating signed API tokens. (&lt;a href="https://github.com/simonw/datasette/issues/1951"&gt;#1951&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The other way you can create Datasette tokens is on the command-line, using &lt;a href="https://docs.datasette.io/en/latest/authentication.html#datasette-create-token"&gt;the datasette create-token command&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That's been upgraded to support finely grained permissions too.&lt;/p&gt;
&lt;p&gt;Here's how you'd create a token for the same set of permissions as my ephemeral example above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette create-token root \
  --all view-instance \
  --database ephemeral view-database \
  --database ephemeral execute-sql \
  --secret MY_DATASETTE_SECRET
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to sign the token you need to pass in the &lt;code&gt;--secret&lt;/code&gt; used by the server - although it will pick that up from the &lt;code&gt;DATASETTE_SECRET&lt;/code&gt; environment variable if it's available.&lt;/p&gt;
&lt;p&gt;This has the interesting side-effect that you can use that command to create valid tokens for other Datasette instances, provided you know the secret they're using. I think this ability will be really useful for people like myself who run lots of different Datasette instances on stateless hosting platforms such as Vercel and Google Cloud Run.&lt;/p&gt;
&lt;h4&gt;Configuring permissions in metadata.json/yaml&lt;/h4&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Arbitrary permissions can now be configured at the instance, database and resource (table, SQL view or canned query) level in Datasette's &lt;a href="https://docs.datasette.io/en/latest/metadata.html#metadata"&gt;Metadata&lt;/a&gt; JSON and YAML files. The new &lt;code&gt;"permissions"&lt;/code&gt; key can be used to specify which actors should have which permissions. See &lt;a href="https://docs.datasette.io/en/latest/authentication.html#authentication-permissions-other"&gt;Other permissions in metadata&lt;/a&gt; for details. (&lt;a href="https://github.com/simonw/datasette/issues/1636"&gt;#1636&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Datasette has long had the ability to set permissions for viewing databases and tables using blocks of configuration in the increasingly poorly named &lt;a href="https://docs.datasette.io/en/stable/metadata.html"&gt;metadata.json/yaml files&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As I've built new plugins that introduce new permissions, I've found myself wishing for an easier way to say "user X is allowed to perform action Y" for arbitrary other permissions.&lt;/p&gt;
&lt;p&gt;The new &lt;code&gt;"permissions"&lt;/code&gt; key in metadata.json/yaml files allows you to do that.&lt;/p&gt;
&lt;p&gt;Here's how to specify that the user with &lt;code&gt;"id": "simon"&lt;/code&gt; is allowed to use the API to create tables and insert data into the &lt;code&gt;docs&lt;/code&gt; database:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;databases&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;docs&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;permissions&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;create-table&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;id&lt;/span&gt;: &lt;span class="pl-s"&gt;simon&lt;/span&gt;
      &lt;span class="pl-ent"&gt;insert-row&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;id&lt;/span&gt;: &lt;span class="pl-s"&gt;simon&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's a demo you can run on your own machine. Save the above to &lt;code&gt;permissions.yaml&lt;/code&gt; and run the following in one terminal window:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette docs.db --create --secret sekrit -m permissions.yaml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will create the &lt;code&gt;docs.db&lt;/code&gt; database if it doesn't already exist, and start Datasette with the &lt;code&gt;permissions.yaml&lt;/code&gt; metadata file.&lt;/p&gt;
&lt;p&gt;It sets &lt;code&gt;--secret&lt;/code&gt; to a known value (you should always use a random secure secret in production) so we can easily use it with &lt;code&gt;create-token&lt;/code&gt; in the next step:&lt;/p&gt;
&lt;p&gt;Then in another terminal window run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export TOKEN=$(
  datasette create-token simon \
  --secret sekrit
)
curl -XPOST http://localhost:8001/docs/-/create \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "table": "demo",
    "row": {"id": 1, "name": "Simon"},
    "pk": "id"
  }'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first line creates a token that can act on behalf of the &lt;code&gt;simon&lt;/code&gt; actor. The second &lt;code&gt;curl&lt;/code&gt; line then uses that token to create a table using the &lt;code&gt;/-/create&lt;/code&gt; endpoint.&lt;/p&gt;
&lt;p&gt;Run this, then visit &lt;a href="http://localhost:8001/docs/demo"&gt;http://localhost:8001/docs/demo&lt;/a&gt; to see the newly created table.&lt;/p&gt;
&lt;h4&gt;What's next?&lt;/h4&gt;
&lt;p&gt;With the 1.0a2 release I'm reasonably confident that Datasette 1.0 is new-feature-complete. There's still a &lt;strong&gt;lot&lt;/strong&gt; of work to do before the final release, but the remaining work is far more intimidating: I need to make clean backwards-incompatible breakages to a whole host of existing features in order to ship a 1.0 that I can keep stable for as long as possible.&lt;/p&gt;
&lt;p&gt;First up: I'm going to &lt;a href="https://github.com/simonw/datasette/issues/1914"&gt;redesign Datasette's default API output&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The current default JSON output for a simple table looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"database"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;fixtures&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"table"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;facet_cities&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"is_view"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"human_description_en"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;sorted by name&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"rows"&lt;/span&gt;: [
    [
      &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Detroit&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    ],
    [
      &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Los Angeles&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    ],
    [
      &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Memnonia&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    ],
    [
      &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
      &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;San Francisco&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    ]
  ],
  &lt;span class="pl-ent"&gt;"truncated"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"filtered_table_rows_count"&lt;/span&gt;: &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"expanded_columns"&lt;/span&gt;: [],
  &lt;span class="pl-ent"&gt;"expandable_columns"&lt;/span&gt;: [],
  &lt;span class="pl-ent"&gt;"columns"&lt;/span&gt;: [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;id&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;name&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  &lt;span class="pl-ent"&gt;"primary_keys"&lt;/span&gt;: [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;id&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  &lt;span class="pl-ent"&gt;"units"&lt;/span&gt;: {},
  &lt;span class="pl-ent"&gt;"query"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"sql"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;select id, name from facet_cities order by name limit 101&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"params"&lt;/span&gt;: {}
  },
  &lt;span class="pl-ent"&gt;"facet_results"&lt;/span&gt;: {},
  &lt;span class="pl-ent"&gt;"suggested_facets"&lt;/span&gt;: [],
  &lt;span class="pl-ent"&gt;"next"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"next_url"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"private"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"allow_execute_sql"&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"query_ms"&lt;/span&gt;: &lt;span class="pl-c1"&gt;6.718471999647591&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"source"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;tests/fixtures.py&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"source_url"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://github.com/simonw/datasette/blob/main/tests/fixtures.py&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"license"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Apache License 2.0&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"license_url"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://github.com/simonw/datasette/blob/main/LICENSE&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In addition to being &lt;em&gt;really&lt;/em&gt; verbose, you'll note that the rows themselves are represented like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;[
  [
    &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Detroit&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  [
    &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Los Angeles&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  [
    &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Memnonia&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  [
    &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;San Francisco&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ]
]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I originally designed it this way because I thought saving on repeating the column names for every row would be more efficient.&lt;/p&gt;
&lt;p&gt;In practice, every single time I've used Datasette's API I've found myself using the &lt;code&gt;?_shape=array&lt;/code&gt; parameter, which outputs this format instead:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;[
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Detroit&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Los Angeles&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Memnonia&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;San Francisco&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  }
]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It's just so much more convenient to work with!&lt;/p&gt;
&lt;p&gt;So the new default format will look like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"rows"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;3&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Detroit&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Los Angeles&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;4&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Memnonia&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    },
    {
      &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;1&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;San Francisco&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    }
  ]
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;rows&lt;/code&gt; key is there so I can add extra keys to the output, based on additional &lt;code&gt;?_extra=&lt;/code&gt; request parameters. You'll be able to get back everything you can get in the current full-fat table API, but you'll have to ask for it.&lt;/p&gt;
&lt;p&gt;There are a ton of other changes I want to make to Datasette as a whole - things like renaming &lt;code&gt;metadata.yaml&lt;/code&gt; to &lt;code&gt;config.yaml&lt;/code&gt; to reflect that it's gone way beyond its origins as a way of attaching metadata to a database.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/datasette/milestone/7"&gt;1.0 milestone&lt;/a&gt; is a dumping ground for many of these ideas. It's not a canonical reference though: I'd be very surprised if everything currently in that milestone makes it into the final 1.0 release.&lt;/p&gt;
&lt;p&gt;As I get closer to 1.0 though I'll be refining that milestone so it should get more accurate over time.&lt;/p&gt;
&lt;p&gt;Once again: &lt;strong&gt;now is the time&lt;/strong&gt; to be providing feedback on this stuff! The &lt;a href="https://datasette.io/discord"&gt;Datasette Discord&lt;/a&gt; is a particularly valuable way for me to get feedback on the work so far, and my plans for the future.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/api"&gt;api&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/permissions"&gt;permissions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/upsert"&gt;upsert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="api"/><category term="permissions"/><category term="projects"/><category term="upsert"/><category term="datasette"/><category term="annotated-release-notes"/></entry><entry><title>sqlite-utils 2.0: real upserts</title><link href="https://simonwillison.net/2019/Dec/30/sqlite-utils-2/#atom-tag" rel="alternate"/><published>2019-12-30T06:48:40+00:00</published><updated>2019-12-30T06:48:40+00:00</updated><id>https://simonwillison.net/2019/Dec/30/sqlite-utils-2/#atom-tag</id><summary type="html">
    &lt;p&gt;I just &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/2.0"&gt;released version 2.0&lt;/a&gt; of my &lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt; library/CLI tool to &lt;a href="https://pypi.org/project/sqlite-utils/"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's 2.0 because this is my first backwards compatibility breaking version of the library. I was hoping to avoid this for as long as possible: given semantic versioning, &lt;a href="https://twitter.com/simonw/status/1192130232498569216"&gt;my opinion&lt;/a&gt; is that the highest quality libraries are the 1.x versions with the highest sub-version. Version 1.253 means a library managed to ship 253 new features without once breaking compatibility!&lt;/p&gt;

&lt;p&gt;For sqlite-utils I needed the clean break, because I made a big mistake in my definition of "upsert".&lt;/p&gt;

&lt;p&gt;An upsert is a database operation which says "Update this record with these new values. If the record does not exist, create it."&lt;/p&gt;

&lt;p&gt;This is actually a &lt;a href="https://www.sqlite.org/lang_UPSERT.html"&gt;documented feature of SQLite&lt;/a&gt; version 3.24.0 and higher. I've seen the term used for other databases &lt;a href="https://wiki.postgresql.org/wiki/UPSERT"&gt;such as PostgreSQL&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;The problem is: when I first implemented the &lt;code&gt;.upsert()&lt;/code&gt; method for sqlite-utils I incorrectly assumed that upsert was an alias for &lt;code&gt;INSERT OR REPLACE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So in sqlite-utils 1.x, the following code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db["dogs"].upsert({
    "id": 1,
    "name": "Cleo",
    "age": 4
}, pk="id")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Executes then following SQL:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;INSERT OR REPLACE INTO dogs
    ("id", "name", "age")
VALUES
    (1, "Cleo", 4);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It turns out this isn't how upsert should be expected to work. This query will entirely replace any existing record. But... an upsert should update the record in place. Crucially, it should leave any columns not referenced in the upsert alone. My implementation over-wrote the entire row, deleting any missing values.&lt;/p&gt;

&lt;p&gt;So if I run this Python code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db["dogs"].upsert({
    "id": 1,
    "age": 5
}, pk="id")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The existing record for id=1 should have its age updated, but the name should stay the same.&lt;/p&gt;

&lt;p&gt;I thought hard about whether I could keep the existing, incorrect &lt;code&gt;.upsert()&lt;/code&gt; behavior and introduce a different method name or argument that did things right - maintaining backwards compatibility at the price of a less intuitive API - but this felt like too big a discrepancy. And this is what major version number changes are for! You can follow my deliberations about this in &lt;a href="https://github.com/simonw/sqlite-utils/issues/66"&gt;this issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While this is technically a breaking change, when I look at the code I have already written that uses sqlite-utils it turns out all of my uses of &lt;code&gt;.upsert()&lt;/code&gt; are compatible with the new implementation anyway - they all send the full record object, so they will over-ride all of the fields no matter if they are running against 1.x or 2.x.&lt;/p&gt;

&lt;p&gt;If I'm lucky, other users of sqlite-utils will be unaffected by this behavior change either. So hopefully it will be a painless upgrade.&lt;/p&gt;

&lt;p&gt;If you do need to keep the old behavior, I've made it available as a &lt;code&gt;replace=True&lt;/code&gt; argument to &lt;code&gt;.insert()&lt;/code&gt; and &lt;code&gt;.insert_all()&lt;/code&gt;, like so:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db["dogs"].insert({
    "id": 1,
    "name": "Cleo",
    "age": 4
}, pk="id", replace=True)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I did think about introducing a new piece of terminology for this - &lt;code&gt;.inplace()&lt;/code&gt;, for insert-or-replace - but after getting in trouble redefining upsert by mistake I decided inventing new language wasn't worth the risk.&lt;/p&gt;

&lt;h3&gt;Using this for Niche Museums&lt;/h3&gt;

&lt;p&gt;The project that made me realize I needed this upsert change was &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Niche Museums works by building a SQLite database file from a &lt;a href="https://github.com/simonw/museums/blob/master/museums.yaml"&gt;museums.yaml&lt;/a&gt; file. The YAML file contains the data for all 81 museums on the site. To add or edit a listing, I edit that file - sometimes in Visual Code on my Mac, sometimes via the GitHub web interface and often by using the &lt;a href="https://workingcopyapp.com/"&gt;Working Copy&lt;/a&gt; git client on my iPhone.&lt;/p&gt;

&lt;p&gt;Creating the database is done using my &lt;a href="https://github.com/simonw/yaml-to-sqlite"&gt;yaml-to-sqlite&lt;/a&gt; tool:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ yaml-to-sqlite browse.db museums museums.yaml --pk=id&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The build process then runs &lt;a href="https://github.com/simonw/museums/blob/e91aa82ac4ea8d01439d0759d5d8ab640a1a6761/.circleci/config.yml#L13-L19"&gt;a few extra scripts&lt;/a&gt; to annotate the records with extra data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/simonw/museums/blob/e91aa82ac4ea8d01439d0759d5d8ab640a1a6761/annotate_timestamps.py"&gt;annotate_timestamps.py&lt;/a&gt; iterates through the full git history and uses it to figure out when each record was first created or last updated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/simonw/museums/blob/e91aa82ac4ea8d01439d0759d5d8ab640a1a6761/annotate_nominatum.py"&gt;annotate_nominatim.py&lt;/a&gt; uses the OpenStreetMap &lt;a href="https://nominatim.org/release-docs/develop/api/Overview/"&gt;Nominatim API&lt;/a&gt; to reverse geocode the latitude and longitude of each museum, adding &lt;a href="https://www.niche-museums.com/browse/museums"&gt;extra columns&lt;/a&gt; for the city, state, country and all kinds of other interesting geographical details.&lt;/p&gt;

&lt;p&gt;I'm not doing much with this OSM data yet - mainly using it to include the OSM country and city name in the full-text search index - but I hope to use it for faceting and other navigation improvements in the future (and maybe location-based breadcrumbs).&lt;/p&gt;

&lt;p&gt;Here's the problem though: every time I deploy the site, I run those scripts. Nominatim asks you not to hit their API more than once a second, so I have a &lt;code&gt;sleep(1)&lt;/code&gt; in the code. This means the more museums I add, the longer that build step takes to reverse geocode them all.&lt;/p&gt;

&lt;p&gt;It would be great to avoid geocoding museums that have already been geocoded. Properly implemented &lt;code&gt;upsert&lt;/code&gt; let's me do exactly that.&lt;/p&gt;

&lt;p&gt;I haven't made these changes yet, but my plan is to update the build script so that instead of starting from a blank database each time, it downloads the previously built version of the database and runs an upsert against it with the data from the YAML. This will overwrite all of the data that I might have updated in that file, but leave the &lt;code&gt;osm_*&lt;/code&gt; columns that were created by the &lt;code&gt;annotate_nominatim.py&lt;/code&gt; script in place.&lt;/p&gt;

&lt;p&gt;That script can then just run against records for which &lt;code&gt;osm_country&lt;/code&gt; is null. As such, it should only geocode newly created records.&lt;/p&gt;

&lt;h3&gt;Other improvements this week&lt;/h3&gt;

&lt;p&gt;I added a few new features to &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; this week.&lt;/p&gt;

&lt;p&gt;First, I &lt;a href="https://github.com/simonw/museums/commit/be1ed749feb8e2e3e9d8c32eddda4ab226f2c296"&gt;added simple photo galleries&lt;/a&gt; to the listings pages, using CSS columns. This means I can add more than one photo to a museum! I've done this to a few now, including going back to the listing for &lt;a href="https://www.niche-museums.com/browse/museums/27"&gt;the Ilfracombe Museum&lt;/a&gt; to add a whole bunch of extra photos.&lt;/p&gt;

&lt;p&gt;I added photos and press links to &lt;a href="https://www.niche-museums.com/browse/feed.atom?_feed_title=Niche+Museums"&gt;the 
Atom feed&lt;/a&gt; for the site. Since the Atom feed is defined using a custom SQL query and &lt;a href="https://github.com/simonw/datasette-atom"&gt;datasette-atom&lt;/a&gt;, achieving this involved building &lt;a href="https://github.com/simonw/museums/issues/16"&gt;a truly diabolical SQL query&lt;/a&gt; (making extensive use of the SQLite &lt;a href="https://www.sqlite.org/json1.html"&gt;json1 extension&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;I &lt;a href="https://github.com/simonw/museums/commit/cd3e70a2814d03e4f1d518da55ee588bc7a6ef72"&gt;fixed an issue&lt;/a&gt; with the search engine where unexpected punctuation could cause an FTS syntax error in SQLite. I've extracted that into a fix for &lt;a href="https://github.com/simonw/datasette/issues/651"&gt;a Datasette issue&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;The site search engine also now attempts to geocode the user's query and offers a link to the corresponding lat/lon radius search for that point. Try &lt;a href="https://www.niche-museums.com/browse/search?q=new+orleans"&gt;a search for new orleans&lt;/a&gt; to see what that looks like - the implementation (entirely client-side, calling the Nominatim API) &lt;a href="https://github.com/simonw/museums/commit/dae47b94029e0f47b01c0aa48374ee5f325c08f5"&gt;is here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here are the museums I added this week:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/75"&gt;Jukebox London&lt;/a&gt; in London&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/76"&gt;Stonehenge&lt;/a&gt; in Wiltshire&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/77"&gt;Yolo Causeway Bat Colony&lt;/a&gt; in Yolo County&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/78"&gt;Horniman Museum and Gardens&lt;/a&gt; in London&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/79"&gt;Southern Food and Beverage Museum&lt;/a&gt; in New Orleans&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/80"&gt;Oakley Court&lt;/a&gt; in Windsor and Maidenhead&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/81"&gt;Seaton Tramway&lt;/a&gt; in East Devon&lt;/li&gt;&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/museums"&gt;museums&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/upsert"&gt;upsert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="museums"/><category term="projects"/><category term="sql"/><category term="sqlite"/><category term="upsert"/><category term="weeknotes"/><category term="sqlite-utils"/></entry><entry><title>Using Mongo for Real-Time Analytics</title><link href="https://simonwillison.net/2009/Jun/30/analytics/#atom-tag" rel="alternate"/><published>2009-06-30T19:28:43+00:00</published><updated>2009-06-30T19:28:43+00:00</updated><id>https://simonwillison.net/2009/Jun/30/analytics/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.10gen.com/blog/2009/3/using-mongo-for-real-time-analytics"&gt;Using Mongo for Real-Time Analytics&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
MongoDB supports an “upsert” query, which when combined with the $inc operator can cause counter fields to be incremented if they exist and created otherwise. This makes it a great fit for real-time analytics applications (one increment per page view), something that regular relational databases aren’t particularly good at.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/counters"&gt;counters&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/increment"&gt;increment&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mongodb"&gt;mongodb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/upsert"&gt;upsert&lt;/a&gt;&lt;/p&gt;



</summary><category term="counters"/><category term="databases"/><category term="increment"/><category term="mongodb"/><category term="upsert"/></entry></feed>