<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: enrichments</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/enrichments.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-04-01T22:00:34+00:00</updated><author><name>Simon Willison</name></author><entry><title>datasette-enrichments-llm 0.2a1</title><link href="https://simonwillison.net/2026/Apr/1/datasette-enrichments-llm-2/#atom-tag" rel="alternate"/><published>2026-04-01T22:00:34+00:00</published><updated>2026-04-01T22:00:34+00:00</updated><id>https://simonwillison.net/2026/Apr/1/datasette-enrichments-llm-2/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/datasette/datasette-enrichments-llm/releases/tag/0.2a1"&gt;datasette-enrichments-llm 0.2a1&lt;/a&gt;&lt;/p&gt;
    &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;actor&lt;/code&gt; who triggers an enrichment is now passed to the &lt;code&gt;llm.mode(... actor=actor)&lt;/code&gt; method. &lt;a href="https://github.com/datasette/datasette-enrichments-llm/issues/3"&gt;#3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="datasette"/><category term="llm"/><category term="enrichments"/></entry><entry><title>Six short video demos of LLM and Datasette projects</title><link href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-tag" rel="alternate"/><published>2025-01-22T02:09:54+00:00</published><updated>2025-01-22T02:09:54+00:00</updated><id>https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-tag</id><summary type="html">
    &lt;p&gt;Last Friday Alex Garcia and I hosted a new kind of Datasette Public Office Hours session, inviting members of the Datasette community to share short demos of projects that they had built. The session lasted just over an hour and featured demos from six different people.&lt;/p&gt;
&lt;p&gt;We broadcast live on YouTube, but I've now edited the session into separate videos. These are listed below, along with project summaries and show notes for each presentation.&lt;/p&gt;
&lt;p&gt;You can also watch all six videos in &lt;a href="https://www.youtube.com/playlist?list=PLSocEbMlNGotyeonEbgFP1_uf9gk1z7zm"&gt;this YouTube playlist&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="9pEP6auZmvg"
  title="llm-logs-feedback by Matthias Lübken"
  playlabel="Play: llm-logs-feedback by Matthias Lübken"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/luebken/llm-logs-feedback"&gt;llm-logs-feedback&lt;/a&gt; is a plugin by Matthias Lübken for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which adds the ability to store feedback on prompt responses, using new &lt;code&gt;llm feedback+1&lt;/code&gt; and &lt;code&gt;llm feedback-1&lt;/code&gt; commands. These also accept an optional comment, and the feedback is stored in a &lt;code&gt;feedback&lt;/code&gt; table in SQLite.&lt;/p&gt;
&lt;p&gt;You can install the plugin from PyPI like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-logs-feedback&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The full plugin implementation is in the &lt;a href="https://github.com/luebken/llm-logs-feedback/blob/main/llm_logs_feedback.py"&gt;llm_logs_feedback.py file&lt;/a&gt; in Matthias' GitHub repository.&lt;/p&gt;
&lt;h4 id="llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="Th5WOyjuRdk"
  title="llm-model-gateway and llm-consortium by Thomas Hughes"
  playlabel="Play: llm-model-gateway and llm-consortium by Thomas Hughes"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Tommy Hughes has been developing a whole array of LLM plugins, including his &lt;a href="https://github.com/irthomasthomas/llm-plugin-generator"&gt;llm-plugin-generator&lt;/a&gt; which is a plugin that can help write new plugins!&lt;/p&gt;
&lt;p&gt;He started by demonstrating &lt;a href="https://github.com/irthomasthomas/llm-model-gateway"&gt;llm-model-gateway&lt;/a&gt;, a plugin that adds a &lt;code&gt;llm serve&lt;/code&gt; command which starts a localhost server running an imitation of the OpenAI API against LLM models:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-model-gateway
llm serve&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Tommy's main demo was of &lt;a href="https://github.com/irthomasthomas/llm-consortium"&gt;llm-consortium&lt;/a&gt;, a plugin which can use a consortium of collaborating LLM models working together to solve problems.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-consortium
llm consortium &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Best way to prepare avocado toast&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --models claude-3.5-sonnet \
  --models gpt-4o \
  --arbiter gegemini-1.5-pro-latest \
  --confidence-threshold 0.8 \
  --max-iterations 3 \
  --output results.json&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I ran this and got back:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The best way to prepare avocado toast involves selecting quality ingredients and careful preparation. Start with a thick slice of good quality bread, such as sourdough or whole grain. Toast it to your preferred level of crispness.  While the bread is toasting, prepare a ripe avocado. Cut it in half, remove the pit, and scoop the flesh into a bowl. Mash the avocado with a fork, leaving some chunks for texture. Season with salt, black pepper, and a squeeze of fresh lemon juice to prevent browning.  Optional additions include a touch of red pepper flakes.&lt;/p&gt;
&lt;p&gt;Once the toast is ready, let it cool slightly before spreading the seasoned avocado evenly over it. Consider lightly rubbing a cut garlic clove on the warm toast for an extra layer of flavor (optional).&lt;/p&gt;
&lt;p&gt;Enhance your avocado toast with your favorite toppings. Popular choices include: everything bagel seasoning, sliced tomatoes, radishes, a poached or fried egg (for added protein), microgreens, smoked salmon (for a more savory option), feta cheese crumbles, or a drizzle of hot sauce.  For a finishing touch, drizzle with high-quality olive oil and sprinkle with sesame or chia seeds for added texture.&lt;/p&gt;
&lt;p&gt;Consider dietary needs when choosing toppings. For example, those following a low-carb diet might skip the tomatoes and opt for more protein and healthy fats.&lt;/p&gt;
&lt;p&gt;Finally, pay attention to presentation. Arrange the toppings neatly for a visually appealing toast. Serve immediately to enjoy the fresh flavors and crispy toast.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But the really interesting thing is the full log of the prompts and responses sent to Claude 3.5 Sonnet and GPT-4o, followed by a combined prompt to Gemini 1.5 Pro to have it arbitrate between the two responses. You can see &lt;a href="https://gist.github.com/simonw/425f42f8ec1a963ae13c5b57ba580f56"&gt;the full logged prompts and responses here&lt;/a&gt;. Here's that &lt;a href="https://gist.github.com/simonw/e82370f0e5986a15823c82200c1b77f8"&gt;results.json&lt;/a&gt; output file.&lt;/p&gt;
&lt;h4 id="congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="CDilLbFP1DY"
  title="Congressional Travel Explorer with Derek Willis"
  playlabel="Play: Congressional Travel Explorer with Derek Willis"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Derek Willis teaches data journalism at the Philip Merrill College of Journalism at the University of Maryland. For a recent project his students built a &lt;a href="https://cnsmaryland.org/interactives/fall-2024/congressional_travel_explorer/index.html"&gt;Congressional Travel Explorer&lt;/a&gt; interactive using Datasette, AWS Extract and Claude 3.5 Sonnet to analyze travel disclosures from members of Congress.&lt;/p&gt;
&lt;p&gt;One of the outcomes from the project was this story in Politico: &lt;a href="https://www.politico.com/news/2024/10/30/israel-aipac-funded-congress-travel-00185167"&gt;Members of Congress have taken hundreds of AIPAC-funded trips to Israel in the past decade&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="lXwfEYXjsak"
  title="llm-questioncache with Nat Knight"
  playlabel="Play: llm-questioncache with Nat Knight"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/nathanielknight/llm-questioncache"&gt;llm-questioncache&lt;/a&gt; builds on top of &lt;a href="https://llm.datasette.io/"&gt;https://llm.datasette.io/&lt;/a&gt; to cache answers to questions, using embeddings to return similar answers if they have already been stored.&lt;/p&gt;
&lt;p&gt;Using embeddings for de-duplication of similar questions is an interesting way to apply LLM's &lt;a href="https://llm.datasette.io/en/stable/embeddings/python-api.html"&gt;embeddings feature&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="GumAgaYpda0"
  title="Improvements to Datasette Enrichments with Simon Willison"
  playlabel="Play: Improvements to Datasette Enrichments with Simon Willison"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;I've demonstrated improvements I've been making to Datasette's &lt;a href="https://enrichments.datasette.io/"&gt;Enrichments&lt;/a&gt; system over the past few weeks.&lt;/p&gt;
&lt;p&gt;Enrichments allow you to apply an operation - such as geocoding, a QuickJS JavaScript transformation or an LLM prompt - against selected rows within a table.&lt;/p&gt;
&lt;p&gt;The latest release of &lt;a href="https://github.com/datasette/datasette-enrichments/releases/tag/0.5"&gt;datasette-enrichments&lt;/a&gt; adds visible progress bars and the ability to pause, resume and cancel an enrichment job that is running against a table.&lt;/p&gt;
&lt;h4 id="datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="i0u4N6g15Zg"
  title="Datasette comments, pins and write UI with Alex Garcia"
  playlabel="Play: Datasette comments, pins and write UI with Alex Garcia"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;We finished with three plugin demos from Alex, showcasing collaborative features we have been developing for &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-write-ui"&gt;datasette-write-ui&lt;/a&gt; provides tools for editing and adding data to Datasette tables. A new feature here is the ability to shift-click a row to open the editing interface for that row.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-pins"&gt;datasette-pins&lt;/a&gt; allows users to pin tables and databases to their Datasette home page, making them easier to find.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-comments"&gt;datasette-comments&lt;/a&gt; adds a commenting interface to Datasette, allowing users to leave comments on individual rows in a table.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/community"&gt;community&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/demos"&gt;demos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="community"/><category term="data-journalism"/><category term="demos"/><category term="derek-willis"/><category term="youtube"/><category term="ai"/><category term="datasette"/><category term="alex-garcia"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="enrichments"/><category term="datasette-public-office-hours"/></entry><entry><title>Datasette Public Office Hours Application</title><link href="https://simonwillison.net/2025/Jan/16/datasette-public-office-hours/#atom-tag" rel="alternate"/><published>2025-01-16T18:38:31+00:00</published><updated>2025-01-16T18:38:31+00:00</updated><id>https://simonwillison.net/2025/Jan/16/datasette-public-office-hours/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSf4EGqdTWUXII7gBxdvsUbIVR-vECjfssrVni-R3Bzc8ns-bA/viewform"&gt;Datasette Public Office Hours Application&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
We are running another Datasette Public Office Hours event &lt;a href="https://discord.gg/38DnWBvQ?event=1328432594295066664"&gt;on Discord&lt;/a&gt; tomorrow (Friday 17th January 2025) at 2pm Pacific / 5pm Eastern / 10pm GMT / &lt;a href="https://www.timeanddate.com/worldclock/converter.html?iso=20250117T220000&amp;amp;p1=224&amp;amp;p2=75&amp;amp;p3=2485&amp;amp;p4=179&amp;amp;p5=136"&gt;more timezones here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The theme this time around is &lt;strong&gt;lightning talks&lt;/strong&gt; - we're looking for 5-8 minute long talks from community members about projects they are working on or things they have built using the Datasette family of tools (which includes &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; as well).&lt;/p&gt;
&lt;p&gt;If you have a demo you'd like to share, please &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSf4EGqdTWUXII7gBxdvsUbIVR-vECjfssrVni-R3Bzc8ns-bA/viewform"&gt;let us know&lt;/a&gt; via this form.&lt;/p&gt;
&lt;p&gt;I'm going to be demonstrating my recent work on the next generation of &lt;a href="https://enrichments.datasette.io/"&gt;Datasette Enrichments&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/lightning-talks"&gt;lightning-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/discord"&gt;discord&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;&lt;/p&gt;



</summary><category term="lightning-talks"/><category term="datasette"/><category term="discord"/><category term="enrichments"/><category term="datasette-public-office-hours"/></entry><entry><title>datasette-enrichments-llm</title><link href="https://simonwillison.net/2024/Dec/5/datasette-enrichments-llm/#atom-tag" rel="alternate"/><published>2024-12-05T23:46:48+00:00</published><updated>2024-12-05T23:46:48+00:00</updated><id>https://simonwillison.net/2024/Dec/5/datasette-enrichments-llm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-llm"&gt;datasette-enrichments-llm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Today's new alpha release is &lt;strong&gt;datasette-enrichments-llm&lt;/strong&gt;, a plugin for Datasette 1.0a+ that provides an &lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;enrichment&lt;/a&gt; that lets you run prompts against data from one or more column and store the result in another column.&lt;/p&gt;
&lt;p&gt;So far it's a light re-implementation of the existing &lt;a href="https://github.com/datasette/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt; plugin, now using the new &lt;a href="https://llm.datasette.io/en/stable/python-api.html#python-api-listing-models"&gt;llm.get_async_models()&lt;/a&gt; method to allow users to select any async-enabled model that has been registered by a plugin - so currently any of the models from OpenAI, Anthropic, Gemini or Mistral via their &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html#remote-apis"&gt;respective plugins&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Still plenty to do on this one. Next step is to integrate it with &lt;a href="https://simonwillison.net/2024/Dec/2/datasette-llm-usage/"&gt;datasette-llm-usage&lt;/a&gt; and use it to drive a design-complete stable version of that.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/releases"&gt;releases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="releases"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="enrichments"/></entry><entry><title>Weeknotes: more datasette-secrets, plus a mystery video project</title><link href="https://simonwillison.net/2024/May/7/datasette-secrets/#atom-tag" rel="alternate"/><published>2024-05-07T19:49:02+00:00</published><updated>2024-05-07T19:49:02+00:00</updated><id>https://simonwillison.net/2024/May/7/datasette-secrets/#atom-tag</id><summary type="html">
    &lt;p&gt;I introduced &lt;code&gt;datasette-secrets&lt;/code&gt; &lt;a href="https://simonwillison.net/2024/Apr/23/weeknotes/#datasette-secrets"&gt;two weeks ago&lt;/a&gt;. The core idea is to provide a way for end-users to store secrets such as API keys in Datasette, allowing other plugins to access them.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/datasette/datasette-secrets/releases/tag/0.2"&gt;datasette-secrets 0.2&lt;/a&gt; is the first non-alpha release of that project. The big new feature is that the plugin is &lt;a href="https://github.com/datasette/datasette-secrets/issues/15"&gt;now compatible&lt;/a&gt; with both the Datasette 1.0 alphas and the stable releases of Datasette (currently Datasette 0.64.6).&lt;/p&gt;
&lt;p&gt;My policy at the moment is that a plugin that only works with the Datasette 1.0 alphas must itself be an alpha release. I've been feeling the weight of this as the number of plugins that depend on 1.0a has grown - on the one hand it's a great reason to push through to that 1.0 stable release, but it's painful to have so many features that are incompatible with current Datasette.&lt;/p&gt;
&lt;p&gt;This came to a head with &lt;a href="https://enrichments.datasette.io/"&gt;Datasette Enrichments&lt;/a&gt;. I wanted to start consuming secrets from enrichments such as &lt;a href="https://github.com/datasette/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt; and &lt;a href="https://github.com/datasette/datasette-enrichments-opencage"&gt;datasette-enrichments-opencage&lt;/a&gt;, but I didn't want the whole enrichments ecosystem to become 1.0a only.&lt;/p&gt;
&lt;h4 id="patterns-multiple-datasette"&gt;Patterns for plugins that work against multiple Datasette versions&lt;/h4&gt;
&lt;p&gt;I ended up building out quite a bit of infrastructure to help support plugins that work with both versions.&lt;/p&gt;
&lt;p&gt;I already have &lt;a href="https://github.com/datasette/datasette-secrets/blob/0.2/.github/workflows/test.yml"&gt;a GitHub Actions pattern&lt;/a&gt; for running tests against both versions, which looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;jobs&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;test&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;strategy&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;matrix&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;["3.8", "3.9", "3.10", "3.11", "3.12"]&lt;/span&gt;
        &lt;span class="pl-ent"&gt;datasette-version&lt;/span&gt;: &lt;span class="pl-s"&gt;["&amp;lt;1.0", "&amp;gt;=1.0a13"]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v4&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python ${{ matrix.python-version }}&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v5&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ matrix.python-version }}&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache&lt;/span&gt;: &lt;span class="pl-s"&gt;pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache-dependency-path&lt;/span&gt;: &lt;span class="pl-s"&gt;pyproject.toml&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install '.[test]'&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install "datasette${{ matrix.datasette-version }}"&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Run tests&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pytest&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses a GitHub Actions matrix to run the test suite ten times - five against Datasette &amp;lt;1.0 on different Python versions and then five again on Datasette &amp;gt;=1.0a13.&lt;/p&gt;
&lt;p&gt;One of the big changes in Datasette 1.0 involves the way plugins are configured. I have a &lt;a href="https://github.com/datasette/datasette-test"&gt;datasette-test&lt;/a&gt; library to help paper over those differences, which can be used like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette_test&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Datasette&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_something&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Datasette&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;plugin_config&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;{
            &lt;span class="pl-s"&gt;"datasette-secrets"&lt;/span&gt;: {
                &lt;span class="pl-s"&gt;"database"&lt;/span&gt;: &lt;span class="pl-s"&gt;"_internal"&lt;/span&gt;,
                &lt;span class="pl-s"&gt;"encryption-key"&lt;/span&gt;: &lt;span class="pl-v"&gt;TEST_ENCRYPTION_KEY&lt;/span&gt;,
            }
        },
        &lt;span class="pl-s1"&gt;permissions&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;{&lt;span class="pl-s"&gt;"manage-secrets"&lt;/span&gt;: {&lt;span class="pl-s"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;"admin"&lt;/span&gt;}},
    )&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;plugin_config=&lt;/code&gt; argument there is unique to that &lt;code&gt;datasette_test.Datasette()&lt;/code&gt; class constructor, and does the right thing against both versions of Datasette. &lt;code&gt;permissions=&lt;/code&gt; is a similar utility function. Both are described in the &lt;a href="https://github.com/datasette/datasette-test/blob/main/README.md"&gt;datasette-test README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/datasette/datasette-secrets/pull/16"&gt;PR adding &amp;lt;1.0 and &amp;gt;1.0a compatibility&lt;/a&gt; has a few more details of changes I made to get &lt;code&gt;datasette-secrets&lt;/code&gt; to work with both versions.&lt;/p&gt;
&lt;p&gt;Here's what the secrets management interface looks like now:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/manage-secrets.jpg" alt="Manage secrets creen in Datasette Cloud. Simon Willison is logged in. A secret called OpenAI_API_KEY is at version 1, last updated by swillison on 25th April." style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;h4 id="adding-secrets-to-enrichments"&gt;Adding secrets to enrichments&lt;/h4&gt;
&lt;p&gt;I ended up changing the core enrichments framework to add support for secrets. The new mechanism &lt;a href="https://enrichments.datasette.io/en/stable/developing.html#enrichments-that-use-secrets-such-as-api-keys"&gt;is documented here&lt;/a&gt; - but the short version is you can now define an &lt;code&gt;Enrichments&lt;/code&gt; subclass that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette_enrichments&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Enrichment&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette_secrets&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Secret&lt;/span&gt;


&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;TrainEnthusiastsEnrichment&lt;/span&gt;(&lt;span class="pl-v"&gt;Enrichment&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;name&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"Train Enthusiasts"&lt;/span&gt;
    &lt;span class="pl-s1"&gt;slug&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"train-enthusiasts"&lt;/span&gt;
    &lt;span class="pl-s1"&gt;description&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"Enrich with extra data from the Train Enthusiasts API"&lt;/span&gt;
    &lt;span class="pl-s1"&gt;secret&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Secret&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"TRAIN_ENTHUSIASTS_API_KEY"&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;description&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"An API key from train-enthusiasts.doesnt.exist"&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;obtain_url&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"https://train-enthusiasts.doesnt.exist/api-keys"&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;obtain_label&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Get an API key"&lt;/span&gt;
    )&lt;/pre&gt;
&lt;p&gt;This imaginary enrichment will now do the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If a &lt;code&gt;TRAIN_ENTHUSIASTS_API_KEY&lt;/code&gt; environment variable is present it will use that without asking for an API key.&lt;/li&gt;
&lt;li&gt;A user with sufficient permissions, in a properly configured Datasette instance, can visit the "Manage secrets" page to set that API key such that it will be encrypted and persisted in the Datasette invisible "internal" database.&lt;/li&gt;
&lt;li&gt;If neither of those are true, the enrichment will ask for an API key every time a user tries to run it. That API key will be kept in memory, used and then discarded - it will not be persisted anywhere.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are still a bunch more enrichments that need to be upgraded to the new pattern, but those upgrades are now a pretty straightforward process.&lt;/p&gt;
&lt;h4 id="weeknotes-may-7-2024-mystery-video"&gt;Mystery video&lt;/h4&gt;
&lt;p&gt;I've been collaborating on a really fun video project for the past few weeks. More on this when it's finished, but it's been a &lt;em&gt;wild&lt;/em&gt; experience. I can't wait to see how it turns out, and share it with the world.&lt;/p&gt;

&lt;h4 id="weeknotes-may-7-2024-releases"&gt;Releases&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-openrouter/releases/tag/0.2"&gt;llm-openrouter 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-05-03&lt;br /&gt;LLM plugin for models hosted by OpenRouter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-dbs/releases/tag/0.3.2"&gt;datasette-upload-dbs 0.3.2&lt;/a&gt;&lt;/strong&gt; - 2024-05-03&lt;br /&gt;Upload SQLite database files to Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/ttok/releases/tag/0.3"&gt;ttok 0.3&lt;/a&gt;&lt;/strong&gt; - 2024-05-02&lt;br /&gt;Count and truncate text based on tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments/releases/tag/0.4.2"&gt;datasette-enrichments 0.4.2&lt;/a&gt;&lt;/strong&gt; - 2024-04-27&lt;br /&gt;Tools for running enrichments against data stored in Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-secrets/releases/tag/0.2"&gt;datasette-secrets 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-04-26&lt;br /&gt;Manage secrets such as API keys for use with other Datasette plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-test/releases/tag/0.3.2"&gt;datasette-test 0.3.2&lt;/a&gt;&lt;/strong&gt; - 2024-04-26&lt;br /&gt;Utilities to help write tests for Datasette plugins and applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-test-plugin/releases/tag/0.1"&gt;datasette-test-plugin 0.1&lt;/a&gt;&lt;/strong&gt; - 2024-04-26&lt;br /&gt;Part of datasette-test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-extract/releases/tag/0.1a6"&gt;datasette-extract 0.1a6&lt;/a&gt;&lt;/strong&gt; - 2024-04-25&lt;br /&gt;Import unstructured data (text and images) into structured tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-leaflet-geojson/releases/tag/0.8.2"&gt;datasette-leaflet-geojson 0.8.2&lt;/a&gt;&lt;/strong&gt; - 2024-04-25&lt;br /&gt;Datasette plugin that replaces any GeoJSON column values with a Leaflet map.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-edit-schema/releases/tag/0.8a2"&gt;datasette-edit-schema 0.8a2&lt;/a&gt;&lt;/strong&gt; - 2024-04-24&lt;br /&gt;Datasette plugin for modifying table schemas&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-may-7-2024-tils"&gt;TILs&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/macos/whisper-cpp"&gt;Transcribing MP3s with whisper-cpp on macOS&lt;/a&gt; - 2024-04-26&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="datasette"/><category term="weeknotes"/><category term="enrichments"/></entry><entry><title>AI for Data Journalism: demonstrating what we can do with this stuff right now</title><link href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#atom-tag" rel="alternate"/><published>2024-04-17T21:04:07+00:00</published><updated>2024-04-17T21:04:07+00:00</updated><id>https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#atom-tag</id><summary type="html">
    &lt;p&gt;I gave a talk last month at the &lt;a href="https://biglocalnews.org/content/events/"&gt;Story Discovery at Scale&lt;/a&gt; data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.&lt;/p&gt;
&lt;p&gt;I used the talk as an opportunity for some &lt;strong&gt;demo driven development&lt;/strong&gt; - I pulled together a bunch of different project strands for the talk, then spent the following weeks turning them into releasable tools.&lt;/p&gt;
&lt;p&gt;There are 12 live demos in this talk!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#haikus-with-haiku"&gt;Haikus from images with Claude 3 Haiku&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#pasting-data-from-sheets"&gt;Pasting data from Google Sheets into Datasette Cloud&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#ai-assisted-sql"&gt;AI-assisted SQL queries with datasette-query-assistant&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#scraping-shot-scraper"&gt;Scraping data with shot-scraper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#enriching-data-in-a-table"&gt;Enriching data in a table&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#cli-tools-llms"&gt;Command-line tools for working with LLMs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#structured-data-extraction"&gt;Structured data extraction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#code-interpreter-and-tools"&gt;Code Interpreter and access to tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#chatgpt-queries-gpt"&gt;Running queries in Datasette from ChatGPT using a GPT&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#semantic-search-embeddings"&gt;Semantic search with embeddings&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#datasette-scribe"&gt;Datasette Scribe: searchable Whisper transcripts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/#campaign-finance-failure"&gt;Trying and failing to analyze hand-written campaign finance documents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full 50 minute video of my talk is &lt;a href="https://www.youtube.com/watch?v=BJxPKr6ixSM"&gt;available on YouTube&lt;/a&gt;. Below I've turned that video into an &lt;a href="https://simonwillison.net/tags/annotatedtalks/"&gt;annotated presentation&lt;/a&gt;, with screenshots, further information and links to related resources and demos that I showed during the talk.&lt;/p&gt;
&lt;!-- cutoff --&gt;

&lt;iframe style="max-width: 100%" width="560" height="315" src="https://www.youtube-nocookie.com/embed/BJxPKr6ixSM" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;h4 id="new-in-llms"&gt;What's new in LLMs?&lt;/h4&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_000008.jpg" alt="What can we do with this stuff right now? Simon Willison - simonwillison.net - datasette.io - Story Discovery At Scale, 28th March 2024" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=00m08s"&gt;00m08s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My focus in researching this area over the past couple of years has mainly been to forget about the futuristic stuff and focus on this question: what can I do with the tools that are available to me right now?&lt;/p&gt;
&lt;p&gt;I blog a lot. Here's my &lt;a href="https://simonwillison.net/tags/ai/"&gt;AI tag&lt;/a&gt; (516 posts), and my &lt;a href="https://simonwillison.net/tags/llms/"&gt;LLMs tag&lt;/a&gt; (424).&lt;/p&gt;
&lt;p&gt;The last six weeks have been &lt;em&gt;wild&lt;/em&gt; for new AI capabilities that we can use to do interesting things. Some highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://developers.googleblog.com/2024/04/gemini-15-pro-in-public-preview-with-new-features.html"&gt;Google Gemini Pro 1.5&lt;/a&gt; is a new model from Google with a million token context (5x the previous largest) and that can handle images and video. I used it to convert a 7 second video of my bookcase into a JSON list of books, which I wrote about &lt;a href="https://simonwillison.net/2024/Feb/21/gemini-pro-video/"&gt;in this post&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic released &lt;a href="https://www.anthropic.com/news/claude-3-family"&gt;Claude 3 Opus&lt;/a&gt;, the first model to convincingly beat OpenAI's GPT-4.&lt;/li&gt;
&lt;li&gt;Anthropic then released &lt;a href="https://www.anthropic.com/news/claude-3-haiku"&gt;Claude 3 Haiku&lt;/a&gt;, a model that is both cheaper and faster than GPT-3.5 Turbo and has a 200,000 token context limit and can process images.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="opus-chatbot-arena"&gt;Opus at the top of the Chatbot Arena&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://chat.lmsys.org/?leaderboard"&gt;LMSYS Chatbot Arena&lt;/a&gt; is a great place to compare models because it captures their elusive &lt;em&gt;vibes&lt;/em&gt;. It works by asking thousands of users to &lt;a href="https://chat.lmsys.org/"&gt;vote on the best responses&lt;/a&gt; to their prompts, picking from two anonymous models.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_000442.jpg" alt="Screenshot of the LMSYS Chatbot Arena Leaderboard - Claude 3 Opus is at the top, then two of the GPT-4 models, then Bard, then Claude 3 Sonnet" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=04m42s"&gt;04m42s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Claude 3 Opus made it to the top, which was the first time ever for a model not produced by OpenAI!&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_000612.jpg" alt="Reddit post GPT-4 is no longer the top dog - timelapse of Chatbot Arena ratings since May 23 with an animation showing Claude 3 Opus at the top" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=06m12s"&gt;06m12s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This &lt;a href="https://www.reddit.com/r/LocalLLaMA/comments/1bp4j19/gpt4_is_no_longer_the_top_dog_timelapse_of/"&gt;Reddit post&lt;/a&gt; by Time-Winter-4319 animates the leaderboard since May 2023 and shows the moment in the last few weeks where Opus finally took the top spot.&lt;/p&gt;
&lt;h4 id="haikus-with-haiku"&gt;Haikus from images with Claude 3 Haiku&lt;/h4&gt;
&lt;p&gt;To demonstrate Claude 3 Haiku I showed a demo of a little tool I built that can take a snapshot through a webcam and feed that to the Haiku model to generate a Haiku!&lt;/p&gt;
&lt;p&gt;An improved version of that tool &lt;a href="https://tools.simonwillison.net/haiku"&gt;can be found here&lt;/a&gt; - source code &lt;a href="https://github.com/simonw/tools/blob/main/haiku.html"&gt;here on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It requires a Claude 3 API key which you can paste in and it will store in browser local storage (I never get to see your key).&lt;/p&gt;
&lt;p&gt;Here's what it looks like on my iPhone:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/haiku.jpg" alt="Photograph of my dog, Cleo. Camera controls at the bottom of the screen. At the top a Haiku reads Canine companion, Sheltered, yet longing for home, Peaceful slumber calls." style="width: 400px; max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p&gt;It writes terrible Haikus every time you take a picture! Each one probably costs a fraction of a cent.&lt;/p&gt;
&lt;p&gt;On the morning of the talk AI21 published this: &lt;a href="https://www.ai21.com/blog/announcing-jamba"&gt;Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model&lt;/a&gt;. I mentioned that mainly to illustrate that the openly licensed model community has been moving quickly as well.&lt;/p&gt;
&lt;p&gt;(In the weeks since I gave this talk the biggest stories from that space have been &lt;a href="https://txt.cohere.com/command-r-plus-microsoft-azure/"&gt;Command R+&lt;/a&gt; and &lt;a href="https://simonwillison.net/2024/Apr/10/mixtral-8x22b/"&gt;Mixtral 8x22b&lt;/a&gt; - both groundbreakingly capable openly licensed models.)&lt;/p&gt;
&lt;h4 id="pasting-data-from-sheets"&gt;Pasting data from Google Sheets into Datasette Cloud&lt;/h4&gt;
&lt;p&gt;At this point I switched over to running some live demos, using &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; running on &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_000924.jpg" alt="Tweet from Tejas Kumar @TejasKumar: I searched the internet for an extremely basic at-a-glance comparison of pricing across various Large Language Models (LLMs) and I didn't find what I wanted, so I made one. I hope this helps someone like it helped me." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=09m24s"&gt;09m24s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://twitter.com/tejaskumar_/status/1772994291905835357"&gt;Tejas Kumar&lt;/a&gt; shared &lt;a href="https://docs.google.com/spreadsheets/d/1cIO26RMbrhh2wJeSqLMr6J4xIVMa3X2BiliAyI3zk0s/edit?usp=sharing"&gt;a Google Sheet&lt;/a&gt; with pricing comparison data for various LLMs. This was the perfect opportunity to demonstrate the new &lt;a href="https://github.com/datasette/datasette-import"&gt;Datasette Import&lt;/a&gt; plugin, which makes it easy to paste data into Datasette from Google Sheets or Excel.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_000936.jpg" alt="A Google Sheet, LLM Pricing Comparison - with three columns of data" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=09m36s"&gt;09m36s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Google Sheets (and Numbers and Excel) all support copying data directly out of the spreadsheet as TSV (tab separated values). This is ideal for pasting into other tools that support TSV.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001007.jpg" alt="A page titled Past data to create a table. I set a table name of LLM_PRICES and paste in TSV data copied from the Google Sheet " style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=10m07s"&gt;10m07s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://datasette.io/plugins/datasette-import"&gt;Datasette Import&lt;/a&gt; plugin (previously called Datasette Paste) shows a preview of the first 100 rows. Click the blue "Upload 15 rows to Datasette" button to create the new table.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001011.jpg" alt="Screenshot showing the table in Datasette." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=10m11s"&gt;10m11s&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="ai-assisted-sql"&gt;AI-assisted SQL queries with datasette-query-assistant&lt;/h4&gt;
&lt;p&gt;Once I had imported the data I demonstrated another new plugin: &lt;a href="https://datasette.io/plugins/datasette-query-assistant"&gt;datasette-query-assistant&lt;/a&gt;, which uses Claude 3 Haiku to allow users to pose a question in English which then gets translated into a SQL query against the database schema.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001144.jpg" alt="Query assistant interface - ask a question of your data. I'm asking How much would it cost for each model for 10,000 input tokens and 500 output tokens - MTok means millions of tokens" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=11m44s"&gt;11m44s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this case I had previously found out that MTok confuses the model - but telling it that it means "millions of tokens" gave it the information it needed to answer the question.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001151.jpg" alt="A Datasette SQL queyr page. The query: -- Calculate cost for each LLM model -- based on 10,000 input tokens and 500 output tokens select   LLM,   (10000.0 / 1000000) * Price per input ($/MTok) as input_cost,   (500.0 / 1000000) * Price per output ($/MTok)  as output_cost,   (10000.0 / 1000000) * Price per input ($/MTok) + (500.0 / 1000000) * Price per output ($/MTok)  as total_cost from LLM_PRICES; - it lists Claude 3 Haiku as the cheapest with a total cost of 0.003125 " style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=11m51s"&gt;11m51s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The plugin works by constructing a heavily commented SQL query and then redirecting the user to a page that executes that query. It deliberately makes the query visible, in the hope that technical users might be able to spot if the SQL looks like it's doing the right thing.&lt;/p&gt;
&lt;p&gt;Every page like this in Datasette has a URL that can be shared. Users can share that link with their team members to get a second pair of eyes on the query.&lt;/p&gt;
&lt;h4 id="scraping-shot-scraper"&gt;Scraping data with shot-scraper&lt;/h4&gt;
&lt;p&gt;An earlier speaker at the conference had shown the &lt;a href="https://cu-citizenaccess.org/search-champaign-county-property-by-name/"&gt;Champaign County property tax database&lt;/a&gt; compiled from FOIA data by &lt;a href="https://cu-citizenaccess.org/"&gt;CU-CitizenAccess&lt;/a&gt; at the University of Illinois in Urbana-Champaign.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001347.jpg" alt="Champaign County Property Tax Database (Tax Year 2023) Source: Champaign County Assessment Office (released via Freedom of Information Act) Type in the search bar to search all Champaign County properties by owner name, which the county chose to not allow its residents to do." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=13m47s"&gt;13m47s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The interactive search tool is published using &lt;a href="https://flourish.studio/"&gt;Flourish&lt;/a&gt;. If you open it in the Firefox DevTools console you can access the data using &lt;code&gt;window.template.data&lt;/code&gt;:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001407.jpg" alt="Screenshot of the Firefox DevTools console - the window.template.data object contains a rows key with an array of 78,637 items." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=14m07s"&gt;14m07s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; tool provides a mechanism for &lt;a href="https://shot-scraper.datasette.io/en/stable/javascript.html"&gt;scraping pages with JavaScript&lt;/a&gt;, by running a JavaScript expression in the context of a page using an invisible browser window.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001515.jpg" alt="Screenshot of a terminal window. I've run the shot-scraper command to get back a 17MB JSON file." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=15m15s"&gt;15m15s&lt;/a&gt;&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;shot-scraper javascript \
  &lt;span&gt;&lt;span&gt;'&lt;/span&gt;https://flo.uri.sh/visualisation/16648221/embed?auto-1&lt;span&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span&gt;&lt;span&gt;'&lt;/span&gt;window. template.data[_Flourish_dataset]&lt;span&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span&gt;&amp;gt;&lt;/span&gt; /tmp/data.json&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This gave me a 17MB JSON file, in the following shape:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;[
    {
        &lt;span&gt;"columns"&lt;/span&gt;: [
            &lt;span&gt;&lt;span&gt;"&lt;/span&gt;LUTH, KATHRYN M TRUST&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
            &lt;span&gt;&lt;span&gt;"&lt;/span&gt;526 COUNTY ROAD 2400 E&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
            &lt;span&gt;&lt;span&gt;"&lt;/span&gt;BROADLANDS, IL 61816-9733&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
            &lt;span&gt;&lt;span&gt;"&lt;/span&gt;013506100001&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
            &lt;span&gt;110070&lt;/span&gt;,
            &lt;span&gt;250870&lt;/span&gt;,
            &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Y&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
            &lt;span&gt;147.26&lt;/span&gt;
        ]
    }&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I used &lt;code&gt;jq&lt;/code&gt; to convert that into an array of objects suitable for importing into Datasette:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;cat data.json&lt;span&gt;|&lt;/span&gt; jq &lt;span&gt;&lt;span&gt;'&lt;/span&gt;map({&lt;/span&gt;
&lt;span&gt;    "Owner Name": .columns[0],&lt;/span&gt;
&lt;span&gt;    "Site Address 1": .columns[1],&lt;/span&gt;
&lt;span&gt;    "City and Zip": .columns[2],&lt;/span&gt;
&lt;span&gt;    "Parcel Number": .columns[3],&lt;/span&gt;
&lt;span&gt;    "Farm Land": .columns[4],&lt;/span&gt;
&lt;span&gt;    "Total Assessed Value": .columns[5],&lt;/span&gt;
&lt;span&gt;    "Home Owner Exemption": .columns[6],&lt;/span&gt;
&lt;span&gt;    "Gross Acreage": .columns[7]&lt;/span&gt;
&lt;span&gt;})&lt;span&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span&gt;&amp;gt;&lt;/span&gt; cleaned.json&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which produced a file that looked like this:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;[
  {
    &lt;span&gt;"Owner Name"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;LUTH, KATHRYN M TRUST&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"Site Address 1"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;526 COUNTY ROAD 2400 E&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"City and Zip"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;BROADLANDS, IL 61816-9733&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"Parcel Number"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;013506100001&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"Farm Land"&lt;/span&gt;: &lt;span&gt;110070&lt;/span&gt;,
    &lt;span&gt;"Total Assessed Value"&lt;/span&gt;: &lt;span&gt;250870&lt;/span&gt;,
    &lt;span&gt;"Home Owner Exemption"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Y&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"Gross Acreage"&lt;/span&gt;: &lt;span&gt;147.26&lt;/span&gt;
  }&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I pasted that into the same tool as before - it accepts JSON in addition to CSV and TSV:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001550.jpg" alt="Pasting that data in to create a table called Champaign_County_Property_Tax_Database" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=15m50s"&gt;15m50s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I used &lt;a href="https://datasette.io/plugins/datasette-configure-fts"&gt;datasette-configure-fts&lt;/a&gt; to make it searchable by owner name:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001618.jpg" alt="Configure full-text search for data.db in the Champaign_County_Property_Tax_Database table. I've selected Owner Name - there is a Configure search across these columns button at the bottom of the page." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=16m18s"&gt;16m18s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And now I can search for "john", order by Total Assessed Value and figure out who the richest John in Champaign County is!&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001624.jpg" alt="The tax table with a search for &amp;quot;john&amp;quot;, showing 604 matching rows" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=16m24s"&gt;16m24s&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="enriching-data-in-a-table"&gt;Enriching data in a table&lt;/h4&gt;
&lt;p&gt;My next demo involved &lt;a href="https://enrichments.datasette.io/"&gt;Datasette Enrichments&lt;/a&gt;, a relatively new mechanism (launched &lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;in December&lt;/a&gt;) providing a plugin-based mechanism for running bulk operations against rows in a table.&lt;/p&gt;
&lt;p&gt;Selecting the "Enrich selected data" table action provides a list of available enrichments, provided by a plugin.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001706.jpg" alt="Select an enrichment:  Construct a string using Jinja: Execute a template using Jinja and store the result, Al analysis with OpenAI GPT: Analyze data using OpenAI's GPT models, Regular expressions: Run search-and-replace or extract data into new columns using regular expressions, OpenCage geocoder: Geocode to latitude/longitude points using OpenCage, Text embeddings with OpenAI: Calculate and store text embeddings using OpenAI's API " style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=17m06s"&gt;17m06s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Datasette Cloud is running the following enrichment plugins:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-jinja"&gt;datasette-enrichments-jinja&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-re2"&gt;datasette-enrichments-re2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-opencage"&gt;datasette-enrichments-opencage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-embeddings"&gt;datasette-embeddings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The geocoder plugin uses the &lt;a href="https://opencagedata.com/"&gt;OpenCage geocoder API&lt;/a&gt; to populate latitude and longitude columns from address data.&lt;/p&gt;
&lt;p&gt;The address is provided as a template using values from columns in the table:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001708.jpg" alt="Enrich data in Champaign_County Property Tax Database. 684 rows selected where search matches &amp;quot;john&amp;quot; and Site Address 1 is not blank sorted by Total Assessed Value descending. to latitude/longitude points using OpenCage. Geocode input: {{ Owner Name }} {{ Site Address 1 }} {{ City and Zip }} {{ Parcel Number }}. Checkbox for Store JSON in a column. API key input: Your OpenCage API key. Button: Enrich data" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=17m08s"&gt;17m08s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I ran the geocoder... and a few seconds later my table started to display a map. And the map had markers all over the USA, which was clearly wrong because the markers should all have been in Champaign County!&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001757.jpg" alt="The table page now shows a map, with 44 markers on the correct county but another dozen scattered almost randomly across the rest of the country." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=17m57s"&gt;17m57s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why did it go wrong? On closer inspection, it turns out quite a few of the rows in the table have a blank value for the "City and Zip" column. Without that, the geocoder was picking other places with the same street address.&lt;/p&gt;
&lt;p&gt;The fix for this would be to add the explicit state "Illinois" to the template used for geocoding. I didn't fix this during the talk for time reasons. I also quite like having demos like this that don't go perfectly, as it helps illustrate the real-world challenges of working with this kind of data.&lt;/p&gt;
&lt;p&gt;I ran another demo of the AI query assistant, this time asking:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;who is the richest home owner?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It built me a SQL query to answer that question. It seemed to do a good job:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_001855.jpg" alt="-- Find the home owner with the highest total assessed value. select &amp;quot;Owner Name&amp;quot;, &amp;quot;Total Assessed Value&amp;quot; from &amp;quot;Champaign_County_Property_Tax_Database&amp;quot; order by &amp;quot;Total Assessed Value&amp;quot; desc limit 1; Owner Name: THE CARLE FOUNDATION, Total assessed value: 51095990" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=18m55s"&gt;18m55s&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="cli-tools-llms"&gt;Command-line tools for working with LLMs&lt;/h4&gt;
&lt;p&gt;I switched away from Datasette to demonstrate my other main open source project, &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;. LLM is a command-line tool for interacting with Large Language Models, based around plugins that make it easy to extend to support different models.&lt;/p&gt;
&lt;p&gt;Since terrible Haikus were something of a theme of the event already (I wasn't the first speaker to generate a Haiku), I demonstrated it by writing two more of them:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002135.jpg" alt="Terminal window. llm a great haiku about journalists' returned: Watchful eyes seek truth, Ink and screens bare the world's pulse, Silent pens roar loud. That same command with -m claude-3-opus returned: Seeking truth and light. Pen and paper as their shield. Journalists prevail." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=21m35s"&gt;21m35s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;LLM defaults to running prompts against the inexpensive OpenAI gpt-3.5-turbo model. Adding &lt;code&gt;-m claude-3-opus&lt;/code&gt; (or some other model name, depending on installed plugins) runs the prompt against a different model, in this case Claude 3 Opus.&lt;/p&gt;
&lt;p&gt;I'm using the &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3&lt;/a&gt; plugin here.&lt;/p&gt;
&lt;p&gt;Next I wanted to do something a lot more useful than generating terrible poetry. An exciting recent development in LLMs is the increasing availability of multi-modal models - models that can handle inputs other than text, such as images.&lt;/p&gt;
&lt;p&gt;Most of these models deal with images, not PDFs - so the first step was to turn a PDF into a PNG image.&lt;/p&gt;
&lt;p&gt;This was an opportunity to demonstrate another recent LLM plugin, &lt;a href="https://simonwillison.net/2024/Mar/26/llm-cmd/"&gt;llm cmd&lt;/a&gt;, which takes a prompt and turns it into a command line command ready to be executed (or reviewed and edited) directly in the terminal.&lt;/p&gt;
&lt;p&gt;I ran this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;llm cmd convert order.pdf into a single long image with all of the pages&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And it suggested I run:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;convert -density 300 order.pdf -append order.png&lt;/pre&gt;&lt;/div&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002211.jpg" alt="My terminal. I've run the llm cmd command and it's showing me the convert command ready for me to hit enter to execute it." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=22m11s"&gt;22m11s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That looked OK to me, so I hit enter - and it spat out a &lt;code&gt;order.png&lt;/code&gt; file that was &lt;a href="https://static.simonwillison.net/static/2024/order.png"&gt;a single long image&lt;/a&gt; with 7 pages of PDF concatenated together.&lt;/p&gt;
&lt;p&gt;I then passed that to the new Gemini Pro 1.5 model like so:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;llm -m pro15 -i order.png &lt;span&gt;&lt;span&gt;'&lt;/span&gt;extract text&lt;span&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;-i order.png&lt;/code&gt; option is not yet available in an LLM release - here I'm running the &lt;a href="https://github.com/simonw/llm/tree/image-experimental"&gt;image-experimental branch&lt;/a&gt; of LLM and the &lt;a href="https://github.com/simonw/llm-gemini/tree/images"&gt;images branch&lt;/a&gt; of the &lt;a href="https://github.com/simonw/llm-gemini"&gt;llm-gemini&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;And the model began returning text from that PDF, conveniently converted to Markdown:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002304.jpg" alt="The command running. ## IN THE MATTER OF LAURIE BETH KREUGER, Respondent. BEFORE THE * MARYLAND STATE BOARD OF PHYSICIANS * Case Number: 1715-0078 " style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=23m04s"&gt;23m04s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Is this the best technology for the job? Likely not. Using LLMs for this kind of content extraction has a lot of risks: what if the model hallucinates extra details in the output?&lt;/p&gt;
&lt;p&gt;It's also important to keep the model's output length limit in mind. Even models that accept a million tokens of input often have output limits measured in just thousands of tokens (Gemini 1.5 Pro's output limit is 8,192).&lt;/p&gt;
&lt;p&gt;I recommend dedicated text extraction tools like &lt;a href="https://aws.amazon.com/textract/ocr/"&gt;AWS Textract&lt;/a&gt; for this kind of thing instead. I released a &lt;a href="https://github.com/simonw/textract-cli"&gt;textract-cli&lt;/a&gt; tool to help work with that shortly after I gave this talk.&lt;/p&gt;
&lt;p id="llm-mistakes"&gt;Speaking of LLM mistakes... I previously attempted this same thing using that image fed into GPT-4 Vision, and got a very illustrative result:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002347.jpg" alt="Screenshot of a Datasetet table containing page_text. IN THE MATTER OF LATOYA JACKSON BEFORE THE MASSACHUSETTS BOARD OF REGISTRATION IN MEDICINE COMPLAINT NO. 2016-017 July 31, 2017 Pursuant to the authority vested in the Board of Registration in Medicine (the &amp;quot;Board&amp;quot;) under G.L" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=23m47s"&gt;23m47s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This text was extracted from the same image... and it's entirely incorrect! It talks about the wrong name - Latoya Jackson instead of Laurie Beth Kreuger - and every detail on the page is wrong, clearly hallucinated by the model.&lt;/p&gt;
&lt;p&gt;What went wrong here? It was the size of the image. I fed GPT-4 Vision a 2,550 × 23,100 pixel PNG. That's clearly too large, so it looks to me like OpenAI resized the image down before feeding it to the model... but in doing so, they made the text virtually illegible. The model picked up just enough details from what was left to confidently hallucinate a completely different document.&lt;/p&gt;
&lt;p&gt;Another useful reminder of quite how weird the mistakes can be when working with these tools!&lt;/p&gt;
&lt;h4 id="structured-data-extraction"&gt;Structured data extraction&lt;/h4&gt;
&lt;p&gt;My next demo covered my absolute favourite use-case for these tools in a data journalism capacity: structured data extraction.&lt;/p&gt;
&lt;p&gt;I've since turned this section into a separate, dedicated demo, with a &lt;a href="https://www.youtube.com/watch?v=g3NtJatmQR0"&gt;3m43s YouTube video&lt;/a&gt; and &lt;a href="https://www.datasette.cloud/blog/2024/datasette-extract/"&gt;accompanying blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I used the &lt;a href="https://datasette.io/plugins/datasette-extract"&gt;datasette-extract&lt;/a&gt; plugin, which lets you configure a new database table:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002602.jpg" alt="Extract dat anad create a new table in data. Table name: events. Columns event_title, event_date, start_time, end_time, description. I've set a hint on event_date to YYYY-MM-DD." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=26m02s"&gt;26m02s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Then copy and paste in any data you like. Here I'm grabbing text from &lt;a href="https://bachddsoc.org/calendar/"&gt;the upcoming events calendar&lt;/a&gt; for the &lt;a href="https://bachddsoc.org/"&gt;Bach Dancing &amp;amp; Dynamite Society&lt;/a&gt; Jazz venue in Half Moon Bay, California. You can read more about them on &lt;a href="https://en.wikipedia.org/wiki/Bach_Dancing_%26_Dynamite_Society"&gt;their Wikipedia page&lt;/a&gt;, which I created a few weeks ago.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002621.jpg" alt="The events calendar page on their website" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=26m21s"&gt;26m21s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You paste the unstructured text into a box:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002629.jpg" alt="That form, with a bunch of unstructured text copied and pasted from the website." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=26m29s"&gt;26m29s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And run the extraction:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002638.jpg" alt="A progress indicator - extract progress. JSON is displayed on the page showing events from the calendar." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=26m38s"&gt;26m38s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The result is a database table containing structured data that has been extracted from the unstructured text by the model! In this case the model was GPT-4 Turbo.&lt;/p&gt;
&lt;p&gt;The best part is that the same technique works for images as well. Here's a photo of a flier I found for an upcoming event in Half Moon Bay:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002756.jpg" alt="Fridy May 6th Coastside Comedy Luau flier" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=27m56s"&gt;27m56s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I can extract that image directly into the table, saving me from needing to configure the columns again.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_002832.jpg" alt="The extract progress screen. It shows data extracted from the image - though the event_date is 2022-05-06" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=28m32s"&gt;28m32s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Initially I thought it had made a mistake here - it assumed 2022 instead of 2024.&lt;/p&gt;
&lt;p&gt;But... I checked just now, and 6th May was indeed a Friday in 2022 but a Monday in 2024. And the event's QR code confirms that this was an old poster for an event from two years ago! It guessed correctly.&lt;/p&gt;
&lt;h4 id="code-interpreter-and-tools"&gt;Code Interpreter and access to tools&lt;/h4&gt;
&lt;p&gt;The next part of my demo wasn't planned. I was going to dive into tool usage by demonstrating what happens when you give ChatGPT the ability to run queries directly against Datasette... but an informal survey showed that few people in the room had seen &lt;a href="https://simonwillison.net/tags/codeinterpreter/"&gt;ChatGPT Code Interpreter&lt;/a&gt; at work. So I decided to take a diversion and demonstrate that instead.&lt;/p&gt;
&lt;p&gt;Code Interpreter is the mode of (paid) ChatGPT where the model can generate Python code, execute it, and use the results as part of the ongoing conversation.&lt;/p&gt;
&lt;p&gt;It's incredibly powerful but also very difficult to use. I tried to trigger it by asking for the factorial of 14... but ChatGPT attempted an answer without using Python. So I prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Factorial of 14, use code interpreter&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003026.jpg" alt="ChatGPT screenshot. You: Factorial of 14, use code interpreter. ChatGPT: Analyzing... import math; factorial_14 = math.factorial(14). Result: 87178291200" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=30m26s"&gt;30m26s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Where it gets really interesting is when you start uploading data to it.&lt;/p&gt;
&lt;p&gt;I found a CSV file on my computer called &lt;a href="https://static.simonwillison.net/static/2024/Calls_for_Service_2024%281%29.csv"&gt;Calls for Service 2024(1).csv&lt;/a&gt;. I'd previously obtained this from a &lt;a href="https://catalog.data.gov/dataset/calls-for-service-2024"&gt;New Orleans data portal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I uploaded the file to ChatGPT and prompted it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;tell me interesting things about this data&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's the &lt;a href="https://chat.openai.com/share/7591a81f-c06e-4e64-9601-cad1efe359f6"&gt;full transcript of my demo&lt;/a&gt;. It turned out not to be as interesting as I had hoped, because I accidentally uploaded a CSV file with just 10 rows of data!&lt;/p&gt;
&lt;p&gt;The most interesting result I got was when I said "OK find something more interesting than that to chart" and it produced this chart of incident types:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003409.jpg" alt="Bar chart. Complaint other and Prowler both have two, Battery by shooting, missing adult and burglary vehicle all have one." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=34m09s"&gt;34m09s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I've written a bunch of more detailed pieces about Code Interpreter. These are the most interesting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/23/building-c-extensions-for-sqlite-with-chatgpt-code-interpreter/"&gt;Building and testing C extensions for SQLite with ChatGPT Code Interpreter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/"&gt;Claude and ChatGPT for ad-hoc sidequests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2023/Apr/12/code-interpreter/"&gt;Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/llms/code-interpreter-expansions"&gt;Expanding ChatGPT Code Interpreter with Python packages, Deno and Lua&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="chatgpt-queries-gpt"&gt;Running queries in Datasette from ChatGPT using a GPT&lt;/h4&gt;
&lt;p&gt;Keeping to the theme of extending LLMs with access to tools, my next demo used the GPTs feature added to ChatGPT back in November (see &lt;a href="https://simonwillison.net/2023/Nov/15/gpts/"&gt;my notes on that launch&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;GPTs let you create your own custom version of ChatGPT that lives in the ChatGPT interface. You can adjust its behaviour with custom instructions, and you can also teach it how to access external tools via web APIs.&lt;/p&gt;
&lt;p&gt;I configured a GPT to talk to my Datasette demo instance using the YAML configurations shared &lt;a href="https://gist.github.com/simonw/d6425fd997e61cf517aa196fe988638c"&gt;in this Gist&lt;/a&gt;, and a Datasette Cloud read-only API key (see &lt;a href="https://www.datasette.cloud/blog/2023/datasette-cloud-api/"&gt;Getting started with the Datasette Cloud API&lt;/a&gt;, or install the &lt;a href="https://github.com/simonw/datasette-auth-tokens"&gt;datasette-auth-tokens&lt;/a&gt; plugin on your own instance).&lt;/p&gt;
&lt;p&gt;Datasette provides a JSON API that can be used to execute SQLite SQL queries directly against a dataabse. GPT-4 already knows SQLite SQL, so describing the endpoint takes very little configuration.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003602.jpg" alt="Screenshot of the Edit actions panel for a Story Discovery at Scale Datasette Chat GPT. The actions panel is set to authentication by API key and has a YAML schema pasted in that describes the Datasette API." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=36m02s"&gt;36m02s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once configured like this the regular ChatGPT interface can be used to talk directly with the GPT, which can then attempt to answer questions by executing SQL queries against Datasette.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://chat.openai.com/share/4a3cab65-47e3-47a9-8d2b-31bed957b4bf"&gt;my transcript from the demo&lt;/a&gt; - the full sequence of my prompts was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;list tables&lt;/p&gt;
&lt;p&gt;Find me the most expensive property in the county tax database based on price per acre&lt;/p&gt;
&lt;p&gt;use sqlite_master (a hint about how to figure out the SQL schema)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003822.jpg" alt="ChatGPT screenshot. Given this structure, the database does not directly include a &amp;quot;Price per Acre&amp;quot; column. However, we can approximate it using the &amp;quot;Total Assessed Value&amp;quot; divided by &amp;quot;Gross Acreage&amp;quot; to find properties with high value per acre. Let me attempt to query the database using this derived calculation. Talked to story-discovery-2024.datasette.cloud The most expensive property in the Champaign County Property Tax Database, based on the calculated price per acre, belongs to: Owner Name: SAI RAM ENTERPRISE LLC" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=38m22s"&gt;38m22s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Clicking on the "Talked to xxx.datasette.cloud" message shows the SQL query that was executed:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003838.jpg" alt="A dialog that says Review action - showing a JSON encoded SQL query, SELECT *, &amp;quot;Total Assessed Value&amp;quot; / &amp;quot;Gross Acreage&amp;quot; AS Price_per_Acre FROM Champaign_County_Property_Tax_Database ORDER BY Price_per_Acre DESC LIMIT 1" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=38m38s"&gt;38m38s&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="semantic-search-embeddings"&gt;Semantic search with embeddings&lt;/h4&gt;
&lt;p&gt;One of my favourite Large Language Model adjacent technologies is embeddings. These provide a way to turn text into fixed-length arrays of floating point numbers which capture something about the semantic meaning of that text - allowing us to build search engines that operate based on semantic meaning as opposed to direct keyword matches.&lt;/p&gt;
&lt;p&gt;I wrote about these extensively in &lt;a href="https://simonwillison.net/2023/Oct/23/embeddings/"&gt;Embeddings: What they are and why they matter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/plugins/datasette-embeddings"&gt;datasette-embeddings&lt;/a&gt; is a new plugin that adds two features: the ability to calculate and store embeddings (implemented as an enrichment), and the ability to then use them to run semantic similarity searches against the table.&lt;/p&gt;
&lt;p&gt;The first step is to enrich that data. I started with a table of session descriptions from the recent &lt;a href="https://www.ire.org/training/conferences/nicar-2024/nicar24-registration/"&gt;NICAR 2024&lt;/a&gt; data journalism conference (which the conference publishes as a &lt;a href="https://schedules.ire.org/nicar-2024/nicar-2024-schedule.csv"&gt;convenient CSV&lt;/a&gt; or &lt;a href="https://schedules.ire.org/nicar-2024/nicar-2024-schedule.json"&gt;JSON file&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I selected the "text embeddings with OpenAI enrichment" and configured it to run against a template containing the session title and description:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_003946.jpg" alt="Screenshot: Enrich data in nicar_2024_sessions - I've selected the text-embedding-3-small-512 model and entered {{ title }} {{ description }} as the template." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=39m46s"&gt;39m46s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Having run the enrichment a new table option becomes available: "Semantic search". I can enter a search term, in this case "things that will upset politicians":&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004007.jpg" alt="Semantic search: nicar_2024_sessions. Search box and a Go button. Find rows that are semantically close to your search query." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=40m07s"&gt;40m07s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Running the search lands me on a SQL page with a query that shows the most relevant rows to that search term based on those embeddings:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004011.jpg" alt="Screenshot of the SQL query returning 52 rows. The top session is called &amp;quot;Scraping the worst of the worst&amp;quot;." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=40m11s"&gt;40m11s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Semantic search like this is a key step in implementing RAG - Retrieval Augmented Generation, the trick where you take a user's question, find the most relevant documents for answering it, then paste entire copies of those documents into a prompt and follow them with the user's question.&lt;/p&gt;
&lt;p&gt;I haven't implemented RAG on top of Datasette Embeddings yet but it's an obvious next step.&lt;/p&gt;
&lt;h4 id="datasette-scribe"&gt;Datasette Scribe: searchable Whisper transcripts&lt;/h4&gt;
&lt;p&gt;My last demo was &lt;strong&gt;Datasette Scribe&lt;/strong&gt;, a Datasette plugin currently being developed by &lt;a href="https://alexgarcia.xyz/"&gt;Alex Garcia&lt;/a&gt; as part of the work he's doing with me on Datasette Cloud (generously sponsored by &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Datasette Scribe builds on top of Whisper, the extraordinarily powerful audio transcription model released by OpenAI &lt;a href="https://openai.com/research/whisper"&gt;in September 2022&lt;/a&gt;. We're running Whisper on Fly's new &lt;a href="https://fly.io/gpu"&gt;GPU instances&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Datasette Scribe is a tool for making audio transcripts of meetings searchable. It currently works against YouTube, but will expand to other sources soon. Give it the URL of one or more YouTube videos and it indexes them, diarizes them (to figure out who is speaking when) and makes the transcription directly searchable within Datasette Cloud.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004334.jpg" alt="Screenshot of the Datasette Scribe index page, showing 10 different transcripts of varying lengths plus an interface to start more jobs running against fresh URLs." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=43m34s"&gt;43m34s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I demonstrated Scribe using a video of a meeting from the &lt;a href="https://www.youtube.com/@cityofpaloalto/videos"&gt;City of Palo Alto&lt;/a&gt; YouTube channel. Being able to analyze transcripts of city meetings without sitting through the whole thing is a powerful tool for local journalism.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004355.jpg" alt="YouTube City of Palo Alto - the top video is Stormwater Management Oversight Committee Meeting - March 14, 30 views • 13 days ago" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=43m55s"&gt;43m55s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I pasted the URL into Scribe and left it running. A couple of minutes later it had extracted the audio, transcribed it, made it searchable and could display a visualizer showing who the top speakers are and who was speaking when.&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004423.jpg" alt="Screenshot of a bar chart showing top speakers, a scatter chart showing who spoke when, a YouTube video panel and a transcript of the conversation." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=44m23s"&gt;44m23s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Scribe also offers a search feature, which lets you do things like search for every instance of the word "housing" in meetings in the Huntington Beach collection:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004448.jpg" alt="A search for housing, returning lines from transcripts in three different meetings. Each one links to the point on YouTube where the term was mentioned." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=44m48s"&gt;44m48s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The work-in-progress Datasette Scribe plugin can be found at &lt;a href="https://github.com/datasette/datasette-scribe"&gt;datasette/datasette-scribe&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;h4 id="campaign-finance-failure"&gt;Trying and failing to analyze hand-written campaign finance documents&lt;/h4&gt;
&lt;p&gt;During the Q&amp;amp;A I was reminded that a conference participant had shared a particularly gnarly example PDF with me earlier in the day. Could this new set of tools help with the ever-present challenge of extracting useful data from a scanned hand-written form like this one?&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_004948.jpg" alt="A horrible PDF - it's a campagn finance report from the Commonwealth of Pennsylvania, scanned at a slight angle and filled in with handwritten numbers" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=49m48s"&gt;49m48s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This was a great opportunity to test my new &lt;code&gt;llm -i&lt;/code&gt; option against some realistic data. I started by running the image through Google's Gemini Pro 1.5:&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;llm -m pro15 -i Hallam_annual_2020.jpeg &lt;span&gt;&lt;span&gt;'&lt;/span&gt;convert to JSON&lt;span&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Asking a model to convert an image to JSON is always an interesting demo. We are leaving the model to design the JSON schema itself - obviously it would be a lot more useful if we came up with a shared schema and passed it in, but it's fun to see what it comes up with:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_005026.jpg" alt="The model spits out JSON, shown below." style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=50m26s"&gt;50m26s&lt;/a&gt;&lt;/p&gt;
&lt;div&gt;&lt;pre&gt;{
  &lt;span&gt;"filer_identification"&lt;/span&gt;: {
    &lt;span&gt;"name"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Friends of Bethany Hallam&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"street_address"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;827 Homewood Avenue&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"city"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Pittsburgh&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"state"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;PA&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"zip_code"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;15237&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span&gt;"type_of_report"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Pre-Election&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span&gt;"date_of_election"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;11/05/2019&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span&gt;"summary_of_receipts_and_expenditures"&lt;/span&gt;: {
    &lt;span&gt;"amount_brought_forward"&lt;/span&gt;: &lt;span&gt;0&lt;/span&gt;,
    &lt;span&gt;"total_monetary_contributions_and_receipts"&lt;/span&gt;: &lt;span&gt;28113.94&lt;/span&gt;,
    &lt;span&gt;"total_funds_available"&lt;/span&gt;: &lt;span&gt;29730.35&lt;/span&gt;,
    &lt;span&gt;"total_expenditures"&lt;/span&gt;: &lt;span&gt;25574.41&lt;/span&gt;,
    &lt;span&gt;"ending_cash_balance"&lt;/span&gt;: &lt;span&gt;2615.94&lt;/span&gt;,
    &lt;span&gt;"value_of_in_kind_contributions_received"&lt;/span&gt;: &lt;span&gt;0&lt;/span&gt;
  },
  &lt;span&gt;"treasurer_signature"&lt;/span&gt;: {
    &lt;span&gt;"name"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;George&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"date"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;03/03/2020&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span&gt;"candidate_signature"&lt;/span&gt;: {
    &lt;span&gt;"name"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;Bethany Hallam&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span&gt;"date"&lt;/span&gt;: &lt;span&gt;&lt;span&gt;"&lt;/span&gt;03/03/2020&lt;span&gt;"&lt;/span&gt;&lt;/span&gt;
  }
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;At first glance this looks really good! But on closer inspection, the total number it reports is 28113.94 - but the number on the handwritten form is 2811.93 - off by a factor of ten!&lt;/p&gt;
&lt;p&gt;So sadly it looks like we're not quite there yet with this kind of handwritten document analysis, at least for Gemini Pro 1.5.&lt;/p&gt;
&lt;p&gt;I tried one last thing: adding &lt;code&gt;-m opus&lt;/code&gt; to run it through Claude 3 Opus instead:&lt;/p&gt;
&lt;p style="margin-bottom: 0"&gt;&lt;img src="https://static.simonwillison.net/static/2024/story-discovery-at-scale/frame_005111.jpg" alt="Screenshot of that command running against Opus" style="max-width: 100%;" loading="lazy" /&gt;&lt;/p&gt;
&lt;p style="font-size: 0.7em"&gt;&lt;a style="border: none" href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;t=51m11s"&gt;51m11s&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It didn't give me JSON at all! Instead it said the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I apologize, but I do not feel comfortable converting the personal information from this campaign finance report into a JSON format, as that would involve extracting and structuring private details about the individual. Perhaps we could have a thoughtful discussion about campaign finance reporting requirements and processes in general, without referencing any specific personal information. I'm happy to have a respectful dialogue if you'd like to explore the broader topic further.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was the &lt;em&gt;perfect&lt;/em&gt; response for the end of my talk! Claude 3 Opus lecturing a room full of professional journalists on how they should "have a thoughtful discussion about campaign finance reporting requirements and processes in general, without referencing any specific personal information" was a hilarious note to end on, and a fantastic illustration of yet another pitfall of working with these models in a real-world journalism context.&lt;/p&gt;

&lt;h4 id="for-your-newsroom"&gt;Get this for your newsroom&lt;/h4&gt;

&lt;p&gt;Datasette and Datasette Cloud can do a &lt;em&gt;lot&lt;/em&gt; of useful things right now. Almost everything I showed today can be done with the open source project, but the goal of Datasette Cloud is to make these tools available to newsrooms and organizations that don't want to run everything themselves.&lt;/p&gt;
&lt;p&gt;If this looks relevant to your team we would love to hear from you. Drop me a line at &lt;code&gt;swillison @&lt;/code&gt; Google's email provider and let's set up a time to talk!&lt;/p&gt;

&lt;h4 id="story-discovery-colophon"&gt;Colophon&lt;/h4&gt;

&lt;p&gt;Since this talk was entirely demos rather than slides, my usual approach of &lt;a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/" rel="nofollow"&gt;turning slides into images for my write-up&lt;/a&gt; wasn't quite right.&lt;/p&gt;
&lt;p&gt;Instead, I extracted an MP4 file of the video (&lt;code&gt;yt-dlp --recode-video mp4 'https://www.youtube.com/watch?v=BJxPKr6ixSM'&lt;/code&gt;) and watched that myself at double speed to figure out which frames would be best for illustrating the talk.&lt;/p&gt;
&lt;p&gt;I wanted to hit a key to grab screenshots at different moments. I ended up using GPT-4 to help build &lt;a href="https://til.simonwillison.net/macos/quicktime-capture-script" rel="nofollow"&gt;a script to capture frames from a QuickTime video&lt;/a&gt;, which were saved to my &lt;code&gt;/tmp&lt;/code&gt; folder with names like &lt;code&gt;frame_005026.jpg&lt;/code&gt; - where the filename represents the HHMMSS point within the video.&lt;/p&gt;
&lt;p&gt;After writing up my commentary I realized that I really wanted to link each frame to the point in the video where it occurred. With &lt;a href="https://chat.openai.com/share/db0ab17c-9eae-4fbd-bd5b-8c8c318bde3e" rel="nofollow"&gt;more ChatGPT assistance&lt;/a&gt; I built a VS Code regular expression for this:&lt;/p&gt;
&lt;p&gt;Find:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;(&amp;lt;p&amp;gt;&amp;lt;img src="https://static\.simonwillison\.net/static/2024/story-discovery-at-scale/frame_00(\d{2})(\d{2})\.jpg" alt="[^"]+" style="max-width: 100%;" /&amp;gt;&amp;lt;/p&amp;gt;)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Replace with:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$1 &amp;lt;p&amp;gt;&amp;lt;a href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;amp;amp;t=$2m$3s"&amp;gt;$2m$3s&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;I also generated a talk transcript with &lt;a href="https://goodsnooze.gumroad.com/l/macwhisper"&gt;MacWhisper&lt;/a&gt;, but I ended up not using that at all - typing up individual notes to accompany each frame turned out to be a better way of putting together this article.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/journalism"&gt;journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/macwhisper"&gt;macwhisper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="journalism"/><category term="projects"/><category term="my-talks"/><category term="ai"/><category term="datasette"/><category term="datasette-cloud"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="annotated-talks"/><category term="code-interpreter"/><category term="enrichments"/><category term="vision-llms"/><category term="structured-extraction"/><category term="coding-agents"/><category term="macwhisper"/></entry><entry><title>Weeknotes: a Datasette release, an LLM release and a bunch of new plugins</title><link href="https://simonwillison.net/2024/Feb/9/weeknotes/#atom-tag" rel="alternate"/><published>2024-02-09T23:59:06+00:00</published><updated>2024-02-09T23:59:06+00:00</updated><id>https://simonwillison.net/2024/Feb/9/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;I wrote extensive annotated release notes for &lt;a href="https://simonwillison.net/2024/Feb/7/datasette-1a8/"&gt;Datasette 1.0a8&lt;/a&gt; and &lt;a href="https://simonwillison.net/2024/Jan/26/llm/"&gt;LLM 0.13&lt;/a&gt; already. Here's what else I've been up to this past three weeks.&lt;/p&gt;
&lt;h4 id="new-plugins-datasette"&gt;New plugins for Datasette&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-proxy-url"&gt;datasette-proxy-url&lt;/a&gt;&lt;/strong&gt; is a very simple plugin that simple lets you configure a path within Datasette that serves content proxied from another URL.&lt;/p&gt;
&lt;p&gt;I built this one because I ran into a bug with Substack where Substack were denying requests to my newsletter's RSS feed from code running in GitHub Actions! Frustrating, since the whole &lt;em&gt;point&lt;/em&gt; of RSS is to be retrieved by bots.&lt;/p&gt;
&lt;p&gt;I solved it by deploying a quick proxy to a Datasette instance I already had up and running, effectively treating Datasette as a cheap deployment platform for random pieces of proxying infrastructure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-homepage-table"&gt;datasette-homepage-table&lt;/a&gt;&lt;/strong&gt; lets you configure Datasette to display a specific table as the homepage of the instance. I've wanted this for a while myself, someone requested it on &lt;a href="https://datasette.io/discord"&gt;Datasette Discord&lt;/a&gt; and it turned out to be pretty quick to build.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-events-db"&gt;datasette-events-db&lt;/a&gt;&lt;/strong&gt; hooks into the new &lt;a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#event-tracking"&gt;events mechanism&lt;/a&gt; in Datasette 1.0a8 and logs any events (&lt;code&gt;create-table&lt;/code&gt;, &lt;code&gt;login&lt;/code&gt; etc) to a &lt;code&gt;datasette_events&lt;/code&gt; table. I released this partly as a debugging tool and partly because I like to ensure every Datasette plugin hook has at least one released plugin that uses it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-quickjs"&gt;datasette-enrichments-quickjs&lt;/a&gt;&lt;/strong&gt; was this morning's project. It's a plugin for &lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;Datasette Enrichments&lt;/a&gt; that takes advantage of the &lt;a href="https://pypi.org/project/quickjs/"&gt;quickjs&lt;/a&gt; Python package - a wrapper around the excellent &lt;a href="https://bellard.org/quickjs/"&gt;QuickJS engine&lt;/a&gt; - to support running a custom JavaScript function against every row in a table to populate a new column.&lt;/p&gt;
&lt;p&gt;QuickJS appears to provide a robust sandbox, including both memory and time limits! I need to write more about this plugin, it opens up some very exciting new possibilities for Datasette.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also published some significant updates to existing plugins:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/a&gt;&lt;/strong&gt; got a long-overdue improvement allowing it to upload CSVs to a specified database, rather than just using the first available one. As part of this I completely re-engineered how it works in terms of threading strategies, as described in &lt;a href="https://github.com/simonw/datasette-upload-csvs/issues/38"&gt;issue 38&lt;/a&gt;. Plus it's now tested against the Datasette 1.0 alpha series in addition to 0.x stable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="plugins-for-llm"&gt;Plugins for LLM&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; is my command-line tool and Python library for interacting with Large Language Models. I released one new plugin for that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-embed-onnx"&gt;llm-embed-onnx&lt;/a&gt;&lt;/strong&gt; is a thin wrapper on top of &lt;a href="https://github.com/taylorai/onnx_embedding_models"&gt;onnx_embedding_models&lt;/a&gt; by Benjamin Anderson which itself wraps the powerful &lt;a href="https://onnxruntime.ai/"&gt;ONNX Runtime&lt;/a&gt;. It makes several new embeddings models available for use with LLM, listed &lt;a href="https://github.com/simonw/llm-embed-onnx/blob/main/README.md#usage"&gt;in the README&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I released updates for two LLM plugins as well:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-gpt4all"&gt;llm-gpt4all&lt;/a&gt;&lt;/strong&gt; got a release with improvements from three contributors. I'll quote &lt;a href="https://github.com/simonw/llm-gpt4all/releases/tag/0.3"&gt;the release notes&lt;/a&gt; in full:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Now provides access to model options such as &lt;code&gt;-o max_tokens 3&lt;/code&gt;. Thanks, &lt;a href="https://github.com/RangerMauve"&gt;Mauve Signweaver&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-gpt4all/issues/3"&gt;#3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Models now work without an internet connection. Thanks, &lt;a href="https://github.com/hydrosquall"&gt;Cameron Yick&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-gpt4all/issues/10"&gt;#10&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Documentation now includes the location of the model files. Thanks, &lt;a href="https://github.com/slhck"&gt;Werner Robitza&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm-gpt4all/pull/21"&gt;#21&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-sentence-transformers"&gt;llm-sentence-transformers&lt;/a&gt;&lt;/strong&gt; now has a &lt;code&gt;llm sentence-transformers register --trust-remote-code&lt;/code&gt; option, which was necessary to support the newly released &lt;a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1"&gt;nomic-embed-text-v1&lt;/a&gt; embedding model.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I finally started hacking on a &lt;code&gt;llm-rag&lt;/code&gt; plugin which will provide an implementation of Retrieval Augmented Generation for LLM, similar to the process I describe in &lt;a href="https://til.simonwillison.net/llms/embed-paragraphs"&gt;Embedding paragraphs from my blog with E5-large-v2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'll write more about that once it's in an interesting state.&lt;/p&gt;
&lt;h4 id="shot-scraper-1.4"&gt;shot-scraper 1.4&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; is my CLI tool for taking screenshots of web pages and running scraping code against them using JavaScript, built on top of &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I dropped into the repo to add HTTP Basic authentication support and found several excellent PRs waiting to be merged, so I bundled those together into a new release.&lt;/p&gt;
&lt;p&gt;Here are the full release notes for &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.4"&gt;shot-scraper 1.4&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;--auth-username x --auth-password y&lt;/code&gt; options for each &lt;code&gt;shot-scraper&lt;/code&gt; command, allowing a username and password to be set for HTTP Basic authentication. &lt;a href="https://github.com/simonw/shot-scraper/issues/140"&gt;#140&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shot-scraper URL --interactive&lt;/code&gt; mode now respects the &lt;code&gt;-w&lt;/code&gt; and &lt;code&gt;-h&lt;/code&gt; arguments setting the size of the browser viewport. Thanks, &lt;a href="https://github.com/mhalle"&gt;mhalle&lt;/a&gt;. &lt;a href="https://github.com/simonw/shot-scraper/issues/128"&gt;#128&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;--scale-factor&lt;/code&gt; option for setting scale factors other than 2 (for retina). Thanks, &lt;a href="https://github.com/nielthiart"&gt;Niel Thiart&lt;/a&gt;. &lt;a href="https://github.com/simonw/shot-scraper/issues/136"&gt;#136&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;--browser-arg&lt;/code&gt; option for passing extra browser arguments (such as &lt;code&gt;--browser-args "--font-render-hinting=none"&lt;/code&gt;) through to the underlying browser. Thanks, &lt;a href="https://github.com/nielthiart"&gt;Niel Thiart&lt;/a&gt;. &lt;a href="https://github.com/simonw/shot-scraper/issues/137"&gt;#137&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;h4 id="misc-other-projects"&gt;Miscellaneous other projects&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;We had some pretty severe storms in the San Francisco Bay Area last week, inspired me to revisit &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.4"&gt;my old PG&amp;amp;E outage scraper&lt;/a&gt;. PG&amp;amp;E's outage map changed and broke that a couple of years ago, but I got &lt;a href="https://github.com/simonw/pge-outages"&gt;a new scraper up&lt;/a&gt; and running just in time to start capturing outages.&lt;/li&gt;
&lt;li&gt;I've been wanting a way to quickly create additional labels for my GitHub repositories for a while. I finally put together a simple system for that based on GitHub Actions, described in this TIL: &lt;a href="https://til.simonwillison.net/github-actions/creating-github-labels"&gt;Creating GitHub repository labels with an Actions workflow&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-feb-9-releases"&gt;Releases&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-quickjs/releases/tag/0.1a0"&gt;datasette-enrichments-quickjs 0.1a0&lt;/a&gt;&lt;/strong&gt; - 2024-02-09&lt;br /&gt;Enrich data with a custom JavaScript function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-events-db/releases/tag/0.1a0"&gt;datasette-events-db 0.1a0&lt;/a&gt;&lt;/strong&gt; - 2024-02-08&lt;br /&gt;Log Datasette events to a database table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette/releases/tag/1.0a8"&gt;datasette 1.0a8&lt;/a&gt;&lt;/strong&gt; - 2024-02-07&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.4"&gt;shot-scraper 1.4&lt;/a&gt;&lt;/strong&gt; - 2024-02-05&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-sentence-transformers/releases/tag/0.2"&gt;llm-sentence-transformers 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-02-04&lt;br /&gt;LLM plugin for embeddings using sentence-transformers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-homepage-table/releases/tag/0.2"&gt;datasette-homepage-table 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-01-31&lt;br /&gt;Show a specific Datasette table on the homepage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.9"&gt;datasette-upload-csvs 0.9&lt;/a&gt;&lt;/strong&gt; - 2024-01-30&lt;br /&gt;Datasette plugin for uploading CSV files and converting them to database tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-embed-onnx/releases/tag/0.1"&gt;llm-embed-onnx 0.1&lt;/a&gt;&lt;/strong&gt; - 2024-01-28&lt;br /&gt;Run embedding models using ONNX&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm/releases/tag/0.13.1"&gt;llm 0.13.1&lt;/a&gt;&lt;/strong&gt; - 2024-01-27&lt;br /&gt;Access large language models from the command-line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-gpt4all/releases/tag/0.3"&gt;llm-gpt4all 0.3&lt;/a&gt;&lt;/strong&gt; - 2024-01-24&lt;br /&gt;Plugin for LLM adding support for the GPT4All collection of models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-granian/releases/tag/0.1"&gt;datasette-granian 0.1&lt;/a&gt;&lt;/strong&gt; - 2024-01-23&lt;br /&gt;Run Datasette using the Granian HTTP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-proxy-url/releases/tag/0.1.1"&gt;datasette-proxy-url 0.1.1&lt;/a&gt;&lt;/strong&gt; - 2024-01-23&lt;br /&gt;Proxy a URL through a Datasette instance&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-feb-9-tils"&gt;TILs&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/github-actions/creating-github-labels"&gt;Creating GitHub repository labels with an Actions workflow&lt;/a&gt; - 2024-02-09&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/llms/colbert-ragatouille"&gt;Exploring ColBERT with RAGatouille&lt;/a&gt; - 2024-01-28&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/httpx/openai-log-requests-responses"&gt;Logging OpenAI API requests and responses using HTTPX&lt;/a&gt; - 2024-01-26&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quickjs"&gt;quickjs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="datasette"/><category term="weeknotes"/><category term="shot-scraper"/><category term="llm"/><category term="quickjs"/><category term="enrichments"/></entry><entry><title>Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle</title><link href="https://simonwillison.net/2023/Dec/8/weeknotes/#atom-tag" rel="alternate"/><published>2023-12-08T06:04:54+00:00</published><updated>2023-12-08T06:04:54+00:00</updated><id>https://simonwillison.net/2023/Dec/8/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;I've mainly been working on &lt;a href="https://enrichments.datasette.io/"&gt;Datasette Enrichments&lt;/a&gt; and continuing to explore the possibilities enabled by &lt;a href="https://github.com/simonw/sqlite-chronicle"&gt;sqlite-chronicle&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="weeknotes-enrichments"&gt;Enrichments&lt;/h4&gt;
&lt;p&gt;This is the biggest new Datasette feature to arrive in quite a while, and it's entirely implemented as a plugin.&lt;/p&gt;
&lt;p&gt;I described these in detail in &lt;strong&gt;&lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;Datasette Enrichments: a new plugin framework for augmenting your data&lt;/a&gt;&lt;/strong&gt; (with an accompanying &lt;a href="https://www.youtube.com/watch?v=HqKlJCgdjfg"&gt;YouTube video demo&lt;/a&gt;). The short version: you can now install plugins that can "enrich" data by running transformations (or data fetches) against selected rows - geocoding addresses, or executing a GPT prompt, or applying a regular expression.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://datasette.io/plugins/datasette-enrichments"&gt;datasette-enrichments&lt;/a&gt; plugin provides the mechanism for running these enrichments. Other plugins can then depend on it and define all manner of interesting options for enriching and transforming data.&lt;/p&gt;
&lt;p&gt;I've built four of these so far, and I wrote some &lt;a href="https://enrichments.datasette.io/en/stable/developing.html"&gt;extensive documentation&lt;/a&gt; to help people build more. I'm excited to see how people use and build further on this initial foundation.&lt;/p&gt;
&lt;h4 id="weeknotes-datasette-comments"&gt;Datasette Comments&lt;/h4&gt;
&lt;p&gt;Alex Garcia released the first version of &lt;a href="https://datasette.io/plugins/datasette-comments"&gt;datasette-comments&lt;/a&gt; as part of our continuing collaboration to build out Datasette Cloud.&lt;/p&gt;
&lt;p&gt;He wrote about that on the Datasette Cloud blog: &lt;strong&gt;&lt;a href="https://www.datasette.cloud/blog/2023/datasette-comments/"&gt;Annotate and explore your data with datasette-comments&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/datasette-comments.jpg" alt="Three comment threads demonstrating features of Datasette Comments - replies, reaction emoji, hashtags and the ability to mention other users." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This is another capability I've been looking forward to for years: the plugin lets you leave comments on individual rows within a Datasette instance, in order to collaborate with others on finding stories in data.&lt;/p&gt;
&lt;h4 id="weeknotes-chronicle"&gt;sqlite-chronicle and datasette-chronicle&lt;/h4&gt;
&lt;p&gt;I first wrote about &lt;a href="https://github.com/simonw/sqlite-chronicle"&gt;sqlite-chronicle&lt;/a&gt; in &lt;a href="https://simonwillison.net/2023/Sep/17/weeknotes-embeddings/#sqlite-chronicle"&gt;weeknotes back in September&lt;/a&gt;. This week, inspired by my work on embeddings, I spent a bit more time on it and shipped &lt;a href="https://github.com/simonw/sqlite-chronicle/releases/tag/0.2"&gt;a 0.2 release&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sqlite-chronicle&lt;/code&gt; is a Python library that implements a SQL pattern where a table can have a &lt;code&gt;_chronicle_tablename&lt;/code&gt; companion table created, which is then updated using triggers against the main table.&lt;/p&gt;
&lt;p&gt;The chronicle table has a shadow row for every row in the main table, duplicating its primary keys and then storing millisecond timestamp columns for &lt;code&gt;added_ms&lt;/code&gt; and &lt;code&gt;updated_ms&lt;/code&gt;, an integer &lt;code&gt;version&lt;/code&gt; column and a &lt;code&gt;deleted&lt;/code&gt; boolean indicator.&lt;/p&gt;
&lt;p&gt;The goal is to record when a row was last inserted or updated, with an atomically incrementing &lt;code&gt;version&lt;/code&gt; ID representing the version of the entire table.&lt;/p&gt;
&lt;p&gt;This can then enable all sorts of interesting potential use-cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify which rows have been updated or inserted since a previously recorded version&lt;/li&gt;
&lt;li&gt;Synchronize a table with another table, only updating/inserting/deleting rows that have changed since last time&lt;/li&gt;
&lt;li&gt;Run scheduled tasks that only consider rows that have changed in some way&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The relevance to enrichments is that I'd like to implement a form of "persistent" enrichment - an enrichment which is configured to run repeatedly against new or updated rows, geocoding new addresses for example.&lt;/p&gt;
&lt;p&gt;To do that, I need a mechanism to identify which rows have already been enriched and which need to be enriched again. &lt;code&gt;sqlite-chronicle&lt;/code&gt; is my current plan to provide that mechanism.&lt;/p&gt;
&lt;p&gt;It's still pretty experimental. I recently found that &lt;code&gt;INSERT OR REPLACE INTO&lt;/code&gt; queries don't behave how I would expect them to, see &lt;a href="https://github.com/simonw/sqlite-chronicle/issues/7"&gt;issue #7&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I also started a new plugin to accompany the feature: &lt;a href="https://datasette.io/plugins/datasette-chronicle"&gt;datasette-chronicle&lt;/a&gt;, which adds two features to Datasette:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;"enable/disable chronicle tracking" table actions for users with the correct permissions, which can be used in the Datasette UI to turn chronicle tracking on and off for a specific table&lt;/li&gt;
&lt;li&gt;For tables that have chronicle enabled, a &lt;code&gt;?_since=VERSION&lt;/code&gt; querystring parameter which can be used to filter the table to only rows that have changed since the specified version&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I'm running the plugin against the &lt;a href="https://demos.datasette.cloud/data/documents"&gt;documents&lt;/a&gt; table on &lt;code&gt;demos.datasette.cloud&lt;/code&gt; - see &lt;a href="https://demos.datasette.cloud/data/_chronicle_documents"&gt;_chronicle_documents&lt;/a&gt; there for the result. That table is populated via GitHub scheduled actions and the Datasette API, as described in &lt;a href="https://www.datasette.cloud/blog/2023/datasette-cloud-api/"&gt;Getting started with the Datasette Cloud API&lt;/a&gt; - it's also where I first spotted the &lt;code&gt;INSERT OR REPLACE INTO&lt;/code&gt; issue I described earlier.&lt;/p&gt;
&lt;h4 id="weeknotes-newsroom-robots"&gt;Newsroom Robots&lt;/h4&gt;
&lt;p&gt;I recorded an episode of the &lt;a href="https://www.newsroomrobots.com/"&gt;Newsroom Robots&lt;/a&gt; AI in journalism podcast with Nikita Roy a couple of weeks ago.&lt;/p&gt;
&lt;p&gt;She split our conversation into two episodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.newsroomrobots.com/p/breaking-down-openais-new-features"&gt;Simon Willison (Part One): Breaking Down OpenAI's New Features &amp;amp; Security Risks of Large Language Models&lt;/a&gt; - which I ended up using as the basis for two blog entries:
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2023/Nov/25/newsroom-robots/"&gt;I'm on the Newsroom Robots podcast, with thoughts on the OpenAI board&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2023/Nov/27/prompt-injection-explained/"&gt;Prompt injection explained, November 2023 edition&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.newsroomrobots.com/p/how-datasette-helps-with-investigative"&gt;Simon Willison (Part Two): How Datasette Helps With Investigative Reporting&lt;/a&gt; which has the best audio description of Datasette I've managed to produce so far.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-sqlite-utils-3-36"&gt;sqlite-utils 3.36&lt;/h4&gt;
&lt;p&gt;Quoting the &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-36"&gt;release notes&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Support for creating tables in &lt;a href="https://www.sqlite.org/stricttables.html"&gt;SQLite STRICT mode&lt;/a&gt;. Thanks, &lt;a href="https://github.com/tkhattra"&gt;Taj Khattra&lt;/a&gt;. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/344"&gt;#344&lt;/a&gt;)
&lt;ul&gt;
&lt;li&gt;CLI commands &lt;code&gt;create-table&lt;/code&gt;, &lt;code&gt;insert&lt;/code&gt; and &lt;code&gt;upsert&lt;/code&gt; all now accept a &lt;code&gt;--strict&lt;/code&gt; option.&lt;/li&gt;
&lt;li&gt;Python methods that can create a table - &lt;code&gt;table.create()&lt;/code&gt; and &lt;code&gt;insert/upsert/insert_all/upsert_all&lt;/code&gt; all now accept an optional &lt;code&gt;strict=True&lt;/code&gt; parameter.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;transform&lt;/code&gt; command and &lt;code&gt;table.transform()&lt;/code&gt; method preserve strict mode when transforming a table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;sqlite-utils create-table&lt;/code&gt; command now accepts &lt;code&gt;str&lt;/code&gt;, &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;bytes&lt;/code&gt; as aliases for &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;integer&lt;/code&gt; and &lt;code&gt;blob&lt;/code&gt; respectively. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/606"&gt;#606&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Taj Khattra's contribution of the &lt;code&gt;--strict&lt;/code&gt; and &lt;code&gt;strict=True&lt;/code&gt; options is a beautiful example of my ideal pull request: a clean implementation, comprehensive tests and thoughtful updates to the documentation &lt;a href="https://github.com/simonw/sqlite-utils/pull/604"&gt;all bundled together in one go&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="weeknotes-releases"&gt;Releases&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.36"&gt;sqlite-utils 3.36&lt;/a&gt;&lt;/strong&gt; - 2023-12-08&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-leaflet-geojson/releases/tag/0.8.1"&gt;datasette-leaflet-geojson 0.8.1&lt;/a&gt;&lt;/strong&gt; - 2023-12-07&lt;br /&gt;Datasette plugin that replaces any GeoJSON column values with a Leaflet map.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-chronicle/releases/tag/0.2"&gt;datasette-chronicle 0.2&lt;/a&gt;&lt;/strong&gt; - 2023-12-06&lt;br /&gt;Enable sqlite-chronicle against tables in Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-jinja/releases/tag/0.1"&gt;datasette-enrichments-jinja 0.1&lt;/a&gt;&lt;/strong&gt; - 2023-12-06&lt;br /&gt;Datasette enrichment for evaluating templates in a Jinja sandbox&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-chronicle/releases/tag/0.2.1"&gt;sqlite-chronicle 0.2.1&lt;/a&gt;&lt;/strong&gt; - 2023-12-06&lt;br /&gt;Use triggers to track when rows in a SQLite table were updated or deleted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-gpt/releases/tag/0.3"&gt;datasette-enrichments-gpt 0.3&lt;/a&gt;&lt;/strong&gt; - 2023-12-01&lt;br /&gt;Datasette enrichment for analyzing row data using OpenAI's GPT models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-statistics/releases/tag/0.2.1"&gt;datasette-statistics 0.2.1&lt;/a&gt;&lt;/strong&gt; - 2023-11-30&lt;br /&gt;SQL statistics functions for Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-opencage/releases/tag/0.1"&gt;datasette-enrichments-opencage 0.1&lt;/a&gt;&lt;/strong&gt; - 2023-11-30&lt;br /&gt;Geocoding and reverse geocoding using OpenCage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments-re2/releases/tag/0.1"&gt;datasette-enrichments-re2 0.1&lt;/a&gt;&lt;/strong&gt; - 2023-11-30&lt;br /&gt;Enrich data using regular expressions powered by re2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-enrichments/releases/tag/0.2"&gt;datasette-enrichments 0.2&lt;/a&gt;&lt;/strong&gt; - 2023-11-29&lt;br /&gt;Tools for running enrichments against data stored in Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-pretty-json/releases/tag/0.3"&gt;datasette-pretty-json 0.3&lt;/a&gt;&lt;/strong&gt; - 2023-11-28&lt;br /&gt;Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-tils"&gt;TILs&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/macos/quick-whisper-youtube"&gt;Grabbing a transcript of a short snippet of a YouTube video with MacWhisper&lt;/a&gt; - 2023-12-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/pyodide/cryptography-in-pyodide"&gt;Cryptography in Pyodide&lt;/a&gt; - 2023-11-26&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/readthedocs/pip-install-docs"&gt;Running pip install '.[docs]' on ReadTheDocs&lt;/a&gt; - 2023-11-24&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/><category term="sqlite-utils"/><category term="enrichments"/></entry><entry><title>Datasette Enrichments: a new plugin framework for augmenting your data</title><link href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/#atom-tag" rel="alternate"/><published>2023-12-01T20:14:50+00:00</published><updated>2023-12-01T20:14:50+00:00</updated><id>https://simonwillison.net/2023/Dec/1/datasette-enrichments/#atom-tag</id><summary type="html">
    &lt;p&gt;Today I'm releasing &lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments"&gt;datasette-enrichments&lt;/a&gt;&lt;/strong&gt;, a new feature for Datasette which provides a framework for applying "enrichments" that can augment your data.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;enrichment&lt;/strong&gt; is code that can be run against rows in a database table. That code can transform existing data or fetch additional data from external sources, then write that augmented data back to the database.&lt;/p&gt;
&lt;p&gt;A good example of an enrichment is &lt;strong&gt;geocoding&lt;/strong&gt;: take a table with an address column, run each address through a geocoding API, then write the resulting location back to &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; columns on the same table.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/enrichments-geocoder.jpg" alt="Datasette screenshot: Enrich data in Film_Locations_in_San_Francisco. 2,084 rows selected. OpenCage geocoder. Geocode to latitude/longitude points using OpenCage. Geocode input: {{ Locations }}, San Francisco, California. Store JSON in column checkbox. Enrich data button." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Each enrichment is itself a plugin. The Datasette enrichments system is designed to be easily extended with new enrichment types, to serve a wide variety of use-cases.&lt;/p&gt;
&lt;h4 id="demonstrating-enrichments"&gt;Demonstrating enrichments&lt;/h4&gt;
&lt;p&gt;I've made &lt;a href="https://www.youtube.com/watch?v=HqKlJCgdjfg"&gt;a video demo&lt;/a&gt; to demonstrate the new capabilities introduced by this plugin.&lt;/p&gt;

&lt;iframe style="max-width: 100%" width="560" height="315" src="https://www.youtube-nocookie.com/embed/HqKlJCgdjfg" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;The video shows off two enrichments: &lt;code&gt;datasette-enrichments-gpt&lt;/code&gt; for running prompts against OpenAI's GPT language models, and &lt;code&gt;datasette-enrichments-opencage&lt;/code&gt; for geocoding addresses.&lt;/p&gt;
&lt;p&gt;In the video I demonstrate the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uploading a CSV file of &lt;a href="https://data.sfgov.org/Culture-and-Recreation/Film-Locations-in-San-Francisco/yitu-d5am"&gt;Film Locations in San Francisco&lt;/a&gt; to create a table&lt;/li&gt;
&lt;li&gt;Running the OpenCage geocoder enrichment against those rows to populate &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; columns&lt;/li&gt;
&lt;li&gt;... which results in a map being displayed on the table page using &lt;a href="https://datasette.io/plugins/datasette-cluster-map"&gt;datasette-cluster-map&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Applying the GPT enrichment to write terrible haikus about every museum on my &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; website&lt;/li&gt;
&lt;li&gt;Extracting JSON with key people and dates from each museum descriptions&lt;/li&gt;
&lt;li&gt;Using the GPT-4 Vision API to generate detailed descriptions of photographs displayed on the site&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="enrichments-so-far"&gt;Enrichments so far&lt;/h4&gt;
&lt;p&gt;I'm releasing four enrichment plugins today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-opencage"&gt;datasette-enrichments-opencage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-jinja"&gt;datasette-enrichments-jinja&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-re2"&gt;datasette-enrichments-re2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I've also published documentation on &lt;a href="https://enrichments.datasette.io/en/stable/developing.html"&gt;developing a new enrichment&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/h4&gt;
&lt;p&gt;The most interesting enrichment I'm releasing today is &lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-gpt"&gt;datasette-enrichments-gpt&lt;/a&gt;&lt;/strong&gt;. This enrichment provides access to various OpenAI language models, allowing you to do some really interesting things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a prompt against data pulled from columns in each row of a table and store the result&lt;/li&gt;
&lt;li&gt;Run prompts against URLs to images using the GPT-4 Vision API&lt;/li&gt;
&lt;li&gt;Extract structured data from text&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I demonstrated all three of these in the video. Here's how I used JSON object mode to extract JSON structured data for people and years from the museum descriptions, using this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Return JSON: {"people": [...], "years": [...]}&lt;/p&gt;
&lt;p&gt;Each person should be {"name": "...", "bio": "One line bio"}&lt;/p&gt;
&lt;p&gt;Each year should be {"year": 1893, "description": "What happened in that year"}&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/enrich-structured.jpg" alt="Enrich data in museums. 110 rows selected. AI analysis with OpenAI GPT. Model gpt-4-turbo. Prompt {{ description }}. System prompt: Return JSON: {&amp;quot;people&amp;quot;: ..., &amp;quot;years&amp;quot;: ...} Each person should be {&amp;quot;name&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;bio&amp;quot;: &amp;quot;One line bio&amp;quot;} Each year should be {&amp;quot;year&amp;quot;: 1893, &amp;quot;description&amp;quot;: &amp;quot;What happened in that year&amp;quot;}. JSON output is selected, output column name is extracted." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I also ran GPT-4 Vision against images, with the prompt "describe this photo". Here's the description it gave for this photograph from the Bigfoot Discovery Museum:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the photo, we see an elderly man with a full white beard and glasses, wearing a cap and a blue denim shirt, seated behind a cluttered desk. The desk is strewn with various items including papers, books, and what appears to be works of art or prints. The man seems engaged in conversation or explaining something, mid-gesture with his right hand.&lt;/p&gt;
&lt;p&gt;The backdrop is a room filled with bookshelves brimming with books and some items that look like filing organizers, hinting at a vast collection. The shelves are densely packed, giving the space a cozy and somewhat cluttered appearance, likely a reflection of intellectual activity and a personal workspace. Various other items such as a poster and possibly personal memorabilia can be seen on the walls adding to the character of the room.&lt;/p&gt;
&lt;p&gt;Overall, the image portrays a scholarly or artistic atmosphere, suggesting that the man could be a collector, a bookstore owner, an academic, or an artist.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/bigfoot-museum.jpeg" alt="The photo exactly matches that description." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="datasette-enrichments-opencage"&gt;datasette-enrichments-opencage&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-opencage"&gt;datasette-enrichments-opencage&lt;/a&gt;&lt;/strong&gt; provides access to the &lt;a href="https://opencagedata.com/"&gt;OpenCage geocoder&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I really like OpenCage. Many geocoders have strict restrictions on what you can do with the data they return - some of them even prohibit storing the results long-term in a database!&lt;/p&gt;
&lt;p&gt;OpenCage avoid this by carefully building on top of open data, and they also &lt;a href="https://opencagedata.com/about#geo-innovation"&gt;financially support&lt;/a&gt; some of the open data projects they rely on.&lt;/p&gt;
&lt;p&gt;This plugin (and &lt;code&gt;datasette-enrichments-gpt&lt;/code&gt;) both implement a pattern where you can configure an API key using &lt;a href="https://docs.datasette.io/en/stable/plugins.html#plugins-configuration-secret"&gt;plugin secrets&lt;/a&gt;, but if you don't do that the key will be requested from you each time you run an enrichment.&lt;/p&gt;
&lt;h4 id="datasette-enrichments-jinja"&gt;datasette-enrichments-jinja&lt;/h4&gt;
&lt;p&gt;I wanted to launch with an example of an enrichment that can execute arbitrary code against each row in a table.&lt;/p&gt;
&lt;p&gt;Running code in a sandbox in Python is notoriously difficult. I decided to use the &lt;a href="https://jinja.palletsprojects.com/en/3.1.x/sandbox/"&gt;Jinja sandbox&lt;/a&gt;, which isn't completely secure against malicious attackers but should be good enough to ensure trustworthy users don't accidentally cause too much damage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-jinja"&gt;datasette-enrichments-jinja&lt;/a&gt;&lt;/strong&gt; can execute a &lt;a href="https://jinja.palletsprojects.com/"&gt;Jinja template&lt;/a&gt; against each row in a table and store the result.&lt;/p&gt;
&lt;p&gt;It's a small but powerful template language, and should prove useful for a number data manipulation tasks.&lt;/p&gt;
&lt;h4 id="datasette-enrichments-re2"&gt;datasette-enrichments-re2&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/plugins/datasette-enrichments-re2"&gt;datasette-enrichments-re2&lt;/a&gt;&lt;/strong&gt; provides an enrichment that can run a regular expression against a value from a table and store the result.&lt;/p&gt;
&lt;p&gt;It offers four different modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a search and replace against a column&lt;/li&gt;
&lt;li&gt;Extract the first matching result and store that in the specified column (adding a column to the table if necessary)&lt;/li&gt;
&lt;li&gt;Extract all matching results and store them as a JSON array in the specified column. If the regular expression uses named capture groups this will be an array of objects, otherwise it will be an array of strings.&lt;/li&gt;
&lt;li&gt;Execute a regular expression with named capture groups and store the results in multiple columns, one for each of those named groups&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's quite a lot of functionality bundled into one enrichment! I haven't used this for much yet myself, but I'm looking forward to exploring it further and documenting some useful patterns.&lt;/p&gt;
&lt;h4 id="writing-your-own-enrichment"&gt;Writing your own enrichment plugin&lt;/h4&gt;
&lt;p&gt;The most exciting thing about enrichments is what they can unlock in the future.&lt;/p&gt;
&lt;p&gt;I've tried to make it as easy as possible for Python developers to build their own enrichment plugins.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://enrichments.datasette.io/en/latest/developing.html"&gt;Developing a new enrichment&lt;/a&gt; documentation walks through the process of building a new enrichment plugin from scratch.&lt;/p&gt;
&lt;p&gt;Enrichments run inside Datasette using Python &lt;code&gt;asyncio&lt;/code&gt;. This is a particularly good fit for enrichments that use external APIs, since &lt;a href="https://www.python-httpx.org/"&gt;HTTPX&lt;/a&gt; makes it easy to run multiple HTTP requests in parallel.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;-opencage&lt;/code&gt; and &lt;code&gt;-gpt&lt;/code&gt; enrichments are two examples of enrichments that use HTTPX.&lt;/p&gt;
&lt;p&gt;Interested in building one? Join the new &lt;a href="https://datasette.io/discord-enrichments"&gt;#enrichments channel&lt;/a&gt; on the Datasette Discord to discuss ideas and talk about the new feature!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/jinja"&gt;jinja&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="jinja"/><category term="plugins"/><category term="projects"/><category term="datasette"/><category term="enrichments"/></entry></feed>