<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: derek-willis</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/derek-willis.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-06-19T18:26:38+00:00</updated><author><name>Simon Willison</name></author><entry><title>How OpenElections Uses LLMs</title><link href="https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-tag" rel="alternate"/><published>2025-06-19T18:26:38+00:00</published><updated>2025-06-19T18:26:38+00:00</updated><id>https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://thescoop.org/archives/2025/06/09/how-openelections-uses-llms/index.html"&gt;How OpenElections Uses LLMs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The &lt;a href="https://github.com/openelections"&gt;OpenElections project&lt;/a&gt; collects detailed election data for the USA, all the way down to the precinct level. This is a &lt;em&gt;surprisingly&lt;/em&gt; hard problem: while county and state-level results are widely available, precinct-level results are published in thousands of different ad-hoc ways and rarely aggregated once the election result has been announced.&lt;/p&gt;
&lt;p&gt;A lot of those precinct results are published as image-filled PDFs.&lt;/p&gt;
&lt;p&gt;Derek Willis has recently started leaning on Gemini to help parse those PDFs into CSV data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For parsing image PDFs into CSV files, Google’s Gemini is my model of choice, for two main reasons. First, the results are usually very, very accurate (with a few caveats I’ll detail below), and second, Gemini’s large context window means it’s possible to work with PDF files that can be multiple MBs in size.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Is &lt;a href="https://thescoop.org/archives/2025/06/09/how-openelections-uses-llms/index.html"&gt;this piece&lt;/a&gt; he shares the process and prompts for a real-world expert level data entry project, assisted by Gemini.&lt;/p&gt;
&lt;p&gt;This example from Limestone County, Texas is a great illustration of how tricky this problem can get. Getting traditional OCR software to correctly interpret multi-column layouts like this always requires some level of manual intervention:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The results are typewritten and slightly wonky and come in several different columns" src="https://static.simonwillison.net/static/2025/election-results.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Derek's &lt;a href="https://aistudio.google.com/app/prompts/1vZq4hi_eCqR58TkuzqPugDcOc2kE1tms"&gt;prompt against Gemini 2.5 Pro&lt;/a&gt; throws in an example, some special instructions and a note about the two column format:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Produce a CSV file from the attached PDF based on this example:&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;county,precinct,office,district,party,candidate,votes,absentee,early_voting,election_day&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,Registered Voters,,,,1858,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,Ballots Cast,,,,1160,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,President,,REP,Donald J. Trump,879,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,President,,DEM,Kamala D. Harris,271,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,President,,LIB,Chase Oliver,1,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,President,,GRN,Jill Stein,4,,,&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Limestone,Precinct 101,President,,,Write-ins,1,,,&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Skip Write-ins with candidate names and rows with "Cast Votes", "Not Assigned", "Rejected write-in votes", "Unresolved write-in votes" or "Contest Totals". Do not extract any values that end in "%"&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Use the following offices:&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;President/Vice President -&amp;gt; President&lt;/code&gt;&lt;br&gt;
&lt;code&gt;United States Senator -&amp;gt; U.S. Senate&lt;/code&gt;&lt;br&gt;
&lt;code&gt;US Representative -&amp;gt; U.S. House&lt;/code&gt;&lt;br&gt;
&lt;code&gt;State Senator -&amp;gt; State Senate&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Quote all office and candidate values. The results are split into two columns on each page; parse the left column first and then the right column.&lt;/code&gt;&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A spot-check and a few manual tweaks and &lt;a href="https://github.com/openelections/openelections-data-tx/blob/master/2024/counties/20241105__tx__general__limestone__precinct.csv"&gt;the result&lt;/a&gt; against a 42 page PDF was exactly what was needed.&lt;/p&gt;
&lt;p&gt;How about something harder? The results for Cameron County came as more than 600 pages and looked like this - note the hole-punch holes that obscure some of the text!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Precinct results report, Cameron County Texas, November 5th 2024. A hole punch hole obscures Precinct 16 and another further down the page deletes the first three letters in both Undervotes and Overvotes" src="https://static.simonwillison.net/static/2025/cameron.png" /&gt;&lt;/p&gt;
&lt;p&gt;This file had to be split into chunks of 100 pages each, and the entire process still took a full hour of work - but the resulting table matched up with the official vote totals.&lt;/p&gt;
&lt;p&gt;I love how realistic this example is. AI data entry like this isn't a silver bullet - there's still a bunch of work needed to verify the results and creative thinking needed to work through limitations - but it represents a very real improvement in how small teams can take on projects of this scale.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the six weeks since we started working on Texas precinct results, we’ve been able to convert them for more than half of the state’s 254 counties, including many image PDFs like the ones on display here. That pace simply wouldn’t be possible with data entry or traditional OCR software.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44320001"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="derek-willis"/><category term="ocr"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="gemini"/><category term="vision-llms"/><category term="structured-extraction"/></entry><entry><title>Political Email Extraction Leaderboard</title><link href="https://simonwillison.net/2025/Apr/8/political-email-extraction-leaderboard/#atom-tag" rel="alternate"/><published>2025-04-08T23:22:41+00:00</published><updated>2025-04-08T23:22:41+00:00</updated><id>https://simonwillison.net/2025/Apr/8/political-email-extraction-leaderboard/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://thescoop.org/LLM-Extraction-Challenge/"&gt;Political Email Extraction Leaderboard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Derek Willis collects "political fundraising emails from just about every committee" - 3,000-12,000 a month - and has created an LLM benchmark from 1,000 of them that he collected last November.&lt;/p&gt;
&lt;p&gt;He explains the leaderboard &lt;a href="https://thescoop.org/archives/2025/01/27/llm-extraction-challenge-fundraising-emails/index.html"&gt;in this blog post&lt;/a&gt;. The goal is to have an LLM correctly identify the the committee name from the disclaimer text included in the email.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/dwillis/LLM-Extraction-Challenge/blob/main/fundraising-emails/email_ollama.py"&gt;the code&lt;/a&gt; he uses to run prompts using Ollama. It uses this system prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Produce a JSON object with the following keys: 'committee', which is the name of the committee in the disclaimer that begins with Paid for by but does not include 'Paid for by', the committee address or the treasurer name. If no committee is present, the value of 'committee' should be None. Also add a key called 'sender', which is the name of the person, if any, mentioned as the author of the email. If there is no person named, the value is None. Do not include any other text, no yapping.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Gemini 2.5 Pro tops the leaderboard at the moment with 95.40%, but the new Mistral Small 3.1 manages 5th place with 85.70%, pretty good for a local model!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Table comparing AI model performance with columns for Model (JSON Filename), Total Records, Committee Matches, and Match Percentage. Shows 7 models with 1000 records each: gemini_25_november_2024_prompt2.json (95.40%), qwen25_november_2024_prompt2.json (92.90%), gemini20_flash_november_2024_prompt2.json (92.40%), claude37_sonnet_november_2024_prompt2.json (90.70%), mistral_small_31_november_2024_prompt2.json (85.70%), gemma2_27b_november_2024_prompt2.json (84.40%), and gemma2_november_2024_prompt2.json (83.90%)." src="https://static.simonwillison.net/static/2025/derek-leaderboard.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I said &lt;a href="https://simonwillison.net/2025/Mar/8/nicar-llms/#llms.020.jpeg"&gt;we need our own evals&lt;/a&gt; in my talk at the NICAR Data Journalism conference last month, without realizing Derek has been running one since January.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://bsky.app/profile/dwillis.bsky.social/post/3lmdjmfyeac25"&gt;@dwillis.bsky.social&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="derek-willis"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="mistral"/><category term="gemini"/><category term="evals"/><category term="ollama"/><category term="system-prompts"/></entry><entry><title>Six short video demos of LLM and Datasette projects</title><link href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-tag" rel="alternate"/><published>2025-01-22T02:09:54+00:00</published><updated>2025-01-22T02:09:54+00:00</updated><id>https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-tag</id><summary type="html">
    &lt;p&gt;Last Friday Alex Garcia and I hosted a new kind of Datasette Public Office Hours session, inviting members of the Datasette community to share short demos of projects that they had built. The session lasted just over an hour and featured demos from six different people.&lt;/p&gt;
&lt;p&gt;We broadcast live on YouTube, but I've now edited the session into separate videos. These are listed below, along with project summaries and show notes for each presentation.&lt;/p&gt;
&lt;p&gt;You can also watch all six videos in &lt;a href="https://www.youtube.com/playlist?list=PLSocEbMlNGotyeonEbgFP1_uf9gk1z7zm"&gt;this YouTube playlist&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="9pEP6auZmvg"
  title="llm-logs-feedback by Matthias Lübken"
  playlabel="Play: llm-logs-feedback by Matthias Lübken"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/luebken/llm-logs-feedback"&gt;llm-logs-feedback&lt;/a&gt; is a plugin by Matthias Lübken for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which adds the ability to store feedback on prompt responses, using new &lt;code&gt;llm feedback+1&lt;/code&gt; and &lt;code&gt;llm feedback-1&lt;/code&gt; commands. These also accept an optional comment, and the feedback is stored in a &lt;code&gt;feedback&lt;/code&gt; table in SQLite.&lt;/p&gt;
&lt;p&gt;You can install the plugin from PyPI like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-logs-feedback&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The full plugin implementation is in the &lt;a href="https://github.com/luebken/llm-logs-feedback/blob/main/llm_logs_feedback.py"&gt;llm_logs_feedback.py file&lt;/a&gt; in Matthias' GitHub repository.&lt;/p&gt;
&lt;h4 id="llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="Th5WOyjuRdk"
  title="llm-model-gateway and llm-consortium by Thomas Hughes"
  playlabel="Play: llm-model-gateway and llm-consortium by Thomas Hughes"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Tommy Hughes has been developing a whole array of LLM plugins, including his &lt;a href="https://github.com/irthomasthomas/llm-plugin-generator"&gt;llm-plugin-generator&lt;/a&gt; which is a plugin that can help write new plugins!&lt;/p&gt;
&lt;p&gt;He started by demonstrating &lt;a href="https://github.com/irthomasthomas/llm-model-gateway"&gt;llm-model-gateway&lt;/a&gt;, a plugin that adds a &lt;code&gt;llm serve&lt;/code&gt; command which starts a localhost server running an imitation of the OpenAI API against LLM models:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-model-gateway
llm serve&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Tommy's main demo was of &lt;a href="https://github.com/irthomasthomas/llm-consortium"&gt;llm-consortium&lt;/a&gt;, a plugin which can use a consortium of collaborating LLM models working together to solve problems.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-consortium
llm consortium &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Best way to prepare avocado toast&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --models claude-3.5-sonnet \
  --models gpt-4o \
  --arbiter gegemini-1.5-pro-latest \
  --confidence-threshold 0.8 \
  --max-iterations 3 \
  --output results.json&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I ran this and got back:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The best way to prepare avocado toast involves selecting quality ingredients and careful preparation. Start with a thick slice of good quality bread, such as sourdough or whole grain. Toast it to your preferred level of crispness.  While the bread is toasting, prepare a ripe avocado. Cut it in half, remove the pit, and scoop the flesh into a bowl. Mash the avocado with a fork, leaving some chunks for texture. Season with salt, black pepper, and a squeeze of fresh lemon juice to prevent browning.  Optional additions include a touch of red pepper flakes.&lt;/p&gt;
&lt;p&gt;Once the toast is ready, let it cool slightly before spreading the seasoned avocado evenly over it. Consider lightly rubbing a cut garlic clove on the warm toast for an extra layer of flavor (optional).&lt;/p&gt;
&lt;p&gt;Enhance your avocado toast with your favorite toppings. Popular choices include: everything bagel seasoning, sliced tomatoes, radishes, a poached or fried egg (for added protein), microgreens, smoked salmon (for a more savory option), feta cheese crumbles, or a drizzle of hot sauce.  For a finishing touch, drizzle with high-quality olive oil and sprinkle with sesame or chia seeds for added texture.&lt;/p&gt;
&lt;p&gt;Consider dietary needs when choosing toppings. For example, those following a low-carb diet might skip the tomatoes and opt for more protein and healthy fats.&lt;/p&gt;
&lt;p&gt;Finally, pay attention to presentation. Arrange the toppings neatly for a visually appealing toast. Serve immediately to enjoy the fresh flavors and crispy toast.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But the really interesting thing is the full log of the prompts and responses sent to Claude 3.5 Sonnet and GPT-4o, followed by a combined prompt to Gemini 1.5 Pro to have it arbitrate between the two responses. You can see &lt;a href="https://gist.github.com/simonw/425f42f8ec1a963ae13c5b57ba580f56"&gt;the full logged prompts and responses here&lt;/a&gt;. Here's that &lt;a href="https://gist.github.com/simonw/e82370f0e5986a15823c82200c1b77f8"&gt;results.json&lt;/a&gt; output file.&lt;/p&gt;
&lt;h4 id="congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="CDilLbFP1DY"
  title="Congressional Travel Explorer with Derek Willis"
  playlabel="Play: Congressional Travel Explorer with Derek Willis"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Derek Willis teaches data journalism at the Philip Merrill College of Journalism at the University of Maryland. For a recent project his students built a &lt;a href="https://cnsmaryland.org/interactives/fall-2024/congressional_travel_explorer/index.html"&gt;Congressional Travel Explorer&lt;/a&gt; interactive using Datasette, AWS Extract and Claude 3.5 Sonnet to analyze travel disclosures from members of Congress.&lt;/p&gt;
&lt;p&gt;One of the outcomes from the project was this story in Politico: &lt;a href="https://www.politico.com/news/2024/10/30/israel-aipac-funded-congress-travel-00185167"&gt;Members of Congress have taken hundreds of AIPAC-funded trips to Israel in the past decade&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="lXwfEYXjsak"
  title="llm-questioncache with Nat Knight"
  playlabel="Play: llm-questioncache with Nat Knight"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/nathanielknight/llm-questioncache"&gt;llm-questioncache&lt;/a&gt; builds on top of &lt;a href="https://llm.datasette.io/"&gt;https://llm.datasette.io/&lt;/a&gt; to cache answers to questions, using embeddings to return similar answers if they have already been stored.&lt;/p&gt;
&lt;p&gt;Using embeddings for de-duplication of similar questions is an interesting way to apply LLM's &lt;a href="https://llm.datasette.io/en/stable/embeddings/python-api.html"&gt;embeddings feature&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="GumAgaYpda0"
  title="Improvements to Datasette Enrichments with Simon Willison"
  playlabel="Play: Improvements to Datasette Enrichments with Simon Willison"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;I've demonstrated improvements I've been making to Datasette's &lt;a href="https://enrichments.datasette.io/"&gt;Enrichments&lt;/a&gt; system over the past few weeks.&lt;/p&gt;
&lt;p&gt;Enrichments allow you to apply an operation - such as geocoding, a QuickJS JavaScript transformation or an LLM prompt - against selected rows within a table.&lt;/p&gt;
&lt;p&gt;The latest release of &lt;a href="https://github.com/datasette/datasette-enrichments/releases/tag/0.5"&gt;datasette-enrichments&lt;/a&gt; adds visible progress bars and the ability to pause, resume and cancel an enrichment job that is running against a table.&lt;/p&gt;
&lt;h4 id="datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="i0u4N6g15Zg"
  title="Datasette comments, pins and write UI with Alex Garcia"
  playlabel="Play: Datasette comments, pins and write UI with Alex Garcia"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;We finished with three plugin demos from Alex, showcasing collaborative features we have been developing for &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-write-ui"&gt;datasette-write-ui&lt;/a&gt; provides tools for editing and adding data to Datasette tables. A new feature here is the ability to shift-click a row to open the editing interface for that row.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-pins"&gt;datasette-pins&lt;/a&gt; allows users to pin tables and databases to their Datasette home page, making them easier to find.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-comments"&gt;datasette-comments&lt;/a&gt; adds a commenting interface to Datasette, allowing users to leave comments on individual rows in a table.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/community"&gt;community&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/demos"&gt;demos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="community"/><category term="data-journalism"/><category term="demos"/><category term="derek-willis"/><category term="youtube"/><category term="ai"/><category term="datasette"/><category term="alex-garcia"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="enrichments"/><category term="datasette-public-office-hours"/></entry><entry><title>Teaching News Apps with Codespaces</title><link href="https://simonwillison.net/2023/Mar/23/teaching-news-apps-with-codespaces/#atom-tag" rel="alternate"/><published>2023-03-23T00:39:33+00:00</published><updated>2023-03-23T00:39:33+00:00</updated><id>https://simonwillison.net/2023/Mar/23/teaching-news-apps-with-codespaces/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.thescoop.org/archives/2023/03/22/teaching-newsapps-with-codespaces/"&gt;Teaching News Apps with Codespaces&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Derek Willis used GitHub Codespaces for the latest data journalism class he taught, and it eliminated the painful process of trying to get students on an assortment of Mac, Windows and Chromebook laptops all to a point where they could start working and learning together.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://newsie.social/@derekwillis/110069743946512179"&gt;@derekwillis&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/teaching"&gt;teaching&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-codespaces"&gt;github-codespaces&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="derek-willis"/><category term="github"/><category term="teaching"/><category term="github-codespaces"/></entry><entry><title>How much can you learn from just two columns?</title><link href="https://simonwillison.net/2020/Jun/15/how-much-can-you-learn-just-two-columns/#atom-tag" rel="alternate"/><published>2020-06-15T18:46:57+00:00</published><updated>2020-06-15T18:46:57+00:00</updated><id>https://simonwillison.net/2020/Jun/15/how-much-can-you-learn-just-two-columns/#atom-tag</id><summary type="html">
    &lt;p&gt;Derek Willis &lt;a href="https://twitter.com/derekwillis/status/1272544740718915585"&gt;shared&lt;/a&gt; an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.&lt;/p&gt;

&lt;p&gt;He &lt;a href="https://official-gop-following.herokuapp.com/following/following"&gt;published it here&lt;/a&gt; using Datasette. It's a single table containing 385,979 rows - each row is a &lt;code&gt;username&lt;/code&gt;, &lt;code&gt;account_name&lt;/code&gt; pair, where &lt;code&gt;username&lt;/code&gt; is the Twitter account that is being followed and &lt;code&gt;account_name&lt;/code&gt; is the congressional Twitter account that's following it.&lt;/p&gt;

&lt;p&gt;Here's some sample data:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-account_name" scope="col"&gt;account_name&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;njhotline&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenatorTimScott&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;emilykpierce&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;RobWittman&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;jessblevinsoh&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;OHPressSec&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;familylink&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenMikeLee&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;howardsnowdon&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;RepThomasMassie&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;pattidomm&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenCapito&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;How much can we learn from just these two columns?&lt;/p&gt;

&lt;h4&gt;Which accounts have the most GOP congressional followers?&lt;/h4&gt;

&lt;p&gt;Let's start with a simple aggregation: which accounts on Twitter have the most GOP congressional followers?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  username,
  count(*) as num_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All we're doing here is counting the number of times a unique &lt;code&gt;username&lt;/code&gt; (an account that is being followed) shows up in our table, then sorting by those counts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++username%2C%0D%0A++count%28*%29+as+num_gop_followers%0D%0Afrom%0D%0A++following%0D%0Agroup+by%0D%0A++username%0D%0Aorder+by%0D%0A++num_gop_followers+desc"&gt;Here are the result&lt;/a&gt;. The top ten are:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-num_gop_followers" scope="col"&gt;num_gop_followers&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;housegop&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;231&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;gopleader&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;229&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;realdonaldtrump&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;219&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;vp&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;216&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;speakerryan&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;whitehouse&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;stevescalise&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;198&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;chadpergram&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;195&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;potus&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;195&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;foxnews&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;187&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4&gt;Adding a "view more" link&lt;/h4&gt;

&lt;p&gt;Wouldn't it be useful if you could see which accounts those 231 followers of @housegop were?&lt;/p&gt;

&lt;p&gt;We can do that in Datasette without a SQL query - we can instead use the form on the table page to construct a filter - or construct a querystring URL directly. Here are the 5 GOP congressional accounts following  @cityofdallas:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=cityofdallas"&gt;https://official-gop-following.herokuapp.com/following/following?username=cityofdallas&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's add that link to our original top-followed query. Datasette automatically links any value that begins with &lt;code&gt;https://&lt;/code&gt;, so we can use SQL concatenation trick (with the &lt;code&gt;||&lt;/code&gt; concatenation operator) to construct that URL as part of the query:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
select
  username,
  count(*) as num_gop_followers,
  'https://official-gop-following.herokuapp.com/following/following?username=' || username as list_of_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++username%2C%0D%0A++count%28*%29+as+num_gop_followers%2C%0D%0A++%27https%3A%2F%2Fofficial-gop-following.herokuapp.com%2Ffollowing%2Ffollowing%3Fusername%3D%27+%7C%7C+username+as+list_of_gop_followers%0D%0Afrom%0D%0A++following%0D%0Agroup+by%0D%0A++username%0D%0Aorder+by%0D%0A++num_gop_followers+desc"&gt;Here's that query&lt;/a&gt;. The first five rows look like this:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-num_gop_followers" scope="col"&gt;num_gop_followers&lt;/th&gt;&lt;th class="col-list_of_gop_followers" scope="col"&gt;list_of_gop_followers&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;housegop&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;231&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=housegop"&gt;https://official-gop-following.herokuapp.com/following/following?username=housegop&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;gopleader&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;229&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=gopleader"&gt;https://official-gop-following.herokuapp.com/following/following?username=gopleader&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;realdonaldtrump&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;219&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=realdonaldtrump"&gt;https://official-gop-following.herokuapp.com/following/following?username=realdonaldtrump&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;vp&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;216&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=vp"&gt;https://official-gop-following.herokuapp.com/following/following?username=vp&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;speakerryan&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=speakerryan"&gt;https://official-gop-following.herokuapp.com/following/following?username=speakerryan&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4&gt;Congressional accounts who aren't following certain accounts&lt;/h4&gt;

&lt;p&gt;Since there are only &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select+count%28distinct+account_name%29+from+following"&gt;279&lt;/a&gt; congressional GOP Twitter accounts, how about seeing who are the 279 - 219 = 60 accounts that aren't following @realdonaldtrump?&lt;/p&gt;

&lt;p&gt;Let's construct a SQL query for this, using a sub-select:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  distinct account_name
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select+distinct+account_name%0D%0Afrom%0D%0A++following%0D%0Awhere+account_name+not+in+%28select+account_name+from+following+where+username+%3D+%27realdonaldtrump%27%29%0D%0A"&gt;Here that is in Datasette&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A neat thing we can do here is to parametrize that query. We can swap the hard-coded &lt;code&gt;'realdonaldtrump'&lt;/code&gt; value for a named parameter, &lt;code&gt;:name&lt;/code&gt;, instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
    distinct account_name
  from
    following
  where
    account_name not in (
      select
        account_name
      from
        following
      where
        username = :name
    )&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now when we &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++++distinct+account_name%0D%0A++from%0D%0A++++following%0D%0A++where%0D%0A++++account_name+not+in+%28%0D%0A++++++select%0D%0A++++++++account_name%0D%0A++++++from%0D%0A++++++++following%0D%0A++++++where%0D%0A++++++++username+%3D+%3Aname%0D%0A++++%29"&gt;visit that in Datasette&lt;/a&gt; it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2020/datasette-form.png" alt="Screenshot of the SQL for in Datasette, showing a name input value" style="max-width: 100%" /&gt;&lt;/p&gt;

&lt;p&gt;We can &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++++distinct+account_name%0D%0A++from%0D%0A++++following%0D%0A++where%0D%0A++++account_name+not+in+%28%0D%0A++++++select%0D%0A++++++++account_name%0D%0A++++++from%0D%0A++++++++following%0D%0A++++++where%0D%0A++++++++username+%3D+%3Aname%0D%0A++++%29&amp;amp;name=realdonaldtrump"&gt;add ?name=realdonaldtrump&lt;/a&gt; to the URL (or submit the form and save the resulting URL) to link to results for one individual.&lt;/p&gt;

&lt;p&gt;We've essentially created a new mini-application here - complete with an input form - just by bookmarking a URL in Datasette.&lt;/p&gt;

&lt;p&gt;Let's make the query a bit more interesting by including a count of the number of accounts those congress-people ARE following, and sorting by that.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  account_name,
  count(*) as num_accounts_they_follow
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )
group by
  account_name
order by
  num_accounts_they_follow desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here are &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++account_name%2C%0D%0A++count%28*%29+as+num_accounts_they_follow%0D%0Afrom%0D%0A++following%0D%0Awhere%0D%0A++account_name+not+in+%28%0D%0A++++select%0D%0A++++++account_name%0D%0A++++from%0D%0A++++++following%0D%0A++++where%0D%0A++++++username+%3D+%3Aname%0D%0A++%29%0D%0Agroup+by%0D%0A++account_name%0D%0Aorder+by%0D%0A++num_accounts_they_follow+desc&amp;amp;name=realdonaldtrump"&gt;the results&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-account_name" scope="col"&gt;account_name&lt;/th&gt;&lt;th class="col-num_accounts_they_follow" scope="col"&gt;num_accounts_they_follow&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;ChuckGrassley&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;13475&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;VernBuchanan&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;8560&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;CynthiaLummis&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;5793&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;GovAbbott&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;4423&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;SenatorTimScott&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;3846&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;@ChuckGrassley follows 13,475 accounts but none of them are the president!&lt;/p&gt;

&lt;h4&gt;Most similar accounts, based on number of shared follows&lt;/h4&gt;

&lt;p&gt;One last query. This time we're going to look at which accounts are "most similar" to each other, based on the largest overlap of follows. Here's the SQL for that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  :name as representative,
  account_name as similar_representative,
  count(*) as num_shared_follows
from
  following
where
  username in (
    select
      username
    from
      following
    where
      account_name = :name
  )
  and account_name != :name
group by
  account_name
order by
  num_shared_follows desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Again, we're using a &lt;code&gt;:name&lt;/code&gt; placeholder. Here are the congressional accounts that are &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++%3Aname+as+representative%2C%0D%0A++account_name+as+similar_representative%2C%0D%0A++count%28*%29+as+num_shared_follows%0D%0Afrom%0D%0A++following%0D%0Awhere%0D%0A++username+in+%28%0D%0A++++select%0D%0A++++++username%0D%0A++++from%0D%0A++++++following%0D%0A++++where%0D%0A++++++account_name+%3D+%3Aname%0D%0A++%29%0D%0A++and+account_name+%21%3D+%3Aname%0D%0Agroup+by%0D%0A++account_name%0D%0Aorder+by%0D%0A++num_shared_follows+desc&amp;amp;name=MikeKellyPA"&gt;most similar to @MikeKellyPA&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;What else can you do?&lt;/h4&gt;

&lt;p&gt;I'm pretty impressed at how much insight can be gained using SQL against just a two column table.&lt;/p&gt;

&lt;p&gt;This post started as a Twitter thread. Charles Arthur &lt;a href="https://twitter.com/charlesarthur/status/1272548372885504001"&gt;suggested&lt;/a&gt; cross-referencing this against other sources such as &lt;a href="https://www.govtrack.us/about/analysis#ideology"&gt;the GovTrack ideology analysis&lt;/a&gt; of congressional candidates. This is a great idea! It's also very feasible, given that much of the data underlying GovTrack is &lt;a href="https://github.com/unitedstates/congress-legislators"&gt;available on GitHub&lt;/a&gt;. Import that into Datasette alongside Derek's follower data and you could construct some very interesting SQL joins indeed.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="derek-willis"/><category term="politics"/><category term="sql"/><category term="twitter"/><category term="datasette"/></entry><entry><title>Represent</title><link href="https://simonwillison.net/2008/Dec/29/represent/#atom-tag" rel="alternate"/><published>2008-12-29T22:10:26+00:00</published><updated>2008-12-29T22:10:26+00:00</updated><id>https://simonwillison.net/2008/Dec/29/represent/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://open.blogs.nytimes.com/2008/12/22/represent/"&gt;Represent&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Andrei Scheinkman and Derek Willis describe how they built the NYTimes Represent feature using GeoDjango and PostGIS.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrei-scheinkman"&gt;andrei-scheinkman&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geodjango"&gt;geodjango&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/new-york-times"&gt;new-york-times&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgis"&gt;postgis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="andrei-scheinkman"/><category term="derek-willis"/><category term="django"/><category term="geodjango"/><category term="geospatial"/><category term="new-york-times"/><category term="postgis"/><category term="postgresql"/><category term="python"/></entry><entry><title>Represent and GeoDjango</title><link href="https://simonwillison.net/2008/Dec/20/represent/#atom-tag" rel="alternate"/><published>2008-12-20T21:07:55+00:00</published><updated>2008-12-20T21:07:55+00:00</updated><id>https://simonwillison.net/2008/Dec/20/represent/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.thescoop.org/archives/2008/12/19/represent-and-geodjango/"&gt;Represent and GeoDjango&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The NYTimes new Represent application is built on GeoDjango.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geodjango"&gt;geodjango&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/new-york-times"&gt;new-york-times&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/represent"&gt;represent&lt;/a&gt;&lt;/p&gt;



</summary><category term="derek-willis"/><category term="django"/><category term="geodjango"/><category term="new-york-times"/><category term="python"/><category term="represent"/></entry><entry><title>Django, iCal and vObject</title><link href="https://simonwillison.net/2007/Aug/1/vobject/#atom-tag" rel="alternate"/><published>2007-08-01T11:09:13+00:00</published><updated>2007-08-01T11:09:13+00:00</updated><id>https://simonwillison.net/2007/Aug/1/vobject/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.thescoop.org/archives/2007/07/31/django-ical-and-vobject/"&gt;Django, iCal and vObject&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Easy iCal generation for Django using vObject.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/icalendar"&gt;icalendar&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vobject"&gt;vobject&lt;/a&gt;&lt;/p&gt;



</summary><category term="derek-willis"/><category term="django"/><category term="icalendar"/><category term="python"/><category term="vobject"/></entry></feed>