<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: open-data</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/open-data.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-05-20T18:18:39+00:00</updated><author><name>Simon Willison</name></author><entry><title>cityofaustin/atd-data-tech issues</title><link href="https://simonwillison.net/2025/May/20/data-tech-issues/#atom-tag" rel="alternate"/><published>2025-05-20T18:18:39+00:00</published><updated>2025-05-20T18:18:39+00:00</updated><id>https://simonwillison.net/2025/May/20/data-tech-issues/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/cityofaustin/atd-data-tech/issues"&gt;cityofaustin/atd-data-tech issues&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I stumbled across this today while looking for interesting frequently updated data sources from local governments. It turns out the City of Austin's &lt;a href="https://austinmobility.io/"&gt;Transportation Data &amp;amp; Technology Services&lt;/a&gt; department runs everything out of a public GitHub issues instance, which currently has 20,225 closed and 2,002 open issues. They also publish an &lt;a href="https://data.austintexas.gov/Transportation-and-Mobility/Transportation-Public-Works-Data-Tech-Services-Iss/rzwg-fyv8/about_data"&gt;exported copy&lt;/a&gt; of the issues data through the &lt;a href="https://data.austintexas.gov/"&gt;data.austintexas.gov&lt;/a&gt; open data portal.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="open-data"/><category term="github-issues"/></entry><entry><title>OpenTimes</title><link href="https://simonwillison.net/2025/Mar/17/opentimes/#atom-tag" rel="alternate"/><published>2025-03-17T22:49:59+00:00</published><updated>2025-03-17T22:49:59+00:00</updated><id>https://simonwillison.net/2025/Mar/17/opentimes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sno.ws/opentimes/"&gt;OpenTimes&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Spectacular new open geospatial project by &lt;a href="https://sno.ws/"&gt;Dan Snow&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenTimes is a database of pre-computed, point-to-point travel times between United States Census geographies. It lets you download bulk travel time data for free and with no limits.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://opentimes.org/?id=060816135022&amp;amp;mode=car#9.76/37.5566/-122.3085"&gt;what I get&lt;/a&gt; for travel times by car from El Granada, California:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Isochrone map showing driving times from the El Granada census tract to other places in the San Francisco Bay Area" src="https://static.simonwillison.net/static/2025/opentimes.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The technical details are &lt;em&gt;fascinating&lt;/em&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The entire OpenTimes backend is just static Parquet files on &lt;a href="https://www.cloudflare.com/developer-platform/products/r2/"&gt;Cloudflare's R2&lt;/a&gt;. There's no RDBMS or running service, just files and a CDN. The whole thing costs about $10/month to host and costs nothing to serve. In my opinion, this is a &lt;em&gt;great&lt;/em&gt; way to serve infrequently updated, large public datasets at low cost (as long as you partition the files correctly).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sure enough, &lt;a href="https://developers.cloudflare.com/r2/pricing/"&gt;R2 pricing&lt;/a&gt; charges "based on the total volume of data stored" - $0.015 / GB-month for standard storage, then $0.36 / million requests for "Class B" operations which include reads. They charge nothing for outbound bandwidth.&lt;/p&gt;
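&lt;p&gt;A back-of-envelope sketch of how that pricing adds up - the unit prices are the published rates quoted above, but the dataset size and request volume here are invented numbers, not OpenTimes' actual figures:&lt;/p&gt;

```python
# Hypothetical monthly R2 bill for a static-Parquet dataset.
# Unit prices are Cloudflare's published rates; the sizes are made up.
storage_gb = 500                  # invented: total size of the Parquet files
class_b_requests = 2_000_000      # invented: monthly read operations

storage_cost = storage_gb * 0.015                     # $0.015 per GB-month
request_cost = (class_b_requests / 1_000_000) * 0.36  # $0.36 per million Class B ops
egress_cost = 0.0                                     # outbound bandwidth is free

monthly_total = storage_cost + request_cost + egress_cost
print(f"${monthly_total:.2f}/month")  # $8.22/month
```

&lt;p&gt;Storage dominates: at these rates even millions of reads a month cost well under a dollar.&lt;/p&gt;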
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;All travel times were calculated by pre-building the inputs (OSM, OSRM networks) and then distributing the compute over &lt;a href="https://github.com/dfsnow/opentimes/actions/workflows/calculate-times.yaml"&gt;hundreds of GitHub Actions jobs&lt;/a&gt;. This worked shockingly well for this specific workload (and was also completely free).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's a &lt;a href="https://github.com/dfsnow/opentimes/actions/runs/13094249792"&gt;GitHub Actions run&lt;/a&gt; of the &lt;a href="https://github.com/dfsnow/opentimes/blob/a6a5f7abcdd69559b3e29f360fe0ff0399dbb400/.github/workflows/calculate-times.yaml#L78-L80"&gt;calculate-times.yaml workflow&lt;/a&gt; which uses a matrix to run 255 jobs!&lt;/p&gt;
&lt;p&gt;&lt;img alt="GitHub Actions run: calculate-times.yaml run by workflow_dispatch taking 1h49m to execute 255 jobs with names like run-job (2020-01) " src="https://static.simonwillison.net/static/2025/opentimes-github-actions.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Relevant YAML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  matrix:
    year: ${{ fromJSON(needs.setup-jobs.outputs.years) }}
    state: ${{ fromJSON(needs.setup-jobs.outputs.states) }}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where those JSON files were created by the previous step, which reads in the year and state values from &lt;a href="https://github.com/dfsnow/opentimes/blob/a6a5f7abcdd69559b3e29f360fe0ff0399dbb400/data/params.yaml#L72-L132"&gt;this params.yaml file&lt;/a&gt;.&lt;/p&gt;
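&lt;p&gt;That previous step exposes the values as JSON strings in its job outputs, which is what lets &lt;code&gt;fromJSON()&lt;/code&gt; expand them into matrix dimensions. A minimal sketch of the pattern in Python (the year and state values here are hypothetical, not read from params.yaml):&lt;/p&gt;

```python
import json
import os

# Hypothetical values - the real workflow reads these from params.yaml
years = ["2020", "2024"]
states = ["06", "17"]

# A GitHub Actions step publishes outputs by appending key=value lines
# to the file named by the GITHUB_OUTPUT environment variable.
output_path = os.environ.get("GITHUB_OUTPUT", "github_output.txt")
with open(output_path, "a") as f:
    f.write(f"years={json.dumps(years)}\n")
    f.write(f"states={json.dumps(states)}\n")
```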
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The query layer uses a single DuckDB database file with &lt;em&gt;views&lt;/em&gt; that point to static Parquet files via HTTP. This lets you query a table with hundreds of billions of records after downloading just the ~5MB pointer file.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a really creative use of DuckDB's &lt;code&gt;httpfs&lt;/code&gt; feature, which lets you run queries against large remote files from a laptop: HTTP range requests fetch just the byte ranges a query needs, avoiding downloading the whole thing.&lt;/p&gt;
&lt;p&gt;The README shows &lt;a href="https://github.com/dfsnow/opentimes/blob/3439fa2c54af227e40997b4a5f55678739e0f6df/README.md#using-duckdb"&gt;how to use that from R and Python&lt;/a&gt; - I got this working in the &lt;code&gt;duckdb&lt;/code&gt; client (&lt;code&gt;brew install duckdb&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;

SELECT origin_id, destination_id, duration_sec
  FROM opentimes.public.times
  WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
      AND origin_id LIKE '17031%' LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In answer to a question about adding public transit times &lt;a href="https://news.ycombinator.com/item?id=43392521#43393183"&gt;Dan said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the next year or so maybe. The biggest obstacles to adding public transit are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Collecting all the necessary scheduling data (e.g. GTFS feeds) for every transit system in the county. Not insurmountable since there are services that do this currently.&lt;/li&gt;
&lt;li&gt;Finding a routing engine that can compute nation-scale travel time matrices quickly. Currently, the two fastest open-source engines I've tried (OSRM and Valhalla) don't support public transit for matrix calculations and the engines that do support public transit (R5, OpenTripPlanner, etc.) are too slow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://gtfs.org/"&gt;GTFS&lt;/a&gt; is a popular CSV-based format for sharing transit schedules - here's &lt;a href="https://gtfs.org/resources/data/"&gt;an official list&lt;/a&gt; of available feed directories.&lt;/p&gt;
&lt;p&gt;This whole project feels to me like a great example of the &lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/"&gt;baked data&lt;/a&gt; architectural pattern in action.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43392521"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/census"&gt;census&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openstreetmap"&gt;openstreetmap&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parquet"&gt;parquet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/baked-data"&gt;baked-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="census"/><category term="geospatial"/><category term="open-data"/><category term="openstreetmap"/><category term="cloudflare"/><category term="parquet"/><category term="github-actions"/><category term="baked-data"/><category term="duckdb"/><category term="http-range-requests"/></entry><entry><title>Overture Maps Foundation Releases Its First World-Wide Open Map Dataset</title><link href="https://simonwillison.net/2023/Jul/27/overture-maps/#atom-tag" rel="alternate"/><published>2023-07-27T16:45:09+00:00</published><updated>2023-07-27T16:45:09+00:00</updated><id>https://simonwillison.net/2023/Jul/27/overture-maps/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://overturemaps.org/overture-maps-foundation-releases-first-world-wide-open-map-dataset/"&gt;Overture Maps Foundation Releases Its First World-Wide Open Map Dataset&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The Overture Maps Foundation is a collaboration led by Amazon, Meta, Microsoft and TomTom dedicated to producing “reliable, easy-to-use, and interoperable open map data”.&lt;/p&gt;

&lt;p&gt;Yesterday they put out their first release and it’s pretty astonishing: four different layers of geodata, covering Places of Interest (shops, restaurants, attractions etc), administrative boundaries, building outlines and transportation networks.&lt;/p&gt;

&lt;p&gt;The data is available as Parquet. I just downloaded the 8GB places dataset and can confirm that it contains 59 million listings from around the world—I filtered to just places in my local town and a spot check showed that recently opened businesses (last 12 months) were present and the details all looked accurate.&lt;/p&gt;

&lt;p&gt;The places data is licensed under “Community Data License Agreement – Permissive”, where the only restriction appears to be that you have to include that license when you further share the data.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parquet"&gt;parquet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/meta"&gt;meta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/overture"&gt;overture&lt;/a&gt;&lt;/p&gt;



</summary><category term="geospatial"/><category term="open-data"/><category term="parquet"/><category term="meta"/><category term="overture"/></entry><entry><title>Quoting Saloni Dattani</title><link href="https://simonwillison.net/2022/Oct/25/saloni-dattani/#atom-tag" rel="alternate"/><published>2022-10-25T22:48:06+00:00</published><updated>2022-10-25T22:48:06+00:00</updated><id>https://simonwillison.net/2022/Oct/25/saloni-dattani/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.wired.com/story/covid-19-open-science-public-health-data/"&gt;&lt;p&gt;Most researchers don’t share their data. If you’ve ever read the words “data is available upon request" in an academic paper, and emailed the authors to request it, the chances that you'll actually receive the data are just 7 percent. The rest of the time, the authors have lost access to their data, changed emails, or are too busy or unwilling.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.wired.com/story/covid-19-open-science-public-health-data/"&gt;Saloni Dattani&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/science"&gt;science&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="science"/></entry><entry><title>Weeknotes: datasette-socrata, and the last 10%...</title><link href="https://simonwillison.net/2022/Jun/19/weeknotes/#atom-tag" rel="alternate"/><published>2022-06-19T03:26:52+00:00</published><updated>2022-06-19T03:26:52+00:00</updated><id>https://simonwillison.net/2022/Jun/19/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;... takes 90% of the work. I continue to work towards a preview of the new Datasette Cloud, and keep finding new "just one more things" to delay inviting in users.&lt;/p&gt;
&lt;p&gt;Aside from continuing to work on that, my big project in the last week was a blog entry: &lt;a href="https://simonwillison.net/2022/Jun/12/twenty-years/"&gt;Twenty years of my blog&lt;/a&gt;, in which I celebrated twenty years since starting this site by pulling together a selection of highlights from over the years.&lt;/p&gt;
&lt;p&gt;I've actually updated that entry a few times over the past few days as I remembered new highlights I forgot to include - the Twitter thread that accompanies the entry has those updates, &lt;a href="https://twitter.com/simonw/status/1536364064951062528"&gt;starting here&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;datasette-socrata&lt;/h4&gt;
&lt;p&gt;I've been thinking a lot about the Datasette Cloud onboarding experience: how can I help new users understand what Datasette can be used for as quickly as possible?&lt;/p&gt;
&lt;p&gt;I want to get them to a point where they are interacting with a freshly created table of data. I can provide some examples, but I've always thought that one of the biggest opportunities for Datasette lies in working with the kind of data released by governments through their Open Data portals. This is especially true for its usage in the field of data journalism.&lt;/p&gt;
&lt;p&gt;Many open data portals - including &lt;a href="https://data.sfgov.org"&gt;the one for San Francisco&lt;/a&gt; - are powered by a piece of software called &lt;a href="https://dev.socrata.com/"&gt;Socrata&lt;/a&gt;. And it offers a pretty comprehensive API.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/plugins/datasette-socrata"&gt;datasette-socrata&lt;/a&gt; is a new Datasette plugin which can import data from Socrata instances. Give it the URL to a Socrata dataset (like &lt;a href="https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq"&gt;this one&lt;/a&gt;, my perennial favourite, listing all 195,000+ trees managed by the city of San Francisco) and it will import that data and its associated metadata into a brand new table.&lt;/p&gt;
&lt;p&gt;It's pretty neat! It even shows you a progress bar, since some of these datasets can get pretty large:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/datasette-socrata.gif" alt="Animated demo of a progress bar, starting at 137,000/179,605 and continuing until the entire set has been imported" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;As part of building this I ran into an interesting question: what should a plugin like this do if the system it is running on runs out of disk space?&lt;/p&gt;
&lt;p&gt;I'm still working through that, but I'm experimenting with a new type of Datasette plugin for it: &lt;a href="https://github.com/simonw/datasette-low-disk-space-hook"&gt;datasette-low-disk-space-hook&lt;/a&gt;, which introduces a new plugin hook (&lt;code&gt;low_disk_space(datasette)&lt;/code&gt;) which other plugins can use to report a situation where disk space is running out.&lt;/p&gt;
&lt;p&gt;I wrote a TIL about that here: &lt;a href="https://til.simonwillison.net/datasette/register-new-plugin-hooks"&gt;Registering new Datasette plugin hooks by defining them in other plugins&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I may use this same trick for a future upgrade to &lt;code&gt;datasette-graphql&lt;/code&gt;, to allow additional plugins to register custom GraphQL mutations.&lt;/p&gt;
&lt;h4 id="sqlite-utils-3-27"&gt;sqlite-utils 3.27&lt;/h4&gt;
&lt;p&gt;In working on &lt;code&gt;datasette-socrata&lt;/code&gt; I was inspired to push out a new release of &lt;code&gt;sqlite-utils&lt;/code&gt;. Here are the annotated &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-27"&gt;release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Documentation now uses the &lt;a href="https://github.com/pradyunsg/furo"&gt;Furo&lt;/a&gt; Sphinx theme. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/435"&gt;#435&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wrote about this &lt;a href="https://simonwillison.net/2022/May/26/weeknotes-building-datasette-cloud/#furo-theme"&gt;a few weeks ago&lt;/a&gt; - the new documentation theme is now &lt;a href="https://sqlite-utils.datasette.io/en/stable/"&gt;live for the stable documentation&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Code examples in documentation now have a "copy to clipboard" button. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/436"&gt;#436&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I made this change &lt;a href="https://github.com/simonw/datasette/issues/1748"&gt;to Datasette first&lt;/a&gt; - the &lt;a href="https://github.com/executablebooks/sphinx-copybutton"&gt;sphinx-copybutton&lt;/a&gt; plugin adds a neat "copy" button next to every code example.&lt;/p&gt;
&lt;p&gt;I also like how this encourages ensuring that every example will work if people directly copy and paste it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sqlite_utils.utils.rows_from_file()&lt;/code&gt; is now a documented API, see &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-rows-from-file"&gt;Reading rows from a file&lt;/a&gt;. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/443"&gt;#443&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Francesco Frassinelli filed &lt;a href="https://github.com/simonw/sqlite-utils/issues/440"&gt;an issue&lt;/a&gt; about this utility function, which wasn't actually part of the documented stable API, but I saw no reason not to promote it.&lt;/p&gt;
&lt;p&gt;The function incorporates the logic that the &lt;code&gt;sqlite-utils&lt;/code&gt; CLI tool uses to automatically detect whether a provided file is CSV, TSV or JSON, and to detect the CSV delimiter and other dialect settings.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rows_from_file()&lt;/code&gt; has two new parameters to help handle CSV files with rows that contain more values than are listed in that CSV file's headings: &lt;code&gt;ignore_extras=True&lt;/code&gt; and &lt;code&gt;extras_key="name-of-key"&lt;/code&gt;. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/440"&gt;#440&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;It turns out &lt;code&gt;csv.DictReader&lt;/code&gt; in the Python standard library has a mechanism for handling CSV rows that contain too many commas.&lt;/p&gt;
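&lt;p&gt;That mechanism is the &lt;code&gt;restkey&lt;/code&gt; argument: &lt;code&gt;csv.DictReader&lt;/code&gt; collects any values beyond the declared headings into a list under that key. A quick stdlib-only illustration (the column names here are invented):&lt;/p&gt;

```python
import csv
import io

# A row with more values than the header declares
data = io.StringIO("id,name\n1,Cleo,extra1,extra2\n")

# Values with no matching heading are gathered under restkey
reader = csv.DictReader(data, restkey="_rest")
row = next(reader)
print(row)  # {'id': '1', 'name': 'Cleo', '_rest': ['extra1', 'extra2']}
```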
&lt;p&gt;In working on this I found a bug in &lt;code&gt;mypy&lt;/code&gt; which I &lt;a href="https://github.com/python/typeshed/issues/8075"&gt;reported here&lt;/a&gt;, but it turned out to be a dupe of an &lt;a href="https://github.com/python/typeshed/issues/7953"&gt;already fixed issue&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sqlite_utils.utils.maximize_csv_field_size_limit()&lt;/code&gt; helper function for increasing the field size limit for reading CSV files to its maximum, see &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-maximize-csv-field-size-limit"&gt;Setting the maximum CSV field size limit&lt;/a&gt;. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/442"&gt;#442&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a workaround for the following Python error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;_csv.Error: field larger than field limit (131072)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's the error you get when a field in a CSV file is longer than the default field size limit of 131,072 characters.&lt;/p&gt;
&lt;p&gt;Saying "yeah, I want to be able to handle the maximum length possible" is surprisingly hard - Python doesn't offer an unlimited option, and passing too large a value raises an &lt;code&gt;OverflowError&lt;/code&gt; on some platforms. Here's the idiom that works, which is encapsulated by the new utility function:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;field_size_limit&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-s1"&gt;maxsize&lt;/span&gt;

&lt;span class="pl-k"&gt;while&lt;/span&gt; &lt;span class="pl-c1"&gt;True&lt;/span&gt;:
    &lt;span class="pl-k"&gt;try&lt;/span&gt;:
        &lt;span class="pl-s1"&gt;csv_std&lt;/span&gt;.&lt;span class="pl-en"&gt;field_size_limit&lt;/span&gt;(&lt;span class="pl-s1"&gt;field_size_limit&lt;/span&gt;)
        &lt;span class="pl-k"&gt;break&lt;/span&gt;
    &lt;span class="pl-k"&gt;except&lt;/span&gt; &lt;span class="pl-v"&gt;OverflowError&lt;/span&gt;:
        &lt;span class="pl-s1"&gt;field_size_limit&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;int&lt;/span&gt;(&lt;span class="pl-s1"&gt;field_size_limit&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-c1"&gt;10&lt;/span&gt;)&lt;/pre&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;table.search(where=, where_args=)&lt;/code&gt; parameters for adding additional &lt;code&gt;WHERE&lt;/code&gt; clauses to a search query. The &lt;code&gt;where=&lt;/code&gt; parameter is available on &lt;code&gt;table.search_sql(...)&lt;/code&gt; as well. See &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-fts-search"&gt;Searching with table.search()&lt;/a&gt;. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/441"&gt;#441&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was a feature suggestion &lt;a href="https://github.com/simonw/sqlite-utils/issues/441"&gt;from Tim Head&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Fixed bug where &lt;code&gt;table.detect_fts()&lt;/code&gt; and other search-related functions could fail if two FTS-enabled tables had names that were prefixes of each other. (&lt;a href="https://github.com/simonw/sqlite-utils/issues/434"&gt;#434&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was quite a gnarly bug. &lt;code&gt;sqlite-utils&lt;/code&gt; attempts to detect if a table has an associated full-text search table by looking through the schema for another table that has a definition like this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;CREATE VIRTUAL TABLE &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;searchable_fts&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
USING FTS4 (
    text1,
    text2,
    [name with . &lt;span class="pl-k"&gt;and&lt;/span&gt; spaces],
    content&lt;span class="pl-k"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;searchable&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
)&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I was checking for &lt;code&gt;content="searchable"&lt;/code&gt; using a LIKE query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt; name &lt;span class="pl-k"&gt;FROM&lt;/span&gt; sqlite_master
&lt;span class="pl-k"&gt;WHERE&lt;/span&gt; rootpage &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;0&lt;/span&gt;
&lt;span class="pl-k"&gt;AND&lt;/span&gt;
sql &lt;span class="pl-k"&gt;LIKE&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%VIRTUAL TABLE%USING FTS%content=%searchable%&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But this would incorrectly match strings such as &lt;code&gt;content="searchable2"&lt;/code&gt; as well!&lt;/p&gt;
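&lt;p&gt;The false positive is easy to reproduce with SQLite's &lt;code&gt;LIKE&lt;/code&gt; operator directly - this sketch (using an invented table name) shows the pattern intended for &lt;code&gt;content="searchable"&lt;/code&gt; also matching a schema that says &lt;code&gt;content="searchable2"&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema for a hypothetical FTS table whose content table is "searchable2"
schema = 'CREATE VIRTUAL TABLE "searchable2_fts" USING FTS4 (text1, content="searchable2")'

# The detection pattern that was only meant to match content="searchable"
pattern = '%VIRTUAL TABLE%USING FTS%content=%searchable%'

matched = conn.execute("SELECT ? LIKE ?", (schema, pattern)).fetchone()[0]
print(matched)  # 1 - the % wildcard happily matches "searchable2" as well
```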
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-socrata"&gt;datasette-socrata&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-socrata/releases/tag/0.3"&gt;0.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-socrata/releases"&gt;4 releases total&lt;/a&gt;) - 2022-06-17
&lt;br /&gt;Import data from Socrata into Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-low-disk-space-hook"&gt;datasette-low-disk-space-hook&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-low-disk-space-hook/releases/tag/0.1"&gt;0.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-low-disk-space-hook/releases"&gt;2 releases total&lt;/a&gt;) - 2022-06-17
&lt;br /&gt;Datasette plugin providing the low_disk_space hook for other plugins to check for low disk space&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.27"&gt;3.27&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;101 releases total&lt;/a&gt;) - 2022-06-15
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-ics"&gt;datasette-ics&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-ics/releases/tag/0.5.1"&gt;0.5.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-ics/releases"&gt;4 releases total&lt;/a&gt;) - 2022-06-10
&lt;br /&gt;Datasette plugin for outputting iCalendar files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.7.1"&gt;0.7.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases"&gt;9 releases total&lt;/a&gt;) - 2022-06-09
&lt;br /&gt;Datasette plugin for uploading CSV files and converting them to database tables&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/networking/http-ipv6"&gt;Making HTTP calls using IPv6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/jinja/format-thousands"&gt;Formatting thousands in Jinja&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/linux/iconv"&gt;Using iconv to convert the text encoding of a file&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/datasette/register-new-plugin-hooks"&gt;Registering new Datasette plugin hooks by defining them in other plugins&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-data"/><category term="plugins"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="sqlite-utils"/><category term="annotated-release-notes"/></entry><entry><title>Usable Data</title><link href="https://simonwillison.net/2019/Jan/11/usable-data/#atom-tag" rel="alternate"/><published>2019-01-11T18:33:18+00:00</published><updated>2019-01-11T18:33:18+00:00</updated><id>https://simonwillison.net/2019/Jan/11/usable-data/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://medium.com/@ftrain/usable-data-eb7234d64309"&gt;Usable Data&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don’t know how I missed this one—it predates Datasette, but it perfectly captures the benefits that I’m trying to expose with the project. “In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.”

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1083793391631032320"&gt;Twitter&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paul-ford"&gt;paul-ford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="paul-ford"/><category term="sqlite"/><category term="datasette"/></entry><entry><title>Quoting Five stars of open data</title><link href="https://simonwillison.net/2018/Apr/17/five-stars-open-data/#atom-tag" rel="alternate"/><published>2018-04-17T04:20:28+00:00</published><updated>2018-04-17T04:20:28+00:00</updated><id>https://simonwillison.net/2018/Apr/17/five-stars-open-data/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://opendatahandbook.org/glossary/en/terms/five-stars-of-open-data/"&gt;&lt;p&gt;A rating system for open data proposed by Tim Berners-Lee, founder of the World Wide Web. To score the maximum five stars, data must (1) be available on the Web under an open licence, (2) be in the form of structured data, (3) be in a non-proprietary file format, (4) use URIs as its identifiers (see also RDF), (5) include links to other data sources (see linked data). To score 3 stars, it must satisfy all of (1)-(3), etc.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://opendatahandbook.org/glossary/en/terms/five-stars-of-open-data/"&gt;Five stars of open data&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tim-berners-lee"&gt;tim-berners-lee&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="tim-berners-lee"/></entry><entry><title>GOV.UK Registers</title><link href="https://simonwillison.net/2017/Nov/7/govuk-registers/#atom-tag" rel="alternate"/><published>2017-11-07T15:31:46+00:00</published><updated>2017-11-07T15:31:46+00:00</updated><id>https://simonwillison.net/2017/Nov/7/govuk-registers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://registers.cloudapps.digital/"&gt;GOV.UK Registers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Canonical sources of “lists of information” intended for use by GDS teams building software for the UK government, but available for anyone. 17 registers are “ready for use”, 45 are “in progress”. Covers things like the FCO’s country list, the official list of prison estates, and DEFRA’s list of public bodies in England that manage drainage systems.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://shkspr.mobi/blog/2017/11/input-type-country/"&gt;Terence Eden&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datagov"&gt;datagov&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/government"&gt;government&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gov-uk"&gt;gov-uk&lt;/a&gt;&lt;/p&gt;



</summary><category term="datagov"/><category term="government"/><category term="open-data"/><category term="gov-uk"/></entry><entry><title>Exploring United States Policing Data Using Python</title><link href="https://simonwillison.net/2017/Oct/29/exploring-united-states-policing-data-using-python/#atom-tag" rel="alternate"/><published>2017-10-29T16:58:36+00:00</published><updated>2017-10-29T16:58:36+00:00</updated><id>https://simonwillison.net/2017/Oct/29/exploring-united-states-policing-data-using-python/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.patricktriest.com/police-data-python/"&gt;Exploring United States Policing Data Using Python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Outstanding introduction to data analysis with Jupyter and Pandas.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="pandas"/><category term="python"/><category term="jupyter"/></entry><entry><title>OpenCorporates</title><link href="https://simonwillison.net/2010/Dec/22/opencorporates/#atom-tag" rel="alternate"/><published>2010-12-22T11:52:00+00:00</published><updated>2010-12-22T11:52:00+00:00</updated><id>https://simonwillison.net/2010/Dec/22/opencorporates/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://opencorporates.com/"&gt;OpenCorporates&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“The Open Database Of The Corporate World”—a URL for every UK company.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://blog.okfn.org/2010/12/20/opencorporates-the-open-database-of-the-corporate-world/"&gt;Open Knowledge Foundation Blog&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="recovered"/></entry><entry><title>Doing things with Ordnance Survey OpenData</title><link href="https://simonwillison.net/2010/May/20/os/#atom-tag" rel="alternate"/><published>2010-05-20T15:22:00+00:00</published><updated>2010-05-20T15:22:00+00:00</updated><id>https://simonwillison.net/2010/May/20/os/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://frot.org/t/talks/techmeetup.html"&gt;Doing things with Ordnance Survey OpenData&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jo Walsh’s guide to processing Ordnance Survey OpenData using PostgreSQL and PostGIS.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mapping"&gt;mapping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ordnancesurvey"&gt;ordnancesurvey&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgis"&gt;postgis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jo-walsh"&gt;jo-walsh&lt;/a&gt;&lt;/p&gt;



</summary><category term="mapping"/><category term="open-data"/><category term="ordnancesurvey"/><category term="postgis"/><category term="postgresql"/><category term="recovered"/><category term="jo-walsh"/></entry><entry><title>Preview: Freebase Gridworks</title><link href="https://simonwillison.net/2010/Mar/27/gridworks/#atom-tag" rel="alternate"/><published>2010-03-27T18:43:42+00:00</published><updated>2010-03-27T18:43:42+00:00</updated><id>https://simonwillison.net/2010/Mar/27/gridworks/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.freebase.com/2010/03/26/preview-freebase-gridworks/"&gt;Preview: Freebase Gridworks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
If my experience with government datasets has taught me anything, it’s that most datasets are collected by human beings (probably using Excel) and human beings are inconsistent. The first step in any data related project inevitably involves cleaning up the data. The Freebase team must run up against this all the time, and it looks like they’re tackling the problem head-on. Freebase Gridworks is just a screencast preview at the moment but an open source release is promised “within a month”—and the tool looks absolutely fantastic. DabbleDB-style data refactoring of spreadsheet data, running on your desktop but with the UI served in a browser. Full undo, a JavaScript-based expression language, powerful faceting and the ability to “reconcile” data against Freebase types (matching up country names, for example). I can’t wait to get my hands on this.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://blog.jonudell.net/2010/03/26/freebase-gridworks-a-power-tool-for-data-scrubbers/"&gt;Jon Udell&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cleanup"&gt;cleanup&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dabbledb"&gt;dabbledb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data"&gt;data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/freebase"&gt;freebase&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gridworks"&gt;gridworks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;&lt;/p&gt;



</summary><category term="cleanup"/><category term="dabbledb"/><category term="data"/><category term="freebase"/><category term="gridworks"/><category term="javascript"/><category term="open-data"/></entry><entry><title>No PDFs!</title><link href="https://simonwillison.net/2009/Nov/1/pdfs/#atom-tag" rel="alternate"/><published>2009-11-01T12:04:36+00:00</published><updated>2009-11-01T12:04:36+00:00</updated><id>https://simonwillison.net/2009/Nov/1/pdfs/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.sunlightfoundation.com/2009/06/05/no-pdfs/"&gt;No PDFs!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The Sunlight Foundation point out that PDFs are a terrible way of implementing “more transparent government” due to their general lack of structure. At the Guardian (and I’m sure at other newspapers) we waste an absurd amount of time manually extracting data from PDF files and turning it into something more useful. Even CSV is significantly more useful for many types of information.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/adobe"&gt;adobe&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/csv"&gt;csv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/opengovernment"&gt;opengovernment&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sunlightfoundation"&gt;sunlightfoundation&lt;/a&gt;&lt;/p&gt;



</summary><category term="adobe"/><category term="csv"/><category term="open-data"/><category term="opengovernment"/><category term="pdf"/><category term="sunlightfoundation"/></entry><entry><title>Show Us a Better Way</title><link href="https://simonwillison.net/2008/Jul/4/show/#atom-tag" rel="alternate"/><published>2008-07-04T09:36:43+00:00</published><updated>2008-07-04T09:36:43+00:00</updated><id>https://simonwillison.net/2008/Jul/4/show/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.showusabetterway.com/"&gt;Show Us a Better Way&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The UK Government’s Power of Information Taskforce are running a mashup competition (a.k.a. “ideas for new products that could improve the way public information is communicated”) with a £20,000 prize fund and gigabytes of brand new data and APIs. This is a great opportunity for the software community to demonstrate how important this kind of open data really is.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mashups"&gt;mashups&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/powerofinformation"&gt;powerofinformation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ukgovernment"&gt;ukgovernment&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="mashups"/><category term="open-data"/><category term="powerofinformation"/><category term="ukgovernment"/></entry><entry><title>Freeing the postcode</title><link href="https://simonwillison.net/2006/Nov/17/postcode/#atom-tag" rel="alternate"/><published>2006-11-17T17:29:59+00:00</published><updated>2006-11-17T17:29:59+00:00</updated><id>https://simonwillison.net/2006/Nov/17/postcode/#atom-tag</id><summary type="html">
    &lt;p id="p-0"&gt;&lt;a href="http://en.wikipedia.org/wiki/UK postcodes"&gt;UK postcodes&lt;/a&gt; have some interesting characteristics: a full six character post code identifies an average of around 14 house holds, and postcodes are mainly hierarchical - W1W will always be contained within W1 for example. They're useful for a huge range of interesting things.&lt;/p&gt;

&lt;p id="p-1"&gt;The problem is that the postcode database (of nearly 1.8 million postcodes) is &lt;a href="http://en.wikipedia.org/wiki/Postcode_Address_File" title="Postcode Address File"&gt;owned by the Royal Mail&lt;/a&gt; and licensed at a not inconsiderable fee of between £150 and £9,000 per year.&lt;/p&gt;

&lt;p id="p-2"&gt;&lt;a href="http://www.freethepostcode.org/"&gt;Free the postcode&lt;/a&gt; was set up a while ago to try to remedy this situation, by asking people to enter their postcode along with the latitude/longitude coordinates collected from a GPS. Having people enter coordinates from online mapping services is no good as EU database law may see that as a derivative work. It's had some success, but the GPS requirement has seriously stunted its growth.&lt;/p&gt;

&lt;p id="p-3"&gt;Then a few weeks ago, &lt;a href="http://www.npemap.org.uk/" title="New Popular Edition Maps"&gt;npemap.org.uk&lt;/a&gt; launched. It's an interface for browsing scans of out-of-copyright maps from the 1950s (credits at the bottom of &lt;a href="http://www.npemap.org.uk/FAQ.html"&gt;the FAQ&lt;/a&gt;). The site asks people to enter post codes based on that old mapping data, which can then be placed in the public domain.&lt;/p&gt;

&lt;p id="p-4"&gt;If you haven't already done so, you should go and add any postcodes that you know about now. It takes no time at all, and is especially important if you live in one of the &lt;a href="http://www.npemap.org.uk/stats/missing_district_stats.html"&gt;230 districts&lt;/a&gt; for which no data has yet been collected.&lt;/p&gt;

&lt;p id="p-5"&gt;You can grab the data they've already collected &lt;a href="http://www.npemap.org.uk/data/" title="Download our postcodes"&gt;from here&lt;/a&gt;. There's a really cool &lt;a href="http://www.npemap.org.uk/postcodeine/"&gt;interactive visualisation&lt;/a&gt; of their data here, based on &lt;a href="http://bitter.ukcod.org.uk/~chris/postcodeine/"&gt;previous work&lt;/a&gt; by Chris Lightfoot using the commercially licensed postcode database.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/npemap"&gt;npemap&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postcode"&gt;postcode&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/royalmail"&gt;royalmail&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="npemap"/><category term="open-data"/><category term="postcode"/><category term="royalmail"/></entry></feed>