<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: pandas</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/pandas.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-01-03T20:04:00+00:00</updated><author><name>Simon Willison</name></author><entry><title>Fastest Way to Read Excel in Python</title><link href="https://simonwillison.net/2024/Jan/3/fastest-way-to-read-excel-in-python/#atom-tag" rel="alternate"/><published>2024-01-03T20:04:00+00:00</published><updated>2024-01-03T20:04:00+00:00</updated><id>https://simonwillison.net/2024/Jan/3/fastest-way-to-read-excel-in-python/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hakibenita.com/fast-excel-python"&gt;Fastest Way to Read Excel in Python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Haki Benita produced a meticulously researched and written exploration of the options for reading a large Excel spreadsheet into Python. He explored Pandas, Tablib, Openpyxl, shelling out to LibreOffice, DuckDB and python-calamine (a Python wrapper of a Rust library). Calamine was the winner, taking 3.58s to read 500,000 rows—compared to Pandas in last place at 32.98s.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/5tugrd/fastest_way_read_excel_python"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/excel"&gt;excel&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/haki-benita"&gt;haki-benita&lt;/a&gt;&lt;/p&gt;



</summary><category term="excel"/><category term="pandas"/><category term="python"/><category term="rust"/><category term="duckdb"/><category term="haki-benita"/></entry><entry><title>I’m banned for life from advertising on Meta. Because I teach Python.</title><link href="https://simonwillison.net/2023/Oct/19/banned-for-life/#atom-tag" rel="alternate"/><published>2023-10-19T14:56:05+00:00</published><updated>2023-10-19T14:56:05+00:00</updated><id>https://simonwillison.net/2023/Oct/19/banned-for-life/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://lerner.co.il/2023/10/19/im-banned-for-life-from-advertising-on-meta-because-i-teach-python/"&gt;I’m banned for life from advertising on Meta. Because I teach Python.&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If accurate, this describes a nightmare scenario of automated decision making.&lt;/p&gt;

&lt;p&gt;Reuven recently found he had a permanent ban from advertising on Facebook. They won’t tell him exactly why, and have marked this as a final decision that can never be reviewed.&lt;/p&gt;

&lt;p&gt;His best theory (impossible for him to confirm) is that it’s because he tried advertising a course on Python and Pandas a few years ago, which was blocked because a dumb algorithm thought he was trading exotic animals!&lt;/p&gt;

&lt;p&gt;The worst part? An appeal is no longer possible because relevant data is only retained for 180 days and so all of the related evidence has now been deleted.&lt;/p&gt;

&lt;p&gt;Various comments on Hacker News from people familiar with these systems confirm that this story likely holds up.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=37939269"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facebook"&gt;facebook&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/meta"&gt;meta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-misuse"&gt;ai-misuse&lt;/a&gt;&lt;/p&gt;



</summary><category term="ethics"/><category term="facebook"/><category term="pandas"/><category term="python"/><category term="ai"/><category term="meta"/><category term="ai-ethics"/><category term="ai-misuse"/></entry><entry><title>Proof of concept: sqlite_utils magic for Jupyter</title><link href="https://simonwillison.net/2020/Oct/21/magic/#atom-tag" rel="alternate"/><published>2020-10-21T17:26:43+00:00</published><updated>2020-10-21T17:26:43+00:00</updated><id>https://simonwillison.net/2020/Oct/21/magic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://gist.github.com/psychemedia/3187bce18ffdd79bf258d6011ec301b3"&gt;Proof of concept: sqlite_utils magic for Jupyter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tony Hirst has been experimenting with building a Jupyter “magic” that adds special syntax for using sqlite-utils to insert data and run queries. Query results come back as a Pandas DataFrame, which Jupyter then displays as a table.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/psychemedia/status/1318962507650912257"&gt;@psychemedia&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tony-hirst"&gt;tony-hirst&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="sqlite"/><category term="tony-hirst"/><category term="jupyter"/><category term="sqlite-utils"/></entry><entry><title>Los Angeles Weedmaps analysis</title><link href="https://simonwillison.net/2019/May/30/los-angeles-weedmaps-analysis/#atom-tag" rel="alternate"/><published>2019-05-30T04:35:42+00:00</published><updated>2019-05-30T04:35:42+00:00</updated><id>https://simonwillison.net/2019/May/30/los-angeles-weedmaps-analysis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://nbviewer.jupyter.org/github/datadesk/la-weedmaps-analysis/blob/master/notebook.ipynb"&gt;Los Angeles Weedmaps analysis&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/palewire/status/1133723284116086784"&gt;Ben Welsh&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/latimes"&gt;latimes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-welsh"&gt;ben-welsh&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="geospatial"/><category term="latimes"/><category term="pandas"/><category term="jupyter"/><category term="ben-welsh"/></entry><entry><title>Pyodide: Bringing the scientific Python stack to the browser</title><link href="https://simonwillison.net/2019/Apr/17/pyodide-bringing-scientific-python-stack-browser/#atom-tag" rel="alternate"/><published>2019-04-17T04:23:33+00:00</published><updated>2019-04-17T04:23:33+00:00</updated><id>https://simonwillison.net/2019/Apr/17/pyodide-bringing-scientific-python-stack-browser/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hacks.mozilla.org/2019/04/pyodide-bringing-the-scientific-python-stack-to-the-browser/"&gt;Pyodide: Bringing the scientific Python stack to the browser&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
More fun with WebAssembly: Pyodide attempts (and mostly succeeds) to bring the full Python data stack to the browser: CPython, NumPy, Pandas, Scipy, and Matplotlib. Also includes interesting bridge tools for e.g. driving a canvas element from Python. Really interesting project from the Firefox Data Platform team.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=19677721"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scipy"&gt;scipy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/numpy"&gt;numpy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pyodide"&gt;pyodide&lt;/a&gt;&lt;/p&gt;



</summary><category term="mozilla"/><category term="pandas"/><category term="python"/><category term="scipy"/><category term="webassembly"/><category term="numpy"/><category term="pyodide"/></entry><entry><title>How to rewrite your SQL queries in Pandas, and more</title><link href="https://simonwillison.net/2018/Apr/19/sql-pandas/#atom-tag" rel="alternate"/><published>2018-04-19T18:34:42+00:00</published><updated>2018-04-19T18:34:42+00:00</updated><id>https://simonwillison.net/2018/Apr/19/sql-pandas/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://codeburst.io/how-to-rewrite-your-sql-queries-in-pandas-and-more-149d341fc53e"&gt;How to rewrite your SQL queries in Pandas, and more&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries.
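To give a flavour of the mapping, here is one pattern of the kind the tutorial covers — a SQL GROUP BY and a pandas equivalent — sketched with made-up data (the column names are illustrative, not from the tutorial):

```python
import pandas as pd

df = pd.DataFrame({"city": ["SF", "SF", "LA"], "sales": [10, 20, 5]})

# SQL: SELECT city, SUM(sales) AS total FROM df GROUP BY city
totals = (
    df.groupby("city", as_index=False)["sales"]
    .sum()
    .rename(columns={"sales": "total"})
)
print(totals)
```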

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://medium.freecodecamp.org/python-collection-of-my-favorite-articles-8469b8455939"&gt;Gergely Szerovay&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="python"/><category term="sql"/></entry><entry><title>Analyzing my Twitter followers with Datasette</title><link href="https://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/#atom-tag" rel="alternate"/><published>2018-01-28T06:41:38+00:00</published><updated>2018-01-28T06:41:38+00:00</updated><id>https://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/#atom-tag</id><summary type="html">
    &lt;p&gt;I decided to do some ad-hoc analysis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="The_end_result_4"&gt;&lt;/a&gt;The end result&lt;/h3&gt;
&lt;p&gt;Here’s the Datasette database containing all of my Twitter followers: &lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a"&gt;https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Much more interesting though are the queries I can now run against it. A few examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+followers.rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+location.value+as+location+from+followers+join+location+on+followers.location+%3D+location.id+where+followers.rowid+in+%28select+rowid+from+%5Bfollowers_fts%5D+where+%5Bfollowers_fts%5D+match+%3Asearch%29+order+by+followers_count+desc"&gt;Search my followers (their name, bio and location), return results ordered by follower count&lt;/a&gt;. This is a parameterized query - here are the resuts for &lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+followers.rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+location.value+as+location+from+followers+join+location+on+followers.location+%3D+location.id+where+followers.rowid+in+%28select+rowid+from+%5Bfollowers_fts%5D+where+%5Bfollowers_fts%5D+match+%3Asearch%29+order+by+followers_count+desc&amp;amp;search=django"&gt;django&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+verified+from+followers+order+by+followers_count+desc"&gt;Whe are my most influential followers based on their own follower count?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+verified+from+followers+where+verified+%3D+1+order+by+friends_count"&gt;For all of my “verified” followers, who are following the least numbers of people themselves?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+rowid%2C+name%2C+%281.0+*+friends_count+%2F+followers_count%29+*+100+as+ratio%2C%0D%0A++++screen_name%2C+description%2C+followers_count%2C+friends_count%2C+verified%0D%0Afrom+followers+where+followers_count+%3E+0%0D%0Aorder+by+%281.0+*+friends_count+%2F+followers_count%29"&gt;Sort my followers by their friend-to-follower ratio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+strftime%28%27%25Y%27%2C+created_at%29+as+join_year%2C+count%28*%29%0D%0Afrom+followers%0D%0Agroup+by+join_year%0D%0Aorder+by+join_year"&gt;In which years did my followers first join Twitter?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+location.value+as+location%2C+count%28*%29+as+n%0D%0Afrom+followers%0D%0Ajoin+location+on+followers.location+%3D+location.id%0D%0Agroup+by+followers.location+order+by+n+desc"&gt;What are the most common locations for my followers?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+lower%28lang.value%29%2C+count%28*%29+as+n+from+followers+join+lang+on+followers.lang+%3D+lang.id+group+by+lower%28lang.value%29+order+by+n+desc"&gt;The most common language setting for my followers&lt;/a&gt; - 13,504 en, 403 es, 257 fr, 172 pt etc&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+time_zone.value+as+time_zone%2C+lang.value+as+lang%2C+count%28*%29+as+n%0D%0Afrom+followers%0D%0Ajoin+time_zone+on+followers.time_zone+%3D+time_zone.id%0D%0Ajoin+lang+on+followers.lang+%3D+lang.id%0D%0Awhere+lang.value+%3D+%3Alang%0D%0Agroup+by+time_zone.value+order+by+n+desc"&gt;What are the most common time zones for followers with different languages?&lt;/a&gt; This is another parameterized query - here are the results for &lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+time_zone.value+as+time_zone%2C+lang.value+as+lang%2C+count%28*%29+as+n%0D%0Afrom+followers%0D%0Ajoin+time_zone+on+followers.time_zone+%3D+time_zone.id%0D%0Ajoin+lang+on+followers.lang+%3D+lang.id%0D%0Awhere+lang.value+%3D+%3Alang%0D%0Agroup+by+time_zone.value+order+by+n+desc&amp;amp;lang=en"&gt;en&lt;/a&gt;, &lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+time_zone.value+as+time_zone%2C+lang.value+as+lang%2C+count%28*%29+as+n%0D%0Afrom+followers%0D%0Ajoin+time_zone+on+followers.time_zone+%3D+time_zone.id%0D%0Ajoin+lang+on+followers.lang+%3D+lang.id%0D%0Awhere+lang.value+%3D+%3Alang%0D%0Agroup+by+time_zone.value+order+by+n+desc&amp;amp;lang=es"&gt;es&lt;/a&gt;, &lt;a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+time_zone.value+as+time_zone%2C+lang.value+as+lang%2C+count%28*%29+as+n%0D%0Afrom+followers%0D%0Ajoin+time_zone+on+followers.time_zone+%3D+time_zone.id%0D%0Ajoin+lang+on+followers.lang+%3D+lang.id%0D%0Awhere+lang.value+%3D+%3Alang%0D%0Agroup+by+time_zone.value+order+by+n+desc&amp;amp;lang=fr"&gt;fr&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The thing I find most exciting about this use-case for Datasette is that it allows you to construct entire mini-applications using just a SQL query encoded in a URL. Type queries into the textarea, iterate on them until they do something useful, add some &lt;code&gt;:named&lt;/code&gt; parameters (which generate form fields) and bookmark the resulting URL. It’s an incredibly powerful way to build custom interfaces for exploring data.&lt;/p&gt;
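The "mini-application in a URL" idea boils down to URL-encoding a SQL query plus its named parameters into the Datasette querystring. A minimal sketch using only the standard library (the query and parameter name here are illustrative, not taken from the examples above):

```python
from urllib.parse import urlencode

# A SQL query with a :named parameter, encoded into a Datasette URL.
# Bookmarking the result gives you a tiny reusable query interface.
sql = (
    "select name, followers_count from followers "
    "where followers_count > :min_followers "
    "order by followers_count desc"
)
base = "https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a"
url = base + "?" + urlencode({"sql": sql, "min_followers": 1000})
print(url)
```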
&lt;p&gt;The rest of this post will describe how I pulled the data from Twitter and turned it into a SQLite database for publication with Datasette.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Fetching_my_followers_20"&gt;&lt;/a&gt;Fetching my followers&lt;/h3&gt;
&lt;p&gt;To work with the Twitter API, we first need credentials. Twitter still mostly uses the OAuth 1 model of authentication which is infuriatingly complicated, requiring you to sign parameters using two pairs of keys and secrets. OAuth 2 mostly uses a single access token sent over TLS to avoid the signing pain, but Twitter’s API dates back to the times when API client libraries with robust TLS were not a safe assumption.&lt;/p&gt;
&lt;p&gt;Since I have to re-figure out the Twitter API every few years, here’s how I got it working this time. I created a new Twitter app using the form on &lt;a href="https://apps.twitter.com/"&gt;https://apps.twitter.com/&lt;/a&gt; (which is surprisingly hard to find if you start out on the &lt;a href="https://developer.twitter.com/"&gt;https://developer.twitter.com/&lt;/a&gt; portal). Having created the app I navigated to the “Keys and Access Tokens” tab, scrolled down and clicked the “Create my access token” button. Then I grabbed the four magic tokens from the following spots on the page:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2018/twitter-api-keys.png" alt="Twitter application setup" style="max-width: 100%" /&gt;&lt;/p&gt;
&lt;p&gt;Now in Python I can make properly signed calls to the Twitter API like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from requests_oauthlib import OAuth1Session
twitter = OAuth1Session(
    client_key='...',
    client_secret='...',
    resource_owner_key='...',
    resource_owner_secret='...'
)
print(twitter.get(
    'https://api.twitter.com/1.1/users/show.json?screen_name=simonw'
).json())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Twitter API has an endpoint for retrieving everyone who follows an account as a paginated JSON list: &lt;a href="https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-followers-list"&gt;&lt;code&gt;followers/list&lt;/code&gt;&lt;/a&gt;. At some point in the past few years Twitter got &lt;em&gt;really stingy&lt;/em&gt; with their rate limits - most endpoints, including &lt;code&gt;followers/list&lt;/code&gt;, only allow 15 requests every 15 minutes! You can request up to 200 followers at a time, but with 15,000 followers that meant the full fetch would take 75 minutes. So I set the following running in a Jupyter notebook and went for a walk with &lt;a href="https://twitter.com/cleopaws"&gt;the dog&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from requests_oauthlib import OAuth1Session
import urllib.parse
import time

twitter = OAuth1Session(...)
url = 'https://api.twitter.com/1.1/followers/list.json'

def fetch_followers(cursor=-1):
    r = twitter.get(url + '?'+ urllib.parse.urlencode({
        'count': 200,
        'cursor': cursor
    }))
    return r.headers, r.json()

cursor = -1
users = []
while cursor:
    headers, body = fetch_followers(cursor)
    print(headers)
    users.extend(body['users'])
    print(len(users))
    cursor = body['next_cursor']
    time.sleep(70)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A couple of hours later I had a &lt;code&gt;users&lt;/code&gt; list with 15,281 user dictionaries in it. I wrote that to disk for safe keeping:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json
json.dump(users, open('twitter-followers.json', 'w'), indent=4)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Converting_that_JSON_into_a_SQLite_database_72"&gt;&lt;/a&gt;Converting that JSON into a SQLite database&lt;/h3&gt;
&lt;p&gt;I wrote some notes on &lt;a href="https://gist.github.com/simonw/eb5ad8e55d75bbc3003dd9e5d6eb438b"&gt;How to turn a list of JSON objects into a Datasette&lt;/a&gt; using Pandas a few weeks ago. This works really well, but we need to do a bit of cleanup first: Pandas prefers a list of flat dictionaries, but the Twitter API has given us back some nested structures.&lt;/p&gt;
&lt;p&gt;I won’t do a line-by-line breakdown of it, but here’s the code I ended up using. The &lt;code&gt;expand_entities()&lt;/code&gt; function replaces Twitter’s ugly &lt;code&gt;t.co&lt;/code&gt; links with their expanded &lt;code&gt;display_url&lt;/code&gt; alternatives - then &lt;code&gt;clean_user()&lt;/code&gt; flattens a nested user into a simple dictionary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def expand_entities(s, entities):
    for key, ents in entities.items():
        for ent in ents:
            if 'url' in ent:
                replacement = ent['expanded_url'] or ent['url']
                s = s.replace(ent['url'], replacement)
    return s

def clean_user(user):
    if user['description'] and 'description' in user['entities']:
        user['description'] = expand_entities(
            user['description'], user['entities']['description']
        )
    if user['url'] and 'url' in user['entities']:
        user['url'] = expand_entities(user['url'], user['entities']['url'])
    if 'entities' in user:
        del user['entities']
    if 'status' in user:
        del user['status']

for user in users:
    clean_user(user)
&lt;/code&gt;&lt;/pre&gt;
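To see the link expansion in action, here is the same function run against a made-up entity payload (the structure mirrors Twitter's, but the values are invented):

```python
def expand_entities(s, entities):
    # Same function as above, repeated so this sketch runs standalone.
    for key, ents in entities.items():
        for ent in ents:
            if 'url' in ent:
                replacement = ent['expanded_url'] or ent['url']
                s = s.replace(ent['url'], replacement)
    return s

bio = "I blog at https://t.co/abc123"
entities = {
    "urls": [
        {"url": "https://t.co/abc123", "expanded_url": "https://simonwillison.net/"}
    ]
}
print(expand_entities(bio, entities))
# I blog at https://simonwillison.net/
```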
&lt;p&gt;I now have a nice flat list of user dictionaries - a subset of which is &lt;a href="https://gist.github.com/simonw/f0206b1b9b70025b37d47c9e1b6712a8"&gt;provided here for illustration&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One additional step: SQLite’s built-in functions for handling date and time prefer ISO formatted timestamps, but previewing the DataFrame in Jupyter shows that the data I pulled from Twitter has dates in a different format altogether. I can fix this with a one-liner using the ever-handy &lt;a href="https://dateutil.readthedocs.io/en/stable/"&gt;dateutil&lt;/a&gt; library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from dateutil.parser import parse
import pandas as pd
df = pd.DataFrame(users)
df['created_at'] = df['created_at'].apply(lambda s: parse(s).isoformat())
&lt;/code&gt;&lt;/pre&gt;
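If dateutil isn't handy, Twitter's created_at values follow a fixed shape (something like 'Sat Jan 27 04:35:42 +0000 2018'), so the standard library can do the same conversion; the format string below is assumed from that shape:

```python
from datetime import datetime

def to_iso(created_at):
    # Parse Twitter's fixed date format and re-emit as an ISO timestamp,
    # which SQLite's date functions understand.
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y").isoformat()

print(to_iso("Sat Jan 27 04:35:42 +0000 2018"))
# 2018-01-27T04:35:42+00:00
```

A df['created_at'].apply(to_iso) would slot into the same place as the dateutil one-liner.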
&lt;p&gt;Here’s the before and after:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2018/dataframe-apply-twitter.png" alt="df.apply() illustrated" style="max-width: 100%" /&gt;&lt;/p&gt;
&lt;p&gt;With the timestamps fixed, I can write the cleaned-up DataFrame out to a SQLite table like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlite3
conn = sqlite3.connect('/tmp/followers.db')
df.to_sql('followers', conn)
conn.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can run &lt;code&gt;datasette /tmp/followers.db&lt;/code&gt; to preview what I’ve got so far.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Extracting_columns_and_setting_up_fulltext_search_121"&gt;&lt;/a&gt;Extracting columns and setting up full-text search&lt;/h3&gt;
&lt;p&gt;This all works fine, but it’s not quite the finished product I demonstrated above. My desired final state has two additional features: common values in the &lt;code&gt;lang&lt;/code&gt;, &lt;code&gt;location&lt;/code&gt;, &lt;code&gt;time_zone&lt;/code&gt; and &lt;code&gt;translator_type&lt;/code&gt; columns have been pulled out into lookup tables, and I’ve enabled SQLite full-text search against a subset of the columns.&lt;/p&gt;
&lt;p&gt;Normally I would use the &lt;code&gt;-c&lt;/code&gt; and &lt;code&gt;-f&lt;/code&gt; arguments to my &lt;a href="https://github.com/simonw/csvs-to-sqlite"&gt;csvs-to-sqlite&lt;/a&gt; tool to do this (see &lt;a href="https://simonwillison.net/2017/Nov/25/new-in-datasette/#Using_csvstosqlite_to_build_foreign_key_tables_51"&gt;my write-up here&lt;/a&gt;), but that tool only works against CSV files on disk. I want to work with an in-memory Pandas DataFrame.&lt;/p&gt;
&lt;p&gt;So I reverse-engineered my own code and figured out how to apply the same transformations from an interactive Python prompt instead. It ended up looking like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from csvs_to_sqlite import utils

conn = sqlite3.connect('/tmp/simonw-twitter-followers.db')

# Define columns I want to refactor:
foreign_keys = {
    'time_zone': ('time_zone', 'value'),
    'translator_type': ('translator_type', 'value'),
    'location': ('location', 'value'),
    'lang': ('lang', 'value'),
}
new_frames = utils.refactor_dataframes(conn, [df], foreign_keys)

# Save my refactored DataFrame to SQLite
utils.to_sql_with_foreign_keys(
    conn, new_frames[0], 'followers',
    foreign_keys, None, index_fks=True
)

# Create the full-text search index across these columns:
fts = ['screen_name', 'description', 'name', 'location']
utils.generate_and_populate_fts(conn, ['followers'], fts, foreign_keys)

conn.close()
&lt;/code&gt;&lt;/pre&gt;
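Under the hood that helper produces an ordinary SQLite full-text search table. For a sense of what it generates, here is roughly the equivalent by hand (the schema is simplified and the row is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A virtual FTS table over the searchable columns:
conn.execute("create virtual table followers_fts using fts4(name, description)")
conn.execute(
    "insert into followers_fts (name, description) values (?, ?)",
    ("Example Person", "Builds things with Django and Python"),
)
# match queries hit any indexed column, case-insensitively:
rows = conn.execute(
    "select name from followers_fts where followers_fts match ?", ("django",)
).fetchall()
print(rows)  # [('Example Person',)]
```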
&lt;h3&gt;&lt;a id="Final_step_publishing_with_Datasette_154"&gt;&lt;/a&gt;Final step: publishing with Datasette&lt;/h3&gt;
&lt;p&gt;Having run &lt;code&gt;datasette /tmp/simonw-twitter-followers.db&lt;/code&gt; to confirm locally that I got the results I was looking for, the last step was to publish it to the internet. As always, I used &lt;a href="https://zeit.co/now"&gt;Zeit Now&lt;/a&gt; via the &lt;code&gt;datasette publish&lt;/code&gt; command for this final step:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmp $ datasette publish now simonw-twitter-followers.db \
    --title=&amp;quot;@simonw Twitter followers, 27 Jan 2018&amp;quot;
&amp;gt; Deploying /private/var/.../datasette under simonw
&amp;gt; Ready! https://datasette-cmpznehuku.now.sh (copied to clipboard) [14s]
&amp;gt; Synced 2 files (11.29MB) [0ms] 
&amp;gt; Initializing…
&amp;gt; Building
&amp;gt; ▲ docker build
Sending build context to Docker daemon 11.85 MB
&amp;gt; Step 1 : FROM python:3
...
&amp;gt; Deployment complete!
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I ran &lt;code&gt;now alias&lt;/code&gt; to assign a permanent, more memorable URL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; now alias https://datasette-cmpznehuku.now.sh simonw-twitter-followers.now.sh
&lt;/code&gt;&lt;/pre&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pandas"/><category term="projects"/><category term="sqlite"/><category term="twitter"/><category term="datasette"/></entry><entry><title>How to turn a list of JSON objects into a Datasette</title><link href="https://simonwillison.net/2018/Jan/20/datasette-json/#atom-tag" rel="alternate"/><published>2018-01-20T01:07:09+00:00</published><updated>2018-01-20T01:07:09+00:00</updated><id>https://simonwillison.net/2018/Jan/20/datasette-json/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://gist.github.com/simonw/eb5ad8e55d75bbc3003dd9e5d6eb438b"&gt;How to turn a list of JSON objects into a Datasette&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
GitHub user ramadis cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared it as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;



</summary><category term="json"/><category term="pandas"/><category term="sqlite"/><category term="datasette"/></entry><entry><title>Big Data Workflow with Pandas and SQLite</title><link href="https://simonwillison.net/2017/Nov/28/big-data-workflow-with-pandas-and-sqlite/#atom-tag" rel="alternate"/><published>2017-11-28T23:02:50+00:00</published><updated>2017-11-28T23:02:50+00:00</updated><id>https://simonwillison.net/2017/Nov/28/big-data-workflow-with-pandas-and-sqlite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://plot.ly/python/big-data-analytics-with-pandas-and-sqlite/"&gt;Big Data Workflow with Pandas and SQLite&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Handy tutorial on dealing with larger-than-memory data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite.
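The core of the technique is pandas' chunksize argument feeding to_sql in append mode. A minimal sketch with toy in-memory data standing in for the big file (column names invented):

```python
import sqlite3
from io import StringIO

import pandas as pd

# Toy stand-in for a multi-gigabyte CSV file:
csv = StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n")
conn = sqlite3.connect(":memory:")  # swap in a file path for real data

# Read the CSV a chunk at a time and append each chunk to SQLite,
# so the full file never has to fit in memory at once:
for chunk in pd.read_csv(csv, chunksize=2):
    chunk.to_sql("payments", conn, if_exists="append", index=False)

total, = conn.execute("select count(*) from payments").fetchone()
print(total)  # 4
```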

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/palewire/status/935642068461826049"&gt;Ben Welsh&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/csv"&gt;csv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="csv"/><category term="pandas"/><category term="sqlite"/></entry><entry><title>Exploring Line Lengths in Python Packages</title><link href="https://simonwillison.net/2017/Nov/10/exploring-line-lengths-in-python-packages/#atom-tag" rel="alternate"/><published>2017-11-10T15:34:29+00:00</published><updated>2017-11-10T15:34:29+00:00</updated><id>https://simonwillison.net/2017/Nov/10/exploring-line-lengths-in-python-packages/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://jakevdp.github.io/blog/2017/11/09/exploring-line-lengths-in-python-packages/"&gt;Exploring Line Lengths in Python Packages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Interesting exploration of the impact of the 79-character line length limit rule of thumb on various Python packages—and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualization"&gt;visualization&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="python"/><category term="visualization"/><category term="jupyter"/></entry><entry><title>A Minimalist Guide to SQLite</title><link href="https://simonwillison.net/2017/Nov/2/a-minimalist-guide-to-sqlite/#atom-tag" rel="alternate"/><published>2017-11-02T01:23:17+00:00</published><updated>2017-11-02T01:23:17+00:00</updated><id>https://simonwillison.net/2017/Nov/2/a-minimalist-guide-to-sqlite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://tech.marksblogg.com/sqlite3-tutorial-and-guide.html"&gt;A Minimalist Guide to SQLite&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more.
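The CSV-import workflow the guide covers can be sketched with nothing but the standard library; the table name, column names and data here are invented for illustration:

```python
import csv
import io
import sqlite3

# A small in-memory CSV standing in for a real file on disk.
csv_text = "name,score\nalice,3\nbob,5\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

# csv.DictReader yields dicts, which map directly onto named SQL parameters.
rows = list(csv.DictReader(io.StringIO(csv_text)))
conn.executemany(
    "INSERT INTO scores (name, score) VALUES (:name, :score)", rows
)

total = conn.execute("SELECT sum(score) FROM scores").fetchone()[0]
print(total)  # 8
```

From there, `pandas.read_sql_query()` against the same connection pulls a table straight into a DataFrame for the Jupyter side of the workflow.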


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="python"/><category term="sqlite"/><category term="jupyter"/></entry><entry><title>Exploring United States Policing Data Using Python</title><link href="https://simonwillison.net/2017/Oct/29/exploring-united-states-policing-data-using-python/#atom-tag" rel="alternate"/><published>2017-10-29T16:58:36+00:00</published><updated>2017-10-29T16:58:36+00:00</updated><id>https://simonwillison.net/2017/Oct/29/exploring-united-states-policing-data-using-python/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.patricktriest.com/police-data-python/"&gt;Exploring United States Policing Data Using Python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Outstanding introduction to data analysis with Jupyter and Pandas.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-data"/><category term="pandas"/><category term="python"/><category term="jupyter"/></entry><entry><title>Streaming Dataframes</title><link href="https://simonwillison.net/2017/Oct/19/streaming/#atom-tag" rel="alternate"/><published>2017-10-19T14:25:09+00:00</published><updated>2017-10-19T14:25:09+00:00</updated><id>https://simonwillison.net/2017/Oct/19/streaming/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://matthewrocklin.com/blog/work/2017/10/16/streaming-dataframes-1"&gt;Streaming Dataframes&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is some deep and brilliant magic: Matthew Rocklin’s Streamz Python library provides some elegant abstractions for consuming infinite streams of data and calculating cumulative averages and rolling reductions... and now he’s added an integration with Jupyter that lets you embed Bokeh graphs and pandas dataframe tables that continue to update in realtime as the stream continues! Check out the animated screenshots; this really is a phenomenal piece of work.
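The underlying trick, updating a cumulative average as each new value arrives without ever storing the whole stream, can be sketched in plain Python (an illustration of the idea, not the Streamz API itself):

```python
def cumulative_averages(values):
    """Yield the running mean after each new value, keeping only two numbers of state."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count

# Works on any iterable, including an infinite one consumed lazily.
print(list(cumulative_averages([2, 4, 9])))  # [2.0, 3.0, 5.0]
```

Streamz wires this kind of stateful reduction into a pipeline of sources and sinks, which is what lets the Jupyter/Bokeh output keep updating live.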


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="jupyter"/></entry><entry><title>PyPy v5.9 Released, Now Supports Pandas, NumPy</title><link href="https://simonwillison.net/2017/Oct/5/pypy/#atom-tag" rel="alternate"/><published>2017-10-05T16:58:44+00:00</published><updated>2017-10-05T16:58:44+00:00</updated><id>https://simonwillison.net/2017/Oct/5/pypy/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://morepypy.blogspot.com/2017/10/pypy-v59-released-now-supports-pandas.html"&gt;PyPy v5.9 Released, Now Supports Pandas, NumPy&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
NumPy and Pandas now work on PyPy2.7. “Many other modules based on C-API extensions work on PyPy as well.”


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypy"&gt;pypy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/numpy"&gt;numpy&lt;/a&gt;&lt;/p&gt;



</summary><category term="pandas"/><category term="pypy"/><category term="numpy"/></entry><entry><title>Generating interactive HTML charts from Python?</title><link href="https://simonwillison.net/2016/Nov/25/generating-interactive-html-charts-from/#atom-tag" rel="alternate"/><published>2016-11-25T20:05:00+00:00</published><updated>2016-11-25T20:05:00+00:00</updated><id>https://simonwillison.net/2016/Nov/25/generating-interactive-html-charts-from/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="http://ask.metafilter.com/303041/Generating-interactive-HTML-charts-from-Python#4388982"&gt;Generating interactive HTML charts from Python?&lt;/a&gt; on Ask MetaFilter&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;D3 is absolutely amazing but the learning curve is a bit steep. Totally worth the effort to learn it in the long run, but it's not so useful if you want to get something done quickly.&lt;/p&gt;
&lt;p&gt;I've used &lt;a href="http://nvd3.org"&gt;NVD3&lt;/a&gt; successfully in the past - it's another high-level library on top of D3. Much faster to get results than D3 on its own.&lt;/p&gt;
&lt;p&gt;From your description it sounds like you should also check out &lt;a href="http://square.github.io/crossfilter/"&gt;crossfilter&lt;/a&gt; which, for the right use-cases, is phenomenal.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ask-metafilter"&gt;ask-metafilter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bokeh"&gt;bokeh&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualization"&gt;visualization&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plotly"&gt;plotly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datavis"&gt;datavis&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ask-metafilter"/><category term="bokeh"/><category term="pandas"/><category term="python"/><category term="visualization"/><category term="plotly"/><category term="datavis"/></entry><entry><title>Panda Tuesday; The History of the Panda, New APIs, Explore and You</title><link href="https://simonwillison.net/2009/Mar/4/pandas/#atom-tag" rel="alternate"/><published>2009-03-04T11:49:21+00:00</published><updated>2009-03-04T11:49:21+00:00</updated><id>https://simonwillison.net/2009/Mar/4/pandas/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.flickr.com/blog/2009/03/03/panda-tuesday-the-history-of-the-panda-new-apis-explore-and-you/"&gt;Panda Tuesday; The History of the Panda, New APIs, Explore and You&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Flickr’s Rainbow Vomiting Panda of Awesomeness now has a family of associated APIs.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/flickr"&gt;flickr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pandas"&gt;pandas&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="flickr"/><category term="pandas"/></entry></feed>