<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: facetedsearch</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/facetedsearch.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2018-05-20T23:42:18+00:00</updated><author><name>Simon Willison</name></author><entry><title>Datasette Facets</title><link href="https://simonwillison.net/2018/May/20/datasette-facets/#atom-tag" rel="alternate"/><published>2018-05-20T23:42:18+00:00</published><updated>2018-05-20T23:42:18+00:00</updated><id>https://simonwillison.net/2018/May/20/datasette-facets/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/datasette/releases/tag/0.22"&gt;Datasette 0.22&lt;/a&gt; is out with the most significant new feature I’ve added since the initial release: &lt;strong&gt;faceted browse&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; lets you deploy an instant web UI and JSON API for any SQLite database. &lt;a href="https://github.com/simonw/csvs-to-sqlite"&gt;csvs-to-sqlite&lt;/a&gt; makes it easy to create a SQLite database out of any collection of CSV files. &lt;a href="https://publish.datasettes.com/"&gt;Datasette Publish&lt;/a&gt; is a web app that can run these combined tools against CSV files you upload from your browser. And now the new &lt;a href="https://datasette.readthedocs.io/en/latest/facets.html"&gt;Datasette Facets&lt;/a&gt; feature lets you explore any CSV file using faceted navigation with a couple of clicks.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Exploring_characters_from_Marvel_comics_6"&gt;&lt;/a&gt;Exploring characters from Marvel comics&lt;/h3&gt;
&lt;p&gt;Let’s use facets to explore every character in the Marvel Universe.&lt;/p&gt;
&lt;p&gt;FiveThirtyEight have published &lt;a href="https://github.com/fivethirtyeight/data/tree/master/comic-characters"&gt;a CSV file&lt;/a&gt; of 16,376 characters from Marvel comics, scraped from Wikia as part of the research for their 2014 story &lt;a href="https://fivethirtyeight.com/features/women-in-comic-books/"&gt;Comic Books Are Still Made By Men, For Men And About Men&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here’s that CSV file &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data"&gt;loaded into the latest version of Datasette&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/marvel-facets.gif" alt="Marvel characters explored using Datasette Facets" /&gt;&lt;/p&gt;
&lt;p&gt;We start by applying the &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data?_facet=ID&amp;amp;_facet=ALIGN&amp;amp;_facet=SEX"&gt;identity status, alignment and gender facets&lt;/a&gt;. Then we filter down to just the &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data?_facet=ID&amp;amp;_facet=ALIGN&amp;amp;_facet=SEX&amp;amp;ID=Public+Identity&amp;amp;ALIGN=Bad+Characters"&gt;bad characters with a public identity&lt;/a&gt;, and apply the &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data?_facet=ID&amp;amp;_facet=ALIGN&amp;amp;_facet=SEX&amp;amp;ID=Public+Identity&amp;amp;ALIGN=Bad+Characters&amp;amp;_facet=EYE#facet-EYE"&gt;eye colour facet&lt;/a&gt;. Now we can filter to just the &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data?_facet=ID&amp;amp;_facet=ALIGN&amp;amp;_facet=SEX&amp;amp;ID=Public+Identity&amp;amp;ALIGN=Bad+Characters&amp;amp;_facet=EYE&amp;amp;EYE=Yellow+Eyes"&gt;20 bad characters with a public identity and yellow eyes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At each stage along the way we could see numerical summaries of the other facets. That’s a pretty sophisticated piece of analysis we’ve been able to run with just a few clicks (and it works responsively on mobile as well).&lt;/p&gt;
&lt;p&gt;I’ve published a full copy of everything else in the FiveThirtyEight data repository, which means you can find plenty more examples of facets in action at &lt;a href="https://fivethirtyeight.datasettes.com/"&gt;https://fivethirtyeight.datasettes.com/&lt;/a&gt; - one example: &lt;a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/antiquities-act%2Factions_under_antiquities_act?_facet=current_agency&amp;amp;_facet=states&amp;amp;_facet=action&amp;amp;_facet=pres_or_congress"&gt;Actions under the Antiquities Act&lt;/a&gt;, faceted by states, pres_or_congress, action and current_agency.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Analyzing_GSA_IT_Standards_with_Datasette_Publish_22"&gt;&lt;/a&gt;Analyzing GSA IT Standards with Datasette Publish&lt;/h3&gt;
&lt;p&gt;The US government’s General Services Administration have a GitHub account, and they use it to publish &lt;a href="https://github.com/GSA/data"&gt;a repository of assorted data&lt;/a&gt; as CSVs.&lt;/p&gt;
&lt;p&gt;Let’s take one of those CSVS and analyze it with Datasette Facets, using the &lt;a href="https://publish.datasettes.com/"&gt;Datasette Publish&lt;/a&gt; web app to upload and process the CSV.&lt;/p&gt;
&lt;p&gt;We’ll start with the &lt;a href="https://github.com/GSA/data/blob/master/enterprise-architecture/it-standards.csv"&gt;it-standards.csv&lt;/a&gt; file, downloaded from their repo. We’ll upload it to Datasette Publish and add some associated metadata:&lt;/p&gt;
&lt;p&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/gsa-facets.gif" alt="GSA IT standards published using Datasette Publish" /&gt;&lt;/p&gt;
&lt;p&gt;Here’s the result, with &lt;a href="https://datasette-tcuhqhqpud.now.sh/csv-data-41a17b5/it-standards?_facet=Status&amp;amp;_facet=Deployment+Type"&gt;the Status and Deployment Type facets applied&lt;/a&gt;. And here’s a query showing just &lt;a href="https://datasette-tcuhqhqpud.now.sh/csv-data-41a17b5/it-standards?_facet=Status&amp;amp;_facet=Deployment+Type&amp;amp;Deployment+Type=SaaS&amp;amp;Status=Approved+-+Preferred"&gt;SaaS tools with status Approved - Preferred&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="European_Power_Stations_34"&gt;&lt;/a&gt;European Power Stations&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://open-power-system-data.org/"&gt;Open Power System Data project&lt;/a&gt; publishes data about electricity systems. They publish data in a number of formats, including SQLite databases. Let’s take their &lt;a href="https://data.open-power-system-data.org/conventional_power_plants/"&gt;conventional_power_plants.sqlite file&lt;/a&gt; and explore it with Datasette. With Datasette installed, run the following commands in your terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wget https://data.open-power-system-data.org/conventional_power_plants/2018-02-27/conventional_power_plants.sqlite
datasette conventional_power_plants.sqlite
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will start Datasette running at &lt;code&gt;http://127.0.0.1:8001/&lt;/code&gt; ready for you to explore the data.&lt;/p&gt;
&lt;p&gt;Next we can publish the SQLite database directly to the internet using the &lt;code&gt;datasette publish&lt;/code&gt; command-line tool:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ datasette publish now conventional_power_plants.sqlite \
    --source=&amp;quot;Open Power System Data. 2018. Data Package Conventional power plants. Version 2018-02-27&amp;quot; \
    --source_url=&amp;quot;https://data.open-power-system-data.org/conventional_power_plants/2018-02-27/&amp;quot; \
    --title=&amp;quot;Conventional power plants&amp;quot; \
    --branch=master
&amp;gt; Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpufvxrzgp/datasette under simonw
&amp;gt; https://datasette-tgngfjddix.now.sh [in clipboard] (sfo1) [11s]
&amp;gt; Synced 3 files (1.28MB) [11s]
&amp;gt; Building…
&amp;gt; ▲ docker build
Sending build context to Docker daemon 1.343 MBkB
&amp;gt; Step 1/7 : FROM python:3
&amp;gt; 3: Pulling from library/python
&amp;gt; 3d77ce4481b1: Already exists
&amp;gt; 534514c83d69: Already exists
...
&amp;gt; Successfully built da7ac223e8aa
&amp;gt; Successfully tagged registry.now.systems/now/3d6d318f0da06d3ea1bc97417c7dc484aaac9026:latest
&amp;gt; ▲ Storing image
&amp;gt; Build completed
&amp;gt; Verifying instantiation in sfo1
&amp;gt; [0] Serve! files=('conventional_power_plants.sqlite',) on port 8001
&amp;gt; [0] [2018-05-20 22:51:51 +0000] [1] [INFO] Goin' Fast @ http://0.0.0.0:8001
&amp;gt; [0] [2018-05-20 22:51:51 +0000] [1] [INFO] Starting worker [1]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, let’s give it a nicer URL using &lt;code&gt;now alias&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; now alias https://datasette-tgngfjddix.now.sh conventional-power-plants.now.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result can now be seen at &lt;a href="https://conventional-power-plants.now.sh/"&gt;https://conventional-power-plants.now.sh/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here’s every conventional power plant in Europe &lt;a href="https://conventional-power-plants.now.sh/conventional_power_plants-e3c301c/conventional_power_plants_EU?_facet=country&amp;amp;_facet=energy_source&amp;amp;_facet=technology"&gt;faceted by country, energy source and technology&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/power-plant-facets.png" alt="Power Plant Facets" /&gt;&lt;/p&gt;
&lt;h3&gt;&lt;a id="Implementation_notes_80"&gt;&lt;/a&gt;Implementation notes&lt;/h3&gt;
&lt;p&gt;I love faceted search engines. One of my first approaches to understanding any new large dataset has long been to throw it into a faceted search engine and see what comes out. In the past I’ve built them using &lt;a href="https://simonwillison.net/tags/solr/"&gt;Solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/elasticsearch/"&gt;Elasticsearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/"&gt;PostgreSQL&lt;/a&gt; and even &lt;a href="https://whoosh.readthedocs.io/en/latest/facets.html"&gt;Whoosh&lt;/a&gt;. I guess it was inevitable that I’d try to build one with SQLite.&lt;/p&gt;
&lt;p&gt;You can follow the development of Datasette Facets in the now-closed &lt;a href="https://github.com/simonw/datasette/issues/255"&gt;issue #255&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;p&gt;Facets are requested by appending one or more &lt;code&gt;?_facet=colname&lt;/code&gt; parameters to the URL. This causes Datasette to run the following SQL query for each of those specified columns:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select colname as value, count(*) as count
from tablename where (current where clauses)
group by colname order by count desc limit 31
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For large tables, this could get expensive. Datasette supports time limits for SQLite queries, and facets are given up to 200ms (by default, this limit &lt;a href="https://datasette.readthedocs.io/en/latest/config.html#facet-time-limit-ms"&gt;can be customized&lt;/a&gt;) to finish executing. If the query doesn’t complete in the given time the user sees a warning that the facet could not be displayed.&lt;/p&gt;
&lt;p&gt;We ask for 31 values in the limit clause even though we only display 30. This lets us detect if there are more values available and show a &lt;code&gt;...&lt;/code&gt; indicator to let the user know that the facets were truncated.&lt;/p&gt;
&lt;p&gt;Datasette also suggests facets that you might want to apply. This is implemented using another query, this time run against every column that is not yet being used as a facet. If a table has 20 columns this means 20 queries, so they run with an even tighter 50ms time limit. The query looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select distinct colname
from tablename where (current where clauses)
limit 31
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All we are doing here is trying to determine if the column in question has less than 30 unique values. The limit clause here means that if you run this query against a column with entirely distinct values (the primary key for example) the query will terminate extremely quickly - after it has found just the first 31 values.&lt;/p&gt;
&lt;p&gt;Once the query has executed, we count the distinct values and check to see if this column, when used as a facet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Will return 30 or less unique options&lt;/li&gt;
&lt;li&gt;Will return more than one unique option&lt;/li&gt;
&lt;li&gt;Will return less unique options than the current total number of filtered rows&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the query takes longer than 50ms we terminate it and do not suggest that column as a potential facet.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Facets_via_JSON_112"&gt;&lt;/a&gt;Facets via JSON&lt;/h3&gt;
&lt;p&gt;As with everything in Datasette, the facets you can view in your browser are also available as part of the JSON API (which ships with CORS headers so you can easily fetch data from JavaScript running in a browser on any web page).&lt;/p&gt;
&lt;p&gt;To get back JSON, add &lt;code&gt;.json&lt;/code&gt; to the path (before the first &lt;code&gt;?&lt;/code&gt;). Here’s that power plants example returned as JSON: &lt;a href="https://conventional-power-plants.now.sh/conventional_power_plants-e3c301c/conventional_power_plants_EU.json?_facet=country&amp;amp;_facet=energy_source&amp;amp;_facet=technology"&gt;https://conventional-power-plants.now.sh/conventional_power_plants-e3c301c/conventional_power_plants_EU.json?_facet=country&amp;amp;_facet=energy_source&amp;amp;_facet=technology&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Please &lt;a href="https://twitter.com/simonw"&gt;let me know&lt;/a&gt; if you build something interesting with Datasette Facets!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/facets"&gt;facets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facetedsearch"&gt;facetedsearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="facets"/><category term="projects"/><category term="facetedsearch"/><category term="datasette"/></entry><entry><title>Generate dynamic titles for /search/ pages</title><link href="https://simonwillison.net/2017/Oct/8/titles/#atom-tag" rel="alternate"/><published>2017-10-08T19:47:24+00:00</published><updated>2017-10-08T19:47:24+00:00</updated><id>https://simonwillison.net/2017/Oct/8/titles/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/simonwillisonblog/commit/995f3149312043d876f3b36e659e9b0d11520824"&gt;Generate dynamic titles for /search/ pages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fun little enhancement to my faceted search implementation: I now generate dynamic titles for each search results page describing the search, e.g. “Blogmarks tagged security in Feb, 2005” or ““python” in quotations tagged ruby, python in 2007”.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facetedsearch"&gt;facetedsearch&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="facetedsearch"/></entry><entry><title>Facets</title><link href="https://simonwillison.net/2017/Oct/8/facets/#atom-tag" rel="alternate"/><published>2017-10-08T00:21:58+00:00</published><updated>2017-10-08T00:21:58+00:00</updated><id>https://simonwillison.net/2017/Oct/8/facets/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pair-code.github.io/facets/"&gt;Facets&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New open source visualization and data exploration tool from Google (“Disclaimer: This is not an official Google product”, whatever that means). It’s intended for visualizing machine learning datasets but it’s obviously useful outside of ML as well—any time you need to understand a large dataset this looks like it could be extremely useful. Ships with example jupyter notebooks and an easy mechanism for embedding the Facets interactive UI directly inside a notebook cell.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualisation"&gt;visualisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facetedsearch"&gt;facetedsearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="visualisation"/><category term="facetedsearch"/><category term="jupyter"/></entry><entry><title>Implementing faceted search with Django and PostgreSQL</title><link href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/#atom-tag" rel="alternate"/><published>2017-10-05T14:12:27+00:00</published><updated>2017-10-05T14:12:27+00:00</updated><id>https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/#atom-tag</id><summary type="html">
    &lt;p&gt;I’ve added &lt;a href="https://simonwillison.net/search/"&gt;a faceted search engine&lt;/a&gt; to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL"like" queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotation) and any combination of the above. Some example searches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=postgresql"&gt;All content matching “postgresql”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=django&amp;amp;type=quotation"&gt;Just quotations matching “django”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/search/?q=python+javascript&amp;amp;tag=mozilla&amp;amp;year=2007"&gt;All content matching “python” and “javascript” with the tag “mozilla” posted in 2007&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also provides facet counts, so you can tell how many results you will get back before you apply one of these filters - and get a general feeling for the shape of the corpus as you navigate it.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2017/faceted-search.png" alt="Screenshot of my faceted search interface" style="width: 100%"/&gt;&lt;/p&gt;

&lt;p&gt;I love this kind of search interface, because the counts tell you so much more about the underlying data. Turns out I was &lt;a href="https://simonwillison.net/search/?q=javascript&amp;amp;type=quotation"&gt;most active in quoting people talking about JavaScript back in 2007&lt;/a&gt;, for example.&lt;/p&gt;
&lt;p&gt;I usually build faceted search engines using either &lt;a href="https://simonwillison.net/tags/solr/"&gt;Solr&lt;/a&gt; or &lt;a href="https://simonwillison.net/tags/solr/"&gt;Elasticsearch&lt;/a&gt; (though the first version of search on this blog was actually powered by &lt;a href="http://fallabs.com/hyperestraier/intro-en.html"&gt;Hyper Estraier&lt;/a&gt;) - but I’m hosting this blog as simply and inexpensively as possible on Heroku and I don’t want to shell out for a SaaS search solution or run an Elasticsearch instance somewhere myself. I thought I’d have to go back to using &lt;a href="https://developers.google.com/custom-search/"&gt;Google Custom Search&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I read &lt;a href="http://rachbelaid.com/postgres-full-text-search-is-good-enough/"&gt;Postgres full-text search is Good Enough!&lt;/a&gt; by Rachid Belaid - closely followed by &lt;a href="http://blog.lotech.org/postgres-full-text-search-with-django.html"&gt;Postgres Full-Text Search With Django&lt;/a&gt; by Nathan Shafer - and I decided to have a play with the new PostgreSQL search functionality that was &lt;a href="https://docs.djangoproject.com/en/1.11/releases/1.10/#full-text-search-for-postgresql"&gt;introduced in Django 1.10&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;… and wow! Full-text search is yet another example of a feature that’s been in PostgreSQL for &lt;a href="https://www.postgresql.org/docs/8.3/static/release-8-3.html"&gt;nearly a decade now&lt;/a&gt;, incrementally improving with every release to the point where it’s now  really, &lt;em&gt;really&lt;/em&gt; good.&lt;/p&gt;
&lt;p&gt;At its most basic level a search system needs to handle four things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It needs to take user input and find matching documents.&lt;/li&gt;
&lt;li&gt;It needs to understand and ignore stopwords (common words like “the” and “and”) and apply stemming - knowing that “ridicule” and “ridiculous” should be treated as the same root, for example. Both of these features need to be language-aware.&lt;/li&gt;
&lt;li&gt;It needs to be able to apply relevance ranking, calculating which documents are the best match for a search query.&lt;/li&gt;
&lt;li&gt;It needs to be &lt;em&gt;fast&lt;/em&gt; - working against some kind of index rather than scanning every available document in full.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Modern PostgreSQL &lt;a href="https://www.postgresql.org/docs/9.5/static/textsearch.html"&gt;ticks all of those boxes&lt;/a&gt;. Let’s put it to work.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Simple_search_without_an_index_29"&gt;&lt;/a&gt;Simple search without an index&lt;/h3&gt;
&lt;p&gt;Here’s how to execute a full-text search query against a simple text column:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; blog.models &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; Entry
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.search &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; SearchVector

results = Entry.objects.annotate(
    searchable=SearchVector(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;)
).filter(searchable=&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The generated SQL looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="hljs-operator"&gt;&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;, ...,
to_tsvector(&lt;span class="hljs-keyword"&gt;COALESCE&lt;/span&gt;(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"body"&lt;/span&gt;, %s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"searchable"&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; to_tsvector(&lt;span class="hljs-keyword"&gt;COALESCE&lt;/span&gt;(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"body"&lt;/span&gt;, &lt;span class="hljs-string"&gt;"django"&lt;/span&gt;))
    @@ (plainto_tsquery(&lt;span class="hljs-string"&gt;"django"&lt;/span&gt;)) = &lt;span class="hljs-literal"&gt;true&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;SearchVector&lt;/code&gt; class constructs a stemmed, stopword-removed representation of the &lt;code&gt;body&lt;/code&gt; column ready to be searched. The resulting queryset contains entries that are a match for “django”.&lt;/p&gt;
&lt;p&gt;My blog entries are stored as HTML, but I don’t want search to include those HTML tags. One (extremely un-performant) solution is to use Django’s &lt;code&gt;Func&lt;/code&gt; helper to apply a regular expression inside PostgreSQL to strip tags before they are considered for search:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.db.models &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; Value, F, Func

results = Entry.objects.annotate(
    searchable=SearchVector(
        Func(
            F(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'&amp;lt;.*?&amp;gt;'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;''&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'g'&lt;/span&gt;),
            function=&lt;span class="hljs-string"&gt;'regexp_replace'&lt;/span&gt;
        )
    )
).filter(searchable=&lt;span class="hljs-string"&gt;'http'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 6th October 8:23pm UTC&lt;/strong&gt; - it turns out this step is entirely unnecessary. &lt;a href="https://github.com/simonw/simonwillisonblog/issues/1#issuecomment-334770577"&gt;Paolo Melchiorre points out&lt;/a&gt; that the PostgreSQL ts_vector() function already handles tag removal. Sure enough, executing &lt;samp&gt;SELECT to_tsvector('&amp;lt;div&amp;gt;Hey look what happens to &amp;lt;blockquote&amp;gt;this tag&amp;lt;/blockquote&amp;gt;&amp;lt;/div&amp;gt;')&lt;/samp&gt; &lt;a href="http://sqlfiddle.com/#!17/9eecb/4552"&gt;using SQL Fiddle&lt;/a&gt; returns &lt;samp&gt;'happen':4 'hey':1 'look':2 'tag':7&lt;/samp&gt;, with the tags already stripped.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This works, but performance isn’t great. PostgreSQL ends up having to scan every row and construct a list of search vectors for each one every time you execute a query.&lt;/p&gt;
&lt;p&gt;If you want it to go fast, you need to add a special search vector column to your table and then create the appropriate index on it. As of Django 1.11 this is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.search &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; SearchVectorField
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.contrib.postgres.indexes &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; GinIndex

&lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Entry&lt;/span&gt;&lt;span class="hljs-params"&gt;(models.Model)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-comment"&gt;# ...&lt;/span&gt;
    search_document = SearchVectorField(null=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)

    &lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Meta&lt;/span&gt;:&lt;/span&gt;
        indexes = [
            GinIndex(fields=[&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;])
        ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Django’s migration system will automatically add both the field and the special &lt;a href="https://www.postgresql.org/docs/9.5/static/gin-intro.html"&gt;GIN index&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What’s trickier is populating that &lt;code&gt;search_document&lt;/code&gt; field. Django does not yet support a easy method to populate it directly in your initial INSERT call, instead recommending that you populated with a SQL UPDATE statement after the fact. Here is a one-liner that will populate the field for everything in that table (and strip tags at the same time):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;strip_tags_func&lt;/span&gt;&lt;span class="hljs-params"&gt;(field)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; Func(
        F(field), Value(&lt;span class="hljs-string"&gt;'&amp;lt;.*?&amp;gt;'&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;''&lt;/span&gt;), Value(&lt;span class="hljs-string"&gt;'g'&lt;/span&gt;),
        function=&lt;span class="hljs-string"&gt;'regexp_replace'&lt;/span&gt;
    )
 
Entry.objects.update(
    search_document=(
        SearchVector(&lt;span class="hljs-string"&gt;'title'&lt;/span&gt;, weight=&lt;span class="hljs-string"&gt;'A'&lt;/span&gt;) +
        SearchVector(strip_tags_func(&lt;span class="hljs-string"&gt;'body'&lt;/span&gt;), weight=&lt;span class="hljs-string"&gt;'C'&lt;/span&gt;)
    )
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m using a neat feature of the &lt;code&gt;SearchVector&lt;/code&gt; class here: it can be concatenated together using the &lt;code&gt;+&lt;/code&gt; operator, and each component can be assigned a weight of &lt;code&gt;A&lt;/code&gt;, &lt;code&gt;B&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;D&lt;/code&gt;. These weights affect ranking calculations later on.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Updates_using_signals"&gt;&lt;/a&gt;Updates using signals&lt;/h3&gt;
&lt;p&gt;We could just set this up to run periodically (as I did in my &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7e3a02178e3ca71c464ae68a3b68d70e5fa66692#diff-1cbcc518bc02f9495bba963e698143e0"&gt;initial implementation&lt;/a&gt;), but we can get better real-time results by ensuring this field gets updated automatically when the rest of the model is modified. Some people solve this with PostgreSQL triggers, but I’m still more comfortable handling this kind of thing in python code - so I opted to use Django’s &lt;a href="https://docs.djangoproject.com/en/1.11/topics/signals/"&gt;signals mechanism&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;Since I need to run search queries across three different types of blog content - Entries, Blogmarks and Quotations - I added a method to each model that returns the text fragments corresponding to each of the weight values. Here’s that method for my Quotation model:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-class"&gt;&lt;span class="hljs-keyword"&gt;class&lt;/span&gt; &lt;span class="hljs-title"&gt;Quotation&lt;/span&gt;&lt;span class="hljs-params"&gt;(models.Model)&lt;/span&gt;:&lt;/span&gt;
    quotation = models.TextField()
    source = models.CharField(max_length=&lt;span class="hljs-number"&gt;255&lt;/span&gt;)
    tags = models.ManyToManyField(Tag, blank=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)

    &lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;index_components&lt;/span&gt;&lt;span class="hljs-params"&gt;(self)&lt;/span&gt;:&lt;/span&gt;
        &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; {
            &lt;span class="hljs-string"&gt;'A'&lt;/span&gt;: self.quotation,
            &lt;span class="hljs-string"&gt;'B'&lt;/span&gt;: &lt;span class="hljs-string"&gt;' '&lt;/span&gt;.join(self.tags.values_list(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;, flat=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)),
            &lt;span class="hljs-string"&gt;'C'&lt;/span&gt;: self.source,
        }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, I’m including the tags that have been assigned to the quotation in the searchable document.&lt;/p&gt;
&lt;p&gt;Here are my signals - loaded once via an import statement in my blog application’s &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/apps.py"&gt;&lt;code&gt;AppConfig.ready()&lt;/code&gt; method&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-decorator"&gt;@receiver(post_save)&lt;/span&gt;
&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_save&lt;/span&gt;&lt;span class="hljs-params"&gt;(sender, **kwargs)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; &lt;span class="hljs-keyword"&gt;not&lt;/span&gt; issubclass(sender, BaseModel):
        &lt;span class="hljs-keyword"&gt;return&lt;/span&gt;
    transaction.on_commit(make_updater(kwargs[&lt;span class="hljs-string"&gt;'instance'&lt;/span&gt;]))

&lt;span class="hljs-decorator"&gt;@receiver(m2m_changed)&lt;/span&gt;
&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_m2m_changed&lt;/span&gt;&lt;span class="hljs-params"&gt;(sender, **kwargs)&lt;/span&gt;:&lt;/span&gt;
    instance = kwargs[&lt;span class="hljs-string"&gt;'instance'&lt;/span&gt;]
    model = kwargs[&lt;span class="hljs-string"&gt;'model'&lt;/span&gt;]
    &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; model &lt;span class="hljs-keyword"&gt;is&lt;/span&gt; Tag:
        transaction.on_commit(make_updater(instance))
    &lt;span class="hljs-keyword"&gt;elif&lt;/span&gt; isinstance(instance, Tag):
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; obj &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; model.objects.filter(pk__in=kwargs[&lt;span class="hljs-string"&gt;'pk_set'&lt;/span&gt;]):
            transaction.on_commit(make_updater(obj))

&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;make_updater&lt;/span&gt;&lt;span class="hljs-params"&gt;(instance)&lt;/span&gt;:&lt;/span&gt;
    components = instance.index_components()
    pk = instance.pk

    &lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;on_commit&lt;/span&gt;&lt;span class="hljs-params"&gt;()&lt;/span&gt;:&lt;/span&gt;
        search_vectors = []
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; weight, text &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; components.items():
            search_vectors.append(
                SearchVector(Value(text, output_field=models.TextField()), weight=weight)
            )
        instance.__class__.objects.filter(pk=pk).update(
            search_document=reduce(operator.add, search_vectors)
        )
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; on_commit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The full code can be &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/signals.py"&gt;found here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;on_save&lt;/code&gt; method is pretty straightforward - it checks if the model that was just saved has my &lt;code&gt;BaseModel&lt;/code&gt; as a base class, then it calls &lt;code&gt;make_updater&lt;/code&gt; to get a function to be executed by the &lt;code&gt;transaction.on_commit&lt;/code&gt; hook.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;on_m2m_changed&lt;/code&gt; handler is &lt;a href="https://docs.djangoproject.com/en/1.11/ref/signals/#m2m-changed"&gt;significantly more complicated&lt;/a&gt;. There are a number of scenarios in which this will be called - I’m reasonably confident that the idiom I use here will capture all of the modifications that should trigger a re-indexing operation.&lt;/p&gt;
&lt;p&gt;Running a search now looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;results = Entry.objects.filter(
    search_document=SearchQuery(&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We need one more thing though: we need to sort our search results by relevance. PostgreSQL has pretty good relevance built in, and sorting by the relevance score can be done by  applying a Django ORM annotation:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;query = SearchQuery(&lt;span class="hljs-string"&gt;'ibm'&lt;/span&gt;)

results = Entry.objects.filter(
    search_document=query
).annotate(
    rank=SearchRank(F(&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;), query)
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have basic full text search implemented against a single Django model, making use of a GIN index. This is lightning fast.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Searching_multiple_tables_using_querysetunion_192"&gt;&lt;/a&gt;Searching multiple tables using queryset.union()&lt;/h3&gt;
&lt;p&gt;My site has three types of content, represented in three different models and hence three different underlying database tables.&lt;/p&gt;
&lt;p&gt;I’m using &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/models.py#L78-L107"&gt;an abstract base model&lt;/a&gt; to define common fields shared by all three: the created date, the slug (used to construct permalink urls) and the &lt;code&gt;search_document&lt;/code&gt; field populated above.&lt;/p&gt;
&lt;p&gt;As of Django 1.11 It’s possible to combine queries across different tables &lt;a href="https://docs.djangoproject.com/en/1.11/releases/1.11/#models"&gt;using the SQL union operator&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here’s  what that looks like for running a search across three tables, all with the same &lt;code&gt;search_document&lt;/code&gt; search vector field. I need to use &lt;code&gt;.values()&lt;/code&gt; to restrict the querysets I am unioning to the same subset of fields:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;query = SearchQuery(&lt;span class="hljs-string"&gt;'django'&lt;/span&gt;)
rank_annotation = SearchRank(F(&lt;span class="hljs-string"&gt;'search_document'&lt;/span&gt;), query)
qs = Blogmark.objects.annotate(
    rank=rank_annotation,
).filter(
    search_document=query
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).union(
    Entry.objects.annotate(
        rank=rank_annotation,
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
    Quotation.objects.annotate(
        rank=rank_annotation,
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)[:&lt;span class="hljs-number"&gt;5&lt;/span&gt;]

&lt;span class="hljs-comment"&gt;# Output&lt;/span&gt;
&amp;lt;QuerySet [
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;186&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.875179&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2008&lt;/span&gt;, &lt;span class="hljs-number"&gt;4&lt;/span&gt;, &lt;span class="hljs-number"&gt;8&lt;/span&gt;, &lt;span class="hljs-number"&gt;13&lt;/span&gt;, &lt;span class="hljs-number"&gt;48&lt;/span&gt;, &lt;span class="hljs-number"&gt;18&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;134&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.842655&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2007&lt;/span&gt;, &lt;span class="hljs-number"&gt;10&lt;/span&gt;, &lt;span class="hljs-number"&gt;20&lt;/span&gt;, &lt;span class="hljs-number"&gt;13&lt;/span&gt;, &lt;span class="hljs-number"&gt;46&lt;/span&gt;, &lt;span class="hljs-number"&gt;56&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;1591&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.804502&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2009&lt;/span&gt;, &lt;span class="hljs-number"&gt;9&lt;/span&gt;, &lt;span class="hljs-number"&gt;28&lt;/span&gt;, &lt;span class="hljs-number"&gt;23&lt;/span&gt;, &lt;span class="hljs-number"&gt;32&lt;/span&gt;, &lt;span class="hljs-number"&gt;4&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;5093&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.788616&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2010&lt;/span&gt;, &lt;span class="hljs-number"&gt;2&lt;/span&gt;, &lt;span class="hljs-number"&gt;26&lt;/span&gt;, &lt;span class="hljs-number"&gt;19&lt;/span&gt;, &lt;span class="hljs-number"&gt;22&lt;/span&gt;, &lt;span class="hljs-number"&gt;47&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;2598&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.786928&lt;/span&gt;, &lt;span class="hljs-string"&gt;'created'&lt;/span&gt;: datetime.datetime(&lt;span class="hljs-number"&gt;2007&lt;/span&gt;, &lt;span class="hljs-number"&gt;1&lt;/span&gt;, &lt;span class="hljs-number"&gt;26&lt;/span&gt;, &lt;span class="hljs-number"&gt;12&lt;/span&gt;, &lt;span class="hljs-number"&gt;38&lt;/span&gt;, &lt;span class="hljs-number"&gt;46&lt;/span&gt;, tzinfo=&amp;lt;UTC&amp;gt;)}
]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is not enough information though - I have the primary keys, but I don’t know which type of model they belong to. In order to retrieve the actual resulting objects from the database I need to know which type of content is represented by each of those results.&lt;/p&gt;
&lt;p&gt;I can achieve that using another annotation:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;qs = Blogmark.objects.annotate(
    rank=rank_annotation,
    type=models.Value(&lt;span class="hljs-string"&gt;'blogmark'&lt;/span&gt;, output_field=models.CharField())
).filter(
    search_document=query
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).union(
    Entry.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
    Quotation.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;),
).order_by(&lt;span class="hljs-string"&gt;'-rank'&lt;/span&gt;)[:&lt;span class="hljs-number"&gt;5&lt;/span&gt;]

&lt;span class="hljs-comment"&gt;# Output:&lt;/span&gt;
&amp;lt;QuerySet [
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;186&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'quotation'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.875179&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;134&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'quotation'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.842655&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;1591&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'entry'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.804502&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;5093&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'blogmark'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.788616&lt;/span&gt;},
    {&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;: &lt;span class="hljs-number"&gt;2598&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;: &lt;span class="hljs-string"&gt;u'blogmark'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;: &lt;span class="hljs-number"&gt;0.786928&lt;/span&gt;}
]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I just need to write function which can take a list of types and primary keys and return the full objects from the database:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;load_mixed_objects&lt;/span&gt;&lt;span class="hljs-params"&gt;(dicts)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-string"&gt;"""
    Takes a list of dictionaries, each of which must at least have a 'type'
    and a 'pk' key. Returns a list of ORM objects of those various types.
    Each returned ORM object has a .original_dict attribute populated.
    """&lt;/span&gt;
    to_fetch = {}
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; d &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; dicts:
        to_fetch.setdefault(d[&lt;span class="hljs-string"&gt;'type'&lt;/span&gt;], set()).add(d[&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;])
    fetched = {}
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; key, model &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; (
        (&lt;span class="hljs-string"&gt;'blogmark'&lt;/span&gt;, Blogmark),
        (&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;, Entry),
        (&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, Quotation),
    ):
        ids = to_fetch.get(key) &lt;span class="hljs-keyword"&gt;or&lt;/span&gt; []
        objects = model.objects.prefetch_related(&lt;span class="hljs-string"&gt;'tags'&lt;/span&gt;).filter(pk__in=ids)
        &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; obj &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; objects:
            fetched[(key, obj.pk)] = obj
    &lt;span class="hljs-comment"&gt;# Build list in same order as dicts argument&lt;/span&gt;
    to_return = []
    &lt;span class="hljs-keyword"&gt;for&lt;/span&gt; d &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; dicts:
        item = fetched.get((d[&lt;span class="hljs-string"&gt;'type'&lt;/span&gt;], d[&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;])) &lt;span class="hljs-keyword"&gt;or&lt;/span&gt; &lt;span class="hljs-keyword"&gt;None&lt;/span&gt;
        &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; item:
            item.original_dict = d
        to_return.append(item)
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; to_return
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One last challenge: when I add filtering by type, I’m going to want to selectively union together only a subset of these querysets. I need a queryset to start unions against, but I don’t yet know which queryset I will be using. I can abuse Django’s &lt;code&gt;queryset.none()&lt;/code&gt; method to crate an empty &lt;code&gt;ValuesQuerySet&lt;/code&gt; in the correct shape like this&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;qs = Entry.objects.annotate(
    type=models.Value(&lt;span class="hljs-string"&gt;'empty'&lt;/span&gt;, output_field=models.CharField()),
    rank=rank_annotation
).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;).none()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can progressively build up my union in a loop like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; klass &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; (Entry, Blogmark, Quotation):
    qs = qs.union(klass.objects.annotate(
        rank=rank_annotation,
        type=models.Value(&lt;span class="hljs-string"&gt;'quotation'&lt;/span&gt;, output_field=models.CharField())
    ).filter(
        search_document=query
    ).values(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'type'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'rank'&lt;/span&gt;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Django ORM is smart enough to compile away the empty queryset when it constructs the SQL, which ends up looking something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;(((&lt;span class="hljs-operator"&gt;&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
            &lt;span class="hljs-string"&gt;"entry"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
            ts_rank(&lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
     &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_entry"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;))
 &lt;span class="hljs-keyword"&gt;UNION&lt;/span&gt;
   (&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
           &lt;span class="hljs-string"&gt;"blogmark"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
           ts_rank(&lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_blogmark"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;))
&lt;span class="hljs-keyword"&gt;UNION&lt;/span&gt;
  (&lt;span class="hljs-keyword"&gt;SELECT&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"id"&lt;/span&gt;,
          &lt;span class="hljs-string"&gt;"quotation"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"type"&lt;/span&gt;,
          ts_rank(&lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt;, plainto_tsquery(%s)) &lt;span class="hljs-keyword"&gt;AS&lt;/span&gt; &lt;span class="hljs-string"&gt;"rank"&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;FROM&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;WHERE&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"search_document"&lt;/span&gt; @@ (plainto_tsquery(%s)) = &lt;span class="hljs-literal"&gt;TRUE&lt;/span&gt;
   &lt;span class="hljs-keyword"&gt;ORDER&lt;/span&gt; &lt;span class="hljs-keyword"&gt;BY&lt;/span&gt; &lt;span class="hljs-string"&gt;"blog_quotation"&lt;/span&gt;.&lt;span class="hljs-string"&gt;"created"&lt;/span&gt; &lt;span class="hljs-keyword"&gt;DESC&lt;/span&gt;)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Applying_filters_345"&gt;&lt;/a&gt;Applying filters&lt;/h3&gt;
&lt;p&gt;So far, our search engine can only handle user-entered query strings. If I am going to build a faceted search interface I need to be able to handle filtering as well. I want the ability to filter by year, tag and type.&lt;/p&gt;
&lt;p&gt;The key difference between filtering and querying (borrowing &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html"&gt;these definitions from Elasticsearch&lt;/a&gt;) is that querying is loose - it involves stemming and stopwords - while filtering is exact. Additionally, querying affects the calculated relevance score while filtering does not - a document either matches the filter or it doesn’t.&lt;/p&gt;
&lt;p&gt;Since PostgreSQL is a relational database, filtering can be handled by simply constructing extra SQL where clauses using the Django ORM.&lt;/p&gt;
&lt;p&gt;Each of the filters I need requires a slightly different approach. Filtering by type is easy - I just selectively include or exclude that model from my union queryset.&lt;/p&gt;
&lt;p&gt;Year and month work like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;selected_year = request.GET.get(&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;, &lt;span class="hljs-string"&gt;''&lt;/span&gt;)
selected_month = request.GET.get(&lt;span class="hljs-string"&gt;'month'&lt;/span&gt;, &lt;span class="hljs-string"&gt;''&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; selected_year:
    qs = qs.filter(created__year=int(selected_year))
&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; selected_month:
    qs = qs.filter(created__month=int(selected_month))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tags involve a join through a many-2-many relationship against the Tags table.  We want to be able to apply more than one tag, for example this search for &lt;a href="https://simonwillison.net/search/?tag=python&amp;amp;tag=javascript"&gt;all items tagged both python and javascript&lt;/a&gt;. Django’s ORM makes this easy:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;selected_tags = request.GET.getlist(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; tag &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; selected_tags:
    qs = qs.filter(tags__tag=tag)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Adding_facet_counts_374"&gt;&lt;/a&gt;Adding facet counts&lt;/h3&gt;
&lt;p&gt;There is just one more ingredient needed to complete our faceted search: facet counts!&lt;/p&gt;
&lt;p&gt;Again, the way we calculate these is different for each of our filters. For types, we need to call &lt;code&gt;.count()&lt;/code&gt; on a separate queryset for each of the types we are searching:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;queryset = make_queryset(Entry, &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;)
type_counts[&lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;] = queryset.count()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(the make_queryset function is &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/views.py#L408-L423"&gt;defined here&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;For years we can do this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; django.db.models.functions &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; TruncYear

&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; row &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; queryset.order_by().annotate(
    year=TruncYear(&lt;span class="hljs-string"&gt;'created'&lt;/span&gt;)
).values(&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;).annotate(n=models.Count(&lt;span class="hljs-string"&gt;'pk'&lt;/span&gt;)):
    year_counts[row[&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;]] = year_counts.get(
        row[&lt;span class="hljs-string"&gt;'year'&lt;/span&gt;], &lt;span class="hljs-number"&gt;0&lt;/span&gt;
    ) + row[&lt;span class="hljs-string"&gt;'n'&lt;/span&gt;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tags are trickiest. Let’s take advantage of he fact that Django’s ORM knows how to construct sub-selects if you pass another queryset to the &lt;code&gt;__in&lt;/code&gt;  operator.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;tag_counts = {}
type_name = &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;
queryset = make_queryset(Entry, &lt;span class="hljs-string"&gt;'entry'&lt;/span&gt;)
&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; tag, count &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; Tag.objects.filter(**{
    &lt;span class="hljs-string"&gt;'%s__in'&lt;/span&gt; % type_name: queryset
}).annotate(
    n=models.Count(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;)
).values_list(&lt;span class="hljs-string"&gt;'tag'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'n'&lt;/span&gt;):
    tag_counts[tag] = tag_counts.get(tag, &lt;span class="hljs-number"&gt;0&lt;/span&gt;) + count
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="Rendering_it_all_in_a_template_414"&gt;&lt;/a&gt;Rendering it all in a template&lt;/h3&gt;
&lt;p&gt;Having constructed the various facets counts in the view function, the template is really simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;{% if type_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Types&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in type_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;type"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.type&lt;/span&gt; %}"&amp;gt;&lt;/span&gt;{{ t.type }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
{% if year_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Years&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in year_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;year"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.year&lt;/span&gt;|&lt;span class="hljs-attribute"&gt;date:&lt;/span&gt;"&lt;span class="hljs-attribute"&gt;Y&lt;/span&gt;" %}"&amp;gt;&lt;/span&gt;{{ t.year|date:"Y" }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
{% if tag_counts %}
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;Tags&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;h3&lt;/span&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% for t in tag_counts %}
            &lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-title"&gt;a&lt;/span&gt; &lt;span class="hljs-attribute"&gt;href&lt;/span&gt;=&lt;span class="hljs-value"&gt;"{% add_qsarg "&lt;/span&gt;&lt;span class="hljs-value"&gt;tag"&lt;/span&gt; &lt;span class="hljs-attribute"&gt;t.tag&lt;/span&gt; %}"&amp;gt;&lt;/span&gt;{{ t.tag }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt; {{ t.n }}&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;a&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;li&lt;/span&gt;&amp;gt;&lt;/span&gt;
        {% endfor %}
    &lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-title"&gt;ul&lt;/span&gt;&amp;gt;&lt;/span&gt;
{% endif %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I am using &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/templatetags/blog_tags.py#L54-L68"&gt;custom templates tags&lt;/a&gt; here to add arguments to the current URL. I’ve built systems like this in the past where the URLs are instead generated in the view logic, which I think I prefer. As always, perfect is the enemy of shipped.&lt;/p&gt;
&lt;p&gt;And because the results are just a Django queryset, we can use Django’s pagination helpers for the pagination links.&lt;/p&gt;
&lt;h3&gt;&lt;a id="The_final_implementation_449"&gt;&lt;/a&gt;The final implementation&lt;/h3&gt;
&lt;p&gt;The full current version of the code at time of writing &lt;a href="https://github.com/simonw/simonwillisonblog/blob/3f5ca05248e409a946b53593f7d11b6f9551044f/blog/views.py#L388-L552"&gt;can be seen here&lt;/a&gt;. You can follow my initial implementation of this feature through the following commits: &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7e3a0217"&gt;7e3a0217&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/c7e7b30c"&gt;c7e7b30c&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7f6b524c"&gt;7f6b524c&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/a16ddb5e"&gt;a16ddb5e&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/7055c7e1"&gt;7055c7e1&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/74c194d9"&gt;74c194d9&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/f3ffc100"&gt;f3ffc100&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/6c24d9fd"&gt;6c24d9fd&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/cb88c2d4"&gt;cb88c2d4&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/2c262c75"&gt;2c262c75&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/776a562a"&gt;776a562a&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/b8484c50"&gt;b8484c50&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/0b361c78"&gt;0b361c78&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/1322ada2"&gt;1322ada2&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/79b1b13d"&gt;79b1b13d&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/3955f41b"&gt;3955f41b&lt;/a&gt; &lt;a href="https://github.com/simonw/simonwillisonblog/commit/3f5ca052"&gt;3f5ca052&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And that’s how I built faceted search on top of PostgreSQL and Django! I don’t have my blog comments up and running yet, so please post any thoughts or feedback over on &lt;a href="https://github.com/simonw/simonwillisonblog/issues/1"&gt;this GitHub issue&lt;/a&gt; or &lt;a href="https://news.ycombinator.com/item?id=15409733"&gt;over on this thread on Hacker News&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 9th September 2021&lt;/strong&gt;: A few years after implementing this I started to notice performance issues with my blog, which turned out to be caused by search engine crawlers hitting every possible combination of facets, triggering a ton of expensive SQL queries. I excluded &lt;code&gt;/search&lt;/code&gt; from being crawled using &lt;code&gt;robots.txt&lt;/code&gt; which fixed the problem.&lt;/em&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/orm"&gt;orm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facetedsearch"&gt;facetedsearch&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="django"/><category term="full-text-search"/><category term="orm"/><category term="postgresql"/><category term="projects"/><category term="search"/><category term="facetedsearch"/></entry></feed>