<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: seo</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/seo.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-08-24T23:00:01+00:00</updated><author><name>Simon Willison</name></author><entry><title>SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL</title><link href="https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/#atom-tag" rel="alternate"/><published>2024-08-24T23:00:01+00:00</published><updated>2024-08-24T23:00:01+00:00</updated><id>https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/"&gt;SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A new paper from Google Research describing custom syntax for analytical SQL queries that has been rolling out inside Google since February, reaching 1,600 "seven-day-active users" by August 2024.&lt;/p&gt;
&lt;p&gt;A key idea here is to fix one of the biggest usability problems with standard SQL: the order of the clauses in a query. Starting with &lt;code&gt;SELECT&lt;/code&gt; instead of &lt;code&gt;FROM&lt;/code&gt; has always been confusing; see &lt;a href="https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/"&gt;SQL queries don't start with SELECT&lt;/a&gt; by Julia Evans.&lt;/p&gt;
&lt;p&gt;Here's an example of the new alternative syntax, taken from the &lt;a href="https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-syntax.md"&gt;Pipe query syntax documentation&lt;/a&gt; that was added to Google's open source &lt;a href="https://github.com/google/zetasql"&gt;ZetaSQL&lt;/a&gt; project last week.&lt;/p&gt;
&lt;p&gt;For this SQL query:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;SELECT&lt;/span&gt; component_id, &lt;span class="pl-c1"&gt;COUNT&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;)
&lt;span class="pl-k"&gt;FROM&lt;/span&gt; ticketing_system_table
&lt;span class="pl-k"&gt;WHERE&lt;/span&gt;
  &lt;span class="pl-c1"&gt;assignee_user&lt;/span&gt;.&lt;span class="pl-c1"&gt;email&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;username@email.com&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-k"&gt;AND&lt;/span&gt; status &lt;span class="pl-k"&gt;IN&lt;/span&gt; (&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;NEW&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;ASSIGNED&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;ACCEPTED&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;)
&lt;span class="pl-k"&gt;GROUP BY&lt;/span&gt; component_id
&lt;span class="pl-k"&gt;ORDER BY&lt;/span&gt; component_id &lt;span class="pl-k"&gt;DESC&lt;/span&gt;;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Pipe query alternative would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;FROM ticketing_system_table
|&amp;gt; WHERE
    assignee_user.email = 'username@email.com'
    AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
|&amp;gt; AGGREGATE COUNT(*)
   GROUP AND ORDER BY component_id DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Google Research paper is released as a two-column PDF. I &lt;a href="https://news.ycombinator.com/item?id=41339138"&gt;snarked about this&lt;/a&gt; on Hacker News: &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Google: you are a web company. Please learn to publish your research papers as web pages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This remains a long-standing pet peeve of mine. PDFs like this are horrible to read on mobile phones, hard to copy-and-paste from, have poor accessibility (see &lt;a href="https://fedi.simonwillison.net/@simon/113017908957136345"&gt;this Mastodon conversation&lt;/a&gt;) and are generally just &lt;em&gt;bad citizens&lt;/em&gt; of the web.&lt;/p&gt;
&lt;p&gt;Having complained about this I felt compelled to see if I could address it myself. Google's own Gemini Pro 1.5 model can process PDFs, so I uploaded the PDF to &lt;a href="https://aistudio.google.com/"&gt;Google AI Studio&lt;/a&gt; and prompted the &lt;code&gt;gemini-1.5-pro-exp-0801&lt;/code&gt; model like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Convert this document to neatly styled semantic HTML&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This worked &lt;em&gt;surprisingly well&lt;/em&gt;. It output HTML for about half the document and then stopped, presumably hitting the output length limit, but a follow-up prompt of "and the rest" caused it to continue from where it stopped and run until the end.&lt;/p&gt;
&lt;p&gt;Here's the result (with a banner I added at the top explaining that it's a conversion): &lt;a href="https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html"&gt;Pipe-Syntax-In-SQL.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I haven't compared the two completely, so I can't guarantee there are no omissions or mistakes.&lt;/p&gt;
&lt;p&gt;The figures from the PDF aren't present - Gemini Pro output tags like &lt;code&gt;&amp;lt;img src="figure1.png" alt="Figure 1: SQL syntactic clause order doesn't match semantic evaluation order. (From [25].)"&amp;gt;&lt;/code&gt; but did nothing to help me create those images.&lt;/p&gt;
&lt;p&gt;Amusingly the document ends with &lt;code&gt;&amp;lt;p&amp;gt;(A long list of references, which I won't reproduce here to save space.)&amp;lt;/p&amp;gt;&lt;/code&gt; rather than actually including the references from the paper!&lt;/p&gt;
&lt;p&gt;So this isn't a perfect solution, but considering it took just the first prompt I could think of, it's a very promising start. I expect someone willing to spend more than the couple of minutes I invested in this could produce a very useful HTML alternative version of the paper with the assistance of Gemini Pro.&lt;/p&gt;
&lt;p&gt;One last amusing note: I posted a link to this &lt;a href="https://news.ycombinator.com/item?id=41339238"&gt;to Hacker News&lt;/a&gt; a few hours ago. Just now when I searched Google for the exact title of the paper my HTML version was already the third result!&lt;/p&gt;
&lt;p&gt;I've now added a &lt;code&gt;&amp;lt;meta name="robots" content="noindex, follow"&amp;gt;&lt;/code&gt; tag to the top of the HTML to keep this unverified &lt;a href="https://simonwillison.net/tags/slop/"&gt;AI slop&lt;/a&gt; out of their search index. This is a good reminder of how much better HTML is than PDF for sharing information on the web!&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41338877"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/julia-evans"&gt;julia-evans&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="pdf"/><category term="seo"/><category term="sql"/><category term="ai"/><category term="julia-evans"/><category term="generative-ai"/><category term="llms"/><category term="gemini"/><category term="slop"/></entry><entry><title>Google is the only search engine that works on Reddit now thanks to AI deal</title><link href="https://simonwillison.net/2024/Jul/24/google-reddit/#atom-tag" rel="alternate"/><published>2024-07-24T18:29:55+00:00</published><updated>2024-07-24T18:29:55+00:00</updated><id>https://simonwillison.net/2024/Jul/24/google-reddit/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.404media.co/google-is-the-only-search-engine-that-works-on-reddit-now-thanks-to-ai-deal/"&gt;Google is the only search engine that works on Reddit now thanks to AI deal&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is depressing. As of around June 25th &lt;a href="https://www.reddit.com/robots.txt"&gt;reddit.com/robots.txt&lt;/a&gt; contains this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;User-agent: *
Disallow: /
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Along with a link to Reddit's &lt;a href="https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy"&gt;Public Content Policy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Is this a direct result of Google's deal to license Reddit content for AI training, rumored &lt;a href="https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/"&gt;at $60 million&lt;/a&gt;? That's not been confirmed but it looks likely, especially since accessing that &lt;code&gt;robots.txt&lt;/code&gt; using the &lt;a href="https://search.google.com/test/rich-results"&gt;Google Rich Results testing tool&lt;/a&gt; (hence proxied via their IP) appears to return a different file, via &lt;a href="https://news.ycombinator.com/item?id=41057033#41058375"&gt;this comment&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/be0e8e595178207b1b3dce3b81eacfb3"&gt;my copy here&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41057033"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/reddit"&gt;reddit&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="reddit"/><category term="search-engines"/><category term="seo"/><category term="ai"/><category term="llms"/></entry><entry><title>Give people something to link to so they can talk about your features and ideas</title><link href="https://simonwillison.net/2024/Jul/13/give-people-something-to-link-to/#atom-tag" rel="alternate"/><published>2024-07-13T16:06:28+00:00</published><updated>2024-07-13T16:06:28+00:00</updated><id>https://simonwillison.net/2024/Jul/13/give-people-something-to-link-to/#atom-tag</id><summary type="html">
    &lt;p&gt;If you have a project, an idea, a product feature, or anything else that you want other people to understand and have conversations about... give them something to link to!&lt;/p&gt;
&lt;p&gt;Two illustrative examples are ChatGPT Code Interpreter and Boring Technology.&lt;/p&gt;
&lt;h4 id="chatgpt-code-interpreter-is-effectively-invisible"&gt;ChatGPT Code Interpreter is effectively invisible&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;ChatGPT Code Interpreter&lt;/strong&gt; has been one of my favourite AI tools for over a year. It's the feature of ChatGPT which allows the bot to write &lt;em&gt;and then execute&lt;/em&gt; Python code as part of responding to your prompts. It's incredibly powerful... and almost invisible! If you don't know how to use prompts to activate the feature you may not realize it exists.&lt;/p&gt;
&lt;p&gt;OpenAI don't even have a help page for it (and it very desperately needs documentation) - if you search their site you'll find &lt;a href="https://platform.openai.com/docs/assistants/tools/code-interpreter"&gt;confusing technical docs&lt;/a&gt; about an API feature and &lt;a href="https://community.openai.com/t/how-can-i-access-the-code-interpreter-plugin-model/205304"&gt;misleading outdated forum threads&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I evangelize this tool &lt;em&gt;a lot&lt;/em&gt;, but OpenAI really aren't helping me do that. I end up linking people to &lt;a href="https://simonwillison.net/tags/code-interpreter/"&gt;my code-interpreter tag page&lt;/a&gt; because it's more useful than anything on OpenAI's own site.&lt;/p&gt;
&lt;p&gt;Compare this with Claude's similar Artifacts feature, which at least has an &lt;a href="https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them"&gt;easily discovered help page&lt;/a&gt; - though &lt;a href="https://www.anthropic.com/news/claude-3-5-sonnet"&gt;the Artifacts announcement post&lt;/a&gt; was shared alongside the Claude 3.5 Sonnet launch, so it isn't obviously linkable. Even that help page isn't quite what I'm after. Features deserve dedicated pages!&lt;/p&gt;
&lt;p&gt;GitHub understand this: here are their feature landing pages for &lt;a href="https://github.com/features/codespaces"&gt;Codespaces&lt;/a&gt; and &lt;a href="https://github.com/features/copilot"&gt;Copilot&lt;/a&gt; (I could even guess the URL for Copilot's page based on the Codespaces one).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; It turns out there IS documentation about Code Interpreter mode... but I failed to find it because it didn't use those terms anywhere on the page! The title is &lt;a href="https://help.openai.com/en/articles/8437071-data-analysis-with-chatgpt"&gt;Data analysis with ChatGPT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This amuses me greatly because OpenAI have been oscillating on the name for this feature almost since they launched - Code Interpreter, then Advanced Data Analysis, now Data analysis with ChatGPT. I made fun of this &lt;a href="https://simonwillison.net/2023/Oct/17/open-questions/#open-questions.034.jpeg"&gt;last year&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id="boring-technology-an-idea-with-a-website"&gt;Boring Technology: an idea with a website&lt;/h4&gt;
&lt;p&gt;Dan McKinley coined the term &lt;strong&gt;Boring Technology&lt;/strong&gt; in &lt;a href="https://mcfunley.com/choose-boring-technology"&gt;an essay in 2015&lt;/a&gt;. The key idea is that any development team has a limited capacity to solve new problems which should be reserved for the things that make their product unique. For everything else they should pick the most boring and well-understood technologies available to them - stuff where any bugs or limitations have been understood and discussed online for years.&lt;/p&gt;
&lt;p&gt;(I'm very proud that Django has earned the honorific of "boring technology" in this context!)&lt;/p&gt;
&lt;p&gt;Dan turned that essay into a talk, and then he turned that talk into a website with a brilliant domain name:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://boringtechnology.club/"&gt;boringtechnology.club&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The idea has stuck. I've had many productive conversations about it, and more importantly if someone &lt;em&gt;hasn't&lt;/em&gt; heard the term before I can drop in that one link and they'll be up to speed a few minutes later.&lt;/p&gt;
&lt;p&gt;I've tried to do this myself for some of my own ideas: &lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/"&gt;baked data&lt;/a&gt;, &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;git scraping&lt;/a&gt; and &lt;a href="https://simonwillison.net/series/prompt-injection/"&gt;prompt injection&lt;/a&gt; all have pages that I frequently link people to. I never went as far as committing to a domain though and I think maybe that was a mistake - having a clear message that "this is the key page to link to" is a very powerful thing.&lt;/p&gt;
&lt;h4 id="this-is-about-both-seo-and-conversations"&gt;This is about both SEO and conversations&lt;/h4&gt;
&lt;p&gt;One obvious goal here is SEO: if someone searches for your product feature you want them to land on your own site, not surrender valuable attention to someone else who's squatting on the search term.&lt;/p&gt;
&lt;p&gt;I personally value the conversation side of it even more. Hyperlinks are the best thing about the web - if I want to talk about something I'd much rather drop in a link to the definitive explanation rather than waste a paragraph (as I did earlier with Code Interpreter) explaining what the thing is for the umpteenth time!&lt;/p&gt;
&lt;p&gt;If you have an idea, project or feature that you want people to understand and discuss, build it the web page it deserves. &lt;strong&gt;Give people something to link to!&lt;/strong&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/marketing"&gt;marketing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/writing"&gt;writing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/boring-technology"&gt;boring-technology&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="marketing"/><category term="seo"/><category term="writing"/><category term="openai"/><category term="chatgpt"/><category term="claude"/><category term="boring-technology"/><category term="code-interpreter"/><category term="coding-agents"/></entry><entry><title>Weeknotes: python_requires, documentation SEO</title><link href="https://simonwillison.net/2022/Jan/25/weeknotes/#atom-tag" rel="alternate"/><published>2022-01-25T23:54:52+00:00</published><updated>2022-01-25T23:54:52+00:00</updated><id>https://simonwillison.net/2022/Jan/25/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;Fixed Datasette on Python 3.6 for the last time. Worked on documentation infrastructure improvements. Spent some time with Fly Volumes.&lt;/p&gt;
&lt;h4&gt;Datasette 0.60.1 for Python 3.6&lt;/h4&gt;
&lt;p&gt;I got &lt;a href="https://github.com/simonw/datasette/issues/1609"&gt;a report&lt;/a&gt; that users of Python 3.6 were seeing errors when they tried to install Datasette.&lt;/p&gt;
&lt;p&gt;I actually &lt;a href="https://github.com/simonw/datasette/issues/1577"&gt;dropped support&lt;/a&gt; for 3.6 a few weeks ago, but that shouldn't have affected the already released Datasette 0.60 - so something was clearly wrong.&lt;/p&gt;
&lt;p&gt;This led me to finally get my head around how &lt;code&gt;pip install&lt;/code&gt; handles Python version support. It's actually a very neat system which I hadn't previously taken the time to understand.&lt;/p&gt;
&lt;p&gt;Python packages can (and should!) provide a &lt;code&gt;python_requires=&lt;/code&gt; line in their &lt;code&gt;setup.py&lt;/code&gt;. That line for Datasette currently looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python_requires="&amp;gt;=3.7"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But in the 0.60 release it was still this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python_requires="&amp;gt;=3.6"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you run &lt;code&gt;pip install package&lt;/code&gt; this becomes part of the &lt;code&gt;pip&lt;/code&gt; resolution mechanism - it will default to attempting to install the highest available version of the package that supports your version of Python.&lt;/p&gt;
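&lt;p&gt;To make that resolution rule concrete, here's a minimal, illustrative sketch of the behaviour - pick the highest release whose &lt;code&gt;python_requires&lt;/code&gt; matches the running interpreter. The release numbers are hypothetical, and real pip uses the &lt;code&gt;packaging&lt;/code&gt; library with full PEP 440 semantics rather than this simplified comparison:&lt;/p&gt;

```python
# Minimal, illustrative sketch of pip's python_requires resolution:
# pick the highest release whose specifier matches the running Python.
# Real pip uses the packaging library and full PEP 440 semantics.

def parse_version(version):
    """Turn a version string like '3.6' into a comparable tuple (3, 6)."""
    return tuple(int(part) for part in version.split("."))

def satisfies(python_version, requires):
    """Check a simple '>=X.Y' style python_requires specifier."""
    if requires.startswith(">="):
        return parse_version(python_version) >= parse_version(requires[2:])
    raise ValueError("unsupported specifier: " + requires)

def pick_release(releases, python_version):
    """Choose the highest release compatible with python_version."""
    compatible = [
        (parse_version(version), version)
        for version, requires in releases.items()
        if satisfies(python_version, requires)
    ]
    return max(compatible)[1] if compatible else None

# Hypothetical release metadata for illustration only
releases = {"0.60": ">=3.6", "0.60.1": ">=3.6", "0.61": ">=3.7"}
print(pick_release(releases, "3.6"))  # -> 0.60.1
print(pick_release(releases, "3.7"))  # -> 0.61
```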
&lt;p&gt;So why did &lt;code&gt;pip install datasette&lt;/code&gt; break? It turned out that one of Datasette's dependencies, &lt;a href="https://www.uvicorn.org/"&gt;Uvicorn&lt;/a&gt;, had dropped support for Python 3.6 but did not have a &lt;code&gt;python_requires&lt;/code&gt; indicator that pip could use to resolve the correct version.&lt;/p&gt;
&lt;p&gt;Coincidentally, Uvicorn actually added &lt;code&gt;python_requires&lt;/code&gt; just &lt;a href="https://github.com/encode/uvicorn/pull/1328"&gt;a few weeks ago&lt;/a&gt; - but it wasn't out in a release yet, so &lt;code&gt;pip install&lt;/code&gt; couldn't take it into account.&lt;/p&gt;
&lt;p&gt;I raised this issue with the Uvicorn development team and they turned around a fix really promptly - &lt;a href="https://github.com/encode/uvicorn/releases/tag/0.17.0.post1"&gt;0.17.0.post1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But before I had seen how fast the Uvicorn team could move I figured out how to fix the issue myself, thanks to &lt;a href="https://twitter.com/samuel_hames/status/1484327636860293121"&gt;a tip from Sam Hames&lt;/a&gt; on Twitter.&lt;/p&gt;
&lt;p&gt;The key to fixing it was &lt;a href="https://www.python.org/dev/peps/pep-0508/#environment-markers"&gt;environment markers&lt;/a&gt;, a feature of Python's dependency resolution system that allows you to provide extra rules for when a dependency should be used.&lt;/p&gt;
&lt;p&gt;Here's an &lt;code&gt;install_requires=&lt;/code&gt; example showing these in action:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;install_requires&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
    &lt;span class="pl-s"&gt;"uvicorn~=0.11"&lt;/span&gt;,
    &lt;span class="pl-s"&gt;'uvicorn&amp;lt;=0.16.0;python_version&amp;lt;="3.6"'&lt;/span&gt;
]&lt;/pre&gt;
&lt;p&gt;This will install a Uvicorn version that loosely matches 0.11, but over-rides that rule to specify that it must be &lt;code&gt;&amp;lt;=0.16.0&lt;/code&gt; if the user's Python version is 3.6 or less.&lt;/p&gt;
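&lt;p&gt;As an illustration of how a marker like this gets evaluated, here's a small sketch that decides which requirement lines apply for a given Python version. It only handles the single &lt;code&gt;python_version&amp;lt;="X.Y"&lt;/code&gt; marker form used above - real tools evaluate the full PEP 508 grammar via &lt;code&gt;packaging.markers&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: evaluate simple PEP 508 environment markers to decide which
# install_requires lines apply. Only handles python_version<="X.Y";
# real tools use packaging.markers for the full grammar.
import sys

def active_requirements(install_requires, python_version=None):
    """Return the requirement specs whose environment markers apply."""
    if python_version is None:
        python_version = "%d.%d" % sys.version_info[:2]
    pv = tuple(int(p) for p in python_version.split("."))
    chosen = []
    for req in install_requires:
        if ";" not in req:
            # No marker: the requirement always applies
            chosen.append(req)
            continue
        spec, marker = req.split(";", 1)
        # Extract the upper bound from python_version<="X.Y"
        bound = marker.split("<=", 1)[1].strip().strip('"')
        if pv <= tuple(int(p) for p in bound.split(".")):
            chosen.append(spec.strip())
    return chosen

reqs = ["uvicorn~=0.11", 'uvicorn<=0.16.0;python_version<="3.6"']
print(active_requirements(reqs, "3.6"))  # both lines apply
print(active_requirements(reqs, "3.8"))  # only the loose pin applies
```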
&lt;p&gt;Since Datasette 0.60.1 will be the last version of Datasette to support Python 3.6, I decided to play it safe and pin the dependencies of every library to the most recent version that I have tested in Python 3.6 myself. Here's &lt;a href="https://github.com/simonw/datasette/blob/0.60.1/setup.py#L44-L78"&gt;the setup.py file&lt;/a&gt; I constructed for that.&lt;/p&gt;
&lt;p&gt;This ties into a larger open question for me about Datasette's approach to pinning dependencies.&lt;/p&gt;
&lt;p&gt;The rule of thumb I've heard is that you should pin dependencies for standalone released tools but leave dependencies loose for libraries that people will use as dependencies in their own projects - ensuring those users can run with different dependency versions if their projects require them.&lt;/p&gt;
&lt;p&gt;Datasette is &lt;em&gt;mostly&lt;/em&gt; a standalone tool - but it can also be used as a library. I'm actually planning to make its use as a library more obvious through &lt;a href="https://github.com/simonw/datasette/issues/1398"&gt;improvements to the documentation&lt;/a&gt; in the future.&lt;/p&gt;
&lt;p&gt;As such, pinning exact versions doesn't feel quite right to me.&lt;/p&gt;
&lt;p&gt;Maybe the solution here is to split the reusable library parts of Datasette out into a separate package - maybe &lt;code&gt;datasette-core&lt;/code&gt; - and have the &lt;code&gt;datasette&lt;/code&gt; CLI package depend on exact pinned dependencies while the &lt;code&gt;datasette-core&lt;/code&gt; library uses loose dependencies instead.&lt;/p&gt;
&lt;p&gt;Still thinking about this.&lt;/p&gt;
&lt;h4&gt;Datasette documentation tweaks&lt;/h4&gt;
&lt;p&gt;Datasette uses &lt;a href="https://readthedocs.org/"&gt;Read The Docs&lt;/a&gt; to host the documentation. Among other benefits, this makes it easy to host multiple documentation versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/latest/"&gt;docs.datasette.io/en/latest/&lt;/a&gt; is the latest version of the documentation, continuously deployed from the &lt;code&gt;main&lt;/code&gt; branch on GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/stable/"&gt;docs.datasette.io/en/stable/&lt;/a&gt; is the documentation for the most recent stable (non alpha or beta) release - currently 0.60.1. This is the version you get when you run &lt;code&gt;pip install datasette&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/0.59/"&gt;docs.datasette.io/en/0.59/&lt;/a&gt; is the documentation for version 0.59 - and every version back to 0.22.1 is hosted under similar URLs, currently covering 73 different releases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those previous versions all automatically show a note at the top of the page warning that this is outdated documentation and linking back to stable - a feature which Read The Docs provides automatically.&lt;/p&gt;
&lt;p&gt;But... I noticed that &lt;code&gt;/en/latest/&lt;/code&gt; didn't do this. I wanted a warning banner to let people know that they were looking at the in-development version of the documentation.&lt;/p&gt;
&lt;p&gt;After some digging around, I fixed it using &lt;a href="https://til.simonwillison.net/readthedocs/link-from-latest-to-stable"&gt;a little bit of extra JavaScript&lt;/a&gt; added to my documentation template. Here's the key implementation detail:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;jQuery&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;$&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Show banner linking to /stable/ if this is a /latest/ page&lt;/span&gt;
  &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-c1"&gt;!&lt;/span&gt;&lt;span class="pl-pds"&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-cce"&gt;\/&lt;/span&gt;latest&lt;span class="pl-cce"&gt;\/&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;test&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;location&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;pathname&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;return&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;stableUrl&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;location&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;pathname&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;replace&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"/latest/"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"/stable/"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Check it's not a 404&lt;/span&gt;
  &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;stableUrl&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;method&lt;/span&gt;: &lt;span class="pl-s"&gt;"HEAD"&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;response&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;response&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;status&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;200&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Page exists, insert a warning banner linking to it&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses &lt;code&gt;fetch()&lt;/code&gt; to make an HTTP HEAD request for the stable documentation page, and inserts a warning banner only if that page isn't a 404. This avoids linking to a non-existent documentation page if it has been created in development but not yet released as part of a stable release. &lt;a href="https://docs.datasette.io/en/latest/csv_export.html"&gt;Example here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/latest-docs-warning.png" alt="Screenshot of the documentation page with a banner that says: This documentation covers the development version of Datasette. See this page for the current stable release." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Thinking about this problem got me thinking about SEO.&lt;/p&gt;
&lt;p&gt;A problem I've had with other projects that host multiple versions of their documentation is that sometimes I'll search on Google and end up landing on a page covering a much older version of their project. I think I've had this happen for both PostgreSQL and Python in the past.&lt;/p&gt;
&lt;p&gt;What's best practice for avoiding this? I &lt;a href="https://twitter.com/simonw/status/1484287724773203971"&gt;asked on Twitter&lt;/a&gt; and also started digging around for answers. "If in doubt, imitate Django" is a good general rule of thumb, so I had a look at how Django did this and spotted the following in the HTML of one of their &lt;a href="https://docs.djangoproject.com/en/2.2/topics/db/"&gt;prior version pages&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-text-html-basic"&gt;&lt;pre&gt;&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;link&lt;/span&gt; &lt;span class="pl-c1"&gt;rel&lt;/span&gt;="&lt;span class="pl-s"&gt;canonical&lt;/span&gt;" &lt;span class="pl-c1"&gt;href&lt;/span&gt;="&lt;span class="pl-s"&gt;https://docs.djangoproject.com/en/4.0/topics/db/&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;So Django are using the &lt;a href="https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls"&gt;rel=canonical&lt;/a&gt; tag to point crawlers towards their most recent release.&lt;/p&gt;
&lt;p&gt;I decided to implement that myself... and then discovered that the Datasette documentation was doing it already! Read The Docs &lt;a href="https://docs.readthedocs.io/en/latest/custom_domains.html#canonical-urls"&gt;implement this piece&lt;/a&gt; of SEO best practice out of the box.&lt;/p&gt;
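&lt;p&gt;If you want to check what a page declares as its canonical URL yourself, a few lines of stdlib Python will pull it out. This is a sketch (the class name is mine, not from any library) using only &lt;code&gt;html.parser&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: extract the rel=canonical URL from a page's HTML, the tag
# Django and Read The Docs use to point crawlers at the current
# release. Stdlib only; CanonicalFinder is an illustrative name.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "canonical":
            self.canonical = attributes.get("href")

page = '<link rel="canonical" href="https://docs.djangoproject.com/en/4.0/topics/db/">'
finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)
```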
&lt;h4&gt;Datasette on Fly volumes&lt;/h4&gt;
&lt;p&gt;This one isn't released yet, but I made some good progress on it this week.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; announced this week that they would be providing 3GB of volume storage to accounts on their free tier. They called this announcement &lt;a href="https://fly.io/blog/free-postgres/"&gt;Free Postgres Databases&lt;/a&gt;, but tucked away in the blog post was this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The lede is "free Postgres" because that's what matters to full stack apps. You don't have to use these for Postgres. If SQLite is more your jam, mount up to 3GB of volumes and use "free SQLite." Yeah, we're probably underselling that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(There is &lt;a href="https://twitter.com/mrkurt/status/1484609372114272261"&gt;evidence&lt;/a&gt; that they may have been &lt;a href="https://xkcd.com/356/"&gt;nerd sniping&lt;/a&gt; me with that paragraph.)&lt;/p&gt;
&lt;p&gt;I have a plugin called &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt; which publishes Datasette instances to Fly. Obviously that needs to grow support for configuring volumes!&lt;/p&gt;
&lt;p&gt;I've so far &lt;a href="https://github.com/simonw/datasette-publish-fly/issues/11"&gt;completed the research&lt;/a&gt; on how that feature should work. The next step is to finish &lt;a href="https://github.com/simonw/datasette-publish-fly/issues/10"&gt;implementing the feature&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;sqlite-utils --help&lt;/h4&gt;
&lt;p&gt;I pushed out minor release &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-22-1"&gt;sqlite-utils 3.22.1&lt;/a&gt; today with one notable improvement: every single one of the 39 commands in the CLI tool now includes an example of usage as part of the &lt;code&gt;--help&lt;/code&gt; text.&lt;/p&gt;

&lt;p&gt;This feature was inspired by the new &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli-reference.html#cli-reference"&gt;CLI reference page&lt;/a&gt; in the documentation, which shows the help output for every command on one page - making it much easier to spot potential improvements.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.22.1"&gt;3.22.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;94 releases total&lt;/a&gt;) - 2022-01-26
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.10"&gt;0.10&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;10 releases total&lt;/a&gt;) - 2022-01-25
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60.1"&gt;0.60.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;106 releases total&lt;/a&gt;) - 2022-01-21
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/json-extract-path"&gt;json_extract() path syntax in SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/aws/helper-for-boto-aws-pagination"&gt;Helper function for pagination using AWS boto3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pixelmator/pixel-editing-favicon"&gt;Pixel editing a favicon with Pixelmator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/readthedocs/documentation-seo-canonical"&gt;Promoting the stable version of the documentation using rel=canonical&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/readthedocs/link-from-latest-to-stable"&gt;Linking from /latest/ to /stable/ on Read The Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/undocumented-graphql-api"&gt;Using the undocumented Fly GraphQL API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="seo"/><category term="datasette"/><category term="weeknotes"/><category term="fly"/><category term="sqlite-utils"/><category term="read-the-docs"/></entry><entry><title>datasette-block-robots</title><link href="https://simonwillison.net/2020/Jun/23/datasette-block-robots/#atom-tag" rel="alternate"/><published>2020-06-23T03:28:00+00:00</published><updated>2020-06-23T03:28:00+00:00</updated><id>https://simonwillison.net/2020/Jun/23/datasette-block-robots/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-block-robots"&gt;datasette-block-robots&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Another little Datasette plugin: this one adds a &lt;code&gt;/robots.txt&lt;/code&gt; page with &lt;code&gt;Disallow: /&lt;/code&gt; to block all indexing of a Datasette instance from respectable search engine crawlers. I built this in less than ten minutes from idea to deploy to PyPI thanks to the &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; cookiecutter template.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;



</summary><category term="crawling"/><category term="plugins"/><category term="projects"/><category term="robots-txt"/><category term="seo"/><category term="datasette"/></entry><entry><title>Building a sitemap.xml with a one-off Datasette plugin</title><link href="https://simonwillison.net/2020/Jan/6/sitemap-xml/#atom-tag" rel="alternate"/><published>2020-01-06T23:02:48+00:00</published><updated>2020-01-06T23:02:48+00:00</updated><id>https://simonwillison.net/2020/Jan/6/sitemap-xml/#atom-tag</id><summary type="html">
    &lt;p&gt;One of the fun things about launching a new website is re-learning what it takes to promote a website from scratch on the modern web. I've been thoroughly enjoying using &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; as an excuse to explore 2020-era SEO.&lt;/p&gt;

&lt;p&gt;I used to use Google Webmaster Tools for this, but apparently that got rebranded as &lt;a href="https://en.wikipedia.org/wiki/Google_Search_Console"&gt;Google Search Console&lt;/a&gt; back in May 2015. It's really useful. It shows which search terms got impressions, which ones got clicks and lets you review which of your pages are indexed and which have returned errors.&lt;/p&gt;

&lt;p&gt;Niche Museums has been live since October 24th, but it was a SPA for the first month. I &lt;a href="https://simonwillison.net/2019/Nov/25/niche-museums/"&gt;switched it to server-side rendering&lt;/a&gt; (with separate pages for each museum) on November 25th. The Google Search Console shows it first appeared in search results on 2nd December.&lt;/p&gt;

&lt;p&gt;So far, I've had 35 clicks! Not exactly earth-shattering, but every site has to start somewhere.&lt;/p&gt;

&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2020/google-search-console.png" alt="Screenshot of the Google Search Console." /&gt;

&lt;p&gt;In a bid to increase the number of indexed pages, I decided to build a &lt;a href="https://www.sitemaps.org/protocol.html"&gt;sitemap.xml&lt;/a&gt;. This probably isn't necessary - Google advise that you might not need one if your site is "small", defined as 500 pages or less (Niche Museums lists 88 museums, though it's still increasing by one every day). It's nice to be able to view that sitemap and confirm that those pages have all been indexed inside the Search Console though.&lt;/p&gt;

&lt;p&gt;Since Niche Museums is entirely powered by a customized &lt;a href="https://datasette.readthedocs.io/"&gt;Datasette&lt;/a&gt; instance, I needed to figure out how best to build that sitemap.&lt;/p&gt;

&lt;h3 id="one-off-plugins"&gt;One-off plugins&lt;/h3&gt;

&lt;p&gt;Datasette's most powerful customization options are provided by &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html"&gt;the plugins mechanism&lt;/a&gt;. Back in June I &lt;a href="https://simonwillison.net/2019/Jun/23/datasette-asgi/"&gt;ported Datasette to ASGI&lt;/a&gt;, and the subsequent release of Datasette 0.29 introduced &lt;a href="https://simonwillison.net/2019/Jul/14/sso-asgi/"&gt;a new asgi_wrapper plugin hook&lt;/a&gt;. This hook makes it possible to intercept requests and implement an entirely custom response - ideal for serving up a &lt;code&gt;/sitemap.xml&lt;/code&gt; page.&lt;/p&gt;

&lt;p&gt;I considered building and releasing a generic &lt;code&gt;datasette-sitemap&lt;/code&gt; plugin that could be used anywhere, but that felt like overkill for this particular problem. Instead, I decided to take advantage of &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html#writing-plugins"&gt;the --plugins-dir=&lt;/a&gt; Datasette option to build a one-off custom plugin for the site.&lt;/p&gt;

&lt;p&gt;The Datasette instance that runs Niche Museums starts up like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ datasette browse.db about.db \
    --template-dir=templates/ \
    --plugins-dir=plugins/ \
    --static css:static/ \
    -m metadata.json&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This serves the two SQLite database files, loads custom templates from the &lt;code&gt;templates/&lt;/code&gt; directory, sets up &lt;a href="https://www.niche-museums.com/css/museums.css"&gt;www.niche-museums.com/css/museums.css&lt;/a&gt; to serve files from the &lt;code&gt;static/&lt;/code&gt; directory and loads metadata settings from &lt;code&gt;metadata.json&lt;/code&gt;. All of these files &lt;a href="https://github.com/simonw/museums"&gt;are on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It also tells Datasette to look for any Python files in the &lt;code&gt;plugins/&lt;/code&gt; directory and load those up as plugins.&lt;/p&gt;

&lt;p&gt;I currently have four Python files in that directory - you can &lt;a href="https://github.com/simonw/museums/tree/c745927f989fc173cc30609d83fe36d847080621/plugins"&gt;see them here&lt;/a&gt;. The &lt;code&gt;sitemap.xml&lt;/code&gt; is implemented using the new &lt;a href="https://github.com/simonw/museums/blob/c745927f989fc173cc30609d83fe36d847080621/plugins/sitemap.py"&gt;sitemap.py plugin file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's the first part of that file, which wraps the Datasette ASGI app with middleware that checks for the URL &lt;code&gt;/robots.txt&lt;/code&gt; or &lt;code&gt;/sitemap.xml&lt;/code&gt; and returns custom content for either of them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from datasette import hookimpl
from datasette.utils.asgi import asgi_send


@hookimpl
def asgi_wrapper(datasette):
    def wrap_with_robots_and_sitemap(app):
        async def robots_and_sitemap(scope, receive, send):
            if scope["path"] == "/robots.txt":
                await asgi_send(
                    send, "Sitemap: https://www.niche-museums.com/sitemap.xml", 200
                )
            elif scope["path"] == "/sitemap.xml":
                await send_sitemap(send, datasette)
            else:
                await app(scope, receive, send)

        return robots_and_sitemap

    return wrap_with_robots_and_sitemap&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The boilerplate here is a little convoluted, but this does the job. I'm considering adding alternative plugin hooks for custom pages that could simplify this in the future.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;asgi_wrapper(datasette)&lt;/code&gt; plugin function is expected to return a function which will be used to &lt;em&gt;wrap&lt;/em&gt; the Datasette ASGI application. In this case that wrapper function is called &lt;code&gt;wrap_with_robots_and_sitemap(app)&lt;/code&gt;. Here's &lt;a href="https://github.com/simonw/datasette/blob/3c861f363df02a59a67c59036278338e4760d2ed/datasette/app.py#L647-L652"&gt;the Datasette core code&lt;/a&gt; that builds the ASGI app and applies the wrappers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;asgi = AsgiLifespan(
    AsgiTracer(DatasetteRouter(self, routes)), on_startup=setup_db
)
for wrapper in pm.hook.asgi_wrapper(datasette=self):
    asgi = wrapper(asgi)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So this plugin will be executed as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;asgi = wrap_with_robots_and_sitemap(asgi)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;wrap_with_robots_and_sitemap(app)&lt;/code&gt; function then returns another, asynchronous function. This function follows the ASGI protocol specification, and has the following signature and body:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;async def robots_and_sitemap(scope, receive, send):
    if scope["path"] == "/robots.txt":
        await asgi_send(
            send, "Sitemap: https://www.niche-museums.com/sitemap.xml", 200
        )
    elif scope["path"] == "/sitemap.xml":
        await send_sitemap(send, datasette)
    else:
        await app(scope, receive, send)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the incoming URL path is &lt;code&gt;/robots.txt&lt;/code&gt;, the function directly returns a reference to the sitemap, as seen at &lt;a href="https://www.niche-museums.com/robots.txt"&gt;www.niche-museums.com/robots.txt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If the path is &lt;code&gt;/sitemap.xml&lt;/code&gt;, it calls the &lt;code&gt;send_sitemap(...)&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;For any other path, it proxies the call to the original ASGI app function that was passed to the wrapper function: &lt;code&gt;await app(scope, receive, send)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The most interesting part of the implementation is that &lt;code&gt;send_sitemap()&lt;/code&gt; function. This is the function which constructs the sitemap.xml returned by &lt;a href="https://www.niche-museums.com/sitemap.xml"&gt;www.niche-museums.com/sitemap.xml&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's what that function looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;async def send_sitemap(send, datasette):
    content = [
        '&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;',
        '&amp;lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&amp;gt;',
    ]
    for db in datasette.databases.values():
        hidden = await db.hidden_table_names()
        tables = await db.table_names()
        for table in tables:
            if table in hidden:
                continue
            for row in await db.execute("select id from [{}]".format(table)):
                content.append(
                    "&amp;lt;url&amp;gt;&amp;lt;loc&amp;gt;https://www.niche-museums.com/browse/{}/{}&amp;lt;/loc&amp;gt;&amp;lt;/url&amp;gt;".format(
                        table, row["id"]
                    )
                )
    content.append("&amp;lt;/urlset&amp;gt;")
    await asgi_send(send, "\n".join(content), 200, content_type="application/xml")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The key trick here is to use the &lt;code&gt;datasette&lt;/code&gt; instance object which was passed to the &lt;code&gt;asgi_wrapper()&lt;/code&gt; plugin hook.&lt;/p&gt;

&lt;p&gt;The code uses that instance to introspect the attached SQLite databases. It loops through them listing all of their tables, and filtering out any hidden tables (which in this case are tables used by the SQLite FTS indexing mechanism). Then for each of those tables it runs &lt;code&gt;select id from [tablename]&lt;/code&gt; and uses the results to build the URLs that are listed in the sitemap.&lt;/p&gt;

&lt;p&gt;Finally, the resulting XML is concatenated together and sent back to the client with an &lt;code&gt;application/xml&lt;/code&gt; content type.&lt;/p&gt;

&lt;p&gt;For the moment, Niche Museums only has one table that needs including in the sitemap - the &lt;a href="https://www.niche-museums.com/browse/museums"&gt;museums table&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I have &lt;a href="https://github.com/simonw/datasette/issues/576"&gt;a longer-term goal&lt;/a&gt; to provide detailed documentation for the &lt;code&gt;datasette&lt;/code&gt; object here: since it's exposed to plugins it's become part of the API interface for Datasette itself. I want to stabilize this before I release Datasette 1.0.&lt;/p&gt;

&lt;h3 id="this-week-in-new-museums"&gt;This week's new museums&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/82"&gt;The Donkey Sanctuary&lt;/a&gt; in East Devon&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/84"&gt;Hazel-Atlas Sand Mine&lt;/a&gt; in Antioch&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/85"&gt;Ilfracombe Tunnels Beaches&lt;/a&gt; in North Devon&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/86"&gt;The Beat Museum&lt;/a&gt; in San Francisco&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/87"&gt;Sea Lions at Pier 39&lt;/a&gt; in San Francisco&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/88"&gt;Griffith Observatory&lt;/a&gt; in Los Angeles&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/89"&gt;Cohen Bray House&lt;/a&gt; in Oakland&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I had a lot of fun writing up the &lt;a href="https://www.niche-museums.com/browse/museums/88"&gt;Griffith Observatory&lt;/a&gt;: it turns out founding donor Griffith J. Griffith was &lt;a href="https://www.kcet.org/shows/lost-la/the-complex-life-of-griffith-j-griffith"&gt;a truly terrible individual&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/museums"&gt;museums&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="museums"/><category term="plugins"/><category term="projects"/><category term="seo"/><category term="datasette"/><category term="weeknotes"/></entry><entry><title>Evolving “nofollow” – new ways to identify the nature of links</title><link href="https://simonwillison.net/2019/Sep/10/evolving-nofollow-new-ways-to-identify-the-nature-of-links/#atom-tag" rel="alternate"/><published>2019-09-10T21:16:53+00:00</published><updated>2019-09-10T21:16:53+00:00</updated><id>https://simonwillison.net/2019/Sep/10/evolving-nofollow-new-ways-to-identify-the-nature-of-links/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://webmasters.googleblog.com/2019/09/evolving-nofollow-new-ways-to-identify.html"&gt;Evolving “nofollow” – new ways to identify the nature of links&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Slightly confusing announcement from Google: they’re introducing rel=ugc and rel=sponsored in addition to rel=nofollow, and will be treating all three values as “hints” for their indexing system. They’re very unclear as to what the concrete effects of these hints will be, presumably because they will become part of the secret sauce of their ranking algorithm.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=20930270"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nofollow"&gt;nofollow&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="nofollow"/><category term="seo"/></entry><entry><title>Googlebot's Javascript random() function is deterministic</title><link href="https://simonwillison.net/2018/Feb/7/googlebot-javascript/#atom-tag" rel="alternate"/><published>2018-02-07T02:41:16+00:00</published><updated>2018-02-07T02:41:16+00:00</updated><id>https://simonwillison.net/2018/Feb/7/googlebot-javascript/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.tomanthony.co.uk/blog/googlebot-javascript-random/"&gt;Googlebot&amp;#x27;s Javascript random() function is deterministic&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
random() as executed by Googlebot returns the same predictable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval&#8212;as Tom Anthony points out, &#8220;Why actually wait 5 seconds when you are a bot?&#8221;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="crawling"/><category term="google"/><category term="seo"/></entry><entry><title>What is the plural of blitz?</title><link href="https://simonwillison.net/2017/Nov/25/what-is-the-plural-of-blitz/#atom-tag" rel="alternate"/><published>2017-11-25T17:42:07+00:00</published><updated>2017-11-25T17:42:07+00:00</updated><id>https://simonwillison.net/2017/Nov/25/what-is-the-plural-of-blitz/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.wordhippo.com/what-is/the-plural-of/blitz.html"&gt;What is the plural of blitz?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Wow, WordHippo is a straight up masterclass in keyword SEO tactics. Everything from the page URL to the keyword-crammed content to the enormous quantity of related links.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="seo"/></entry><entry><title>Whether 404 custom error page necessary for a website?</title><link href="https://simonwillison.net/2014/Jan/3/whether-404-custom-error/#atom-tag" rel="alternate"/><published>2014-01-03T13:14:00+00:00</published><updated>2014-01-03T13:14:00+00:00</updated><id>https://simonwillison.net/2014/Jan/3/whether-404-custom-error/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Whether-404-custom-error-page-necessary-for-a-website/answer/Simon-Willison"&gt;Whether 404 custom error page necessary for a website?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They aren't required, but if you don't have a custom 404 page you're missing out on a very easy way of improving the user experience of your site, and protecting against expired or incorrect links from elsewhere on the web.&lt;/p&gt;

&lt;p&gt;Even just a search box and a link to your homepage is enough to ensure visitors who arrive on a 404 can still visit the rest of your site, and hopefully find what they were looking for when they clicked on the link.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="http"/><category term="seo"/><category term="quora"/></entry><entry><title>What are good sources to learn about SEO?</title><link href="https://simonwillison.net/2013/Dec/5/what-are-good-sources/#atom-tag" rel="alternate"/><published>2013-12-05T11:33:00+00:00</published><updated>2013-12-05T11:33:00+00:00</updated><id>https://simonwillison.net/2013/Dec/5/what-are-good-sources/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-good-sources-to-learn-about-SEO/answer/Simon-Willison"&gt;What are good sources to learn about SEO?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;span&gt;&lt;a href="http://moz.com/beginners-guide-to-seo"&gt;The Beginner's Guide to SEO&lt;/a&gt;&lt;/span&gt; from Moz (previously SEOMoz) is an excellent introduction to SEO fundamentals.
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="seo"/><category term="quora"/></entry><entry><title>Do comments really count for SEO link building?</title><link href="https://simonwillison.net/2013/Jan/1/do-comments-really-count/#atom-tag" rel="alternate"/><published>2013-01-01T09:48:00+00:00</published><updated>2013-01-01T09:48:00+00:00</updated><id>https://simonwillison.net/2013/Jan/1/do-comments-really-count/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Do-comments-really-count-for-SEO-link-building/answer/Simon-Willison"&gt;Do comments really count for SEO link building?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most sensible commenting systems will put rel=nofollow on links to discourage comment spam, which will have a significant effect on SEO.&lt;/p&gt;
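As a rough illustration of the kind of transformation such a commenting system applies before rendering user-submitted content - a minimal sketch, where the function name and regex approach are my own; real systems use a proper HTML sanitizer and handle existing rel attributes:

```python
import re

def nofollow_links(comment_html):
    # Add rel="nofollow" to every <a> tag in user-submitted comment HTML,
    # signalling to search engines that the links should not pass ranking
    # credit. Simplified: assumes no <a> tag already has a rel attribute.
    return re.sub(r"<a ", '<a rel="nofollow" ', comment_html)

print(nofollow_links('Great post! <a href="https://example.com/">my site</a>'))
# Great post! <a rel="nofollow" href="https://example.com/">my site</a>
```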
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="search-engines"/><category term="seo"/><category term="quora"/></entry><entry><title>What are the ways to  Convert Dynamic JSP pages to a Static HTML to Appear in Google search results?</title><link href="https://simonwillison.net/2012/Sep/22/what-are-the-ways/#atom-tag" rel="alternate"/><published>2012-09-22T17:40:00+00:00</published><updated>2012-09-22T17:40:00+00:00</updated><id>https://simonwillison.net/2012/Sep/22/what-are-the-ways/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-the-ways-to-Convert-Dynamic-JSP-pages-to-a-Static-HTML-to-Appear-in-Google-search-results/answer/Simon-Willison"&gt;What are the ways to  Convert Dynamic JSP pages to a Static HTML to Appear in Google search results?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You don't have to do anything. You're misunderstanding how dynamic server-side languages like JSP work.&lt;/p&gt;

&lt;p&gt;Hit view source in your browser on a "static" HTML site (if you can find one - they're increasingly rare these days, and as an end user it's actually impossible to tell the difference, as I'm about to explain). You'll see what Google's search crawler sees: a bunch of HTML.&lt;/p&gt;

&lt;p&gt;Now do the same thing on a "dynamically" generated site - anything with a .php or .jsp extension is a good start (since they're revealing their technology choices through their URL, which is a bit tacky but does at least let you see what they're using). You'll see a bunch of HTML.&lt;/p&gt;

&lt;p&gt;Dynamic server-side technologies like JSP, PHP, Django, Rails, &lt;span&gt;&lt;a href="http://ASP.NET"&gt;ASP.NET&lt;/a&gt;&lt;/span&gt; etc run on the server - they generate HTML, which is then served to regular users and to search engine crawlers alike. It's not possible to tell for sure if that HTML was generated by code or is just a single static file that someone hosted on a web server.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="seo"/><category term="quora"/></entry><entry><title>What is the optimal description length in the Apple App Store?</title><link href="https://simonwillison.net/2012/Feb/9/what-is-the-optimal/#atom-tag" rel="alternate"/><published>2012-02-09T17:14:00+00:00</published><updated>2012-02-09T17:14:00+00:00</updated><id>https://simonwillison.net/2012/Feb/9/what-is-the-optimal/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-is-the-optimal-description-length-in-the-Apple-App-Store/answer/Simon-Willison"&gt;What is the optimal description length in the Apple App Store?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Have you ever come across one of those ugly, long pages advertising an ebook - the ones that bang on for dozens of paragraphs with bullet points, pictures, testimonials, headings, more testimonials, more bullet points and so on?&lt;/p&gt;

&lt;p&gt;Guess what: they work! The general format is called a "sales letter" - &lt;span&gt;&lt;a href="http://en.m.wikipedia.org/wiki/Sales_letter"&gt;http://en.m.wikipedia.org/wiki/S...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;We know they work because people have been split testing them for decades.&lt;/p&gt;

&lt;p&gt;I imagine iPhone developers have discovered that the same trick (way too much information) works for 99 cent purchases on the App Store.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/iphone"&gt;iphone&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ios"&gt;ios&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="iphone"/><category term="seo"/><category term="quora"/><category term="ios"/></entry><entry><title>Why does Google use "Allow" in robots.txt, when the standard seems to be "Disallow?"</title><link href="https://simonwillison.net/2012/Feb/4/why-does-google-use/#atom-tag" rel="alternate"/><published>2012-02-04T09:45:00+00:00</published><updated>2012-02-04T09:45:00+00:00</updated><id>https://simonwillison.net/2012/Feb/4/why-does-google-use/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Why-does-Google-use-Allow-in-robots-txt-when-the-standard-seems-to-be-Disallow/answer/Simon-Willison"&gt;Why does Google use &amp;quot;Allow&amp;quot; in robots.txt, when the standard seems to be &amp;quot;Disallow?&amp;quot;&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Disallow command prevents search engines from crawling your site.&lt;/p&gt;

&lt;p&gt;The Allow command allows them to crawl your site.&lt;/p&gt;
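To make the distinction concrete, Allow is mostly useful for carving an exception out of a broader Disallow rule. A hypothetical robots.txt (the paths here are invented for illustration) might look like this:

```
User-agent: *
Disallow: /admin/
Allow: /admin/help/
```

Everything under /admin/ is blocked except /admin/help/; a file with no Disallow rules at all already permits full crawling.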

&lt;p&gt;If you're using the Google Webmaster tools, you probably want Google to crawl your site.&lt;/p&gt;

&lt;p&gt;Am I misunderstanding your question?&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="crawling"/><category term="google"/><category term="search-engines"/><category term="seo"/><category term="quora"/></entry><entry><title>Why is Google indexing &amp; displaying www1 versions of my site and how might I stop this?</title><link href="https://simonwillison.net/2012/Jan/9/why-is-google-indexing/#atom-tag" rel="alternate"/><published>2012-01-09T12:43:00+00:00</published><updated>2012-01-09T12:43:00+00:00</updated><id>https://simonwillison.net/2012/Jan/9/why-is-google-indexing/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Why-is-Google-indexing-displaying-www1-versions-of-my-site-and-how-might-I-stop-this/answer/Simon-Willison"&gt;Why is Google indexing &amp;amp; displaying www1 versions of my site and how might I stop this?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You should stop serving your site to the public on multiple subdomains. Configure your site to serve a 301 permanent redirect from www1-www4 to the equivalent page on www - and make sure that your site, when accessed without the www, redirects to the right place as well.&lt;/p&gt;

&lt;p&gt;The use of 301s will avoid any SEO penalty.&lt;/p&gt;

&lt;p&gt;Why should you take this relatively extreme measure? Because serving on multiple subdomains hurts you in a bunch of ways:&lt;/p&gt;

&lt;p&gt;1) Your SEO is spread across multiple copies of the same page, hurting your page rank.&lt;/p&gt;

&lt;p&gt;2) Your cookies may end up spread across multiple domains, hurting your analytics and resulting in frustrated users who are signed in on only some of your subdomains.&lt;/p&gt;

&lt;p&gt;3) You're damaging your scores on social media sharing sites. To use quite an old example, delicious used to use the number of bookmarks to a unique URL to decide what would appear on their "popular" page. Having multiple URLs for a piece of content split that score, making it much less likely you would appear there.&lt;/p&gt;

&lt;p&gt;4) You're making life harder for yourself should you need to switch to serving your entire site over SSL (which you may need to do to see Google search referral information as they move more of their search results pages to SSL).&lt;/p&gt;

&lt;p&gt;Using rel=canonical is a good short-term fix, but it's not too hard to implement the proper 301 fix, and in my opinion it's well worth the effort.&lt;/p&gt;
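&lt;p&gt;As a rough sketch of that 301 fix (assuming nginx and a made-up example.com - adapt the hostnames to your own setup):&lt;/p&gt;

```
# Hypothetical nginx config: 301 the www1-www4 and bare hosts to the canonical www
server {
    server_name example.com www1.example.com www2.example.com www3.example.com www4.example.com;
    return 301 http://www.example.com$request_uri;
}
```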
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/domains"&gt;domains&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="domains"/><category term="google"/><category term="search-engines"/><category term="seo"/><category term="quora"/></entry><entry><title>What are the best SEO conferences around Cincinnati?</title><link href="https://simonwillison.net/2012/Jan/6/what-are-the-best/#atom-tag" rel="alternate"/><published>2012-01-06T14:50:00+00:00</published><updated>2012-01-06T14:50:00+00:00</updated><id>https://simonwillison.net/2012/Jan/6/what-are-the-best/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-the-best-SEO-conferences-around-Cincinnati/answer/Simon-Willison"&gt;What are the best SEO conferences around Cincinnati?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It doesn't look like there are many (any?) SEO events in Cincinnati, but Chicago has SES in November 2012: &lt;span&gt;&lt;a href="http://sesconference.com/chicago/"&gt;http://sesconference.com/chicago/&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/conferences"&gt;conferences&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="conferences"/><category term="seo"/><category term="quora"/></entry><entry><title>Does domain name masking negatively effect SEO?</title><link href="https://simonwillison.net/2011/Dec/30/does-domain-name-masking/#atom-tag" rel="alternate"/><published>2011-12-30T17:18:00+00:00</published><updated>2011-12-30T17:18:00+00:00</updated><id>https://simonwillison.net/2011/Dec/30/does-domain-name-masking/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Does-domain-name-masking-negatively-effect-SEO/answer/Simon-Willison"&gt;Does domain name masking negatively effect SEO?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, because you've made it impossible for people to share links to sub-pages on your site - which means you won't get incoming links to those pages, a crucial ranking metric.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="seo"/><category term="quora"/></entry><entry><title>Is it a good idea to allocate URLs such as quora.com/username to users?</title><link href="https://simonwillison.net/2010/Dec/22/is-it-a-good/#atom-tag" rel="alternate"/><published>2010-12-22T15:17:00+00:00</published><updated>2010-12-22T15:17:00+00:00</updated><id>https://simonwillison.net/2010/Dec/22/is-it-a-good/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Is-it-a-good-idea-to-allocate-URLs-such-as-quora-com-username-to-users/answer/Simon-Willison"&gt;Is it a good idea to allocate URLs such as quora.com/username to users?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's an interesting discussion about this issue on this question: &lt;span&gt;&lt;a href="https://www.quora.com/How-do-sites-prevent-vanity-URLs-from-colliding-with-future-features"&gt;How do sites prevent vanity URLs from colliding with future features?&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="seo"/><category term="urls"/><category term="quora"/></entry><entry><title>If I have data that loads using  json / JavaScript will it get indexed by Google?</title><link href="https://simonwillison.net/2010/Oct/29/if-i-have-data/#atom-tag" rel="alternate"/><published>2010-10-29T15:30:00+00:00</published><updated>2010-10-29T15:30:00+00:00</updated><id>https://simonwillison.net/2010/Oct/29/if-i-have-data/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/If-I-have-data-that-loads-using-json-JavaScript-will-it-get-indexed-by-Google/answer/Simon-Willison"&gt;If I have data that loads using  json / JavaScript will it get indexed by Google?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No. Personally I dislike sites with content that is only accessible through JavaScript, but if you absolutely insist on doing this you should look into implementing the Google Ajax Crawling mechanism: &lt;span&gt;&lt;a href="http://code.google.com/web/ajaxcrawling/"&gt;http://code.google.com/web/ajaxc...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ajax"&gt;ajax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jquery"&gt;jquery&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/web-development"&gt;web-development&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ajax"/><category term="jquery"/><category term="json"/><category term="seo"/><category term="web-development"/><category term="quora"/></entry><entry><title>Great Literature Retitled To Boost Website Traffic</title><link href="https://simonwillison.net/2010/Jun/17/copy/#atom-tag" rel="alternate"/><published>2010-06-17T10:32:00+00:00</published><updated>2010-06-17T10:32:00+00:00</updated><id>https://simonwillison.net/2010/Jun/17/copy/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.mcsweeneys.net/links/lists/27lacher.html"&gt;Great Literature Retitled To Boost Website Traffic&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“7 Awesome Ways Barnyard Animals Are Like Communism”.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://delicious.com/jacobian/mcsweeneys+seo"&gt;Jacob Kaplan-Moss&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/copy"&gt;copy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/funny"&gt;funny&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/awful"&gt;awful&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/headlines"&gt;headlines&lt;/a&gt;&lt;/p&gt;



</summary><category term="copy"/><category term="funny"/><category term="seo"/><category term="recovered"/><category term="awful"/><category term="headlines"/></entry><entry><title>Quoting Mark Pilgrim</title><link href="https://simonwillison.net/2010/Jun/8/html5/#atom-tag" rel="alternate"/><published>2010-06-08T20:48:00+00:00</published><updated>2010-06-08T20:48:00+00:00</updated><id>https://simonwillison.net/2010/Jun/8/html5/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://hg.diveintohtml5.org/hgweb.cgi/rev/31e07449843a7982c119bc7fe2c69b595a7e46f5"&gt;&lt;p&gt;I’m renaming the book to “Dive Into HTML 5” for better SEO. This is not a joke. The book is the #5 search result for “HTML5” (no space) but #13 for “HTML 5” (with a space). I get 514 visitors a day searching Google for “HTML5” but only 53 visitors a day searching for “HTML 5”.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://hg.diveintohtml5.org/hgweb.cgi/rev/31e07449843a7982c119bc7fe2c69b595a7e46f5"&gt;Mark Pilgrim&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/html5"&gt;html5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mark-pilgrim"&gt;mark-pilgrim&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/diveintohtml5"&gt;diveintohtml5&lt;/a&gt;&lt;/p&gt;



</summary><category term="html5"/><category term="mark-pilgrim"/><category term="seo"/><category term="recovered"/><category term="diveintohtml5"/></entry><entry><title>Official Google Webmaster Blog: A proposal for making AJAX crawlable</title><link href="https://simonwillison.net/2009/Oct/8/horrible/#atom-tag" rel="alternate"/><published>2009-10-08T17:52:31+00:00</published><updated>2009-10-08T17:52:31+00:00</updated><id>https://simonwillison.net/2009/Oct/8/horrible/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://googlewebmastercentral.blogspot.com/2009/10/proposal-for-making-ajax-crawlable.html"&gt;Official Google Webmaster Blog: A proposal for making AJAX crawlable&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
It's horrible! The Google crawler would map &lt;code&gt;url#!state&lt;/code&gt; to &lt;code&gt;url?_escaped_fragment_=state&lt;/code&gt;, then expect your site to provide rendered HTML that reflects that state (they even go as far as to suggest running a headless browser within your web server to do this). Just stick to progressive enhancement instead, it's far less hideous. It looks like the proposal may have originated with the GWT team.
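&lt;p&gt;The URL mapping itself is easy to sketch - a rough illustration only, assuming no existing query string:&lt;/p&gt;

```python
from urllib.parse import quote

def escaped_fragment_url(url):
    # Google's proposed mapping: url#!state is fetched by the crawler as
    # url?_escaped_fragment_=state, with the fragment percent-encoded.
    # Assumes the URL carries no existing query string.
    base, sep, fragment = url.partition("#!")
    if not sep:
        return url  # no hash-bang fragment, nothing to rewrite
    return base + "?_escaped_fragment_=" + quote(fragment, safe="")

escaped_fragment_url("http://example.com/page#!p=about")
# "http://example.com/page?_escaped_fragment_=p%3Dabout"
```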


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ajax"&gt;ajax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gwt"&gt;gwt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/progressive-enhancement"&gt;progressive-enhancement&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="ajax"/><category term="crawling"/><category term="google"/><category term="gwt"/><category term="javascript"/><category term="progressive-enhancement"/><category term="search-engines"/><category term="seo"/></entry><entry><title>Specify your canonical</title><link href="https://simonwillison.net/2009/Feb/14/canonical/#atom-tag" rel="alternate"/><published>2009-02-14T11:28:20+00:00</published><updated>2009-02-14T11:28:20+00:00</updated><id>https://simonwillison.net/2009/Feb/14/canonical/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html"&gt;Specify your canonical&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run into this problem a bunch of times—in some sites it really does make sense to have the same content shown in two different places—and this seems like a neat solution that could apply to much more than just metadata for external search engines.
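&lt;p&gt;The markup is a one-liner in the page's &lt;code&gt;head&lt;/code&gt; (hypothetical URL):&lt;/p&gt;

```
&amp;lt;link rel="canonical" href="http://example.com/canonical-page"/&amp;gt;
```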


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/canonical"&gt;canonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/metadata"&gt;metadata&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/relcanonical"&gt;relcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="canonical"/><category term="google"/><category term="metadata"/><category term="relcanonical"/><category term="search-engines"/><category term="seo"/><category term="urls"/></entry><entry><title>Underscores are now word separators, proclaims Google</title><link href="https://simonwillison.net/2008/Aug/13/underscores/#atom-tag" rel="alternate"/><published>2008-08-13T13:06:16+00:00</published><updated>2008-08-13T13:06:16+00:00</updated><id>https://simonwillison.net/2008/Aug/13/underscores/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://news.cnet.com/8301-10784_3-9748779-7.html"&gt;Underscores are now word separators, proclaims Google&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I missed this story last year—the change was announced by Matt Cutts at WordCamp 2007.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hyphens"&gt;hyphens&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/matt-cutts"&gt;matt-cutts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/underscores"&gt;underscores&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/wordcamp"&gt;wordcamp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/wordpress"&gt;wordpress&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="hyphens"/><category term="matt-cutts"/><category term="seo"/><category term="underscores"/><category term="wordcamp"/><category term="wordpress"/></entry><entry><title>Search Engine Optimization Through Hoax News</title><link href="https://simonwillison.net/2008/May/22/search/#atom-tag" rel="alternate"/><published>2008-05-22T18:09:27+00:00</published><updated>2008-05-22T18:09:27+00:00</updated><id>https://simonwillison.net/2008/May/22/search/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blogoscoped.com/archive/2008-05-22-n16.html"&gt;Search Engine Optimization Through Hoax News&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Devious new black-hat SEO technique: invent a news story that’s pure link-bait. The recent “13 year old steals dad’s credit card to buy hookers” story was a hoax: it was a pure play for PageRank.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blackhat"&gt;blackhat&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pagerank"&gt;pagerank&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="blackhat"/><category term="google"/><category term="pagerank"/><category term="seo"/></entry><entry><title>Some thoughts on Mahalo</title><link href="https://simonwillison.net/2007/Aug/20/some/#atom-tag" rel="alternate"/><published>2007-08-20T17:23:46+00:00</published><updated>2007-08-20T17:23:46+00:00</updated><id>https://simonwillison.net/2007/Aug/20/some/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.skrenta.com/2007/08/some_thoughts_on_mahalo.html"&gt;Some thoughts on Mahalo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Rich Skrenta with notes on running a large site that lives and dies by SEO traffic.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mahalo"&gt;mahalo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rich-skrenta"&gt;rich-skrenta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="mahalo"/><category term="rich-skrenta"/><category term="seo"/></entry><entry><title>Quoting Rich Skrenta</title><link href="https://simonwillison.net/2007/Apr/7/early/#atom-tag" rel="alternate"/><published>2007-04-07T00:32:46+00:00</published><updated>2007-04-07T00:32:46+00:00</updated><id>https://simonwillison.net/2007/Apr/7/early/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.skrenta.com/2007/04/early_adopter_pilotfish_pornog.html"&gt;&lt;p&gt;If you're designing social media systems, you should be keeping an eye on the $2B industry that sells links from your site to their clients.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.skrenta.com/2007/04/early_adopter_pilotfish_pornog.html"&gt;Rich Skrenta&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/rich-skrenta"&gt;rich-skrenta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;



</summary><category term="rich-skrenta"/><category term="seo"/></entry><entry><title>Why people hate SEO... (and why SMO is bulls$%t)</title><link href="https://simonwillison.net/2007/Feb/8/smo/#atom-tag" rel="alternate"/><published>2007-02-08T07:47:19+00:00</published><updated>2007-02-08T07:47:19+00:00</updated><id>https://simonwillison.net/2007/Feb/8/smo/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.calacanis.com/2007/02/07/why-people-hate-seo-and-why-smo-is-bulls-t/"&gt;Why people hate SEO... (and why SMO is bulls$%t)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jason Calacanis explains SMO, or “Social Media Optimisation”—digg spamming now has its own TLA.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/jason-calacanis"&gt;jason-calacanis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smo"&gt;smo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/spam"&gt;spam&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tla"&gt;tla&lt;/a&gt;&lt;/p&gt;



</summary><category term="jason-calacanis"/><category term="seo"/><category term="smo"/><category term="spam"/><category term="tla"/></entry><entry><title>The dangers of PageRank</title><link href="https://simonwillison.net/2004/Feb/6/dangers/#atom-tag" rel="alternate"/><published>2004-02-06T16:58:23+00:00</published><updated>2004-02-06T16:58:23+00:00</updated><id>https://simonwillison.net/2004/Feb/6/dangers/#atom-tag</id><summary type="html">
&lt;p&gt;A well-documented side effect of the weblog format is that it brings Google PageRank in almost absurd quantities. I'm now the 5th result for &lt;a href="http://www.google.com/search?q=simon" title="Google Search: simon"&gt;simon&lt;/a&gt; on Google, and I've been the top result for &lt;a href="http://www.google.com/search?q=simon+willison"&gt;simon willison&lt;/a&gt; almost since the day I launched. High rankings, however, are not always a good thing, especially when combined with a comment system. A growing number of bloggers have found themselves at the top position for terms of little or no relevance to the rest of their sites, which in turn can attract truly surreal comments from visitors from search engines who may never have encountered a blog before.&lt;/p&gt;

&lt;p&gt;I know of a couple of entries on my own blog that are attracting this kind of traffic. The most interesting is probably &lt;a href="/2003/Aug/13/artificialDiamonds/"&gt;this entry&lt;/a&gt; on &lt;a href="http://www.google.com/search?q=artificial+diamonds" title="Google Search: artificial diamonds"&gt;artificial diamonds&lt;/a&gt;, which has attracted comments from both buyers and sellers of artificial gems. My &lt;a href="/2002/Dec/09/badInterfaceDesignFromMicrosof/"&gt;entry&lt;/a&gt; on MSN messenger usability problems from 2002 has drawn a steady stream of hilarious comments, no doubt caused in part by its top rating on Google for &lt;a href="http://www.google.com/search?q=msn+messenger+sucks" title="Google Search: msn messenger sucks"&gt;msn messenger sucks&lt;/a&gt;. Amusingly, for a long time &lt;a href="http://search.msn.com/"&gt;Microsoft's own search engine&lt;/a&gt; was giving my page a high rank for a wide variety of less negative messenger-related terms.&lt;/p&gt;

&lt;p&gt;My own experiences of this phenomenon pale into insignificance compared to some of the others I've seen. The most impressive example has to be Jason Kottke's &lt;a href="http://www.kottke.org/03/05/the-matrix-reloaded"&gt;brief review&lt;/a&gt; of the Matrix Reloaded, which drew over 900 comments from Google strays, developed its own micro-community and resulted in Jason pondering &lt;a href="http://www.kottke.org/03/06/own-conversation"&gt;who owns the conversation on my web site?&lt;/a&gt; He eventually decided to close and archive the thread after the page grew to more than a megabyte in size.&lt;/p&gt;

&lt;p&gt;The problem can take on a far more disturbing twist. I won't link directly to these entries for fear of adding to their predicaments, but searches for &lt;a href="http://www.google.com/search?q=crime+scene+cleanup" title="Google Search: crime scene cleanup"&gt;crime scene cleanup&lt;/a&gt; and &lt;a href="http://www.google.com/search?q=suicide+chat+rooms" title="Google Search: suicide chat rooms"&gt;suicide chat rooms&lt;/a&gt; both return blogs in the first two results. The former thread is mostly crime scene cleanup companies marketing their services, but the latter is quite frankly disturbing. It's certainly led me to double-check the titles of my entries before posting them.&lt;/p&gt;

&lt;p&gt;Thankfully, avoiding this kind of unwanted comment traffic is pretty simple. One way is to simply disable comments for entries older than a certain time (generally a couple of weeks), although personally I like to see the occasional comment on old entries. A neater solution proposed by Russell Beattie last year is to simply &lt;a href="http://www.beattie.info/notebook/1003990.html" title="Googler Comments"&gt;hide comments from search engine referrals&lt;/a&gt;, thus ensuring that random strays won't leave their mark without understanding the nature of your site first.&lt;/p&gt;
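&lt;p&gt;Russell's referral check is only a few lines - sketched here in Python with an illustrative (not exhaustive) hostname list:&lt;/p&gt;

```python
from urllib.parse import urlparse

SEARCH_HOSTS = ("google.com", "search.msn.com", "yahoo.com")  # illustrative only

def from_search_engine(referrer):
    # Hide the comment form when the visitor's Referer is a search engine,
    # so Google strays read the post before leaving their mark.
    host = urlparse(referrer).hostname or ""
    return any(host == h or host.endswith("." + h) for h in SEARCH_HOSTS)

from_search_engine("http://www.google.com/search?q=crime+scene+cleanup")  # True
```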
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jason-kottke"&gt;jason-kottke&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pagerank"&gt;pagerank&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="jason-kottke"/><category term="pagerank"/><category term="seo"/></entry></feed>