<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: revcanonical</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/revcanonical.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2009-04-14T16:34:42+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting Ian Hickson</title><link href="https://simonwillison.net/2009/Apr/14/rev/#atom-tag" rel="alternate"/><published>2009-04-14T16:34:42+00:00</published><updated>2009-04-14T16:34:42+00:00</updated><id>https://simonwillison.net/2009/Apr/14/rev/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-November/017292.html"&gt;&lt;p&gt;We did some studies and found that the attribute was almost never used, and most of the time, when it was used, it was a typo where someone meant to write rel="" but wrote rev="". To be precise, the most commonly used value was rev="made", which is equivalent to rel="author" and thus was not a convincing use case. The second most common value was rev="stylesheet", which is meaningless and obviously meant to be rel="stylesheet".&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-November/017292.html"&gt;Ian Hickson&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hixie"&gt;hixie&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html5"&gt;html5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ian-hickson"&gt;ian-hickson&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markup"&gt;markup&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rev"&gt;rev&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;&lt;/p&gt;



</summary><category term="hixie"/><category term="html5"/><category term="ian-hickson"/><category term="markup"/><category term="rev"/><category term="revcanonical"/></entry><entry><title>Counting the ways that rev="canonical" hurts the Web</title><link href="https://simonwillison.net/2009/Apr/14/mnotus/#atom-tag" rel="alternate"/><published>2009-04-14T14:11:58+00:00</published><updated>2009-04-14T14:11:58+00:00</updated><id>https://simonwillison.net/2009/Apr/14/mnotus/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.mnot.net/blog/2009/04/14/rev_canonical_bad"&gt;Counting the ways that rev=&amp;quot;canonical&amp;quot; hurts the Web&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Mark Nottingham complains about misapplied trust (a page can falsely claim to be the canonical URL for another page), the easy confusion between rev and rel and the lack of discussion with relevant communities.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mark-nottingham"&gt;mark-nottingham&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/standards"&gt;standards&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="mark-nottingham"/><category term="revcanonical"/><category term="standards"/><category term="urls"/></entry><entry><title>Quoting Les Orchard</title><link href="https://simonwillison.net/2009/Apr/14/nostalgia/#atom-tag" rel="alternate"/><published>2009-04-14T08:57:03+00:00</published><updated>2009-04-14T08:57:03+00:00</updated><id>https://simonwillison.net/2009/Apr/14/nostalgia/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://decafbad.com/blog/2009/04/13/i-like-revcanonical"&gt;&lt;p&gt;You guys are moving on this stuff too fast! Welcome to 2002, when lots of us had more spare time than employment and we deployed new crap like this on our blogs and sites daily.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://decafbad.com/blog/2009/04/13/i-like-revcanonical"&gt;Les Orchard&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/les-orchard"&gt;les-orchard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nostalgia"&gt;nostalgia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;&lt;/p&gt;



</summary><category term="les-orchard"/><category term="nostalgia"/><category term="revcanonical"/></entry><entry><title>I like rev="canonical"</title><link href="https://simonwillison.net/2009/Apr/13/like/#atom-tag" rel="alternate"/><published>2009-04-13T10:41:40+00:00</published><updated>2009-04-13T10:41:40+00:00</updated><id>https://simonwillison.net/2009/Apr/13/like/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://decafbad.com/blog/2009/04/13/i-like-revcanonical"&gt;I like rev=&amp;quot;canonical&amp;quot;&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Les Orchard summarises the current debate over what colour to paint the rev=“canonical” bikeshed.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/les-orchard"&gt;les-orchard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="les-orchard"/><category term="revcanonical"/><category term="urls"/></entry><entry><title>django-shorturls</title><link href="https://simonwillison.net/2009/Apr/13/jacobians/#atom-tag" rel="alternate"/><published>2009-04-13T09:31:13+00:00</published><updated>2009-04-13T09:31:13+00:00</updated><id>https://simonwillison.net/2009/Apr/13/jacobians/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/jacobian/django-shorturls/tree/master"&gt;django-shorturls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jacob took my self-admittedly shonky shorter URL code and turned it into a proper reusable Django application.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/djangoshorturls"&gt;djangoshorturls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jacob-kaplan-moss"&gt;jacob-kaplan-moss&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="djangoshorturls"/><category term="jacob-kaplan-moss"/><category term="python"/><category term="revcanonical"/></entry><entry><title>Quoting Kellan Elliott-McCrea</title><link href="https://simonwillison.net/2009/Apr/12/flickr/#atom-tag" rel="alternate"/><published>2009-04-12T16:00:41+00:00</published><updated>2009-04-12T16:00:41+00:00</updated><id>https://simonwillison.net/2009/Apr/12/flickr/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://revcanonical.wordpress.com/2009/04/12/revcanonical-bookmarklet-and-designing-shorter-urls/"&gt;&lt;p&gt;We’re using the same trick on flic.kr to avoid having to maintain a look up database, though we’re using base 58.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://revcanonical.wordpress.com/2009/04/12/revcanonical-bookmarklet-and-designing-shorter-urls/"&gt;Kellan Elliott-McCrea&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/base58"&gt;base58&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/flickr"&gt;flickr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kellan-elliott-mccrea"&gt;kellan-elliott-mccrea&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="base58"/><category term="flickr"/><category term="kellan-elliott-mccrea"/><category term="revcanonical"/><category term="urls"/></entry><entry><title>A rev="canonical" HTTP Header</title><link href="https://simonwillison.net/2009/Apr/12/chris/#atom-tag" rel="alternate"/><published>2009-04-12T12:33:48+00:00</published><updated>2009-04-12T12:33:48+00:00</updated><id>https://simonwillison.net/2009/Apr/12/chris/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://shiflett.org/blog/2009/apr/a-rev-canonical-http-header"&gt;A rev=&amp;quot;canonical&amp;quot; HTTP Header&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Chris Shiflett proposes optionally exposing rev=canonical information in an HTTP header, thus allowing sites to discover shorter URLs using just a HEAD request and removing the need to parse HTML. The pingback specification also uses this shortcut.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/chris-shiflett"&gt;chris-shiflett&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/head"&gt;head&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/headers"&gt;headers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pingback"&gt;pingback&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;&lt;/p&gt;



</summary><category term="chris-shiflett"/><category term="head"/><category term="headers"/><category term="http"/><category term="pingback"/><category term="revcanonical"/></entry><entry><title>Revving up</title><link href="https://simonwillison.net/2009/Apr/12/adactio/#atom-tag" rel="alternate"/><published>2009-04-12T12:29:25+00:00</published><updated>2009-04-12T12:29:25+00:00</updated><id>https://simonwillison.net/2009/Apr/12/adactio/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://adactio.com/journal/1568"&gt;Revving up&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jeremy Keith advocates adding rev=canonical to regular A elements as well as, or instead of, hiding it in the head of the document, following the microformats design principle that invisible metadata is less valuable than augmenting visible links. I’ve updated my shorten bookmarklet to handle this case.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/jeremy-keith"&gt;jeremy-keith&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/metadata"&gt;metadata&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microformats"&gt;microformats&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;&lt;/p&gt;



</summary><category term="jeremy-keith"/><category term="metadata"/><category term="microformats"/><category term="revcanonical"/></entry><entry><title>rev=canonical bookmarklet and designing shorter URLs</title><link href="https://simonwillison.net/2009/Apr/11/revcanonical/#atom-tag" rel="alternate"/><published>2009-04-11T17:37:55+00:00</published><updated>2009-04-11T17:37:55+00:00</updated><id>https://simonwillison.net/2009/Apr/11/revcanonical/#atom-tag</id><summary type="html">
    &lt;p&gt;I've watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible for will last for as long as possible, and I think it's very unlikely that all of these new services will still be around in twenty years' time. Last month &lt;a href="http://simonwillison.net/2009/Mar/8/twitter/"&gt;I suggested&lt;/a&gt; that the Internet Archive start mirroring redirect databases, and last week I was &lt;a href="http://simonwillison.net/2009/Apr/3/tinyurl/"&gt;pleased to hear&lt;/a&gt; that Archiveteam, a different organisation, had &lt;a href="http://archiveteam.org/index.php?title=TinyURL"&gt;already started crawling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most recent discussion was kicked off by &lt;a href="http://joshua.schachter.org/2009/04/on-url-shorteners.html"&gt;Joshua Schachter&lt;/a&gt; and &lt;a href="http://www.scripting.com/stories/2009/03/07/solvingTheTinyurlCentraliz.html"&gt;Dave Winer&lt;/a&gt;, and &lt;a href="http://laughingmeme.org/2009/04/03/url-shortening-hinting/" title="URL Shortening Hinting"&gt;a solution has emerged&lt;/a&gt; driven by some lightning fast hacking by Kellan Elliott-McCrea. The idea is simple: sites get to choose their preferred source of shortened URLs (including self-hosted solutions) and specify it from individual pages using &lt;code&gt;&amp;lt;link rev="canonical" href="... shorter URL here ..."&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When sites host their own shorteners, the reliability of the short URLs should match that of the host site - and the amount of damage caused by a major shortener going missing can be dramatically reduced.&lt;/p&gt;

&lt;p&gt;I've been experimenting with this new pattern today. Here are a few small contributions to the wider discussion.&lt;/p&gt;

&lt;h4&gt;A URL shortening bookmarklet&lt;/h4&gt;

&lt;p&gt;Kellan's &lt;a href="http://revcanonical.appspot.com/"&gt;rev=canonical service&lt;/a&gt; exposes rev=canonical links using a server-side script running on App Engine. An obvious next step is to distil that logic into a bookmarklet. I decided to combine the rev=canonical logic with my &lt;a href="http://simonwillison.net/2008/Aug/27/jsontinyurl/"&gt;json-tinyurl&lt;/a&gt; web service (also on App Engine), which allows browsers to look up or create TinyURLs using a cross-domain JSONP request. The resulting bookmarklet will display the site's rev=canonical link if it exists, or create and display a TinyURL link otherwise:&lt;/p&gt;

&lt;p&gt;Bookmarklet: &lt;a href="javascript:(function(){var url=document.location;var links=document.getElementsByTagName('link');var found=0;for(var i = 0, l; l = links[i]; i++){if(l.getAttribute('rev')=='canonical'||(/alternateshort/).exec(l.getAttribute('rel'))) {found=l.getAttribute('href');break;}}if (!found) {for (var i = 0; l = document.links[i]; i++) {if (l.getAttribute('rev') == 'canonical') {found = l.getAttribute('href');break;}}}if (found) {prompt('URL:', found);} else {window.onTinyUrlGot = function(r) {if (r.ok) {prompt('URL:', r.tinyurl);} else {alert('Could not shorten with tinyurl');}};var s = document.createElement('script');s.type='text/javascript';s.src='http://json-tinyurl.appspot.com/?callback=onTinyUrlGot&amp;amp;url=' +document.location;document.getElementsByTagName('head')[0].appendChild(s);}})();"&gt;Shorten&lt;/a&gt; (drag to your browser toolbar)&lt;/p&gt;

&lt;p&gt;You can also grab the &lt;a href="http://gist.github.com/93591"&gt;uncompressed source code&lt;/a&gt;.&lt;/p&gt;
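
&lt;p&gt;For anyone who would rather do the discovery step outside the browser, here is a minimal Python sketch of the same lookup, using only the standard library. The names &lt;code&gt;RevCanonicalFinder&lt;/code&gt; and &lt;code&gt;find_short_url&lt;/code&gt; are made up for illustration. It is not the bookmarklet's code: it returns the first matching element in document order, ignores the alternative rel value the bookmarklet also checks, and leaves the TinyURL fallback out entirely.&lt;/p&gt;

```python
from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Remember the href of the first link or a element marked rev=canonical."""

    def __init__(self):
        super().__init__()
        self.short_url = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once a match is found; ignore unrelated elements.
        if self.short_url is not None or tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        # rev may hold several space-separated values, or be missing.
        rev_values = (attributes.get("rev") or "").lower().split()
        if "canonical" in rev_values and attributes.get("href"):
            self.short_url = attributes["href"]


def find_short_url(html):
    """Return the rev=canonical URL advertised by a page, or None."""
    finder = RevCanonicalFinder()
    finder.feed(html)
    return finder.short_url
```

&lt;p&gt;A caller would pass in the fetched page source and fall back to a third-party shortener only when the result is &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;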

&lt;h4&gt;Designing short URLs&lt;/h4&gt;

&lt;p&gt;I've also implemented rev=canonical on this site. I bought a new domain for this, since simonwillison.net is both difficult to spell and 17 characters long. I went with swtiny.eu - 9 characters, and keeping tiny in the domain helps people guess the nature of the site from just the URLs it generates. Be warned: the DNS change doesn't appear to have finished propagating yet.&lt;/p&gt;

&lt;p&gt;For the path component, I turned to a variant of base 62 encoding. Decimal integers are represented using 10 digits (0-9), but base 62 uses those digits plus the letters of the alphabet in both lower and upper case. A 13 digit integer such as 7250397214971 compresses down to just 8 characters (CDeIPpOD) using base 62. My &lt;a href="http://www.djangosnippets.org/snippets/1431/"&gt;baseconv.py module&lt;/a&gt; implements base 62, among others. I considered using base 57 by excluding o, O, 0, 1 and l as being too easily confused but decided against it.&lt;/p&gt;
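
&lt;p&gt;As a rough illustration of the encoding, here is a small self-contained Python sketch. It is not the baseconv.py module itself; the function names are hypothetical, and the alphabet order (upper case letters, then digits, then lower case letters) is an assumption inferred from the example values in this post.&lt;/p&gt;

```python
import string

# Assumed alphabet order, inferred from the examples in this post:
# A-Z map to 0-25, 0-9 map to 26-35, a-z map to 36-61.
ALPHABET = string.ascii_uppercase + string.digits + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def base62_encode(number):
    """Encode a non-negative integer as a base 62 string."""
    if number != abs(number):
        raise ValueError("Only non-negative integers can be encoded")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        # Peel off the least significant base 62 digit each time round.
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(text):
    """Decode a base 62 string back to an integer."""
    value = 0
    for char in text:
        value = value * BASE + ALPHABET.index(char)
    return value
```

&lt;p&gt;With that alphabet, &lt;code&gt;base62_encode(7250397214971)&lt;/code&gt; gives &lt;code&gt;CDeIPpOD&lt;/code&gt;, and decoding that string returns the original integer.&lt;/p&gt;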

&lt;p&gt;This site has three key types of content: entries, blogmarks and quotations. Each one is a separate Django model, and hence each has its own underlying database table and individual ID sequence. Since the IDs overlap, I need a way of separating out the shortened URLs for each content type.&lt;/p&gt;

&lt;p&gt;I decided to spend a byte on namespacing my shortened URLs. A prefix of E means an entry, Q means a quotation and B means a blogmark. For example:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/EZ8&lt;/samp&gt;: Entry with ID 1584&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/BBEQ&lt;/samp&gt;: Blogmark with ID 4108&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/QE5&lt;/samp&gt;: Quotation with ID 279&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By using upper case letters for the prefixes, I can later define custom paths starting with a lower case letter. I also have another 23 upper case prefix letters reserved in case I need them.&lt;/p&gt;
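
&lt;p&gt;To make the namespacing concrete, here is a minimal Python sketch of how a short path could be split back into a content type and an ID. This is not my actual view code: &lt;code&gt;parse_short_path&lt;/code&gt; and &lt;code&gt;PREFIXES&lt;/code&gt; are hypothetical names, and the alphabet order is an assumption inferred from the three example URLs above.&lt;/p&gt;

```python
import string

# Assumed base 62 alphabet order (A-Z, then 0-9, then a-z), inferred from
# the example short URLs in this post.
ALPHABET = string.ascii_uppercase + string.digits + string.ascii_lowercase

# One upper case prefix letter per content type.
PREFIXES = {"E": "entry", "Q": "quotation", "B": "blogmark"}


def parse_short_path(path):
    """Split a short path such as 'EZ8' into (content type, integer ID)."""
    prefix, encoded = path[:1], path[1:]
    if prefix not in PREFIXES or not encoded:
        raise ValueError("Unrecognised short path: %r" % path)
    object_id = 0
    for char in encoded:
        # ALPHABET.index raises ValueError for characters outside base 62.
        object_id = object_id * len(ALPHABET) + ALPHABET.index(char)
    return PREFIXES[prefix], object_id
```

&lt;p&gt;Here &lt;code&gt;parse_short_path("EZ8")&lt;/code&gt; returns &lt;code&gt;("entry", 1584)&lt;/code&gt;. Paths with any other first character raise an error in this sketch, which leaves lower case paths free for custom URLs.&lt;/p&gt;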

&lt;p&gt;I &lt;a href="http://twitter.com/simonw/status/1496864191"&gt;asked on Twitter&lt;/a&gt; and consensus opinion was that a 301 permanent redirect was the right thing to do (as opposed to a 302), both for SEO reasons and because the content will never exist at the shorter URL.&lt;/p&gt;

&lt;h4&gt;Implementation using Django and nginx&lt;/h4&gt;

&lt;p&gt;I run all of my Django sites using Apache and &lt;a href="http://code.google.com/p/modwsgi/"&gt;mod_wsgi&lt;/a&gt;, proxied behind &lt;a href="http://nginx.net/"&gt;nginx&lt;/a&gt;. Each site gets an Apache running on a high port, and nginx deals with virtual host configuration (proxying each domain to a different Apache backend) and static file serving. I didn't want to set up a full Django site just to run swtiny.eu, especially since my existing blog engine was required in order to resolve the shortened URLs.&lt;/p&gt;

&lt;p&gt;Instead, I implemented the shortened URL redirection as just another view within my existing site: &lt;samp&gt;http://simonwillison.net/shorter/EZ8&lt;/samp&gt;. I then configured nginx to invisibly proxy requests to &lt;samp&gt;swtiny.eu&lt;/samp&gt; through to that URL. The correct incantation took a while to figure out, so here's the relevant section of my nginx.conf:&lt;/p&gt;

&lt;pre&gt;&lt;code class="nginx-conf"&gt;server {
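    listen 80;
    server_name www.swtiny.eu swtiny.eu;
    location / {
        # Prefix the requested path with /shorter, so /EZ8 becomes /shorter/EZ8
        rewrite (.*) /shorter$1 break;
        # Pass the rewritten request to the main site
        proxy_pass http://simonwillison.net;
        # Leave the Location header returned by the backend untouched
        proxy_redirect off;
    }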
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;proxy_redirect off&lt;/code&gt; is needed to prevent nginx from replacing &lt;samp&gt;simonwillison.net&lt;/samp&gt; in the resulting Location header with &lt;samp&gt;swtiny.eu&lt;/samp&gt;. My Django view code is relatively shonky, but if you're interested you can &lt;a href="http://www.djangosnippets.org/snippets/1430/"&gt;find it here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The nice thing about this approach is that it makes it trivial to add custom URL shortening domains to other projects - a quick view function and a few lines of nginx configuration are all that is needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The bookmarklet now supports the rev attribute on A elements as well - &lt;a href="http://simonwillison.net/2009/Apr/11/revcanonical/#c44088"&gt;thanks for the suggestion&lt;/a&gt;, Jeremy.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bookmarklets"&gt;bookmarklets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dave-winer"&gt;dave-winer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/joshua-schachter"&gt;joshua-schachter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kellan-elliott-mccrea"&gt;kellan-elliott-mccrea&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tinyurl"&gt;tinyurl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="bookmarklets"/><category term="dave-winer"/><category term="django"/><category term="joshua-schachter"/><category term="kellan-elliott-mccrea"/><category term="projects"/><category term="python"/><category term="revcanonical"/><category term="tinyurl"/><category term="urls"/></entry></feed>