<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: joshua-schachter</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/joshua-schachter.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2009-04-11T17:37:55+00:00</updated><author><name>Simon Willison</name></author><entry><title>rev=canonical bookmarklet and designing shorter URLs</title><link href="https://simonwillison.net/2009/Apr/11/revcanonical/#atom-tag" rel="alternate"/><published>2009-04-11T17:37:55+00:00</published><updated>2009-04-11T17:37:55+00:00</updated><id>https://simonwillison.net/2009/Apr/11/revcanonical/#atom-tag</id><summary type="html">
    &lt;p&gt;I've watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible will last for as long as possible, and I think it's very unlikely that all of these new services will still be around in twenty years time. Last month &lt;a href="http://simonwillison.net/2009/Mar/8/twitter/"&gt;I suggested&lt;/a&gt; that the Internet Archive start mirroring redirect databases, and last week I was &lt;a href="http://simonwillison.net/2009/Apr/3/tinyurl/"&gt;pleased to hear&lt;/a&gt; that Archiveteam, a different organisation, had &lt;a href="http://archiveteam.org/index.php?title=TinyURL"&gt;already started crawling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most recent discussion was kicked off by &lt;a href="http://joshua.schachter.org/2009/04/on-url-shorteners.html"&gt;Joshua Schachter&lt;/a&gt; and &lt;a href="http://www.scripting.com/stories/2009/03/07/solvingTheTinyurlCentraliz.html"&gt;Dave Winer&lt;/a&gt;, and &lt;a href="http://laughingmeme.org/2009/04/03/url-shortening-hinting/" title="URL Shortening Hinting"&gt;a solution has emerged&lt;/a&gt; driven by some lightning fast hacking by Kellan Elliott-McCrea. The idea is simple: sites get to chose their preferred source of shortened URLs (including self-hosted solutions) and specify it from individual pages using &lt;code&gt;&amp;lt;link rev="canonical" href="... shorter URL here ..."&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;By hosting their own shorteners, the reliability should match that of the host site - and the amount of damage caused by a major shortener going missing can be dramatically reduced.&lt;/p&gt;

&lt;p&gt;I've been experimenting with this new pattern today. Here are a few small contributions to the wider discussion.&lt;/p&gt;

&lt;h4&gt;A URL shortening bookmarklet&lt;/h4&gt;

&lt;p&gt;Kellan's &lt;a href="http://revcanonical.appspot.com/"&gt;rev=canonical service&lt;/a&gt; exposes rev=canonical links using a server-side script running on App Engine. An obvious next step is to distil that logic in to a bookmarklet. I decided to combine the rev=canonical logic with my &lt;a href="http://simonwillison.net/2008/Aug/27/jsontinyurl/"&gt;json-tinyurl&lt;/a&gt; web service (also on App Engine), which allows browsers to lookup or create TinyURLs using a cross-domain JSONP request. The resulting bookmarklet will display the site's rev=canonical link if it exists, or create and display a TinyURL link otherwise:&lt;/p&gt;

&lt;p&gt;Bookmarklet: &lt;a href="javascript:(function(){var url=document.location;var links=document.getElementsByTagName('link');var found=0;for(var i = 0, l; l = links[i]; i++){if(l.getAttribute('rev')=='canonical'||(/alternateshort/).exec(l.getAttribute('rel'))) {found=l.getAttribute('href');break;}}if (!found) {for (var i = 0; l = document.links[i]; i++) {if (l.getAttribute('rev') == 'canonical') {found = l.getAttribute('href');break;}}}if (found) {prompt('URL:', found);} else {window.onTinyUrlGot = function(r) {if (r.ok) {prompt('URL:', r.tinyurl);} else {alert('Could not shorten with tinyurl');}};var s = document.createElement('script');s.type='text/javascript';s.src='http://json-tinyurl.appspot.com/?callback=onTinyUrlGot&amp;amp;url=' +document.location;document.getElementsByTagName('head')[0].appendChild(s);}})();"&gt;Shorten&lt;/a&gt; (drag to your browser toolbar)&lt;/p&gt;

&lt;p&gt;You can also grab the &lt;a href="http://gist.github.com/93591"&gt;uncompressed source code&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Designing short URLs&lt;/h4&gt;

&lt;p&gt;I've also implemented rev=canonical on this site. I ended up buying a new domain for this, since simonwillison.net is both difficult to spell and 17 characters long. I ended up going with swtiny.eu - 9 characters, and keeping tiny in the domain helps people guess the nature of the site from just the URLs it generates. Be warned: the DNS doesn't appear to have finished resolving yet.&lt;/p&gt;

&lt;p&gt;For the path component, I turned to a variant of base 62 encoding. Decimal integers are represented using 10 digits (0-9), but base 62 uses those digits plus the letters of the alphabet in both lower and upper case. A 13 character integer such as 7250397214971 compresses down to just 8 characters (CDeIPpOD) using base62. My &lt;a href="http://www.djangosnippets.org/snippets/1431/"&gt;baseconv.py module&lt;/a&gt; implements base62, among others. I considered using base 57 by excluding o, O, 0, 1 and l as being too easily confused but decided against it.&lt;/p&gt;

&lt;p&gt;This site has three key types of content: entries, blogmarks and quotations. Each one is a separate Django model, and hence each has its own underlying database table and individual ID sequence. Since the IDs overlap, I need a way of separating out the shortened URLs for each content type.&lt;/p&gt;

&lt;p&gt;I decided to spend a byte on namespacing my shortened URLs. A prefix of E means an entry, Q means a quotation and B means a blogmark. For example:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/EZ8&lt;/samp&gt;: Entry with ID 1584&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/BBEQ&lt;/samp&gt;: Blogmark with ID 4108&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/QE5&lt;/samp&gt;: Quotation with ID 279&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By using upper case letters for the prefixes, I can later define custom paths starting with a lower case letter. I also have another 23 upper case prefix letters reserved in case I need them.&lt;/p&gt;

&lt;p&gt;I &lt;a href="http://twitter.com/simonw/status/1496864191"&gt;asked on Twitter&lt;/a&gt; and consensus opinion was that a 301 permanent redirect was the right thing to do (as opposed to a 302), both for SEO reasons and because the content will never exist at the shorter URL.&lt;/p&gt;

&lt;h4&gt;Implementation using Django and nginx&lt;/h4&gt;

&lt;p&gt;I run all of my Django sites using Apache and &lt;a href="http://code.google.com/p/modwsgi/"&gt;mod_wsgi&lt;/a&gt;, proxied behind &lt;a href="http://nginx.net/"&gt;nginx&lt;/a&gt;. Each site gets an Apache running on a high port, and nginx deals with virtual host configuration (proxying each domain to a different Apache backend) and static file serving. I didn't want to set up a full Django site just to run swtiny.eu, especially since my existing blog engine was required in order to resolve the shortened URLs.&lt;/p&gt;

&lt;p&gt;Instead, I implemented the shortened URL direction as just another view within my existing site: &lt;samp&gt;http://simonwillison.net/shorter/EZ8&lt;/samp&gt;. I then configured nginx to invisibly requests to &lt;samp&gt;swtiny.eu&lt;/samp&gt; through to that URL. The correct incantation took a while to figure out, so here's the relevant section of my nginx.conf:&lt;/p&gt;

&lt;pre&gt;&lt;code class="nginx-conf"&gt;server {
    listen 80;
    server_name www.swtiny.eu swtiny.eu;
    location / {
        rewrite (.*) /shorter$1 break;
        proxy_pass http://simonwillison.net;
        proxy_redirect off;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;proxy_redirect off&lt;/code&gt; is needed to prevent nginx from replacing &lt;samp&gt;simonwillison.net&lt;/samp&gt; in the resulting location header with &lt;samp&gt;swtiny.eu&lt;/samp&gt;. My Django view code is relatively shonky, but if you're interested you can &lt;a href="http://www.djangosnippets.org/snippets/1430/"&gt;find it here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The nice thing about this approach is that it makes it trivial to add custom URL shortening domains to other projects - a quick view function and a few lines of nginx configuration are all that is needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The bookmarklet now supports the rev attribute on A elements as well - &lt;a href="http://simonwillison.net/2009/Apr/11/revcanonical/#c44088"&gt;thanks for the suggestion&lt;/a&gt;, Jeremy.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bookmarklets"&gt;bookmarklets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dave-winer"&gt;dave-winer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/joshua-schachter"&gt;joshua-schachter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kellan-elliott-mccrea"&gt;kellan-elliott-mccrea&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/revcanonical"&gt;revcanonical&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tinyurl"&gt;tinyurl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="bookmarklets"/><category term="dave-winer"/><category term="django"/><category term="joshua-schachter"/><category term="kellan-elliott-mccrea"/><category term="projects"/><category term="python"/><category term="revcanonical"/><category term="tinyurl"/><category term="urls"/></entry></feed>