<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: hadoop</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/hadoop.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2009-11-30T11:30:12+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting Facebook Data Team</title><link href="https://simonwillison.net/2009/Nov/30/hive/#atom-tag" rel="alternate"/><published>2009-11-30T11:30:12+00:00</published><updated>2009-11-30T11:30:12+00:00</updated><id>https://simonwillison.net/2009/Nov/30/hive/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.facebook.com/note.php?note_id=114588058858"&gt;&lt;p&gt;Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.facebook.com/note.php?note_id=114588058858"&gt;Facebook Data Team&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/facebook"&gt;facebook&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hive"&gt;hive&lt;/a&gt;&lt;/p&gt;



</summary><category term="facebook"/><category term="hadoop"/><category term="hive"/></entry><entry><title>Introducing Cloudera Desktop</title><link href="https://simonwillison.net/2009/Oct/21/cloudera/#atom-tag" rel="alternate"/><published>2009-10-21T18:48:36+00:00</published><updated>2009-10-21T18:48:36+00:00</updated><id>https://simonwillison.net/2009/Oct/21/cloudera/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.cloudera.com/blog/2009/10/01/introducing-cloudera-desktop/"&gt;Introducing Cloudera Desktop&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
It’s a GUI for Hadoop, and under the hood is a whole stack of open source software, including Python, Django, MooTools, Twisted, lxml, CherryPy, Mako, Java and AspectJ.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/aspectj"&gt;aspectj&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cherrypy"&gt;cherrypy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudera"&gt;cloudera&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/java"&gt;java&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lxml"&gt;lxml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mako"&gt;mako&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mootools"&gt;mootools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twisted"&gt;twisted&lt;/a&gt;&lt;/p&gt;



</summary><category term="aspectj"/><category term="cherrypy"/><category term="cloudera"/><category term="django"/><category term="hadoop"/><category term="java"/><category term="lxml"/><category term="mako"/><category term="mootools"/><category term="open-source"/><category term="python"/><category term="twisted"/></entry><entry><title>Finding similar items with Amazon Elastic MapReduce, Python, and Hadoop streaming</title><link href="https://simonwillison.net/2009/Apr/7/amazon/#atom-tag" rel="alternate"/><published>2009-04-07T09:19:38+00:00</published><updated>2009-04-07T09:19:38+00:00</updated><id>https://simonwillison.net/2009/Apr/7/amazon/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://developer.amazonwebservices.com/connect/entry!default.jspa?categoryID=265&amp;amp;externalID=2294&amp;amp;fromSearchPage=true"&gt;Finding similar items with Amazon Elastic MapReduce, Python, and Hadoop streaming&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tutorial for running Hadoop jobs on Elastic MapReduce using Python and the 2005 Audioscrobbler dataset.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/amazon"&gt;amazon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/amazon-web-services"&gt;amazon-web-services&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/audioscrobbler"&gt;audioscrobbler&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/elasticmapreduce"&gt;elasticmapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="amazon"/><category term="amazon-web-services"/><category term="audioscrobbler"/><category term="elasticmapreduce"/><category term="hadoop"/><category term="mapreduce"/><category term="python"/></entry><entry><title>Amazon Elastic MapReduce</title><link href="https://simonwillison.net/2009/Apr/2/mapreduce/#atom-tag" rel="alternate"/><published>2009-04-02T10:25:37+00:00</published><updated>2009-04-02T10:25:37+00:00</updated><id>https://simonwillison.net/2009/Apr/2/mapreduce/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://aws.amazon.com/elasticmapreduce/"&gt;Amazon Elastic MapReduce&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hadoop as a service. Basically a web-based GUI around Hadoop: you could roll this yourself on EC2, but for a small markup on regular EC2 prices you get to avoid the extra work of setting everything up. Data processing scripts can be written in Java, Ruby, Perl, Python, PHP, R, or C++ and are loaded into S3 before firing off the job.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://joedrumgoole.com/blog/2009/04/02/amazon-web-services-adds-map-reduce/"&gt;Joe Drumgoole&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/amazon"&gt;amazon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/amazon-web-services"&gt;amazon-web-services&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloud-computing"&gt;cloud-computing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ec2"&gt;ec2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;&lt;/p&gt;



</summary><category term="amazon"/><category term="amazon-web-services"/><category term="cloud-computing"/><category term="ec2"/><category term="hadoop"/><category term="mapreduce"/><category term="s3"/></entry><entry><title>Cascading</title><link href="https://simonwillison.net/2008/Oct/1/about/#atom-tag" rel="alternate"/><published>2008-10-01T13:22:19+00:00</published><updated>2008-10-01T13:22:19+00:00</updated><id>https://simonwillison.net/2008/Oct/1/about/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.cascading.org/about.html"&gt;Cascading&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A Java API abstraction layer over Hadoop that lets developers think in terms of pipes and filters rather than map/reduce. The Cascading developers claim that this model is easier to understand and less error-prone.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cascading"&gt;cascading&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/java"&gt;java&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pipesfilters"&gt;pipesfilters&lt;/a&gt;&lt;/p&gt;



</summary><category term="cascading"/><category term="hadoop"/><category term="java"/><category term="mapreduce"/><category term="pipesfilters"/></entry><entry><title>3 and 1/2 minutes to sort a Terabyte, and a look at Hadoop's code structure</title><link href="https://simonwillison.net/2008/Jul/7/bill/#atom-tag" rel="alternate"/><published>2008-07-07T14:15:23+00:00</published><updated>2008-07-07T14:15:23+00:00</updated><id>https://simonwillison.net/2008/Jul/7/bill/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.dehora.net/journal/2008/07/06/3-12-minutes-to-sort-a-terabyte-hadoops-code-structure/"&gt;3 and 1/2 minutes to sort a Terabyte, and a look at Hadoop&amp;#x27;s code structure&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Bill de hÓra uses some clever static analysis tools to explore Hadoop’s 100,000+ lines of code.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bill-de-hora"&gt;bill-de-hora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/java"&gt;java&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/static-analysis"&gt;static-analysis&lt;/a&gt;&lt;/p&gt;



</summary><category term="bill-de-hora"/><category term="hadoop"/><category term="java"/><category term="static-analysis"/></entry><entry><title>Python + Hadoop = Flying Circus Elephant</title><link href="https://simonwillison.net/2008/May/31/lastfm/#atom-tag" rel="alternate"/><published>2008-05-31T14:14:56+00:00</published><updated>2008-05-31T14:14:56+00:00</updated><id>https://simonwillison.net/2008/May/31/lastfm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.last.fm/2008/05/29/python-hadoop-flying-circus-elephant"&gt;Python + Hadoop = Flying Circus Elephant&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Last.fm have released Dumbo, a Python module that lets you easily write Hadoop map/reduce tasks using Python and generators.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/dumbo"&gt;dumbo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generators"&gt;generators&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lastfm"&gt;lastfm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="dumbo"/><category term="generators"/><category term="hadoop"/><category term="lastfm"/><category term="mapreduce"/><category term="python"/></entry><entry><title>Writing An Hadoop MapReduce Program In Python</title><link href="https://simonwillison.net/2007/Oct/9/writing/#atom-tag" rel="alternate"/><published>2007-10-09T11:33:58+00:00</published><updated>2007-10-09T11:33:58+00:00</updated><id>https://simonwillison.net/2007/Oct/9/writing/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python"&gt;Writing An Hadoop MapReduce Program In Python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hadoop (the open-source map/reduce framework) can interact with any program that reads from stdin and writes to stdout, so it’s trivial to drop in Python scripts for the map and reduce steps.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="hadoop"/><category term="mapreduce"/><category term="python"/></entry><entry><title>Hadoop</title><link href="https://simonwillison.net/2006/Aug/23/hadoop/#atom-tag" rel="alternate"/><published>2006-08-23T08:36:14+00:00</published><updated>2006-08-23T08:36:14+00:00</updated><id>https://simonwillison.net/2006/Aug/23/hadoop/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://lucene.apache.org/hadoop/"&gt;Hadoop&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Open-source Google File System / map-reduce equivalent. Apparently scales amazingly well.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;&lt;/p&gt;



</summary><category term="hadoop"/></entry></feed>