Hadoop - Installation

2010-01-06 14:31:04

Since we've had so much fun with multiple cores running at once, how about upping the game to play with multiple servers? Hadoop is a framework for distributed computing that lets us process jobs on multiple servers at once, giving us more power *grunt*. In this first post I'll run through how to set up your first Hadoop server, running in VirtualBox on Arch Linux.



Why Arch?

I'm doing these experiments on my tiny MacBook Pro laptop, so I want my Linux installation in the VBox to be as lean and clean as possible. Arch strikes a perfect balance between functionality and bloat, and for something as simple as running a Hadoop server it's very easy to set up.

I think it's a beautiful thing when a cleanly installed Linux replies "No entries" to "netstat -lnput" right after installation, meaning nothing is listening on any TCP or UDP port. Arch lets you build your system from the ground up, and although that takes a little longer than with Ubuntu, it might just make for a better end result.


Why Hadoop?

Clojure is an excellent language for writing data parsers and the like, so what could be more fun than taking our regular code and running it on a multi-server network? In industry, many tasks are so large that it's pointless to run them on a single server, so if you have something like Flightcaster in mind, you need to get comfortable with distributed computing. Secondly, Hadoop is Java-based, which means that getting my hands all the way from Clojure into the engine room is very doable.

Also worth mentioning is that there are already a couple of Clojure interfaces out in the open. As most people know, the crew behind Flightcaster released Crane, and Stuart Sierra released the creatively named clojure-hadoop library.


The Installation

Thanks to the kind donations, I was able to purchase Vimeo Plus, so you can now follow the screencasts in HD, which hopefully gives you a clearer rendering of the text! If you know all there is to know about installing Arch and getting Hadoop up and running in pseudo-distributed mode, then feel free to skip this entire post. It's a mandatory first stop for me, to ensure that everyone can follow future experiments using Hadoop.

Since this is HD, double-click for fullscreen or go to the Vimeo site.


The Video (16 min)



Configuration

For your own setup, these are the things you need to change:

/etc/hosts.allow

sshd: ALL: ALLOW

java: ALL: ALLOW

/etc/rc.conf   (to autostart services)

DAEMONS=(... sshd rsyncd ...)

~/hadoop/conf/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Hadoop XML configs

~/hadoop/conf/core-site.xml

Pseudo xml:  property: name: fs.default.name value: hdfs://localhost:9000
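Spelled out as real XML, the property above might look like the sketch below. It assumes the usual Hadoop site-file layout of a "configuration" root holding "property" elements; compare it against the file shipped with your Hadoop version before relying on it.

```xml
<!-- Sketch of ~/hadoop/conf/core-site.xml; element layout is assumed from the usual Hadoop site-file format -->
<configuration>
  <property>
    <!-- Default filesystem: HDFS on this machine, port 9000 -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```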

~/hadoop/conf/hdfs-site.xml

Pseudo xml: property: name: dfs.replication value: 1
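As a hedged sketch, the replication setting would be written out like this, again assuming the "configuration"/"property" layout. A value of 1 keeps a single copy of each block, which is all a one-node setup can hold.

```xml
<!-- Sketch of ~/hadoop/conf/hdfs-site.xml; element layout is assumed from the usual Hadoop site-file format -->
<configuration>
  <property>
    <!-- Keep one copy of each block, since there is only one node -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```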

~/hadoop/conf/mapred-site.xml

Pseudo xml: property: name: mapred.job.tracker value: localhost:9001
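The job tracker address can be sketched the same way, under the same layout assumption. Note that the value is a plain host:port pair, with no URI scheme in front of it.

```xml
<!-- Sketch of ~/hadoop/conf/mapred-site.xml; element layout is assumed from the usual Hadoop site-file format -->
<configuration>
  <property>
    <!-- Where the MapReduce job tracker listens: host and port, no scheme -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```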

All of the XML configuration files are 6 lines long - I hope everybody is cool with that :)


Next Up

This was the obligatory step which we just had to get over with. The next step is making or using some kind of Clojure interface to Hadoop in order to run jobs on it. Stay tuned for round #2.


Bernhard
2010-01-09 00:13:51
Dear Lau,

please stop doing these screencasts and write splendid articles, like you did in the past. Screencasts are like university lectures. You get forced into the speed of the lecturer and have to waste 90 min. on something you can learn at your own speed in 10 (from books etc.).

I like to browse through your articles. Skim the easy parts, ponder the hard ones, jump back to reread etc.

Best

Bernhard
Lau
2010-01-09 09:10:53
Dear Bernhard,

Thanks for stopping by :) I'll take your feedback into account, but I have to cater to both sides: those who like screencasts and those who don't. For something as simple as setting up a Compojure site or installing Hadoop I prefer screencasts, because then I can move on to more advanced stuff without leaving anyone behind. For the coming posts on Hadoop, which are naturally more advanced, I would definitely prefer to write explanatory posts.
Ronen
2010-01-09 22:23:36
Very nice screencast, Lau.
I'm an avid Linux user but hadn't attempted to play around with Arch yet. Looking forward to the Clojure part to follow.
Hubert
2010-01-13 00:50:13
Hi Lau,

Though I've been a Vim user for years now, I enjoyed the Emacs joke :)
This post walked me through the whole setup easily and with no errors, great.

Keep it up,
Hubert