Posts tagged hadoop

Hadoop — Feeding Reddit to Hadoop

With Hadoop installed on our lean, mean Arch machine, we’re ready to fire up the first computations. Hadoop opens up a world of fun with the promise of some heavy lifting, and to feed the beast I’ve written a Reddit scraper in just 30 lines of Clojure.

More >

Hadoop — Installation

Since we’ve had so much fun with multiple cores running at once, how about upping the game and playing with multiple servers? Hadoop is a framework for distributed computing that lets us process jobs on multiple servers at once, giving us more power *grunt*. In this first post I’ll run through how to set up your first Hadoop server on Arch, running in VirtualBox.

More >