Today we have many interesting ways to approach web development. I've heard people say "I love PHP because it's so productive, I get tons of code hammered down in a matter of hours". While that's a nice experience for new programmers, it tends to blur the facts a little. Let me try to contrast the properties of Clojure (plus Compojure and ClojureQL) with those of PHP.
Sticking with the quote above, what's in a line of code? First off, there's the potential for a bug. So the good feeling you have after getting 5,000 lines of code written down needs to be balanced against the functionality you get out and the number of potential bugs that you've introduced. In my experience I've never seen conciseness, clarity and functionality as well balanced as they are in Clojure, but that's the topic for a separate blog post.
Secondly, every line of code carries a maintenance cost, which is closely related to the clarity of code I just mentioned. I can't remember ever seeing a Clojure function larger than 50 lines (though they may exist), but I'll never forget the Oracle integration routine I saw in PHP which weighed in at 700 lines. Tell me who can skim that and get a good understanding of the routine. Odds are that every time you need to come back and maintain that code, you'll waste valuable resources getting reacquainted with every single line.
In general I want this from my code:
To be as concise as possible, without compromising either elegance or clarity.
When you see snippets of my code below you can judge for yourselves if I succeeded or not.
First off: how far can you get using PHP alone? Not very far. I've never met a PHP programmer who didn't need to go in and out of PHP and SQL interchangeably throughout their code. This might seem like second nature to most of you, but for the supervisor of a department it's quite demanding to assess the skills of your employees when you have to do so on multiple fronts. For example, you can have a programmer who continuously provides solid PHP code, but once their code contains SQL you will need to do a code review.
Since Lisps are famously good at DSLs, this is in no way a concern for Clojurians. It's like Hal Abelson said:
Lisp is not the right language to solve anything. It's the language with which you write the language to solve your problem.
There's much truth to this, and in a business setting it's a notable advantage if you only have to support one language.
When I launched the BestInClass website initially, I couldn't shake the feeling that launching a JVM instance on a vhost was overkill for a website which was basically a company profile. But it raises a valid question: What are the major advantages of taking the servlet route compared to something like PHP?
Well, for one thing your server lives and breathes between clients; PHP sites don't. That means that every connection, every resource they load and every service they call opens and closes with the coming and going of every visitor. That's a huge waste of resources!
When I first implemented a traffic logger, my first thought was similar to the PHP approach. I knew I could leverage ClojureQL to seamlessly drop the request information into my MySQL database without much hassle. Since my handler for the request is a first-class object, it would be simple to wrap it in a lambda (anonymous function) and automatically log everything which is requested. So roughly, I implemented it like this:
    (defn with-logging [handler]
      (fn [request]
        (log-to-sql request)
        (handler request)))

    (run-server {:port 5453}
      "/*" (servlet (with-logging main-servlet)))
For non-Clojurians, that's a clever way of wrapping a function (handler) in a function (fn) in order to peek at the parameters. But we're still in Flintstones mode, as every single request to my server will open and close an SQL connection, compile an SQL statement, extract information from the request, and so on.
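The wrapping trick can be tried at the REPL without a server or a database. This is only a sketch under assumptions: `seen`, `log-to-memory` and `hello` are hypothetical stand-ins for the SQL log and `main-servlet`, and the request is assumed to be a plain map with a `:uri` key.

```clojure
;; Hypothetical stand-in for the SQL log: an in-memory vector of URIs.
(def seen (atom []))

(defn log-to-memory [request]
  (swap! seen conj (:uri request)))

;; Same shape as the wrapper in the post: take a handler, return a
;; new handler that records the request and then delegates.
(defn with-logging [handler]
  (fn [request]
    (log-to-memory request)
    (handler request)))

;; Hypothetical stand-in for main-servlet.
(defn hello [request]
  {:status 200 :body (str "Hello from " (:uri request))})

(def logged-hello (with-logging hello))

(logged-hello {:uri "/about"})
;; => {:status 200, :body "Hello from /about"}

@seen
;; => ["/about"]
```

The wrapped handler behaves exactly like the original from the caller's point of view, which is why the logging can be bolted on without touching the handler itself.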
Since I control the server, Jetty in this case, I can spawn as many threads as the OS allows, and it would be a good idea to leverage that fact. The JVM can handle roughly 2,000 concurrent threads (the real ceiling depends on the OS and memory settings), so until my website reaches that level of fame, I won't worry about a fixed-size thread pool. So let's multi-thread this app!
    (defn with-logging [handler]
      (fn [request]
        ;; send-off expects a function of the agent's state, so wrap the call
        (send-off (agent 0) (fn [_] (forge-sql-statement request)))
        (handler request)))
Now we're in business! Every call to my web server hands the logging off to a separate thread, which looks at the request. The request carries a host of information.
And since I'm no longer in a rush to get this information committed to my database, I have some time to look up some extra information. This could be anything from the visitor's website, home country, Facebook pictures, you name it :) For now, I'll stick with this:
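To see that the handler really does return before the logging finishes, here is a small sketch. It departs from the post in one respect that is my assumption: it uses a single shared agent (`log-agent`) instead of a fresh `(agent 0)` per request, so the result can be inspected afterwards. `slow-log` is a hypothetical stand-in for the real logging work.

```clojure
;; One shared agent holding the log (an assumption; the post creates
;; a new agent per request).
(def log-agent (agent []))

;; Hypothetical slow logging step. An agent action receives the
;; agent's current state as its first argument and returns the new state.
(defn slow-log [log entry]
  (Thread/sleep 50)
  (conj log entry))

(defn with-logging [handler]
  (fn [request]
    ;; Returns immediately; slow-log runs on a thread from the agent pool.
    (send-off log-agent slow-log (:uri request))
    (handler request)))

(def app (with-logging (fn [request] {:status 200})))

(app {:uri "/a"})
(app {:uri "/b"})

;; Block until the actions sent from this thread have completed.
(await log-agent)

@log-agent
;; => ["/a" "/b"]
```

Actions sent to one agent from the same thread are applied in order, which is why the log comes out in request order here.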
    bestinclass> (get-origin "74.125.77.104")
    "US"
That'll give me enough to plot the geographic distribution of my visitors, but really the possibilities are endless. Inside my logger, 'forge-sql-statement' is really where the good stuff happens. Instead of flushing every SQL statement directly to my database server, I build up a sequence of the requests in memory. I also have another thread, which runs every 5 minutes, dumps this list to SQL and resets the sequence.
This is where concurrency-minded developers will smell a potential problem! Those of you who have been burned by concurrency will start to get a little nervous that I'm taking such a big step as introducing multiple threads working on the same data structure. And those concerns are very valid!
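The buffering side of this can be sketched in a few lines. Everything here beyond the idea of an in-memory buffer is an assumption for illustration: the name `sql-buffer` (playing the role of `*sql-buffer*`), the `hits` table, the request keys, and the choice to store each statement as a parameterised vector rather than a finished SQL string.

```clojure
;; In-memory buffer of pending statements, held in a ref so it can
;; later be read and reset inside a transaction.
(def sql-buffer (ref []))

;; Hypothetical version of forge-sql-statement: build a parameterised
;; statement from the request and append it to the buffer.
(defn forge-sql-statement [request]
  (let [stmt ["INSERT INTO hits (uri, ip) VALUES (?, ?)"
              (:uri request)
              (:remote-addr request)]]
    (dosync (alter sql-buffer conj stmt))
    stmt))

(forge-sql-statement {:uri "/" :remote-addr "74.125.77.104"})
(forge-sql-statement {:uri "/about" :remote-addr "74.125.77.104"})

(count @sql-buffer)
;; => 2
```

No database is touched at this point; the statements just sit in memory until the periodic dump picks them up.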
Let's say a request arrives while the dump to SQL is in progress.
What happens? If we don't employ any kind of locking, a thread (call it Thread 3) that adds its request to the sequence S after the SQL job has read it will do so in vain, since S will be emptied once the SQL job has finished, and the data in that request is lost. In my case I could probably live with this if I were not a perfectionist, but for a high-traffic site it could mean dozens of requests going down the drain every 5 minutes.
This complicates things a little bit. Firstly, locks are a pain to implement and error-prone. Secondly, locking usually comes with a huge performance penalty. Ideally I would find some way of emptying S in 1 CPU op, that is to say, atomically.
So, if we didn't have this concurrency issue to deal with, we would flush like this:
    (defn log-dumper [a]
      (Thread/sleep *write-delay*)
      ;; take a copy of the buffer, then reset it (not yet atomic)
      (let [sql-statements (let [tmp @*sql-buffer*]
                             (ref-set *sql-buffer* [])
                             tmp)]
        (doseq [stmt sql-statements]
          (run *mysql-connection* stmt))
        ;; re-queue this function on the current agent
        (send-off *agent* log-dumper)))
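To explain this real quick, here are the steps: sleep for *write-delay* milliseconds; copy the current contents of *sql-buffer* into tmp and reset the buffer to an empty vector; run each buffered statement against *mysql-connection*; finally, send log-dumper to the current agent again so the cycle repeats.
This is where the rubber meets the road. Basically what we need is to make the assignment to 'tmp' and the following ref-set happen atomically, so that no thread has the opportunity to append an item to *sql-buffer* in between. In PHP it would be impossible, in C it would probably take 6 months to implement, and in Clojure I'll do it like this: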
    (let [sql-statements (dosync
                           (let [tmp @*sql-buffer*]
                             (ref-set *sql-buffer* [])
                             tmp))]
      ....
By simply wrapping the expressions in dosync, Clojure's STM implementation will take care of my data! Who said High Level languages weren't great?
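A quick way to convince yourself that nothing falls through the cracks is to hammer a ref from several threads while draining it. This is a sketch, not the site's code: `buffer`, `append!`, `drain!`, the 8 writers and the 1000 items each are all made up for the experiment.

```clojure
(def buffer (ref []))

;; Writers append inside a transaction.
(defn append! [item]
  (dosync (alter buffer conj item)))

;; The read and the reset happen in one transaction, so no append
;; can slip in between them.
(defn drain! []
  (dosync
    (let [tmp @buffer]
      (ref-set buffer [])
      tmp)))

;; 8 writer threads, 1000 distinct items each.
(def writers
  (doall (for [w (range 8)]
           (future (dotimes [i 1000] (append! [w i]))))))

;; Drain repeatedly while the writers are running, then once more
;; at the end to pick up whatever is left.
(def drained (atom []))
(while (not-every? realized? writers)
  (swap! drained into (drain!)))
(swap! drained into (drain!))

(count @drained)
;; => 8000

;; When run as a script, call (shutdown-agents) at the end so the
;; JVM exits promptly; futures use the agent thread pool.
```

Every item shows up exactly once in `drained`: if a writer commits between the drain's read and its reset, the STM retries the drain transaction instead of silently discarding the new item.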
What's gained? Well, if we compare the number of SQL transactions for PHP and Clojure respectively against the number of visitors, the following graph emerges:
[Graph: SQL transactions per hour, plotted against the number of visitors, for PHP and Clojure]
As Clojure doesn't run extra laps for extra visitors, it remains at a steady 12 transactions per hour. How much longer do you think this server can be in production before you need to upgrade? At least for high-traffic sites there's a definite business case.
So my first question remains: what does it cost to stick with the old technologies and disregard concurrency, servlets, and so on? Put bluntly: what's the cost of being old-school?
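The arithmetic behind the 12 is simple enough to spell out. This sketch assumes one logged request per visitor on the per-request side and counts each 5-minute flush as one transaction on the batched side; the function names are invented for the comparison.

```clojure
;; Per-request logging: one database write per visitor (assumed).
(defn per-request-writes [visitors-per-hour]
  visitors-per-hour)

;; Batched logging: one flush per write interval, however many
;; visitors arrive in between.
(defn batched-writes [write-delay-minutes]
  (quot 60 write-delay-minutes))

(batched-writes 5)
;; => 12

;; Print visitors, per-request writes and batched writes side by side.
(doseq [visitors [10 1000 100000]]
  (println visitors (per-request-writes visitors) (batched-writes 5)))
```

The first column of writes grows with traffic; the second depends only on the write interval.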
Facebook does not disclose the number of servers it operates. But research firm Data Center Knowledge puts the tally at about 10,000. [Facebook is rumoured to be buying 50,000 more servers with a recent debt raising of US$100 million.]
There's no 'one answer fits all', so I'll let each of you draw your own conclusions.
/Lau