In my last post I tried to outline some of the contrasts between the popular Python and Clojure. A repeated comment from the Python community was 'Why dont you let the code speak for itself, go to Rosetta'....So I did and this is what I found.
The Python community is very active and sizeable, which is good. Quite a few of them let me know that the examples which I had picked from the PyEuler project were not showing off Pythons best side and looking at their counter-examples I cannot deny that. On the encouragement of several Python users I've now visisted Rosetta code for the first time and here I'll share my experiences, solving 3 of their challenges.
If you read the comments in the last post you'll see that tempers tend to rise when you're contrasting languages and that's not what I'm going for. I want this to be a fun apples-to-apples comparison, where hopefully both Python and Clojure users take away something useful. But please remember before you start ripping my comment-box in pieces: Inverting the example does not always invert the conclusion. If we determine that certain codepaths are error prone and 'dangerous' providing a safe/correct example won't necessarily change the principle, so let's try to keep the debate principle oriented. Makes sense?
Now I won't claim any expertise on the Rosetta project I visisted it yesterday for the first time. It's basically a place where you can go and show off solutions in every programming language you can think of, which is quite insightful. I'll try to comment on size, readability, boilerplate and correctness in the following examples. My method was simple: I saw a link to a task from the front page, solved that, clicked around until I found something that had a Python solution, solved that and so on.
If the name rings a bell and you can't quite place it, perhaps it's because it relates to Mersenne primes which were mentioned on Slashdot earlier this week. The 48th number was found, winning some $100.000,- USD to a lucky (patient?) university. Mathematically speaking, the test is this:
Lucas-Lehmer Test: for p a prime, the Mersenne number 2p − 1 is prime if and only if 2p − 1 divides S(p − 1) where S(n + 1) = (S(n))2 − 2, and S(1) = 4.
So it's an iterative walk through S to determine if p is a Mersenne prime, its some Big Integer math which (until the next release) is a bit slow on the JVM. We'll start with the Python teams solution:
from sys import stdout
from math import sqrt, log
def is_prime ( p ):
if p == 2: return True # Lucas-Lehmer test only works on odd primes
elif p <_ in="in" range3="range3" no="no" print="print" _-="_-" range2="range2" intsqrtp1="intsqrtp1" mersenne="mersenne" else:_="else:_" _="1" true="true" or="or" log10="log10" finding="finding" count="count" upb_prime1:_="upb_prime1:_" _0="_0" p1:_="p1:_" upb_prime="int(" return="return" _2="_2" _1="_1" _2:_="_2:_" _0:_="_0:_" mprimes="mprimes" printmdp="printmdp" of="of" and="and" primes="primes" log2="log2" was="was" i="i" requested="requested" is_mersenne_prime="is_mersenne_prime" int="int" is_primep="is_primep" bits="bits" decimal="decimal" mp-1="mp-1" if="if" m_p="m_p" find="find" _:_="_:_" def="def" places="places" m2..d:upb_prime="m2..d:upb_prime" stdout.flush="stdout.flush" _45="_45" upb_count="45" maximum="maximum" for="for" number="number" false="false" unsigned="unsigned" long_bits_width="long_bits_width" is_mersenne_primep:_="is_mersenne_primep:_" enough="enough" p="p" given="given" s="=" precision="20000">= upb_count: break
print
Weighing in at 36 lines Team Python has produced a working solution. In the last post we we're specifically looking at Pythons functional programming (FP) capabilities, but since this is a comparison in a broader sense I won't deduct points for mutability and/or side effects :) Methodically speaking their prime tester is similar to mine, there's a couple of special cases but otherwise we look for divisors up until the squareroot of n. The biggest difference in both testers, is that Python code often relies on breaking out of nesting loops when certain conditions occur, but apart from that we're using the same approach. To new-comers their translation
2p - 1 to (1 << p) - 1
Might be a little confusing, but it's a great performance booster. << means bit-shift-left and
n << x = x*2^n
So, using the same methodology, allow me to present 11 lines of Clojure:
(defn prime? [i]
(cond (< i 4) (>= i 2)
(zero? (rem i 2)) false
:else (not-any? #(zero? (rem i %)) (range 3 (inc (Math/sqrt i))))))))
(defn mersenne? [p] (or (= p 2)
(let [mp (dec (bit-shift-left 1 p))]
(loop [n 3 s 4]
(if (> n p)
(zero? s)
(recur (inc n) (rem (- (* s s) 2) mp)))))))
(filter mersenne? (filter prime? (iterate inc 1)))
One thing you need to be mindful of, is that functions suffixed "?" in Clojure is often (by convention) used as predicates, meaning they will return either true of false. My prime? asks "are we below 4? If we are we're looking at numbers 1, 2 and 3, whereof only the first isn't a prime, so the question is really "are we looking at 2 or 3"? Shortened to (>= i 2). Mersenne? is exactly similar to Pythons and the filters should read like plain english. (iterate inc 1) iterates the function inc(rement) starting with one, 1.2.3.4.5.6.......n.
Python: Reads easy, well layed out - 36 lines is a bit much though
Clojure: Reads easy, 11 lines is excellent, no boilerplate.
If we're agreed, Clojure takes a tender lead of 1 - 0. If we're not agreed, I'm sure you'll let me know :)
It's worth noting though, that in the world outside Clojure and Python we see a C++ solution weighing about 45 lines, Java at about 40 lines and Ruby at 42 - So Clojure's 11 lines are impressive.
This was a fun little challenge. Basically you get a URL which feeds you some XML giving you titles of all Programmings tasks on Rosetta, you then crawl these looking for examples, counting them as you go along. This is one of the times where I have to say Team Pythons solution really impressed me. It weighs in at a trimmed 12 lines:
import urllib, xml.dom.minidom
x = urllib.urlopen("http://www.rosettacode.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Programming_Tasks&cmlimit=500&format=xml")
tasks = []
for i in xml.dom.minidom.parseString(x.read()).getElementsByTagName("cm"):
t = i.getAttribute('title').replace(" ", "_")
y = urllib.urlopen("http://www.rosettacode.org/w/index.php?title=%s&action=raw" % t)
tasks.append( y.read().lower().count("{{header|") )
print t.replace("_", " ") + ": %d examples." % tasks[-1]
print "\nTotal: %d examples." % sum(tasks)
Quite nice dont you think? First they get the xml into x and then they define an array called tasks. Here they inject the number of examples they crawl in their for-loop. They took the advice of Rosetta and match "{{header|" in all subpages, which I've found actually doesn't give quite the correct result, because not everybody uses this tag like they should! Shame on them.
Now, this had me pressed up against the wall in terms of line-numbers, so I knew I had to come up with something good. May I present 11 long lines of Clojure:
(use net.cgrand.enlive-html)
(let [xml "http://www.rosettacode.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Programming_Tasks&cmlimit=500&format=xml"
title-url "http://www.rosettacode.org/w/index.php?title="
task-count #(try
(dec (count (select (html-resource (java.net.URL. (str title-url %))) [[:h2]])))
(catch Exception 0))
results (pmap #(assoc % :tasks (task-count (.replace (:title %) \space \_)))
(map :attrs (select (html-resource (java.net.URL. xml-url)) [:cm])))]
(doseq [r results] (println (:title r) (apply str (repeat (- 55 (count (:title r))) \space)) (:tasks r)))
(println "Total: " (reduce + (map :tasks results))))
I know that I couldn't possible cram more code into that small space :) Allow me to explain:
I define task-count as being an anonymous function which counts the number of -h2- tags on a given URL and decrements 1 from this count. The reason for this is that when looking through the pages it's obvious that the '{{header' tag is being misused, but it was also obvious that the h2 tag is reserved for 2 things, 1x 'Content' and code-examples, so subtract the 1 and you got the number of examples.
Then comes the fun part - I run a parallel-map (that means multithreaded) on a function which builds a hash-map. assoc(iate) means to connect a key-value pair. Here the hash-map is whatever it's being fed, meaning I work directly on the one select returns, the key is :tasks and the value is (task-count) of pulling out the :title tag from parsing the attributes of :cm tags on the url. Cryptic? Let me show you a boiled down example:
user> (first (select (html-resource (java.net.URL. xml-url)) [:cm])) {:tag :cm, :attrs {:title "100 doors", :ns "0", :pageid "2151"}, :content nil} user> (:attrs (first (select (html-resource (java.net.URL. xml-url)) [:cm]))) {:title "100 doors", :ns "0", :pageid "2151"}
Line 1: The select methods returns as many as those maps are there are occurances of [:cm] in the XML. Then I pull out the attributes, giving me :title, :ns and :pageid. When that's fed into the main pmap I replace spaces with underscores making "100 doors" => "100_doors". That title is then fed to task-count, which returns the number of tasks on that page. That number is then tied to the hash-map with the key :tasks.
Then finally you have the printing which is straightforward, except I threw in that (repeat (- 55 (count (:title r))) \space) expression, which just aligns everything very nicely, injecting as many spaces as is needed to print 55 characters in. (it's a hack, but it's much prettier than the alternative). As an encouragement, know that once you're comfortable with Clojure, these routines take 2 - 5 minutes to write and half of that to read - but if you're coming from another language, I know that it's harder.
Python: Gets almost correct count, with very concise and readable code - zero boilerplate.
Runtime: 1 hour 53 minutes
Clojure: Gets almost correct count, with very concise and somewhat complex code, zero boilerplate
Runtime: 12 minutes !!
Behold the power of pmap! Deciding on a multi threaded strategy was key to such a huge performance gain, opening as many threads as makes sense on my duo-core system (pmap takes care of that automatically). Also note, I say 'almost correct' on with Clojure as well because user generated html is user generated, it's error prone. Although I tested dozens of pages manually finding Clojure to be right every time, that is no guarantee.
Sidenote: For this problem which both languages solved beautifully in ~10 lines, Java solves in 60+ lines, Perl about 50 and Ruby at about 35. So quite impressive on both counts.
Ok, last one.
This one is more of a data juggling act than anything else. You get some data and is asked to represent that in a way that's idiomatic to your language, not mutating it at run-time (which I did first). The data consists of 4 columns and n rows. Team Python weighs in at 12 lines (not counting data definition):
from collections import defaultdict
data = [('Employee Name', 'Employee ID', 'Salary', 'Department'),
('Tyler Bennett', 'E10297', 32000, 'D101'),
('John Rappl', 'E21437', 47000, 'D050'),
('George Woltman', 'E00127', 53500, 'D101'),
('Adam Smith', 'E63535', 18000, 'D202'),
('Claire Buckman', 'E39876', 27800, 'D202'),
('David McClellan', 'E04242', 41500, 'D101'),
('Rich Holcomb', 'E01234', 49500, 'D202'),
('Nathan Adams', 'E41298', 21900, 'D050'),
('Richard Potter', 'E43128', 15900, 'D101'),
('David Motsinger', 'E27002', 19250, 'D202'),
('Tim Sampair', 'E03033', 27000, 'D101'),
('Kim Arlich', 'E10001', 57000, 'D190'),
('Timothy Grove', 'E16398', 29900, 'D190')]
departments = defaultdict(list)
for rec in data[1:]:
departments[rec[-1]].append(rec)
N = 3
format = "%-15s " * len(data[0])
for department, recs in departments.iteritems():
print "Department", department
print " ", format % data[0]
for rec in sorted(recs, key=lambda rec: -rec[-2])[:N]:
print " ", format % rec
print
Starting with the data definition, then in Clojure terms that would be a vector containing lists. I'm not sure what a Python user would call it. It's low ceremony, but not being used to all the commas they kinda stick out.
They sort the data using 'sorted' which takes a lambda as the comperator. I think that's what Graham would call a 'near Lisp experience' and it is quite cool. Despite coming in at 12 lines, it's a slim read because all the functionality is packed into that 1 line - very powerful. (albeit a little Perl like "rec: -rec[-2])[:N]:")
Weighing in at 6 lines, we have Clojure:
(def data
[{:name "Tyler Bennett" :id "E10297" :salary 32000 :department "D101"}
{:name "John Rappl" :id "E21437" :salary 47000 :department "D050"}
{:name "George Woltman" :id "E00127" :salary 53500 :department "D101"}
{:name "Adam Smith" :id "E63535" :salary 18000 :department "D202"}
{:name "Claire Buckman" :id "E39876" :salary 27800 :department "D202"}
{:name "David McClellan" :id "E04242" :salary 41500 :department "D101"}
{:name "Rich Holcomb" :id "E01234" :salary 49500 :department "D202"}
{:name "Nathan Adams" :id "E41298" :salary 21900 :department "D050"}
{:name "Richard Potter" :id "E43128" :salary 15900 :department "D101"}
{:name "David Motsinger" :id "E27002" :salary 19250 :department "D202"}
{:name "Tim Sampair" :id "E03033" :salary 27000 :department "D101"}
{:name "Kim Arlich" :id "E10001" :salary 57000 :department "D190"}
{:name "Timothy Grove" :id "E16398" :salary 29900 :department "D190"}])
(let [n 3
departments (reduce #(assoc %1 (keyword (:department %2))
(conj ((keyword (:department %2)) %1) %2)) {}
(sort-by :salary data))]
(doseq [d departments]
(println (key d) (map #(str "[" (:name %) ": " (:salary %) "]") (take n (val d))))))
The data is a vector containing hash-maps. In the 'let' statement I bind n to 3, this is a requirement of the task and it states how many employees we extract (max). Then I do all my data-mangling in the binding to 'departments' and if you're comfortable with Clojure it should read like plain english...well almost :) Sort gets the raw data and compare the :salary values asking if the first is smaller than the next. This returns all the above data sorted by salary, highest first. Then that gets passed to reduce, which produces a hashmap which groups all the departments, like {:D101 [old-entries new-entries]}, growing it everytime a new employee from that department comes through.
And with that little functional beauty I'm actually done. The entries went in sorted by size, so when grouped by departments they're already lined up the way I want. All I have to do is call (take n (:D101 ..)) and I've got the result I wanted.
Python: Very concise, suffers a little on readability (-rec[:-2]) for instance, no boilerplate
Clojure: Very concise and readable, basically 2 expressions handle everything
So for a job such as this I'd say they are neck and neck. In the same amount of space I show off heavy use of reduce, assoc and sort, so I feel I should get an extra point for that :)
Where both languages packed all their power into 1 expression/statement, notice how C++ gets over 70 lines for this simple problem (again, not counting definitions), that's 700% more possibility for bugs, 69 lines wasted on ceremony, perhaps 2 hours straight into the waste basket. I only spent a few minutes on my solution and I assume it's the same for Team Python.
So I felt it was good to do a little more informal comparison after my first post earlier this week. Both Clojure and Python are tools and so I don't want to create an atmosphere we're people get all worked up about tools. There are some facts that we should consider and then there's an element of style and taste.
If I had to ascribe a 'coolness' factor to Python, I'd say it's pretty cool. Without hesitation I would pick it above C++, Perl, Ruby, TCL etc, just to name a few odd balls. It can come in very concise chunks and it's easy to dive into. Regarding boiletplate, clearness of syntax and ceremony it's virtually unmatched. When I reviewed Scala I was told that it had very thin boilerplate, but if that's true then Pythons is invisible. But like I also told one disgruntled Scala user, it's not always fun being compared to Clojure and I think a few Python users felt that they got more competition than they hoped for. Oh well, as long as the competition is friendly, its mutually beneficial.
Not forgetting, I was asked to let the code speak for itself, so I won't say another word...
I hope you enjoyed the read :)