Simplicity on Steroids

2010-02-04 21:00:07

Today I had a lot of fun reading about a for-constructs on steroids. The author of that post explores the possibilities which come with the built-in pattern matching in Scalas for-construct, so I'll do the same with Clojure.



Preface

Before reading this post I hope you'll at least skim the code in the article I linked above, just to get a feel for the complexity involved. But complex or not, with Scala you're able to do pattern matching directly in your for-loop, yielding results as you move along. As the author in this post demonstrates, here's one way to go:

case class Person(firstName:String, lastName: String);

val people = List(
  Person(“Jane”, “Smith”),
  Person(“John”, “Doe”),
  Person(“Jane”, “Eyre”));

for (Person(“Jane”, last) <- people) yield “Ms. ” + last;

» List(“Ms. Smith”, “Ms. Eyre”)

So is that handy? Yes that is absolutely handy, but Clojure doesn't do steroids, in fact Lisps are almost solely identified by their powerful idioms which you use again and again, basically forming lists/datastructures in new and creative ways. Clojure hosts 2 functions (macros) which act in the same domain (looping/comprehension), but are 2 different beasts entirely.


First we have doseq:

(doseq [i (range 5)]
    (println i))
 0
 1
 2
 3
 4
 nil    ;; return

Notice the final nil? Thats your return from a doseq, since it only acts on the list for the purpose of side-effects, in this case printing. Though it supports the same binding options as for (ie. :let, :while: when) its purpose is totally different. Most people coming from imperative languages will recognize this immediately as it is your typical loop-construct, similar to foreach. In the same arena you'll find dotimes which is handy for running a body of code multiple times and for that purpose it's faster than (doseq [i (range 10)]).


For on the other hand

...is list-comprehension at its best:

(for [i (range 5)]
  i)
(0 1 2 3 4)

This is implicitly doing what the yield statement does in Scala and is a nice way to functionally build up a datastructure, ie. totally different from doseq. Don't mix for with side-effects - both because it's bad practice but also because for is lazy. But lets say that we want to do matching like the Scala and only see those items with the first name 'Jane':

(def people [["Jane" "Smith"] ["John" "Doe"] ["Jane" "Eyre"]])

(for [person people :let [[name surname] person] :when (= "Jane" name)]
  (str "Ms. " surname))

>> ("Ms. Smith" "Ms. Eyre")

So the :when clause only evaluates the body when the predicate (= "Jane" name) is true, however it does not halt the process when the predicate is false, it simply moves on to the next item.

Since for is lazy nothing gets evaluated unless it needs to. Throughout some programs it makes a lot of sense to keep things lazy, because although each item gets some overhead you can usually architect the code so that you end up saving computations. Lets imagine that we're chewing our way through an endless stream of computational results, but for our final analysis we need only the first 5000 samples:

(for [sample samples :while (< (:id sample) 5000)]
  (:value sample))

That will run 5000 times and then abort. This could also be achieved like so

(take 5000 (for [sample samples ...

(take-while #(< (:id %) 5000) (for [sample ...

But getting comfortable with the full power for/doseq usually ends up giving you a nicer end result. Through all the examples, you're seeing the power of the seq-abstraction. Lets say you need to work on a nested strucuture, only working on the innermost data - double bound for is your friend. Imagine you have a named set, where each name is bound to  a vector containing the scores achieved that day. Parsing them in a way which handles empty vectors, nil values, int, floats, doubles, whatever you throw at it, looks like this:

(let  [scoreboard {"Allison" [10 11 12 1e3]
                   "Franky"  [5 2.34 1/4]
                   "John"    nil}
       scores     (for [[player scores] scoreboard, score scores]
                    score)]
  (/ (reduce + scores) (count scores)))

>> 148.65571428571428

Notice how the core idioms of Clojure all work together neatly - We're using for in many ways, adding predicates, abort conditions and also destructuring in breaking down scoreboard into both player and scores, then iterating each score individually. Of course the values could also have been expressed using another idiom:

(mapcat val scoreboard)

But we'll save mapcat for a rainy day.


Idioms

If you've read the first chapter of Joy of Clojure, you have seen this quote

The only difference between Shakespeare and you was the size of his idiom list - not the size of his vocabulary.

-- Alan Perlis

Clojure packs many extremely powerful idioms for working with data and in Lisp there's nothing but data. All of Clojure implicitly treats data as immutable, meaning no freely roaming state, and contrary to many languages you have to be explicit if you want mutability. This greatly reduces the incidental complexity of our programs, leading to a more robust software. Since we are in no shape or form introducing mutability (like iterators), but instead building all our data functionally we don't wind up with MatchError Exceptions or anything else which results from unexpected state.


Conclusion

This blogpost serves as a primer for using all that for has, but also as a reminder that mutable state should always be handled very very carefully as results quickly can become unpredictable. As you move into the higher layers of programming (read high-level languages), sometimes the fear sets in that you'll end up missing certain low-level features which used to be so crucial for your application, but I'm very pleased to see that Clojure  is deep enough that you won't find yourself missing anything, any time soon. On the other hand, while low-level can feel comfy and safe (because we've all done it for many years)  it does make for many areas where you need to pay extreme attention to ensure stable bug-free programs.


David
2010-02-05 01:17:09
Thanks for this.  Looking at the double bound for example gives me a nice glimpse into how mind-bending Lisp/Clojure can be.
Carson
2010-02-05 07:11:21
This part:
(for [person people :let [[name surname] person] :when (= "Jane" name)]
  (str "Ms. " surname))

actually looks rather "declarative" or mathematical. I mean it as it's closer to what I would be thinking if I were to write in math:

For each p in P with (n, s) = p such that n = "Jane", construct string "Ms. " s.

Intuitively, I would have the entire list of such strings in a list in my head. But when I convert my thought to a declarative for-loop, it gets farther away from what I'm thinking.

So this Clojure thing. Starting to like it. :)
Carson
2010-02-05 07:14:22
uh, that should read imperative for-loop...
Scott
2010-02-05 07:25:41
Instead of:

(for [person people :let [[name surname] person] :when (= "Jane" name)]
     (str "Ms. " surname))

you can just write:

(for [[name surname] people :when (= "Jane" name)]
      (str "Ms. " surname))
Lau
2010-02-05 08:10:16
@Scott - Thats true and as you can see I'm slowly building my way up to the climatic finish where I use all the tricks in one pass :)
Miki
2010-02-05 18:19:45
(mapcat vals scoreboard) does not work, should be
(apply concat (vals scoreboard))
Alex
2010-02-05 23:34:42
Miki: (mapcat vals scoreboard) does not work

True, but (mapcat val scoreboard) does.
Lau
2010-02-06 09:19:39
Miki: Thanks for pointing out the typo - I added an excessive 's' to val when finishing the post.
Stefan
2010-02-06 16:24:43
Hi Lau, thanks for your great blog posts - makes me wanna learn Clojure!

I gave mapcat a short try:

(def scoreboard {"Allison" [10 11 12 1000.0], "Franky" [5 2.34 1/4], "John" nil})

(defn use-for [board]
  (for [[player scores] board, score scores] score))

(defn use-mapcat [board]
  (mapcat val board))

(time (dotimes [_ 1e6] (use-for scoreboard)))
"Elapsed time: 53.466 msecs"

(time (dotimes [_ 1e6] (use-mapcat scoreboard)))
"Elapsed time: 1034.883 msecs"

So if performance matters...
I used Clojure 1.1.0 on a Mac.
Lau
2010-02-06 17:09:53
Stefan,

Keep in mind that <strong>for</strong> is lazy, so you're not actually doing any work - Try changing your definition of use-for to (doall (for [[player ......)))

Ps: Glad you like that blog :)
Stefan
2010-02-06 18:43:07
Hey Lau, good point!

So I did it again, this time with doall:

(time (dotimes [_ 1e6] (use-for scoreboard)))
"Elapsed time: 3182.929 msecs"

mapcat seems to be 'half' lazy:

(time (dotimes [_ 1e6] (use-mapcat scoreboard)))
"Elapsed time: 2686.442 msecs"

ps: Sorry to bother, I should have more patience and waiting for a rainy day ;-)
Lau
2010-02-06 19:19:40
Stefan, its no problem-  the blogosphere is a good place to practice :)
Tom
2010-02-23 18:50:59
I'm learning a lot from your posts.  Thank you!
Stan
2010-02-24 00:55:04
So how does 'for' work under the covers? Can you compare and contrast it with Scala's for, which is a syntactic sugar for a series of bindings within a monad?