In my last post I set out to solve a classic concurrency problem called ‘The sleeping barber’ and contrasted an STM solution with an actor-based solution. It occurred to me afterwards that interest in concurrency is huge these days (and for good reason), so I’ve decided to walk through the deadlock/livelock/starvation trap called ‘The dining philosophers’.
If you don’t know the challenge, it’s basically this:
You have 5 (or n) philosophers sitting round a table. On the table lie 5 forks. Every philosopher behaves in this way: either he wants to think, or he wants to eat. If he decides to eat, he must secure both forks, the one to his left and the one to his right. The philosophers never communicate with one another.
For a full explanation read: this. But otherwise I hope this image will give you an idea of the scenario:
So that doesn’t sound too complicated, does it? Well the trick is that this challenge is ideal for seeing whether or not you can navigate around dead- and live-locks. If every philosopher picks up the fork to his left at the same moment, each will sit forever waiting for his neighbour to release the right fork – that’s a deadlock, nobody makes any progress. A livelock on the other hand is when all the processes seem to be making some progress, but never reach their goal.
There’s also the more subtle pitfall of starvation, where 2 excessive eaters (quick processes) can consume the entire meal (CPU time), leaving the remaining 3 philosophers (processes) starved. It’s a nice pun :)
Wikipedia suggests 3 ways to solve this, which I will briefly outline here.
Waiter solution
Since the philosophers don’t speak with each other, one solution is to have them speak to a waiter who keeps an overview of the table; he can then decide who gets forks and when.
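As a hedged sketch of that idea (my own names, not part of the solution below): the waiter could be a single agent holding the set of forks currently in use. Because an agent processes its queue serially, all grants are automatically serialized through that one thread:

```clojure
;; Hypothetical sketch: the waiter is one agent, so fork requests
;; are handled one at a time. Its state is the set of taken fork ids.
(def waiter (agent #{}))

(defn request-forks [in-use wanted]
  ;; grant only if none of the wanted forks are already taken
  (if (some in-use wanted)
    in-use                  ; denied: state unchanged
    (into in-use wanted)))  ; granted: mark the forks as taken

(send waiter request-forks #{0 4}) ; philosopher asks for forks 0 and 4
(await waiter)
@waiter ; => #{0 4}
```

A second request for an overlapping pair, say #{3 4}, would be denied because fork 4 is in use – no two philosophers can ever hold the same fork.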
Resource hierarchy solution
The resource hierarchy solution basically consists of numbering the forks. They are then always requested in ascending order and released in the reverse order. Read the Wiki for the full explanation.
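A hedged sketch of that idea in Clojure (the function names are mine, not part of the solution below): sort the two fork ids before acquiring, so every philosopher contends for his lowest-numbered fork first and no circular chain of waiters can form:

```clojure
;; Hypothetical sketch of the resource hierarchy.
(defn fork-ids [id n-forks]
  ;; the left and right fork of philosopher `id`, wrapping around
  [(mod (dec id) n-forks) id])

(defn acquisition-order [id n-forks]
  ;; always request the lowest-numbered fork first
  (sort (fork-ids id n-forks)))

(acquisition-order 0 5) ; => (0 4)
(acquisition-order 3 5) ; => (2 3)
```

Note how philosopher 0, whose forks are 4 and 0, grabs fork 0 first – he and philosopher 4 both contend for fork 4 last, which breaks the cycle.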
Chandy / Misra solution
And finally, we have a solution wherein you describe the forks as either clean or dirty, and based on this categorization the philosophers request the forks from one another. In my book, such a request equals communication, and so the constraint inherent to the problem has been violated – so this is really no solution at all.
I’ve written up a solution in Clojure which is quite idiomatic, runs without any outside coordination and never enters a dead- or live-lock. It weighs in at 34 lines of code, so let’s walk through them.
Firstly, we need to describe our data.
(def rounds (ref 20000))
(def philosophers (doall (map #(agent %) (repeat 5 0))))
(def forks (doall (map #(ref [% true]) (range (count philosophers)))))
(def logger (agent 0))
Due to a food limitation in the universe, I’ve limited the number of rounds (portions) to 20000. Every time a philosopher eats, this is decremented.
Second, I have my philosophers. They’re all agents (that is, threads) and are spawned by applying the agent fn to 0, giving me a number of agents whose initial state is 0. Notice, this is the only place in the entire program where I write “5”; if you change this number, the number of philosophers and forks will change and the show will go on – that’s quite flexible.
Thirdly, we need to describe the forks, and since they’re our shared resources we (ref ..) them, making them modifiable only from within a transaction. What you’re seeing is a ref’ed vector for each fork, the first item being its id, the second being a bool. The boolean describes availability: true = you can pick me up, false = don’t touch.
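Dereferencing a fork at the REPL shows the shape we’ll be working with (here with the count hardwired to 5 for the demo):

```clojure
;; each fork is a ref wrapping an [id available?] pair
(def forks (doall (map #(ref [% true]) (range 5))))

@(first forks)          ; => [0 true]
(second @(first forks)) ; => true, i.e. fork 0 is available
```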
Lastly, I have a logger which lets me dispatch logging to a separate thread. Because I’m doing I/O (printing) from within my dosyncs, this matters: sends from inside a transaction are only dispatched upon successful commit, so you’ll never see output from retried transactions.
This is the data, now we need some helpers to manipulate it.
(defn debug [_ id msg r]
  (println id \space msg "(" r ")")
  (flush))
For debugging purposes I’ve made this little function, which takes the id of the philosopher, a message and the number of remaining rounds. That’ll give us an indication of whether or not the program is running as it should. The first argument _ is simply there because when you send/send-off to an agent, the first argument is automatically the state of that agent.
(defn my-forks [id]
  (map #(nth (cycle forks) (+ (count forks) %)) [id (dec id)]))
This little helper gives me the state of my forks. It works and it only took 3 seconds to write, and I guess you can tell that by looking at it :) Basically we have our philosophers named 0 – 4, and we have our forks named 0 – 4. So we know that philosopher #0 has forks 0 and 4. Philosopher #1 has forks 1 and 0. #2 has 2 and 1, and so on. The trick here is that when I supply an id n, I want forks n and n-1 returned, but if I pass id 0, that’ll give me 0 and -1, which doesn’t exist. So my workaround was to produce a cycle of the numbers as demonstrated here:
user> (take 15 (cycle [1 2 3]))
(1 2 3 1 2 3 1 2 3 1 2 3 1 2 3)
Basically I get an infinite sequence of my arguments, cycled. So by offsetting my request with the length of the fork sequence, I can pull ids with [id (dec id)], which is [n (n - 1)] – for n=0 that’s indices 5 and 4, wrapping around to forks 0 and 4.
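We can demonstrate the lookup at the REPL with plain ids instead of the fork refs, so the result is readable:

```clojure
;; the same lookup as my-forks, on plain ids: offset by the count (5),
;; then index with [n (dec n)]
(map #(nth (cycle (range 5)) (+ 5 %)) [0 (dec 0)]) ; => (0 4)
(map #(nth (cycle (range 5)) (+ 5 %)) [3 (dec 3)]) ; => (3 2)
```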
So adding the offset to n lets us do a look-behind without causing an IndexOutOfBounds exception, and as for resource waste there is virtually none: cycle is lazy, so everything beyond the elements we actually ask for is never computed.
Now being able to identify my forks, I want to know if they’re available:
(defn got-forks? [id]
  (every? #(= true (second (deref %))) (my-forks id)))
This will work no matter how many forks you have, so if we later want to extend our code to a more complex scenario, we haven’t written anything so far which will need alterations. (my-forks …) returns a sequence of refs holding pairs like [1 true] [2 false], and now I want to know if all my forks are available: (is) every? (value of second in % true) from (my-forks)? Because % will be every item returned from my-forks, and all those items are refs, I call (deref %) to get at the value (could also have been @%).
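With a couple of throwaway refs in the same [id available?] shape as the real forks, you can see the predicate in action:

```clojure
(def free-forks  [(ref [0 true]) (ref [1 true])])
(def mixed-forks [(ref [0 true]) (ref [1 false])])

(every? #(= true (second (deref %))) free-forks)  ; => true
(every? #(= true (second (deref %))) mixed-forks) ; => false, fork 1 is taken
```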
Great – this is the sensory logic for our philosophers. They can now see what they need to see in order to decide what to do. There’s just one thing missing: how do they interact?
(defn handle-forks [id action]
  (doseq [fork (my-forks id)]
    (ref-set fork [(first @fork) (condp = action
                                   :take    false
                                   :release true)])))
I want to pass an id and an action and have that played out. The trick is, we want to do this for an arbitrary number of forks and with a caller-supplied action. For that reason I wrap my ref-set statement in a doseq, walking over every item (my-forks ..) returns, and instead of manually writing true or false, I add a conditional (condp) which applies = to the action and acts accordingly.
Bonus: Notice how in an imperative language, you’d find something like
if action == take:
    ref-set(forks, vector, n, false)
else:
    ref-set(forks, vector, n, true)
With Clojure we can avoid writing (ref-set fork …) twice, because we’re working with expressions rather than statements: the condp form evaluates to a value, which rolls straight up into the enclosing expression.
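You can see that at the REPL – condp evaluates directly to the boolean that the new fork state needs:

```clojure
(condp = :take    :take false :release true) ; => false
(condp = :release :take false :release true) ; => true

;; the whole new fork state is just data built from expressions
(let [fork [2 true] action :take]
  [(first fork) (condp = action :take false :release true)]) ; => [2 false]
```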
Ok, now we’ve got it all. Our philosophers can see the forks, pick them up or release them. Now we need to work on our behavior model, which could potentially be a complex nest of locks and conditionals. Let’s take a look, and I’ll explain:
(defn behave [a id]
  (dosync                              ; Initiate transaction
    (when (pos? (ensure rounds))       ; Is there more food?
      (if (> 5 (rand-int 10))          ; Do I want to eat or think?
        (when (got-forks? id)          ; Are both of my forks available?
          (handle-forks id :take)
          (alter rounds dec)
          (send-off logger debug id "ate " @rounds)
          (handle-forks id :release))
        (send-off logger debug id "thinks " @rounds))
      (send-off *agent* behave id))))  ; Repeat above
The “;” comments almost say it all. Within a transaction we decide what we want to do. If we think, we think, and if we want to eat, we try to get the forks, and if we’re successful we eat. Are you thinking, was that it? :)
Well, yes it is. You see, by ensuring rounds at the beginning of our transaction, we effectively lock that reference against writes from other transactions, giving us time to play. When I modify ‘rounds’ and ‘forks’ from within my loop, Clojure’s STM makes sure that I have a consistent view of the world. This is critical! It means that if I perceive that both of my forks are available and I can eat, I don’t risk that condition being violated before I interact with my references. Everything happens atomically, even if 2 threads try this at the exact same time. Why?
Because time has been cloned
In our normal perception of time, things can happen at the exact same time. In the STM’s perception of time, they can’t. Our program will run similar to this:
t1 = got-forks? yes: grab ’em, eat, release ’em
t2 = got-forks? no: retry
t3 = got-forks? yes: grab ’em, eat, release ’em
On the STM’s clock these actions are in fact atomic, so the dreaded “grab the first fork, somebody else grabs the second fork, wait, wait, waiiiiiit… deadlock” situation can’t take place.
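A minimal throwaway example (the names food and eaten are mine) of the guarantee being used here – two refs changed in one transaction either both commit or the whole transaction retries:

```clojure
;; two refs updated in one transaction always stay consistent
(def food  (ref 3))
(def eaten (ref 0))

(dosync
  (when (pos? (ensure food)) ; protect food from other writers
    (alter food  dec)        ; both alterations commit atomically,
    (alter eaten inc)))      ; or the transaction retries as a whole

[@food @eaten] ; => [2 1]
```

No reader can ever observe food decremented without eaten incremented, which is exactly why a philosopher can never be caught holding only one fork.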
Now the attentive reader might be thinking that this comes with a huge performance penalty, but it doesn’t. The reason is that Clojure’s primary data structures are all immutable – they can’t change. For this reason ‘reader threads’ never wait; they sit and look at their own snapshot of the world, which they pulled out the moment they started their transaction.
It’s true, however, that in scenarios with extreme contention you will see transactions retrying, which comes with a penalty. It’s a challenge to model your architecture correctly to avoid/minimize this, and it’s a challenge for Rich to resolve it quickly when it does happen.
The GIL springs to mind – Python’s Global Interpreter Lock. That’s a lock if I’ve ever seen one. Since Python’s data is mutable, the only way to help your users through concurrency is to restrict access, so in effect the GIL forces all computations to run serially. In the real world a multi-threaded Python application will at best leave your quadcore server with 80% idle time, and at its worst will cripple your computation because the threads are fighting for the GIL’s attention. This is a case of locks gone bad, wherein parallel computing and multi-threading become a bad idea.
In contrast, I’ve seen the STM tested on a 600 core Azul system, where all the cores were working hard and in parallel up until a certain point, where adding more threads caused performance to crater – so that’s a challenge for the STM.
I think it’s interesting how, in the year 2009, we can solve a concurrency problem like ‘Dining philosophers’ without writing manual locks at all. For Clojurians, concurrency is first and foremost a challenge of applying the tools correctly. If you want to give it a whirl yourself, I’ve pasted the entire code below – you can load it in SLIME and it’ll start and finish automatically.
Below you’ll see my solution, but if you have a nice idiomatic take on this problem using another language, you’re very welcome to send it in and I’ll post it here on the site. The only requirements are that it must run flawlessly and not use the GIL 8) Happy hacking!
(def rounds (ref 20000))
(def philosophers (doall (map #(agent %) (repeat 5 0))))
(def forks (doall (map #(ref [% true]) (range (count philosophers)))))
(def logger (agent 0))

(defn debug [_ id msg r]
  (println id \space msg "(" r ")")
  (flush))

(defn my-forks [id]
  (map #(nth (cycle forks) (+ (count forks) %)) [id (dec id)]))

(defn got-forks? [id]
  (every? #(= true (second (deref %))) (my-forks id)))

(defn handle-forks [id action]
  (doseq [fork (my-forks id)]
    (ref-set fork [(first @fork) (condp = action
                                   :take    false
                                   :release true)])))

(defn behave [a id]
  (dosync                              ; Initiate transaction
    (when (pos? (ensure rounds))       ; Is there more food?
      (if (> 5 (rand-int 10))          ; Do I want to eat or think?
        (when (got-forks? id)          ; Are both of my forks available?
          (handle-forks id :take)
          (alter rounds dec)
          (send logger debug id "ate " @rounds)
          (handle-forks id :release))
        (send logger debug id "thinks " @rounds))
      (send-off *agent* behave id))))  ; Repeat above

(doseq [i (range (count philosophers))]
  (send logger debug i "being sent off to dinner" @rounds)
  (send-off (nth philosophers i) behave i))