Yes you read that right. Did you think Fluid Dynamic sims were reserved for the C++ crowd? In this blogpost I'll take a stab at this CPU intensive exercise wielding both Criterium, JVisualVM Profiler and a former Epic Games Engine developer and show how Functional Fluid Dynamic simulations are very possible using Clojure.
Fluid Dynamics equations are notoriously hard to solve. Try to attack that problem from a High Level language like Clojure and you've doubled your trouble. Fortunately for me Per Vognsen could spare a few moments of his time during last week to help me sort out some of the more complicated issues. He has previously worked as an Engine Developer for Epic Games as well as Technology Engineer at NVIDIA - So suffice it to say, this wasn't a first for him. Per said, that Fluid Dynamics starts with reading a paper on the subject by Jos Stam, said to be the best FD guy in the business.
His paper is incredibly approachable even for a non-game developer liFke me and provides source code snippets in the archaic C language. The weird thing about C is that its not even prefixed like Clojure (+ 2 2), instead all operators are scattered between the arguments! Since this is heavy duty math, I then had to go look up the precedence rules for math operators in C and convert it into beautiful prefixed Lisp notation.
Well yes in a way. There are many definitions of what Functional Programming is and Rich Hickey says that an FP language stresses "immutable datastructures and pure functions" - I can give you half. Almost every part of this blogpost is designed to tweak mutable data for optimal performance, but the functions are all pure in that they take a mutable thing and return another mutable thing - The key is, the return values are always used and no side-effects are employed.
Navier-Stokes is a set of equations which describe how densities (the fluid) and velocities (the forces) move and react in a confined grid. The entire concept of Fluid Dynamics in computer graphics (as opposed to accurate analysis) can be cooked down to 3 disciplines:
We'll walk through these one at a time and finally combine them into a working simulator with a Swing GUI. But before we do, there's the tiny detail of performance.
As you all know, Clojure defaults to persistent immutable datastructures but in a way which is very fast. However... Its not nearly fast enough. So we'll just drop down a notch and use transients right? Wrong, I tried transients and they're much too slow. Since performance is crucial for this simulation I'm digging directly into Javas Arrays. And this is where the Clojure compiler shows its young age, as it provides little to no help.
When you're working with Java Arrays there is a performance killer which will slow a routine of a few milliseconds down to several seconds, its called reflection and it happens whenever the compiler needs to go look up which type of object you're sending its way. Normally you can initialize your programs with
(set! *warn-on-reflection* true)
But as Christophe Grand points out in his fantastic blogpost on the matter, the Clojure compiler doesn't support this type of reflection warning... yet. Fortunately Christophe provides 2 very good macros which take away much of the pain of this Java interop, but while his macros are good they are too advanced for a fluid-sim primarily because I know which types I'm sending in every single time so I've adopted them in a 10% faster form, but now hardcoded for 2 indices of the int type and one value of the double type.
(defmacro aget! ([array y] `(aget ~(vary-meta array assoc :tag 'doubles) ~y)) ([array x & ys] `(let [a# (aget ~(vary-meta array assoc :tag 'objects) ~@ys)] (aget! a# ~x)))) (defmacro aset! [array x y v] (let [nested-array `(aget ~(vary-meta array assoc :tag 'objects) ~y) a-sym (with-meta (gensym "a") {:tag 'doubles})] `(let [~a-sym ~nested-array] (aset ~a-sym ~x (double ~v)))))
In case you're wondering why everything is done with macros its simply this: All Clojure functions return some kind of boxed object and if we're ever going to be fast enough we need to work with primitives, ie. no boxing. Macros are a developers best friend.
In response to a previous blogpost I did on Benchmarking, Hugo Duncan mentioned that he was working on a Clojure version of Haskells Criterion, called Criterium and this has been a friend indeed. Criterium benchmarks your functions, measures variance and much much more and its super simple to use.
Diffusion is the most used function in a FD sim, because both densities and forces diffuse, so the function is applied both in solving the density equation and the velocity equation. The main principle is quite simple: In every time-step a cell gives away a portion of its density to its 4 closest neighbors as well as receives part of theirs. Visualized like so:
So basically you have this formula where you add the values of the 5 cells in question multiply the neighboring values by some diffusion factor relative to the boards size and then you divide that number with a constant - Very simple! The trick to being fast is then abstracting away as much of the board I/O as possible, so that I can keep the logic simple and the low-level functions fast. So lets just imagine that we have a macro which takes a board, the description of a cell, like (+ left right) and a lookup board for referencing the neighbor-values as its arguments and then applies that to our board, then we could describe diffusion as such:
(defn diffuse-board [board b diffusion dt] (let [sz (count board) dr (double (* dt diffusion sz sz)) dconst (double (* 4 dr))] (map! (/ (+ self (* dr (+ (+ above below) (+ left right)))) dconst) board)))
First of all, you should know that the + operator works different depending on the number of arguments. 2 arguments might inline where more will call reduce instead:
user> (do (time (dotimes [i 1e6] (+ 1 2 3 4))) (time (dotimes [i 1e6] (+ (+ 1 2) (+ 3 4))))) "Elapsed time: 496.621851 msecs" "Elapsed time: 171.294196 msecs"
Note also that I'm coercing a couple of values to doubles, this is to avoid boxing. Although this description of the diffusion algorithm is simple, its also accurate so if we can just make map! fast enough it would be the best way to go. Here's map!:
(defmacro map! [f lookup] `(let [[w# h#] [(count ~lookup) (-> ~lookup first count)]] (let [board# (make-array Double/TYPE h# w#)] (do-board [w# h#] (let [~'above (aget! ~lookup (unchecked-dec ~'i) ~'j) ~'below (aget! ~lookup (unchecked-inc ~'i) ~'j) ~'left (aget! ~lookup ~'i (unchecked-dec ~'j)) ~'right (aget! ~lookup ~'i (unchecked-inc ~'j)) ~'self (aget! ~lookup ~'i ~'j)] (aset! board# ~'i ~'j ~f))) (set-bounds board# 0))))
As you can see, map! calls the still undefined do-board which walks over all the cells of our array, exposing x and y values as i and j respectively. map! then looks up neighbors and exposes their values as above,below,left,right respectively. Set-Bounds is just a helper function, which makes sure that the borders of the board are kept intact. So we're all set to roll out Criterium and see how fast this puppy goes:
user> (let [dd (new-array 80 80)] (bench (diffuse-board dd 1 2 1))) Evaluation count : 29880 Execution time mean : 2.005787 ms 95.0% CI: (2.005737 ms, 2.005829 ms) Execution time std-deviation : 115.162227 us 95.0% CI: (113.903283 us, 116.213038 us) Found 2 outliers in 60 samples (3.3333 %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
It averages at about 2 milliseconds and that result is 95% undisturbed by variance (good), with only 2 outliers in 60 samples. This is a very good result and similar in speed to Christophes Brians Brain sim, which also did 4 neighbor look-ups per cell in about 2 ms. But while Criterium tells us how fast an algorithm is, it doesn't tell us how fast it can be (though Hugo has been seen downloading the source to JVisualVM, so who knows, it may come). To answer that question we need to re-run diffuse-board while peeking at the inner workings through a profiler. So now I'll start it like you see below, then connect the profiler and let it perform the instrumentation, once thats done I hit ENTER and the actual profiling begins. **IMPORTANT** Do this in the Inferior Lisp buffer and not SLIME.
user=> (let [dd (new-array 80 80)] (read-line) (diffuse-board dd 1 2 3)) ;; ;; hit ENTER 1 time to evaluate the code, then attach the profiler! Profiler Agent: JNI On Load Initializing... Profiler Agent: JNI OnLoad Initialized succesfully Profiler Agent: Waiting for connection on port 5140 (Protocol version: 8) Profiler Agent: Established local connection with the tool Profiler Agent: 250 classes cached. Profiler Agent: 250 classes cached. Profiler Agent: 250 classes cached. Profiler Agent: Redefining 100 classes at idx 0, out of total 1403 Profiler Agent: Redefining 100 classes at idx 100, out of total 1403 Profiler Agent: Redefining 100 classes at idx 200, out of total 1403 ..... Profiler Agent: Redefining 3 classes at idx 1400, out of total 1403 ;; hit ENTER the 2.nd time to pass the (read-line) and diffuse the board! <double><double> ;; NOW STOP THE PROFILER
So now we can have a peek at diffuse-boards internals:
The first 4 lines look great, everything is named with its correct type Object[] for the array, int for the index etc, but at line #5 there's a problem. Numbers.add(object,object) means that 2 Clojure objects are being added after unboxing and this is eating 8% of my CPU time - I dont entirely trust the ms count as everything runs differently once the profiler is attached, but its a clear indication that something can be made faster!
Since it's Numbers.add which is complaining, the problem can only be in the (+ (+ left right) (+ above below)) line since this is the only place where addition is applied. Those 4 cells all get their values from aget! and since the board contains only doubles, the return is expected to be doubles right? Wrong. The problem is the calls to aget! in map!. We know that these return doubles (as the array contains only doubles), but like I said earlier no primitive can cross from one function to another, so Clojure is boxing it for me. A faster map! would then look like so:
(defmacro map! [f lookup] `(let [[w# h#] [(count ~lookup) (-> ~lookup first count)]] (let [board# (make-array Double/TYPE h# w#)] (do-board [w# h#] (let [~'above (double (aget! ~lookup (unchecked-dec ~'i) ~'j)) ~'below (double (aget! ~lookup (unchecked-inc ~'i) ~'j)) ~'left (double (aget! ~lookup ~'i (unchecked-dec ~'j))) ~'right (double (aget! ~lookup ~'i (unchecked-inc ~'j))) ~'self (double (aget! ~lookup ~'i ~'j))] (aset! board# ~'i ~'j ~f))) (set-bounds board# 0))))
And then we can roll out my trusty helper Criterium once again:
user> (let [dd (new-array 80 80)] (bench (diffuse-board dd 1 2 1))) Evaluation count : 93360 Execution time mean : 642.540222 us 95.0% CI: (642.427827 us, 642.661291 us) Execution time std-deviation : 553.156977 us 95.0% CI: (547.242040 us, 559.962465 us) Found 4 outliers in 60 samples (6.6667 %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 2 (3.3333 %) Variance from outliers : 9.4533 % Variance is slightly inflated by outliers
Wow, we're down from 2 milliseconds to 600 microseconds, 400% faster. Notice how much time is saved by wielding both JVisualVM and Criterium? Powerful cocktail.
This little helper is not heavy on board I/O since its basically just trimming edges, but almost every other function calls it at least once so it needs to be super fast. Here's a naive implementation:
(defn set-bounds [arr b] (let [N (int (- (count arr) (int 2))) n1 (unchecked-inc N) z (int 0) cswap! (fn [x y old p] (aset! arr x y (if p (- old) old)))] (loop [i (int 1)] (when (<= i N) (cswap! z i (aget! arr 1 i) (= b 1)) (cswap! n1 i (aget! arr N i) (= b 1)) (cswap! i z (aget! arr i 1) (= b 2)) (cswap! i n1 (aget! arr i N) (= b 2)) (recur (unchecked-inc i)))) (aset! arr z z (* 0.5 (+ (aget! arr 1 z) (aget! arr z 1)))) (aset! arr z n1 (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N)))) (aset! arr n1 z (* 0.5 (+ (aget! arr N z) (aget! arr n1 1)))) (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N)))) arr))
Seems very straight forward - I add a little garbage by defining an fn within the function itself, but its a small price for the simplified array access. Lets test-drive it:
user> (let [dd (new-array 80 80) sources (atom [[40 40]]) vv [(new-array 80 80) (new-array 80 80)]] (bench (set-bounds dd 0))) Evaluation count : 4980 Execution time mean : 12.193434 ms 95.0% CI: (12.192258 ms, 12.194447 ms) Execution time std-deviation : 1.322431 ms 95.0% CI: (1.284516 ms, 1.332446 ms) Found 6 outliers in 60 samples (10.0000 %) low-severe 3 (5.0000 %) low-mild 3 (5.0000 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
12 ms normally wouldn't be a cause of great concern, but in this case its terrible - We've just seen that an entire board can be scanned 5 times in 600 microseconds, so this shouldn't be slower than that. The problem, as you might have guessed, is that the integers passed to cswap! are being reflected, so lets give the compiler a hint:
(defn set-bounds [arr b] (let [N (int (- (count arr) (int 2))) n1 (unchecked-inc N) z (int 0) cswap! (fn [#^Integer x #^Integer y old p] (aset! arr x y (if p (- old) old)))] (loop [i (int 1)] (when (<= i N) (cswap! z i (aget! arr 1 i) (= b 1)) (cswap! n1 i (aget! arr N i) (= b 1)) (cswap! i z (aget! arr i 1) (= b 2)) (cswap! i n1 (aget! arr i N) (= b 2)) (recur (unchecked-inc i)))) (aset! arr z z (* 0.5 (+ (aget! arr 1 z) (aget! arr z 1)))) (aset! arr z n1 (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N)))) (aset! arr n1 z (* 0.5 (+ (aget! arr N z) (aget! arr n1 1)))) (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N)))) arr)) user> (let [dd (new-array 80 80) sources (atom [[40 40]]) vv [(new-array 80 80) (new-array 80 80)]] (bench (set-bounds dd 0))) WARNING: Final GC required 1.3085841065177402 % of runtime Evaluation count : 3449880 Execution time mean : 17.289233 us 95.0% CI: (17.288072 us, 17.290026 us) Execution time std-deviation : 24.574420 us 95.0% CI: (24.128797 us, 24.812547 us) Found 11 outliers in 60 samples (18.3333 %) low-severe 3 (5.0000 %) low-mild 8 (13.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Nice - Down from 12.2 milliseconds to 17 microseconds, incredible speedup. But I cant help but wonder how much faster it would be if let cswap! be a macro instead:
(defmacro cswap! [arr x y old p] `(aset! ~arr ~x ~y (if ~p (- ~old) ~old))) (defn set-bounds [arr b] (let [N (int (- (count arr) (int 2))) n1 (unchecked-inc N) z (int 0)] (loop [i (int 1)] (when (<= i N) (cswap! arr z i (aget! arr 1 i) (= b 1)) (cswap! arr n1 i (aget! arr N i) (= b 1)) (cswap! arr i z (aget! arr i 1) (= b 2)) (cswap! arr i n1 (aget! arr i N) (= b 2)) (recur (unchecked-inc i)))) (aset! arr z z (* 0.5 (+ (aget! arr 1 z) (aget! arr z 1)))) (aset! arr z n1 (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N)))) (aset! arr n1 z (* 0.5 (+ (aget! arr N z) (aget! arr n1 1)))) (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N)))) arr)) user> (let [dd (new-array 80 80) sources (atom [[40 40]]) vv [(new-array 80 80) (new-array 80 80)]] (bench (set-bounds dd 0))) WARNING: Final GC required 1.784657538184101 % of runtime Evaluation count : 4821600 Execution time mean : 12.454852 us 95.0% CI: (12.454578 us, 12.455117 us) Execution time std-deviation : 14.294381 us 95.0% CI: (14.249554 us, 14.361445 us) Found 3 outliers in 60 samples (5.0000 %) low-severe 1 (1.6667 %) low-mild 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
The macro brings it down to 12.45 microseconds about 9794x faster than the unhinted version, and 1.4x faster than the hinted version. In a sim like this all the little numbers quickly add up so its important to keep it as fast as possible at every turn.
The last little helper is do-board. This is quite similar to how you would think about dotimes, but I added it to avoid corner cases, so it starts from 1 and stops at N-1, while all the timing exposing i and j as representatives for the current x,y coordinates.
(defmacro do-board [[w h] & body] `(let [w# (int (dec ~w)) h# (int (dec ~h))] (loop [~'i (int 1)] (when (< ~'i w#) (loop [~'j (int 1)] (when (< ~'j h#) (do ~@body) (recur (unchecked-inc ~'j)))) (recur (unchecked-inc ~'i))))))
Given what we now know about primitives there shouldn't be any surprises and this can walk an 80x80 board and look-up every cell at about 90 microseconds.
Now that we have implemented diffusion, we should be able to add a fluid source to our board and watch if slowly diffuse unto the entire board. Initially I tested this just using an Ascii art renderer, but as I quickly learned (And Per repeatedly mentioned) your tools for testing and debugging are of the ultimate importance. For the same reason I've added a Swing frontend which renders the board and allows single clicks to add a perpetual source. If the addition of the fluid was just a one-off thing, it would quickly fade to black. Here's what it looks like:
I clicked once in each corner and the fluids keeping pumping into the sim, until they finally meet at the middle and then proceed to whiten the entire board. Thats diffusion.
Advection is where we start to get a feel of the physical dynamics, in that it takes the fluid (density) and moves along a grid of force vectors. For the sake of performance I've also used a multi-dim array of doubles to represent the forces - That means 1 array for the vertical forces and another for the horizontal. The math is simple is somewhat simple and the only really concern is that now densities might interact with neighbors farther away than 1 cell, so I narrow the width of the field I'm working in with 1 additional pixel.
(defn advection [densities velocities b] (let [sz (count densities) sz1 (- sz (int 2)) retr (new-array sz sz) half (double 0.5) [u v] velocities] (do-board [sz sz] (let [x (- (int i) (* sz1 (double (aget! u i j)))) y (- (int j) (* sz1 (double (aget! v i j)))) x (cond (< x half) half (> x (+ half sz1)) (unchecked-dec sz1) :else (double x)) y (cond (< y half) half (> y (+ half sz1)) (unchecked-dec sz1) :else (double y)) i0 (int x) i1 (unchecked-inc i0) j0 (int y) j1 (unchecked-inc j0) s1 (double (- x i0)) s0 (double (- (int 1) s1)) t1 (double (- y j0)) t0 (double (- (int 1) t1))] (aset! retr i j (+ (* s0 (+ (* t0 (double (aget! densities i0 j0))) (* t1 (double (aget! densities i0 j1))))) (* s1 (+ (* t0 (double (aget! densities i1 j0))) (* t1 (double (aget! densities i1 j1))))))))) (set-bounds retr b)))
I think this is about as fast as it can go, but if you've noticed something which can be optimized please let me know. One key to speed when working with primitive ints is the unchecked-* functions which go quite a bit faster than your regular increment, decrement, multiply, divide, subtract, negate, remainder and add functions:
user> (let [x (int 1)] (time (dotimes [i 1e9] (inc x))) (time (dotimes [i 1e9] (unchecked-inc x)))) "Elapsed time: 5047.601382 msecs" "Elapsed time: 3520.41237 msecs"
With advection in place, its a simple matter of combining that with diffusion to provide a solution to the density equation:
(defn density-step [d v sources] (when-let [s (seq @sources)] (swap! d add-sources s)) (swap! d diffuse-board 0 1 1) (swap! d advection v 0))
To test advection I've made a few different velocity initializers that apply some kind of force which I can then see the effect of. In the picture below I've used (up-array) which sets all forces to point upwards, so clicking at the bottom make a cone of smoke rise:
The final piece of the puzzle is solving the velocity equation, which fortunately can be done almost entirely using the functions you've seen above. The exception is the fact that the velocity grid actually moves along itself, so that a right-ward force is pushed across the field until fading out and disappearing - That process is called projection. If you email me claming that it could be implemented more elegantly than what I've shown here I won't put up a fight - This works, but it neither elegant nor beautiful
(defn project [velocities] (let [[u v] velocities [w h] [(count u) (-> u first count)] r (double (/ 1.0 h)) p (new-array w h) div (new-array w h) half (double 0.5)] (do-board [w h] (aset! div i j (* (double -0.5) r (+ (- (double (aget! u (unchecked-inc i) j)) (double (aget! u (unchecked-dec i) j))) (- (double (aget! v i (unchecked-inc j))) (double (aget! v i (unchecked-dec j)))))))) (let [div (set-bounds div 0) p (set-bounds p 0)] (do-board [w h] (aset! p i j (/ (+ (+ (double (aget! div i j)) (double (aget! p (unchecked-dec i) j))) (+ (double (aget! p (unchecked-inc i) j)) (double (aget! p i (unchecked-dec j)))) (double (aget! p i (unchecked-inc j)))) (double 4)))) (let [p (set-bounds p 0)] (do-board [w h] (aset! u i j (- (double (aget! u i j)) (/ (* half (- (double (aget! p (unchecked-inc i) j)) (double (aget! p (unchecked-dec i) j)))) r))) (aset! v i j (- (double (aget! v i j)) (/ (* half (- (double (aget! p i (unchecked-inc j))) (double (aget! p i (unchecked-dec j))))) r)))) [(set-bounds u 1) (set-bounds v 2)]))))
Fortunately, it doesn't need to be either of those things in order to work, it just has to be fast:
user> (let [v [(new-array 80 80) (new-array 80 80)]] (bench (project v))) Evaluation count : 10560 Execution time mean : 5.728711 ms 95.0% CI: (5.727787 ms, 5.729790 ms) Execution time std-deviation : 1.725619 ms 95.0% CI: (1.697002 ms, 1.759226 ms) Found 3 outliers in 60 samples (5.0000 %) low-severe 3 (5.0000 %) Variance from outliers : 10.9791 % Variance is moderately inflated by outliers
And clocking in at 5.7 milliseconds per iteration of an 80x80 board I'd say its pretty fast.
Now the density-step has already been shown and the velocity step is a trivial extension of that. Because I need both 1 thread to keep advancing the discrete timeline and another to do the rendering, I'll define yet another macro to simpify this work. This will run until the global atom running is set to false, which it automatically is when the Swing UI is closed:
(defmacro defworker [name args & body] " Defines a worker which discontinues looping his main body if the global atom 'running' is non-true. Exceptions are caught and printed " `(defn ~name ~args (while @running (try ~@body (catch Exception e# (-> e# .getMessage println))))))
When running things in Threads, Clojure tends to fail silently (at least run from SLIME), meaning that you can have processes that break down and die but never tell you about it. By adding the try/catch combo to this macro, you will see a println if something breaks (Thanks Hiredman). The rendering and physics are then plugged in like so:
(defn density-step [d v sources] (when-let [s (seq @sources)] (swap! d add-sources s)) (swap! d diffuse-board 0 1 1) (swap! d advection v 0)) (defn velocity-step [velocities] (let [drefv @velocities [u v] drefv] (reset! velocities [(diffuse-board u 1 1 1) (diffuse-board v 2 1 1)]) (swap! velocities project) (reset! velocities [(advection u drefv 1) (advection v drefv 2)]) (swap! velocities project))) (defworker dynamics [densities velocities sources panel] (velocity-step velocities) (density-step densities @velocities sources)) (defworker render [panel fps] (.repaint panel) (Thread/sleep (* 1000 (/ 1 fps))))
There's nothing fancy about the UI implementation so I'll not explain its implementation. Suffice it to say that you can drag your mouse around the surface to add densitity and force to the simulation. A single click (no motion) adds a perpetual source which constantly emits fluids. A Right-Click renders the velocity grid on top of the densities. But before you go play, lets see how we did performance-wise and ask Criterium to give us the grand-total:
user> (let [dd (atom (new-array 80 80)) sources (atom [[40 40]]) vv (atom [(new-array 80 80) (new-array 80 80)])] (bench (do (density-step dd @vv sources) (velocity-step vv)) :verbose)) i386 Linux 2.6.31-20-generic 2 cpu(s) Java HotSpot(TM) Server VM 14.1-b02 Runtime arguments: -Xms256m -Xmx256m -Dcom.sun.management.jmxremote -Djava.library.path=/home/lau/coding/lisp/tools/penumbra/native/linux/x86/ Evaluation count : 2280 Execution time mean : 26.436417 ms 95.0% CI: (26.435481 ms, 26.437417 ms) Execution time std-deviation : 770.305046 us 95.0% CI: (768.461031 us, 792.481366 us) Found 3 outliers in 60 samples (5.0000 %) low-severe 1 (1.6667 %) low-mild 2 (3.3333 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers nil user> (float (/ 1000 26.43)) 37.835793
There are really cool ways in which you can extend this to run across several processors, but for a single-core Lisp implementation I think 37 FPS is pretty good. Granted there are finer levels of detail you can apply to this algorithm. If you read Mr. Stams paper you'll notice that he applied something called an Gauss-Seidel relaxation twice, which basically means re-run the inner loop in Diffuse-board and the middle loop in Project 20 or so times, which I gracefully skipped. This is what it looks like:
Clojure can be very fast, but the lack of reflection warning and silent failures in threads are annoying. Thankfully Criterium and JVisualVM provide great insight into how the code compiles and with a little tweaking it can be made to go extremely fast. If you're interested in Fluid Dynamics I highly recommend reading Jos Stams paper, after which you can extend these principles to 3D space as well - the possibilities are really endless, but all of them are cool to watch.
I would like to thank Chouser, CGrand and Per Vognsen for sparring with me on this! Thanks for reading!
Code: here