Functional Fluid Dynamics in Clojure

2010-03-29 20:00:28

Yes, you read that right. Did you think fluid dynamics sims were reserved for the C++ crowd? In this blogpost I'll take a stab at this CPU intensive exercise wielding Criterium, the JVisualVM profiler and the advice of a former Epic Games engine developer, and show how Functional Fluid Dynamic simulations are very possible using Clojure.


Fluid Dynamics equations are notoriously hard to solve. Attack that problem from a high-level language like Clojure and you've doubled your trouble. Fortunately for me, Per Vognsen could spare a few moments of his time last week to help me sort out some of the more complicated issues. He has previously worked as an engine developer for Epic Games as well as a technology engineer at NVIDIA, so suffice it to say, this wasn't a first for him. Per said that Fluid Dynamics starts with reading a paper on the subject by Jos Stam, said to be the best FD guy in the business.

His paper is incredibly approachable even for a non-game developer like me and provides source code snippets in the archaic C language. The weird thing about C is that it's not even prefixed like Clojure (+ 2 2), instead all operators are scattered between the arguments! Since this is heavy duty math, I then had to go look up the precedence rules for math operators in C and convert it all into beautiful prefixed Lisp notation.

Functional Fluid Dynamics?

Well yes, in a way. There are many definitions of what Functional Programming is, and Rich Hickey says that an FP language stresses "immutable datastructures and pure functions" - I can give you half. Almost every part of this blogpost is designed to tweak mutable data for optimal performance, but the functions are all pure in the sense that they take a mutable thing and return another mutable thing - the key is, the return values are always used and no side-effects are employed.


Navier-Stokes is a set of equations which describe how densities (the fluid) and velocities (the forces) move and react in a confined grid. The entire concept of Fluid Dynamics in computer graphics (as opposed to accurate analysis) can be cooked down to 3 disciplines:

1. Diffusion
2. Advection
3. Projection

We'll walk through these one at a time and finally combine them into a working simulator with a Swing GUI. But before we do, there's the tiny detail of performance.


As you all know, Clojure defaults to persistent immutable datastructures, implemented in a way which is very fast. However... it's not nearly fast enough. So we'll just drop down a notch and use transients, right? Wrong - I tried transients and they're much too slow. Since performance is crucial for this simulation I'm digging directly into Java's arrays. And this is where the Clojure compiler shows its young age, as it provides little to no help.

Array Access

When you're working with Java arrays there is a performance killer which will slow a routine of a few milliseconds down to several seconds. It's called reflection, and it happens whenever the compiler needs to go look up which type of object you're sending its way. Normally you can initialize your programs with

(set! *warn-on-reflection* true)
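To get a feel for the cost, here's a minimal, illustrative sketch (not from the sim itself; timings are machine-dependent): `o` is the very same array, just laundered through `identity` so the compiler loses track of the type.

```clojure
;; Two accesses to the same double-array: the first is type-hinted so
;; the compiler emits a direct array load, the second goes through an
;; untyped local and falls back to runtime reflection.
(def a (double-array 1000))
(def o (identity a))   ; same array, but the compiler no longer knows its type

(time (dotimes [i 100000] (aget #^doubles a 1)))  ; hinted: fast
(time (dotimes [i 100000] (aget o 1)))            ; unhinted: reflective, dramatically slower
```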

But as Christophe Grand points out in his fantastic blogpost on the matter, the Clojure compiler doesn't support this type of reflection warning... yet. Fortunately Christophe provides 2 very good macros which take away much of the pain of this Java interop. But while his macros are good, they are more general than a fluid-sim needs - I know which types I'm sending in every single time - so I've adapted them in a form that is about 10% faster, hardcoded for 2 indices of the int type and one value of the double type.

(defmacro aget!
  ([array y]      `(aget ~(vary-meta array assoc :tag 'doubles) ~y))
  ([array x & ys] `(let [a# (aget ~(vary-meta array assoc :tag 'objects) ~@ys)]
                     (aget! a# ~x))))

(defmacro aset! [array x y v]
  (let [nested-array `(aget ~(vary-meta array assoc :tag 'objects) ~y)
        a-sym         (with-meta (gensym "a") {:tag 'doubles})]
    `(let [~a-sym ~nested-array]
       (aset ~a-sym ~x (double ~v)))))

In case you're wondering why everything is done with macros, it's simply this: all Clojure functions return some kind of boxed object, and if we're ever going to be fast enough we need to work with primitives, i.e. no boxing. Macros are a developer's best friend.
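To make the boxing point concrete, here's a tiny sketch (again not from the sim): the function version hands back a java.lang.Double no matter what, while the macro expands in place so the arithmetic can stay primitive.

```clojure
;; A fn always returns a boxed Object; a macro expands into the caller
;; so the double math never has to leave the primitive world.
(defn add-fn [a b] (+ a b))
(defmacro add-prim [a b] `(+ (double ~a) (double ~b)))

(class (add-fn 1.5 2.5))  ;; => java.lang.Double - boxed on every call
```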


In response to a previous blogpost I did on benchmarking, Hugo Duncan mentioned that he was working on a Clojure version of Haskell's Criterion, called Criterium, and this has been a friend indeed. Criterium benchmarks your functions, measures variance and much much more, and it's super simple to use.


Diffusion

Diffusion is the most used function in a FD sim, because both densities and forces diffuse, so the function is applied in solving both the density equation and the velocity equation. The main principle is quite simple: in every time-step a cell gives away a portion of its density to its 4 closest neighbors as well as receives part of theirs. Visualized like so:


So basically you have this formula where you add the values of the 5 cells in question, multiply the neighboring values by some diffusion factor relative to the board's size, and then divide that number by a constant - very simple! The trick to being fast is then abstracting away as much of the board I/O as possible, so that I can keep the logic simple and the low-level functions fast. So let's just imagine that we have a macro which takes the description of a cell, like (+ left right), and a lookup board for referencing the neighbor-values, and then applies that to our board. Then we could describe diffusion as such:

(defn diffuse-board
  [board b diffusion dt]
  (let [sz         (count board)
        dr         (double (* dt diffusion sz sz))
        dconst     (double (* 4 dr))]
    (map! (/ (+ self (* dr (+ (+ above below) (+ left right))))
             dconst) board)))

First of all, you should know that the + operator works differently depending on the number of arguments: with 2 arguments it can inline, where more arguments will go through a reduce instead:

user> (do (time (dotimes [i 1e6] (+ 1 2 3 4)))
          (time (dotimes [i 1e6] (+ (+ 1 2) (+ 3 4)))))
"Elapsed time: 496.621851 msecs"
"Elapsed time: 171.294196 msecs"

Note also that I'm coercing a couple of values to doubles; this is to avoid boxing. Although this description of the diffusion algorithm is simple, it's also accurate, so if we can just make map! fast enough it would be the best way to go. Here's map!:

(defmacro map! [f lookup]
  `(let [[w# h#] [(count ~lookup) (-> ~lookup first count)]]
     (let [board# (make-array Double/TYPE h# w#)]
       (do-board [w# h#]
                 (let [~'above  (aget! ~lookup (unchecked-dec ~'i) ~'j)
                       ~'below  (aget! ~lookup (unchecked-inc ~'i) ~'j)
                       ~'left   (aget! ~lookup ~'i (unchecked-dec ~'j))
                       ~'right  (aget! ~lookup ~'i (unchecked-inc ~'j))
                       ~'self   (aget! ~lookup ~'i ~'j)]
                   (aset! board# ~'i ~'j ~f)))
         (set-bounds board# 0))))
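If the deliberate capture of above, left, etc. looks like voodoo, this stripped-down sketch shows the ~' trick in isolation (with-self is a made-up example, not part of the sim):

```clojure
;; ~'self emits the literal symbol `self` into the expansion, so the
;; caller's expression can refer to it - exactly how map! binds
;; above/below/left/right/self for the expression it receives.
(defmacro with-self [v expr]
  `(let [~'self ~v] ~expr))

(with-self 3 (+ self 1))  ;; => 4
```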

As you can see, map! calls the still undefined do-board, which walks over all the cells of our array, exposing the x and y values as i and j respectively. map! then looks up the neighbors and exposes their values as above, below, left and right. set-bounds is just a helper function which makes sure that the borders of the board are kept intact. So we're all set to roll out Criterium and see how fast this puppy goes:

user> (let [dd (new-array 80 80)]
           (bench (diffuse-board dd 1 2 1)))
Evaluation count             : 29880
Execution time mean          : 2.005787 ms  95.0% CI: (2.005737 ms, 2.005829 ms)
Execution time std-deviation : 115.162227 us  95.0% CI: (113.903283 us, 116.213038 us)

Found 2 outliers in 60 samples (3.3333 %)
        low-severe       1 (1.6667 %)
        low-mild         1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

It averages at about 2 milliseconds and that result is 95% undisturbed by variance (good), with only 2 outliers in 60 samples. This is a very good result and similar in speed to Christophe's Brian's Brain sim, which also did 4 neighbor look-ups per cell in about 2 ms. But while Criterium tells us how fast an algorithm is, it doesn't tell us how fast it can be (though Hugo has been seen downloading the source to JVisualVM, so who knows, that may come). To answer that question we need to re-run diffuse-board while peeking at the inner workings through a profiler. So now I'll start it like you see below, then connect the profiler and let it perform the instrumentation. Once that's done I hit ENTER and the actual profiling begins. **IMPORTANT** Do this in the Inferior Lisp buffer and not SLIME.

user=> (let [dd (new-array 80 80)]
         (read-line)
         (diffuse-board dd 1 2 3))
;; hit ENTER 1 time to evaluate the code, then attach the profiler!
Profiler Agent: JNI On Load Initializing...
Profiler Agent: JNI OnLoad Initialized succesfully
Profiler Agent: Waiting for connection on port 5140 (Protocol version: 8)
Profiler Agent: Established local connection with the tool
Profiler Agent: 250 classes cached.
Profiler Agent: 250 classes cached.
Profiler Agent: 250 classes cached.
Profiler Agent: Redefining 100 classes at idx 0, out of total 1403
Profiler Agent: Redefining 100 classes at idx 100, out of total 1403
Profiler Agent: Redefining 100 classes at idx 200, out of total 1403
Profiler Agent: Redefining 3 classes at idx 1400, out of total 1403

;; hit ENTER the 2nd time to pass the (read-line) and diffuse the board!


So now we can have a peek at diffuse-board's internals:


The first 4 lines look great, everything is named with its correct type - Object[] for the array, int for the index etc. - but at line #5 there's a problem. Numbers.add(Object, Object) means that 2 boxed Clojure objects are being added, and this is eating 8% of my CPU time. I don't entirely trust the ms count, as everything runs differently once the profiler is attached, but it's a clear indication that something can be made faster!

Since it's Numbers.add which is complaining, the problem can only be in the (+ (+ above below) (+ left right)) line, since this is the only place where addition is applied. Those 4 cells all get their values from aget!, and since the board contains only doubles the returns are expected to be doubles, right? Wrong. The problem is the calls to aget! in map!. We know that these return doubles (as the array contains only doubles), but like I said earlier no primitive can cross from one function to another, so Clojure is boxing them for me. A faster map! would then look like so:

(defmacro map! [f lookup]
  `(let [[w# h#] [(count ~lookup) (-> ~lookup first count)]]
     (let [board# (make-array Double/TYPE h# w#)]
       (do-board [w# h#]
                 (let [~'above  (double (aget! ~lookup (unchecked-dec ~'i) ~'j))
                       ~'below  (double (aget! ~lookup (unchecked-inc ~'i) ~'j))
                       ~'left   (double (aget! ~lookup ~'i (unchecked-dec ~'j)))
                       ~'right  (double (aget! ~lookup ~'i (unchecked-inc ~'j)))
                       ~'self   (double (aget! ~lookup ~'i ~'j))]
                   (aset! board# ~'i ~'j ~f)))
         (set-bounds board# 0))))

And then we can roll out my trusty helper Criterium once again:

user> (let [dd (new-array 80 80)]
           (bench (diffuse-board dd 1 2 1)))
Evaluation count             : 93360
Execution time mean          : 642.540222 us  95.0% CI: (642.427827 us, 642.661291 us)
Execution time std-deviation : 553.156977 us  95.0% CI: (547.242040 us, 559.962465 us)

Found 4 outliers in 60 samples (6.6667 %)
        low-severe       1 (1.6667 %)
        low-mild         1 (1.6667 %)
        high-mild        2 (3.3333 %)
 Variance from outliers : 9.4533 % Variance is slightly inflated by outliers

Wow, we're down from 2 milliseconds to about 640 microseconds - more than 3x faster. Notice how much time is saved by wielding both JVisualVM and Criterium? Powerful cocktail.


set-bounds

This little helper is not heavy on board I/O since it's basically just trimming edges, but almost every other function calls it at least once, so it needs to be super fast. Here's a naive implementation:

(defn set-bounds [arr b]
  (let [N      (int (- (count arr) (int 2)))
        n1     (unchecked-inc N)
        z      (int 0)
        cswap! (fn [x y old p]
                 (aset! arr x y (if p (- old) old)))]
    (loop [i (int 1)]
      (when (<= i N)
        (cswap! z  i (aget! arr 1 i) (= b 1))
        (cswap! n1 i (aget! arr N i) (= b 1))
        (cswap! i  z (aget! arr i 1) (= b 2))
        (cswap! i n1 (aget! arr i N) (= b 2))
        (recur (unchecked-inc i))))
    (aset! arr z z   (* 0.5 (+ (aget! arr 1 z)  (aget! arr z 1))))
    (aset! arr z n1  (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N))))
    (aset! arr n1 z  (* 0.5 (+ (aget! arr N z)  (aget! arr n1 1))))
    (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N))))
    arr))

Seems very straightforward - I add a little garbage by defining an fn within the function itself, but it's a small price for the simplified array access. Let's test-drive it:

user> (let [dd      (new-array 80 80)
            sources (atom [[40 40]])
            vv      [(new-array 80 80) (new-array 80 80)]]
        (bench (set-bounds dd 0)))
Evaluation count             : 4980
Execution time mean          : 12.193434 ms  95.0% CI: (12.192258 ms, 12.194447 ms)
Execution time std-deviation : 1.322431 ms  95.0% CI: (1.284516 ms, 1.332446 ms)

Found 6 outliers in 60 samples (10.0000 %)
        low-severe       3 (5.0000 %)
        low-mild         3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

12 ms normally wouldn't be a cause of great concern, but in this case it's terrible - we've just seen that an entire board can be scanned 5 times in 600 microseconds, so this shouldn't be slower than that. The problem, as you might have guessed, is that the integers passed to cswap! are being reflected, so let's give the compiler a hint:

(defn set-bounds [arr b]
  (let [N      (int (- (count arr) (int 2)))
        n1     (unchecked-inc N)
        z      (int 0)
        cswap! (fn [#^Integer x #^Integer y old p]
                 (aset! arr x y (if p (- old) old)))]
    (loop [i (int 1)]
      (when (<= i N)
        (cswap! z  i (aget! arr 1 i) (= b 1))
        (cswap! n1 i (aget! arr N i) (= b 1))
        (cswap! i  z (aget! arr i 1) (= b 2))
        (cswap! i n1 (aget! arr i N) (= b 2))
        (recur (unchecked-inc i))))
    (aset! arr z z   (* 0.5 (+ (aget! arr 1 z)  (aget! arr z 1))))
    (aset! arr z n1  (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N))))
    (aset! arr n1 z  (* 0.5 (+ (aget! arr N z)  (aget! arr n1 1))))
    (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N))))
    arr))

user> (let [dd      (new-array 80 80)
            sources (atom [[40 40]])
            vv      [(new-array 80 80) (new-array 80 80)]]
        (bench (set-bounds dd 0)))
WARNING: Final GC required 1.3085841065177402 % of runtime
Evaluation count             : 3449880
Execution time mean          : 17.289233 us  95.0% CI: (17.288072 us, 17.290026 us)
Execution time std-deviation : 24.574420 us  95.0% CI: (24.128797 us, 24.812547 us)

Found 11 outliers in 60 samples (18.3333 %)
        low-severe       3 (5.0000 %)
        low-mild         8 (13.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Nice - down from 12.2 milliseconds to 17 microseconds, an incredible speedup. But I can't help but wonder how much faster it would be if I let cswap! be a macro instead:

(defmacro cswap! [arr x y old p]
  `(aset! ~arr ~x ~y (if ~p (- ~old) ~old)))

(defn set-bounds [arr b]
  (let [N      (int (- (count arr) (int 2)))
        n1     (unchecked-inc N)
        z      (int 0)]
    (loop [i (int 1)]
      (when (<= i N)
        (cswap! arr z  i (aget! arr 1 i) (= b 1))
        (cswap! arr n1 i (aget! arr N i) (= b 1))
        (cswap! arr i  z (aget! arr i 1) (= b 2))
        (cswap! arr i n1 (aget! arr i N) (= b 2))
        (recur (unchecked-inc i))))
    (aset! arr z z   (* 0.5 (+ (aget! arr 1 z)  (aget! arr z 1))))
    (aset! arr z n1  (* 0.5 (+ (aget! arr 1 n1) (aget! arr z N))))
    (aset! arr n1 z  (* 0.5 (+ (aget! arr N z)  (aget! arr n1 1))))
    (aset! arr n1 n1 (* 0.5 (+ (aget! arr N n1) (aget! arr n1 N))))
    arr))

user> (let [dd     (new-array 80 80)
            sources (atom [[40 40]])
            vv     [(new-array 80 80) (new-array 80 80)]]
        (bench (set-bounds dd 0)))
WARNING: Final GC required 1.784657538184101 % of runtime
Evaluation count             : 4821600
Execution time mean          : 12.454852 us  95.0% CI: (12.454578 us, 12.455117 us)
Execution time std-deviation : 14.294381 us  95.0% CI: (14.249554 us, 14.361445 us)

Found 3 outliers in 60 samples (5.0000 %)
        low-severe       1 (1.6667 %)
        low-mild         2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

The macro brings it down to 12.45 microseconds - about 980x faster than the unhinted version, and 1.4x faster than the hinted version. In a sim like this all the little numbers quickly add up, so it's important to keep it as fast as possible at every turn.


The last little helper is do-board. It is quite similar to how you would think about dotimes, but I added it to avoid corner cases, so it starts from 1 and stops at N-1, all the while exposing i and j as representatives for the current x,y coordinates.

(defmacro do-board [[w h] & body]
  `(let [w# (int (dec ~w)) h# (int (dec ~h))]
     (loop [~'i (int 1)]
       (when (< ~'i w#)
         (loop [~'j (int 1)]
           (when (< ~'j h#)
             (do ~@body)
             (recur (unchecked-inc ~'j))))
         (recur (unchecked-inc ~'i))))))

Given what we now know about primitives there shouldn't be any surprises here, and this can walk an 80x80 board and look up every cell in about 90 microseconds.

Test Drive #1

Now that we have implemented diffusion, we should be able to add a fluid source to our board and watch it slowly diffuse onto the entire board. Initially I tested this just using an ASCII art renderer, but as I quickly learned (and Per repeatedly mentioned) your tools for testing and debugging are of the ultimate importance. For that reason I've added a Swing frontend which renders the board and allows single clicks to add a perpetual source. If the addition of the fluid was just a one-off thing, it would quickly fade to black. Here's what it looks like:


I clicked once in each corner and the fluids keep pumping into the sim, until they finally meet in the middle and then proceed to whiten the entire board. That's diffusion.


Advection

Advection is where we start to get a feel of the physical dynamics, in that it takes the fluid (density) and moves it along a grid of force vectors. For the sake of performance I've also used multi-dim arrays of doubles to represent the forces - that means 1 array for the vertical forces and another for the horizontal. The math is somewhat simple and the only real concern is that densities might now interact with neighbors farther away than 1 cell, so I narrow the width of the field I'm working in by 1 additional cell.

(defn advection [densities velocities b]
  (let [sz    (count densities)
        sz1   (- sz (int 2))
        retr  (new-array sz sz)
        half  (double 0.5)
        [u v] velocities]
    (do-board [sz sz]
      (let [x  (- (int i) (* sz1 (double (aget! u i j))))
            y  (- (int j) (* sz1 (double (aget! v i j))))
            x  (cond (< x half)        half
                     (> x (+ half sz1)) (unchecked-dec sz1)
                     :else (double x))
            y  (cond (< y half)        half
                     (> y (+ half sz1)) (unchecked-dec sz1)
                     :else (double y))
            i0  (int x)
            i1  (unchecked-inc i0)
            j0  (int y)
            j1  (unchecked-inc j0)
            s1  (double (- x i0))
            s0  (double (- (int 1) s1))
            t1  (double (- y j0))
            t0  (double (- (int 1) t1))]
        (aset! retr i j
               (+ (* s0 (+ (* t0 (double (aget! densities i0 j0)))
                           (* t1 (double (aget! densities i0 j1)))))
                  (* s1 (+ (* t0 (double (aget! densities i1 j0)))
                           (* t1 (double (aget! densities i1 j1)))))))))
    (set-bounds retr b)))

I think this is about as fast as it can go, but if you've noticed something which can be optimized please let me know. One key to speed when working with primitive ints is the unchecked-* family of functions, which go quite a bit faster than the regular increment, decrement, multiply, divide, subtract, negate, remainder and add functions:

user> (let [x (int 1)]
           (time (dotimes [i 1e9] (inc x)))
           (time (dotimes [i 1e9] (unchecked-inc x))))
"Elapsed time: 5047.601382 msecs"
"Elapsed time: 3520.41237 msecs"

With advection in place, it's a simple matter of combining it with diffusion to provide a solution to the density equation:

(defn density-step [d v sources]
  (when-let [s (seq @sources)]
   (swap! d add-sources s))
  (swap! d diffuse-board 0 1 1)
  (swap! d advection v 0))

To test advection I've made a few different velocity initializers that apply some kind of force whose effect I can then observe. In the picture below I've used (up-array), which sets all forces to point upwards, so clicking at the bottom makes a cone of smoke rise:



Projection

The final piece of the puzzle is solving the velocity equation, which fortunately can be done almost entirely using the functions you've seen above. The exception is the fact that the velocity grid actually moves along itself, so that a right-ward force is pushed across the field until fading out and disappearing - that process is called projection. If you email me claiming that it could be implemented more elegantly than what I've shown here I won't put up a fight - this works, but it is neither elegant nor beautiful:

(defn project [velocities]
  (let [[u v]    velocities
        [w h]    [(count u) (-> u first count)]
        r        (double (/ 1.0 h))
        p        (new-array w h)
        div      (new-array w h)
        half     (double 0.5)]
    (do-board [w h]
       (aset! div i j
              (* (double -0.5) r
                 (+ (- (double (aget! u (unchecked-inc i) j))
                       (double (aget! u (unchecked-dec i) j)))
                    (- (double (aget! v i (unchecked-inc j)))
                       (double (aget! v i (unchecked-dec j))))))))
    (let [div (set-bounds div 0)
          p   (set-bounds p   0)]
      (do-board [w h]
         (aset! p i j
                (/  (+ (+  (double (aget! div i j))
                           (double (aget! p (unchecked-dec i) j)))
                       (+  (double (aget! p (unchecked-inc i) j))
                           (double (aget! p i (unchecked-dec j))))
                       (double (aget! p i (unchecked-inc j))))
                   (double 4))))
      (let [p (set-bounds p 0)]
        (do-board [w h]
           (aset! u i j
                  (- (double (aget! u i j))
                     (/ (* half
                           (- (double (aget! p (unchecked-inc i) j))
                              (double (aget! p (unchecked-dec i) j))))
                        r)))
           (aset! v i j
                  (- (double (aget! v i j))
                     (/ (* half
                           (- (double (aget! p i (unchecked-inc j)))
                              (double (aget! p i (unchecked-dec j)))))
                        r))))
        [(set-bounds u 1) (set-bounds v 2)]))))

Fortunately, it doesn't need to be either of those things in order to work, it just has to be fast:

user> (let [v [(new-array 80 80) (new-array 80 80)]]
          (bench (project v)))
Evaluation count             : 10560
Execution time mean          : 5.728711 ms  95.0% CI: (5.727787 ms, 5.729790 ms)
Execution time std-deviation : 1.725619 ms  95.0% CI: (1.697002 ms, 1.759226 ms)

Found 3 outliers in 60 samples (5.0000 %)
        low-severe       3 (5.0000 %)
 Variance from outliers : 10.9791 % Variance is moderately inflated by outliers

And clocking in at 5.7 milliseconds per iteration of an 80x80 board, I'd say it's pretty fast.

Putting the pieces together

Now, the density step has already been shown and the velocity step is a trivial extension of it. Because I need 1 thread to keep advancing the discrete timeline and another to do the rendering, I'll define yet another macro to simplify this work. Workers defined with it will run until the global atom running is set to false, which happens automatically when the Swing UI is closed:

(defmacro defworker [name args & body]
  " Defines a worker which discontinues looping its main body
    if the global atom 'running' is non-true. Exceptions are
    caught and printed "
  `(defn ~name ~args
     (while @running
            (try ~@body (catch Exception e#
                          (-> e# .getMessage println))))))

When running things in threads, Clojure tends to fail silently (at least when run from SLIME), meaning that you can have processes that break down and die but never tell you about it. By adding the try/catch combo to this macro, you will see a println if something breaks (thanks Hiredman). The rendering and physics are then plugged in like so:

(defn velocity-step [velocities]
  (let [drefv @velocities
        [u v] drefv]
    (reset! velocities [(diffuse-board u 1 1 1) (diffuse-board v 2 1 1)])
    (swap!  velocities project)
    (reset! velocities [(advection u drefv 1)   (advection v drefv 2)])
    (swap!  velocities project)))

(defworker dynamics [densities velocities sources panel]
  (velocity-step velocities)
  (density-step  densities @velocities sources))

(defworker render [panel fps]
  (.repaint panel)
  (Thread/sleep (* 1000 (/ 1 fps))))
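To see the worker machinery in isolation, here's a self-contained toy that repeats the defworker macro from above and stands in a simple counter for the real panel and boards: the worker spins on its own thread until running flips to false.

```clojure
(def running (atom true))

(defmacro defworker [name args & body]
  `(defn ~name ~args
     (while @running
       (try ~@body (catch Exception e#
                     (-> e# .getMessage println))))))

(def ticks (atom 0))

;; A stand-in for the dynamics/render workers: just counts iterations.
(defworker tick-worker []
  (swap! ticks inc)
  (Thread/sleep 1))

(let [f (future (tick-worker))]
  (Thread/sleep 50)       ; let the worker run for a bit
  (reset! running false)  ; ... then signal shutdown
  @f)                     ; blocks until the while-loop exits
```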

Functional Fluid Dynamics

There's nothing fancy about the UI so I'll not explain its implementation. Suffice it to say that you can drag your mouse around the surface to add density and force to the simulation. A single click (no motion) adds a perpetual source which constantly emits fluid. A right-click renders the velocity grid on top of the densities. But before you go play, let's see how we did performance-wise and ask Criterium to give us the grand total:

user> (let [dd      (atom (new-array 80 80))
            sources (atom [[40 40]])
            vv      (atom [(new-array 80 80) (new-array 80 80)])]
        (bench (do (density-step dd @vv sources)
                   (velocity-step vv)) :verbose))
i386 Linux 2.6.31-20-generic 2 cpu(s)
Java HotSpot(TM) Server VM 14.1-b02
Runtime arguments: -Xms256m -Xmx256m -Djava.library.path=/home/lau/coding/lisp/tools/penumbra/native/linux/x86/
Evaluation count             : 2280
Execution time mean          : 26.436417 ms  95.0% CI: (26.435481 ms, 26.437417 ms)
Execution time std-deviation : 770.305046 us  95.0% CI: (768.461031 us, 792.481366 us)

Found 3 outliers in 60 samples (5.0000 %)
        low-severe       1 (1.6667 %)
        low-mild         2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
user> (float (/ 1000 26.43))

There are really cool ways in which you can extend this to run across several processors, but for a single-core Lisp implementation I think 37 FPS is pretty good. Granted, there are finer levels of detail you can apply to this algorithm. If you read Mr. Stam's paper you'll notice that he applies something called a Gauss-Seidel relaxation in two places, which basically means re-running the inner loop in diffuse-board and the middle loop in project 20 or so times - something I gracefully skipped. This is what it looks like:


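For the curious, that relaxation can be sketched like this - a slow, hedged reference version on plain nested vectors rather than the array code above, and strictly speaking a Jacobi-style sweep, since it doesn't update cells in place the way Stam's loops do:

```clojure
;; One relaxation sweep: every interior cell becomes the average of
;; itself and its 4 neighbors (a simplified stand-in for Stam's
;; (x0 + a*neighbors)/(1+4a)); borders are left alone. Repeating
;; the sweep ~20 times lets the solution converge.
(defn relax-step [board]
  (let [n (count board)]
    (vec (for [i (range n)]
           (vec (for [j (range n)]
                  (if (or (zero? i) (zero? j)
                          (= i (dec n)) (= j (dec n)))
                    (get-in board [i j])
                    (/ (+ (get-in board [i j])
                          (get-in board [(dec i) j])
                          (get-in board [(inc i) j])
                          (get-in board [i (dec j)])
                          (get-in board [i (inc j)]))
                       5.0))))))))

(defn relax [board k]
  (nth (iterate relax-step board) k))
```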
Clojure can be very fast, but the lack of reflection warnings and the silent failures in threads are annoying. Thankfully Criterium and JVisualVM provide great insight into how the code compiles, and with a little tweaking it can be made to go extremely fast. If you're interested in Fluid Dynamics I highly recommend reading Jos Stam's paper, after which you can extend these principles to 3D space as well - the possibilities are really endless, but all of them are cool to watch.

I would like to thank Chouser, CGrand and Per Vognsen for sparring with me on this! Thanks for reading!

Code: here

Glen Stampoultzis
2010-03-30 11:14:25
Great work.  It's a shame you have to use so many tricks to get good numeric performance.  Hopefully some of that can be improved going forward.
2010-03-30 11:15:30
Thanks Glen,

The tricks are to be expected from such a young compiler - What I hope people notice is that using macros almost all of that can be abstracted away. I might do a follow up post later, where I actually do that :)
Ruben Berenguel
2010-03-30 12:43:35
Hi, I copy here a comment I posted on reddit, just in case you won't read it there, because I really want to know one detail.

Quoting my own comment from reddit
"Quite a good job on implementation. The problem: I didn't read it in the text, but the images show 80x80 arrays. And I won't put it bluntly... But 80 is quite a small number in a discretisation. The only discretisation problem I solved ever (as an assignment for a course) was at least 1000x1000, and this was just because the teacher didn't want us to bother about specific linear algebra decompositions.

Your work is good, and you have shown that it is possible to do it in clojure... It just won't scale where C can take you when you approach work-scale discretisations, or non-uniform meshes.

But as I have already said two times... It is a good work what you have done.


2010-03-30 13:18:43
Hey Ruben,

The 80x80 board is small and I'm working off a laptop with only 256 MB RAM dedicated to this experiment, so your mileage will vary. Secondly, if you ask Zach Tellman author of Penumbra the OpenGL integration for Clojure, I'm quite sure that he could have this pulling 60+ FPS on a 2000x2000 grid without any problems. Thirdly, if anything, scaling is where Clojure earns its stars as this will go beyond anything C can offer you.
Mark Hoemmen
2010-03-30 17:11:22
A little spelling mistake:  "Gauss-Seidel".

If you are interested in pursuing this further, i would recommend looking at recent work on domain-specific languages and compilers for stencil computations.  See, for example, Shoaib Kamil's recent  papers:

i would also recommend looking at the SISAL compiler:

which was specifically designed for high-performance scientific computations.
Sam Aaron
2010-03-30 17:26:47
Have you considered offshoring some of the calculations to the graphics card? This might be an interesting place to start looking:
2010-03-30 17:27:48
Hey Sam,

Of course - The Brians Brain sim which ran at 1.6ms 80x80 on the CPU goes at 1.6ms 2000x2000 on the GPU - The only thing holding me back is my GPU - or lack thereof :)
Sam Aaron
2010-03-30 22:26:45
Lau, cool. I think it will be exciting times when we can start to rely upon GPUs for certain types of calculation.

BTW, I'm only slowly starting to work my way through this dense but highly interesting article and here's the first of my questions (I hope that you don't mind)...

In your diffuse-board implementation, you define the param list as:   [board b diffusion dt]. I can see board, diffusion and dt being used within the function, but not b. Am I missing something here?

Also, within the same function, where would you expect the vars self, above, below, left and right to be bound?
2010-03-30 22:35:32
Hey Sam,

I'm really looking forward to unleashing the GPU myself.

I don't mind questions at all, but some of the answers might be a little embarrassing as it shows that this is my first FD sim. The algorithm driving diffuse-board is almost directly reused later in the code, so I started out with a more general solution, but because I wanted to show off some of the handiness of macros, ie map!, I ended up only using it for diffuse-board as it's the only function which has predictable neighbor lookups - b then became garbage which I should have removed (it was previously passed to set-bounds). The ~' notation in the macro means that above, left, right, below become literal symbols within the caller. Try running (macroexpand-1 '(map! (+ left right) lookup)) and you'll see how they're bound. The expansion is the code which the compiler sees.
Sam Aaron
2010-03-30 22:52:59
Hey, you should have absolutely no concern of embarrassment - this is really interesting stuff. 

My questions are those from a Clojure novice, so perhaps something that doesn't seem obvious to me is not to you :-)

I think I grok the ~' notation, however I can't yet see where the symbols left, right etc are bound - i.e. in which macro. It seems that from the fn diffuse-board, they must be bound to values outside of the lexical scope of the function. Or maybe not? :-)
Sam Aaron
2010-03-30 22:54:02
Oops, I meant that perhaps something that doesn't seem obvious to me *is* obvious to you :-)
2010-03-30 22:59:19
I hope I'm understanding your question. Everything adheres to lexical scoping. Consider a simpler example

(defmacro outer [& body]
  `(let [~'x 5]
     (do ~@body)))

(defn inner []
  (outer (+ x 5)))

> 10

When the reader walks through that, inner compiles as:
(defn inner []
   (let [x 5]
      (+ x 5)))

The macro (in the original case, map!) binds all the symbols, so by the time my code is calling (+ left right) those are bound within diffuse-board's scope - the reason being that the macro compiles to code (ie clojure-code, not bytecode). That was what I was hoping the macroexpand exercise would show you.

Hope that helps, if not feel free to keep pinging me until I understand the question :)
2010-04-16 21:47:12
Man, you're a genius. Thanks a lot!