beautiful-code-the-manifesto

In times past much has been said about beautiful code. Covering the topics of quality, correctness and expressiveness — more needs to be said. Opinions differ, so how can we judge what quality is and is not? I’ll try to outline some indicators which help pinpoint where on the spectrum your code lies and how to write beautiful code.



Preface

I’ve both written and commented on much code while running the blog. I’ve received both critique and praise. Through many debates it has become clear that we all view code and the quality thereof, differently. Having no common jargon, no definitive ‘language’ for describing code, conclusions are hardly ever reached. This makes for many unproductive discussions.

In as short a space as possible, I’ll try to describe the language which describes programming languages, setting forth a standard. I’ll try to keep it language agnostic, because beautiful code is not restricted to any one language.

” I once lived beside a forest, so I created a language to describe the trees. I recorded the paths of their branches, the patterns made on the ground by their fallen leaves. I mapped their growth, their life, their decline. ”

- The Alchemist, Zach Tellman

Like the Alchemist, we need a language wherein words and expressions take on meaning relevant for code. If you’ve read some of my previous posts you’ll know that I prefer concise code to bloated code, but is conciseness the ultimate goal? If not, we need to connect it with other qualities to get the full view. So lets starts with something tangible and work our way towards the engine room.

Short and to the point

Conciseness must first and foremost be weighed against clarity. By clarity I mean the ease of reading the code by a developer proficient in a given language. Clarity is essential for our micro-understanding of the project and affects the speed with which a team of developers work.

With every line of code comes a…

  • maintenance cost
  • development cost
  • degree of complexity
  • potential for failure/bugs

So every line of code is of course important as well as the number of lines. Counting lines can be fun while golfing, but it’s also a big factor in selecting which tool to use for the job. Using Clojure for outputting the contents of a directory would be overkill compared to shell-script for instance.

But we can’t go to the extreme of saying that we must then always use whatever tool which gives us the most concise code, since projects are their own eco-systems within an eco-system of developers, managers, designers, customers and more. So let’s try to balance the scales.

Example

Imagine a 20x20 grid where we need to compute the number of possible routes (without backtracking) to walk this grid from top left to bottom right. Mathematically, such a problem conforms to this equation

(2n)!
n!*n!

But it is also solvable by a manual gridwalk.

That’s a straight forward problem, which can be solved in a number of ways. If we set conciseness as the primary objective, forgetting the properties that it’s connected with, we come across the language k.

Solution in K:

 1. \p 0 f:*/1.+!: /factorial {f[x]%f[y]*f x-y}[40;20]

Both in counting lines and characters, it just does not get much shorter than that. But weighed against clarity we’ve made an expression which can scarcely be read by the k compiler itself. In a team of developers, code such as this will inevitably lead to misunderstandings, misreading but a single character dramatically changes your perception of the intent/algorithm.

Microsoft released C# (C-Sharp) several years ago to lend some competition to Java and to boost the speed of development. C# is the more concise and .Net integrated brother to C++. One C# developer didn’t believe that the above mentioned equation would compute as efficiently as a manual grid walk in C#, so he solved it in these 35 lines:

Solution in C#:

using System;

namespace Euler {
  class GridPaths {
    ulong[,] countCache = null;

    private GridPaths(int x, int y) {
      countCache = new ulong[x, y];
    }

    public static ulong Solve() {
      return Solve(20, 20);
    }

    public static ulong Solve(int x, int y) {
      GridPaths gp = new GridPaths(x, y);
      return gp.solve();
    }

    private ulong solve() {
      return countPaths(0, 0);
    }

    private ulong countPaths(int x, int y) {
      if (x > countCache.GetUpperBound(0) || y > countCache.GetUpperBound(1)) {
        return 1;
      } else { //not on edge
        if (countCache[x, y] == 0) { //count exists in cache?
          countCache[x, y] = countPaths(x + 1, y) + countPaths(x, y + 1);
        }

        return countCache[x, y];
      }
    }
  }
}
(note: I don’t agree with his performance concern, nor have I tested his code)

The actual logic of the code isn’t hard to follow, but using C# as the tool has added a huge degree of ceremony. For one thing, there’s no other reason to be implementing a class than the fact that this is the only place you can put methods and methods the only place where you can put code in C# — ditto C++ and Java. Besides the ‘class’ overhead, all methods add 2 – 3 lines of ceremony and all types must be explicitly declared.

So those were examples from both ends of the spectrum. It’s reasonable to conclude that the optimal tool and resulting code should be something which neither adds ceremony nor obfuscates our intention. In fact the code should as clearly as possible express just that: intent.

An example of that could be this Clojure snippet:

Solution in Clojure:

(reduce *
    (map / (range 21 41) (range 1 21)))

We have 2 lines, both of which are expressing intent. We have a clear compartmental understanding of our problem/code, greatly minimizing the possibility of error and maximizing our ability to reason about the behavior. Had that taken us 5 – 6 lines to get, it would have been worth it.

We have here 3 very different ways to solve the same problem. k is a mathematical language, not something you would build a software development project with, but I picked it up to demonstrate a principle. Both Clojure and C# are used in industry for both large and small projects.

In the C# example we are looking at a simple mathematical problem. In a real-world scenario this would only be a very small subset of the project as a whole and already we’re approaching 40 lines. It is then no wonder, that C# projects often times fall in the 20.000+ lines of code category — and with every one of those lines comes cost, complexity, probability of failure and again, cost.

Line-count makes a big difference and projects developed using functional programming are smaller than their imperative counterparts by an order of magnitude. Sony Ericsson started using a functional programming language for some of their projects and they saw a productivity increase ranging from 10x — 25x compared to their standard C development. That means worst case a project which is a year in development will only take about 1½ months, and best case 2 weeks. Their code-size shrunk somewhere between 2x — 10x, making it more maintainable. Beautiful.

Compartmentalized

But more important than line count is the compartmental understanding of the problem we are trying to solve. Imagine a problem where the challenge was this:

We have a set of assets that all carry a certain value. We want to walk through our assets calculating their ‘liquid value’ which is given by the production cost and a factor indicating supply/demand, so we know which products to start shipping.

When this task comes up in the first design session your charismatic manager draws out his plan on the white board stating how he wants the module structured:

flowchart

And that’s a certified flowchart, but we all know that although code might behave like that, it won’t look like it — riiight?. So how can we make this happen? An attempt in C# could look like this:

using System;
using System.Collections;

namespace liquidvalue
{
	class MainClass
	{
		public static void Main(string[] args, ArrayList assets)
		{
			ArrayList lvalues = new ArrayList();

		    foreach(AssetClass asset in assets) {
				double liquidValue = asset.SDFactor * asset.productionCost;

				if (liquidValue >= (asset.productionCost * 2))
					lvalues.Add(liquidValue);
			}

			double sumTotal = 0;

			foreach(double lvalue in lvalues) {
				sumTotal += lvalue;
			}

			Console.WriteLine(sumTotal.ToString());
		}
	}
}

This is a halfway house — almost compartmental. Where is the first item from the flowchart, the calculator? Well it’s in the first foreach loop. Where’s the filter? Also in the first foreach loop. And the accumulator is entirely in the last foreach. This is classical imperative programming. Make some lists, mutate them. Although a C# developer easily recognizes the code and can predict the resulting behavior the code is not naturally structured in a way which is analogous to the problem — hence the term ‘code’. The highest form of expression of the solution, would be to write a flow-chart compiler.

In Clojure we can express a solution like so:

(reduce #(+ %1 (:productioncost %2)) 0
     (filter #(> (:liquidvalue %) (* 2 (:productioncost %)))
           (map #(assoc % :liquidvalue (* (:sdfactor %) (:productioncost %))) assets)))

Or when properly refactored

(reduce (sum-field :productioncost) 0
     (filter doubleMargin?
           (map attach-liquid-value assets)))

But think about it. If most mainstream programming ends up looking totally different from flowcharts why do we keep drawing them? Because they provide the shortest route between our logical interpretation of the computing process and it’s written form, meaning flowcharts are intuitive to reason about. So it would be a huge benefit to code that way, because the code would be easier to reason about, therefore more sturdy and easier to maintain. Have a look at how C# (on top) reads compared to Clojure (on the bottom):

flowchartcmp(click to enlarge)

Even having stripped most of the ceremony, I think its obvious that the imperative approach does not come close to the same level of approachability as the functional variant. Functional code almost reads like a flow-chart. Because every line is an expression (not a command), I can skip the intermediate storage of data in variables. Because I don’t use any variables I have a natural ‘glue’ for my data-processing. Since there’s no mutation the code is robust and easy to reason about. There are no side-effects I need to contain or track, no state, just a description of the process. A CS professor once remarked

Imperative programming is telling the computer how to solve a given problem. Functional programming is telling it what the problem is.

That gives functional programming a lot of points in regards to readability and modularity. When we use functional expressions free of side-effects, we can also disregard the order of execution, giving us more ‘glue power’, where on the other hand imperative programs require the strictest attention to call-order, state, etc — adding many pitfalls which lead to bugs and higher maintenance cost and development time.

Avoiding frailty

A few months ago Tim Sweeney, CEO and Founder of Epic Games, gave a demonstration of why he thought the next mainstream language would be more functional than what we’re using now. Lets look at an example from game development:

Transforming vertices:

Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m)
{
  Vertex[] Result = new Vertex[Indices.length];

  for(int i=0; i < Indices.length; i++)
    Result[i] = Transform(m, Vertices[Indices[i]]);

  return Result;
};

Now this is easy to read and the intent is clear. You walk through one array of vertices, transforming them and assigning the result to a new container. But is it safe? If you commit this code to your local SCM then your routine build won’t fail, this compiles just fine. During runtime however the program could die with an exception

If…

  • Vertices is null
  • Indices is null
  • Indices contains values outside ‘vertex’ range
  • m is null

The “i <= Indices.length” could dereference a null pointer, the transformation can throw an ‘out of bounds’ exception.

Because of this imperative approach to programming, everything is volatile. In order to ensure that this method does not fail while the game is running, we need to scatter conditionals throughout, ensuring every operation gets the kind of arguments it expects. In the C# example we saw obvious ceremony but I won’t hesitate to label this hidden ceremony — covering over frail principles.

Here we gain a lot of momentum using sequence abstractions and non-explicit iteration. According to a PhD written at IBM many years ago, about 60%+ of our programs is data processing by traversal of sequences: enumerating, applying functions, filtering etc. So instead of extending the above example into a massive nest of conditionals, let us instead rewrite it to be more expressive:

Clojure:

(defn transform [vertices indices matrix]
    (map #(transform matrix %)
       (map #(nth vertices %)
          (remove nil? indices)))

(pseudo code)

What happens if ‘indices’ is empty? Nothing, map has no work. What if it has empty items? Nothing, remove strips them. There’s still a potential for a runtime error, but so is there in Sweeneys code, so I’ll leave it for the sake of comparison. By adopting a language which is modeled around sequential abstraction many error paths are eliminated, the code is cut down in size and we can safely move on to the next item. By calling ‘remove nil’ we can be certain that all our direct vertex manipulation always has data to work with.

In a lisp there is simply no need to define ‘Result’ and inject transformed values into it, because everything is an expression — they always return the result of their operation. With mutability out of the way, we can start reclaiming the ‘time’ aspect of our programs. When things mutate, state changes and runtime errors follow.

A good question would be, how does the calling code respond if it passes a ‘v’ containing 120.000 vertices, but only gets 119.500 back? That’s a choice the developer will have to consider intelligently, but just that fact alone tells you that we’ve started unifying programming and behavioral logic, stripping out the role of being the compilers tutor. We can now spend those valuable mental resources on better things — ie things which our clients/customers/users actually benefit from. But more on language-motivated design patterns in a later post.

Exposing Mutability

Mr. Sweeneys example is very concise, yet clearly demonstrate the challenges inherent to mutability. Values change, assumptions become wrong, programs die. The vast majority of software bugs come from complexity inherent to mutability. There are 2 ways to contain this.

The first being correct use of (many) conditionals, building fences around the methods handling the volatile values. When we do this, it affects our code by making it

  • bloated — grows enormously in size
  • unreadable — due to the size you easily loose ‘the big picture’
  • costly — ‘quick fixes’ are a thing of the past
  • error prone — when you work to remove errors by adding more code, you add new possibilities of failure

When reading these considerations it’s worth noting that we haven’t even thought about concurrency yet — when we throw that into the mix, everything I just said quadruples. The complexity grows almost exponentially. Epic Games however aren’t rookies, so I think they covered all their bases, letting the final codebase weigh in at 250.000 lines of code! So seeing that option #1 is not really an option if we want beautiful code, lets move on to #2.

In a functional language the default way to handle mutability is: You don’t. Nothing ever mutates, data structures are immutable. Having immutable (& persistent) data structures is a big deal and dramatically changes the way you architect your code. This is also where the learning-curve rears it’s face.

When the default is immutability, we as developers have to manually, intelligently consider edge-cases, making good choices. This again touches on unifying behavioral logic and code, where our focus is now the interaction of various modules and not the interaction between 2 variables.

If we don’t agree on immutability as a necessary default for beautiful code, then there’s a problem. If I look at your variable knowing that even while I am watching it, it could be changing, how can I make a decision based on my perception? Somehow we need to stop the process while I get my perception in the middle, pouring concurrency down the drain. From where I’m sitting that will never be beautiful, it will be frail and unnerving.

Beautiful code is…

So to sum up this first installment of beautiful code. We see that code must be concise. Bloated code obfuscates intent. It must not however be so concise that intent becomes unclear and misunderstandings become inevitable.

It must be expressive, analogous to flowcharts. I used the word ‘compartmental’ because had I said ‘modular’ most people would think OO which would violate both point #1 and #3. Every line of code must speak volumes about the behavior it results in. I like what Christophe once told me

Every line of code, whether it be PHP or Clojure has roughly the same potential for a bug. But 20 lines of Clojure may sometimes translate into 1000 lines of PHP.”

- Christophe Grand

And finally, it must be safe. When our code habitually mutates objects, it becomes frail and dangerous. When you mix and mash sound principles by allowing developers freely to play both teams you’re letting your team loose in a landmine field (see Scala for an example). If your datastructures can be mutated, it is dangerous to assume they won’t be. If our code is to survive a swim in concurrent waters, we must also be explicit about time and state unless we want our f(x) to become f(x,time).

Beautiful code is

Concise — Free from both obvious and hidden ceremony

Expressive — Compartmental in its architecture, showing intent

Safe — Being explicit about state and time, defaulting to immutability




Reddit Tweet this! Bookmark on Delicious Share on HackerNews