ClojureQL - Where are we (going) ?

2009-12-10 23:30:04


ClojureQL is now coming dangerously close to version 1.0. Despite its young age, its already been adopted several interesting places, among others in the Danish Health Care industry. Before we ship version 1.0, I want to walk through some of the features and design decisions as well as encourage comments, criticisms and patches.

(logo: SQL in parenthesis)



Why ClojureQL?

Why do we need another DSL is a good question, to which there is a good answer. If you look through a PHP project for instance, how much of the code is actually PHP? 85%? 65% perhaps? It varies from project to project of course, but its not a big number if 100% is to be our goal and the reason is, PHP has no one-size-fits-all abstraction over the common SQL commands.

Thats a big deal, because at a bare minimum the quality assurance team backing a PHP project has to support 2 languages in detail and although SQL can look simple on the cover don't let that fool you, there are many pitfalls. Then add shell-script for the OS integration and Ruby perhaps for some multithreading tasks and suddenly PHP becomes no more than glue.

ClojureQL enables the Clojurian to stay within his Clojure-domain even when interfacing with his SQL database, greatly simplifying the code-base. Its simpler partly because now you only have Clojure code and not Clojure + SQL, but also because SQL doesn't really exist. If I ask you to write out an SQL statement which creates a table with a single column containing an auto incrementing int, what would you do ?

If you were targeting a MySQL database, you could write

CREATE TABLE table1 (id int NOT NULL AUTO_INCREMENT, PRIMARY KEY ('col'));

But then half-way through the project, the Customer wants to target PostgreSQL instead, so you have to write

CREATE TABLE table1 (id int PRIMARY KEY DEFAULT nextval('serial'));

Or what if one service has to run against Oracle instead, or Derby, or Sqlite ? For all of the servers there are various smaller or larger differences. Some use double quotes ", others single. Sometimes its the entire construction of the statement which needs to be altered. Whatever the issue great or small, it means that you have to review your code and modify it accordingly.

(create-table table1 [id int]
    :non-nulls id, :primary-key id, :auto-inc id)

The code above is simple, flexible and will run on all supported backends. For version 1.0 we want to support 3 backends: MySql, Postgresql and Derby, but its not a complex matter to extend CQL to more than these. By having a modular backend, we've pretty much guaranteed that you can keep your team working in the same domain throughout your databasing needs, which is a big win. If the customer for whatever reason needs to change target or incorporate an entirely new target, you don't need to rewrite your code, ClojureQL will handle that:


cqlstructure

Code stays in the same domain, and the liquid concept of 'SQL code' becomes concrete: Lisp.

Any change in the target destination rarely warrants change in the user-code!


Our approach

Currently we have a nice and uniform frontend, which is what the developers will be working with on a daily basis. Its here you'll find all your SQL-like functions: query (SELECT), insert-into, update, create-table, group-by, etc. The front-end only has 1 job and thats to categorize your input in a format which the backend understands (call it AST).

Lets say I want to pick out the user Frank from my accounts table:

>> (query accounts [username passwords] (= username "frank"))
{:columns [username passwords], :tables [accounts],
 :predicates [= username "?"],  :column-aliases {},
 :table-aliases {}, :env ["frank"]}

As you can see, this doesn't actually do any work. All the information is seperated into keys in a hash-map for later compilation by the backend. In order to compile the statement you need an open connection to a database, because only through that can we determine exactly what the final SQL Statement should look like. You can test it like so

>> (with-connection [c (make-connection-info "mysql" "//localhost/cql" "cql" "cql")]
			 (compile-sql
			  (query accounts [username passwords] (= username "frank")) c))
"SELECT username,passwords FROM accounts WHERE (username = ?)"

That opens and closes a connection in order to compile the statement. ClojureQL always uses parameterized SQL Statements, meaning that variables appear as question marks in the text and are passed as parameters to the actual execution call. If you examine the AST output from Query, you'll see the parameters lined up sequentially in the :env field, in this case just "frank".

Query is pretty flexible accepting symbols like *, custom predicates, column aliases etc etc. Since everything is rolled in macros you have to explicitly tell CQL what you want evaluated:

(let [frank  "john"]
 (with-connection [c (make-connection-info "mysql" "//localhost/cql" "cql" "cql")]
   (prn (compile-sql (query accounts [username passwords] (= username frank)) c))
   (prn (compile-sql (query accounts [username passwords] (= username ~frank)) c))))
"SELECT username,passwords FROM accounts WHERE (username = frank)"  ; finds frank
"SELECT username,passwords FROM accounts WHERE (username = ?)"  ; finds john

All the macros call drivers, so everything is evaluated at run-time instead of compile-time. Oh and by the way: Pulling out and watching a compiled statement is more verbose than just running the code:

>> (run :mysql (query accounts * (= status "unpaid")))
{:id 22 :name "John the Debtor" :balance -225.25}
...

Since all of our statements initially are just AST representations, the way to modify their core behavior with functions such as Group-By or Order-By is to pass the AST around to these functions

(let [select-statement (query accounts *)]
			 (with-connection [c (make-connection-info "mysql" "//localhost/cql" "cql" "cql")]
			   (compile-sql (group-by select-statement username) c)))
"SELECT * FROM accounts GROUP BY username"

That hopefully quickly becomes second nature to you. I've considered rearranging the argument order, to make it easier to read but nothing is decided and I'm open to suggestions.

In broad-strokes we have implemented the following sql-statement-types in the frontend

query
join
ordered-query
grouped-query
having-query
distinct-query
union
intersect
difference
let-query
update
delete
create-table
create-view
drop-table
drop-view
alter-table
batch-statement
raw-statement

If you're already comfortable with SQL you should recognize most of those. I'll explain the exceptions:

Let Query: LetQuery is a way of binding a local to the result of a query, very similar to Clojure's 'let'.

(let-query [password   (query accounts [username password] (= username "frank"))]
    (println "Your password is: " password))

Batch Statement: Batch statements are (as you may have guessed) a cluster of statements which will execute sequentially with a single call to run. Initially we used a batch of create-table + alter-table to be db-agnostic in regards to table-creation, but we later sacrificed this approach by implementing backend modules.

Raw Statement: A Raw Statement is what you use, when you've read all the documentation and frontend.clj, without finding a function that does exactly what you need. It'ins a brute and I hope you won't use it:

(raw "SELECT * FROM accounts USING SQL_NINJAv9x(0x333);")

If you end up using Raw for something which isn't specific to your setup, please drop an email so I can schedule it for assimilation.

And finally, I put alter-table in italics because its not done yet, but  it will be for 1.0. No two SQL implementations even remotely agree on how to use alter and yet I want to expose it to you in a uniform way, so thats a challenge - Input is encouraged.


It's easy to get started!

Documentation is...under way, but generally we try to put all significant functions into the the demos, so that you can see them in action. To get started with something like Derby takes no effort, so I'll show MySql instead:

From your shell:

$ mysql -u root -p

 Enter password: *********

 Welcome to the MySQL monitor.  Commands end with ; or \g.

mysql> CREATE DATABASE cql;

 Query OK, 1 row affected (0,00 sec)

mysql> GRANT ALL PRIVILEGES ON cql.* TO "cql"@"localhost" IDENTIFIED BY 'cql';

 Query OK, 0 rows affected (0,29 sec)


Now we have a database called cql, a user called cql and his password is cql. Then fire up your Clojure REPL

user> (use :reload 'clojureql.demos.mysql)
nil
user> (load-file ".../clojureql/src/clojureql/demos/mysql.clj")
nil
user> (clojureql.demos.mysql/-main)
SELECT StoreName FROM StoreInformation
{:storename "Los Angeles"}
....

The demos all do the same thing, so first you might want to read through demos/mysql.clj to see how we load the driver, intialize the namespace and so on. MySql is special because it shows off a global persistent connection, which means it stays open for as long as your program runs or until you close it. The other demos open/close connections on every call, depending on your project you will prefer one of the two.

Once the driver is loaded and the connection-string defined, nothing seperates the demos for Derby, Mysql or Postgres and so they all load demos/common.clj. Once you've read and understood everything that goes on there, you'll have a good grasp on how to handle most situations with ClojureQL.


Building the thing

For Version 1.0 I want ClojureQL running on a quality build-system and have spent the past couple of days researching what that might be. Currently we are interfacing directly with Ant & Ivy using some complicated XML configurations. This setup goes against every principle I have because of its inflexibility, lack of elegance and complexity.

The 2 systems which I found most interesting, were Gradle and Leiningen. I get the impression that Leiningen is widely adopted in the Clojure Community already and rightfully so. It's a DSL which lets you configure your build using Clojure code and its easy to pick up and build with. On the down-side its very fresh off the press so I would have great reservations using it in a complex scenario of several projects and in case you need some Ruby, Python, whatever code Leiningen can not support you. So I opted for Gradle.

Current users: Please notice that during the move we removed 'dk.bestinclass' from the namespace declarations.

Why is Gradle so groovy?

Gradle is Groovy because it lets you write build-scripts in Groovy, but supports a multitude of languages. Its been around for a while and is now coming into maturity. It gracefully lets you handle multiple projects, projects written in multiple languages, dependencies, distribution, etc. Best of all, my co-pilot on ClojureQL Mr. Meikel Brandmeyer has written a plugin for Gradle called Clojuresque. Clojuresque enables Gradle to read Clojure-code well enough to identify namespaces etc, letting us AOT compile the project neatly into a Jar which is what we need. He's also added support for distributing Jars to the newly started Clojars site (Leiningen does this too). I don't have much to say about the Groovy build-scripts except they beat XML, and if you don't know Groovy is much like Java without much of the boilerplate.

To avoid any confusion, I'll quickly run you through how to install Gradle + Clojuresque. If you don't need this at the moment, feel free to bookmark the page and skip past it.

First pick a directory where you want to setup. Grab the latest Gradle which depends on nothing, it comes with a small Groovy installation:

wget http://dist.codehaus.org/gradle/gradle-0.8-all.zip
unzip gradle-0.8-all.zip && rm gradle-0.8-all.zip

Gradle really only needs your PATH variable to point to its /bin directory, but build-scripts sometimes ask for GRADLE_HOME, so set them up

export PATH=$PATH:/PATH/TO/gradle-0.8/bin
export GRADLE_HOME=/PATH/TO/gradle-0.8

Then with Gradle set up, you need to get Clojuresque and compile it

wget http://bitbucket.org/kotarak/clojuresque/get/v1.1.0.zip
unzip v1.1.0.zip && rm v1.1.0.zip
cd clojuresque
gradle build

-- Massive output, BUILD SUCCESSFUL, jar in: build/libs/clojuresque-1.1.0.jar

clojuresque-1.1.0.jar is all Gradle needs in order to understand your Clojure projects. If all you need Clojuresque for is building ClojureQL then don't bother fetching it, Gradle will handle that automatically once you build ClojureQL.


In closing

Now you've seen a little bit of ClojureQL and I hope it has caught your interest. We're dedicated to making this as stable, elegant and featureful as possible in order to get you talented Lispniks to stop writing SQL - That said, contributions (even if its just ideas) are most welcome. These are the facts you should know

ClojureQL is....


stand
2009-12-11 21:50:51
Thanks for the excellent writeup, Lau. I'm pretty excited about this project and am following closely! It's a bit ironic that we need to do things like this given that SQL was originally supposed to be a universal database communication language.

One thing you don't mention which I think is an advantage of this approach is that since your queries get built at the language level instead of at the string concatenation level it makes it easier to guard against sql injection attacks.

For that reason, I think it might make sense to have some sort of boolean that you can set that disallows (or warns about?) the use of raw though I'm not sure that would really make any difference. It certainly behooves you to be careful about what you allow raw to do when writing code.
Aaron Erickson
2009-12-12 03:39:50
Congrats - you invented Linq to SQL!

More seriously, nice accomplishment.  I love the idea - especially one not controlled by a specific vendor.
Lau
2009-12-12 12:38:25
@stand: Thanks :)

It might be worth considering some type of security module, but as you mention keeping the construction of statements at the language level as well as parameterizing everything does keep you pretty safe.

@Aaron: Thanks.
Meikel
2009-12-14 10:35:57
@Aaron: This project is based a paper about SchemeQL published on October 2002.
Tim Clark
2010-07-28 08:45:40
(justAroundTheCorner? "ClojureQL" "1.0")
com.clojureQL> ???