When the time came for pol.is to move away from our early prototype to a scalable infrastructure capable of city-size conversations, there were some big decisions to be made about what tools we'd use. With big data all the buzz these days, there was also a lot to consider.
For a while now, I've been a big fan of functional programming, largely for the elegance and challenge of it. But when I began looking into which languages we would use, I started wondering what practical advantages functional languages might afford us in our situation. It was then that I discovered this:
Originally there wasn't much interest in FP outside of academic circles because there was no functional killer app. Now we know that [the] killer app is concurrency.
-- Bartosz Milewski
The biggest challenge with concurrency and parallelism is state: if you have mutable data shared between multiple threads, making sure changes to that data are orchestrated safely is a nightmare. The imperative solutions to this problem (typically locks) are generally fraught with performance hits, unpredictability, and the risk of deadlocks. Functional languages simplify this picture by taking mutability out of the equation (or, at the very least, carefully guarding it).
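To make the point about shared state concrete, here is a minimal Java sketch (all class and method names are hypothetical). Several threads read the same immutable list without taking any locks. "Changing" the list means building a new version, so a thread still holding the old version never sees it shift underneath it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ImmutableSharing {
    // Sums data[from, to). Pure: it only reads the shared list and writes nothing shared.
    static long sumRange(List<Integer> data, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) total += data.get(i);
        return total;
    }

    // Splits the list into chunks summed on separate threads. No locks are
    // needed because the list is immutable, so every thread can read it freely.
    static long parallelSum(List<Integer> data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = Math.max(1, (data.size() + threads - 1) / threads);
            for (int start = 0; start < data.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.size());
                parts.add(pool.submit(() -> sumRange(data, from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // "Updating" immutable data builds a new version. The old one is untouched,
    // so threads still reading it never see it change underneath them.
    static List<Integer> withAppended(List<Integer> data, int value) {
        List<Integer> next = new ArrayList<>(data);
        next.add(value);
        return List.copyOf(next);
    }

    public static void main(String[] args) {
        List<Integer> v1 = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> v2 = withAppended(v1, 11);
        System.out.println(parallelSum(v1, 4)); // 55
        System.out.println(parallelSum(v2, 4)); // 66
        System.out.println(v1.size());          // 10: the old version is unchanged
    }
}
```

The imperative alternative would mutate one shared list in place and guard every read and write with a lock. Here the only coordination is collecting each thread's result at the end.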
This really caught my eye. With Moore's Law now delivering more cores rather than faster ones for the foreseeable future, I saw a huge benefit in making full use of these architectures for our computationally intensive math. Excited by what I'd learned, I started thinking about which functional languages we might consider.
Haskell was the first functional language I ever played with, and it is still one of my favorites. Its rich type system and elegant pattern matching immediately appealed to me. Sadly, though, it dropped out of the running before it even made it out of the gate: its austere reputation made my teammates uneasy about building and maintaining a development team around it.
OCaml initially looked like it might be an attractive candidate. I used it some during my time at FHCRC and enjoyed doing so. It also has a reputation for being good at numerical computing, a definite plus. However, perhaps as a consequence of its laxer treatment of state, it has become wedded to a GIL, which greatly restricts its capacity for concurrency. Since concurrency was one of the features I was most excited about leveraging from the functional world, I kept looking.
Python, while not strictly functional, has more of a functional flavor than a lot of languages, and it certainly has a number of attractive characteristics. It's widely used in the scientific computing community, features some really nice numerical tools (numpy, scipy, ipython, pandas, etc.), and is generally quite pleasant. However, it too is bound to a GIL, which forces any parallel computation into separate processes. I also found the multiprocessing library it provides for this to be rather cumbersome and error prone. Other things that turned us away were poor library management and the mess surrounding the glacial adoption of Python 3.
Julia (while also not purely functional) was intriguing. It covers a domain similar to R's, but it offers better performance, concurrency support, and automatic type inference, and it has the feel of a language actually designed for building big systems. All of that made it tempting. Its fatal flaw at the time was simply that it was still a bit too immature. It's definitely something I'll be keeping track of, though.
Scala was a contender for quite a while. It seemed strong numerically, and it had some of the cool typing characteristics of the Haskell/ML camp of languages, concurrency support, JVM interop, and the well-known distributed computing framework Akka. Despite all these strengths, we decided against it because it seemed too lax about state and too multi-paradigm in nature. And while this came out after our decision, it certainly made us feel somewhat better about the direction we took.
And then there was Clojure...
I think I had a feeling from early on that we were going to go with Clojure. Clojure is a JVM-hosted Lisp built from the ground up for highly concurrent and parallel computation. Underlying this strength is a principled emphasis on simplicity, immutable data, and programmatic power. Yet it has remained pragmatic enough to avoid the air of austerity that Haskell has (unfortunately) cultivated.
If there are two resources about Clojure worth looking at, they are these talks from its author, Rich Hickey: Simple Made Easy and Are We There Yet (AWTY). Frankly, I'd recommend them to any programmer, not just those interested in Clojure. The talks barely mention the language, except perhaps in passing once or twice. Instead, they're about philosophical concerns of language design, the problems Rich sees with many existing languages and practices, and his solutions to them. As you learn Clojure, you can see how deeply and cleverly these principles are embedded in the architecture of the language.
Some things that stood out as appealing to us:
- While emphasizing immutable data, Clojure lets you track a succession of states through time via its reference types. These reference types provide semantics for controlled, sane access to data across threads without the insanity of manual locking. Intelligent, principled, pragmatic, all in one. (See AWTY.)
- Tight integration with Storm, which seemed to be well suited to our distribution/computational model.
- Like Scala, Clojure is hosted on the JVM, which offers access to everything anyone could want from that world. For performance in particular, this meant we could always drop down to Java for compute-intensive code if we couldn't get what we needed from pure Clojure.
- As a Lisp, Clojure has near-limitless expressive power, owed in large part to its macro system. Lisp macros are like functions that take code and turn it into other code at compile time. This lets you effectively extend the language with new syntax, binding forms, and control-flow constructs. An impressive example of the power of macros is Clojure's core.async library, which brings many of Go's core concurrency features (channels and lightweight go blocks) to Clojure as an ordinary library.
- The more I got into it, the more fun I had with it. And fun is important.
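The atom, the simplest of the reference types mentioned above, can be approximated in a few lines of Java. This is a hypothetical, simplified stand-in, not Clojure's actual implementation (real atoms also support validators and watches). A single cell holds an immutable value, and updates apply a pure function through a compare-and-set retry loop, roughly the way Clojure's `swap!` behaves. The `countVotes` demo and its "agree" key are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class AtomDemo {
    // Pure update: returns a new immutable map with key's count bumped by one.
    static Map<String, Integer> inc(Map<String, Integer> tally, String key) {
        Map<String, Integer> next = new HashMap<>(tally);
        next.merge(key, 1, Integer::sum);
        return Map.copyOf(next);
    }

    // Many threads record votes into one shared atom; no update is lost.
    static int countVotes(int threads, int votesPerThread) {
        Atom<Map<String, Integer>> tally = new Atom<>(Map.of());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < votesPerThread; i++) {
                    tally.swap(m -> inc(m, "agree"));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return tally.deref().getOrDefault("agree", 0);
    }

    public static void main(String[] args) {
        System.out.println(countVotes(8, 1000)); // 8000
    }
}

// Hypothetical, simplified stand-in for a Clojure atom: a single cell holding an
// immutable value, changed only by applying a pure function to the current value.
final class Atom<T> {
    private final AtomicReference<T> ref;

    Atom(T initial) {
        ref = new AtomicReference<>(initial);
    }

    T deref() {
        return ref.get();
    }

    // Like swap!: compute a new value from the current one and install it with
    // compare-and-set, retrying if another thread changed it in the meantime.
    // Because f may run more than once, it must be free of side effects.
    T swap(UnaryOperator<T> f) {
        while (true) {
            T current = ref.get();
            T next = f.apply(current);
            if (ref.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```

Because the update function may be retried, it has to be side-effect free. That is the same contract Clojure places on functions passed to `swap!`. Clojure's other reference types (refs with software transactional memory, and agents) build richer coordination on similar ideas.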
I would be remiss if I didn't also share the concerns we had with Clojure.
- Adoption: Scala has garnered somewhat better adoption.
- Interop: While Clojure's JVM interop is strong, Scala's more object-oriented nature lets you work more directly with Java's type system.
- Performance: Performance comparisons showed us that Clojure can be quite competitive on various metrics with less code (often within 3x of Java implementations, and sometimes neck and neck). However, it seemed you didn't get this without a bit of extra legwork. Scala, in contrast, seemed to be the sort of language that is pretty fast right out of the gate.
Ultimately, we went with Clojure. Some of Scala's strengths were tied to its biggest weaknesses: the tighter Java interop came with more of a multi-paradigm flavor, which we didn't like (for roughly these reasons). As for performance, we decided that as long as we could speed things up when needed, the ticket was to prototype quickly and spend optimization time only on what actually needed it.
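Dropping down to Java for a hot spot might look something like the hypothetical helper below. `FastMath` and `dot` are illustrative names, not anything from our codebase. The helper works on primitive `double[]` arrays so the inner loop avoids boxing, which is a common source of slowdown in naive numeric code.

```java
// Hypothetical hot-path helper. In a real project this would be declared
// `public final class FastMath` in its own FastMath.java so Clojure can see it.
class FastMath {
    private FastMath() {}

    // Dot product of two equal-length vectors, using primitives throughout.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6})); // 32.0
    }
}
```

Once compiled, made public, and placed on the classpath, the helper could be called from Clojure through ordinary static-method interop, e.g. `(FastMath/dot (double-array [1 2 3]) (double-array [4 5 6]))`. The rest of the code stays in idiomatic Clojure.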
As the title of this post has likely already hinted, I've loved using Clojure. The cleanliness and beauty of functional programming combined with the power of Lisp make for a wonderful programming experience.
Part of what has made learning Clojure so interesting is seeing how different it is from other functional languages, such as Haskell. So much of what I enjoyed about Haskell was its rich and powerful type system and its pattern matching.
In contrast, Clojure eschews rigid typing in exchange for the flexibility of representing data with its core types. While I do sometimes wonder what Clojure would look like as a statically typed language (particularly with regard to performance and programming patterns), I'm glad it has been decisive about what it is and isn't. In the end, it's enlightening to see functional programming from such a different angle.
There are, however, a few projects trying to close this gap: core.typed, Prismatic's schema, some stitching between the two (schema-typer), and core.match (for pattern matching). I haven't spent much time with these yet, but perhaps in another post I'll be able to share some deeper thoughts on them.