Why Elixir May Be Kinetic Data’s Future
Why did we choose Elixir as one of our primary development tools? Several factors went into this decision. It may be helpful to start out with a little history.
The Kinetic Platform is written predominantly in Java. There are a lot of reasons for this: Java is an industry leader, it has a large ecosystem of libraries, and so on. At Kinetic Data, even the parts of our Platform that are written in Ruby technically execute on the JVM using JRuby.
Some people might ask, “Why would you change your mind about Java?” This isn’t really about what’s wrong with Java; it’s about the new tools that exist in the industry to solve different problems.
As we onboard more customers to our cloud offering, and as our customer-managed environments take on ever-higher workflow loads, we need to start thinking about the overall scalability and performance of our Platform. One thing that Java has traditionally made difficult, or at the very least complicated, is concurrency.
When you’re looking for a solution to the concurrency and distributed computing problem, it’s hard to ignore languages like Erlang and Elixir. We constantly see blog posts and articles with success stories from companies using Erlang and Elixir, such as Bleacher Report reducing its server farm from 150 to 8, Discord handling 5,000,000 concurrent connections, and WhatsApp serving 1,000,000 concurrent connections from a single machine.
Do we need this level of scale at this point in time? Probably not. But it doesn’t hurt to be prepared.
What’s more important is that Erlang has been solving this problem effectively since the 1980s. Its creators fundamentally changed the way software executes in order to address this particular problem. That means developers have to adopt a new way of thinking, but that new way of thinking has other tangible benefits as well.
Things we love about Elixir
There’s a lot to love about Elixir. There’s the ever-growing community, the quality of the libraries and frameworks available, the truly impressive standard library, and so much more. But when it comes down to our day-to-day use, these attributes set it apart from most other languages we’ve evaluated: fault-tolerance, functional programming, Actor-based concurrency, and overall performance.
Fault Tolerance
Programming languages tackle the fault-tolerance problem in many different ways. For example, a pure functional language like Haskell attempts to solve it by ensuring that side effects are contained and that everything is strongly and statically typed, in order to reduce or eliminate runtime errors.
Java, on the other hand, relies heavily on exceptions. As long-time Java users, we have noticed a lot of problems and anti-patterns with this approach. For one: how often do we ever do anything meaningful with an exception?
Elixir, however, embraces a completely different mantra: “let it crash.” It does have an analog to the exceptions found in other languages, and when you can affect the outcome of a failure, you should. Ultimately, though, Elixir relies on supervisors, which attempt to recover gracefully from runtime failures by restarting the processes that failed.
As a developer, you can fine-tune this behavior by controlling which processes restart when one crashes, how many restarts are tolerated within a given time window, and more. In fact, applications running on BEAM (the Erlang virtual machine) are typically structured as trees of supervisors, so a failure in one branch is contained instead of crashing the whole VM.
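To make the supervision idea concrete, here is a minimal sketch. It is not taken from our codebase. A hypothetical Counter module, which is an Agent holding a number, runs under a one_for_one supervisor. The script kills the counter process to simulate an unexpected crash, waits for the supervisor to start a replacement, and reports the result. The module names and the polling helper are illustrative assumptions. Agent, Supervisor, and the restart options come from Elixir's standard library.

```elixir
defmodule Counter do
  # Hypothetical example module. `use Agent` generates child_spec/1,
  # so Counter can be listed directly as a supervisor child.
  use Agent

  def start_link(initial) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

defmodule CrashDemo do
  # Starts a supervisor, bumps the counter twice, kills the counter
  # process, and reports what the restarted process looks like.
  def run do
    {:ok, _sup} =
      Supervisor.start_link([{Counter, 0}],
        # :one_for_one restarts only the child that crashed
        strategy: :one_for_one,
        # give up (and crash the supervisor) after 3 restarts within 5 seconds
        max_restarts: 3,
        max_seconds: 5
      )

    Counter.increment()
    Counter.increment()
    before_crash = Counter.value()

    old_pid = Process.whereis(Counter)
    # Simulate an unexpected failure by killing the process outright.
    Process.exit(old_pid, :kill)

    new_pid = wait_for_restart(Counter, old_pid, 100)
    # The restarted child begins again from its start argument (0);
    # state that lived only in the crashed process is gone.
    {before_crash, Counter.value(), new_pid != old_pid}
  end

  # Hypothetical helper: poll until the registered name points at a
  # process other than old_pid (i.e. the supervisor's replacement).
  defp wait_for_restart(name, old_pid, attempts) when attempts > 0 do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_restart(name, old_pid, attempts - 1)
    end
  end
end

result = CrashDemo.run()
IO.inspect(result)
```

The restarted counter comes back with its initial value rather than the value it held before the crash. A restart gives you a clean process, so any state that must survive a crash has to live somewhere else, such as another process, ETS, or a database.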
Functional Programming
Elixir fully embraces functional programming, as opposed to the procedural and object-oriented paradigms that will be more familiar to most developers.
Joe Armstrong, one of the creators of Erlang (the language that Elixir is built on top of), is famously quoted as saying, “The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana, but what you got was a gorilla holding the banana and the entire jungle.”
More often than not, object-oriented programming brings in side effects and state that can obscure the true behavior of an object or function. In functional programming languages, data and functions are first-class. Functions are intended to be pure: whenever a function is called with the same arguments, it returns the same result, without side effects.
This results in predictable, testable, and reusable code. Functional code tends to be simple code. Rich Hickey, creator of Clojure (another functional programming language), discusses the concept of simplicity in a fantastic Strange Loop presentation, and he really hammers the point home in a discussion about HttpServletRequest. He describes how object-oriented programming, more often than not, hides data to its detriment by obscuring it behind classes, which doesn’t necessarily make that data easier to use or understand.
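Here is a small illustration of purity. The Pricing module and its data are hypothetical. The first function depends only on its arguments. The second also reads the system clock, so it is no longer pure. Prices are kept as integer cents and the tax rate as basis points so that the arithmetic stays exact.

```elixir
defmodule Pricing do
  # Pure: the result depends only on the arguments, and nothing outside
  # the function is read or modified. Same inputs, same output, every time.
  # Prices are integer cents; tax_bps is basis points (800 = 8%).
  def total(items, tax_bps) do
    subtotal = Enum.reduce(items, 0, fn {_name, cents, qty}, acc -> acc + cents * qty end)
    subtotal + div(subtotal * tax_bps, 10_000)
  end

  # Impure: the result also depends on hidden state (the system clock),
  # so two identical calls can return different values.
  def total_with_timestamp(items, tax_bps) do
    {total(items, tax_bps), DateTime.utc_now()}
  end
end

items = [{"widget", 250, 2}, {"gadget", 1_000, 1}]

# The list is immutable: computing a total never changes `items`.
first = Pricing.total(items, 800)
second = Pricing.total(items, 800)
IO.inspect({first, second})
```

Because total/2 touches nothing but its inputs, a test only needs to supply arguments and check the return value. No mocks or setup are required.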
Actor-based Concurrency
In more traditional programming languages, you are fairly limited in how you manage concurrent processing. You typically have two choices: threads coordinated with locks, or futures.
For many of us who started our development journey using languages like C and C++, managing threads and locks was a constant battle and a total nightmare. More modern languages like Java and C# made this experience incrementally better, but at heart it remains a difficult problem for programmers, especially in a language that allows data shared by multiple threads to be mutated.
When futures and promises arrived on the scene, the experience got a lot better, but futures are still built on top of threads and bring their own complexity and baggage. Other languages took different approaches to the problem.
Clojure, for example, uses channels (through its core.async library), but to make this work, Clojure had to build certain principles into the language: functional programming and immutability. Scala took inspiration from Erlang and adopted actor-based concurrency. However, because Scala didn’t divorce itself from Java nearly as completely as Clojure did, you get the benefits of actors along with the problems of futures.
Elixir, on the other hand, went all-in on Erlang and its actors. The system is deceptively simple. All code in a running program executes inside a process, and each process has its own process identifier (PID). When processes want to communicate with one another, they never do so directly; they cannot access shared state.
Instead, a process sends a message to the other process’s mailbox. Once the receiving process handles that message, it may send a reply. Since data in Elixir is immutable, data sent to another process cannot be mutated by either side, so independent processes cannot interfere with each other’s functioning. And since each mailbox is handled one message at a time, there is no need for artifacts like locks. The runtime itself communicates with processes through their mailboxes too; for example, a process that monitors another is told that it has exited via a message in its mailbox.
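Here is a minimal sketch of the raw mechanics, using only Elixir's spawn, send, and receive primitives. The Accumulator module is hypothetical. It keeps a running total as the argument of a recursive loop, and the only way to read or change that total is to send the process a message.

```elixir
defmodule Accumulator do
  # Hypothetical example: the total lives only in this process's loop;
  # nothing outside can touch it except by sending a message.
  def start(initial), do: spawn(fn -> loop(initial) end)

  defp loop(total) do
    receive do
      {:add, n} ->
        loop(total + n)

      {:get, caller} ->
        # Reply by sending a message back to the caller's PID.
        send(caller, {:total, self(), total})
        loop(total)
    end
  end
end

pid = Accumulator.start(0)

# Each message lands in pid's mailbox and is handled one at a time.
send(pid, {:add, 5})
send(pid, {:add, 7})
send(pid, {:get, self()})

# Wait (up to one second) for the reply in our own mailbox.
total =
  receive do
    {:total, ^pid, value} -> value
  after
    1_000 -> :timeout
  end

IO.puts("total = #{inspect(total)}")
```

Messages between a given pair of processes arrive in the order they were sent. As a result, the :get request is handled only after both :add messages, with no locking involved.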
The simplicity of this model means that most concurrent behavior can be written so that it reads like procedural code, without needing to manage blocking threads. A “function call” to another process is simply a message sent to its mailbox, followed by waiting for a reply; while the calling process waits, the VM’s schedulers keep running other processes. Erlang provides a framework called OTP that standardizes how we structure logic in processes, using patterns like the GenServer behavior.
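Here is a minimal sketch of the GenServer pattern, modeled on the stack example in Elixir's GenServer documentation. The Stack module is hypothetical. The client functions read like ordinary procedural calls. The callbacks run inside the server process and handle one message at a time.

```elixir
defmodule Stack do
  use GenServer

  # Client API: these look like plain function calls, but each one sends
  # a message to the server process. cast/2 is fire-and-forget;
  # call/2 waits for the server's reply.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  def pop(pid), do: GenServer.call(pid, :pop)

  # Server callbacks: run inside the server process.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast({:push, item}, state), do: {:noreply, [item | state]}

  @impl true
  def handle_call(:pop, _from, [top | rest]), do: {:reply, top, rest}
  def handle_call(:pop, _from, []), do: {:reply, nil, []}
end

{:ok, pid} = Stack.start_link([:a])
Stack.push(pid, :b)
popped = [Stack.pop(pid), Stack.pop(pid), Stack.pop(pid)]
IO.inspect(popped)
```

The stack's state never leaves the server process. Callers only ever see the replies, which is what makes it safe to share one Stack among many concurrent callers.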
Overall Performance
The overall performance of Elixir is very enticing to us. We don’t mean strictly speed of execution, but also developer experience and technical debt. Elixir provides many of the essential elements that make the developer experience better, such as the pipeline operator (|>), supervision trees, and more.
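The pipeline operator is easiest to appreciate side by side. The sketch below computes the same result twice, first with nested calls and then with |>. The word-counting task is just an arbitrary example. The |> operator passes the result of each step as the first argument to the next.

```elixir
text = "the quick brown fox jumps over the lazy dog the end"

# Nested calls read inside-out: split, then count, then sort, then take.
nested =
  Enum.take(
    Enum.sort_by(Enum.frequencies(String.split(text)), fn {_word, count} -> -count end),
    2
  )

# With |>, each step's result becomes the first argument of the next call,
# so the code reads top to bottom in the order it runs.
piped =
  text
  |> String.split()
  |> Enum.frequencies()
  |> Enum.sort_by(fn {_word, count} -> -count end)
  |> Enum.take(2)

IO.inspect(piped)
```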
Elixir and Erlang systems are nearly impossible to crash completely: failures are contained by supervision trees, and even the VM itself can be watched by Erlang’s heart monitor and restarted if it stops responding, except in the most extraordinary of situations. The tools the language provides for distribution and concurrency make concurrent code nearly free to write, in stark contrast to languages like Java.
Managing large pools of threads in Java can have severe negative impacts on performance, while Elixir can comfortably run hundreds of thousands of processes. A freshly spawned Elixir process uses only a few hundred machine words of memory (a few kilobytes on a 64-bit system), plus its state.
By default, without JVM tuning, each Java thread reserves 1 MB of stack on most 64-bit platforms, in addition to its logic and state. Elixir’s built-in preemptive scheduler and the actor model described above make it possible for developers to safely take advantage of this much larger capacity for processes.
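A quick way to see how cheap processes are is to spawn a lot of them. The sketch below starts 100,000 idle processes, each blocked in receive. It then asks the VM, via Process.info/2, how much memory one of them uses. The count of 100,000 is an arbitrary choice that stays under the VM's default process limit. The byte figure it prints will vary with OTP version and platform.

```elixir
# Arbitrary demo size, below the VM's default process limit.
count = 100_000

pids =
  for _ <- 1..count do
    # Each process does nothing but wait until it is told to stop.
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

alive = Enum.count(pids, &Process.alive?/1)

# Memory used by one idle process, in bytes, as reported by the VM.
{:memory, bytes} = Process.info(hd(pids), :memory)

IO.puts("#{alive} processes alive; one idle process uses #{bytes} bytes")

# Clean up: ask every process to exit.
Enum.each(pids, &send(&1, :stop))
```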
It’s Not All Roses
So it sounds like Elixir checks all of the boxes, right? Well… not everything about Elixir is great, and nothing is perfect. There are several concerns from both a language and an ecosystem perspective.
While Erlang has been solving these problems since the 1980s, it is a fairly esoteric language, and it draws much of the same criticism that languages like Clojure do: the style is “foreign” and the syntax is “weird” compared to popular modern languages.
Elixir has gone a long way toward bridging this gap by exposing the power of the BEAM through a Ruby-inspired syntax. Even so, the syntax has to be unique in many ways to accommodate the immutable, functional nature of the underlying VM.
With a new language, new patterns, and a new syntax, the adjustment period for new developers is longer. It is far easier to cross-train someone experienced in C# on Java than on a functional language like Elixir. Additionally, most modern developers are used to, and accept, handling concurrency with things like threads and futures, so adjusting to the actor/process mentality can be a significant conceptual leap.
The Elixir language is built on a macro system, which allows the language itself to be written nearly entirely in its own syntax. This macro functionality lends a lot of power to the language and can be tremendously useful.
Macros, though, come with a tremendous amount of responsibility, and other languages show what can happen when language-extension features are overused. Scala, for example, allows developers to create custom operators, which are frequently used to build bespoke domain-specific languages (DSLs). Eventually the sheer number of custom operators becomes a mental burden of its own, and one that is often poorly documented.
For this reason, languages like Elm eventually eliminated the ability for developers to define custom operators, preferring plain function calls instead. Elixir has a lot of bespoke DSLs defined by libraries, often with the best of intentions. For example, Ecto lets you write SQL queries as Elixir code using macros, and there’s certainly some magic occurring there, with the goal of simplifying the experience.
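To show the kind of “magic” a macro can perform, here is a hypothetical with_retries macro, taken from neither Ecto nor any other library. At compile time, it receives the block you pass it as quoted code (an AST). It then rewrites that block into a call that re-runs it whenever it raises. The last lines use Macro.expand_once/2 to print the code the macro generates. This is a handy way to see through a DSL when an editor cannot.

```elixir
defmodule Retry do
  # Hypothetical macro: wraps a block so that it is re-run if it raises,
  # up to `times` attempts in total. `block` arrives as quoted code (an AST),
  # not as an evaluated value, so the macro can wrap it in an anonymous function.
  defmacro with_retries(times, do: block) do
    quote do
      Retry.__run__(unquote(times), fn -> unquote(block) end)
    end
  end

  # Runtime helper called by the generated code. On the last attempt,
  # any exception propagates to the caller.
  def __run__(1, fun), do: fun.()

  def __run__(attempts, fun) when attempts > 1 do
    try do
      fun.()
    rescue
      _error -> __run__(attempts - 1, fun)
    end
  end
end

require Retry

# Simulate a flaky operation that fails twice before succeeding,
# counting attempts in the process dictionary.
Process.put(:attempts, 0)

result =
  Retry.with_retries(5) do
    attempt = Process.get(:attempts) + 1
    Process.put(:attempts, attempt)
    if attempt < 3, do: raise("flaky failure #{attempt}"), else: {:ok, attempt}
  end

IO.inspect(result)

# Print the code the macro expands to at compile time.
ast = quote do: Retry.with_retries(2, do: :ok)

ast
|> Macro.expand_once(__ENV__)
|> Macro.to_string()
|> IO.puts()
```

Nothing at the call site shows that the block is wrapped in a function and retried. That is precisely the convenience, and the documentation burden, that macro-based DSLs bring.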
Libraries that use macros for their own DSLs need great, thorough documentation, because current language servers (LSPs) aren’t very good at determining what macros are doing during development, so the in-editor experience suffers. And when things go wrong, macros can make it very difficult to navigate the stack trace to find the root cause of an error.
Finally, as a “recovering Java shop” using React as our front-end, we have been quite spoiled by the amount of talent available in the market when we are looking to add staff. Java and JavaScript have been around a long time, and React is arguably the most popular JavaScript framework. Making a new language a primary part of our toolbelt carries risk. Will there be enough people looking for work? Will they be experienced enough to be productive in some of our most complex code?
Why Now?
So why wait so long to make this move? Why not use Erlang before Elixir existed? Why not adopt Elixir sooner than we did? Well, there are a lot of reasons, many of them discussed above. But ultimately, one of the biggest barriers to adopting new languages is customer adoption.
Historically, our software has been installed in customer-managed environments. This means the customer provides a server, and often a Java application server or servlet container like WebLogic or Tomcat, and we have to play nicely with a multitude of potential customer configurations. It also means the customer has to be willing to install something new, and when we first experimented with Elixir, it was very new.
We knew that we needed more flexibility, both in terms of infrastructure and in terms of horizontal scalability for our cloud offering, kinops.io. That journey was a long one, with many solutions attempted, but ultimately we landed on Kubernetes to fill these gaps.
The beauty of a containerized environment is that we, Kinetic Data, can be responsible for everything happening inside of it. The customer no longer needs to harden and secure the environment or keep software and operating system versions up to date. Because responsibility for everything running within the platform has shifted to us, we now also have the opportunity to use tools that we previously could not adopt without significant challenges.
This means that Elixir may be our first great experiment in using new tools for our platform—but it certainly won’t be the last.