What Elm and Rust Teach us About the Future

· 2017/02/08 · 17 minute read

So I recently started programming in Elm and also did some stuff in Rust. Honestly, it was mostly a hype-driven decision, but the journey was definitely worth it. I also noticed that although those two languages differ in their target audience and use cases, they made many similar design decisions. I think this is no coincidence. It is well possible that ten years from now, both Elm and Rust will be forgotten, but I am quite sure that the ideas they are built upon will be present in the languages we will use by then. This is a post about the ideas I find charming in Elm and Rust.

A quick disclaimer first: I am no expert in either language and while I am starting to feel comfortable in Elm, I am undoubtedly a Rust beginner, so please correct me if I am doing injustice to any of the languages.

Setting the Scene

Rust is a systems language which aims to compete with C++. Rust values performance, concurrency, and memory safety, but is not garbage-collected. Rust compiles to native binaries, not only for the major x86/x64 platforms but also on ARM and even certain ARM-based microcontrollers.

Elm is a language for web apps competing with Javascript in general and the virtual DOM frameworks in particular (e.g. ReactJS). Elm compiles to Javascript, is garbage collected and purely functional. Elm values simplicity and reliability.

Both languages are already usable for actual projects, but the ecosystems are still immature and the languages themselves are still evolving.

While I like both of the languages, I do not intend to limit this post to the positive sides and will also mention what are (to me) the pain points. I will start with the ideas the languages have in common, and will give more details about either language later.

Common Themes

The features described here are mostly nothing completely new and could be found in languages like OCaml, Haskell and F#. The interesting part is that Elm and Rust prove they are useful for quite diverse use-cases.

Tagged unions

This is a small but very practical feature - I would say tagged unions are enums on steroids. Consider, how often did you write something like:

enum AccountType {Savings, CreditCard};     

//In real code please use Decimal types to represent money. Please.
class CreditParams {
    int creditLimit; 
    ...
}

class Account {
    AccountType accountType;
    int balance;

    //only present for CreditCard, always null for Savings
    CreditParams creditParams;          
}       

This makes room for some sweet bugs, as your data model can represent a state that should be impossible (savings account with non-null credit parameters, or a credit card account with null credit parameters). The programmer needs to take care that no manipulation of the Account object can lead to such a state which may be non-trivial and error-prone. It also creates ambiguity - for example, there are multiple ways to get the credit limit of an account:

//Yes I know, this should be a class method
int getCreditLimit1(Account account) {
    if (account.creditParams != null) { 
        //wrong if account.accountType == Savings
        return account.creditParams.creditLimit;
    } else {
        return 0;
    }
}

int getCreditLimit2(Account account) {
    if (account.accountType == CreditCard) { 
        //possibly accessing a null pointer
        return account.creditParams.creditLimit; 
    } else {
        return 0;
    }
}

A more desirable option is to make impossible states impossible. Tagged unions let you do this by attaching heterogenous data to each variant. This lets us rewrite the data model as (Rust syntax, try it online):



struct CreditParams {
    credit_limit: i32, //i32 is a 32bit signed int
	...
} 

enum AccountDetails {
    Savings, //Savings has no attached data
    CreditCard(CreditParams), //CreditCard has a single CreditParams instance
}

struct Account { 
    balance: i32, 
    details: AccountDetails,
  }
  

With tagged unions, you cannot access the attached data without explicitly checking the type - so there is only one way to get the credit limit and it is always correct (Rust syntax, try it online):

fn get_credit_limit(account: Account) -> i32 {
    match account.details { //match is like case
        AccountDetails::CreditCard(params) =>  //bind local variable params to the attached data
            params.credit_limit,    //in Rust, return is implicit
        AccountDetails::Savings => 
            0
    }
}

Since both Elm and Rust don’t have null values, you have to specify CreditParams when building an AccountDetails instance, and so the code above is safe in all situations.

A further bonus is that in both Elm and Rust, you have to handle all possible cases of a tagged union (or provide a default branch). Failing to handle all cases is a compile-time error. In this way, the compiler makes sure that you update all your code when you extend the AcountDetails.

Type Inference

Some people are fond of static typing as it is harder to write erroneous code in statically-typed languages. Some poeple like dynamic typing, because it avoids the bureacracy of adding type annotations to everything. Type inference tries to get the best of both worlds: the language is statically typed, but you rarely need to provide type annotations. Type inference in Rust and Elm works a bit like auto in C++, but it is much more powerful - it looks at broader context and takes also downstream code into consideration. So for example (Rust syntax, try it online)

// The compiler infers that elem is a float.
let elem = 1.36;

//Explicit type annotation - f64 is a double precision float
let elem2: f64 = 3.141592;

// Create an empty vector (a growable array).
let mut vec = Vec::new();
// At this point the compiler doesn't know the exact type of `vec`, it
// just knows that it's a vector of something (`Vec<_>`).

// Insert `elem` and `elem2` in the vector.
vec.push(elem);
vec.push(elem2);
// Aha! Now the compiler knows that `vec` is a vector of doubles (`Vec<f64>`)

//The compiler infers that s is a &str (reference to string)
let s = "Hello";

//Compile-time error: expected floating-point variable, found &str
vec.push(s); 

Type inference in Rust has certain limitations and so explicit type annotations are still needed now and then. But Elm goes further, implementing a variant of the Hindley-Milner type system. In practice this means that type annotations in Elm are basically just comments (except some weird corner cases). While the Elm compiler enforces that type annotations match the code, they can be omitted and the compiler will still statically typecheck everything. Nevertheless, it is a warning to not annotate your functions with types, as type annotations let the compiler give you better error messages and force you to articulate your intent clearly.

Immutability

Immutability means that variables/data cannot be modified after initial assignment/creation. Another way to state it is that operations on immutable data can have no observable effect except for returning a value. This implies that functions on immutable data will always return the same value for the same arguments. Code working with immutable data is easier to understand and reason about and is inherently thread-safe. Consider this code with mutable data:

address = new Address();
address.street = "Mullholland Drive";
...
person = new Person();
person.primaryAddress = address;
print(person.primaryAddress.street) //Mullholland Drive
...
address.street = "Park Avenue"
...
print(person.primaryAddress.street) //Park Avenue

Now let’s say we want to figure out why person.primaryAddress.street changed. Since the data is mutable, it is not sufficient to find all usages of person.primaryAddress - we also need to check the whole tree of all variables/fields that were assigned to/from person.primaryAddress. With immutable data structures this problem is prevented as the programmer is forced to write something like:

address = new Address("Mullholland Drive", 1035, "California");
//Elm and Rust also support syntax of the form:
//address = { street = "Mullholland Drive", number = 1035, state = "California" }
...
person = new Person(address);
...
address.street = "Park Avenue" //not allowed, the object is immutable

For a more detailed discussion of why immutability is good, see for example 3 benefits of using Immutable Objects.

Elm goes all-in on immutability - everything is immutable and no function can have a side effect. Rust is a bit more relaxed: in Rust, you have to opt-in for mutability and the compiler ensures that as long as a piece of data can be changed within a code segment (there is a mutable reference to the data), no other code path can read or modify the same data.

The Problem with Immutability

Making sure that the data you are referencing cannot change without your cooperation generally makes your life easier. Unless this is EXACTLY what you want to achieve. Let’s say you are writing a traffic monitoring tool. You might want to model your data like this (Elm syntax):

-- In Elm, double dash marks a comment
type alias City =         --Curly braces declare a record type, a bit like an object
  { name: String
  , routes: List Route    --list of Route instances
  }

type alias Route =
  { from: City
  , to: City
  , trafficLevel: Float
  }

type alias World =
  { cities: List City
  , routes: List Route
  }

You may expect that when you receive new traffic information, you simply work with World.routes and the changes will be seen when accessing through City.routes. But you would be mistaken. In Elm this will not even compile (fields in record types are fully expanded at compile time, and thus cannot have circular references). And if you use tagged unions to make the model compile, the trafficLevel accessed via World.routes may not be the same as one accessed via City.routes, as those always behave as different instances.

A similar data model in Rust will compile but it will be difficult to actually instantiate the structure and you won’t be able to ever modify the trafficLevel of any Route instance, because the compiler won’t let you create a mutable reference to it (every Route is referenced at least twice).

This brings us to a less talked-about implication of immutability: immutable data structures are inherently tree-like. In both Elm and Rust, it is a pain to work with graph-like structures and you have to give up some guarantees the languages give you.

In Elm, the only way to represent a graph is by using indices to a dictionary (map) instead of direct references. For the above example a practical data model could look like:

type alias RouteId = Int    -- New types just for clarity
type alias CityId = Int 

type alias City =
  { id: CityId 
  , name: String
  , routes: List RouteId 
  }

type alias Route =
  { id: RouteId
  , from: City
  , to: City
  , trafficLevel: Float
  }

type alias World =
  { cities: Dict CityId City     --dictionary (map) with CityId as keys and City as values
  , routes: Dict RouteId Route
  }

Notice that nothing prevents us from having an invalid RouteId stored in City.routes. While Elm gives you good tools to work with such a model (e.g., it forces you to always handle the case where a given RouteId is not present in World.routes), and the advantages for every other use case make this an acceptable cost, it is still a bit annoying.

Rust has a bit more options to work with graph-like data, but they all have downsides of their own (using indices, StackOverflow discussion, graphs using ref counting or arena allocation).

Smart but Restrictive Compilers

This is basically a generalization of the previous specific features. The compilers for Elm and Rust are powerful and they do a lot of stuff for you. They not only parse the code line-by-line, but they reason about your code in the context of the whole program. However, the most interesting thing about compilers for Rust and Elm is not what they let you do. It is what they DO NOT let you do (e.g., you cannot mix floats and ints without explicit conversion, you cannot get to the data stored in an tagged union without handling all possible cases, you cannot modify certain data etc.). At the same time, the compilers are smart enough to make conforming to these restrictions less of a chore. If you think that programmers will produce better code when given fewer limitations, think of the time people complained that restricting the use of GOTO hinders productivity.

Another way to formulate this stance is that languages should not strive to make best practices easy as much as they should make writing bad code hard. I think both languages achieve this to a good degree - writing any code is a bit harder than in their less restrictive relatives, but there is much less incentive to take shortcuts.

In practice, smart but restrictive compilers mean more time spent coding and less time spent debugging. Since debugging and reading messy code can be very time-consuming, this usually results in a net productivity gain. Personally, I love writing code, while debugging is often frustrating, so to me, this is a sweet deal.

Needless to say, all those restrictions make hacking one-off dirty solutions in Rust or Elm slightly annoying. But what code is truly one-off?

Style matters

The communities of both Elm and Rust make a big push for consistent presentation of source code. At the very least, this reduces the need for lengthy project-specific style guidelines at every team using the language. To be specific, Elm compiler enforces indentation for certain language constructs, does not allow Tabs for identation(!) and enforces that types begin with an upper-case letter while functions begin in lower-case. Further, there is elm-format, a community-endorsed source formatter.

In a similar vein, Rust compiler gives warnings if you do not stick to official naming conventions and also provides a community-endorsed formatter rustfmt.

More About Elm

Now is the time to talk about the languages individually, if you are still interested. We will take Elm first. Elm is a simple, small language. The complete syntax can be documented on a single page. Elm aimes at people already using Javascript and strives for low barrier of entry. Elm is currently at version 0.18 and new releases regularly bring backwards-incompatible changes (although official conversion tools are available). An interesting thing is that over the last few versions more syntax elements were removed than added, testifying to the focus on language simplicity.

Elm is purely functional. This means there are no variables in the classical sense, everything is a function. How does an application evolve over time if there are no variables? This is handled by The Elm Architecture (TEA). On the most simplistic level, an Elm application consists primarily of an update function and a view function. The update function takes a previous state of the application and input from the user/environment and returns a new state of the application. The view function than takes the state of the application and returns a HTML representation. All changes to the application state thus happen outside of Elm code, within the native code in TEA. The architecture also provides the necessary magic to correctly and efficiently update the DOM to match the latest view result.

TEA forces you to explicitly say what constitutes the state of your application and its inputs. This lets Elm to provide its killer feature: the time-travel debugger. In essence, when the debugger is turned on, you can replay the whole history of the application and inspect the application state at any point in past. And due to the way the language is designed, it works 100% of the time.

Another big plus of TEA is that you never have to worry about forgetting to hide an element when the user clicks a checkbox. If your view function correctly displays the element based on the current application state, the element will also be automatically hidden once the application state changes again.

Further sweet things about Elm is the effort to have nice and helpful error messages, with a dedicated GitHub repository for suggesting error message improvements. Also the record system which gives you a lot of freedom in using structured types (e.g., you do not have to declare them before use), but at the same time is statically checked for correctness.

Pain Points in Elm

A big downside of TEA is that it assumes that all state of the application can be made explicit. This makes working with HTML elements that have a state of their own tricky in certain contexts (e.g., text area contents, caret position in text areas, Web Components). You need care to prevent TEA from messing with such components destructively. Further, TEA can be resource intensive, albeit less than comparable JS frameworks. Last but not least, creating large apps in Elm involves writing a significant amount of boilerplate code. The Elm community is still discussing how to develop large projects more easily.

More About Rust

Whoa, that’s a lot of new syntax! (Rust book, section 4.34 on Macros )

In comparison with Elm, Rust is quite the beast. There is a lot of syntax and a lot of things to learn. This is however not unexpected: if you want to write fast code, you really need a lot of control. Also, C and especially C++ also have loads of syntax, so Rust is definitely not at a big disadvantage here. Rust is currently at version 1.15 and has forward compatibility guarantees.

While Rust is imperative, it took in a lot of useful functional programming concepts and boasts zero cost abstractions - i.e. that all the fancy syntactic tricks that let you develop code easily incur no actual performance penalty in comparison with a hand-tuned but dirty solution.

Rust also has no OOP of the usual kind, instead it has traits (a bit like interfaces) and deliberately avoids inheritance (you should compose instead).

The weirdest and most interesting part of Rust is the borrow checker. While Rust does not have managed memory (garbage collection), it can still guarantee that you cannot access uninitialized memory, dereference a null pointer or otherwise corrupt your memory. This has big implications not only for reliability but also for security, as Rust automatically prevents whole classes of severe attacks as buffer overflow or Heartbleed (blog post). Rust also prevents most (but not all) memory leaks. The borrow checker is what enables a big portion of those guarantees by validating that your program accesses memory correctly at compile time, i.e. without the runtime penalty of managed memory. The borrow checker ensures that a mutable reference to a piece of data cannot coexist with any other reference (and thus that you cannot free memory while holding a reference to it). For some intuition, mutable references in Rust behave a bit like std::unique_ptr in C++ (specs), but with the uniqueness enforced at compile-time. More detailed description could not fit here, so check Rust by Example or just Google away :-).

Pain Points in Rust

The borrow checker is both the biggest strength and the biggest weakness of Rust. Although the Rust community took a lot of effort to make most code just work, you inevitably end up fighting the borrow checker. There are some promising updates to the borrow checker in the pipeline that could make the life of Rust programmer easier, but it will not be cakewalk anytime soon - making the compiler understand your program is hard (both for the programmer and for the compiler).

While Rust takes performance seriously and the compiler should in theory be able to do a lot more optimizations than C/C++, Rust is not quite there yet. Benchmarks I’ve seen put it equal or slightly behind C/C++ on gcc (e.g. Benchmarks game). From my memory gcc also used to produce slower code than MSVC or the Intel compiler which would be bad news for Rust. The Internet however suggests that recent gcc is on par with MSVC/Intel, but I was unable to find any good benchmark link.

Development in Rust also still has some rough edges, IDE support is incomplete - setting up a decent debug environment maybe as much as a 14-step process and still the features are limited.

Concluding

The same way functional programming has made its way from fringes to being included in mainstream languages, I believe the features that make both Elm and Rust interesting will show up in the mainstream. Some of the ideas can also be immediately transferred to the current languages (e.g. ImmutableJS). I think the take-home message of this post is that you should consider learning a new language. Preferably one that is very different from what you have been working with so far. No only it is fun, it will make you a better programmer in your language of choice.

I’ll be very happy if you provide your feedback on this post either here, on my Twitter or on Reddit.


All content is licensed under the BSD 2-clause license. Source files for the blog are available at https://github.com/martinmodrak/blog.