Sam Everett, Lead Software Developer
Taking out the garbage
Programming languages today mostly take one of two approaches to memory management: garbage collection, or manual allocation and deallocation by the programmer. But what if there were a third approach? C++ developers may be familiar with the pattern known as Resource Acquisition Is Initialization (RAII), where a resource such as memory is released automatically at the end of the object’s lifetime. What would it look like if this behavior were built into a language’s compiler?
Rust, a language developed by its open source community, explores one answer to this question. Rust handles memory with a concept called ownership. The aim is to get the best of both worlds: prevent the programmer from making memory management mistakes (we’ve all been there) while avoiding the runtime overhead of garbage collection.
In short, Rust aims to optimize both the developer experience and performance. As we’ll see, ownership is a simple concept, but it has massive implications that ripple throughout the design of Rust.
Okay, but, like, how does it work?
Before we get into the nitty-gritty, there are three main rules of ownership:
● Each value has a variable called its owner.
● There can only be one owner at a time.
● When the owner goes out of scope, the value is dropped and its memory freed.
Let’s look at a simple example:
As in many languages, we can establish a scope simply by enclosing a section of code in curly braces, as seen in the picture above. Line 2 creates a String, allocating heap memory for it and binding it to the variable ‘s’. As soon as execution reaches the closing curly brace on line 4, the variable goes out of scope and loses validity, and all the heap memory allocated for the String is dropped. (Note: I will be using the words dropped and freed interchangeably.)
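For reference, here is a minimal sketch of that scope example, assuming the value is created with String::from("hello") (the contents are an assumption). A bare string literal such as "hello" is a &str stored in the program binary and owns no heap memory, so String::from is what gives ‘s’ something to free. The inner braces are the scope described above.

```rust
fn main() {
    {                                  // s is not valid here; it hasn't been declared yet
        let s = String::from("hello"); // allocates heap memory; s is valid from this point on
        println!("{}", s);             // do stuff with s
    }                                  // scope ends: s is no longer valid, and drop frees its memory
    // println!("{}", s);              // would not compile: `s` is not in scope here
}
```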
Notice that we, the programmers, did not have to take any steps to free the memory. Imagine waking up and having your bed make itself as soon as you leave the room!
Now, what happens if two variables are assigned the same value?
In the above example, line 2 initializes ‘x’ as an integer (an i32) with the value 5, then line 3 initializes ‘y’ from ‘x’. What do you think happens? In this case, a copy of the value 5 is pushed onto the stack and bound to ‘y’. That’s all well and good for fixed-size values like integers, but what about variable-size items stored in heap memory? Line 5 allocates memory for a String and binds that String to the variable ‘s1’.
There are a few possibilities for what happens on line 6. Is it A: ‘s2’ becomes a copy of ‘s1’? Is it B: ‘s1’ and ‘s2’ are both kept as pointers to the same piece of memory? Or is it C: the string in ‘s1’ is copied into newly allocated memory for ‘s2’? Actually, it’s none of these. Rust moves the value of ‘s1’ into ‘s2’: the String’s pointer, length, and capacity are copied into ‘s2’, which now points to the memory allocated for our String, while ‘s1’ becomes invalid. No heap data is copied, and since only ‘s2’ owns the memory, it will be freed exactly once.
What happens if we try to access ‘s1’ after line 6, say by printing it to the console?
We get this compiler error 😡. The message illustrates an important concept in Rust’s ownership model: moves. When ‘s1’ was assigned to ‘s2’, its value was moved, as noted in the error message. No worries, though!
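The code these paragraphs describe could look roughly like the following sketch. The String’s contents are an assumption, and the lines are arranged so that ‘x’ and ‘y’ land on lines 2 and 3 and ‘s1’ and ‘s2’ on lines 5 and 6. The offending print is commented out so the block compiles; on recent compilers, uncommenting it produces an error along the lines of the one in the comment.

```rust
fn main() {
    let x = 5;                      // i32: a fixed-size value stored on the stack
    let y = x;                      // copies 5 into y; x is still valid

    let s1 = String::from("hello"); // s1 owns a heap allocation
    let s2 = s1;                    // moves ownership to s2; s1 is now invalid

    println!("x = {}, y = {}", x, y); // fine: integers are copied, not moved
    println!("s2 = {}", s2);          // fine: s2 is the owner now
    // println!("s1 = {}", s1);       // error[E0382]: borrow of moved value: `s1`
}
```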
If we want to make a deep copy of ‘s1’ in ‘s2’, we can use Rust’s clone method:
Now all is well! 😇
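A minimal sketch of the clone fix (the "hello" contents are assumed). Unlike a move, clone allocates a new buffer on the heap and copies the bytes into it, so ‘s1’ and ‘s2’ end up as independent owners; the as_ptr check makes that visible by comparing the addresses of the two buffers.

```rust
fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // deep copy: new heap allocation with the same bytes

    // Both variables are valid, independent owners of separate allocations.
    println!("s1 = {}, s2 = {}", s1, s2);
    assert_ne!(s1.as_ptr(), s2.as_ptr()); // the two Strings point at different buffers
}
```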
How does ownership work when passing variables around with functions?
Let’s look at examples comparing fixed-size vs. variable-size items again:
Okay, as you’ll notice in this example, we are passing variables by value; we’ll get to references in a bit. When we pass a variable-size item like a String into a function, ownership of that String moves to the corresponding function parameter, in this case ‘some_string’.
As we saw before, as soon as ‘some_string’ goes out of scope at the end of the function, the memory is freed and the variable becomes invalid. This also means that the variable ‘s’ we originally passed in can’t be used after the call: its ownership moved into ‘takes_ownership’, and the memory it owned was freed at the end of that function.
So what about fixed-size items? As in many other languages, when such a value is passed into a function, a copy is pushed onto the stack. (In Rust terms, integers implement the Copy trait, so they are copied rather than moved.) Just as in the previous example, ‘some_integer’ becomes invalid and is removed from the stack at the end of the function ‘makes_copy’. However, the ‘x’ value originally passed in can still be used within ‘main’ after the call to ‘makes_copy’.
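Here is a sketch of that example using the names from the text (‘s’, ‘x’, ‘takes_ownership’, ‘makes_copy’, ‘some_string’, ‘some_integer’); the string contents and the print statements are assumptions.

```rust
fn main() {
    let s = String::from("hello"); // s comes into scope
    takes_ownership(s);            // s's value moves into the function
    // println!("{}", s);          // error[E0382]: borrow of moved value: `s`

    let x = 5;                     // x comes into scope
    makes_copy(x);                 // i32 is Copy, so a copy is passed in
    println!("x is still {}", x);  // x remains usable in main
}

fn takes_ownership(some_string: String) {
    println!("{}", some_string);
} // some_string goes out of scope; drop is called and its heap memory is freed

fn makes_copy(some_integer: i32) {
    println!("{}", some_integer);
} // some_integer goes out of scope; nothing special happens
```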
What do you do if you need read access to certain properties of an item passed into a function, but you don’t want to take ownership? Use a reference to borrow the item!
In the following example we’ll use a function to calculate the length of a string.
(Trivial, I know, but hey, this is a blog after all.)
As in C++, references are denoted with an ampersand. (Syntax note: Rust does have a ‘return’ keyword, but a function also returns the value of its final expression, so you can simply leave that value at the end of the body without a semicolon.) The most interesting and important aspect of this snippet comes at the end of the ‘calculate_length’ function, where the reference ‘s’ goes out of scope and becomes invalid.
Since ‘s’ is only a reference and doesn’t own the String it points to, nothing is dropped when it goes out of scope, and we can happily print both ‘s1’ and ‘len’ after the ‘calculate_length’ function call, on line 6, with no apprehension about compiler errors 😁.
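A sketch of the ‘calculate_length’ example, assuming the string "hello" and the print format shown (both assumptions); blank lines are kept so the final print lands on line 6.

```rust
fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(&s1); // borrow s1 instead of moving it

    println!("The length of '{}' is {}.", s1, len); // s1 is still valid here
}

// Borrows a String through a shared reference; the caller keeps ownership.
fn calculate_length(s: &String) -> usize {
    s.len() // final expression, no semicolon: this is the return value
} // s goes out of scope, but it doesn't own the String, so nothing is dropped
```

Running it prints `The length of 'hello' is 5.` In idiomatic code you would usually take a &str parameter instead of &String, since &str also accepts a &String through deref coercion; &String is kept here to match the text. Note that len() counts bytes, not characters.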
Okay, that’s great, but what if you want to change a value passed in by reference within a function? Another great question! In Rust, variables are immutable by default, whether they hold fixed-size values like integers or heap-allocated values like Strings. To change a value, you must declare it mutable using the ‘mut’ keyword, and to change it through a reference, the reference itself must be mutable (‘&mut’); otherwise you get a compiler error.
Let’s see an example:
Here we try to append a string literal to the String through the reference we passed in, and we get the following compiler message:
Let’s fix this. To do so, we will add the ‘mut’ keyword in three places:
● On line 2, to declare that the String itself is mutable.
● On line 4, to make sure the reference we pass into the function is a mutable reference.
● On line 7, to make sure the function signature declares the ‘some_string’ parameter to be a mutable reference.
And now we’re good! 😁
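Putting the three changes together gives something like the sketch below. The text doesn’t name the function, so ‘change’ is a hypothetical name, and the appended ", world" is an assumption; the layout puts the three ‘mut’s on lines 2, 4, and 7. Without them, recent compilers reject the push_str call with error E0596 (cannot borrow ‘*some_string’ as mutable, as it is behind a ‘&’ reference).

```rust
fn main() {
    let mut s = String::from("hello"); // the String itself is declared mutable

    change(&mut s);                    // pass a mutable reference into the function
}

fn change(some_string: &mut String) { // hypothetical name; parameter is a mutable reference
    some_string.push_str(", world");  // allowed: &mut grants write access
}
```

After the call, ‘s’ holds "hello, world".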
But wait, there’s more!
Once you start using mutable references, the compiler enforces some rules that prevent data races. A data race occurs when these three conditions are all present:
● Two or more pointers access the same data at the same time.
● At least one of the pointers is being used to write to the data.
● There’s no mechanism being used to synchronize access to the data.
Data races cause undefined behavior and can be difficult to diagnose and fix when you’re trying to track them down at runtime. The Rust compiler prevents all this at compile time by checking that whenever we have a mutable reference to an item, it’s the only reference to that item in use, mutable or immutable.
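A sketch of those rules in action (the variable names and strings are hypothetical). It assumes a current compiler, where a borrow ends at its last use rather than at the closing brace (non-lexical lifetimes). The commented-out lines show borrows the compiler would reject, with the error codes recent compilers report.

```rust
fn main() {
    let mut s = String::from("hello");

    let r1 = &mut s;        // a mutable borrow must be the only reference in use
    // let r2 = &mut s;     // error[E0499]: cannot borrow `s` as mutable more than once at a time
    // let r3 = &s;         // error[E0502]: cannot borrow `s` as immutable because it is also borrowed as mutable
    r1.push_str(", world"); // last use of r1, so the mutable borrow ends here

    let a = &s;             // any number of immutable borrows may coexist...
    let b = &s;
    println!("{} | {}", a, b);

    let r4 = &mut s;        // ...and a new mutable borrow is fine once a and b are no longer used
    r4.push('!');
    println!("{}", r4);
}
```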
Now you can run programs without this stress, and after a little while you’ll probably forget that data races were even a thing (until you go back to C++ 😭).
Remember dangling references?
A dangling reference occurs when some memory is freed while a pointer to it is kept around, so the pointer refers to a location in memory that may since have been given to someone else. In Rust, the compiler guarantees this will never happen.
Let’s try and create one! 😈
As you can see, we initialize a variable and allocate some memory for its String value. Then we try to return a reference to this variable. As we’ve seen before, the ‘s’ variable becomes invalid and its memory is freed at the end of the scope on line 9. Therefore, the returned reference would point to freed memory 😝.
The compiler is wise to our tricks, though, and correctly points out:
The key thing here is the ‘help’ message stating that we are trying to borrow the value of something that has gone out of scope and is no longer valid.
One fix for this situation would be to simply return the string, passing ownership to the receiving variable:
This brings up an important point: returning a value from a function passes ownership to the receiving variable. In this case, the String is created and its memory allocated within the ‘fixed’ function, and although the ‘s’ variable loses its validity at the end of the function, ownership of the memory it pointed to moves to the ‘receiving_ownership’ variable, so nothing is freed.
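Here is a sketch of both versions. The names ‘fixed’ and ‘receiving_ownership’ come from the text, ‘dangle’ is a hypothetical name for the broken function, and "hello" is an assumed value. The broken version is left in comments because it doesn’t compile; recent compilers report it as error E0106 (missing lifetime specifier).

```rust
// Broken version, kept in comments because it doesn't compile
// ('dangle' is a hypothetical name):
// fn dangle() -> &String {            // error[E0106]: missing lifetime specifier
//     let s = String::from("hello");
//     &s                              // s is dropped at the closing brace, so &s would dangle
// }

// Fixed version: return the String itself, moving ownership to the caller.
fn fixed() -> String {
    let s = String::from("hello"); // s owns a new heap allocation
    s                              // ownership moves out to the caller; nothing is freed here
}

fn main() {
    let receiving_ownership = fixed(); // now owns the "hello" String
    println!("{}", receiving_ownership);
} // receiving_ownership goes out of scope here, and the String is finally freed
```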
That’s it, folks.
Rust’s ownership model of memory management saves a lot of headaches. You may experience more headaches at first, but that’s just the compiler catching all of your mistakes 😜. Trust me, I’ve done the math: the total number of headaches is much lower.
Java zealots out there might still have that smug grin, because they don’t waste their time dealing with memory. But remember that Rust needs no garbage collector and has only a minimal runtime. Preferring a garbage collector is like saying that CRT TVs look better for Smash Bros. It may be true, but why lug that thing around? And if you’re a C++ soldier, you may be interested to know that in terms of performance, Rust is neck and neck with the ol’ ++.
Of course, no one wants to rewrite millions of lines of code, not even Mozilla. But the next time you’re worrying about frees and allocs, or chasing a memory leak, think about trying Rust; you might never look back.