Prelude to Vectors in the Rust Programming Language
In this post, we are going to explore in detail how to work with resizable arrays in Rust. Specifically, we will take a closer look at the Vector type, its syntax, and some use cases like filtering and transforming a collection.
In software development, we often need to deal with a list of objects or values. For example, we may need to enumerate words, ingest a series of numeric values, or parse structured data from tables or data stores such as CSV files or a database.
Also referred to as collections, such data structures serve as containers of discrete values and offer a convenient way to organise all kinds of dynamic data in a program. Rust and its Standard Library provide several such types, including fixed-size arrays, tuples, vectors, strings and hash maps.
Our focus here is on one of the most commonly used array-like types: the Vector type. Unlike arrays and tuples, whose length is fixed at compile time, vectors are dynamic: they can grow and shrink at runtime, which makes them a versatile and convenient data type.
What You Will Learn#
In this post, we are going to explore the following aspects of the Vector type in Rust:
Definition and semantics of the Vector type
How to create and use variables of type Vec
Techniques to access and change elements of a Vec
Adding and removing elements from a collection of type Vec
Finally, we are going to take a moment to review how Rust’s internal safety mechanisms protect the developer from performing potentially unsafe operations.
This tutorial assumes you are familiar with the Rust language syntax and have a general understanding of how a program allocates memory (e.g. heap vs stack).
What is a Vector#
We often use terms like collections and arrays to describe ordered, indexed lists of items. The vector type in Rust is one example of such a structure and it is the most commonly used form of collection. Its type is Vec<T>, pronounced “vector”.
The basic structure of a vector can be seen as a combination of the following information:
Pointer to the data - the address of where the data is located in memory.
Capacity - the amount of memory presently reserved for the vector (determines how many elements can be contained within the vector or how much it can “grow”).
Length - the number of elements currently present in the vector.
A vector can always be represented as a triple of these three values: pointer, capacity, and length.
The Vec type also comes with several guarantees, documented by the Rust Standard Library.
For example, the pointer of a vector is never null, which makes the Vec type null-pointer optimized: an Option<Vec<T>> takes no more space than a Vec<T>. In addition, a vector created with a capacity of 0 (for example through Vec::new) or one whose elements are zero-sized does not allocate any memory.
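These guarantees can be observed from a small program. The following is only a sketch: the helper names empty_vec_capacity and option_vec_is_same_size are made up for this post and are not part of the standard library.

```rust
use std::mem::size_of;

/// Capacity of a vector created with `Vec::new`: nothing has been allocated yet.
fn empty_vec_capacity() -> usize {
    let v: Vec<i32> = Vec::new();
    v.capacity()
}

/// Because a `Vec` never contains a null pointer, the compiler can use an
/// otherwise invalid bit pattern to represent `None`, so wrapping a vector
/// in `Option` does not make it larger.
fn option_vec_is_same_size() -> bool {
    size_of::<Option<Vec<i32>>>() == size_of::<Vec<i32>>()
}

fn main() {
    println!("capacity of an empty vector: {}", empty_vec_capacity());
    println!(
        "Option<Vec<i32>> is the same size as Vec<i32>: {}",
        option_vec_is_same_size()
    );
}
```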
It is important to make a distinction between the memory allocated for elements currently present in a vector and the capacity of the vector.
The capacity() of a vector is the total number of elements it can hold without reallocating memory. The difference between capacity and length can be seen as reserved, pre-allocated space for future elements.
You can learn more about the guarantees and memory specifics of the vector type in the documentation.
How to create a new vector?#
There are several ways we can define a new vector.
A vector can be initialized using the Vec::new function, which returns a new empty vector. The variable holding the vector must be declared as mutable (using the mut keyword) in order to be able to add and remove elements from it.
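As a minimal sketch, assuming two illustrative words as elements (the function name count_words is hypothetical), creating a vector and adding elements to it could look like this:

```rust
/// Builds a small vector of words and returns how many it holds.
fn count_words() -> usize {
    // No element type is written here; the compiler infers Vec<&str>
    // from the values pushed below.
    let mut vec = Vec::new();
    vec.push("hello");
    vec.push("world");
    println!("{:?}", vec);
    vec.len()
}

fn main() {
    println!("number of words: {}", count_words());
}
```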
It is worth pointing out that we did not declare the type of elements we intend to add to the collection. Let's see what happens if we declare the vector as we did above, but don't insert any elements into it, for example by writing only let vec = Vec::new();
This statement alone will not compile, and the compiler reports an error like cannot infer type for type parameter "T" (the exact wording depends on the compiler version). This happens because the Vec type is generic over the type of the elements stored in the collection.
In the first example, we added elements to the vector and so the Rust compiler was able to infer the type of the variable vec to be Vec<&str>, as the elements being added to it are of type &str.
In the second example, initializing a new empty vector alone was not enough for the Rust compiler to determine what kind of elements we intend to store, which results in a compiler error.
In order to solve this, we can choose to explicitly specify the type of the vector collection during initialization. For example:
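One possible way to write this is sketched here, assuming i32 and &str as element types. The function names are illustrative only.

```rust
/// The element type is given as an annotation on the variable.
fn empty_numbers() -> Vec<i32> {
    let numbers: Vec<i32> = Vec::new();
    numbers
}

/// The element type is given on the function call itself
/// (the so-called "turbofish" syntax).
fn empty_words() -> Vec<&'static str> {
    Vec::<&str>::new()
}

fn main() {
    println!("{:?} {:?}", empty_numbers(), empty_words());
}
```

Both forms compile even though no element is ever added, because the compiler no longer has to infer the element type.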
vec! macro #
The syntax of the Vec::new function may seem a bit verbose as we first need to initialize a mutable variable and only then add elements to it.
Luckily, Rust also includes the vec! macro, which makes it easier to initialize new vectors and optionally provide their initial elements.
We could rewrite our example to use the vec! macro instead of the Vec::new function, creating an empty vector with vec![] and pushing elements into it as before.
Since the initial elements of the vector are known upfront, this can be made even more concise, by directly initializing the vector with the initial elements:
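Both forms of the macro are sketched below, with made-up sample words and hypothetical function names:

```rust
/// Starts from an empty vector created by the macro, then pushes elements.
fn words_with_push() -> Vec<&'static str> {
    let mut vec = vec![];
    vec.push("hello");
    vec.push("world");
    vec
}

/// Provides the initial elements directly; no `mut` is needed here.
fn words_literal() -> Vec<&'static str> {
    vec!["hello", "world"]
}

fn main() {
    println!("{:?}", words_with_push());
    println!("{:?}", words_literal());
}
```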
Elements of a Vector#
Now that we know how to create a vector, let's have a look at some of the techniques we can utilize in order to access the contents (or elements) of a vector.
Length and capacity#
The length of the vector corresponds to the number of elements currently being stored by the vector. We can obtain the length using the Vec::len method.
The capacity of the vector is the number of elements that a vector can hold without the need to reallocate additional memory.
Capacity is one of the key concepts which makes the Rust vector such a versatile and efficient structure for collections.
A vector normally stores its elements in a memory buffer which can grow over time as new elements are being added to it. By default, the capacity of the vector is automatically adjusted as we add elements to the collection.
When we create a new empty vector, we can choose to define an initial capacity, essentially reserving the initial buffer size of the vector. This means that when new elements are added, the vector will not have to reallocate additional memory as long as there is remaining space in its buffer (capacity).
Let's illustrate with an example.
If we just create a new empty vector, it has 0 elements and a capacity of 0.
This means adding an element will require the vector to first allocate some memory to increase its capacity to at least 1, in order to accommodate the incoming element.
Of course, this works just fine and may not be a problem at all.
In a performance-sensitive scenario, we can choose to pre-allocate some capacity in advance, preventing the collection from having to repeatedly reallocate memory as elements are added.
For example, we can create a new empty vector with an initial capacity for 10 elements:
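A sketch of this, assuming i32 elements and a hypothetical helper named fill_reserved, might look as follows. Vec::with_capacity guarantees at least the requested capacity, so the code only relies on the capacity staying unchanged while the ten elements are pushed.

```rust
/// Returns (capacity before pushing, capacity after pushing, final length).
fn fill_reserved() -> (usize, usize, usize) {
    let mut numbers: Vec<i32> = Vec::with_capacity(10);
    let before = numbers.capacity();
    for n in 0..10 {
        // Fits into the reserved buffer, so no reallocation is needed.
        numbers.push(n);
    }
    (before, numbers.capacity(), numbers.len())
}

fn main() {
    let empty: Vec<i32> = Vec::new();
    println!("empty: len = {}, capacity = {}", empty.len(), empty.capacity());

    let (before, after, len) = fill_reserved();
    println!("reserved: capacity {} -> {}, len = {}", before, after, len);
}
```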
Accessing specific values within a vector#
The vector type in Rust implements the Index trait, allowing us to directly access elements by index:
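For instance, with a made-up list of words (the function name second_word is illustrative):

```rust
/// Returns the second element of a small sample vector.
fn second_word() -> &'static str {
    let words = vec!["alpha", "beta", "gamma"];
    // Indices start at 0, so index 1 refers to the second element.
    words[1]
}

fn main() {
    println!("{}", second_word()); // prints "beta"
}
```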
Safety first, out of bounds access#
A common source of bugs and security vulnerabilities is what is commonly known as out of bounds access i.e. trying to access an element outside the length of a vector.
While very easy to use, direct access by index has a downside: if we accidentally request an index that is out of bounds, the program will panic.
To help with that, Rust offers an alternative in the Vec::get method, which returns a value of type Option instead. This allows us to gracefully handle an out of bounds index and, as a result, improves the reliability of the program:
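A small sketch of this approach is shown below. The describe helper and its messages are assumptions made for illustration.

```rust
/// Looks up `index` without risking a panic.
fn describe(words: &[&str], index: usize) -> String {
    match words.get(index) {
        Some(word) => format!("found {}", word),
        None => format!("no element at index {}", index),
    }
}

fn main() {
    let words = vec!["alpha", "beta"];
    println!("{}", describe(&words, 1));
    // `words[5]` would panic here; `get` returns `None` instead.
    println!("{}", describe(&words, 5));
}
```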
Updating elements of a vector#
Adding and removing elements of a vector#
A mutable vector can be changed by adding or removing elements from it. We do this using the Vec::push and Vec::pop methods. Respectively, they append an element to the end of a vector or remove the last element of a vector.
The Vec::pop method also returns an Option value which either holds the removed element or None if no element was removed from the vector (e.g. when it was already empty).
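As an illustration, using sample numbers and hypothetical function names:

```rust
/// Pushes one element, pops it again, and returns the popped value
/// together with what is left in the vector.
fn push_and_pop() -> (Option<i32>, Vec<i32>) {
    let mut numbers = vec![1, 2];
    numbers.push(3); // numbers is now [1, 2, 3]
    let last = numbers.pop(); // Some(3), leaving [1, 2]
    (last, numbers)
}

/// Popping from an empty vector does not panic; it yields `None`.
fn pop_from_empty() -> Option<i32> {
    let mut empty: Vec<i32> = Vec::new();
    empty.pop()
}

fn main() {
    println!("{:?}", push_and_pop());
    println!("{:?}", pop_from_empty());
}
```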
Mutating elements in-place#
In some situations it may be useful to update an element which already belongs to a vector:
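A minimal sketch with sample values (the function name replace_second is made up):

```rust
/// Overwrites the second element of a small sample vector.
fn replace_second() -> Vec<i32> {
    let mut numbers = vec![1, 2, 3];
    // Assigning through an index replaces the element in place.
    numbers[1] = 20;
    numbers
}

fn main() {
    println!("{:?}", replace_second()); // prints [1, 20, 3]
}
```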
You may notice that we are directly accessing an element by index. Like we saw earlier, given an index outside the bounds of the vector the application will panic.
We can combine the technique we showed earlier with the Vec::get method and its companion Vec::get_mut, which returns a mutable reference to an element, if it exists. We can then rewrite the above example in a safer way.
The get_mut method returns an Option holding a mutable reference to the element at the given index. If the element doesn't exist (when the index is out of bounds), get_mut returns None. If the element exists, get_mut returns a mutable reference which we can use to update the value.
To change the value that the mutable reference is referring to, we make use of the dereference operator (*). You may learn more about it in the Rust documentation.
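The safer variant could be sketched like this. The set_at helper is hypothetical and takes a mutable slice, which a mutable reference to a vector converts to automatically.

```rust
/// Sets the element at `index` to `value` if the index is in bounds.
/// Returns whether an element was updated.
fn set_at(numbers: &mut [i32], index: usize, value: i32) -> bool {
    match numbers.get_mut(index) {
        Some(element) => {
            // Dereference the mutable reference to write the new value.
            *element = value;
            true
        }
        None => false,
    }
}

fn main() {
    let mut numbers = vec![1, 2, 3];
    println!("updated: {}", set_at(&mut numbers, 1, 20));
    println!("updated: {}", set_at(&mut numbers, 9, 0));
    println!("{:?}", numbers);
}
```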
Working with Vectors#
Access all elements#
If we would like to perform a certain operation over each element in a vector, we can iterate through all elements rather than accessing them one at a time. One way would be to use a for loop over the vector itself.
In this case, we are consuming the vector by executing the operation defined in the for loop block over each element of that vector.
We could also limit the operation to just references to the elements of the collection.
Using the same technique, we can obtain a mutable reference to the elements, allowing us to affect changes to the collection:
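The three flavours of the for loop described above can be put side by side. This is only a sketch with sample numbers, and the function name loop_styles is made up.

```rust
/// Returns the sum of the original values and the doubled values.
fn loop_styles() -> (i32, Vec<i32>) {
    let mut numbers = vec![1, 2, 3];

    // Shared references: read-only access, the vector stays usable.
    let mut sum = 0;
    for n in &numbers {
        sum += *n;
    }

    // Mutable references: elements can be changed in place.
    for n in &mut numbers {
        *n *= 2;
    }

    // By value: the loop consumes the vector and moves each element out.
    let mut doubled = Vec::new();
    for n in numbers {
        doubled.push(n);
    }
    // `numbers` can no longer be used from this point on.

    (sum, doubled)
}

fn main() {
    println!("{:?}", loop_styles()); // prints (6, [2, 4, 6])
}
```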
Enumerate all elements of a vector#
Another powerful technique to access the elements of a vector is through means of an iterator.
To obtain an iterator over a vector, we use the Vec::iter method.
Generally speaking, Rust makes it easy to use iterators for almost everything. Idiomatic Rust often prefers iterators over accessing the elements of a vector by index.
An example use of iterators will be the case of transforming the values of a collection from one type to another. Given a collection of words, let's build a vector which holds the length of each word:
We use the .iter() method on the vector of words in order to obtain an iterator which will give us access to each element.
We use the .map() method to execute a closure over each element yielded by the iterator. In other words, the closure is executed for each word.
We use the .collect() method in order to transform the result of the map operation into a vector of type Vec<usize> which holds the length of each word.
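Put together, these steps could look like the sketch below. The function name word_lengths and the sample words are assumptions. Note that len counts bytes, which equals the number of characters only for ASCII words.

```rust
/// Maps every word to its length in bytes.
fn word_lengths(words: &[&str]) -> Vec<usize> {
    words
        .iter() // iterate over references to the words
        .map(|word| word.len()) // turn each word into its length
        .collect() // gather the lengths into a Vec<usize>
}

fn main() {
    let words = vec!["one", "three", "seven"];
    println!("{:?}", word_lengths(&words)); // prints [3, 5, 5]
}
```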
Filter the elements of a vector#
Iterators are a powerful concept in Rust and they prove to be very useful when we are interested in obtaining a subset of a given collection. We can use the Iterator::filter method in order to filter the elements of a vector.
We use the .into_iter() method in order to obtain a consuming iterator over all elements. A consuming iterator moves each value out of the vector, allowing us to create a new vector with only the matching elements.
We use the .filter() method in order to check which elements of the vector should be yielded. For such elements, the closure given to the filter method needs to return true.
We use the .collect() method in order to transform the result of the filter operation into a new vector.
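As an illustration of these three steps, assuming we want to keep only the even numbers of a vector (the predicate and the function name are made up for this sketch):

```rust
/// Consumes the input vector and keeps only its even numbers.
fn even_numbers(numbers: Vec<i32>) -> Vec<i32> {
    numbers
        .into_iter() // consuming iterator: values are moved out
        .filter(|n| *n % 2 == 0) // keep elements for which this is true
        .collect() // build the new vector
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6];
    println!("{:?}", even_numbers(numbers)); // prints [2, 4, 6]
}
```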
Removing duplicates from a vector#
Nothing prevents our application from adding the same value multiple times to a vector.
There are, however, circumstances in which we need to remove duplicates from a collection, for example when the vector is based on user-provided data and we are only interested in working with unique (non-repeating) values.
This is easy to achieve using the Vec::dedup method. Once called on an instance of a vector, dedup works on that same instance and removes consecutively repeated elements. This means that for the deduplication logic to work as we expect, the vector needs to be sorted so that repeating elements follow each other.
For example, calling dedup on [1, 3, 2, 3] changes nothing because the repeating values 3 are not adjacent to each other. Once the vector is sorted to [1, 2, 3, 3] we can make use of the Vec::dedup method in order to remove the repeating values.
Vec::dedup needs the elements of the vector to implement the PartialEq trait in order for the comparison to work. This means it can also work for custom structs, as long as they implement PartialEq.
Let's check an example:
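The sketch below uses the sample values from above. The helper name unique_sorted is hypothetical, and sorting with Vec::sort additionally requires the elements to implement Ord.

```rust
/// Sorts the vector so that equal values become adjacent, then removes
/// the consecutive duplicates.
fn unique_sorted(mut numbers: Vec<i32>) -> Vec<i32> {
    numbers.sort();
    numbers.dedup();
    numbers
}

fn main() {
    // Without sorting, the two 3s are not adjacent and both survive.
    let mut unsorted = vec![1, 3, 2, 3];
    unsorted.dedup();
    println!("{:?}", unsorted); // prints [1, 3, 2, 3]

    println!("{:?}", unique_sorted(vec![1, 3, 2, 3])); // prints [1, 2, 3]
}
```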
Try it out for yourself.
Here we declare the vector as mutable since Vec::dedup updates the contents of the collection in place.
A common type to represent resizable arrays is the Vector type. It is one of the more versatile collection types, enabling a great deal of flexibility when accessing and working with its elements.
In this post, we saw how to get started with using the vector type for common operations like filtering and transforming a collection of elements. In addition, we discussed some of the safety protections provided by Rust and its Standard Library in such scenarios, like guarding against out of bounds access or null pointers.
You may also find it useful to explore the Vec documentation in the Rust Standard Library reference, where you can read about all available methods and find additional sample use cases and code snippets.
You can also check one of my other posts which cover additional use cases for using Iterators with vectors in Rust.