Prelude to Vectors in the Rust Programming Language

In this post, we are going to explore in detail how to work with resizable arrays in Rust. Specifically, we will take a closer look at the Vector type, its syntax, and some use cases like filtering and transforming a collection.

Introduction#

In software development, we often need to work with a list of objects or values: for example, enumerating words, ingesting a series of numeric values, or parsing structured data from tables or data storage such as CSV files or a database.

Also referred to as collections, such data structures serve as a container of discrete values, offering a great facility to organise all kinds of dynamic data in a program. The Rust Standard Library includes several different kinds of collections like static arrays, tuples, vectors, strings and hashmaps.

Our focus here is set on one of the more commonly used array-like types - the Vector type. Unlike arrays and tuples, collections of type Vector are dynamic which means they can be changed at runtime, making them a versatile and convenient data type.

What You Will Learn#

In this post, we are going to explore the following aspects of the Vector type in Rust:

  • Definition and semantics of the Vector collection type.

  • How to create and use variables of type Vector.

  • Techniques to access and change elements of a Vector.

  • Adding and removing elements from a collection of type Vector.

Finally, we are going to take a moment to review how Rust’s internal safety mechanisms protect the developer from performing potentially unsafe operations.

Prerequisites#

This tutorial assumes you are familiar with the Rust language syntax and have a general understanding of how a program allocates memory (e.g. heap vs stack).

What is a Vector#

We often use terms like collections and arrays to describe structures of numbered lists of items. The vector type in Rust is one example of such a structure and it is the most commonly used form of collection. Its type is Vec<T>, pronounced “vector”.

The basic structure of a vector can be seen as a combination of the following information:

  • Pointer to the data - the address of where the data is located in memory.

  • Capacity - the amount of memory presently reserved for the vector (determines how many elements can be contained within the vector or how much it can “grow”).

  • Length - the number of elements currently present in the vector.

A vector can always be represented as a triple of these 3 values: pointer, capacity, and length.

Also, the basic nature of the Vec type allows us to rely on several guarantees documented by the Rust Standard Library.

For example, the pointer of a vector can never be null, which makes the Vector type null-pointer-optimized. If a vector has a capacity of 0 (for example, one created with Vec::new or an empty vec![]), or if all of its elements are zero-sized, then no memory will be allocated for it.

It is important to make a distinction between the memory allocated for elements currently present in a vector and the capacity of the vector.

The capacity() of a vector indicates the total number of elements the vector can hold without re-allocation of memory. The difference between the capacity and the length can be seen as a sort of reserved or pre-allocated memory.

You can learn more about the guarantees and memory specifics of the vector type in the documentation.

How to create a new vector?#

There are several ways we can define a new vector.

Using Vec::new#

A vector can be initialized using the Vec::new function which returns a new empty vector. To be able to add and remove elements, the variable holding the vector must be declared as mutable (using the mut keyword).
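
As a minimal sketch (the helper name `greeting` and the string values are illustrative assumptions), creating an empty vector and pushing elements into it could look like this. Note that no element type is written anywhere for `vec`:

```rust
// Hypothetical helper: builds a vector by pushing elements, then joins them.
fn greeting() -> String {
    // `mut` is required because `push` modifies the vector.
    let mut vec = Vec::new();
    vec.push("hello");
    vec.push("world");
    // At this point the compiler has inferred `vec` to be a Vec<&str>.
    vec.join(" ")
}

fn main() {
    println!("{}", greeting()); // prints: hello world
}
```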

Try it out for yourself.

It is worth pointing out that we did not declare the type of elements we intend to add to the collection. Let's see what happens if we declare the vector as we did above, but don't insert any elements to it e.g.:

This statement alone will not compile: the compiler reports that type annotations are needed (older compiler versions word this as cannot infer type for type parameter "T"). This happens because Vec is a generic type, Vec<T>, and the compiler must know the type T of the elements that will be stored in the vector.

In the first example, we added elements to the vector and so the Rust compiler was able to infer the type of the variable vec to be Vec<&str> as the elements being added to it are of type &str.

In the second example, only initializing a new empty vector was not enough for the Rust compiler to determine what kind of elements we intend to store, which results in a compiler error.

In order to solve this, we can choose to explicitly specify the type of the vector collection during initialization. For example:
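
A possible sketch of an explicitly typed empty vector, assuming `&str` elements as in the earlier discussion (the helper name `empty_words` is hypothetical):

```rust
// Hypothetical helper: creates an empty, explicitly typed vector.
fn empty_words() -> usize {
    // Without the `Vec<&str>` annotation this line would not compile,
    // because nothing else tells the compiler the element type.
    let vec: Vec<&str> = Vec::new();
    vec.len()
}

fn main() {
    // An alternative spelling that names the element type on the function call.
    let numbers = Vec::<i32>::new();
    println!("{} {}", empty_words(), numbers.len()); // prints: 0 0
}
```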


Using the vec! macro#

The syntax of the Vec::new function may seem a bit verbose as we first need to initialize a mutable variable and only then add elements to it.

Luckily, Rust also includes the vec! macro which adds certain facilities to make it easier to initialize new vectors by also providing the initial elements in the collection.

We could rewrite our example in order to use the vec! macro instead of the Vec::new function as follows:
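
One way this could look, as a sketch with illustrative values (the helper name `greeting_words` is an assumption):

```rust
// Hypothetical helper: starts from an empty `vec![]` and pushes elements.
fn greeting_words() -> Vec<&'static str> {
    // `vec![]` with no arguments creates an empty vector, like `Vec::new()`.
    let mut vec = vec![];
    vec.push("hello");
    vec.push("world");
    vec
}

fn main() {
    println!("{:?}", greeting_words()); // prints: ["hello", "world"]
}
```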


Since the initial elements of the vector are known upfront, this can be made even more concise, by directly initializing the vector with the initial elements:
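
A minimal sketch of the concise form, again with illustrative values and a hypothetical helper name:

```rust
// Hypothetical helper: creates the vector together with its initial elements.
fn initial_words() -> Vec<&'static str> {
    // No `mut` and no `push` calls are needed when the elements are known upfront.
    vec!["hello", "world"]
}

fn main() {
    println!("{:?}", initial_words()); // prints: ["hello", "world"]
}
```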

Try it out for yourself.

Elements of a Vector#

Now that we know how to create a vector, let's have a look at some of the techniques we can utilize in order to access the contents (or elements) of a vector.

Length and capacity#

The length of the vector corresponds to the number of elements currently being stored by the vector. We can obtain the length using the len() function:
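
For instance, in this small sketch (the values and the helper name `word_count` are illustrative):

```rust
// Hypothetical helper: returns the number of elements stored in a vector.
fn word_count() -> usize {
    let vec = vec!["hello", "world"];
    vec.len()
}

fn main() {
    println!("{}", word_count()); // prints: 2
}
```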

The capacity of the vector is the number of elements that a vector can hold without the need to reallocate additional memory.

Capacity is one of the key concepts which makes the Rust vector such a versatile and efficient structure for collections.

A vector normally stores its elements in a memory buffer which can grow over time as new elements are being added to it. By default, the capacity of the vector is automatically adjusted as we add elements to the collection.

When we create a new empty vector, we can choose to define an initial capacity, essentially reserving the initial buffer size of the vector. This means that when new elements are added, the vector will not have to reallocate additional memory as long as there is remaining space in its buffer (capacity).

Let's illustrate with an example.
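
The sketch below records the length and capacity of an empty vector before and after a single push. The helper name `grow_from_empty` is hypothetical, and the exact capacity after the push is left unspecified on purpose, since the growth strategy is an implementation detail:

```rust
// Hypothetical helper: returns (length, capacity) before and after one push.
fn grow_from_empty() -> ((usize, usize), (usize, usize)) {
    let mut vec: Vec<i32> = Vec::new();
    let before = (vec.len(), vec.capacity());
    // The first push forces the vector to allocate a buffer.
    vec.push(1);
    let after = (vec.len(), vec.capacity());
    (before, after)
}

fn main() {
    let (before, after) = grow_from_empty();
    println!("before: len = {}, capacity = {}", before.0, before.1);
    println!("after:  len = {}, capacity = {}", after.0, after.1);
}
```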

Try it out for yourself.

If we just create a new empty vector, it has 0 elements and a capacity of 0.

This means adding an element will require the vector to first allocate some memory to increase its capacity to at least 1, in order to accommodate the incoming element.

Of course, this works just fine and may not be a problem at all.

In case we are dealing with a memory sensitive scenario, we can choose to pre-allocate some capacity in advance, preventing the collection from having to reallocate memory when elements are being added.

For example, we can create a new empty vector with an initial capacity for 10 elements:
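
A sketch using Vec::with_capacity, assuming `i32` elements (the helper name `fill_preallocated` is hypothetical):

```rust
// Hypothetical helper: reserves room for 10 elements, then fills the vector.
fn fill_preallocated() -> (usize, usize) {
    // The vector is still empty, but its buffer can hold at least 10 elements.
    let mut vec: Vec<i32> = Vec::with_capacity(10);
    for i in 0..10 {
        // These pushes stay within the reserved capacity, so no reallocation is needed.
        vec.push(i);
    }
    (vec.len(), vec.capacity())
}

fn main() {
    let (len, capacity) = fill_preallocated();
    println!("len = {}, capacity = {}", len, capacity);
}
```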


Accessing specific values within a vector#

The vector type in Rust implements the Index trait, allowing us to directly access elements by index:
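
For example, in this sketch with illustrative values (indices start at 0, and the helper name `second_word` is an assumption):

```rust
// Hypothetical helper: reads the element at index 1.
fn second_word() -> &'static str {
    let vec = vec!["hello", "world"];
    // Indexing panics if the index is out of bounds; index 1 is valid here.
    vec[1]
}

fn main() {
    println!("{}", second_word()); // prints: world
}
```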


Safety first, out of bounds access#

A common source of bugs and security vulnerabilities is what is commonly known as out of bounds access i.e. trying to access an element outside the length of a vector.

While very easy to use, direct access by index has a downside - we may accidentally request an element index which is out of bounds which will cause the program to panic:

To help with that, Rust offers an alternative using the Vec::get function which returns a value of type Option instead, allowing us to gracefully handle this scenario and as a result, improving the reliability of the program:
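
A minimal sketch contrasting the two approaches. The helper `describe` and its messages are hypothetical, and the panicking line is left commented out so the program runs:

```rust
// Hypothetical helper: looks up an index and reports what was found.
fn describe(words: &[&str], index: usize) -> String {
    match words.get(index) {
        // `get` returns Some(&element) when the index is within bounds...
        Some(word) => format!("found {}", word),
        // ...and None otherwise, instead of panicking.
        None => format!("no element at index {}", index),
    }
}

fn main() {
    let vec = vec!["hello", "world"];
    // let word = vec[5]; // would panic at runtime: index out of bounds
    println!("{}", describe(&vec, 1)); // prints: found world
    println!("{}", describe(&vec, 5)); // prints: no element at index 5
}
```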

Try it out for yourself.

Updating elements of a vector#

Adding and removing elements of a vector#

A mutable vector can be changed by adding or removing elements from it. We do this using the Vec::push and Vec::pop functions. Respectively they either append an element to the end of a vector or remove the last element of a vector.

The Vec::pop method also returns an Option value which either holds the removed element or None if no elements were removed from the vector (e.g. when it was already empty).
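
The following sketch pushes two values and then pops three times, so the last call hits an empty vector (the helper name `push_then_pop` and the values are illustrative):

```rust
// Hypothetical helper: pushes two values, then pops three times.
fn push_then_pop() -> [Option<i32>; 3] {
    let mut numbers = Vec::new();
    numbers.push(1);
    numbers.push(2);
    // `pop` removes from the end, so values come back in reverse order.
    // The third call finds the vector empty and returns None.
    [numbers.pop(), numbers.pop(), numbers.pop()]
}

fn main() {
    println!("{:?}", push_then_pop()); // prints: [Some(2), Some(1), None]
}
```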

Try it out for yourself.

Mutating elements in-place#

In some situations it may be useful to update an element which already belongs to a vector:
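
A short sketch of assigning through an index, with illustrative values and a hypothetical helper name:

```rust
// Hypothetical helper: overwrites the element at index 1.
fn replace_second() -> Vec<i32> {
    let mut vec = vec![1, 2, 3];
    // Assigning through an index updates the element in place.
    vec[1] = 20;
    vec
}

fn main() {
    println!("{:?}", replace_second()); // prints: [1, 20, 3]
}
```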

Try it out for yourself.

You may notice that we are directly accessing an element by index. Like we saw earlier, given an index outside the bounds of the vector the application will panic.

We can use a technique we showed earlier with the Vec::get method and its companion Vec::get_mut which returns a mutable reference to an element, if it exists. We can then rewrite the above example in a safer way as follows:
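
One possible shape for this, as a sketch: the helper `set_at` is hypothetical and returns a boolean to signal whether the update happened.

```rust
// Hypothetical helper: updates the element at `index` if it exists.
fn set_at(values: &mut [i32], index: usize, value: i32) -> bool {
    if let Some(element) = values.get_mut(index) {
        // `element` is a `&mut i32`; dereference it to overwrite the value.
        *element = value;
        true
    } else {
        // The index is out of bounds: nothing is changed and nothing panics.
        false
    }
}

fn main() {
    let mut vec = vec![1, 2, 3];
    println!("{}", set_at(&mut vec, 1, 20)); // prints: true
    println!("{}", set_at(&mut vec, 9, 0)); // prints: false
    println!("{:?}", vec); // prints: [1, 20, 3]
}
```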

Like get, the get_mut method returns an Option with a reference to the element at the given index. If the element doesn't exist (when the index is out of bounds), get_mut returns None. If the element exists, get_mut returns a mutable reference which we can use to update the value.

To change the value that the mutable reference is referring to, we make use of the dereference operator (*). You may learn more about it in the Rust documentation.

Working with Vectors#

Access all elements#

If we would like to perform a certain operation over each element in a vector, we can iterate through all elements rather than accessing them one at a time. One way would be to use a for loop:
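
A minimal sketch, assuming a vector of `String` values (the helper name `total_length` is hypothetical):

```rust
// Hypothetical helper: sums the lengths of all words using a for loop.
fn total_length() -> usize {
    let words = vec![String::from("hello"), String::from("world")];
    let mut total = 0;
    // `for word in words` moves the vector into the loop; each `word` is an owned String.
    for word in words {
        total += word.len();
    }
    // Using `words` here would not compile, because it was moved into the loop.
    total
}

fn main() {
    println!("{}", total_length()); // prints: 10
}
```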

In this case, we are consuming the vector: the for loop takes ownership of it and runs the loop body once for each element, so the vector can no longer be used after the loop.

We could also limit the operation to just references to the elements of the collection.
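
For example, iterating over `&words` borrows the vector instead of moving it. This is a sketch with illustrative values and a hypothetical helper name:

```rust
// Hypothetical helper: iterates by reference, then uses the vector again.
fn count_then_reuse() -> (usize, usize) {
    let words = vec![String::from("hello"), String::from("world")];
    let mut total = 0;
    // `&words` yields `&String` references, so the vector is only borrowed.
    for word in &words {
        total += word.len();
    }
    // The vector is still available after the loop.
    (total, words.len())
}

fn main() {
    println!("{:?}", count_then_reuse()); // prints: (10, 2)
}
```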

Using the same technique, we can obtain a mutable reference to the elements, allowing us to make changes to the collection:
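
A short sketch that doubles every element in place (the helper name `double_all` and the values are illustrative):

```rust
// Hypothetical helper: doubles each element through a mutable reference.
fn double_all() -> Vec<i32> {
    let mut numbers = vec![1, 2, 3];
    // `&mut numbers` yields `&mut i32` references to each element.
    for n in &mut numbers {
        *n *= 2;
    }
    numbers
}

fn main() {
    println!("{:?}", double_all()); // prints: [2, 4, 6]
}
```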

Try it out for yourself.

Enumerate all elements of a vector#

Another powerful technique to access the elements of a vector is through means of an iterator.

To obtain an iterator over a vector, we use the Vec::iter method:
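
As a sketch with illustrative values (the helper name `sum_with_iter` is an assumption), an iterator obtained from `iter` can drive a for loop:

```rust
// Hypothetical helper: adds up the elements by walking an iterator.
fn sum_with_iter() -> i32 {
    let numbers = vec![1, 2, 3];
    let mut total = 0;
    // `iter()` borrows the vector and yields `&i32` references.
    for n in numbers.iter() {
        total += *n;
    }
    total
}

fn main() {
    println!("{}", sum_with_iter()); // prints: 6
}
```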

Try it out for yourself.

Generally speaking, Rust makes it easy to use iterators for almost everything. Idiomatic Rust code often prefers iterators over index-based access to a vector.

An example use of iterators is transforming the values of a collection from one type to another. Given a collection of words, let's build a vector which holds the length of each word:
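
A minimal sketch of this transformation (the function name `word_lengths` and the sample words are illustrative assumptions):

```rust
// Hypothetical helper: maps each word to its length in bytes.
fn word_lengths(words: &[&str]) -> Vec<usize> {
    words
        .iter() // iterate over references to the words
        .map(|word| word.len()) // turn each word into its length
        .collect() // gather the lengths into a Vec<usize>
}

fn main() {
    let words = vec!["hello", "rust"];
    println!("{:?}", word_lengths(&words)); // prints: [5, 4]
}
```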

Try it out for yourself.

  • We use the .iter() method on the vector of words in order to obtain an iterator which will give us access to each element.

  • We use the .map() method to execute a closure over each element yielded by the iterator. In other words, the closure is executed for each word.

  • We use the collect() method in order to transform the result of the map operation into a vector of type Vec<usize> which holds the length of each word.

Filter the elements of a vector#

Iterators are a powerful concept in Rust and they prove to be very useful when we are interested in obtaining a subset of a given collection. We can use the Iterator::filter method in order to filter the elements of a vector:
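
A sketch of filtering with a consuming iterator. The helper name `long_words` and the "longer than 4 bytes" condition are illustrative assumptions:

```rust
// Hypothetical helper: keeps only the words longer than 4 bytes.
fn long_words(words: Vec<&str>) -> Vec<&str> {
    words
        .into_iter() // consume the vector, moving each element out
        .filter(|word| word.len() > 4) // keep elements for which the closure returns true
        .collect() // gather the matches into a new Vec<&str>
}

fn main() {
    let words = vec!["hello", "to", "the", "world"];
    println!("{:?}", long_words(words)); // prints: ["hello", "world"]
}
```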

Try it out for yourself.

  • We use the .into_iter() method in order to obtain a consuming iterator over all elements. A consuming iterator moves each value out of the vector allowing us to create a new vector with only the matching elements.

  • We use the .filter() method in order to check which elements of the vector should be yielded. For such elements, the closure given to the filter method needs to return true.

  • We use the collect() method in order to transform the result of the filter operation into a vector of type Vec<&str>.

Removing duplicates from a vector#

Nothing prevents our application from adding the same value multiple times to a vector.

There are however circumstances when it may be needed to remove the duplicates from a collection. Imagine for example, if the vector is based on user-provided data and we are only interested in working with unique (non-repeatable) values.

This is easy to achieve using the Vec::dedup method. Once called on an instance of a vector, dedup works on that same instance and removes consecutively repeated elements. This means that for the deduplication logic to work as we expect, the vector needs to be sorted so that repeating elements follow each other.

For example, calling dedup on [1, 3, 2, 3] removes nothing because the repeating values 3 are not adjacent to each other. Once the vector is sorted to [1, 2, 3, 3], we can use the Vec::dedup method to remove the repeated value.

The Vec::dedup method needs the elements of the vector to implement the PartialEq trait in order for the comparison to work. This means it can also work for custom structs, as long as they implement PartialEq.

Let's check an example:
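
A minimal sketch that sorts first and then removes duplicates, assuming `i32` elements (the helper name `unique_sorted` is hypothetical):

```rust
// Hypothetical helper: sorts the values, then removes adjacent duplicates.
fn unique_sorted(mut values: Vec<i32>) -> Vec<i32> {
    // Sorting places equal values next to each other.
    values.sort();
    // `dedup` removes consecutive repeated elements in place.
    values.dedup();
    values
}

fn main() {
    println!("{:?}", unique_sorted(vec![1, 3, 2, 3])); // prints: [1, 2, 3]
}
```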

Try it out for yourself. Here we declare the vector as mutable since Vec::dedup updates the contents of the collection in place.

Conclusion#

A common type to represent resizable arrays is the Vector type. It is one of the more versatile collection types, enabling a great deal of flexibility when accessing and working with its elements.

In this post, we saw how to get started with using the vector type for common operations like filtering and transforming a collection of elements. In addition, we discussed some of the safety protections Rust provides in such scenarios, like guarding against out of bounds access or null pointers.

What's Next#

You may also find it useful to explore the Vec specification from the Rust documentation where you can read about all available methods and additional sample use cases and code snippets.

You can also check one of my other posts which cover additional use cases for using Iterators with vectors in Rust.