Temporarily moving elements out of collections
A recurring pattern when working with collections is the following.
Suppose I have a collection, like a Vec<T>, and suppose I would like to take an element out and replace it with a new one.
Fortunately for us, holding a mutable reference is almost the same as having ownership, so we can for example use replace:
What happens if we want the replacement element to depend on the element that we just removed?
This solution is very inefficient!
vec.remove removes the element at idx, and then shifts all of the remaining elements left, which we then proceed to immediately shift back.
This happens since a Vec must always be contiguous in memory.
If we can get away with it, we could pass an &mut T reference in the closure, instead of passing ownership.
For example, this is the approach that the entry API takes.
However, this still requires having the new element at hand at the point when it is replaced; we’ve just deferred the call to replace farther into the closure.
For a more realisticAt least this is the example that motivated me to think about this problem. situation analogous to the above, suppose we in fact want to replace the element at idx with an iterator of elements which can depend on the index.
Here, the analogy to insert is splice:
Our goal is to optimize this implementation to avoid shifting the elements twice, and ideally (for example, if the iterator contains exactly one element) to not shift the elements at all.
Why do we have to be careful?
Let’s do something illegal.
The short answer is: what happens if the closure panics?
If the closure f panics here, the element T could be freed twice: first by the closure itself (which took ownership of T), and then by the Drop implementation of Vec, which cleans up its own memory.
The crux of the issue is that a Vec assumes that all of its elements are always in an initialized state.
A safe solution: using a sentinel value
A safe pattern is to use a cheap default value to swap with the element. The idea is the following: first, swap the default value for the element, then do something with the element, and then swap back.
For example, if T::default() is cheap (for instance if T is an Option<U>):
use swap;
Now if f panics, the original vector could be left in a logically invalid state, but this will not result in memory corruption.
What happens if T does not have a cheap default?
We can just use other values already in the Vec!
Finally, for the replace_iter variant, this is the analogous implementation:
The idea that the data could be left in an invalid state is particularly important in multi-threaded code.
If we access the vector through a mutex, and then our closure panics while calling one of the above replace_with functions, the fact that the data is invalid could result in logic errors in the other thread.
This is referred to as poisoning.
If you are still interested in the sentinel pattern, Niko Matsakis has a nice article about it.
Solutions using unsafe code
Now let’s revisit our earlier problems and solve them using unsafe Rust.
Updating an element in-place with a closure
Let’s start with replace_with.
Let’s recalling our issue from before.
If the closure panics, the element which we are replacing could be freed twice: once by the closure itself while unwinding, and again by the Vec.
A common way to work around this issue is to tell the vector that it is no longer responsible for the memory.
Now, if f panics, e is dropped by the closure, and the elements in the vector preceding e are dropped by the Vec.
In case of a panic, the elements which follow e will be leaked.
This does not cause undefined behaviour; generally speaking leaking is a safe operation.
However, leaking memory isn’t great if we can avoid it.
Replacing an element with an iterator
Now let’s handle the general case in replace_iter.
In order to minimize the amount of unsafe code we have to write, and also to depend on the optimized implementation of splice in the standard library, we can take the following approach:
- Start with the approach from
replace_with: temporarily set the length, useptr::readto load the element we want to replace, and then apply the closure to get the new iterator. - If the iterator is empty, shift the remaining elements back by 1 index, correct the length, and return. Otherwise, it has at least one element, so we can use
ptr::writeto put that element back. - At this point, the vector is back in an initialized state, so we can use
spliceto insert the remaining elements at the next index.
This looks like the following in practice: