Performance of C-style arrays vs C++ std::vector

by Martin D. Maas, Ph.D

The floating-point performance of various array-type structures in C++ can be confusing, as there are several options to pick from, and different ways to use them. This article presents a simple benchmark to try to clarify some of the basic differences.

In C++, there are several options to store a sequence of values, but the most common ones are std::vector and C-style arrays.

While there is no doubt that std::vector is easier to use and offers important safety guarantees, it’s not obvious that we can always obtain optimal performance from standard vectors, as there is usually a trade-off between abstraction and performance.

According to the conventional wisdom, however, std::vector is just an array under the hood, and the compiler should be smart enough to optimize away the extra abstraction and make it as fast as a raw array.

But… is this true? Is the compiler that smart?

Well, as we shall see, a lot depends on how you are using these vectors or arrays.

Some differences between std::vector and arrays

Before diving into some benchmarks, let’s discuss some of the fundamental differences between C++ vectors and C arrays.

In typical implementations, std::vector is a small object holding a few pointers (to the start, the end, and the capacity limit of its storage), while the data itself lives in a plain heap-allocated array. On top of that, std::vector can perform some sanity checks, especially when compiled in debug mode, and these checks have an associated cost.

This is actually a useful feature of std::vector, as it will save you time debugging run-time errors, while in an optimized release build these checks do not take place.
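
To make the "few pointers plus a heap array" picture concrete, here is a minimal sketch of the kind of layout a typical implementation uses (illustrative only, not the actual libstdc++ or libc++ source):

// Illustrative sketch of a typical std::vector layout (not real library code).
template <typename T>
struct toy_vector {
    T* begin_;     // start of the heap-allocated storage
    T* end_;       // one past the last constructed element (determines size())
    T* capacity_;  // one past the end of the allocated storage (determines capacity())
};
// The toy_vector object itself (three pointers) usually lives on the stack,
// while the elements it points to live on the heap.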

Standard vectors are also more flexible, enabling dynamic memory management. As such, they are more prone to being misused. For example, when a std::vector is grown element by element without reserving space in advance, its contents are copied to a larger buffer every time its capacity is exceeded and, as we shall verify, this can make it orders of magnitude slower than a raw array.
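
A quick way to observe these reallocations is to print the capacity while pushing elements; this small illustrative snippet (the exact growth pattern is implementation-defined) shows how often the buffer gets replaced:

#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    // v.reserve(1000);  // uncommenting this line would remove all reallocations below
    auto last_capacity = v.capacity();
    for (int k = 0; k < 1000; ++k) {
        v.push_back(1.0);
        if (v.capacity() != last_capacity) {
            // Every capacity change implies a new allocation plus a copy of all elements.
            std::cout << "size " << v.size() << " -> capacity " << v.capacity() << "\n";
            last_capacity = v.capacity();
        }
    }
}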

Stack, heap, and pre-allocation

Yet another aspect to keep in mind is that memory can be allocated on the stack or on the heap, which can impact performance to some extent.

Allocating on the stack is easier in C: since C99, C supports variable-length arrays (VLAs), which are stack-allocated. The C++ standard doesn’t allow this, but most compilers offer VLAs as an extension in C++.

In contrast, the storage of a std::vector is allocated on the heap by default.

While using the stack is generally faster (and we will see by how much), it is also a source of bugs and security risks (the infamous stack overflow).

Of course, another way to use memory is to pre-allocate a chunk of memory and then reuse it. This can be done with any type of vector or array and, interestingly, it has an important impact on performance.

We will compare all these options in a simple test case, to see if there is any difference.

Benchmarks and use cases

As usual, when creating small benchmarks, the overall goal should be gaining insight, and the example should be motivated by our particular use case.

There are, of course, a number of possibilities and use cases:

  • The size of the array is big or small.
  • The size is fixed, or changes dynamically.
  • The size is known, or unknown, at compile time.
  • We are frequently allocating new arrays, or reusing the same array repeatedly.

In my case, I wanted to test the performance of relatively small arrays used within a hot loop (i.e. the same chunk of memory is used repeatedly).

So, as a minimal example, I considered the problem of adding the first N terms of the following sum:

\begin{equation} \sum_{n=1}^{N} \frac{1}{n^2} \end{equation}

(For large N, this sum converges to π²/6 ≈ 1.6449, which makes a handy sanity check for the output.)

In order to run our benchmark, we can write a simple program which does the following:

  • Allocate memory for an array of size N (a number which is not necessarily known at compile time).
  • Fill the memory with the inverse squares of the numbers from 1 to N.
  • Perform the sum of all the numbers.

Of course, the tests will be performed with optimization flags turned on (e.g. -O3).

C-style stack vs heap arrays

The first thing I was eager to test was whether there is any difference between heap and stack allocation when using C-style arrays.

As most C++ compilers tolerate C99 stack-allocated variable-length arrays and support them as an extension, coding this function with a stack-allocated array is as simple as:

double sumStackArray( int N ) {
    // Stack-allocated variable-length array (a C99 feature, compiler extension in C++).
    // Requires <cmath> for pow().
    double smallarray[N];
    for(int k = 0; k<N; k++){
        smallarray[k] = 1.0/pow(k+1,2);
    }

    double sum = 0.0;
    for(int k = 0; k<N; k++){
        sum += smallarray[k];
    }

    return sum;
}

Now, for heap-allocated C-style arrays, we can use new and delete:

double sumNewArray( int N ) {
    // Heap-allocated array, managed manually with new/delete.
    double * smallarray = new double[N];
    for(int k = 0; k<N; k++){
        smallarray[k] = 1.0/pow(k+1,2);
    }

    double sum = 0.0;
    for(int k = 0; k<N; k++){
        sum += smallarray[k];
    }
    delete [] smallarray;
    return sum;
}
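
As a side note, if you want heap allocation with automatic cleanup but without going through std::vector, a std::unique_ptr to an array removes the need for the manual delete []. This variant is not part of the benchmark, it is just an aside; element access compiles down to the same raw-pointer indexing:

#include <memory>

// Heap allocation with automatic cleanup: the array is released when
// 'smallarray' goes out of scope, so no explicit delete[] is needed.
std::unique_ptr<double[]> smallarray(new double[N]);
smallarray[0] = 1.0;   // element access works the same as with a raw pointer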

The results I get (with times in microseconds after 10,000 repetitions) are the following:

N                     10      100     1000     10000
StackArray            78      652     6557     65860
NewArray              182     748     6971     66817

So it seems that yes, for the case of extremely tiny arrays, allocating on the stack is faster than on the heap, but the speed-up factor diminishes rapidly as the size of the array increases. However, this is still not the full story, so read on.

C++ std::vector: reserve vs push_back

Now let’s turn our attention to std::vector.

When using std::vector with a fixed size, the best option is to allocate all the space in advance. A first idea is to use the reserve method:

std::vector<double> smallarray;
smallarray.reserve(N);

However, reserve only changes the capacity: the vector’s size stays at zero, so the elements would still have to be appended with push_back (indexing positions beyond size() is undefined behavior). The simplest correct option for a fixed size is to construct the vector with its size directly:

std::vector<double> smallarray(N);
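
For reference, the full benchmark function for this variant then looks essentially like the C versions above; the code in the repository may differ in details, but the straightforward translation is:

double sumStdVector( int N ) {
    std::vector<double> smallarray(N);   // allocates and value-initializes N doubles
    for(int k = 0; k<N; k++){
        smallarray[k] = 1.0/pow(k+1,2);
    }

    double sum = 0.0;
    for(int k = 0; k<N; k++){
        sum += smallarray[k];
    }
    return sum;
}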

For comparison with a dynamic implementation that does not reserve space in advance, we can also code this using the push_back method:

std::vector<double> smallarray;
for(int k = 0; k<N; ++k){
    smallarray.push_back(1.0/pow(k+1,2));   // note: k+1 avoids a division by zero at k = 0
}

The results are the following:

N                     10      100     1000     10000
StdVector             252     1656    15624    153966
StdVector_pushback    1510    5159    39054    373518

So yes, using standard vectors with their storage allocated up front (direct initialization with the size) is substantially faster than the fully dynamic push_back approach. However, the results so far still show a meaningful performance difference favoring raw arrays over std::vector.

Let’s see what happens next.

Pre-allocate and reuse within a hot loop

So far our examples have been quite unrealistic, or at least, very suboptimal, as there are very frequent allocations and deallocations during the execution of our program. This is best illustrated in the sumNewArray function, where we can actually see the new and delete keywords in action, but it is taking place under the hood in the other functions as well.

An improvement we can implement is to try to allocate our memory once, and then reuse it as many times as needed. This can be done for each of the different variants of arrays and vectors.

For the sake of brevity, I’m going to omit the full code here and refer you to the code on github for the details. The main difference is that our hot loop will take a vector (or an array) as one of its arguments:

double sumStdVector_prealloc( std::vector<double>& smallarray  ){
    (...)
}
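
The body is just the same fill-and-sum loop as before, operating on storage that was allocated outside the function. As a sketch (the repository version may differ in details):

double sumStdVector_prealloc( std::vector<double>& smallarray ){
    const int N = static_cast<int>(smallarray.size());
    for(int k = 0; k<N; k++){
        smallarray[k] = 1.0/pow(k+1,2);   // reuses the existing storage, no allocation
    }

    double sum = 0.0;
    for(int k = 0; k<N; k++){
        sum += smallarray[k];
    }
    return sum;
}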

and then timing it with a driver function that does:

std::vector<double> smallarray(N);   // allocated once, outside the hot loop
// smallarray.reserve(N);
for (int i=0; i<REP; i++){
    x = sumStdVector_prealloc(smallarray);
}
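
If you want to reproduce the timings without pulling the repository, a minimal harness along these lines will do (just a sketch using std::chrono; the repository may measure things differently, and the summation functions are the ones defined above):

#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    const int N = 1000;
    const int REP = 10000;
    std::vector<double> smallarray(N);   // allocated once, reused REP times

    double x = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < REP; i++){
        x = sumStdVector_prealloc(smallarray);
    }
    auto t1 = std::chrono::steady_clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    // Print x so the optimizer cannot discard the computation entirely;
    // with aggressive inlining it may still hoist work out of the loop, so
    // check the generated code if the numbers look suspiciously small.
    std::cout << "sum = " << x << ", time = " << us << " us" << std::endl;
}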

The new timings go as follows:

N                     10      100     1000     10000
StackArray_prealloc   78      652     6639     67532
NewArray_prealloc     84      738     6730     73570
StdVector_prealloc    88      768     7204     71952

So, we see that with the exception of the tiniest case (arrays of size 10), when using pre-allocated memory, std::vector now becomes competitive with C arrays, providing both convenience and performance.

Conclusion

This is how the entire comparison looks (using the stack-allocated array as the baseline):

N                     10      100     1000     10000
StackArray            1       1       1        1
NewArray              2.33    1.15    1.06     1.01
StdVector             3.23    2.54    2.38     2.34
StdVector_pushback    19.36   7.91    5.96     5.67
StackArray_prealloc   1.00    1.00    1.01     1.03
NewArray_prealloc     1.08    1.13    1.03     1.12
StdVector_prealloc    1.13    1.18    1.10     1.09

The conclusion is that, when used properly with pre-allocated memory, std::vector is almost as fast as a raw C-style array. However, being a more flexible structure, std::vector leaves more room for suboptimal usage. In order to make good use of it, pre-allocation and memory reuse are strongly recommended whenever performance is important.

Also, it doesn’t seem to matter much weather the arrays being used within a hot loop are originally allocated on the stack or the heap. As long as the compiler is constantly referencing the same memory addresses, the CPU will be able to maintain the memory on the CPU cache, guaranteeing maximum performance.

By the way, in case you want to re-run these tests yourself, remember that you can find the complete code on github.