Using Vectors

Steven Zeil

Last modified: Oct 25, 2021

Contents:

1 Keeping Information Together

1.1 Vectors

1.2 Adding elements to a Vector

1.3 Removing Elements from a Vector

2 Example: Computing the Median

3 Vectors versus Arrays

The std::vector template is a more convenient replacement for dynamically allocated arrays. It provides, in essence, an array that grows as necessary to accommodate the amount of data that we need.

Unlike a dynamically allocated array, however, a vector manages its own memory and can be copied, compared, and passed to functions much like an object of any other class.

1 Keeping Information Together

One criticism of typical array-manipulation functions, such as

#include <cassert>
#include <string>

/**
  * Add a value to an array, keeping all elements in order.
  *
  * @param array an array of strings, with elements 0..size-1 already in order.
  * @param size the number of elements in the array
  * @param capacity the number of slots allocated for the array. If 
  *                 size >= capacity, there is no room to add new elements and
  *                 this function will fail.
  * @param value the string to be inserted into the array
  * @return the position at which the value was inserted
  */
int addInOrder (std::string* array, 
    int& size, 
    int capacity,
    std::string value)
{
    assert (size < capacity); 
    int k = size;
    while (k > 0 && array[k-1] > value)
    {
       array[k] = array[k-1];  // shift larger elements up one slot
       --k;
    }
    array[k] = value;
    ++size;
    return k;
}

is that they separate the array, the size, and the capacity:

It’s easy for programmers to lose track of which integer counter applies to which array.
It’s easy to lose track of the difference between the capacity (the number of elements that can fit in the array) and the size (the number of elements in the array that contain useful data).
It’s just plain messy to pass this information as separate parameters.

Wrapping arrays within structs

One solution: use a struct to gather the related elements together:

/// A collection of items
struct ItemSequence {
   static const int capacity = 500;
   int size;
   Item data[capacity];
};

In fact, that’s pretty much what std::array does, but that doesn’t help when we don’t know the required capacity until the program is already running.
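
To see what the struct buys us, here is a minimal sketch of addInOrder rewritten to take a single struct parameter. StringSequence is a hypothetical variant of the struct above that holds strings, and its small capacity is chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of ItemSequence that holds strings, so that it
// matches the earlier addInOrder example. The capacity of 8 is arbitrary.
struct StringSequence {
    static const int capacity = 8;
    int size = 0;
    std::string data[capacity];
};

// Insert value so that data[0..size-1] stays in ascending order.
// Returns the position at which the value was stored.
int addInOrder (StringSequence& seq, const std::string& value)
{
    assert (seq.size < StringSequence::capacity); // abort if there is no room
    int k = seq.size;
    while (k > 0 && seq.data[k-1] > value)
    {
       seq.data[k] = seq.data[k-1]; // shift larger elements up one slot
       --k;
    }
    seq.data[k] = value;
    ++seq.size;
    return k;
}
```

The array, its size, and its capacity now travel together as one parameter, but the capacity is still fixed when the program is compiled.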

That’s where the vector comes into play.

1.1 Vectors

The vector is an array-like structure provided in the std header <vector>.

Think of it as an array that can grow at the high end.

vector is a template, so you have to give the element type to instantiate it when you want to create or pass vector objects:

std::vector<int> vi; // a vector of 0 ints
std::vector<std::string> vs (10); // a vector of 10
                      //   empty strings
std::vector<float> vf (5, 1.0); // a vector of 5 
                    //    floats, all 1.0

Accessing Elements in a Vector

Use the [ ] brackets just as with an array:

vector<int> v(10, 0);
for (int i = 0; i < 10; ++i)
  {
   int j;
   cin >> j;
   v[i] = j + 1;
   cout << v[i] << endl;
  }

We can also ask a vector for its current size:

void foo (vector<int>& v) {
  for (unsigned i = 0; i < v.size(); ++i)
    {
     int j;
     cin >> j;
     v[i] = j + 1;
     cout << v[i] << endl;
    }
}

This brings us to the biggest stylistic difference between working with vectors and arrays:

When we work with arrays, we allocate the maximum space (capacity) that we think we will need, and then keep separate track of how much of that capacity we are currently using (size). Commonly, only a small portion of an array actually contains useful data at any given time.

When we work with vectors, we insert exactly as many elements as we actually have. The vector itself expands its capacity as necessary and keeps track of how much of that capacity is actually in use (the size) at any time.

We can ask a vector for its current capacity:

unsigned cap = myVector.capacity();

but that value isn’t fixed the way that it is for arrays. If we try to add enough data that the size() would be greater than the capacity(), the vector will increase its capacity(). In fact, what the capacity() means is simply how many elements the vector can hold before it will need to grab more memory. We’ll see how this actually happens in a later lesson.
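
We can watch the capacity change with a small experiment. The sketch below pushes n values onto an empty vector and records each new capacity it sees. capacityHistory is a hypothetical helper name, and the actual capacity values depend on the library implementation, so none are promised here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Push n ints onto an empty vector and return every distinct capacity
// value observed along the way, in the order in which it appeared.
std::vector<std::size_t> capacityHistory (int n)
{
    std::vector<int> v;
    std::vector<std::size_t> history;
    history.push_back(v.capacity());       // capacity of the empty vector
    for (int i = 0; i < n; ++i)
    {
        v.push_back(i);
        assert (v.size() <= v.capacity()); // the size never exceeds the capacity
        if (v.capacity() != history.back())
            history.push_back(v.capacity()); // the vector grabbed more memory
    }
    return history;
}
```

Printing the result of capacityHistory(100) shows how often your own compiler's library had to enlarge the vector. The only things we can rely on are that the recorded capacities increase and that the last one is at least 100.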

1.2 Adding elements to a Vector

The square brackets [ ] allow us to look at and assign to already existing elements in the vector:

v[i] = v[j] + 1;  // Valid only if i < v.size() && j < v.size()

You can’t add new elements to a vector by simply assigning to them:

vector<int> v;  // v.size() == 0
for (int i = 0; i < 100; ++i)
    v[i] = i; // Crash! (If we're lucky.)

Instead, the way we usually add elements to a vector is by pushing them, one at a time, onto the back of the vector:

vector<int> v;  // v.size() == 0
for (int i = 0; i < 100; ++i)
    v.push_back(i); 
// v.size() == 100, v.capacity() >= 100
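
If an out-of-range index is a real possibility, the vector's at() member function is a checked alternative to [ ]: it throws std::out_of_range instead of misbehaving silently. The sketch below wraps it in a hypothetical helper, valueOr, that returns a fallback value for invalid positions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Return v[i] if i is a valid position, or fallback if it is not.
int valueOr (const std::vector<int>& v, std::size_t i, int fallback)
{
    assert (v.size() <= v.capacity()); // sanity check; always true
    try
    {
        return v.at(i);   // checked access: throws if i >= v.size()
    }
    catch (const std::out_of_range&)
    {
        return fallback;  // position i does not exist
    }
}
```

For an empty vector, valueOr(v, 0, -1) returns -1 rather than crashing. The check costs a little time on every access, which is one reason [ ] remains the usual choice once we know the index is valid.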

It is possible to add something to the middle of a vector:

v.insert(pos, 42);

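where pos is an iterator marking the position of the new element. Every element from pos onward has to be shifted up by one position, so the time required grows with the number of elements after pos, and we should be careful about using it.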
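
One common way to get such an iterator is to add an index to v.begin(). A minimal sketch, in which insertAt is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert value so that it ends up at index k, where 0 <= k <= v.size().
// Every element from index k onward moves up by one position.
void insertAt (std::vector<int>& v, std::size_t k, int value)
{
    assert (k <= v.size());
    v.insert(v.begin() + k, value); // v.begin() + k denotes position k
}
```

Calling insertAt(v, v.size(), x) has the same effect as v.push_back(x), while insertAt(v, 0, x) is the most expensive case because every existing element must move.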

The vector equivalent of our earlier array-based addInOrder function would be

#include <string>
#include <vector>

/**
  * Add a value to a vector, keeping all elements in order.
  *
  * @param v a vector of strings, with elements 0..size-1 already in order.
  * @param value the string to be inserted into the vector
  * @return the position at which the value was inserted
  */
int addInOrder (std::vector<std::string>& v, std::string value)
{
    int k = v.size();
    v.push_back(value); // Increase the size by 1, so that we
                        // have room for the new element
    while (k > 0 && v[k-1] > value)
    {
       v[k] = v[k-1];   // shift larger elements up one slot
       --k;
    }
    v[k] = value;
    return k;
}

1.3 Removing Elements from a Vector

We can quickly remove elements from the end of a vector:

v.pop_back();

This operation will decrease v.size() by 1.

It is possible to remove something from the middle of a vector:

v.erase(pos);

where pos is an iterator, but this operation is slow enough that we don’t want to do this very often.
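
As a sketch of where pos might come from, the two hypothetical helpers below erase by index and by value. The second uses std::find from <algorithm> to locate the element first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Remove the element at index k. Later elements shift down by one.
void eraseAt (std::vector<std::string>& v, std::size_t k)
{
    assert (k < v.size());
    v.erase(v.begin() + k);
}

// Remove the first element equal to value, if there is one.
// Returns true if an element was removed.
bool removeFirst (std::vector<std::string>& v, const std::string& value)
{
    std::vector<std::string>::iterator pos =
        std::find(v.begin(), v.end(), value);
    if (pos == v.end())
        return false;   // not found, so there is nothing to erase
    v.erase(pos);       // later elements shift down by one
    return true;
}
```

Both helpers decrease v.size() by 1 when they remove something, just as pop_back does, but they pay for moving every element that follows the erased position.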

2 Example: Computing the Median

The “median average” of a collection of numbers is the middle number when they are arranged in sorted order (or the midpoint between the two middle values if we have an even number of values).

Let’s write a function to read a series of numbers from an input stream and to compute the median average.
* We don’t know, ahead of time, how many numbers will be in the input. We won’t know until we hit the end of the input.
* We will assume that the numbers in the input are sorted (arranged into ascending order).

To illustrate the difference in style between working with arrays and vectors, we’ll write two versions of this function. First, the array-based version:

#include <cassert>
#include <istream>

double median1 (std::istream& input)
{
    const unsigned MaxInput = 1000;
    double numbers[MaxInput];
    unsigned n = 0;
    double x;
    while (input >> x)  // read until end of input
    {
       assert (n < MaxInput); // Abort if too much input
       numbers[n] = x;
       ++n;
    }
    assert (n > 0); // An empty input has no median
    double median = numbers[n/2]; // If n is odd
    if (n % 2 == 0) // If n is even: midpoint of the two middle values
       median = (numbers[n/2 - 1] + numbers[n/2]) / 2.0;
    return median;
}

then the vector version:

#include <cassert>
#include <istream>
#include <vector>

double median2 (std::istream& input)
{
    std::vector<double> numbers;   // numbers is initially empty
    double x;
    while (input >> x)  // read until end of input
    {
       numbers.push_back(x);
    }
    unsigned n = numbers.size();
    assert (n > 0); // An empty input has no median
    double median = numbers[n/2]; // If n is odd
    if (n % 2 == 0) // If n is even: midpoint of the two middle values
       median = (numbers[n/2 - 1] + numbers[n/2]) / 2.0;
    return median;
}

Differences of note:

We don’t need to ask if we have too much data in the vector version, because we don’t have to guess at a preset maximum when we write the code.
- Although there is a maximum size for a vector, it’s based upon the largest block of memory the operating system will allow us to allocate to a program. We rarely have to worry about getting that much data. (The one exception would be if we had a vector<T> where T is a data type that is itself quite large. However, we would also face problems allocating large arrays of T in that case.)
The vector starts at size 0, and we grow it one element at a time via push_back.
We don’t need to write our own code to track the current size of the data (n) – the vector does that for us.
Once we have the data inserted into the vector, we can access it just as we would the data in an array.
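
Because a vector can be returned from and passed to functions, one possible refinement is to separate reading from computing, which makes the median calculation easy to try out on a string of text. This is only a sketch: medianOfSorted, readNumbers, and medianOfText are hypothetical names, and like the versions above it assumes the input is already sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Median of numbers that are already in ascending order.
// Precondition: numbers is not empty.
double medianOfSorted (const std::vector<double>& numbers)
{
    assert (!numbers.empty());
    std::size_t n = numbers.size();
    if (n % 2 == 1)
        return numbers[n/2];                        // the single middle value
    return (numbers[n/2 - 1] + numbers[n/2]) / 2.0; // midpoint of the two middle values
}

// Read whitespace-separated numbers until the input is exhausted.
std::vector<double> readNumbers (std::istream& input)
{
    std::vector<double> numbers;
    double x;
    while (input >> x)
        numbers.push_back(x);
    return numbers;
}

// Convenience wrapper: treat a string as if it were an input file.
double medianOfText (const std::string& text)
{
    std::istringstream input(text);
    return medianOfSorted(readNumbers(input));
}
```

For example, medianOfText("1 2 3 4") averages the two middle values 2 and 3, while medianOfText("1 3 5") returns the middle value 3.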

3 Vectors versus Arrays

Advantages of Vectors

Can grow as necessary
Need not worry about pointers, allocation, delete
Vectors can be copied (v1 = v2;)
Vectors can be compared (v1 == v2, v1 != v2, v1 < v2, etc.)
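
A short sketch of the last two points follows. The function names are hypothetical. Comparison with < is lexicographic, that is, decided by the first position at which the two vectors differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return a copy of v with every element doubled. The caller's vector is
// untouched because the parameter is passed by value, which copies it.
std::vector<int> doubled (std::vector<int> v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= 2;
    return v;
}

// Walk through copying and comparing two vectors.
void demonstrateCopyAndCompare ()
{
    std::vector<int> v1;
    v1.push_back(1); v1.push_back(2); v1.push_back(3);

    std::vector<int> v2 = v1;  // v2 is an independent copy of v1
    assert (v1 == v2);         // same elements in the same order

    v2[0] = 9;                 // changing the copy ...
    assert (v1[0] == 1);       // ... leaves the original alone
    assert (v1 != v2);
    assert (v1 < v2);          // 1 < 9 at the first position that differs
}
```

None of this works with a dynamically allocated array: assigning one array pointer to another copies only the pointer, and == on two pointers compares addresses rather than contents.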

Disadvantages of Vectors

A bit slower overall than arrays
- Turning on compiler optimization with -O2 relieves a lot of this.

Individual calls to push_back vary considerably in time required. We’ll explore this more later.
Can waste a lot of storage
- but so can arrays if we have to guess at the required maximum capacity.
Harder to work with in a debugger