Implementing the Vector Class

Steven J. Zeil

Last modified: Oct 26, 2023

Contents:

1 Implementing a vector class

1.1 Retrieving elements by index

1.2 push_back: Adding to the end of a vector

1.3 Coding push_back

2 Performance

2.1 Looking at push_back

2.2 Using reserve() to get a True O(1) Worst Case

2.3 Summary

Now that we have a sense of how to use vectors, let’s talk about how we can implement a vector.

Your textbook introduces a somewhat simplified version (Vector) of the std::vector interface.

Required Performance

The C++ standard specifies that a legal (i.e., standard-conforming) implementation of vector must satisfy the following performance requirements:

Operation	Required time
`vector()`	O(1)
`vector(n, x)`	O(n)
`size()`	O(1)
`v[ i ]`	O(1)
`push_back(x)`	O(1)
`pop_back`	O(1)
`insert`	O(size())
`erase`	O(size())
`front, back`	O(1)

1 Implementing a vector class

1.1 Retrieving elements by index

We want to support operator[] in O(1) time.
The natural choice is to use an array, since any element of an array can be reached directly from its index.

For example, we could declare the data members for a vector like this:

template <typename Object>
class Vector
{
  ⋮
  private:
    int theSize;
    int theCapacity;
    Object* objects;
};

with objects being a pointer to a dynamically allocated array.

The theSize and theCapacity variables track how many elements are in the array and how large the array is, respectively.
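Here is a minimal sketch of how these three members might be initialized and released so that they stay consistent from construction to destruction. The name MiniVector and its constructor taking an initial capacity are hypothetical, not the text's Vector. Copying is simply disabled to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical MiniVector: shows only how the three data members are managed.
// Invariant: 0 <= theSize <= theCapacity, and objects has theCapacity slots.
template <typename Object>
class MiniVector
{
  public:
    // Start with no elements but room for initCapacity of them.
    explicit MiniVector( int initCapacity = 0 )
      : theSize{ 0 }, theCapacity{ initCapacity },
        objects{ new Object[ initCapacity ] }
    { }

    ~MiniVector( ) { delete [ ] objects; }   // release the dynamic array

    // A real vector needs a deep copy here; omitted in this sketch.
    MiniVector( const MiniVector & ) = delete;
    MiniVector & operator=( const MiniVector & ) = delete;

    int size( ) const                        // slots in use
    {
        assert( theSize <= theCapacity );    // debug check of the invariant
        return theSize;
    }
    int capacity( ) const { return theCapacity; }   // slots allocated

  private:
    int theSize;
    int theCapacity;
    Object* objects;
};
```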

This makes many of the operations very easy to implement. For example, indexing into the vector is done by applying the same index to the array objects.

    Object & operator[]( int index )
    {
        return objects[ index ];
    }

    const Object & operator[]( int index ) const
    {
        return objects[ index ];
    }
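Why is this O(1)? In C++, objects[ index ] is defined as *( objects + index ), so reaching any element takes one address computation and one memory access, no matter how large the array is. The hypothetical function elementAt below spells that out.

```cpp
#include <cassert>

// Hypothetical illustration: objects[ index ] means *( objects + index ).
// The address is computed in constant time, which is why operator[] is O(1).
template <typename Object>
Object & elementAt( Object* objects, int index )
{
    assert( objects != nullptr );     // debug-only sanity check
    return *( objects + index );      // identical to objects[ index ]
}
```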

The at function makes this a bit safer:

    Object & at( int index )
    {
        if( index < 0 || index >= size( ) )
            throw ArrayIndexOutOfBoundsException{ };
        return objects[ index ];
    }

    const Object & at( int index ) const
    {
        if( index < 0 || index >= size( ) )
            throw ArrayIndexOutOfBoundsException{ };
        return objects[ index ];
    }

by checking to be sure the index is in a valid range.
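For comparison, std::vector::at performs the same kind of range check but throws the standard exception std::out_of_range. The sketch below is a hypothetical free-function version of the text's at() that uses std::out_of_range in place of the textbook's ArrayIndexOutOfBoundsException. The name checkedAt and its parameter list are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical free-function version of the text's at(): the same range
// check, but it throws std::out_of_range (as std::vector::at does).
template <typename Object>
Object & checkedAt( Object* objects, int theSize, int index )
{
    assert( theSize >= 0 );                   // caller must pass a valid size
    if( index < 0 || index >= theSize )       // reject anything outside [0, theSize)
        throw std::out_of_range{ "index " + std::to_string( index )
                                 + " is out of range" };
    return objects[ index ];                  // safe: index is known to be valid
}
```

A caller can catch std::out_of_range (or its base class std::logic_error) to recover from a bad index, whereas an out-of-range operator[] simply has undefined behavior.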

1.2 push_back: Adding to the end of a vector

But the real question is, how do we allow vectors to grow to arbitrary size?

We’ll split this into cases:

1.2.1 Adding to an Array That Has Unused Space

Let’s consider the operation of adding a new element to the end of this vector,

v.push_back(t5);

If there is room in the array, we just add our new data element.

    void push_back( const Object & x )
    {
        ⋮
        objects[ theSize++ ] = x;
    }

1.2.2 Adding to an Array That is Full

If the data array is already filled to capacity, and we try to add more to it,

v.push_back(T4);

we make another array that is twice as big:

- First, the new data area is allocated.
- Then the old data is copied into the new array.
- The old array is deleted, and the objects and theCapacity fields are updated.
- Now we have room to add the new element to the end of the vector.
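The four steps above can be traced on a bare array. In this sketch the hypothetical helper growThenAppend receives stand-ins for Vector's three data members by reference. Growing a capacity-0 array to 1 is an assumption added so that an empty array can grow at all.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: append x, first doubling the array if it is full.
// objects, theSize, and theCapacity play the roles of Vector's data members.
template <typename Object>
void growThenAppend( Object*& objects, int& theSize, int& theCapacity,
                     const Object & x )
{
    if( theSize == theCapacity )
    {
        // Twice as big; an empty (capacity 0) array grows to 1 instead.
        int newCapacity = ( theCapacity == 0 ) ? 1 : 2 * theCapacity;
        Object* newArray = new Object[ newCapacity ];   // 1. allocate new data area
        for( int k = 0; k < theSize; ++k )              // 2. copy the old data over
            newArray[ k ] = std::move( objects[ k ] );
        delete [ ] objects;                             // 3. delete the old array,
        objects = newArray;                             //    then update objects
        theCapacity = newCapacity;                      //    and theCapacity
    }
    assert( theSize < theCapacity );                    // there is room now
    objects[ theSize++ ] = x;                           // 4. add the new element
}
```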


We’ll see later what this does to the big-O complexity for push_back().

1.3 Coding push_back

Now, let’s look at the code to accomplish all this.

Remember the basic steps:

If the data array is already filled to capacity, and we try to add more to it, we make another that is twice as big …
- First the new data area is allocated.
- Then the old data is copied into the new array.
- The old array is deleted, and the objects and theCapacity fields updated.
Now we can add the new element to the end of the vector.

1.3.1 Are we Full?

    void push_back( const Object & x )
    {
        if( theSize == theCapacity ) ➀
            reserve( 2 * theCapacity + 1 ); ➁
        objects[ theSize++ ] = x;
    }

If the data array is already filled to capacity ➀, and we try to add more to it,
then we need to create a new array that is about twice as large ➁ (the + 1 lets a vector whose capacity is still 0 grow, since 2 * 0 would request no space at all)
- this is accomplished via the function reserve, which is part of the public interface of vector.
  
  v.reserve(n) guarantees that v has enough storage to contain at least n elements without requiring any additional memory allocation.
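The same guarantee holds for the real std::vector, and it can be checked directly. If the element storage (reported by data()) does not move during the pushes that follow reserve(n), then no reallocation took place. reserve, capacity, data, and push_back are standard std::vector members. The function name reserveAvoidsReallocation is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check of std::vector's guarantee: after v.reserve( n ), the
// next n push_backs never reallocate, so v.data( ) stays the same.
bool reserveAvoidsReallocation( std::size_t n )
{
    std::vector<int> v;
    v.reserve( n );                        // afterwards, v.capacity( ) >= n
    assert( v.capacity( ) >= n );
    const int* storage = v.data( );        // where the elements will live
    for( std::size_t i = 0; i < n; ++i )
        v.push_back( static_cast<int>( i ) );
    return v.data( ) == storage;           // same array: no reallocation happened
}
```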

1.3.2 Reserve: reserving space for future growth

    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )  ➀
            return;

        Object *newArray = new Object[ newCapacity ]; ➁
        for( int k = 0; k < theSize; ++k )            ➂
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );   ➃
        delete [ ] newArray;              ➄
    }

If newCapacity is smaller than the number of elements already stored ➀, an array of that size could not hold them all, so reserve returns immediately without doing anything.
Otherwise, a new array is allocated ➁,
and the existing data is moved from the old array into the new one ➂.
The vector’s own recorded capacity is updated, and the two array pointers are swapped ➃ so that the vector itself now points to the new array, while newArray now points to the old one.
Finally, we clean up by deleting the original array ➄.
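To see what guard ➀ actually does, the sketch below wraps the reserve code from above in a hypothetical struct RawVector with the same three data members. They are public here only so that a test can inspect them.

```cpp
#include <cassert>
#include <utility>

// Hypothetical holder with the same three data members as Vector,
// used only to exercise the reserve logic shown above.
template <typename Object>
struct RawVector
{
    int theSize = 0;
    int theCapacity = 0;
    Object* objects = nullptr;

    RawVector( ) = default;
    RawVector( const RawVector & ) = delete;             // no deep copy in this sketch
    RawVector & operator=( const RawVector & ) = delete;
    ~RawVector( ) { delete [ ] objects; }

    void reserve( int newCapacity )
    {
        assert( newCapacity >= 0 );
        if( newCapacity < theSize )          // could not hold the current elements
            return;

        Object *newArray = new Object[ newCapacity ];
        for( int k = 0; k < theSize; ++k )
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );      // objects <- new array, newArray <- old
        delete [ ] newArray;                 // free the old array
    }
};
```

One consequence of the guard is that a request between theSize and theCapacity shrinks the array. std::vector::reserve behaves differently: it never reduces capacity() and does nothing when the requested capacity does not exceed it.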

1.3.3 push_back: add to an array that might have space

    void push_back( const Object & x )
    {
        if( theSize == theCapacity )
            reserve( 2 * theCapacity + 1 );
        objects[ theSize++ ] = x; ➀
    }

When we reach ➀, whether or not reserve was called, we know that there is room to store our new element x at the end of the array.


2 Performance

A reminder: the standard promises this:

Operation	Required time
`vector()`	O(1)
`vector(n, x)`	O(n)
`size()`	O(1)
`v[ i ]`	O(1)
`push_back(x)`	O(1)
`pop_back`	O(1)
`insert`	O(size())
`erase`	O(size())
`front, back`	O(1)

Did we deliver?

2.1 Looking at push_back

pushback.cpp

    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )
            return;

        Object *newArray = new Object[ newCapacity ];
        for( int k = 0; k < theSize; ++k )
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );
        delete [ ] newArray;
    }

      // Stacky stuff
    void push_back( const Object & x )
    {
        if( theSize == theCapacity )
            reserve( 2 * theCapacity + 1 );
        objects[ theSize++ ] = x;
    }

The only apparent problem is push_back().

When we add to the end of a vector that is already filled to capacity(), we
- allocate a new array with twice the capacity() $O(1)$
  
  Actually, allocating an array the way it is done here is really O(newCapacity), because the default constructor would be invoked on each of the new elements. But “real” implementations of vector use a more primitive system function to allocate space without initializing it, avoiding that cost.
  
  Even at O(newCapacity), though, this would not change our final result. newCapacity == 2*size() + 1, so we would simplify this to say the allocation is O(size()), and when we add up all the other steps described on this page, the final result would be unchanged.
- copy the old elements into the new array $O(\mbox{size}())$
- discard the old array $O(1)$
  
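  Again, this is a simplification: “really” deleting the array would be O(size()), because the destructor would be invoked on each array element. But real implementations have ways around that as well. In fact, that is part of the reason std::move is used in the copy above: moving each element, rather than copying it, transfers its resources into the new array and leaves the old element cheap to destroy.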
- add the new element to the end. $O(1)$
Total is $O(\mbox{size}())$
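The “more primitive” allocation mentioned above can be sketched with standard C++17 facilities. ::operator new obtains raw bytes without constructing anything, std::uninitialized_move move-constructs the live elements into that storage, and std::destroy runs destructors only on the elements that actually exist. The helper growUninitialized is hypothetical, and this is not a claim about how any particular library implements vector. It assumes Object is not over-aligned and that the old storage was obtained with ::operator new (or is nullptr).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper (C++17): move the first size live elements of objects
// into fresh, uninitialized storage for newCapacity elements. No default
// constructors run on unused slots; only the live elements are destroyed.
template <typename Object>
Object* growUninitialized( Object* objects, std::size_t size,
                           std::size_t newCapacity )
{
    assert( newCapacity >= size );
    // Raw bytes only: no Object is constructed here.
    Object* newArray =
        static_cast<Object*>( ::operator new( newCapacity * sizeof( Object ) ) );
    try
    {
        // Move-construct each live element into the new storage.
        std::uninitialized_move( objects, objects + size, newArray );
    }
    catch( ... )
    {
        ::operator delete( newArray );         // don't leak if a move throws
        throw;
    }
    std::destroy( objects, objects + size );   // destructors for the moved-from objects
    ::operator delete( objects );              // release the old raw storage
    return newArray;
}
```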

2.1.1 push_back is, worst-case, O(size())

Total is $O(\mbox{size}())$.

That would seem to violate the C++ standard’s requirement that push_back run in $O(1)$ time.

But not every push_back() takes O(size()) time:
- it does so only when we have filled the array;
- otherwise, it takes $O(1)$ time.

Let’s look at the issue from a slightly different point of view:

How long does it take to do a total of $n$ push_back() operations, starting with an empty vector?

2.1.2 Doing N push_backs

Let $k$ be the smallest integer such that $n \leq 2^k$. For the sake of simplicity, we’ll do the analysis as if we were actually going to do $2^k$ pushes.

Let $m$ be the current size() of the vector.
- If $m$ is a power of $2$, say $2^i$, then the array is full and we do $O(m)$ work on the next call to push_back().
- If $m$ is not a power of $2$, then we do $O(1)$ work on the next call.

We can then add up the total effort as

\[ \begin{eqnarray*} T(n) & = & \sum_{i=0}^k \left( O(2^i) + \sum_{j=1}^{2^i-1} O(1)\right) \\ & = & O\left(\sum_{i=0}^k (2^i)\right) \\ & = & O\left(1 + 2 + 4 + \ldots + 2^k\right) \\ & = & O\left(2^{k+1} - 1\right) \\ & = & O\left(2^{k+1} \right) \end{eqnarray*} \]

The total effort is $O\left(2^{k+1}\right)$.

Because $k$ is the smallest integer with $n \leq 2^k$, we know that $2^{k-1} < n$, so $2^{k+1} < 4n$ and this total effort is $O(4n) = O(n)$.
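The bound can also be checked numerically. The hypothetical simulation below counts the element moves that reserve would perform during n push_backs on an initially empty vector, using the growth rule 2 * theCapacity + 1 from pushback.cpp. With that rule, reallocations happen when the size reaches 0, 1, 3, 7, 15, … (one less than a power of 2) rather than at powers of 2, which shifts the thresholds but not the O(n) total.

```cpp
#include <cassert>

// Hypothetical simulation of n push_backs on an empty vector that grows by
// newCapacity = 2 * theCapacity + 1, counting only the moves done by reserve.
struct PushBackCost
{
    long long totalMoves;   // moves over all n calls
    long long worstCall;    // most moves done by any single call
};

PushBackCost simulatePushBacks( long long n )
{
    assert( n >= 0 );
    PushBackCost cost{ 0, 0 };
    long long theSize = 0, theCapacity = 0;
    for( long long i = 0; i < n; ++i )
    {
        if( theSize == theCapacity )      // full: reserve moves every element
        {
            cost.totalMoves += theSize;
            if( theSize > cost.worstCall )
                cost.worstCall = theSize;
            theCapacity = 2 * theCapacity + 1;
        }
        ++theSize;                        // the O(1) part of every call
    }
    return cost;
}
```

For n = 1000, simulatePushBacks reports 1013 moves in total (about one per call) but 511 moves for the single worst call. Under this growth rule the total stays below 2n for every n ≥ 1, while the worst single call grows in proportion to n.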

2.1.3 push_back has an Amortized Worst Case of O(1)

So even though an individual call to push_back() may be $O(n)$, the total effort for all $n$ push_back’s used to build a vector “from scratch” is still only $O(n)$.

We say that the amortized worst-case time of push_back() is therefore O(1).

Definition

amortize: to decrease (on average) over an extended period of time.

This term comes from the world of finance, where the cost of an initial high investment in equipment or facilities is often assessed (e.g., for tax purposes) at its equivalent annual cost over all the years that the equipment is in operation. For example, a \$10,000 computer expected to have a working lifetime of 5 years may be said to have an amortized cost of \$2,000 per year.

Similarly, if we consider the total work necessary to actually get a vector of $n$ elements, we say that the total cost is $O(n)$ and therefore we have an amortized worst-case complexity of $O(1)$ per push_back call.

Whether or not the amortized cost is really what we want depends upon what kind of performance is important to us. If we are mainly interested in how some algorithm involving many push_backs performs in totality, the amortized cost is appropriate. If, however, we are dealing with an interactive algorithm that does one push_back in between each prompt for user input, then the “real” $O(n)$ worst case is more appropriate because it indicates the amount of time that the user might have to wait after submitting an input.

2.2 Using reserve() to get a True O(1) Worst Case

If we know ahead of time how many elements will be placed into the vector, we can make every push_back run in O(1) time. We do this by calling reserve first, so that the vector already has enough slots and no push_back ever needs to request more memory:

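#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
  int n;
  vector<std::string> names;
  cout << "How many names? " << flush;
  if (!(cin >> n) || n < 0)
    return 1;                    // no usable count was entered
  names.reserve (n);             // one allocation, before any push_back
  for (int i = 0; i < n; ++i)
    {
      cout << "\nEnter name #" << i << ": " << flush;
      std::string aName;
      cin >> aName;
      names.push_back(aName);    // never needs to reallocate
    }
  return 0;
}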

This is the same reserve function that we looked at as part of the implementation of push_back.

2.3 Summary

So the true answer is that vector::push_back does have a worst case of $O(n)$, but over any sequence of $n$ calls starting from an empty vector, that cost averages out (amortizes) to $O(1)$ per call.

In fact, if you were to look at the required behavior for vector::push_back listed in the C++ language standard, you would find that the required $O(1)$ behavior is, indeed, a requirement on amortized time, not on worst-case time.