Implementing the Vector Class

Steven J. Zeil

Last modified: Oct 26, 2023

Now that we have a sense of how to use vectors, let’s talk about how we can implement a vector.

Required Performance

The C++ standard specifies that a legal (i.e., standard-conforming) implementation of vector must satisfy the following performance requirements:

Operation         Required time
vector()          O(1)
vector(n, x)      O(n)
size()            O(1)
v[ i ]            O(1)
push_back(x)      O(1)
pop_back()        O(1)
insert            O(size())
erase             O(size())
front(), back()   O(1)

1 Implementing a vector class

1.1 Retrieving elements by index

One way to implement a vector is to keep its elements in a dynamically allocated array. For example, we could declare the data members for a vector like this:

template <typename Object>
class Vector
{
  public:
    // ⋮ public member functions, discussed below
  private:
    int theSize;
    int theCapacity;
    Object* objects;
};

with objects being a pointer to a dynamically allocated array.

The theSize and theCapacity variables track how many elements are in the array and how large the array is, respectively.

This makes many of the operations very easy to implement. For example, indexing into the vector is done by applying the same index to the array objects.

    Object & operator[]( int index )
    {
        return objects[ index ];
    }

    const Object & operator[]( int index ) const
    {
        return objects[ index ];
    }

The at function makes this a bit safer by first checking that the index is in a valid range:

    Object & at( int index )
    {
        if( index < 0 || index >= size( ) )
            throw ArrayIndexOutOfBoundsException{ };
        return objects[ index ];
    }

    const Object & at( int index ) const
    {
        if( index < 0 || index >= size( ) )
            throw ArrayIndexOutOfBoundsException{ };
        return objects[ index ];
    }
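For comparison, std::vector offers the same pair of operations: its at() checks the index and throws std::out_of_range (declared in <stdexcept>) when the check fails, while its operator[] performs no check at all. ArrayIndexOutOfBoundsException, by contrast, is not a standard library type; the Vector code above assumes it is declared elsewhere. The small sketch below uses a hypothetical helper, indexIsRejected, to show the standard behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if v.at(index) rejects the index by
// throwing std::out_of_range, false if the access succeeds.
bool indexIsRejected(const std::vector<int>& v, std::size_t index)
{
    try {
        (void)v.at(index);   // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Because std::vector indexes with the unsigned type std::size_t, only the upper bound needs checking there; the index < 0 test in our Vector is needed because it uses int indices.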

1.2 push_back: Adding to the end of a vector

But the real question is, how do we allow vectors to grow to arbitrary size?

We’ll split this into cases:

1.2.1 Adding to an Array That Has Unused Space

Consider adding a new element to the end of a vector whose array still has unused slots:

v.push_back(t5);

If there is room in the array, we just add our new data element.

    void push_back( const Object & x )
    {
        // ⋮ (handling a full array comes later)
        objects[ theSize++ ] = x;   // store x in the first unused slot
    }

1.2.2 Adding to an Array That is Full

If the data array is already filled to capacity and we try to add more to it,

v.push_back(T4);

we allocate a new array that is twice as big, move the existing elements into it, discard the old array, and then add the new element as before.

We’ll see later what this does to the big-O complexity for push_back().
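To get a feel for how rarely the "array is full" case occurs, here is a small simulation. It assumes, as a simplification, that the array starts with capacity 1 and doubles every time it fills; the function name capacitiesNeeded is hypothetical. It lists every capacity the array passes through while n elements are pushed:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: simulates n push_backs into an array that starts
// with capacity 1 and doubles whenever it is full.  Returns the sequence
// of capacities the array has had, starting with the initial one.
std::vector<int> capacitiesNeeded(int n)
{
    std::vector<int> capacities{1};
    int size = 0;
    int capacity = 1;
    for (int i = 0; i < n; ++i) {
        if (size == capacity) {        // full: make another twice as big
            capacity *= 2;
            capacities.push_back(capacity);
        }
        ++size;                        // room available: just add the element
    }
    return capacities;
}
```

For 9 pushes the capacities are 1, 2, 4, 8, 16, and for 1024 pushes the array is reallocated only 10 times: the number of reallocations grows like $\log_2 n$.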

1.3 Coding push_back

Now, let’s look at the code to accomplish all this.

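Remember the basic steps: if the array is full, allocate a larger array and move the existing elements into it; then store the new element in the first unused slot.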

1.3.1 Are we Full?

    void push_back( const Object & x )
    {
        if( theSize == theCapacity )          // Are we full?
            reserve( 2 * theCapacity + 1 );   // If so, grow; "+ 1" lets an empty (capacity 0) vector grow
        objects[ theSize++ ] = x;
    }

1.3.2 Reserve: reserving space for future growth

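    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )   // never shrink below the current size
            return;

        Object *newArray = new Object[ newCapacity ];  // allocate the larger array
        for( int k = 0; k < theSize; ++k )             // move the existing elements over
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );   // objects now points to the new array
        delete [ ] newArray;              // newArray holds the old array; free it
    }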

1.3.3 push_back: add to an array that might have space

    void push_back( const Object & x )
    {
        if( theSize == theCapacity )
            reserve( 2 * theCapacity + 1 );
        objects[ theSize++ ] = x;   // guaranteed to be in bounds now
    }

When we return to push_back from the reserve call, we know that we have enough room to insert our new element x at the end of the array.
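Putting the pieces together, here is a minimal compilable sketch of the class. It follows the fragments above, uses the growth rule 2 * theCapacity + 1 (the same rule used in pushback.cpp below) so that an empty vector can grow, and adds a constructor, destructor, size(), and capacity(), which the fragments do not show. Copying is simply disabled to keep the sketch short; a real container would implement deep-copy operations instead.

```cpp
#include <cassert>
#include <utility>

template <typename Object>
class Vector
{
  public:
    Vector() : theSize{0}, theCapacity{0}, objects{nullptr} {}
    ~Vector() { delete [ ] objects; }

    // Copying is disabled in this sketch (a real vector would deep-copy).
    Vector(const Vector&) = delete;
    Vector& operator=(const Vector&) = delete;

    int size() const { return theSize; }
    int capacity() const { return theCapacity; }

    Object & operator[]( int index ) { return objects[ index ]; }
    const Object & operator[]( int index ) const { return objects[ index ]; }

    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )            // never shrink below the current size
            return;

        Object *newArray = new Object[ newCapacity ];
        for( int k = 0; k < theSize; ++k )     // move existing elements over
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );        // newArray now holds the old array
        delete [ ] newArray;                   // release the old array
    }

    void push_back( const Object & x )
    {
        if( theSize == theCapacity )           // full: grow geometrically
            reserve( 2 * theCapacity + 1 );
        objects[ theSize++ ] = x;              // room available: O(1) store
    }

  private:
    int theSize;
    int theCapacity;
    Object* objects;
};
```

Starting from capacity 0, the capacities visited are 1, 3, 7, 15, 31, and so on. With the plain 2 * theCapacity rule, an empty vector would "grow" to capacity 0 and the store in push_back would write past the end of the array.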

2 Performance

A reminder: the standard promises this:

Operation         Required time
vector()          O(1)
vector(n, x)      O(n)
size()            O(1)
v[ i ]            O(1)
push_back(x)      O(1)
pop_back()        O(1)
insert            O(size())
erase             O(size())
front(), back()   O(1)

Did we deliver?

2.1 Looking at push_back

pushback.cpp
    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )
            return;

        Object *newArray = new Object[ newCapacity ];
        for( int k = 0; k < theSize; ++k )
            newArray[ k ] = std::move( objects[ k ] );

        theCapacity = newCapacity;
        std::swap( objects, newArray );
        delete [ ] newArray;
    }

      // Stacky stuff
    void push_back( const Object & x )
    {
        if( theSize == theCapacity )
            reserve( 2 * theCapacity + 1 );
        objects[ theSize++ ] = x;
    }

The only apparent problem is push_back().

2.1.1 push_back is, worst-case, O(size())

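When the array is full, push_back calls reserve. Allocating the new array (which default-constructs newCapacity objects) and the loop that moves all size() existing elements into it each take $O(\mbox{size()})$ time; the remaining steps are $O(1)$. Total is $O(\mbox{size()})$.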

That would seem to violate the C++ standard’s requirement that push_back run in $O(1)$ time.

But not every push_back() takes O(size()) time.

Let’s look at the issue from a slightly different point of view:

How long does it take to do a total of $n$ push_back() operations, starting with an empty vector?

2.1.2 Doing N push_backs

Let $k$ be the smallest integer such that $n \leq 2^k$. For the sake of simplicity, we’ll do the analysis as if we were actually going to do $2^k$ pushes.

We can then add up the total effort as

\[ \begin{eqnarray*} T(n) & = & \sum_{i=0}^k \left( O(2^i) + \sum_{j=1}^{2^i-1} O(1)\right) \\ & = & O\left(\sum_{i=0}^k 2^i\right) \\ & = & O\left(1 + 2 + 4 + \ldots + 2^k\right) \\ & = & O\left(2^{k+1} - 1\right) \\ & = & O\left(2^{k+1} \right) \end{eqnarray*} \]

The total effort is $O\left(2^{k+1}\right)$.

But $k$ was chosen as the smallest integer with $n \leq 2^k$, so $2^{k-1} < n$ and therefore $2^{k+1} < 4n$. The total effort is thus $O(4n) = O(n)$.
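The same bound can be checked numerically. The sketch below (its names are hypothetical) replays the growth rule from pushback.cpp, reserve( 2 * theCapacity + 1 ), and counts how many element moves reserve performs over n push_back calls on an initially empty vector, along with the largest number of moves done by any single call:

```cpp
#include <algorithm>
#include <cassert>

struct MoveCounts {
    long long total;      // moves performed by all n push_back calls
    long long worstCall;  // moves performed by the single most expensive call
};

// Hypothetical helper: simulates n push_backs on an initially empty vector
// using the growth rule reserve(2 * theCapacity + 1) and counts the element
// moves done inside reserve.
MoveCounts countMoves(long long n)
{
    MoveCounts counts{0, 0};
    long long size = 0, capacity = 0;
    for (long long i = 0; i < n; ++i) {
        if (size == capacity) {
            capacity = 2 * capacity + 1;
            counts.total += size;              // reserve moves every element
            counts.worstCall = std::max(counts.worstCall, size);
        }
        ++size;
    }
    return counts;
}
```

For n = 8 it reports 11 moves in total, but 7 of them happen in the last call alone: the total stays below $2n$, while a single call can cost nearly $n$.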

2.1.3 push_back has an Amortized Worst Case of O(1)

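So even though a single push_back() can take $O(\mbox{size()})$ time, a sequence of $n$ push_back() calls starting from an empty vector takes only $O(n)$ time in total, an average of $O(1)$ per call.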

We say that the amortized worst-case time of push_back() is therefore O(1).

Definition

amortize: to decrease (on average) over an extended period of time.

This term comes from the world of finance, where the cost of an initial high investment in equipment or facilities is often assessed (e.g., for tax purposes) at its equivalent annual cost over all the years that the equipment is in operation. For example, a \$10,000 computer expected to have a working lifetime of 5 years may be said to have an amortized cost of \$2,000 per year.

Similarly, if we consider the total work needed to build a vector of $n$ elements by repeated push_back calls, we say that the total cost is $O(n)$ and therefore that push_back has an amortized worst-case complexity of $O(1)$ per call.

Whether or not the amortized cost is really what we want depends upon what kind of performance is important to us. If we are mainly interested in how some algorithm involving many push_backs performs in totality, the amortized cost is appropriate. If, however, we are dealing with an interactive algorithm that does one push_back in between each prompt for user input, then the “real” $O(n)$ worst case is more appropriate because it indicates the amount of time that the user might have to wait after submitting an input.

2.2 Using reserve() to get a True O(1) Worst Case

If we know ahead of time how many elements will be placed into the vector, we can make every push_back run in $O(1)$ worst-case time. We do this by calling reserve first, so that the array already has enough slots and no push_back ever needs to request more memory:

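#include <iostream>
#include <string>
#include <vector>

int main()
{
  int n;
  std::vector<std::string> names;
  std::cout << "How many names? " << std::flush;
  std::cin >> n;
  names.reserve (n);          // allocate room for all n names up front
  for (int i = 0; i < n; ++i)
    {
      std::cout << "\nEnter name #" << i << ": " << std::flush;
      std::string aName;
      std::cin >> aName;
      names.push_back(aName); // never reallocates: capacity() >= n
    }
  return 0;
}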

This is the std::vector counterpart of the reserve function that we wrote as part of our implementation of push_back.
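The C++ standard guarantees that after a call to reserve, no reallocation takes place during insertions until an insertion would make the size greater than capacity(). The sketch below (the helper name is hypothetical) checks this for a vector of strings by recording the address returned by data() after the first push_back and confirming that neither it nor capacity() changes during the remaining pushes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reserves room for n names, pushes n names, and
// reports whether the vector kept using the same underlying array.
// Precondition: n >= 1.
bool pushesStayInPlace(std::size_t n)
{
    assert(n >= 1);
    std::vector<std::string> names;
    names.reserve(n);
    const std::size_t reservedCapacity = names.capacity();

    names.push_back("name 0");
    const std::string* firstSlot = names.data();   // address of the array
    for (std::size_t i = 1; i < n; ++i)
        names.push_back("name " + std::to_string(i));

    // No reallocation means the array neither moved nor changed capacity.
    return names.data() == firstSlot && names.capacity() == reservedCapacity;
}
```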

2.3 Summary

So the true answer is that vector::push_back does have a worst case of $O(n)$, but over a sequence of $n$ calls starting from an empty vector, that cost averages out (amortizes) to $O(1)$ per call.

In fact, if you were to look at the required behavior for vector::push_back listed in the C++ language standard, you would find that the required $O(1)$ behavior is, indeed, a requirement on amortized time, not on worst-case time.
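You can observe this amortized strategy in your own standard library. The standard does not fix the growth factor (implementations commonly use factors such as 1.5 or 2), so the sketch below, built around a hypothetical helper, only counts how often capacity() changes during n push_back calls. With any geometric growth factor that count grows logarithmically in n rather than linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes n ints onto an empty std::vector and counts
// how many times its capacity changed (each change is a reallocation).
std::size_t countReallocations(std::size_t n)
{
    std::vector<int> v;
    std::size_t changes = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++changes;
            lastCapacity = v.capacity();
        }
    }
    return changes;
}
```

With a growth factor of 2, a million push_back calls cause only about 20 reallocations.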