Implementing the Vector Class
Steven J. Zeil
Now that we have a sense of how to use vectors, let’s talk about how we can implement a vector.
- Your text introduces a somewhat simplified version (Vector) of the std::vector interface.
Required Performance
The C++ standard specifies that a legal (i.e., standard-conforming) implementation of vector must satisfy the following performance requirements:
Operation | Speed |
---|---|
vector() | O(1) |
vector(n, x) | O(n) |
size() | O(1) |
v[ i ] | O(1) |
push_back(x) | O(1) |
pop_back | O(1) |
insert | O(size()) |
erase | O(size()) |
front, back | O(1) |
1 Implementing a vector class
1.1 Retrieving elements by index
- We want to support operator[] in O(1) time.
- The natural choice is to use an array.
For example, we could declare the data members for a vector like this:
template <typename Object>
class Vector
{
    ⋮
  private:
    int theSize;
    int theCapacity;
    Object* objects;
};
with objects being a pointer to a dynamically allocated array.
The theSize and theCapacity variables track how many elements are in the array and how large the array is, respectively.
This makes many of the operations very easy to implement. For example, indexing into the vector is done by applying the same index to the array objects.
Object & operator[]( int index )
{
    return objects[ index ];
}
const Object & operator[]( int index ) const
{
    return objects[ index ];
}
The at function makes this a bit safer:
Object & at( int index )
{
    if( index < 0 || index >= size( ) )
        throw ArrayIndexOutOfBoundsException{ };
    return objects[ index ];
}
const Object & at( int index ) const
{
    if( index < 0 || index >= size( ) )
        throw ArrayIndexOutOfBoundsException{ };
    return objects[ index ];
}
by checking to be sure the index is in a valid range.
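The difference between the unchecked operator[] and the checked at can be tried out in isolation. The following is only a minimal sketch: CheckedArray, atThrows, and demoAt are hypothetical names introduced for illustration, and std::out_of_range (the exception thrown by std::vector::at) stands in for the text's ArrayIndexOutOfBoundsException.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical fixed-size stand-in for the Vector data members shown above.
template <typename Object>
class CheckedArray
{
  public:
    explicit CheckedArray( int n )
      : theSize{ n }, objects{ new Object[ n ]{ } }
    { }

    ~CheckedArray( ) { delete [ ] objects; }

    // Copying is disabled to keep the sketch short.
    CheckedArray( const CheckedArray & ) = delete;
    CheckedArray & operator=( const CheckedArray & ) = delete;

    int size( ) const { return theSize; }

    // Unchecked access: the caller must supply a valid index.
    Object & operator[]( int index ) { return objects[ index ]; }

    // Checked access: throws instead of touching memory outside the array.
    Object & at( int index )
    {
        if( index < 0 || index >= size( ) )
            throw std::out_of_range{ "index out of range" };
        return objects[ index ];
    }

  private:
    int theSize;
    Object* objects;
};

// Returns true if a.at(index) throws std::out_of_range.
inline bool atThrows( CheckedArray<int> & a, int index )
{
    try { a.at( index ); }
    catch( const std::out_of_range & ) { return true; }
    return false;
}

// Usage sketch: valid indices succeed, invalid ones throw.
inline void demoAt( )
{
    CheckedArray<int> a( 3 );
    a.at( 1 ) = 42;
    assert( a[ 1 ] == 42 );
    assert( !atThrows( a, 2 ) );
    assert( atThrows( a, 3 ) );
    assert( atThrows( a, -1 ) );
}
```

The check costs two comparisons per access, so at is still O(1); it trades a small constant overhead for a well-defined failure.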
1.2 push_back: Adding to the end of a vector
But the real question is, how do we allow vectors to grow to arbitrary size?
We’ll split this into cases:
1.2.1 Adding to an Array That Has Unused Space
Let’s consider the operation of adding a new element to the end of this vector,
v.push_back(t5);
If there is room in the array, we just add our new data element.
void push_back( const Object & x )
{
    ⋮
    objects[ theSize++ ] = x;
}
1.2.2 Adding to an Array That is Full
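If the array is already full, there is no room for the new element, so we must first allocate a larger array, copy the existing elements into it, and discard the old array. We’ll see later what this does to the big-O complexity for push_back().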
1.3 Coding push_back
Now, let’s look at the code to accomplish all this.
Remember the basic steps:
- If the data array is already filled to capacity, and we try to add more to it, we make another that is twice as big …
  - First the new data area is allocated.
  - Then the old data is copied into the new array.
  - The old array is deleted, and the objects and theCapacity fields are updated.
- Now we can add the new element to the end of the vector.
1.3.1 Are we Full?
void push_back( const Object & x )
{
    if( theSize == theCapacity )          ➀
        reserve( 2 * theCapacity + 1 );   ➁
    objects[ theSize++ ] = x;
}
- If the data array is already filled to capacity ➀, and we try to add more to it,
- then we need to create a new array that is roughly twice as large ➁
  - This is accomplished via the function reserve, which is part of the public interface of vector. v.reserve(n) guarantees that v has enough storage to contain at least n elements without requiring any additional memory allocation.
1.3.2 Reserve: reserving space for future growth
void reserve( int newCapacity )
{
    if( newCapacity < theSize )                          ➀
        return;
    Object *newArray = new Object[ newCapacity ];        ➁
    for( int k = 0; k < theSize; ++k )                   ➂
        newArray[ k ] = std::move( objects[ k ] );
    theCapacity = newCapacity;
    std::swap( objects, newArray );                      ➃
    delete [ ] newArray;                                 ➄
}
- If newCapacity is smaller than the number of elements already stored ➀, the existing array is already large enough, so reserve returns immediately without doing anything.
- Otherwise, a new array is allocated ➁
- And the existing data is copied from the old array to the new one ➂
- The vector’s own recorded maximum capacity is updated, and the two array pointers are swapped ➃ so that the vector itself now points to the new array, while newArray now points to the older, smaller array.
- And finally we can clean up by deleting the original array ➄
1.3.3 push_back: add to an array that might have space
void push_back( const Object & x )
{
    if( theSize == theCapacity )
        reserve( 2 * theCapacity + 1 );
    objects[ theSize++ ] = x;             ➀
}
When we return to push_back from the reserve call ➀, we know that we have enough room to insert our new element x onto the end of the array.
Try out the vector operations in an animation.
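One way to see the pieces of this section working together is to assemble them into a small class. The following is a minimal sketch under stated assumptions, not the textbook's full Vector: the name MiniVector, the empty starting capacity, and the demoMiniVector function are hypothetical, and copying is disabled to keep the sketch short. The reserve and push_back bodies follow the fragments shown above, using the 2 * theCapacity + 1 growth step so that an empty array can grow.

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal vector assembled from the fragments in this section.
template <typename Object>
class MiniVector
{
  public:
    MiniVector( ) : theSize{ 0 }, theCapacity{ 0 }, objects{ nullptr } { }
    ~MiniVector( ) { delete [ ] objects; }

    MiniVector( const MiniVector & ) = delete;
    MiniVector & operator=( const MiniVector & ) = delete;

    int size( ) const { return theSize; }
    int capacity( ) const { return theCapacity; }

    Object & operator[]( int index ) { return objects[ index ]; }

    void reserve( int newCapacity )
    {
        if( newCapacity < theSize )
            return;                                    // nothing to do
        Object *newArray = new Object[ newCapacity ];  // allocate
        for( int k = 0; k < theSize; ++k )             // move old data over
            newArray[ k ] = std::move( objects[ k ] );
        theCapacity = newCapacity;
        std::swap( objects, newArray );  // newArray now holds the old array
        delete [ ] newArray;             // discard the old array
    }

    void push_back( const Object & x )
    {
        if( theSize == theCapacity )
            reserve( 2 * theCapacity + 1 );  // + 1 lets capacity 0 grow
        objects[ theSize++ ] = x;
    }

  private:
    int theSize;
    int theCapacity;
    Object* objects;
};

// Usage sketch: ten pushes onto an empty MiniVector.
inline void demoMiniVector( )
{
    MiniVector<int> v;
    for( int i = 0; i < 10; ++i )
        v.push_back( i * i );
    assert( v.size( ) == 10 );
    assert( v.capacity( ) == 15 );   // capacities visited: 1, 3, 7, 15
    assert( v[ 9 ] == 81 );
}
```

Starting from an empty array, only four of the ten calls reallocate; the rest simply store into unused space.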
2 Performance
A reminder: the standard promises this:
Operation | Speed |
---|---|
vector() | O(1) |
vector(n, x) | O(n) |
size() | O(1) |
v[ i ] | O(1) |
push_back(x) | O(1) |
pop_back | O(1) |
insert | O(size()) |
erase | O(size()) |
front, back | O(1) |
Did we deliver?
2.1 Looking at push_back
void reserve( int newCapacity )
{
    if( newCapacity < theSize )
        return;
    Object *newArray = new Object[ newCapacity ];
    for( int k = 0; k < theSize; ++k )
        newArray[ k ] = std::move( objects[ k ] );
    theCapacity = newCapacity;
    std::swap( objects, newArray );
    delete [ ] newArray;
}
// Stacky stuff
void push_back( const Object & x )
{
    if( theSize == theCapacity )
        reserve( 2 * theCapacity + 1 );
    objects[ theSize++ ] = x;
}
The only apparent problem is push_back().
- When we add to the end of a vector that is already filled to capacity(), we
  - allocate a new array with twice the capacity(): $O(1)$

    Actually, allocating an array the way it is done here is really O(newCapacity), because the default constructor would be invoked on each of the new elements. But “real” implementations of vector use a more primitive system function to allocate space without initializing it, avoiding that cost.

    Even at O(newCapacity), though, this would not change our final result. Because the array is full, newCapacity is roughly 2*size(), so we would simplify this to say the allocation is O(size()), and when we add up all the other steps described on this page, the final result would be unchanged.
  - copy the old elements into the new array: $O(\mbox{size}())$
  - discard the old array: $O(1)$

    Again, this is a simplification. “Really” deleting the array would be O(size()) because the destructor would be invoked on each array element. But real implementations have ways around that as well. In fact, that’s the reason that std::move is used in the copy above.
  - add the new element to the end: $O(1)$
- Total is $O(\mbox{size}())$
2.1.1 push_back is, worst-case, O(size())
Total is $O(\mbox{size()})$.
That would seem to violate the C++ standard’s requirement that push_back run in $O(1)$ time.
But not every push_back() takes O(size()) time.
- It takes O(size()) time only when we have filled the array.
- Otherwise, it takes $O(1)$ time.
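To make the two cases concrete, the growth policy can be simulated with plain integers, with no real array involved. This is only a sketch: copiesPerPush and demoCopiesPerPush are hypothetical names, the vector is assumed to start empty with capacity 0, and the growth step is the 2 * theCapacity + 1 used in the push_back code above.

```cpp
#include <cassert>
#include <vector>

// Simulates n push_back calls on an initially empty vector and records,
// for each call, how many existing elements would have to be copied:
// 0 for a cheap call, size() for a call that finds the array full.
inline std::vector<int> copiesPerPush( int n )
{
    std::vector<int> copies;
    int size = 0;
    int capacity = 0;
    for( int call = 0; call < n; ++call )
    {
        if( size == capacity )
        {
            copies.push_back( size );       // full: copy every element
            capacity = 2 * capacity + 1;
        }
        else
            copies.push_back( 0 );          // room available: no copying
        ++size;
    }
    return copies;
}

// Usage sketch: the first eight calls.
inline void demoCopiesPerPush( )
{
    const std::vector<int> expected{ 0, 1, 0, 3, 0, 0, 0, 7 };
    assert( copiesPerPush( 8 ) == expected );
}
```

The expensive calls become rarer as the vector grows: each reallocation roughly doubles the number of cheap calls that follow it.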
Let’s look at the issue from a slightly different point of view:
How long does it take to do a total of $n$ push_back() operations, starting with an empty vector?
2.1.2 Doing N push_backs
Let $k$ be the smallest integer such that $n \leq 2^k$. For the sake of simplicity, we’ll do the analysis as if we were actually going to do $2^k$ pushes.
- Let $m$ be the current size() of the vector.
  - If $m$ is a power of $2$, say, $2^i$, then the array is full and we do $O(m)$ work on the next call to push_back().
  - If $m \neq 2^i$, then we do $O(1)$ work on the next call.
We can then add up the total effort as
\[ \begin{eqnarray*} T(n) & = & \sum_{i=0}^k \left( O(2^i) + \sum_{j=1}^{2^i-1} O(1)\right) \\ & = & O\left(\sum_{i=0}^k 2^i\right) \\ & = & O\left(1 + 2 + 4 + \ldots + 2^k\right) \\ & = & O\left(2^{k+1} - 1\right) \\ & = & O\left(2^{k+1} \right) \end{eqnarray*} \]
The total effort is $O\left(2^{k+1}\right)$.
But we chose $k$ as the smallest integer such that $n \leq 2^k$, which means that $2^k < 2n$, so this total effort is $O(2^{k+1}) = O(4n) = O(n)$.
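The same conclusion can be checked numerically by adding up the copying work over a whole sequence of calls. The sketch below is illustrative only: totalCopies and demoTotalCopies are hypothetical names, the vector is assumed to start empty with capacity 0, and the growth step is 2 * theCapacity + 1 as in the push_back code shown earlier. Under those assumptions the total number of element copies stays below $2n$.

```cpp
#include <cassert>

// Total number of element copies performed by n push_back calls on an
// initially empty vector whose capacity grows as capacity -> 2*capacity + 1.
inline long long totalCopies( long long n )
{
    long long size = 0;
    long long capacity = 0;
    long long copies = 0;
    for( long long call = 0; call < n; ++call )
    {
        if( size == capacity )
        {
            copies += size;               // reallocation copies everything
            capacity = 2 * capacity + 1;
        }
        ++size;
    }
    return copies;
}

// Usage sketch: the total copying work is linear in n.
inline void demoTotalCopies( )
{
    assert( totalCopies( 8 ) == 11 );     // 0 + 1 + 3 + 7
    for( long long n = 1; n <= 2000; ++n )
        assert( totalCopies( n ) < 2 * n );
}
```

Adding the $n$ element stores themselves to the fewer than $2n$ copies still gives a total that is linear in $n$.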
2.1.3 push_back has an Amortized Worst Case of O(1)
So even though
- an individual call to push_back() may be $O(n)$,
- the total effort for all $n$ push_back’s used to build a vector “from scratch” is also $O(n)$

We say that the amortized worst-case time of push_back() is therefore O(1).
Definition
amortize: to decrease (on average) over an extended period of time.
This term comes from the world of finance, where the cost of an initial high investment in equipment or facilities is often assessed (e.g., for tax purposes) at its equivalent annual cost over all the years that the equipment is in operation. For example, a \$10,000 computer expected to have a working lifetime of 5 years may be said to have an amortized cost of \$2,000 per year.
Similarly, if we consider the total work necessary to actually get a vector of $n$ elements, we say that the total cost is $O(n)$ and therefore we have an amortized worst-case complexity of $O(1)$ per push_back call.
Whether or not the amortized cost is really what we want depends upon what kind of performance is important to us. If we are mainly interested in how some algorithm involving many push_backs performs in totality, the amortized cost is appropriate. If, however, we are dealing with an interactive algorithm that does one push_back in between each prompt for user input, then the “real” $O(n)$ worst case is more appropriate because it indicates the amount of time that the user might have to wait after submitting an input.
2.2 Using reserve() to get a True O(1) Worst Case
If we know ahead of time how many elements will be placed into the vector, we can make all the push_back’s O(1) time. We would do this by calling reserve to make sure there are enough slots without requesting more memory:
int n;
vector<std::string> names;
cout << "How many names? " << flush;
cin >> n;
names.reserve (n);
for (int i = 0; i < n; ++i)
{
    cout << "\nEnter name #" << i << ": " << flush;
    std::string aName;
    cin >> aName;
    names.push_back(aName);
}
This is the same reserve function that we looked at as part of the implementation of push_back.
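The effect of the up-front reserve call can be observed on std::vector itself by watching capacity(). The following is a sketch for illustration: countReallocations and demoReserve are hypothetical names, and a change in capacity() is used as a stand-in for "a reallocation happened". How often an unreserved std::vector reallocates depends on the library's growth policy, so only the reserved case is given an exact expected value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts how many times the capacity of a std::vector changes while n
// strings are pushed onto it. If reserveFirst is true, reserve(n) is
// called before the first push_back.
inline int countReallocations( int n, bool reserveFirst )
{
    std::vector<std::string> names;
    if( reserveFirst )
        names.reserve( n );
    int reallocations = 0;
    auto lastCapacity = names.capacity( );
    for( int i = 0; i < n; ++i )
    {
        names.push_back( "name" + std::to_string( i ) );
        if( names.capacity( ) != lastCapacity )
        {
            ++reallocations;               // the array was replaced
            lastCapacity = names.capacity( );
        }
    }
    return reallocations;
}

// Usage sketch: with reserve(n) first, none of the n pushes reallocates.
inline void demoReserve( )
{
    assert( countReallocations( 1000, true ) == 0 );
}
```

With the space reserved in advance, every one of the $n$ calls takes the cheap path, which is what makes each individual push_back O(1) rather than only O(1) on average.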
2.3 Summary
So the true answer is that vector::push_back does have a worst case of $O(n)$, but over a sequence of $n$ calls that builds a vector from scratch, that cost averages (amortizes) to $O(1)$ per call.
In fact, if you were to look at the required behavior for vector::push_back listed in the C++ language standard, you would find that the required $O(1)$ behavior is, indeed, a requirement for amortized time, not a requirement on the worst-case time.