Using Vectors
Steven Zeil
The std::vector template is a more convenient replacement for dynamically allocated arrays. It provides, in essence, an array that grows as necessary to accommodate the amount of data that we need.
Unlike dynamically allocated arrays, however, it manages its own memory and can be copied, compared, and passed to functions much like an object of any other class.
1 Keeping Information Together
One criticism of typical array-manipulation functions, such as
/**
 * Add a value to an array, keeping all elements in order.
 *
 * @param array an array of strings, with elements 0..size-1 already in order.
 * @param size the number of elements in the array
 * @param capacity the number of slots allocated for the array. If
 *        size >= capacity, there is no room to add new elements and
 *        this function will fail.
 * @param value the string to be inserted into the array
 * @return the position at which the value was inserted
 */
int addInOrder (std::string* array,
                int& size,
                int capacity,
                std::string value)
{
    assert (size < capacity);
    int k = size;
    // Shift larger elements up one slot to open a gap for value
    while (k > 0 && array[k-1] > value)
    {
        array[k] = array[k-1];
        --k;
    }
    array[k] = value;
    ++size;
    return k;
}
is that they separate the array, the size, and the capacity:
- It’s easy for programmers to lose track of which integer counter applies to which array.
- It’s easy to lose track of the difference between the capacity (the number of elements that can fit in the array) and the size (the number of elements in the array that contain useful data).
- It’s just plain messy to pass this information as separate parameters.
Wrapping arrays within structs
One solution: use a struct to gather the related elements together:
/// A collection of items
struct ItemSequence {
    static const int capacity = 500;
    int size;
    Item data[capacity];
};
In fact, that’s pretty much what std::array does, but that doesn’t help when we don’t know the required capacity until the program is already running.
That’s where the vector comes into play.
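As a rough sketch of what the struct buys us, the earlier function can take one ItemSequence instead of three separate parameters. The notes never define Item, so it is assumed here to be std::string, and the size member is given an initial value of 0 for convenience.

```cpp
#include <cassert>
#include <string>

using Item = std::string;   // assumption: Item is not defined in the notes

/// A collection of items
struct ItemSequence {
    static const int capacity = 500;
    int size = 0;            // number of slots currently holding data
    Item data[capacity];
};

// Insert value into seq, keeping data[0..size-1] in ascending order.
void addInOrder (ItemSequence& seq, const Item& value)
{
    assert (seq.size < ItemSequence::capacity);
    int k = seq.size;
    while (k > 0 && seq.data[k-1] > value)
    {
        seq.data[k] = seq.data[k-1];
        --k;
    }
    seq.data[k] = value;
    ++seq.size;
}
```

The array, its size, and its capacity now travel together, so a caller cannot pair a counter with the wrong array. The capacity is still fixed at compile time, though.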
1.1 Vectors
The vector is an array-like structure provided by the standard library header <vector>.
- Think of it as an array that can grow at the high end.
vector is a template, so you have to give the element type to instantiate it when you want to create or pass vector objects:
std::vector<int> vi;               // a vector of 0 ints
std::vector<std::string> vs (10);  // a vector of 10 empty strings
std::vector<float> vf (5, 1.0);    // a vector of 5 floats, all 1.0
Accessing Elements in a Vector
Use the [ ] brackets just as with an array:
vector<int> v(10, 0);
for (int i = 0; i < 10; ++i)
{
    int j;
    cin >> j;
    v[i] = j + 1;
    cout << v[i] << endl;
}
We can also ask a vector for its current size:
void foo (vector<int>& v) {
    for (unsigned i = 0; i < v.size(); ++i)
    {
        int j;
        cin >> j;
        v[i] = j + 1;
        cout << v[i] << endl;
    }
}
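Because the vector carries its own size, a function that processes one needs no separate count parameter. The helper below is a small sketch of that idea; the name largest is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the largest element of v. The vector supplies its own size,
// so no separate count parameter is needed.
int largest (const std::vector<int>& v)
{
    assert (!v.empty());           // precondition: at least one element
    int best = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        if (v[i] > best)
            best = v[i];
    }
    return best;
}
```

Compare this with the array-based addInOrder earlier, where the caller had to pass the array and its size as separate arguments and keep them consistent.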
This brings us to the biggest stylistic difference between working with vectors and arrays:
When we work with arrays, we allocate the maximum space (capacity) that we think we will need, and then keep separate track of how much of that capacity we are currently using (size). Commonly, only a small portion of our arrays actually contain useful data at any given time.
When we work with vectors, we insert exactly as many elements as we actually have. The vector itself expands its capacity as necessary and keeps track of how much of that capacity is actually in use (the size) at any time.
We can ask a vector for its current capacity:
unsigned cap = myVector.capacity();
but that value isn’t fixed the way that it is for arrays. If we try to add enough data that the size() would be greater than the capacity(), the vector will increase its capacity(). In fact, what the capacity() means is simply how large the vector can grow before it will need to grab more memory. We’ll see how this actually happens in a later lesson.
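One way to read that last point: the difference between capacity() and size() is the number of elements we can still add before the vector has to get more memory. The helper name roomLeft below is invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of further push_back calls that fit without the vector
// needing to grab more memory.
std::size_t roomLeft (const std::vector<int>& v)
{
    assert (v.capacity() >= v.size());   // always true for a vector
    return v.capacity() - v.size();
}
```

The actual capacity values depend on the library implementation, so a program should not rely on any particular number, only on capacity() never being smaller than size().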
1.2 Adding elements to a Vector
The square brackets [ ] allow us to look at and assign to already existing elements in the vector:
v[i] = v[j] + 1; // Valid only if i < v.size() && j < v.size()
You can’t add new elements to a vector by simply assigning to them:
vector<int> v; // v.size() == 0
for (int i = 0; i < 100; ++i)
    v[i] = i; // Crash! (If we're lucky.)
Instead, the way we usually add elements to a vector is by pushing them, one at a time, onto the back of the vector:
vector<int> v; // v.size() == 0
for (int i = 0; i < 100; ++i)
    v.push_back(i);
// v.size() == 100, v.capacity() >= 100
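The sketch below puts the two ideas side by side: building a vector with push_back, and checking whether an index is safe to use. The checked accessor at() is a standard vector member that throws std::out_of_range for a bad index instead of misbehaving silently. The function names firstN and canRead are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Build {0, 1, ..., n-1} by pushing onto an initially empty vector.
std::vector<int> firstN (int n)
{
    assert (n >= 0);
    std::vector<int> v;
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    return v;
}

// Return true if v[i] refers to an existing element.
bool canRead (const std::vector<int>& v, std::size_t i)
{
    try
    {
        (void) v.at(i);       // throws std::out_of_range if i >= v.size()
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}
```

In everyday code a plain test of i < v.size() does the same job as canRead; the exception-based version is here only to show what at() does with an index that [ ] would not check.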
It is possible to add something to the middle of a vector
v.insert(pos, 42);
where pos is an iterator, but this operation is slow enough that we should be careful about using it.
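Iterators are covered elsewhere, but for a vector a position can be formed by adding an index to begin(). The wrapper below is a minimal sketch; insertAt is a hypothetical name, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert value so that it ends up at index i, shifting every later
// element up by one position.
void insertAt (std::vector<int>& v, std::size_t i, int value)
{
    assert (i <= v.size());           // i == v.size() appends at the end
    v.insert(v.begin() + i, value);   // iterator first, then the value
}
```

The shifting is what makes this slow: inserting near the front of a large vector moves almost every element.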
The vector equivalent of our earlier array-based addInOrder function would be
/**
 * Add a value to a vector, keeping all elements in order.
 *
 * @param v a vector of strings, with elements 0..size-1 already in order.
 * @param value the string to be inserted into the vector
 * @return the position at which the value was inserted
 */
int addInOrder (std::vector<std::string>& v, std::string value)
{
    int k = v.size();
    v.push_back(value);   // Increase the size by 1, so that we
                          // have room for the new element
    while (k > 0 && v[k-1] > value)
    {
        v[k] = v[k-1];
        --k;
    }
    v[k] = value;
    return k;
}
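For comparison, here is a sketch of a different technique that gives the same result: let the standard algorithm std::upper_bound find the insertion point and let insert do the shifting. This is not the version used in these notes, and the name addInOrderAlt is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Add value to v, keeping all elements in ascending order.
void addInOrderAlt (std::vector<std::string>& v, const std::string& value)
{
    assert (std::is_sorted(v.begin(), v.end()));   // precondition

    // upper_bound finds the first element greater than value, which is
    // where the hand-written loop stops shifting.
    std::vector<std::string>::iterator pos =
        std::upper_bound(v.begin(), v.end(), value);
    v.insert(pos, value);
}
```

The search for the position is faster here (binary rather than linear), but insert still has to shift the later elements, so the overall cost is of the same order as the loop version.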
1.3 Removing Elements from a Vector
We can quickly remove elements from the end of a vector:
v.pop_back();
This operation decreases v.size() by 1.
It is possible to remove something from the middle of a vector
v.erase(pos);
where pos is an iterator, but this operation is slow enough that we don’t want to do this very often.
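As with insertion, the position can be formed by adding an index to begin(). A minimal sketch, with removeAt as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Remove the element at index i, shifting every later element down
// by one position.
void removeAt (std::vector<int>& v, std::size_t i)
{
    assert (i < v.size());      // there must be an element to remove
    v.erase(v.begin() + i);
}
```

Removing the last element this way works, but pop_back() is the cheaper and clearer choice for that case because nothing needs to be shifted.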
2 Example: Computing the Median
The “median average” of a collection of numbers is the middle number when they are arranged in sorted order (or the midpoint between the two middle values if we have an even number of values).
Let’s write a function to read a series of numbers from an input stream and to compute the median average.
- We don’t know, ahead of time, how many numbers will be in the input. We won’t know until we hit the end of the input.
- We will assume that the numbers in the input are sorted (arranged into ascending order).
To illustrate the difference in style between working with arrays and vectors, we’ll write two versions of this function. First, the array-based version:
double median1 (std::istream& input)
{
    const int MaxInput = 1000;
    double numbers[MaxInput];
    unsigned n = 0;
    double x;
    while (input >> x) // read until end of input
    {
        assert (n < MaxInput); // Abort if too much input
        numbers[n] = x;
        ++n;
    }
    assert (n > 0); // A median needs at least one value
    double median = numbers[n/2]; // If n is odd
    if (n % 2 == 0) // If n is even, average the two middle values
        median = (numbers[n/2 - 1] + numbers[n/2]) / 2.0;
    return median;
}
then the vector version:
double median2 (std::istream& input)
{
    vector<double> numbers; // numbers is initially empty
    double x;
    while (input >> x) // read until end of input
    {
        numbers.push_back(x);
    }
    unsigned n = numbers.size();
    assert (n > 0); // A median needs at least one value
    double median = numbers[n/2]; // If n is odd
    if (n % 2 == 0) // If n is even, average the two middle values
        median = (numbers[n/2 - 1] + numbers[n/2]) / 2.0;
    return median;
}
Differences of note:
- We don’t need to ask if we have too much data in the vector version, because we don’t have to guess at a preset maximum when we write the code.
  - Although there is a maximum size for a vector, it’s based upon the largest block of memory the operating system will allow us to allocate to a program. We rarely have to worry about getting that much data. (The one exception would be if we had a vector<T> where T is a data type that is itself quite large. However, we would also face problems allocating large arrays of T in that case.)
- The vector starts at size 0, and we grow it one element at a time via push_back.
- We don’t need to write our own code to track the current size of the data (n); the vector does that for us.
- Once we have the data inserted into the vector, we can afterwards access it just like we would the data in an array.
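Both versions rely on the input already being sorted. As a variation, and only as a sketch, the vector version can drop that assumption by sorting with std::sort once everything has been read. The names medianUnsorted and medianOfText are invented for this example; the second is just a convenience for trying the first on a literal string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read numbers until end of input and return their median.
// Unlike median2, the input need not be sorted.
double medianUnsorted (std::istream& input)
{
    std::vector<double> numbers;
    double x;
    while (input >> x)
        numbers.push_back(x);

    assert (!numbers.empty());                  // no median of zero values
    std::sort(numbers.begin(), numbers.end());  // arrange into ascending order

    std::size_t n = numbers.size();
    if (n % 2 == 1)
        return numbers[n/2];
    return (numbers[n/2 - 1] + numbers[n/2]) / 2.0;
}

// Convenience wrapper: treat a string as the input stream.
double medianOfText (const std::string& text)
{
    std::istringstream in (text);
    return medianUnsorted(in);
}
```

Sorting is possible here precisely because the vector knows how many elements it holds: begin() and end() mark out exactly the data that was read, with no unused slots to skip.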
3 Vectors versus Arrays
Advantages of Vectors
- Can grow as necessary
- Need not worry about pointers, allocation, delete
- Vectors can be copied (v1 = v2;)
- Vectors can be compared (v1 == v2, v1 != v2, v1 < v2, etc.)
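The copying and comparison points can be seen in a few lines. The function below is an illustrative sketch (copyAndCompare is a made-up name): it copies its argument, changes the copy, and uses assert to state what should then be true.

```cpp
#include <cassert>
#include <vector>

// Copy v1, bump the last element of the copy, and return the copy.
std::vector<int> copyAndCompare (const std::vector<int>& v1)
{
    assert (!v1.empty());
    std::vector<int> v2;
    v2 = v1;                 // assignment copies every element
    assert (v1 == v2);       // equal element by element
    v2.back() += 1;          // modify the copy only
    assert (v1 != v2);
    assert (v1 < v2);        // compared like words in a dictionary
    return v2;
}
```

With dynamically allocated arrays, the same assignment would copy only a pointer, and == would compare addresses rather than contents.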
Disadvantages of Vectors
- A bit slower overall than arrays
  - Turning on compiler optimization with -O2 relieves a lot of this.
  - Individual calls to push_back vary considerably in time required. We’ll explore this more later.
- Can waste a lot of storage
  - but so can arrays if we have to guess at the required maximum capacity.
- Harder to work with in a debugger
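The uneven cost of push_back can be observed without a timer. Most calls just fill a spare slot; the expensive ones are those where the capacity changes, because the vector has to obtain a larger block of memory. The sketch below counts those calls; countReallocations is a hypothetical name and the count it returns depends on the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many of n push_back calls changed the capacity,
// i.e. forced the vector to grab more memory.
int countReallocations (int n)
{
    assert (n >= 0);
    std::vector<int> v;
    int reallocations = 0;
    for (int i = 0; i < n; ++i)
    {
        std::size_t before = v.capacity();
        v.push_back(i);
        if (v.capacity() != before)
            ++reallocations;
    }
    return reallocations;
}
```

On typical implementations only a small fraction of the calls trigger a reallocation, which is why the average cost stays low even though individual calls vary.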