Commentary: the std::string Class
Chris Wild and Steven Zeil
One of the most commonly used data types in any programming language is the character string.
In C++, string is NOT a built-in data type, but is part of the standard library.
1 Strings
The std::string type supports:
- Constructors (declarations) for initializing new string variables.
- A size or length function to count how many characters are in the string.
- Various “append” functions to add to the end of the string.
- A + operator to join two strings together.
- The relational operators ==, !=, <, <=, >, and >= for determining if two strings are equal or to check if one string comes before the other in “alphabetical order”.
- Functions for finding characters or shorter strings within another.
- Functions for extracting a selected portion (“substring”) of a string.
- Mechanisms for extracting and replacing individual characters of a string.
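To make the list above concrete, here is a minimal sketch that touches most of these operations. The function names lastFirst, firstWord, and withFirstChar are made up for this illustration; they are not part of the standard library.

```cpp
#include <string>

// Hypothetical helper (not from the standard library): builds "Last, First".
std::string lastFirst(const std::string& first, const std::string& last)
{
    std::string result(last);   // constructor: start with a copy of last
    result.append(", ");        // append adds to the end
    result = result + first;    // + joins two strings
    return result;
}

// Hypothetical helper: returns the text before the first blank,
// or the whole string if it contains no blank.
std::string firstWord(const std::string& text)
{
    std::string::size_type pos = text.find(' ');   // search for a character
    if (pos == std::string::npos)                  // npos means "not found"
        return text;
    return text.substr(0, pos);                    // pos characters starting at index 0
}

// Hypothetical helper: replaces the first character, if there is one.
std::string withFirstChar(std::string text, char c)
{
    if (text.size() > 0)   // size() counts the characters
        text[0] = c;       // [] reads or replaces a single character
    return text;
}
```

For example, lastFirst("Steven", "Zeil") yields "Zeil, Steven", and the comparison string("Steven") < string("Zeil") is true because 'S' comes before 'Z'.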
2 Converting Other Data Types to/from String
2.1 Converting can be a challenge
Although a very common operation, converting to/from strings is not trivial in C++.
The most common approach in C++ is to use I/O operations to read from and write to “string streams”. Remember that part of the C++ model is that we can read from (>>) and write to (<<) streams, but different kinds of streams may connect to different kinds of devices. One such “device” can be a variable holding a string, e.g.:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
⋮
string dataIn = "42 3.14159";
istringstream in (dataIn); // in actually reads from the string dataIn
int i;
double d;
in >> i >> d; // i will be 42, and d will be 3.14159
or, when converting to a string:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
⋮
string dataOut;
int i = 42;
double d = 3.14159;
ostringstream out; // out actually writes to a string
out << i << ' ' << d;
dataOut = out.str(); // Get the string that has been written into
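One thing the fragments above do not show is what happens when the text does not actually hold a number. A stream remembers whether its last extraction worked, so a conversion can be checked. The following is only a sketch: toInt and toText are hypothetical helper names chosen for this example, not library functions.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tries to read an int from the start of text.
// Returns true and stores the value in result on success;
// returns false and leaves result alone otherwise.
bool toInt(const std::string& text, int& result)
{
    std::istringstream in(text);
    int value;
    if (in >> value) {   // the stream tests as false if the extraction failed
        result = value;
        return true;
    }
    return false;
}

// Hypothetical helper: writes two values, separated by a blank, into one string.
std::string toText(int i, double d)
{
    std::ostringstream out;
    out << i << ' ' << d;
    return out.str();
}
```

With these, toInt("42", n) returns true and sets n to 42, while toInt("forty-two", n) returns false. Note that toInt only looks at the start of the text, so it would also accept "42 3.14159".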
2.2 String Literals are not Strings
Sometimes programming languages (like some people) do not show their age gracefully.
C++ is the direct descendant of the C programming language. In fact, the very name “C++” is considered to be a bit of a joke: the ++ operator takes us “one step beyond” the value it is applied to. So C++ is just “one step beyond” C.
C dates back to the early 1970s. One of the goals of C++ was to maintain as much backwards compatibility as possible with C: most old code written in C should still compile and run using a C++ compiler.
One of the places where this shows up is in string handling. C did not have a data type named “string”. Instead, C used arrays of characters. When used to store character strings, the convention was to indicate the end of a string by inserting a final character containing the ASCII character code 0, known as NUL. So strings in C are often described as null-terminated arrays. (“NUL” and “null” aren’t actually identical, but tradition has conflated them.)
The one place where you may notice this is that string literals (the string “constants” we write in our code) do not have data type std::string. They actually are considered to be of type “const char*”. In the next module we will see that this means “an array of characters in which we are prohibited from changing the individual characters”.
const char* myName = "Steven Zeil";
You might wonder where the NUL terminator is in the above value. The answer is that you can’t see it. NUL was designed to be an invisible, non-printing value in the ASCII character set. So the array containing "Steven Zeil" will actually have 12 characters, even though you only see 11. The final character is a NUL, automatically inserted by the compiler.
The data type “const char*”, together with the convention of null termination, is generally referred to as “character arrays” or C strings.
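The hidden terminator can be observed directly. The sketch below uses a hypothetical function, visibleLength, that walks a character array until it reaches the NUL; this is essentially what the standard C function strlen does.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts the characters of a C string
// by stepping through the array until the NUL terminator is reached.
std::size_t visibleLength(const char* text)
{
    std::size_t count = 0;
    while (text[count] != '\0')   // '\0' is the NUL character, code 0
        ++count;
    return count;
}
```

Here visibleLength("Steven Zeil") and std::strlen("Steven Zeil") both give 11, while sizeof("Steven Zeil") gives 12 because the array that holds the literal includes the terminator.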
Converting from C strings to std::string is easy:
string myNameAsAString = myName;
string myNameAsAString2 = "Steven Zeil";
These lines rely on the fact that the C++ string type is declared in a way that automatically converts const char* values to string values.
Every now and then, however, you run across a bit of code (possibly old code) that takes parameters of type const char*, and you are forced to remember that string literals are not the same thing as strings:
void oldCode (const char* fileName);
⋮
oldCode("foo.txt"); // OK
string aFileName = "bar.txt";
oldCode (aFileName); // compilation error
That last statement gets a compilation error because aFileName has type string, but oldCode is expecting a character array. The first call to oldCode worked because the string literal “foo.txt” really is a character array.
For those odd occasions when you really need a character array/C string, the std::string class provides a function to do the conversion:
void oldCode (const char* fileName);
⋮
oldCode("foo.txt"); // OK
string aFileName = "bar.txt";
oldCode (aFileName.c_str()); // also OK
3 String I/O
There are three approaches used for reading strings.
3.1 Read characters until a whitespace character is found
- Whitespace characters are: blank, tab and new line.
- Use the regular istream extraction operator (‘>>’) for this purpose.
- This technique is illustrated in the earlier program, where the strings firstName and lastName are read.
- Initial whitespace characters are ignored. (Try running the above program and adding extra spaces between your names.)
- One problem with this approach is that you cannot read a string with blank characters in it.
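A string stream can stand in for the keyboard to show this behavior without any typing. In the sketch below, readWords is a hypothetical helper written for this illustration; it works on any input stream, including cin.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every whitespace-separated word
// that >> can extract from the given input stream.
std::vector<std::string> readWords(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)          // skips leading whitespace, stops at the next whitespace
        words.push_back(word);
    return words;
}
```

Given a stream holding "  Steven \t  Zeil\n", readWords returns exactly two words, "Steven" and "Zeil". The extra blanks and the tab are skipped, which also means no single word can ever contain a blank.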
3.2 Read until the end of line.
Here is a program for reading names:
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string firstName, lastName, fullName;
    string greeting("Hello ");
    cout << "What is your name? ";
    getline (cin, fullName);
    // Trim leading and trailing blanks from fullName
    string::size_type start = fullName.find_first_not_of (" ");
    if (start == string::npos)
        fullName = "";   // nothing but blanks was entered
    else
    {
        string::size_type stop = fullName.find_last_not_of (" ");
        fullName = fullName.substr (start, stop - start + 1);
    }
    // Split fullName into parts
    string::size_type blankPosition = fullName.find(' ');
    if (blankPosition != string::npos)
    {
        firstName = fullName.substr (0, blankPosition);
        // A non-blank must follow because trailing blanks were trimmed above
        string::size_type charPosition = fullName.find_first_not_of (" ", blankPosition);
        lastName = fullName.substr (charPosition);
    }
    else
        lastName = fullName;
    greeting.append(lastName);
    greeting.append(", " + firstName);
    string banner(greeting.length() + 4,'$'); // construct string with bunch of '$'s
    cout << banner << endl;
    cout << "$ " << greeting + " $" << endl;
    cout << banner << endl << endl;
    if(firstName < lastName)
        cout << "your first name is alphabetically before your last\n";
    else
        cout << "your first name is alphabetically after your last\n";
    return 0;
}
- We use the function getline for this purpose. getline will put all characters into the string including blanks and tabs (but not newline characters).
- But, because we have acquired the entire name in a single string value, we now need to use string functions to split the name into the desired pieces.
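The splitting step does not have to be done with find and substr. As an alternative sketch (a different technique from the program above, not a replacement for it), the line obtained from getline can be handed to a string stream and taken apart with >>. The function name splitName is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: splits one line of input into a first word and
// everything that follows it. Returns false if the line holds no word at all.
bool splitName(const std::string& line, std::string& first, std::string& last)
{
    first.clear();
    last.clear();
    std::istringstream in(line);
    if (!(in >> first))                  // >> skips any leading whitespace for us
        return false;
    std::getline(in >> std::ws, last);   // std::ws discards the whitespace in between
    return true;                         // last stays empty if nothing followed
}
```

For the line "   Steven    Zeil" this gives first "Steven" and last "Zeil". One design consequence to be aware of: any trailing blanks on the line would remain at the end of last, because getline keeps everything up to the end of the line.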
3.3 Read until a special character is found.
- The special character is passed as the optional third parameter of the getline function.
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string firstName, lastName;
    string greeting("Hello ");
    cout << "What is your name (first last separated by a space)? ";
    getline (cin, firstName, ' ');   // read up to the first blank, which is discarded
    getline (cin, lastName);         // read the rest of the line
    greeting.append(lastName);
    greeting.append(", " + firstName);
    string banner(greeting.length() + 4,'$'); // construct string with bunch of '$'s
    cout << banner << endl;
    cout << "$ " << greeting + " $" << endl;
    cout << banner << endl << endl;
    if(firstName < lastName)
        cout << "your first name is alphabetically before your last\n";
    else
        cout << "your first name is alphabetically after your last\n";
    return 0;
}
- This code is simpler, but also more “fragile” than the version before. If the person actually types blanks before their first name, we are in trouble. If the person types multiple blanks between the two parts of their name, we are likewise in trouble.
- As a general rule, we have more control if we read an entire line and then use the string functions to extract what we want than we can get by relying on special characters to actually stop the input partway through a line.
- So which method of inputting strings is the best to use?
- The answer depends on how much you know about your input. Use >> when you know you want to skip leading whitespace and that you won’t have whitespace inside the value you want to read.
- Use getline when you want an entire line of data.
- If you want a partial line of data that stops at something other than whitespace, use the 3-parameter form of getline.
Remember always that >> skips over leading whitespace and stops before (but does not consume) trailing whitespace. getline preserves leading whitespace and consumes (discards) its stopping character.
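These two rules matter most when >> and getline are mixed on the same stream. After >> reads a number, the newline that ended that line is still waiting, so an immediately following getline would return an empty string. One common remedy, sketched below, is to discard the leftover whitespace with the standard manipulator std::ws before calling getline. The function name readAgeAndName is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a number with >> and then the next line
// of text with getline, from any input stream.
bool readAgeAndName(std::istream& in, int& age, std::string& name)
{
    if (!(in >> age))   // >> stops before the newline that follows the number
        return false;
    in >> std::ws;      // discard that newline and any other leading whitespace
    return static_cast<bool>(std::getline(in, name));
}
```

With the input "42\nSteven Zeil\n", this reads 42 and then "Steven Zeil". The trade-off is that std::ws also throws away any leading blanks on the name line, so this sketch is only suitable when those blanks are not wanted.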