The Structure of a C++ Program
Steven Zeil
1 Separate Compilation
C++ programs can range from a handful of statements to hundreds of thousands, and may be written by one person or by a team.
Beginning C++ programmers usually work with programs in which all of the source code is written into a single file.
Putting your entire program into a single file is OK for small programs. But with large programs:
- Compilation would take minutes, hours, maybe days.
  - It might even break the compiler.
- Team members would interfere with one another’s work.
  “Are you still editing that file? You’ve had it all afternoon.”
  or, even worse,
  “What do you mean you’re saving changes to the file? I’ve been editing it for the last 45 minutes!”
By splitting a program up into multiple files that can be compiled separately:
- Team members can work in parallel on separate files.
- Files are compiled separately.
  - Each individual compilation is fast.
- Separately compiled code is linked to produce the executable.
  - Linking is much faster than compilation.
1.1 The Files of a C++ Program
A typical C++ program is divided into many source code files
- Some are headers
  - Typically end in “.h”
  - May be #included from many different places
  - May #include other headers
  - Not directly compiled
- Some are compilation units
  - Typically end in “.cpp”, “.cc”, or “.C”
  - Should never be #included from elsewhere
  - May #include headers
  - Are directly compiled
1.2 How C++ Code is Compiled
- Each file of source code (programming language text) is compiled to produce a file of object code.
- All object code files are linked to produce the executable.
Object code is
- binary code, almost executable
- but the exact addresses of variables and functions are not yet known (because they may be set in other .cpp files that haven’t been compiled yet), and are represented by symbols instead.
Linking mainly consists of replacing those symbols in object code by real addresses. So linking can only be done after all the source code has been compiled into object code.
Linking is much simpler and therefore much faster than the earlier compiling steps.
So, think about how this works with large, complicated projects.
On large projects with hundreds or thousands of files,
- Typically only a few files are changed on any one day
- Often only the changed files need to be recompiled
- Then link the changed and unchanged object code to produce the executable
So separating the linking step allows these large programs to be rebuilt quickly, because not every line of code needs to be put through the entire compilation process.
1.3 Pre-processing
The # Preprocessor
The preprocessor runs before the compiler proper.
The preprocessor:
- modifies the source code
- processes preprocessor instructions
  - lines beginning with #
- strips out comments
The common pre-processor instructions are
- #include
  - insert a file
- #define
  - define a macro
- #ifdef, #ifndef, #endif
  - check to see if a macro has been defined
1.3.1 #include
- Inserts a file or header into the current source code
- Two versions
  - #include <headerName>
    - inserts a system header file from a location defined when the compiler was installed
  - #include "fileName"
    - inserts a file, normally searched for first in the directory of the file that contains the #include
Example: #include (simple case)
- Suppose we have three files:
  A.h
      // This is file A.h
      code from A.h
  B.h
      //This is file B.h
      #include "A.h"
      code from B.h
      more code from B.h
  C.cpp
      //This is file C.cpp
      #include "A.h"
      #include "B.h"
      code from C.cpp
- We ask the compiler to only run the preprocessor and save the result:
      g++ -E C.cpp > C.i
- The result is file C.i
  C.i
      # 1 "C.cpp"
      # 1 "<built-in>"
      # 1 "<command line>"
      # 1 "C.cpp"
      # 1 "A.h" 1
      code from A.h
      # 3 "C.cpp" 2
      # 1 "B.h" 1
      # 1 "A.h" 1
      code from A.h
      # 3 "B.h" 2
      code from B.h
      more code from B.h
      # 4 "C.cpp" 2
      code from C.cpp
- Note the presence of content from all three files
  - includes markers telling where the content came from
A more realistic example
In real programs, most of the code actually seen by the compiler may come from #includes.
- From this source code:
#include <iostream>
using namespace std;
int main() {
cout << "Hello World" << endl;
return 0;
}
- the compiler sees this.
Deja-Vu
- Code that is in headers (.h files) may actually be compiled many times
- Code that is in compilation unit (.cpp) files will be compiled only once
This distinction will be important later.
1.4 Other Pre-processing Commands
#define
- Used to define macros (symbols that the preprocessor will later substitute for)
A common use of #define is to provide special system-specific constants, e.g.,
#include <iostream>
using namespace std;
#define VersionNumber "1.0Beta1"
int main() {
  cout << "Running version "
       << VersionNumber
       << endl;
  return 0;
}
Much more elaborate macros are possible, including ones with parameters.
#ifdef, #ifndef, #endif
Used to select code based upon whether a macro has been defined:
#ifdef __GNUG__
/* Compiler is gcc/g++ */
#endif
#ifdef _MSC_VER
/* Compiler is Microsoft Visual C++ */
#endif
#if, #define, and #include
- Used together, these directives reduce the amount of code seen by the actual compiler
- Suppose we have three files:
  A2.h
      #ifndef A2_H
      #define A2_H
      // This is file A2.h
      code from A2.h
      #endif
  B2.h
      #ifndef B2_H
      #define B2_H
      //This is file B2.h
      #include "A2.h"
      code from B2.h
      more code from B2.h
      #endif
  C2.cpp
      //This is file C2.cpp
      #include "A2.h"
      #include "B2.h"
      code from C2.cpp
- We ask the compiler to only run the preprocessor and save the result:
      g++ -E C2.cpp > C2.i
  The result is file C2.i.
- Note that the code from A2.h is included only once
- Imagine now, how much we would have saved if that were iostream instead of A2.h
2 Declarations and Definitions
Some of the most common error messages you will encounter as a C++ programmer are
- … is undeclared
- … is undefined
- … is defined multiple times
Fixing these requires that you understand the difference between declarations and definitions, and how they relate to the program structure.
Warning: Textbooks & C++ websites are often sloppy about this terminology.
(Error messages from compilers, on the other hand, tend to be very precise about it, and assume that you understand the difference.)
2.1 Declarations
A declaration in C++
- introduces (or repeats) a name for something
- tells what “kind” of thing it is
- gives programmers enough information to use it
2.2 Definitions
A definition in C++
- introduces (or repeats) a name for something
- tells what “kind” of thing it is
- tells what value it has and/or how it works
- gives the compiler enough information to generate this and assign it an address
General rules for declarations & definitions:
- All definitions are also declarations.
- But not vice versa
- A name must be declared before you can write any code that uses it.
- A name can be declared any number of times, as long as the declarations are identical.
- A name must be defined exactly once, somewhere within all the separately compiled files making up a program.
2.3 Decls&Defs: Variables
- These are definitions of variables:
      int x;
      string s = "abc";
      MyFavoriteDataType mfdt (0);
- These are declarations:
      extern int x;
      extern string s;
      extern MyFavoriteDataType mfdt;
2.4 Decls&Defs: Functions
- Declaration:
      int myFunction (int x, int y);
- Definition:
      int myFunction (int x, int y) { return x + y; }
- The declaration provides only the header. The definition adds the body.
2.5 Decls&Defs: Data Types
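- Data types in C++ are declared, but never defined in the sense used here: a type declaration generates no code and is assigned no address. (Compilers nonetheless describe a struct body as a “definition”, so repeating the same struct within one compilation is reported as a redefinition.)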
- These are declarations:
      typedef float Weight;
      typedef string* StringPointer;
      enum Colors {red, blue, green};
      struct Money {
        int dollars;
        int cents;
      };
3 Dividing code into modules
Most of the source code files that make up a program are divided into modules consisting of a header file and a closely related compilation unit.
How do we divide up the declarations and definitions that make up a program into modules?
I usually focus on the headers first, then the compilation units. But in practice this is an iterative process, in which I make an initial division of the headers, look at what that does to the compilation units, modify the headers, look at how that changes the compilation units, etc.
Dividing up the headers:
- Headers can only contain declarations – no definitions.
- Headers are for sharing – if a declaration is only used in one module, it doesn’t need to be in a header. It can be hidden inside the compilation unit, thus reducing some potential coupling.
- Start by identifying declarations of types, variables & constants, and functions that relate to a common “theme”.
  Often the theme is “here’s a structured data type and all of the functions that manipulate it”.
- Create a module for each such “theme”. Begin by collecting the declarations related to that theme into a single header file.
- Pair up compilation units with the header files
  - Usually these will have matching names. If we had decided to have a “time” module with a header named time.h, I would add a compilation unit time.cpp.
  - The compilation unit provides definitions for each declaration in the header that it is paired with.
- Now look for things that can be improved, e.g.,
  - Are there declarations in a header that are only used from within its own compilation unit? Hide them within the compilation unit.
  - Does a compilation unit try to use a symbol from another module that isn’t in a header? Move the symbol declaration into the header.
  - Do you have circular dependencies where header file 1 #includes header file 2 that #includes header file 1? Try to simplify by moving things around or maybe dividing one of those modules into two pieces, only one of which is needed by the other.
Dividing a program into modules is something of an art, and takes practice. Experience in reading other people’s code and seeing how they divided things up is useful.
In CS250, you will be introduced to the idea of an “abstract data type”, which is a powerful organizing principle for achieving well-designed modules.