The Structure of a C++ Program

If a large program were kept in a single source file:

Compilation would take minutes, hours, maybe days
- might even break the compiler
Team members would interfere with one another’s work.

“Are you still editing that file? You’ve had it all afternoon.”

or, even worse,

“What do you mean you’re saving changes to the file? I’ve been editing it for the last 45 minutes!”

By splitting a program up into multiple files that can be compiled separately,

Team members can work in parallel on separate files
Files are compiled separately
- each individual compilation is fast
Separately compiled code is linked to produce the executable
- linking is much faster than compilation

1.1 The Files of a C++ Program

A typical C++ program is divided into many source code files

Some are headers
- Typically end in “.h”
- May be #included from many different places
- May #include other headers
- Not directly compiled
Some are compilation units
- Typically end in “.cpp”, “.cc”, or “.C”
- Should never be #included from elsewhere
- May #include headers
- Are directly compiled
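
As a rough sketch of this division, the listing below shows what a header and its compilation unit might hold. The module name greeting and the function makeGreeting are hypothetical, chosen only for illustration. Both files are shown in one listing, with the #include that would normally join them left as a comment.

```cpp
// ---- greeting.h (hypothetical header: may be #included from many places) ----
#include <string>   // a header may #include other headers

// Declaration only: enough for other files to call the function.
std::string makeGreeting(const std::string& name);

// ---- greeting.cpp (hypothetical compilation unit: compiled directly) ----
// #include "greeting.h"   // in a real project this line replaces the text above

// The body of the function lives in the compilation unit.
std::string makeGreeting(const std::string& name) {
    return "Hello, " + name + "!";
}
```

Any other compilation unit that wants to call makeGreeting would #include "greeting.h" and never #include greeting.cpp.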

1.2 How C++ Code is Compiled

Each file of source code (programming language text)
is compiled to produce a file of object code.
All object code files are linked to produce the executable

Object Code is

binary code, almost executable
but the exact addresses of variables and functions are not yet known (because they may be set in other .cpp files that haven’t been compiled yet), so they are represented by symbols instead.

Linking mainly consists of replacing those symbols in object code by real addresses. So linking can only be done after all the source code has been compiled into object code.

Linking is much simpler and therefore much faster than the earlier compiling steps.

So, think about how this works with large, complicated projects.

On large projects with hundreds or thousands of files,

Typically only a few files are changed on any one day
Often only the changed files need to be recompiled
Then link the changed and unchanged object code to produce the executable

So separating the linking step allows these large programs to be rebuilt quickly, because not every line of code needs to be put through the entire compilation process.
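
A small sketch of what the compiler and the linker each see. The file names report.cpp and area.cpp, the program name app, and the functions rectangleArea and reportArea are assumptions made for illustration. Both halves appear in one listing so that the example stays self-contained.

```cpp
// ---- would live in report.cpp ----
// While compiling report.cpp, the compiler sees only this declaration.
// It cannot know the function's address, so the object code for
// reportArea refers to rectangleArea by symbol.
int rectangleArea(int width, int height);

int reportArea() {
    return rectangleArea(3, 4);   // symbol reference, resolved by the linker
}

// ---- would live in area.cpp ----
// Compiled separately; the linker later replaces the symbol with the
// real address of this function.
int rectangleArea(int width, int height) {
    return width * height;
}

// With the code split into those two files, a typical g++ build would be:
//   g++ -c report.cpp              (produces report.o)
//   g++ -c area.cpp                (produces area.o)
//   g++ report.o area.o -o app     (links the object code; app also
//                                   needs a main function somewhere)
// After editing only area.cpp, only the last two commands need to rerun.
```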

1.3 Pre-processing

The # Preprocessor

The preprocessor runs before the compiler proper.

The preprocessor:

modifies the source code
processes preprocessor instructions
- lines beginning with #
strips out comments

The common pre-processor instructions are

#include
- insert a file
#define
- define a macro
#ifdef, #ifndef, #endif
- check to see if a macro has been defined

1.3.1 #include

Inserts a file or header into the current source code
Two versions
- #include <headerName>
  - inserts a system header file from a location defined when the compiler was installed
- #include "fileName"
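  - inserts a file, looking first in the directory of the file that contains the #include (most compilers then fall back to the system locations)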

Example: #include (simple case)

Suppose we have three files:

A.h
// This is file A.h
code from A.h

B.h
//This is file B.h
#include "A.h"
code from B.h
more code from B.h

C.cpp
//This is file C.cpp
#include "A.h"
#include "B.h"
code from C.cpp

We ask the compiler to only run the preprocessor and save the result:

g++ -E C.cpp > C.i

The result is file C.i

C.i.listing

# 1 "C.cpp"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "C.cpp"

# 1 "A.h" 1

code from A.h
# 3 "C.cpp" 2
# 1 "B.h" 1

# 1 "A.h" 1

code from A.h
# 3 "B.h" 2
code from B.h
more code from B.h
# 4 "C.cpp" 2
code from C.cpp

Note the presence of content from all three files
- includes markers telling where the content came from

A more realistic example

In real programs, most of the code actually seen by the compiler may come from #includes

From this source code:

#include <iostream>

using namespace std;

int main() {
  cout << "Hello World" << endl;
  return 0;
}

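the compiler sees the full preprocessed text: typically many thousands of lines, nearly all of them from <iostream> and the headers that it in turn #includes.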

Deja-Vu

Code that is in headers (.h files) may actually be compiled many times
Code that is in compilation unit (.cpp) files will be compiled only once

This distinction will be important later.

1.4 Other Pre-processing Commands

#define

Used to define macros (symbols that the preprocessor will later substitute for)

A common example of #define is to provide special system-specific constants, e.g.,

#include <iostream>
using namespace std;

#define VersionNumber "1.0Beta1"

int main() {
   cout << "Running version "
       << VersionNumber
       << endl;
   return 0;
}

Much more elaborate macros are possible, including ones with parameters.
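
As one illustration of a macro with a parameter, the sketch below defines a hypothetical SQUARE macro (it is not part of the original notes). Because the preprocessor substitutes text rather than values, the parentheses around the parameter matter.

```cpp
// Hypothetical macro with one parameter. The preprocessor replaces
// SQUARE(e) with ((e) * (e)) before the compiler proper runs.
#define SQUARE(x) ((x) * (x))

// Without the inner parentheses, SQUARE(a + b) would expand to
// a + b * a + b, which is not the square of the sum.
int squareOfSum(int a, int b) {
    return SQUARE(a + b);   // expands to ((a + b) * (a + b))
}
```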

#ifdef, #ifndef, #endif

Used to select code based upon whether a macro has been defined:

#ifdef __GNUG__
  /* Compiler is gcc/g++ */
#endif
#ifdef _MSC_VER
  /* Compiler is Microsoft Visual C++ */
#endif
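
A minimal sketch of how such a selection might be used in practice. The function compilerName is hypothetical. Note that some other compilers also define __GNUG__ for compatibility with g++, so the first label is deliberately loose.

```cpp
#include <string>

// Returns a label for the compiler. The choice is made by the
// preprocessor: only one return statement reaches the compiler proper.
std::string compilerName() {
#ifdef __GNUG__
    return "gcc/g++ or a compatible compiler";
#else
#ifdef _MSC_VER
    return "Microsoft Visual C++";
#else
    return "unknown compiler";
#endif
#endif
}
```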

#if, #define, and #include

These directives are often combined to reduce the amount of code seen by the actual compiler

Suppose we have three files:

A2.h
#ifndef A2_H
#define A2_H
// This is file A2.h
code from A2.h
#endif

B2.h
#ifndef B2_H
#define B2_H
//This is file B2.h
#include "A2.h"
code from B2.h
more code from B2.h
#endif

C2.cpp
//This is file C2.cpp
#include "A2.h"
#include "B2.h"
code from C2.cpp

We ask the compiler to only run the preprocessor and save the result:

g++ -E C2.cpp > C2.i

The result is file C2.i.

Note that the code from A2.h is included only once
Imagine now, how much we would have saved if that were iostream instead of A2.h
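
The effect of such a guard can be seen in a single file by pasting the same guarded header text twice, which is what the preprocessor does when a header is reached by two routes. The header name point.h, the macro POINT_H, and the struct Point are hypothetical.

```cpp
// First inclusion of the hypothetical point.h: POINT_H is not yet
// defined, so the preprocessor keeps everything up to #endif.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Second inclusion of the same text: POINT_H is now defined, so the
// preprocessor discards the body and the compiler never sees a
// second copy of Point.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

int sumOfCoordinates(const Point& p) {
    return p.x + p.y;
}
```

If the #ifndef, #define, and #endif lines were removed, the compiler would see struct Point twice and reject the second copy.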

2 Declarations and Definitions

Some of the most common error messages you will encounter as a C++ programmer are

… is undeclared
… is undefined
… is defined multiple times

Fixing these requires that you understand the difference between declarations and definitions, and how they relate to the program structure.

Warning: Textbooks & C++ websites are often sloppy about this terminology.

(Error messages from compilers, on the other hand, tend to be very precise about it, and assume that you understand the difference.)

2.1 Declarations

A declaration in C++

introduces (or repeats) a name for something
tells what “kind” of thing it is
gives programmers enough information to use it

2.2 Definitions

A definition in C++

introduces (or repeats) a name for something
tells what “kind” of thing it is
tells what value it has and/or how it works
gives the compiler enough information to generate this and assign it an address

General rules for declarations & definitions:

All definitions are also declarations.

But not vice versa

A name must be declared before you can write any code that uses it.

A name can be declared any number of times, as long as the declarations are identical.

A name must be defined exactly once, somewhere within all the separately compiled files making up a program.

2.3 Decls&Defs: Variables

These are definitions of variables:

int x;
string s = "abc";
MyFavoriteDataType mfdt (0);

These are declarations:

extern int x;
extern string s;
extern MyFavoriteDataType mfdt;
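
The sketch below puts the rules together for variables. The names requestCount, serverName, and recordRequest are hypothetical. In a real project the extern lines would sit in a header and the definitions in exactly one .cpp file.

```cpp
#include <string>

// Declarations: may be repeated, and would normally live in a header.
extern int requestCount;
extern int requestCount;          // repeating an identical declaration is fine
extern std::string serverName;

// Code can use the names as soon as they are declared ...
int recordRequest() {
    requestCount = requestCount + 1;
    return requestCount;
}

// ... but exactly one definition must exist somewhere in the program.
// These would normally live in a single .cpp file.
int requestCount = 0;
std::string serverName = "demo";
```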

2.4 Decls&Defs: Functions

Declaration:
```
int myFunction (int x, int y);
```

Definition:

int myFunction (int x, int y)
{
   return x + y;
}

The declaration provides only the header. The definition adds the body.
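
A short sketch showing that a caller needs only the declaration in order to compile. The function callTwice is hypothetical; myFunction is the one from the text above.

```cpp
// Declarations: the header only. Repeating an identical one is harmless.
int myFunction(int x, int y);
int myFunction(int x, int y);

// This compiles even though the body of myFunction has not been seen yet.
int callTwice(int a) {
    return myFunction(a, a) + myFunction(a, 1);
}

// Definition: header plus body, appearing exactly once in the program.
int myFunction(int x, int y) {
    return x + y;
}
```

If the definition were removed, each file would still compile, but linking would fail with an "undefined" error. If a second definition were added, the "defined multiple times" error would appear instead.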

2.5 Decls&Defs: Data Types

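In the terminology of these notes, data types in C++ are declared, but never defined: a type declaration generates no object code and is assigned no address. (The C++ standard does call a struct or enum that has a body a “definition”, but it is one that may be repeated in every compilation unit that needs it.)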

These are declarations:

typedef float Weight;
typedef string* StringPointer;
enum Colors {red, blue, green};
struct Money {
   int dollars;
   int cents;
};
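
A brief sketch of these type declarations in use. The functions totalCents and heavier are hypothetical additions; the type declarations are the ones listed above.

```cpp
#include <string>
using std::string;

// Type declarations from the notes. None of these creates a variable
// or produces object code on its own.
typedef float Weight;
typedef string* StringPointer;
enum Colors {red, blue, green};
struct Money {
   int dollars;
   int cents;
};

// Functions that use the declared types.
int totalCents(Money m) {
    return m.dollars * 100 + m.cents;
}

Weight heavier(Weight a, Weight b) {
    return (a > b) ? a : b;
}
```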

3 Dividing code into modules

Most of the source code files that make up a program are divided into modules consisting of a header file and a closely related compilation unit.

How do we divide up the declarations and definitions that make up a program into modules?

I usually focus on the headers first, then the compilation units. But in practice this is an iterative process, in which I make an initial division of the headers, look at what that does to the compilation units, modify the headers, look at how that changes the compilation units, etc.

Dividing up the headers:

Headers can only contain declarations – no definitions.
Headers are for sharing – if a declaration is only used in one module, it doesn’t need to be in a header. It can be hidden inside the compilation unit, thus reducing some potential coupling.

Start by identifying declarations of types, variables & constants, and functions that relate to a common “theme”.

Often the theme is “here’s a structured data type and all of the functions that manipulate it”.
Create a module for each such “theme”. Begin by collecting the declarations related to that theme into a single header file.
Pair up compilation units with the header files
- Usually these will have matching names. If we had decided to have a “time” module with a header named time.h, I would add a compilation unit time.cpp.
- The compilation unit provides definitions for each declaration in the header that it is paired with.
Now look for things that can be improved, e.g.,
- Are there declarations in a header that are only used from within its own compilation unit? Hide them within the compilation unit.
- Does a compilation unit try to use a symbol from another module that isn’t in a header? Move the symbol declaration into the header.
- Do you have circular dependencies where header file 1 #includes header file 2 that #includes header file 1? Try to simplify by moving things around, or maybe by dividing one of those modules into two pieces, only one of which is needed by the other.
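
As a sketch of the “time” module mentioned above, the listing below shows one way the header and compilation unit might divide the work. The struct Time, the function addMinutes, and the constant minutesPerDay are all hypothetical, and addMinutes assumes extraMinutes is not negative. The unnamed namespace is one common way of hiding a name inside a compilation unit.

```cpp
// ---- time.h (shared declarations) ----
struct Time {
    int hours;
    int minutes;
};

Time addMinutes(Time t, int extraMinutes);

// ---- time.cpp (definitions, plus details no other module needs) ----
// #include "time.h"   // in a real project this line replaces the text above

namespace {
    // Used only inside time.cpp, so it is not declared in time.h.
    // The unnamed namespace keeps the name invisible to other files.
    const int minutesPerDay = 24 * 60;
}

// Assumes extraMinutes >= 0; wraps around past midnight.
Time addMinutes(Time t, int extraMinutes) {
    int total = (t.hours * 60 + t.minutes + extraMinutes) % minutesPerDay;
    Time result = {total / 60, total % 60};
    return result;
}
```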

Dividing a program into modules is something of an art, and takes practice. Experience in reading other people’s code and seeing how they divided things up is useful.

In CS250, you will be introduced to the idea of an “abstract data type”, which is a powerful organizing principle for achieving well-designed modules.