Phases of translation

The C++ source file is processed by the compiler as if the following phases take place, in this exact order:

Phase 2

1) Whenever a backslash appears at the end of a line (immediately followed by the newline character), both the backslash and the newline are deleted, combining two physical source lines into one logical source line. This is a single-pass operation: a line ending in two backslashes followed by an empty line does not combine three lines into one. If a universal character name (\uXXXX) is formed in this phase, the behavior is undefined.

2) If a non-empty source file does not end with a newline character after this step (whether it had no newline originally, or it ended with a backslash), the behavior is undefined (until C++11); a terminating newline character is added (since C++11).
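The line splicing described above can be observed directly in source code. The following is a minimal sketch, assuming a C++11 compiler (for static_assert); the names spliced_value, joined_text and SUM_OF_THREE are hypothetical and chosen only for illustration. Because splicing happens before the source is split into tokens, a backslash-newline can even sit in the middle of an identifier or a string literal.

```cpp
// The backslash-newline pair is deleted before tokenization,
// so the two physical lines below form the single identifier spliced_value.
int spli\
ced_value = 42;

// The same happens inside a string literal: the result is "abcdef".
const char joined_text[] = "abc\
def";
static_assert(sizeof(joined_text) == 7, "six characters plus the terminating null");

// The most common use: a macro definition spread over several physical lines.
#define SUM_OF_THREE(a, b, c) \
    ((a) +                    \
     (b) + (c))
```

Note that the continuation lines of the identifier and of the string literal start in the first column; any leading whitespace there would become part of the spliced logical line.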

Phase 3

1) The source file is decomposed into comments, sequences of whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), and preprocessing tokens, which are the following:

a) header names: <iostream> or "myfile.h"

b) identifiers

c) numbers

d) character and string literals, including user-defined

e) operators and punctuators (including alternative tokens), such as +, <<=, new, <%, and ##, as well as word-like alternative tokens such as "or" and "and"

f) individual non-whitespace characters that do not fit in any other category

2) Each comment is replaced by one space character.

3) Newlines are kept, and it is unspecified whether each sequence of non-newline whitespace characters is retained or collapsed into a single space character.
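Two of the points above are easy to see in a small example: a comment acts as a token separator because it is replaced by a space, and alternative tokens are ordinary preprocessing tokens. This is only an illustrative sketch; the names answer and in_range are hypothetical, and some compilers may need a conforming mode enabled to accept the word-like alternative tokens.

```cpp
// The comment is replaced by one space, so this reads "int answer",
// not the single identifier "intanswer".
int/* separator */answer = 42;

// Alternative tokens: <% and %> stand for { and },
// "and" and "or" stand for && and ||.
bool in_range(int x) <%
    return (x > 0 and x < 100) or x == -1;
%>
```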

Phase 4

1) The preprocessor is executed.

2) Each file introduced with the #include directive goes through phases 1 through 4, recursively.

3) At the end of this phase, all preprocessor directives are removed from the source.
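As a rough illustration of what this phase does, the sketch below uses an object-like macro, a function-like macro and conditional inclusion. The names BUFFER_SIZE, SQUARE, buffer_size and squared are hypothetical. After phase 4, none of the directives remain; later phases see only the selected declaration and the expanded macro bodies.

```cpp
#define BUFFER_SIZE 16
#define SQUARE(x) ((x) * (x))

// Only one branch survives preprocessing; the directives themselves are removed.
#if BUFFER_SIZE > 8
const int buffer_size = BUFFER_SIZE;  // becomes: const int buffer_size = 16;
#else
const int buffer_size = 8;
#endif

// Expands to: const int squared = ((buffer_size) * (buffer_size));
const int squared = SQUARE(buffer_size);
```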

Phase 5

1) All characters in character literals and string literals are converted from the source character set to the execution character set.

2) Escape sequences and universal character names in character literals and non-raw string literals are expanded and converted to the execution character set. If the character specified by a universal character name isn't a member of the execution character set, the result is implementation-defined, but is guaranteed not to be a null (wide) character.
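The difference between ordinary and raw string literals in this phase can be checked with sizeof. This is a small sketch assuming C++11 (raw string literals and static_assert); the array names escaped and raw are hypothetical.

```cpp
// In an ordinary literal, \t is expanded to a single tab character.
const char escaped[] = "a\tb";   // 'a', tab, 'b', null
// In a raw literal, escape sequences are not expanded.
const char raw[] = R"(a\tb)";    // 'a', backslash, 't', 'b', null

static_assert(sizeof(escaped) == 4, "three characters plus the terminating null");
static_assert(sizeof(raw) == 5, "four characters plus the terminating null");
```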

Phase 6

Adjacent string literals are concatenated.
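A short sketch of concatenation follows, again assuming C++11 for static_assert and using hypothetical array names. The second half shows one consequence of the phase order: escape sequences are expanded in phase 5, before the literals are joined, so an escape sequence cannot extend across the boundary between two adjacent literals.

```cpp
// Three adjacent literals, separated only by whitespace and a newline,
// become the single literal "phases of translation".
const char message[] = "phases " "of "
                       "translation";
static_assert(sizeof(message) == 22, "21 characters plus the terminating null");

// "\x12" is one character with value 0x12.
const char one_escape[] = "\x12";
// "\x1" "2" is the character 0x01 followed by the digit '2':
// the escape was already expanded before concatenation.
const char split_escape[] = "\x1" "2";
static_assert(sizeof(one_escape) == 2, "one character plus the terminating null");
static_assert(sizeof(split_escape) == 3, "two characters plus the terminating null");
```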

Phase 7

Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.

Phase 8

Each translation unit is examined to produce a list of required template instantiations, including the ones requested by explicit instantiations. The definitions of the templates are located, and the required instantiations are performed to produce instantiation units.
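To make the two kinds of request concrete, the sketch below shows one implicit and one explicit instantiation of the same function template. The names twice and use_implicit are hypothetical, and how a particular compiler schedules the actual instantiation work is not something this example can show.

```cpp
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation definition: twice<double> is required
// even if nothing in this translation unit calls it.
template double twice<double>(double);

// Implicit instantiation: the call below requires twice<int>.
int use_implicit() {
    return twice(21);
}
```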

Phase 9

Translation units, instantiation units, and library components needed to satisfy external references are collected into a program image which contains information needed for execution in its execution environment.

Notes

Some compilers don't implement instantiation units (also known as template repositories or template registries). Instead, they compile each template instantiation at Phase 7, storing the code in the object file where it is implicitly or explicitly requested, and the linker then collapses these compiled instantiations into one at Phase 9.

References

C++11 standard (ISO/IEC 14882:2011):

2.2 Phases of translation [lex.phases]

C++98 standard (ISO/IEC 14882:1998):

2.1 Phases of translation [lex.phases]
