  ______________________________________________________________________

  2   Lexical conventions                                          [lex]

  ______________________________________________________________________

1 The text of the program is kept in units called source files  in  this
  International  Standard.   A source file together with all the headers
  (_lib.headers_) and source files included (_cpp.include_) via the pre-
  processing directive #include, less any source lines skipped by any of
  the conditional inclusion (_cpp.cond_)  preprocessing  directives,  is
  called  a  translation  unit.   [Note:  a C++  program need not all be
  translated at the same time.  ]

2 [Note: previously translated translation units and instantiation units
  can  be preserved individually or in libraries.  The separate transla-
  tion units of a program communicate (_basic.link_)  by  (for  example)
  calls  to functions whose identifiers have external linkage, manipula-
  tion of objects whose identifiers have external linkage, or  manipula-
  tion  of  data  files.  Translation units can be separately translated
  and  then   later   linked   to   produce   an   executable   program
  (_basic.link_).  ]

  2.1  Phases of translation                                [lex.phases]

1 The  precedence  among the syntax rules of translation is specified by
  the following phases.1)

  1 Physical  source  file  characters are mapped, in an implementation-
    defined manner, to the basic source character set (introducing  new-
    line  characters for end-of-line indicators) if necessary.  Trigraph
    sequences (_lex.trigraph_) are  replaced  by  corresponding  single-
    character  internal  representations.  Any source file character not
    in the basic source character set (_lex.charset_) is replaced by the
    universal-character-name that designates that character.  (An imple-
    mentation may use any  internal  encoding,  so  long  as  an  actual
    extended  character  encountered  in  the  source file, and the same
    extended character expressed in the source file as a universal-char-
    acter-name  (i.e.  using  the  \uXXXX notation), are handled equiva-
    lently.)

  2 Each instance of a new-line character and an  immediately  preceding
    backslash  character  is  deleted, splicing physical source lines to
    form logical source lines.  If, as a result,  a  character  sequence
    that  matches  the syntax of a universal-character-name is produced,
    the behavior is undefined.  If a source file that is not empty  does
    not  end  in  a  new-line character, or ends in a new-line character
    immediately preceded by a backslash character, the behavior is unde-
    fined.
  _________________________
  1) Implementations must behave as if these separate phases occur,  al-
  though in practice different phases might be folded together.

  3 The  source file is decomposed into preprocessing tokens (_lex.ppto-
    ken_) and sequences of white-space characters (including  comments).
    A source file shall not end in a partial preprocessing token or par-
    tial comment2).  Each comment is replaced by  one  space  character.
    New-line characters are retained.  Whether each nonempty sequence of
    white-space characters other than new-line is retained  or  replaced
    by  one  space  character is implementation-defined.  The process of
    dividing a source file's characters  into  preprocessing  tokens  is
    context-dependent.   [Example:  see  the  handling  of  <  within  a
    #include preprocessing directive.  ]

  4 Preprocessing directives are  executed  and  macro  invocations  are
    expanded.  If a character sequence that matches the syntax of a uni-
    versal-character-name is produced by token concatenation  (_cpp.con-
    cat_),  the  behavior is undefined.  A #include preprocessing direc-
    tive causes the named header or source file  to  be  processed  from
    phase 1 through phase 4, recursively.

  5 Each  source  character  set  member, escape sequence, or universal-
    character-name in character literals and  string  literals  is  con-
    verted  to  a  member  of  the  execution character set (_lex.ccon_,
    _lex.string_).

  6 Adjacent ordinary string literal tokens are concatenated.   Adjacent
    wide string literal tokens are concatenated.

  7 White-space  characters separating tokens are no longer significant.
    Each  preprocessing  token is converted into a token (_lex.token_).
    The resulting tokens are syntactically and semantically analyzed and
    translated.  [Note: Source files, translation units  and  translated
    translation  units need not necessarily be stored as files, nor need
    there be any one-to-one correspondence between  these  entities  and
    any  external  representation.   The description is conceptual only,
    and does not specify any particular implementation.  ]

  8 Translated translation units and instantiation units are combined as
    follows: [Note: some or all of these may be supplied from a library.
    ] Each translated translation unit is examined to produce a list  of
    required  instantiations.   [Note:  this  may include instantiations
    which have been explicitly requested (_temp.explicit_).  ] The defi-
    nitions  of  the  required templates are located.  It is implementa-
    tion-defined whether the source of the translation units  containing
    these  definitions is required to be available.  [Note: an implemen-
    tation could  encode  sufficient  information  into  the  translated
    translation unit so as to ensure the source is not required here.  ]
    All the required instantiations are performed to produce  instantia-
    tion  units.   [Note:  these  are  similar to translated translation
    units, but contain no references to uninstantiated templates and  no
    template definitions.  ] The program is ill-formed if any instantia-
    tion fails.
  _________________________
  2) A partial preprocessing token would arise from a source file ending
  in the first portion of a multi-character token that requires a termi-
  nating sequence of characters, such as a header-name that  is  missing
  the  closing " or >.  A partial comment would arise from a source file
  ending with an unclosed /* comment.

  9 All external object and function references are  resolved.   Library
    components  are  linked  to satisfy external references to functions
    and objects not defined in the current translation.  All such trans-
    lator output is collected into a program image which contains infor-
    mation needed for execution in its execution environment.
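
  The effect of phases 2, 3 and 6 can be observed in ordinary code.  The
  sketch below is illustrative only; the names GREETING,  greeting,  an-
  swer  and  check_phases  are hypothetical and do not appear in the text
  above.

```cpp
#include <cassert>
#include <cstring>

// Phase 2: the backslash-newline pair is deleted, so this macro
// definition forms a single logical source line.
#define GREETING "hello, " \
                 "world"

// Phase 6: the two adjacent ordinary string literals are concatenated.
inline const char* greeting() { return GREETING; }

// Phase 3: a comment is replaced by one space character, so it
// separates the tokens `int` and `answer` just as a blank would.
inline int/* acts as white space */answer() { return 42; }

// Illustrative self-check (hypothetical helper).
inline void check_phases() {
    assert(std::strcmp(greeting(), "hello, world") == 0);
    assert(answer() == 42);
}
```

  Calling check_phases() is expected to pass  on  a  conforming  imple-
  mentation,  since  none of the three behaviors shown is implementation-
  defined.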

  2.2  Character sets                                      [lex.charset]

1 The basic source character set consists of 96  characters:  the  space
  character,  the control characters representing horizontal tab, verti-
  cal tab, form feed, and new-line,  plus  the  following  91  graphical
  characters:3)
     a b c d e f g h i j k l m n o p q r s t u v w x y z
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     0 1 2 3 4 5 6 7 8 9
     _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

2 The  universal-character-name  construct  provides a way to name other
  characters.
     hex-quad:
             hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

     universal-character-name:
             \u hex-quad
             \U hex-quad hex-quad
  The character designated by the universal-character-name \UNNNNNNNN is
  that  character  whose  character  short  name  in  ISO/IEC  10646  is
  NNNNNNNN; the character  designated  by  the  universal-character-name
  \uNNNN  is  that character whose character short name in ISO/IEC 10646
  is 0000NNNN.  If the hexadecimal value for a universal character  name
  is  less  than  0x20  or in the range 0x7F-0x9F (inclusive), or if the
  universal character name designates a character in  the  basic  source
  character set, then the program is ill-formed.
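
  The restriction on universal-character-name values can be  sketched  as
  a  predicate.   This  is  a  minimal sketch, not a normative algorithm.
  It ASSUMES that the basic source characters correspond to  their  ASCII
  code  points (which footnote 3 describes as the intent); the function
  names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: true if `cp` names one of the 96 basic source
// characters, ASSUMING they map to their ASCII code points.
inline bool in_basic_source_set(unsigned long cp) {
    // Horizontal tab, new-line, vertical tab, form feed.
    if (cp == 0x09 || cp == 0x0A || cp == 0x0B || cp == 0x0C) return true;
    if (cp < 0x20 || cp > 0x7E) return false;
    // Space and printable ASCII, except $, @ and `, form the rest.
    return cp != 0x24 && cp != 0x40 && cp != 0x60;
}

// True if a universal-character-name with value `cp` is permitted
// under the rule above (not below 0x20, not in 0x7F-0x9F, and not a
// basic source character).
inline bool ucn_is_permitted(unsigned long cp) {
    if (cp < 0x20) return false;
    if (cp >= 0x7F && cp <= 0x9F) return false;
    return !in_basic_source_set(cp);
}

// Illustrative self-check.
inline void check_ucn() {
    assert(ucn_is_permitted(0x00E9));   // \u00E9, outside the basic set
    assert(ucn_is_permitted(0x0024));   // \u0024 ($), not a basic character
    assert(!ucn_is_permitted(0x0041));  // \u0041 would name 'A'
    assert(!ucn_is_permitted(0x0085));  // inside 0x7F-0x9F
}
```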

3 The basic execution character set and the basic execution wide-charac-
  ter set shall each  contain  all  the  members  of  the  basic  source
  character  set, plus control characters representing alert, backspace,
  and carriage return, plus a null character  (respectively,  null  wide
  character),  whose  representation  has all zero bits.  For each basic
  execution character set, the values of the members shall be  non-nega-
  tive  and  distinct from one another.  The execution character set and
  the execution wide-character set are supersets of the basic  execution
  character  set  and  the  basic  execution wide-character set, respec-
  tively.  The values of the members of the execution character sets are
  implementation-defined,  and  any  additional  members are locale-spe-
  cific.
  _________________________
  3)  The  glyphs  for the members of the basic source character set are
  intended to identify characters from the subset of ISO/IEC 10646 which
  corresponds  to the ASCII character set.  However, because the mapping
  from source file characters to the source character set (described  in
  translation phase 1) is specified as implementation-defined, an imple-
  mentation is required to document how the basic source characters  are
  represented in source files.

  2.3  Trigraph sequences                                 [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the
  following  sequences  of  three  characters  ("trigraph sequences") is
  replaced by the single character indicated in Table 1.

                       Table 1--trigraph sequences

  +-----------------------+------------------------+------------------------+
  |trigraph   replacement | trigraph   replacement | trigraph   replacement |
  +-----------------------+------------------------+------------------------+
  |  ??=           #      |   ??(           [      |   ??<           {      |
  +-----------------------+------------------------+------------------------+
  |  ??/           \      |   ??)           ]      |   ??>           }      |
  +-----------------------+------------------------+------------------------+
  |  ??'           ^      |   ??!           |      |   ??-           ~      |
  +-----------------------+------------------------+------------------------+

2 [Example:
     ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
  becomes
     #define arraycheck(a,b) a[b] || b[a]
   --end example]

3 No other trigraph sequence exists.  Each ?  that does not begin one of
  the trigraphs listed above is not changed.
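
  Trigraph replacement as described by Table 1 can be simulated on a
  string.   The  following  is a sketch only: the function names are hy-
  pothetical, and it models the replacement  rather  than  relying  on  a
  compiler's  own trigraph support, which may be disabled.  Question marks
  in the test strings are written as \? so that the code reads  the  same
  whether or not the compiler replaces trigraphs.

```cpp
#include <cassert>
#include <string>

// Returns the replacement for the trigraph formed by two question
// marks followed by `c`, or '\0' if that sequence is not in Table 1.
inline char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Replaces every trigraph in `src`; any ? that does not begin a
// trigraph is copied unchanged.
inline std::string replace_trigraphs(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            const char r = trigraph_replacement(src[i + 2]);
            if (r != '\0') { out += r; i += 3; continue; }
        }
        out += src[i++];
    }
    return out;
}

// Illustrative self-check using the example from paragraph 2.
inline void check_trigraphs() {
    assert(replace_trigraphs(
               "\?\?=define arraycheck(a,b) a\?\?(b\?\?) \?\?!\?\?! b\?\?(a\?\?)")
           == "#define arraycheck(a,b) a[b] || b[a]");
}
```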

  2.4  Preprocessing tokens                                [lex.pptoken]
     preprocessing-token:
             header-name
             identifier
             pp-number
             character-literal
             string-literal
             preprocessing-op-or-punc
             each non-white-space character that cannot be one of the above

1 Each  preprocessing  token  that is converted to a token (_lex.token_)
  shall have the lexical form of a keyword, an identifier, a literal, an
  operator, or a punctuator.

2 A  preprocessing  token is the minimal lexical element of the language
  in translation phases 3 through 6.  The  categories  of  preprocessing
  token are: header names, identifiers, preprocessing numbers, character
  literals, string literals, preprocessing-op-or-punc, and  single  non-
  white-space  characters  that do not lexically match the other prepro-
  cessing token categories.  If a ' or a " character  matches  the  last
  category, the behavior is undefined.  Preprocessing tokens can be sep-
  arated by white space; this consists of comments  (_lex.comment_),  or
  white-space characters (space, horizontal tab, new-line, vertical tab,
  and form-feed), or both.  As described in  clause  _cpp_,  in  certain
  circumstances  during translation phase 4, white space (or the absence
  thereof) serves as more than preprocessing  token  separation.   White
  space can appear within a preprocessing token only as part of a header
  name or between the quotation characters in  a  character  literal  or
  string literal.

3 If  the input stream has been parsed into preprocessing tokens up to a
  given character, the next preprocessing token is the longest  sequence
  of  characters  that  could  constitute a preprocessing token, even if
  that would cause further lexical analysis to fail.

4 [Example: The program fragment 1Ex is parsed as a preprocessing number
  token  (one  that  is  not a valid floating or integer literal token),
  even though a parse as the pair of preprocessing tokens 1 and Ex might
  produce a valid expression (for example, if Ex were a macro defined as
  +1).  Similarly, the program fragment 1E1 is parsed as a preprocessing
  number  (one that is a valid floating literal token), whether or not E
  is a macro name.  ]

5 [Example: The program fragment x+++++y is parsed  as  x  ++  ++  +  y,
  which,  if  x  and  y  are of built-in types, violates a constraint on
  increment operators, even though the parse x ++ + ++ y might  yield  a
  correct expression.  ]
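
  The longest-match rule of paragraph 3 can be illustrated  with  a  toy
  lexer.   This  is a sketch under stated assumptions, not the algorithm
  of any real implementation: the function munch is  hypothetical,  and
  it knows only identifiers and a small hand-picked subset of operators.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Toy lexer (illustrative only): splits `src` into identifiers and a
// few operators, always taking the longest sequence that matches.
inline std::vector<std::string> munch(const std::string& src) {
    static const char* const ops[] = {"+=", "++", "+", "-=", "--", "->", "-"};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t len = 0;
        if (std::isalpha(c) || c == '_') {
            // Identifier: extend while letters, digits or _ follow.
            while (i + len < src.size()) {
                const unsigned char d = static_cast<unsigned char>(src[i + len]);
                if (!std::isalnum(d) && d != '_') break;
                ++len;
            }
        } else {
            // Operator: keep the longest candidate that matches here.
            for (const char* op : ops) {
                const std::size_t n = std::strlen(op);
                if (n > len && src.compare(i, n, op) == 0) len = n;
            }
            if (len == 0) len = 1;  // any other single character
        }
        tokens.push_back(src.substr(i, len));
        i += len;
    }
    return tokens;
}

// Illustrative self-check: x+++++y is split as x ++ ++ + y.
inline void check_munch() {
    const std::vector<std::string> expected = {"x", "++", "++", "+", "y"};
    assert(munch("x+++++y") == expected);
}
```

  The lexer never backtracks to try x ++ + ++ y, which is why the  frag-
  ment in paragraph 5 is rejected later rather than re-split.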

  2.5  Alternative tokens                                  [lex.digraph]

1 Alternative  token representations are provided for some operators and
  punctuators4).

2 In  all  respects  of the language, each alternative token behaves the
  same, respectively, as its primary token, except for  its  spelling5).
  The set of alternative tokens is defined in Table 2.

  _________________________
  4) These include "digraphs" and additional reserved words.   The  term
  "digraph"  (token  consisting  of two characters) is not perfectly de-
  scriptive, since one of the alternative preprocessing-tokens  is  %:%:
  and of course several primary tokens contain two characters.  Nonethe-
  less, those alternative tokens that aren't lexical keywords are collo-
  quially known as "digraphs".
  5) Thus the "stringized" values (_cpp.stringize_) of [ and <: will  be
  different,  maintaining the source spelling, but the tokens can other-
  wise be freely interchanged.

                       Table 2--alternative tokens

  +----------------------+-----------------------+-----------------------+
  |alternative   primary | alternative   primary | alternative   primary |
  +----------------------+-----------------------+-----------------------+
  |    <%           {    |     and         &&    |   and_eq        &=    |
  +----------------------+-----------------------+-----------------------+
  |    %>           }    |    bitor         |    |    or_eq        |=    |
  +----------------------+-----------------------+-----------------------+
  |    <:           [    |     or          ||    |   xor_eq        ^=    |
  +----------------------+-----------------------+-----------------------+
  |    :>           ]    |     xor          ^    |     not          !    |
  +----------------------+-----------------------+-----------------------+
  |    %:           #    |    compl         ~    |   not_eq        !=    |
  +----------------------+-----------------------+-----------------------+
  |   %:%:         ##    |   bitand         &    |                       |
  +----------------------+-----------------------+-----------------------+
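
  A short sketch of the alternative tokens of Table 2 in use.  The names
  sum_if_both_positive,  SPELLING,  bracket_primary  and bracket_alterna-
  tive are hypothetical.  The example assumes an implementation that ac-
  cepts the alternative tokens without extra options.

```cpp
#include <cassert>
#include <cstring>

// Written with alternative tokens; equivalent to a definition using
// { } [ ] and && in place of <% %> <: :> and `and`.
inline int sum_if_both_positive(int a, int b) <%
    int v<:2:> = <% a, b %>;
    if (v<:0:> > 0 and v<:1:> > 0) <%
        return v<:0:> + v<:1:>;
    %>
    return 0;
%>

// Stringizing keeps the source spelling of a token (see footnote 5).
#define SPELLING(tok) #tok
inline const char* bracket_primary()     { return SPELLING([); }
inline const char* bracket_alternative() { return SPELLING(<:); }

// Illustrative self-check.
inline void check_alternative_tokens() {
    assert(sum_if_both_positive(2, 3) == 5);
    assert(sum_if_both_positive(-1, 3) == 0);
    assert(std::strcmp(bracket_primary(), "[") == 0);
    assert(std::strcmp(bracket_alternative(), "<:") == 0);
}
```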

  2.6  Tokens                                                [lex.token]
     token:
             identifier
             keyword
             literal
             operator
             punctuator

1 There  are  five  kinds  of tokens: identifiers, keywords, literals,6)
  operators, and other  separators.   Blanks,  horizontal  and  vertical
  tabs, newlines, formfeeds, and comments (collectively, "white space"),
  as described below, are ignored  except  as  they  serve  to  separate
  tokens.   [Note:  Some  white  space is required to separate otherwise
  adjacent identifiers,  keywords,  numeric  literals,  and  alternative
  tokens containing alphabetic characters.  ]

  2.7  Comments                                            [lex.comment]

1 The  characters  /* start a comment, which terminates with the charac-
  ters */.  These comments do not nest.  The characters // start a  com-
  ment,  which terminates with the next new-line character.  If there is
  a form-feed or a vertical-tab character in such a comment, only white-
  space  characters shall appear between it and the new-line that termi-
  nates the comment; no diagnostic  is  required.   [Note:  The  comment
  characters  //, /*, and */ have no special meaning within a // comment
  and are treated just like other characters.   Similarly,  the  comment
  characters // and /* have no special meaning within a /* comment.  ]
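
  The comment rules above, together with  the  replacement  of  each
  comment  by one space in translation phase 3, can be sketched as a
  small filter.  This is an illustration only: strip_comments  is  a
  hypothetical  function,  and  it  handles string and character liter-
  als in a simplified way (no line splicing, no header names).

```cpp
#include <cassert>
#include <string>

// Replaces each comment in `src` with one space; new-lines are kept.
inline std::string strip_comments(const std::string& src) {
    std::string out;
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Copy a literal verbatim so that comment characters
            // inside it are not treated as comments.
            const char quote = c;
            out += src[i++];
            while (i < n && src[i] != quote && src[i] != '\n') {
                if (src[i] == '\\' && i + 1 < n) out += src[i++];
                out += src[i++];
            }
            if (i < n && src[i] == quote) out += src[i++];
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: ends at the first */ and does not nest.
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            out += ' ';
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: ends at, but does not include, the new-line.
            const std::size_t end = src.find('\n', i + 2);
            i = (end == std::string::npos) ? n : end;
            out += ' ';
        } else {
            out += src[i++];
        }
    }
    return out;
}

// Illustrative self-check.
inline void check_strip_comments() {
    assert(strip_comments("a/* x // y */b") == "a b");
    assert(strip_comments("a // c /* d\nb") == "a  \nb");
}
```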

  _________________________
  6) Literals include strings and character and numeric literals.

  2.8  Header names                                         [lex.header]
     header-name:
             <h-char-sequence>
             "q-char-sequence"
     h-char-sequence:
             h-char
             h-char-sequence h-char
     h-char:
             any member of the source character set except
                     new-line and >
     q-char-sequence:
             q-char
             q-char-sequence q-char
     q-char:
             any member of the source character set except
                     new-line and "

1 Header  name  preprocessing tokens shall only appear within a #include
  preprocessing directive (_cpp.include_).  The sequences in both  forms
  of  header-names  are  mapped  in  an implementation-defined manner to
  headers  or  to  external  source   file   names   as   specified   in
  _cpp.include_.

2 If  either  of  the  characters  '  or  \,  or either of the character
  sequences /* or // appears in a q-char-sequence or a  h-char-sequence,
  or  the  character  "  appears  in  a h-char-sequence, the behavior is
  undefined.7)

  2.9  Preprocessing numbers                              [lex.ppnumber]
     pp-number:
             digit
             . digit
             pp-number digit
             pp-number nondigit
             pp-number e sign
             pp-number E sign
             pp-number .

1 Preprocessing  number  tokens  lexically  include all integral literal
  tokens (_lex.icon_) and all floating literal tokens (_lex.fcon_).

2 A preprocessing number does not have a type or a  value;  it  acquires
  both  after  a  successful conversion (as part of translation phase 7,
  _lex.phases_) to an integral  literal  token  or  a  floating  literal
  token.
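
  The pp-number grammar accepts more than the  valid  numeric  literals.
  The sketch below measures the longest pp-number at the start of a
  string.  It is a hypothetical helper, and it ASSUMES that nondigit
  means  a letter or underscore only (universal-character-names are not
  handled).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Length of the longest pp-number that starts at the beginning of `s`,
// or 0 if `s` does not start with one.
inline std::size_t pp_number_length(const std::string& s) {
    const auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    const auto is_nondigit = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::size_t i = 0;
    if (!s.empty() && is_digit(s[0])) {
        i = 1;                                   // digit
    } else if (s.size() >= 2 && s[0] == '.' && is_digit(s[1])) {
        i = 2;                                   // . digit
    } else {
        return 0;
    }
    while (i < s.size()) {
        const char c = s[i];
        if ((c == 'e' || c == 'E') && i + 1 < s.size() &&
            (s[i + 1] == '+' || s[i + 1] == '-')) {
            i += 2;                              // pp-number e sign
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            ++i;                                 // digit, nondigit or .
        } else {
            break;
        }
    }
    return i;
}

// Illustrative self-check.
inline void check_pp_number() {
    assert(pp_number_length("1Ex") == 3);    // one token, not 1 and Ex
    assert(pp_number_length("0xE+1") == 5);  // E followed by a sign
    assert(pp_number_length("1+2") == 1);
}
```

  Note that 0xE+1 is a single preprocessing number under  this  grammar,
  even though it is not a valid integer or floating literal.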

  _________________________
  7) Thus, sequences of characters that resemble escape sequences  cause
  undefined behavior.

  2.10  Identifiers                                           [lex.name]

     identifier:
             nondigit
             identifier nondigit
             identifier digit
     nondigit: one of
             universal-character-name
             _ a b c d e f g h i j k l m
               n o p q r s t u v w x y z
               A B C D E F G H I J K L M
               N O P Q R S T U V W X Y Z
     digit: one of
             0 1 2 3 4 5 6 7 8 9

1 An  identifier  is an arbitrarily long sequence of letters and digits.
  Each universal-character-name in an identifier shall designate a char-
  acter  whose encoding in ISO 10646 falls into one of the ranges speci-
  fied in Annex _extendid_.  Upper- and lower-case letters  are  differ-
  ent.  All characters are significant.8)

2 In  addition, some identifiers are reserved for use by C++ implementa-
  tions and standard libraries (_lib.global.names_)  and  shall  not  be
  used otherwise; no diagnostic is required.

  _________________________
  8)  On  systems in which linkers cannot accept extended characters, an
  encoding of the universal-character-name may be used in forming  valid
  external identifiers.  For example, some otherwise unused character or
  sequence of characters may be used to encode the \u  in  a  universal-
  character-name.  Extended characters may produce a long external iden-
  tifier, but C++ does not place  a  translation  limit  on  significant
  characters  for  external  identifiers.  In C++, upper- and lower-case
  letters are considered different for all identifiers, including exter-
  nal identifiers.

  2.11  Keywords                                               [lex.key]

1 The  identifiers  shown  in  Table  3 are reserved for use as keywords
  (that is, they are unconditionally treated as keywords in phase 7):

                            Table 3--keywords

  +----------------------------------------------------------------------+
  |asm          do             if                 return        typedef  |
  |auto         double         inline             short         typeid   |
  |bool         dynamic_cast   int                signed        typename |
  |break        else           long               sizeof        union    |
  |case         enum           mutable            static        unsigned |
  |catch        explicit       namespace          static_cast   using    |
  |char         export         new                struct        virtual  |
  |class        extern         operator           switch        void     |
  |const        false          private            template      volatile |
  |const_cast   float          protected          this          wchar_t  |
  |continue     for            public             throw         while    |
  |default      friend         register           true                   |
  |delete       goto           reinterpret_cast   try                    |
  +----------------------------------------------------------------------+

2 Furthermore, the alternative representations shown in Table 4 for cer-
  tain  operators and punctuators (_lex.digraph_) are reserved and shall
  not be used otherwise:

                   Table 4--alternative representations

            +------------------------------------------------+
            |and      and_eq   bitand   bitor   compl    not |
            |not_eq   or       or_eq    xor     xor_eq       |
            +------------------------------------------------+

  2.12  Operators and punctuators                        [lex.operators]

1 The lexical representation of C++ programs includes a number  of  pre-
  processing  tokens which are used in the syntax of the preprocessor or
  are converted into tokens for operators and punctuators:
     preprocessing-op-or-punc: one of
             {       }       [       ]       #       ##      (       )
             <:      :>      <%      %>      %:      %:%:    ;       :       ...
             new     delete  ?       ::      .       .*
             +       -       *       /       %       ^       &       |       ~
             !       =       <       >       +=      -=      *=      /=      %=
             ^=      &=      |=      <<      >>      >>=     <<=     ==      !=
             <=      >=      &&      ||      ++      --      ,       ->*     ->
             and     and_eq  bitand  bitor   compl   not     not_eq
             or      or_eq   xor     xor_eq
  Each preprocessing-op-or-punc is converted to a single token in trans-
  lation phase 7 (_lex.phases_).

  2.13  Literals                                           [lex.literal]

1 There are several kinds of literals.9)
     literal:
             integer-literal
             character-literal
             floating-literal
             string-literal
             boolean-literal

  2.13.1  Integer literals                                    [lex.icon]
     integer-literal:
             decimal-literal integer-suffixopt
             octal-literal integer-suffixopt
             hexadecimal-literal integer-suffixopt
     decimal-literal:
             nonzero-digit
             decimal-literal digit
     octal-literal:
             0
             octal-literal octal-digit
     hexadecimal-literal:
             0x hexadecimal-digit
             0X hexadecimal-digit
             hexadecimal-literal hexadecimal-digit
     nonzero-digit: one of
             1  2  3  4  5  6  7  8  9
     octal-digit: one of
             0  1  2  3  4  5  6  7
     hexadecimal-digit: one of
             0  1  2  3  4  5  6  7  8  9
             a  b  c  d  e  f
             A  B  C  D  E  F
     integer-suffix:
             unsigned-suffix long-suffixopt
             long-suffix unsigned-suffixopt
     unsigned-suffix: one of
             u  U
     long-suffix: one of
             l  L

1 An integer literal is a sequence of digits that has no period or expo-
  nent  part.   An  integer literal may have a prefix that specifies its
  base and a suffix that specifies its type.  The lexically first  digit
  of  the sequence of digits is the most significant.  A decimal integer
  literal (base ten) begins with a digit other than 0 and consists of  a
  sequence  of  decimal  digits.   An octal integer literal (base eight)
  begins with the digit 0 and consists of a sequence of octal digits.10)
  A  hexadecimal integer literal (base sixteen) begins with 0x or 0X and
  consists of a sequence of hexadecimal digits, which include the  deci-
  mal  digits  and  the letters a through f and A through F with decimal
  values ten through fifteen.  [Example: the number twelve can be  writ-
  ten 12, 014, or 0XC.  ]
  _________________________
  9)  The  term  "literal"  generally  designates, in this International
  Standard, those tokens that are called "constants" in ISO C.
  10) The digits 8 and 9 are not octal digits.

2 The type of an integer literal depends on its form, value, and suffix.
  If it is decimal and has no suffix, it has the first of these types in
  which its value can be represented: int, long int; if the value cannot
  be represented as a long int, the behavior is  undefined.   If  it  is
  octal  or  hexadecimal  and  has  no suffix, it has the first of these
  types in which its value can be represented: int, unsigned  int,  long
  int,  unsigned long int.  If it is suffixed by u or U, its type is the
  first of these types in which its value can be  represented:  unsigned
  int,  unsigned long int.  If it is suffixed by l or L, its type is the
  first of these types in which its value can be represented: long  int,
  unsigned  long  int.  If it is suffixed by ul, lu, uL, Lu, Ul, lU, UL,
  or LU, its type is unsigned long int.

3 A program is ill-formed if one of its translation  units  contains  an
  integer  literal  that  cannot  be  represented  by any of the allowed
  types.
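
  The base prefixes and type suffixes  can  be  checked  mechanically.
  The  sketch  below  is  illustrative.  It uses static_assert, decltype
  and <type_traits>, which are facilities of later revisions of C++  and
  are  assumed  to  be  available;  the function names are hypothetical.
  Only small values are used, so each literal has the first type in  its
  list.

```cpp
#include <cassert>
#include <type_traits>

// The number twelve written in base ten, eight and sixteen.
inline bool twelve_is_twelve() { return 12 == 014 && 014 == 0XC; }

// Types selected by suffix for a value that fits in every candidate.
static_assert(std::is_same<decltype(12), int>::value,
              "no suffix: int comes first");
static_assert(std::is_same<decltype(12u), unsigned int>::value,
              "u or U: unsigned int comes first");
static_assert(std::is_same<decltype(12L), long>::value,
              "l or L: long int comes first");
static_assert(std::is_same<decltype(12uL), unsigned long>::value,
              "both suffixes: unsigned long int");
static_assert(std::is_same<decltype(12LU), unsigned long>::value,
              "suffix order does not matter");

// Illustrative self-check.
inline void check_integer_literals() {
    assert(twelve_is_twelve());
    assert(017 == 15);   // a leading 0 means base eight
    assert(0x1F == 31);
}
```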

  2.13.2  Character literals                                  [lex.ccon]
     character-literal:
             'c-char-sequence'
             L'c-char-sequence'
     c-char-sequence:
             c-char
             c-char-sequence c-char
     c-char:
             any member of the source character set except
                     the single-quote ', backslash \, or new-line character
             escape-sequence
             universal-character-name
     escape-sequence:
             simple-escape-sequence
             octal-escape-sequence
             hexadecimal-escape-sequence
     simple-escape-sequence: one of
             \'  \"  \?  \\
             \a  \b  \f  \n  \r  \t  \v
     octal-escape-sequence:
             \ octal-digit
             \ octal-digit octal-digit
             \ octal-digit octal-digit octal-digit
     hexadecimal-escape-sequence:
             \x hexadecimal-digit
             hexadecimal-escape-sequence hexadecimal-digit

1 A character literal is one  or  more  characters  enclosed  in  single
  quotes, as in 'x', optionally preceded by the letter L, as in L'x'.  A
  character literal that does not begin with L is an ordinary  character
  literal,  also referred to as a narrow-character literal.  An ordinary
  character literal that contains a single c-char has  type  char,  with
  value  equal  to  the numerical value of the encoding of the c-char in
  the execution character set.  An ordinary character literal that  con-
  tains  more than one c-char is a multicharacter literal.  A multichar-
  acter literal has type int and implementation-defined value.

2 A character literal that begins with the letter L, such as L'x', is  a
  wide-character literal.  A wide-character literal has type wchar_t.11)
  A wide-character literal containing a single c-char  has  value  equal
  to  the  numerical  value  of the encoding of the c-char in the execu-
  tion wide-character set.  The value of a wide-character  literal  con-
  taining multiple c-chars is implementation-defined.
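
  The types of single-character literals can be confirmed at  compile
  time.   This  is  an  illustrative  sketch; static_assert, decltype and
  <type_traits> come from later revisions of C++ and are assumed  to  be
  available.  Multicharacter literals are left out because their value
  is implementation-defined.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype('x'), char>::value,
              "an ordinary character literal has type char");
static_assert(std::is_same<decltype(L'x'), wchar_t>::value,
              "a wide-character literal has type wchar_t");
static_assert(sizeof('x') == 1, "sizeof(char) is 1");

// A literal and the matching string element use the same encoding.
inline bool literal_matches_string_element() {
    return 'x' == "x"[0] && L'x' == L"x"[0];
}

// Illustrative self-check.
inline void check_character_literals() {
    assert(literal_matches_string_element());
}
```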

3 Certain nongraphic characters, the single quote ', the double quote ",
  the question mark ?, and the backslash \, can be represented according
  to Table 5.

                        Table 5--escape sequences

                   +----------------------------------+
                   |new-line          NL (LF)   \n    |
                   |horizontal tab    HT        \t    |
                   |vertical tab      VT        \v    |
                   |backspace         BS        \b    |
                   |carriage return   CR        \r    |
                   |form feed         FF        \f    |
                   |alert             BEL       \a    |
                   |backslash         \         \\    |
                   |question mark     ?         \?    |
                   |single quote      '         \'    |
                   |double quote      "         \"    |
                   |octal number      ooo       \ooo  |
                   |hex number        hhh       \xhhh |
                   +----------------------------------+
  The double quote " and the question mark  ?  can  be  represented  as
  themselves or by the escape sequences \" and \?  respectively, but the
  single quote ' and the backslash \ shall be represented by the  escape
  sequences  \' and \\ respectively.  If the character following a back-
  slash is not one of those specified, the behavior  is  undefined.   An
  escape sequence specifies a single character.

4 The  escape  \ooo  consists  of the backslash followed by one, two, or
  three octal digits that are taken to specify the value of the  desired
  character.   The  escape \xhhh consists of the backslash followed by x
  followed by one or more hexadecimal digits that are taken  to  specify
  the  value  of the desired character.  There is no limit to the number
  of digits in a hexadecimal sequence.  A sequence of octal or hexadeci-
  mal  digits  is terminated by the first character that is not an octal
  digit or a hexadecimal digit, respectively.  The value of a  character
  literal is implementation-defined if it falls outside of the implemen-
  tation-defined range defined  for  char  (for  ordinary  literals)  or
  wchar_t (for wide literals).
  _________________________
  11)  They  are  intended for character sets where a character does not
  fit into a single byte.
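
  The termination rules for octal and hexadecimal escapes can be  seen
  in  a few assertions.  This is a sketch; check_escapes is a hypotheti-
  cal name, and the last assertion ASSUMES  an  ASCII-compatible  execu-
  tion character set.

```cpp
#include <cassert>
#include <cstring>

inline void check_escapes() {
    // The question mark and the double quote may be written as
    // themselves or as escape sequences.
    assert('\?' == '?');
    assert('\"' == '"');

    // An octal escape takes at most three digits, so "\0123" is the
    // character '\012' followed by the character '3'.
    assert(std::strlen("\0123") == 2);
    assert("\0123"[0] == '\012' && "\0123"[1] == '3');

    // A hexadecimal escape has no digit limit.  To follow \x41 with
    // the letter f, end the literal and rely on string concatenation.
    assert(std::strlen("\x41" "f") == 2);
    assert("\x41" "f"[1] == 'f');

    // ASSUMPTION: ASCII-compatible execution character set.
    assert('\x41' == 'A' && '\101' == 'A');
}
```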

5 A  universal-character-name is translated to the encoding, in the exe-
  cution character set, of the character named.  If  there  is  no  such
  encoding, the universal-character-name is translated to an implementa-
  tion-defined encoding.  [Note: in translation phase  1,  a  universal-
  character-name  is introduced whenever an actual extended character is
  encountered in the source text.  Therefore,  all  extended  characters
  are  described  in  terms  of universal-character-names.  However, the
  actual compiler implementation may use its own native  character  set,
  so long as the same results are obtained.  ]

  2.13.3  Floating literals                                   [lex.fcon]
     floating-literal:
             fractional-constant exponent-partopt floating-suffixopt
             digit-sequence exponent-part floating-suffixopt
     fractional-constant:
             digit-sequenceopt . digit-sequence
             digit-sequence .
     exponent-part:
             e signopt digit-sequence
             E signopt digit-sequence
     sign: one of
             +  -
     digit-sequence:
             digit
             digit-sequence digit
     floating-suffix: one of
             f  l  F  L

1 A  floating  literal  consists  of an integer part, a decimal point, a
  fraction part, an e or E, an optionally signed integer  exponent,  and
  an  optional type suffix.  The integer and fraction parts both consist
  of a sequence of decimal (base ten) digits.  Either the  integer  part
  or  the  fraction  part  (not both) can be omitted; either the decimal
  point or the letter e (or E) and the exponent (not both) can be  omit-
  ted.   The  integer  part, the optional decimal point and the optional
  fraction part form the significant part of the floating literal.   The
  exponent,  if present, indicates the power of 10 by which the signifi-
  cant part is to be scaled.  If the scaled value is  in  the  range  of
  representable  values  for its type, the result is the scaled value if
  representable, else the larger or smaller representable value  nearest
  the  scaled  value,  chosen  in an implementation-defined manner.  The
  type of a floating literal is double unless explicitly specified by  a
  suffix.   The  suffixes  f  and  F specify float; the suffixes l and L
  specify long double.  If the scaled value is not in the range of  rep-
  resentable values for its type, the program is ill-formed.

  2.13.4  String literals                                   [lex.string]
     string-literal:
             "s-char-sequenceopt"
             L"s-char-sequenceopt"
     s-char-sequence:
             s-char
             s-char-sequence s-char
     s-char:
             any member of the source character set except
                     the double-quote ", backslash \, or new-line character
             escape-sequence
             universal-character-name

1 A   string  literal  is  a  sequence  of  characters  (as  defined  in
  _lex.ccon_) surrounded by double quotes, optionally beginning with the
  letter L, as in "..." or L"...".  A string literal that does not begin
  with L is an ordinary string literal, also referred  to  as  a  narrow
  string literal.  An ordinary string literal has type "array of n const
  char" and static storage duration (_basic.stc_), where n is  the  size
  of  the  string  as  defined  below, and is initialized with the given
  characters.  A string literal that begins with L, such as L"asdf",  is
  a  wide  string  literal.   A wide string literal has type "array of n
  const wchar_t" and has static storage duration, where n is the size of
  the string as defined below, and is initialized with the given charac-
  ters.

2 Whether all string literals are  distinct  (that  is,  are  stored  in
  nonoverlapping  objects)  is  implementation-defined.   The  effect of
  attempting to modify a string literal is undefined.

3 In translation phase 6 (_lex.phases_), adjacent narrow string literals
  are  concatenated  and adjacent wide string literals are concatenated.
  If a narrow string literal token is adjacent to a wide string  literal
  token,  the behavior is undefined.  Characters in concatenated strings
  are kept distinct.  [Example:
     "\xA" "B"
  contains the two characters '\xA' and 'B' after concatenation (and not
  the single hexadecimal character '\xAB').  ]

4 After   any   necessary   concatenation,   in   translation   phase  7
  (_lex.phases_), '\0' is appended to every string literal so that  pro-
  grams that scan a string can find its end.

5 Escape sequences and universal-character-names in string literals have
  the same meaning as in character literals  (_lex.ccon_),  except  that
  the  single quote ' is representable either by itself or by the escape
  sequence \', and the double quote " shall be preceded by a  \.   In  a
  narrow string literal, a universal-character-name may map to more than
  one char element due to multibyte encoding.  The size of a wide string
  literal  is the total number of escape sequences, universal-character-
  names, and other characters, plus one for the terminating L'\0'.   The
  size  of  a  narrow  string  literal  is  the  total  number of escape
  sequences and other characters, plus at least one  for  the  multibyte
  encoding   of   each   universal-character-name,   plus  one  for  the
  terminating '\0'.

  2.13.5  Boolean literals                                    [lex.bool]
     boolean-literal:
             false
             true

1 The Boolean literals are the keywords false and true.   Such  literals
  have type bool.  They are not lvalues.