ECMA-334: 9.4.1 Unicode escape sequences


C# Language Specification
© 2006 ECMA International

9.4.1 Unicode escape sequences

A Unicode escape sequence represents a Unicode character. Unicode escape sequences are processed in identifiers (§9.4.2), regular string literals (§9.4.4.5), and character literals (§9.4.4.4). A Unicode escape sequence is not processed in any other location (for example, to form an operator, punctuator, or keyword).
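As a rough illustration of where escapes are and are not processed, the following sketch declares an identifier through an escape and compares a regular string literal with a verbatim one. The class name EscapeLocations is hypothetical, and the verbatim string literal is used here only as a contrast case, since it is not among the locations listed above.

```csharp
using System;

class EscapeLocations   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Identifier: the variable is declared as \u0066 and read back as f.
        int \u0066 = 42;
        Console.WriteLine(f);                 // 42

        // Regular string literal and character literal: the escape is processed.
        string regular = "\u0066";
        char ch = '\u0066';
        Console.WriteLine(regular == "f");    // True
        Console.WriteLine(ch == 'f');         // True

        // Verbatim string literal: the escape is not processed,
        // so the six characters stay exactly as written.
        string verbatim = @"\u0066";
        Console.WriteLine(verbatim.Length);   // 6
    }
}
```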

unicode-escape-sequence::
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

A Unicode escape sequence represents the single Unicode character formed by the hexadecimal number following the "\u" or "\U" characters. Since C# uses a 16-bit encoding of Unicode code points in character and string values, a Unicode code point in the range U+10000 to U+10FFFF is represented using two Unicode surrogate code units. Unicode code points above U+10FFFF are invalid and are not supported.
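The effect of the 16-bit encoding can be seen in a short sketch. The class name SurrogateDemo and the chosen code points (U+00E9 and U+1F600) are illustrative assumptions, not part of the specification text.

```csharp
using System;

class SurrogateDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // \u takes four hex digits and yields one 16-bit code unit.
        string bmp = "\u00E9";
        Console.WriteLine(bmp.Length);                                  // 1

        // \U takes eight hex digits; U+1F600 lies above U+FFFF,
        // so the string holds two surrogate code units.
        string astral = "\U0001F600";
        Console.WriteLine(astral.Length);                               // 2
        Console.WriteLine(char.IsHighSurrogate(astral[0]));             // True
        Console.WriteLine(char.IsLowSurrogate(astral[1]));              // True

        // The same string written as an explicit surrogate pair.
        Console.WriteLine(astral == "\uD83D\uDE00");                    // True

        // Decoding the pair recovers the original code point.
        Console.WriteLine(char.ConvertToUtf32(astral, 0) == 0x1F600);   // True
    }
}
```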

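Multiple translations are not performed. For instance, the string literal "\u005Cu005C" denotes the six characters \u005C rather than the single character \: the escape \u005C is translated to "\" once, and the resulting text is not scanned again for escape sequences. [Note: The Unicode value \u005C is the character "\". end note]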
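The single-pass rule above can be checked with a small sketch. The class name SinglePassDemo is hypothetical, and the comparison strings use a verbatim string literal and the simple escape "\\", which are defined elsewhere in the specification.

```csharp
using System;

class SinglePassDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // \u005C becomes a backslash; the following "u005C" is ordinary text.
        string s = "\u005Cu005C";

        Console.WriteLine(s.Length);          // 6
        Console.WriteLine(s == @"\u005C");    // True: six literal characters
        Console.WriteLine(s == "\\");         // False: not a single backslash
        Console.WriteLine(s);                 // \u005C
    }
}
```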

[Example: The example

class Class1
{
  static void Test(bool \u0066)
  {
    char c = '\u0066';
    if (\u0066)
      System.Console.WriteLine(c.ToString());
  }
}

shows several uses of \u0066, which is the escape sequence for the letter "f". The program is equivalent to

class Class1
{
  static void Test(bool f)
  {
    char c = 'f';
    if (f)
      System.Console.WriteLine(c.ToString());
  }
}

end example]

