cpplib.h (CPP_AT_NAME, [...]): New token types.

* cpplib.h (CPP_AT_NAME, CPP_OBJC_STRING): New token types. (struct cpp_options): Add narrow_charset, wide_charset, bytes_big_endian fields. Remove EBCDIC field. (cpp_init_iconv, cpp_interpret_string): New external interfaces. * cpphash.h: Include <iconv.h> if we have it, otherwise provide a dummy definition of iconv_t. (struct cpp_reader): Add narrow_cset_desc and wide_cset_desc fields. (_cpp_valid_ucn): Update prototype. (_cpp_destroy_iconv): New prototype. * doc/cpp.texi: Document character set handling. * doc/cppopts.texi: Document -fexec-charset= and -fexec-wide-charset=. * doc/extend.texi: Delete entire section on multiline strings. Rewrite section on __FUNCTION__ etc now that these are variables in C. * cppucnid.tab, cppucnid.pl: New files. * cppucnid.h: New generated file. * cppcharset.c: Include cppucnid.h. Lots of commentary added. (iconv_open, iconv, iconv_close): Provide dummy definitions if !HAVE_ICONV. (SOURCE_CHARSET, struct strbuf, init_iconv_desc, cpp_init_iconv, _cpp_destroy_iconv, convert_cset, width_to_mask, convert_ucn, emit_numeric_escape, convert_hex, convert_oct, convert_escape, cpp_interpret_string, narrow_str_to_charconst, wide_str_to_charconst): New. (ucn_valid_in_identifier): Use a binary search through the ucnranges table defined in cppucnid.h, not a long chain of if statements. (_cpp_valid_ucn): Add a limit pointer. Downgrade "universal character names are only valid in C++ and C99" to a warning. Issue the "meaning of \[uU] is different in traditional C" warning here. Take care not to let iconv see an invalid UCS value if we get a malformed UCN. Issue an error if we don't have iconv. (cpp_interpret_charconst): Moved here from cpplex.c. Use cpp_interpret_string to do the heavy lifting. * cppinit.c (cpp_create_reader): Initialize bytes_big_endian, narrow_charset, wide_charset fields of options structure. (cpp_destroy): Call _cpp_destroy_iconv. * cpplex.c (forms_identifier_p): Adjust call to _cpp_valid_ucn. (maybe_read_ucn, hex_digit_value, cpp_parse_escape): Delete. (cpp_interpret_charconst): Moved to cppcharset.c. * cpplib.c (dequote_string): Delete. (interpret_string_notranslate): New. (do_line, do_linemarker): Use interpret_string_notranslate. * Makefile.in (cppcharset.o): Depend on cppucnid.h. * c-common.c (fname_string, combine_strings): Delete. * c-common.h (fname_string, combine_strings): Delete prototypes. * c-lex.c (ignore_escape_flag): Delete. (cb_ident): Use cpp_interpret_string, not lex_string. (get_nonpadding_token): New function. (c_lex): Handle Objective-C @-prefixed identifiers and strings here. Adjust calls to lex_string. Don't write *value twice. (lex_string): Now handles string constant concatenation. Most of the work handed off to cpp_interpret_string. Call fix_string_type here. * c-parse.in (STRING_FUNC_NAME, VAR_FUNC_NAME): Replace with FUNC_NAME, throughout. (OBJC_STRING): New token type. (primary:STRING): No need to call fix_string_type here. (primary:objc_string): Make that OBJC_STRING. (objc_string nonterminal): Delete. (yylexname): Delete code to handle fake string constants. (yylexstring): Delete entirely. (_yylex): Handle CPP_AT_NAME and CPP_OBJC_STRING. No need to handle CPP_ATSIGN. * c.opt (-fexec-charset=, -fwide-exec-charset=): New options. * c-opts.c (missing_arg, c_common_handle_option): Handle OPT_fexec_charset_ and OPT_fwide_exec_charset_. (c_common_init): Set cpp_opts->bytes_big_endian, not cpp_opts->EBCDIC. Call cpp_init_iconv. (print_help): Document -fexec-charset= and -fexec-wide-charset=. (TARGET_EBCDIC): Delete default definition. * objc/objc-act.c (build_objc_string_object): No need to handle string constant concatenation. cp: * parser.c (cp_lexer_read_token): No need to handle string constant concatenation. testsuite: * gcc.c-torture/execute/wchar_t-1.x: New file; XFAIL wchar_t-1.c everywhere. * gcc.dg/concat.c: Concatenation of string constants with __FUNCTION__ / __PRETTY_FUNCTION__ is now a hard error. * gcc.dg/wtr-strcat-1.c: Loosen dg-warning regexp. * gcc.dg/cpp/escape-2.c: Use wide character constants where necessary to avoid multi-character character constant warning. * gcc.dg/cpp/escape.c: Likewise. * gcc.dg/cpp/ucs.c: Likewise. Remove backslashes from dg-bogus comments, as they confuse Tcl. Fix a typo. libstdc++-v3: * testsuite/22_locale/collate/compare/wchar_t/2.cc * testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc * testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc * testsuite/22_locale/collate/hash/wchar_t/2.cc * testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc * testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc * testsuite/22_locale/collate/transform/wchar_t/2.cc * testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc * testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc: XFAIL on all targets. From-SVN: r68952
2003-07-05 00:24:00 +00:00 · 2003-07-05 00:24:00 +00:00 · e6cc3a24c2
commit e6cc3a24c2
parent 61aeb06fe5
40 changed files with 2208 additions and 1441 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@ -1,3 +1,88 @@
+2003-07-04  Zack Weinberg  <zack@codesourcery.com>
+
+	* cpplib.h (CPP_AT_NAME, CPP_OBJC_STRING): New token types.
+	(struct cpp_options): Add narrow_charset, wide_charset,
+	bytes_big_endian fields.  Remove EBCDIC field.
+	(cpp_init_iconv, cpp_interpret_string): New external interfaces.
+
+	* cpphash.h: Include <iconv.h> if we have it, otherwise
+	provide a dummy definition of iconv_t.
+	(struct cpp_reader): Add narrow_cset_desc and wide_cset_desc fields.
+	(_cpp_valid_ucn): Update prototype.
+	(_cpp_destroy_iconv): New prototype.
+
+	* doc/cpp.texi: Document character set handling.
+	* doc/cppopts.texi: Document -fexec-charset= and -fexec-wide-charset=.
+	* doc/extend.texi: Delete entire section on multiline strings.
+	Rewrite section on __FUNCTION__ etc now that these are
+	variables in C.
+
+	* cppucnid.tab, cppucnid.pl: New files.
+	* cppucnid.h: New generated file.
+	* cppcharset.c: Include cppucnid.h.  Lots of commentary added.
+	(iconv_open, iconv, iconv_close): Provide dummy definitions
+	if !HAVE_ICONV.
+	(SOURCE_CHARSET, struct strbuf, init_iconv_desc, cpp_init_iconv,
+	_cpp_destroy_iconv, convert_cset, width_to_mask, convert_ucn,
+	emit_numeric_escape, convert_hex, convert_oct, convert_escape,
+	cpp_interpret_string, narrow_str_to_charconst,
+	wide_str_to_charconst): New.
+	(ucn_valid_in_identifier): Use a binary search through the
+	ucnranges table defined in cppucnid.h, not a long chain of if
+	statements.
+	(_cpp_valid_ucn): Add a limit pointer.  Downgrade "universal
+	character names are only valid in C++ and C99" to a warning.
+	Issue the "meaning of \[uU] is different in traditional C"
+	warning here.  Take care not to let iconv see an invalid UCS
+	value if we get a malformed UCN.  Issue an error if we don't
+	have iconv.
+	(cpp_interpret_charconst): Moved here from cpplex.c.  Use
+	cpp_interpret_string to do the heavy lifting.
+
+	* cppinit.c (cpp_create_reader): Initialize bytes_big_endian,
+	narrow_charset, wide_charset fields of options structure.
+	(cpp_destroy): Call _cpp_destroy_iconv.
+	* cpplex.c (forms_identifier_p): Adjust call to _cpp_valid_ucn.
+	(maybe_read_ucn, hex_digit_value, cpp_parse_escape): Delete.
+	(cpp_interpret_charconst): Moved to cppcharset.c.
+	* cpplib.c (dequote_string): Delete.
+	(interpret_string_notranslate): New.
+	(do_line, do_linemarker): Use interpret_string_notranslate.
+
+	* Makefile.in (cppcharset.o): Depend on cppucnid.h.
+
+	* c-common.c (fname_string, combine_strings): Delete.
+	* c-common.h (fname_string, combine_strings): Delete prototypes.
+	* c-lex.c (ignore_escape_flag): Delete.
+	(cb_ident): Use cpp_interpret_string, not lex_string.
+	(get_nonpadding_token): New function.
+	(c_lex): Handle Objective-C @-prefixed identifiers and strings here.
+	Adjust calls to lex_string.  Don't write *value twice.
+	(lex_string): Now handles string constant concatenation.
+	Most of the work handed off to cpp_interpret_string.
+	Call fix_string_type here.
+	* c-parse.in (STRING_FUNC_NAME, VAR_FUNC_NAME): Replace with
+	FUNC_NAME, throughout.
+	(OBJC_STRING): New token type.
+	(primary:STRING): No need to call fix_string_type here.
+	(primary:objc_string): Make that OBJC_STRING.
+	(objc_string nonterminal): Delete.
+	(yylexname): Delete code to handle fake string constants.
+	(yylexstring): Delete entirely.
+	(_yylex): Handle CPP_AT_NAME and CPP_OBJC_STRING.  No need
+	to handle CPP_ATSIGN.
+
+	* c.opt (-fexec-charset=, -fwide-exec-charset=): New options.
+	* c-opts.c (missing_arg, c_common_handle_option): Handle
+	OPT_fexec_charset_ and OPT_fwide_exec_charset_.
+	(c_common_init): Set cpp_opts->bytes_big_endian, not
+	cpp_opts->EBCDIC.  Call cpp_init_iconv.
+	(print_help): Document -fexec-charset= and -fexec-wide-charset=.
+	(TARGET_EBCDIC): Delete default definition.
+
+	* objc/objc-act.c (build_objc_string_object): No need to
+	handle string constant concatenation.
+
 2003-07-04  Kazu Hirata  <kazu@cs.umass.edu>

 	* doc/install.texi: Fix typos.
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@ -2351,7 +2351,7 @@ libcpp.a: $(LIBCPP_OBJS)
 	$(AR) $(AR_FLAGS) libcpp.a $(LIBCPP_OBJS)
 	-$(RANLIB) libcpp.a

-cppcharset.o: cppcharset.c $(LIBCPP_DEPS)
+cppcharset.o: cppcharset.c $(LIBCPP_DEPS) cppucnid.h
 cpperror.o: cpperror.c $(LIBCPP_DEPS)
 cppexp.o:   cppexp.c   $(LIBCPP_DEPS)
 cpplex.o:   cpplex.c   $(LIBCPP_DEPS)
--- a/gcc/c-common.c
+++ b/gcc/c-common.c
@ -1084,20 +1084,6 @@ fname_as_string (int pretty_p)
  return name;
 }

-/* Return the text name of the current function, formatted as
-   required by the supplied RID value.  */
-
-const char *
-fname_string (unsigned int rid)
-{
-  unsigned ix;
-
-  for (ix = 0; fname_vars[ix].decl; ix++)
-    if (fname_vars[ix].rid == rid)
-      break;
-  return fname_as_string (fname_vars[ix].pretty);
-}
-
 /* Return the VAR_DECL for a const char array naming the current
   function. If the VAR_DECL has not yet been created, create it
   now. RID indicates how it should be formatted and IDENTIFIER_NODE
@ -1190,111 +1176,6 @@ fix_string_type (tree value)
  TREE_STATIC (value) = 1;
  return value;
 }
-
-/* Given a VARRAY of STRING_CST nodes, concatenate them into one
-   STRING_CST.  */
-
-tree
-combine_strings (varray_type strings)
-{
-  const int wchar_bytes = TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT;
-  const int nstrings = VARRAY_ACTIVE_SIZE (strings);
-  tree value, t;
-  int length = 1;
-  int wide_length = 0;
-  int wide_flag = 0;
-  int i;
-  char *p, *q;
-
-  /* Don't include the \0 at the end of each substring.  Count wide
-     strings and ordinary strings separately.  */
-  for (i = 0; i < nstrings; ++i)
-    {
-      t = VARRAY_TREE (strings, i);
-
-      if (TREE_TYPE (t) == wchar_array_type_node)
-	{
-	  wide_length += TREE_STRING_LENGTH (t) - wchar_bytes;
-	  wide_flag = 1;
-	}
-      else
-	{
-	  length += (TREE_STRING_LENGTH (t) - 1);
-	  if (C_ARTIFICIAL_STRING_P (t) && !in_system_header)
-	    warning ("concatenation of string literals with __FUNCTION__ is deprecated");
-	}
-    }
-
-  /* If anything is wide, the non-wides will be converted,
-     which makes them take more space.  */
-  if (wide_flag)
-    length = length * wchar_bytes + wide_length;
-
-  p = xmalloc (length);
-
-  /* Copy the individual strings into the new combined string.
-     If the combined string is wide, convert the chars to ints
-     for any individual strings that are not wide.  */
-
-  q = p;
-  for (i = 0; i < nstrings; ++i)
-    {
-      int len, this_wide;
-
-      t = VARRAY_TREE (strings, i);
-      this_wide = TREE_TYPE (t) == wchar_array_type_node;
-      len = TREE_STRING_LENGTH (t) - (this_wide ? wchar_bytes : 1);
-      if (this_wide == wide_flag)
-	{
-	  memcpy (q, TREE_STRING_POINTER (t), len);
-	  q += len;
-	}
-      else
-	{
-	  const int nzeros = (TYPE_PRECISION (wchar_type_node)
-			      / BITS_PER_UNIT) - 1;
-	  int j, k;
-
-	  if (BYTES_BIG_ENDIAN)
-	    {
-	      for (k = 0; k < len; k++)
-		{
-		  for (j = 0; j < nzeros; j++)
-		    *q++ = 0;
-		  *q++ = TREE_STRING_POINTER (t)[k];
-		}
-	    }
-	  else
-	    {
-	      for (k = 0; k < len; k++)
-		{
-		  *q++ = TREE_STRING_POINTER (t)[k];
-		  for (j = 0; j < nzeros; j++)
-		    *q++ = 0;
-		}
-	    }
-	}
-    }
-
-  /* Nul terminate the string.  */
-  if (wide_flag)
-    {
-      for (i = 0; i < wchar_bytes; i++)
-	*q++ = 0;
-    }
-  else
-    *q = 0;
-
-  value = build_string (length, p);
-  free (p);
-
-  if (wide_flag)
-    TREE_TYPE (value) = wchar_array_type_node;
-  else
-    TREE_TYPE (value) = char_array_type_node;
-
-  return value;
-}

 static int is_valid_printf_arglist (tree);
 static rtx c_expand_builtin (tree, rtx, enum machine_mode,
--- a/gcc/c-common.h
+++ b/gcc/c-common.h
@ -883,7 +883,6 @@ extern void start_fname_decls (void);
 extern void finish_fname_decls (void);
 extern const char *fname_as_string (int);
 extern tree fname_decl (unsigned, tree);
-extern const char *fname_string (unsigned);

 extern void check_function_arguments (tree, tree);
 extern void check_function_arguments_recurse (void (*)
@ -922,7 +921,6 @@ extern void c_expand_end_cond (void);
 extern tree check_case_value (tree);
 extern tree fix_string_type (tree);
 struct varray_head_tag;
-extern tree combine_strings (struct varray_head_tag *);
 extern void constant_expression_warning (tree);
 extern tree convert_and_check (tree, tree);
 extern void overflow_warning (tree);
--- a/gcc/c-lex.c
+++ b/gcc/c-lex.c
@ -61,16 +61,13 @@ static splay_tree file_info_tree;
 int pending_lang_change; /* If we need to switch languages - C++ only */
 int c_header_level;	 /* depth in C headers - C++ only */

-/* Nonzero tells yylex to ignore \ in string constants.  */
-static int ignore_escape_flag;
-
 static tree interpret_integer (const cpp_token *, unsigned int);
 static tree interpret_float (const cpp_token *, unsigned int);
 static enum integer_type_kind
  narrowest_unsigned_type (tree, unsigned int);
 static enum integer_type_kind
  narrowest_signed_type (tree, unsigned int);
-static tree lex_string (const cpp_string *);
+static enum cpp_ttype lex_string (const cpp_token *, tree *, bool);
 static tree lex_charconst (const cpp_token *);
 static void update_header_times (const char *);
 static int dump_one_header (splay_tree_node, void *);
@ -184,8 +181,12 @@ cb_ident (cpp_reader *pfile ATTRIBUTE_UNUSED,
  if (! flag_no_ident)
    {
      /* Convert escapes in the string.  */
-      tree value ATTRIBUTE_UNUSED = lex_string (str);
-      ASM_OUTPUT_IDENT (asm_out_file, TREE_STRING_POINTER (value));
+      cpp_string cstr = { 0, 0 };
+      if (cpp_interpret_string (pfile, str, 1, &cstr, false))
+	{
+	  ASM_OUTPUT_IDENT (asm_out_file, cstr.text);
+	  free ((void *)cstr.text);
+	}
    }
 #endif
 }
@ -296,12 +297,10 @@ cb_undef (cpp_reader *pfile ATTRIBUTE_UNUSED, unsigned int line,
 			 (const char *) NODE_NAME (node));
 }

-int
-c_lex (tree *value)
+static inline const cpp_token *
+get_nonpadding_token (void)
 {
  const cpp_token *tok;
-
- retry:
  timevar_push (TV_CPP);
  do
    tok = cpp_get_token (parse_in);
@ -310,10 +309,22 @@ c_lex (tree *value)

  /* The C++ front end does horrible things with the current line
     number.  To ensure an accurate line number, we must reset it
-     every time we return a token.  */
+     every time we advance a token.  */
  input_line = src_lineno;

-  *value = NULL_TREE;
+  return tok;
+}  
+
+int
+c_lex (tree *value)
+{
+  const cpp_token *tok;
+  location_t atloc;
+
+ retry:
+  tok = get_nonpadding_token ();
+
+ retry_after_at:
  switch (tok->type)
    {
    case CPP_NAME:
@ -345,6 +356,37 @@ c_lex (tree *value)
      }
      break;

+    case CPP_ATSIGN:
+      /* An @ may give the next token special significance in Objective-C.  */
+      atloc = input_location;
+      tok = get_nonpadding_token ();
+      if (c_dialect_objc ())
+	{
+	  tree val;
+	  switch (tok->type)
+	    {
+	    case CPP_NAME:
+	      val = HT_IDENT_TO_GCC_IDENT (HT_NODE (tok->val.node));
+	      if (C_IS_RESERVED_WORD (val)
+		  && OBJC_IS_AT_KEYWORD (C_RID_CODE (val)))
+		{
+		  *value = val;
+		  return CPP_AT_NAME;
+		}
+	      break;
+
+	    case CPP_STRING:
+	    case CPP_WSTRING:
+	      return lex_string (tok, value, true);
+
+	    default: break;
+	    }
+	}
+
+      /* ... or not.  */
+      error ("%Hstray '@' in program", &atloc);
+      goto retry_after_at;
+
    case CPP_OTHER:
      {
 	cppchar_t c = tok->val.str.text[0];
@ -365,7 +407,7 @@ c_lex (tree *value)

    case CPP_STRING:
    case CPP_WSTRING:
-      *value = lex_string (&tok->val.str);
+      return lex_string (tok, value, false);
      break;

      /* These tokens should not be visible outside cpplib.  */
@ -374,7 +416,9 @@ c_lex (tree *value)
    case CPP_MACRO_ARG:
      abort ();

-    default: break;
+    default:
+      *value = NULL_TREE;
+      break;
    }

  return tok->type;
@ -571,75 +615,100 @@ interpret_float (const cpp_token *token, unsigned int flags)
  return value;
 }

-static tree
-lex_string (const cpp_string *str)
+/* Convert a series of STRING and/or WSTRING tokens into a tree,
+   performing string constant concatenation.  TOK is the first of
+   these.  VALP is the location to write the string into.  OBJC_STRING
+   indicates whether an '@' token preceded the incoming token.
+   Returns the CPP token type of the result (CPP_STRING, CPP_WSTRING,
+   or CPP_OBJC_STRING).
+
+   This is unfortunately more work than it should be.  If any of the
+   strings in the series has an L prefix, the result is a wide string
+   (6.4.5p4).  Whether or not the result is a wide string affects the
+   meaning of octal and hexadecimal escapes (6.4.4.4p6,9).  But escape
+   sequences do not continue across the boundary between two strings in
+   a series (6.4.5p7), so we must not lose the boundaries.  Therefore
+   cpp_interpret_string takes a vector of cpp_string structures, which
+   we must arrange to provide.  */
+
+static enum cpp_ttype
+lex_string (const cpp_token *tok, tree *valp, bool objc_string)
 {
-  bool wide;
  tree value;
-  char *buf, *q;
-  cppchar_t c;
-  const unsigned char *p, *limit;
+  bool wide = false;
+  size_t count = 1;
+  struct obstack str_ob;
+  cpp_string istr;

-  wide = str->text[0] == 'L';
-  p = str->text + 1 + wide;
-  limit = str->text + str->len - 1;
-  q = buf = alloca ((str->len + 1) * (wide ? WCHAR_BYTES : 1));
+  /* Try to avoid the overhead of creating and destroying an obstack
+     for the common case of just one string.  */
+  cpp_string str = tok->val.str;
+  cpp_string *strs = &str;

-  while (p < limit)
+  if (tok->type == CPP_WSTRING)
+    wide = true;
+
+  tok = get_nonpadding_token ();
+  if (c_dialect_objc () && tok->type == CPP_ATSIGN)
    {
-      c = *p++;
+      objc_string = true;
+      tok = get_nonpadding_token ();
+    }
+  if (tok->type == CPP_STRING || tok->type == CPP_WSTRING)
+    {
+      gcc_obstack_init (&str_ob);
+      obstack_grow (&str_ob, &str, sizeof (cpp_string));

-      if (c == '\\' && !ignore_escape_flag)
-	c = cpp_parse_escape (parse_in, &p, limit, wide);
+      do
+	{
+	  count++;
+	  if (tok->type == CPP_WSTRING)
+	    wide = true;
+	  obstack_grow (&str_ob, &tok->val.str, sizeof (cpp_string));
 	  
-      /* Add this single character into the buffer either as a wchar_t,
-	 a multibyte sequence, or as a single byte.  */
+	  tok = get_nonpadding_token ();
+	  if (c_dialect_objc () && tok->type == CPP_ATSIGN)
+	    {
+	      objc_string = true;
+	      tok = get_nonpadding_token ();
+	    }
+	}
+      while (tok->type == CPP_STRING || tok->type == CPP_WSTRING);
+      strs = obstack_finish (&str_ob);
+    }
+
+  /* We have read one more token than we want.  */
+  _cpp_backup_tokens (parse_in, 1);
+
+  if (count > 1 && !objc_string && warn_traditional && !in_system_header)
+    warning ("traditional C rejects string constant concatenation");
+
+  if (cpp_interpret_string (parse_in, strs, count, &istr, wide))
+    {
+      value = build_string (istr.len, (char *)istr.text);
+      free ((void *)istr.text);
+    }
+  else
+    {
+      /* Callers cannot generally handle error_mark_node in this context,
+	 so return the empty string instead.  cpp_interpret_string has
+	 issued an error.  */
      if (wide)
-	{
-	  unsigned charwidth = TYPE_PRECISION (char_type_node);
-	  unsigned bytemask = (1 << charwidth) - 1;
-	  int byte;
-
-	  for (byte = 0; byte < WCHAR_BYTES; ++byte)
-	    {
-	      int n;
-	      if (byte >= (int) sizeof (c))
-		n = 0;
+	value = build_string (TYPE_PRECISION (wchar_type_node)
+			      / TYPE_PRECISION (char_type_node),
+			      "\0\0\0");  /* widest supported wchar_t
+					     is 32 bits */
      else
-		n = (c >> (byte * charwidth)) & bytemask;
-	      if (BYTES_BIG_ENDIAN)
-		q[WCHAR_BYTES - byte - 1] = n;
-	      else
-		q[byte] = n;
-	    }
-	  q += WCHAR_BYTES;
-	}
-      else
-	{
-	  *q++ = c;
-	}
+	value = build_string (1, "");
    }

-  /* Terminate the string value, either with a single byte zero
-     or with a wide zero.  */
+  TREE_TYPE (value) = wide ? wchar_array_type_node : char_array_type_node;
+  *valp = fix_string_type (value);

-  if (wide)
-    {
-      memset (q, 0, WCHAR_BYTES);
-      q += WCHAR_BYTES;
-    }
-  else
-    {
-      *q++ = '\0';
-    }
+  if (strs != &str)
+    obstack_free (&str_ob, 0);

-  value = build_string (q - buf, buf);
-
-  if (wide)
-    TREE_TYPE (value) = wchar_array_type_node;
-  else
-    TREE_TYPE (value) = char_array_type_node;
-  return value;
+  return objc_string ? CPP_OBJC_STRING : wide ? CPP_WSTRING : CPP_STRING;
 }

 /* Converts a (possibly wide) character constant token into a tree.  */
--- a/gcc/c-opts.c
+++ b/gcc/c-opts.c
@ -46,10 +46,6 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
 # define TARGET_SYSTEM_ROOT NULL
 #endif

-#ifndef TARGET_EBCDIC
-# define TARGET_EBCDIC 0
-#endif
-
 static int saved_lineno;

 /* CPP's options.  */
@ -143,6 +139,8 @@ missing_arg (enum opt_code code)
    case OPT_fdump_:
    case OPT_fname_mangling_version_:
    case OPT_ftabstop_:
+    case OPT_fexec_charset_:
+    case OPT_fwide_exec_charset_:
    case OPT_ftemplate_depth_:
    case OPT_iprefix:
    case OPT_iwithprefix:
@ -892,6 +890,14 @@ c_common_handle_option (size_t scode, const char *arg, int value)
 	cpp_opts->tabstop = value;
      break;

+    case OPT_fexec_charset_:
+      cpp_opts->narrow_charset = arg;
+      break;
+
+    case OPT_fwide_exec_charset_:
+      cpp_opts->wide_charset = arg;
+      break;
+
    case OPT_ftemplate_depth_:
      max_tinst_depth = value;
      break;
@ -1145,7 +1151,11 @@ c_common_init (void)
  cpp_opts->int_precision = TYPE_PRECISION (integer_type_node);
  cpp_opts->wchar_precision = TYPE_PRECISION (wchar_type_node);
  cpp_opts->unsigned_wchar = TREE_UNSIGNED (wchar_type_node);
-  cpp_opts->EBCDIC = TARGET_EBCDIC;
+  cpp_opts->bytes_big_endian = BYTES_BIG_ENDIAN;
+
+  /* This can't happen until after wchar_precision and bytes_big_endian
+     are known.  */
+  cpp_init_iconv (parse_in);

  if (flag_preprocess_only)
    {
@ -1571,6 +1581,12 @@ Switches:\n\
  fputs (_("\
  -f[no-]preprocessed       Treat the input file as already preprocessed\n\
  -ftabstop=<number>        Distance between tab stops for column reporting\n\
+  -ftarget-charset=<c>      Convert all strings and character constants\n\
+                            to character set <c>\n\
+  -ftarget-wide-charset=<c> Convert all wide strings and character constants\n\
+                            to character set <c>\n\
+"), stdout);
+  fputs (_("\
  -isysroot <dir>           Set <dir> to be the system root directory\n\
  -P                        Do not generate #line directives\n\
  -remap                    Remap file names when including files\n\
--- a/gcc/c-parse.in
+++ b/gcc/c-parse.in
@ -151,9 +151,7 @@ do {									\
 %token ATTRIBUTE EXTENSION LABEL
 %token REALPART IMAGPART VA_ARG CHOOSE_EXPR TYPES_COMPATIBLE_P
 %token PTR_VALUE PTR_BASE PTR_EXTENT
-
-/* function name can be a string const or a var decl. */
-%token STRING_FUNC_NAME VAR_FUNC_NAME
+%token FUNC_NAME

 /* Add precedence rules to solve dangling else s/r conflict */
 %nonassoc IF
@ -183,6 +181,7 @@ do {									\
   Objective C, so that the token codes are the same in both.  */
 %token INTERFACE IMPLEMENTATION END SELECTOR DEFS ENCODE
 %token CLASSNAME PUBLIC PRIVATE PROTECTED PROTOCOL OBJECTNAME CLASS ALIAS
+%token OBJC_STRING

 %type <code> unop
 %type <ttype> ENUM STRUCT UNION IF ELSE WHILE DO FOR SWITCH CASE DEFAULT
@ -249,9 +248,9 @@ ifobjc
 %type <ttype> keywordexpr keywordarglist keywordarg
 %type <ttype> myparms myparm optparmlist reservedwords objcselectorexpr
 %type <ttype> selectorarg keywordnamelist keywordname objcencodeexpr
-%type <ttype> objc_string non_empty_protocolrefs protocolrefs identifier_list objcprotocolexpr
+%type <ttype> non_empty_protocolrefs protocolrefs identifier_list objcprotocolexpr

-%type <ttype> CLASSNAME OBJECTNAME
+%type <ttype> CLASSNAME OBJECTNAME OBJC_STRING
 end ifobjc

 %{
@ -340,7 +339,6 @@ static bool parsing_iso_function_signature;
 static void yyprint	  PARAMS ((FILE *, int, YYSTYPE));
 static void yyerror	  PARAMS ((const char *));
 static int yylexname	  PARAMS ((void));
-static int yylexstring	  PARAMS ((void));
 static inline int _yylex  PARAMS ((void));
 static int  yylex	  PARAMS ((void));
 static void init_reswords PARAMS ((void));
@ -657,8 +655,7 @@ primary:
 		}
 	| CONSTANT
 	| STRING
-		{ $$ = fix_string_type ($$); }
-	| VAR_FUNC_NAME
+	| FUNC_NAME
 		{ $$ = fname_decl (C_RID_CODE ($$), $$); }
 	| '(' typename ')' '{'
 		{ start_init (NULL_TREE, NULL, 0);
@ -763,22 +760,11 @@ ifobjc
 		{ $$ = build_protocol_expr ($1); }
 	| objcencodeexpr
 		{ $$ = build_encode_expr ($1); }
-	| objc_string
+	| OBJC_STRING
 		{ $$ = build_objc_string_object ($1); }
 end ifobjc
 	;

-ifobjc
-/* Produces an STRING_CST with perhaps more STRING_CSTs chained
-   onto it, which is to be read as an ObjC string object.  */
-objc_string:
-	  '@' STRING
-		{ $$ = $2; }
-	| objc_string '@' STRING
-		{ $$ = chainon ($1, $3); }
-	;
-end ifobjc
-
 old_style_parm_decls:
 	old_style_parm_decls_1
 	{
@ -3494,9 +3480,9 @@ static const short rid_to_yy[RID_MAX] =
  /* RID_CHOOSE_EXPR */			CHOOSE_EXPR,
  /* RID_TYPES_COMPATIBLE_P */		TYPES_COMPATIBLE_P,

-  /* RID_FUNCTION_NAME */		STRING_FUNC_NAME,
-  /* RID_PRETTY_FUNCTION_NAME */	STRING_FUNC_NAME,
-  /* RID_C99_FUNCTION_NAME */		VAR_FUNC_NAME,
+  /* RID_FUNCTION_NAME */		FUNC_NAME,
+  /* RID_PRETTY_FUNCTION_NAME */	FUNC_NAME,
+  /* RID_C99_FUNCTION_NAME */		FUNC_NAME,

  /* C++ */
  /* RID_BOOL */	TYPESPEC,
@ -3627,22 +3613,9 @@ ifobjc
 	  && (!OBJC_IS_PQ_KEYWORD (rid_code) || objc_pq_context))
 end ifobjc
      {
-	int yycode = rid_to_yy[(int) rid_code];
-	if (yycode == STRING_FUNC_NAME)
-	  {
-	    /* __FUNCTION__ and __PRETTY_FUNCTION__ get converted
-	       to string constants.  */
-	    const char *name = fname_string (rid_code);
-
-	    yylval.ttype = build_string (strlen (name) + 1, name);
-	    C_ARTIFICIAL_STRING_P (yylval.ttype) = 1;
-	    last_token = CPP_STRING;  /* so yyerror won't choke */
-	    return STRING;
-	  }
-
 	/* Return the canonical spelling for this keyword.  */
 	yylval.ttype = ridpointers[(int) rid_code];
-	return yycode;
+	return rid_to_yy[(int) rid_code];
      }
    }

@ -3671,57 +3644,6 @@ end ifobjc
  return IDENTIFIER;
 }

-/* Concatenate strings before returning them to the parser.  This isn't quite
-   as good as having it done in the lexer, but it's better than nothing.  */
-
-static int
-yylexstring ()
-{
-  enum cpp_ttype next_type;
-  tree orig = yylval.ttype;
-
-  next_type = c_lex (&yylval.ttype);
-  if (next_type == CPP_STRING
-      || next_type == CPP_WSTRING
-      || (next_type == CPP_NAME && yylexname () == STRING))
-    {
-      varray_type strings;
-
-ifc
-      static location_t last_location;
-      if (warn_traditional && !in_system_header
-	  && (input_location.line != last_location.line
-	      || !last_location.file ||
-	      strcmp (last_location.file, input_location.file)))
-	{
-	  warning ("traditional C rejects string concatenation");
-	  last_location = input_location;
-	}
-end ifc
-
-      VARRAY_TREE_INIT (strings, 32, "strings");
-      VARRAY_PUSH_TREE (strings, orig);
-
-      do
-	{
-	  VARRAY_PUSH_TREE (strings, yylval.ttype);
-	  next_type = c_lex (&yylval.ttype);
-	}
-      while (next_type == CPP_STRING
-	     || next_type == CPP_WSTRING
-	     || (next_type == CPP_NAME && yylexname () == STRING));
-
-      yylval.ttype = combine_strings (strings);
-    }
-  else
-    yylval.ttype = orig;
-
-  /* We will have always read one token too many.  */
-  _cpp_backup_tokens (parse_in, 1);
-
-  return STRING;
-}
-
 static inline int
 _yylex ()
 {
@ -3787,13 +3709,11 @@ _yylex ()
      return 0;

    case CPP_NAME:
-      {
-	int ret = yylexname ();
-	if (ret == STRING)
-	  return yylexstring ();
-	else
-	  return ret;
-      }
+      return yylexname ();
+
+    case CPP_AT_NAME:
+      /* This only happens in Objective-C; it must be a keyword.  */
+      return rid_to_yy [(int) C_RID_CODE (yylval.ttype)];

    case CPP_NUMBER:
    case CPP_CHAR:
@ -3802,30 +3722,10 @@ _yylex ()

    case CPP_STRING:
    case CPP_WSTRING:
-      return yylexstring ();
+      return STRING;
      
-      /* This token is Objective-C specific.  It gives the next token
-	 special significance.  */
-    case CPP_ATSIGN:
-ifobjc
-      {
-	tree after_at;
-	enum cpp_ttype after_at_type;
-
-	after_at_type = c_lex (&after_at);
-
-	if (after_at_type == CPP_NAME
-	    && C_IS_RESERVED_WORD (after_at)
-	    && OBJC_IS_AT_KEYWORD (C_RID_CODE (after_at)))
-	  {
-	    yylval.ttype = after_at;
-	    last_token = after_at_type;
-	    return rid_to_yy [(int) C_RID_CODE (after_at)];
-	  }
-	_cpp_backup_tokens (parse_in, 1);
-	return '@';
-      }
-end ifobjc
+    case CPP_OBJC_STRING:
+      return OBJC_STRING;

      /* These tokens are C++ specific (and will not be generated
         in C mode, but let's be cautious).  */
--- a/gcc/c.opt
+++ b/gcc/c.opt
@ -368,6 +368,9 @@ C++ ObjC++
 fenum-int-equiv
 C++ ObjC++

+fexec-charset=
+C ObjC C++ ObjC++ Joined RejectNegative
+
 fexternal-templates
 C++ ObjC++

@ -509,6 +512,9 @@ C++ ObjC++
 fweak
 C++ ObjC++

+fwide-exec-charset=
+C ObjC C++ ObjC++ Joined RejectNegative
+
 fxref
 C++ ObjC++

--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@ -1,3 +1,8 @@
+2003-07-04  Zack Weinberg  <zack@codesourcery.com>
+
+	* parser.c (cp_lexer_read_token): No need to handle string
+	constant concatenation.
+
 2003-07-03  Kaveh R. Ghazi  <ghazi@caip.rutgers.edu>

 	* cp-tree.h (GCC_DIAG_STYLE, ATTRIBUTE_GCC_CXXDIAG): Define.
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@ -479,54 +479,12 @@ cp_lexer_read_token (cp_lexer* lexer)
  /* Increment LAST_TOKEN.  */
  lexer->last_token = cp_lexer_next_token (lexer, token);

-  /* The preprocessor does not yet do translation phase six, i.e., the
-     combination of adjacent string literals.  Therefore, we do it
-     here.  */
-  if (token->type == CPP_STRING || token->type == CPP_WSTRING)
-    {
-      ptrdiff_t delta;
-      int i;
-
-      /* When we grow the buffer, we may invalidate TOKEN.  So, save
-	 the distance from the beginning of the BUFFER so that we can
-	 recaulate it.  */
-      delta = cp_lexer_token_difference (lexer, lexer->buffer, token);
-      /* Make sure there is room in the buffer for another token.  */
-      cp_lexer_maybe_grow_buffer (lexer);
-      /* Restore TOKEN.  */
-      token = lexer->buffer;
-      for (i = 0; i < delta; ++i)
-	token = cp_lexer_next_token (lexer, token);
-
-      VARRAY_PUSH_TREE (lexer->string_tokens, token->value);
-      while (true)
-	{
-	  /* Read the token after TOKEN.  */
-	  cp_lexer_get_preprocessor_token (lexer, lexer->last_token);
-	  /* See whether it's another string constant.  */
-	  if (lexer->last_token->type != token->type)
-	    {
-	      /* If not, then it will be the next real token.  */
-	      lexer->last_token = cp_lexer_next_token (lexer, 
-						       lexer->last_token);
-	      break;
-	    }
-
-	  /* Chain the strings together.  */
-	  VARRAY_PUSH_TREE (lexer->string_tokens, 
-			    lexer->last_token->value);
-	}
-
-      /* Create a single STRING_CST.  Curiously we have to call
-	 combine_strings even if there is only a single string in
-	 order to get the type set correctly.  */
-      token->value = combine_strings (lexer->string_tokens);
-      VARRAY_CLEAR (lexer->string_tokens);
-      token->value = fix_string_type (token->value);
  /* Strings should have type `const char []'.  Right now, we will
     have an ARRAY_TYPE that is constant rather than an array of
-	 constant elements.  */
-      if (flag_const_strings)
+     constant elements.
+     FIXME: Make fix_string_type get this right in the first place.  */
+  if ((token->type == CPP_STRING || token->type == CPP_WSTRING)
+      && flag_const_strings)
    {
      tree type;

@ -534,12 +492,10 @@ cp_lexer_read_token (cp_lexer* lexer)
      type = TREE_TYPE (token->value);
      /* Use build_cplus_array_type to rebuild the array, thereby
 	 getting the right type.  */
-	  type = build_cplus_array_type (TREE_TYPE (type),
-					 TYPE_DOMAIN (type));
+      type = build_cplus_array_type (TREE_TYPE (type), TYPE_DOMAIN (type));
      /* Reset the type of the token.  */
      TREE_TYPE (token->value) = type;
    }
-    }

  return token;
 }
--- a/gcc/cppcharset.c
+++ b/gcc/cppcharset.c
--- a/gcc/cpphash.h
+++ b/gcc/cpphash.h
@ -25,6 +25,13 @@ Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */

 #include "hashtable.h"

+#ifdef HAVE_ICONV
+#include <iconv.h>
+#else
+#define HAVE_ICONV 0
+typedef int iconv_t;  /* dummy */
+#endif
+
 struct directive;		/* Deliberately incomplete.  */
 struct pending_option;
 struct op;
@ -362,6 +369,15 @@ struct cpp_reader
  unsigned char *macro_buffer;
  unsigned int macro_buffer_len;

+  /* Iconv descriptor for converting from the source character set
+     to the execution character set.  (iconv_t)-1 for no conversion.  */
+  iconv_t narrow_cset_desc;
+
+  /* Iconv descriptor for converting from the execution character set
+     to the wide execution character set.  (iconv_t)-1 for no conversion
+     other than zero-extending each character to the width of wchar_t.  */
+  iconv_t wide_cset_desc;
+
  /* Tree of other included files.  See cppfiles.c.  */
  struct splay_tree_s *all_include_files;

@ -539,7 +555,8 @@ extern uchar *_cpp_copy_replacement_text (const cpp_macro *, uchar *);
 extern size_t _cpp_replacement_text_len (const cpp_macro *);

 /* In cppcharset.c.  */
-cppchar_t _cpp_valid_ucn (cpp_reader *, const uchar **, int identifer_p);
+cppchar_t _cpp_valid_ucn (cpp_reader *, const uchar **, const uchar *, int);
+void _cpp_destroy_iconv (cpp_reader *);

 /* Utility routines and macros.  */
 #define DSC(str) (const uchar *)str, sizeof str - 1
--- a/gcc/cppinit.c
+++ b/gcc/cppinit.c
@ -157,6 +157,11 @@ cpp_create_reader (enum c_lang lang, hash_table *table)
  CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
  CPP_OPTION (pfile, unsigned_char) = 0;
  CPP_OPTION (pfile, unsigned_wchar) = 1;
+  CPP_OPTION (pfile, bytes_big_endian) = 1;  /* does not matter */
+
+  /* Default to no charset conversion.  */
+  CPP_OPTION (pfile, narrow_charset) = 0;
+  CPP_OPTION (pfile, wide_charset) = 0;

  /* Initialize the line map.  Start at logical line 1, so we can use
     a line number of zero for special states.  */
@ -227,6 +232,7 @@ cpp_destroy (cpp_reader *pfile)

  _cpp_destroy_hashtable (pfile);
  _cpp_cleanup_includes (pfile);
+  _cpp_destroy_iconv (pfile);

  _cpp_free_buff (pfile->a_buff);
  _cpp_free_buff (pfile->u_buff);
--- a/gcc/cpplex.c
+++ b/gcc/cpplex.c
@ -64,10 +64,8 @@ static void create_literal (cpp_reader *, cpp_token *, const uchar *,
 			    unsigned int, enum cpp_ttype);
 static bool warn_in_comment (cpp_reader *, _cpp_line_note *);
 static int name_p (cpp_reader *, const cpp_string *);
-static cppchar_t maybe_read_ucn (cpp_reader *, const uchar **);
 static tokenrun *next_tokenrun (tokenrun *);

-static unsigned int hex_digit_value (unsigned int);
 static _cpp_buff *new_buff (size_t);


@ -397,7 +395,7 @@ forms_identifier_p (cpp_reader *pfile, int first)
      && (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
    {
      buffer->cur += 2;
-      if (_cpp_valid_ucn (pfile, &buffer->cur, 1 + !first))
+      if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first))
 	return true;
      buffer->cur -= 2;
    }
@ -1316,291 +1314,6 @@ cpp_output_line (cpp_reader *pfile, FILE *fp)
  putc ('\n', fp);
 }

-/* Returns the value of a hexadecimal digit.  */
-static unsigned int
-hex_digit_value (unsigned int c)
-{
-  if (hex_p (c))
-    return hex_value (c);
-  else
-    abort ();
-}
-
-/* Read a possible universal character name starting at *PSTR.  */
-static cppchar_t
-maybe_read_ucn (cpp_reader *pfile, const uchar **pstr)
-{
-  cppchar_t result, c = (*pstr)[-1];
-
-  result = _cpp_valid_ucn (pfile, pstr, false);
-  if (result)
-    {
-      if (CPP_WTRADITIONAL (pfile))
-	cpp_error (pfile, DL_WARNING,
-		   "the meaning of '\\%c' is different in traditional C",
-		   (int) c);
-
-      if (CPP_OPTION (pfile, EBCDIC))
-	{
-	  cpp_error (pfile, DL_ERROR,
-		     "universal character with an EBCDIC target");
-	  result = 0x3f;  /* EBCDIC invalid character */
-	}
-    }
-
-  return result;
-}
-
-/* Returns the value of an escape sequence, truncated to the correct
-   target precision.  PSTR points to the input pointer, which is just
-   after the backslash.  LIMIT is how much text we have.  WIDE is true
-   if the escape sequence is part of a wide character constant or
-   string literal.  Handles all relevant diagnostics.  */
-cppchar_t
-cpp_parse_escape (cpp_reader *pfile, const unsigned char **pstr,
-		  const unsigned char *limit, int wide)
-{
-  /* Values of \a \b \e \f \n \r \t \v respectively.  */
-  static const uchar ascii[]  = {  7,  8, 27, 12, 10, 13,  9, 11 };
-  static const uchar ebcdic[] = { 47, 22, 39, 12, 21, 13,  5, 11 };
-
-  int unknown = 0;
-  const unsigned char *str = *pstr, *charconsts;
-  cppchar_t c, ucn, mask;
-  unsigned int width;
-
-  if (CPP_OPTION (pfile, EBCDIC))
-    charconsts = ebcdic;
-  else
-    charconsts = ascii;
-
-  if (wide)
-    width = CPP_OPTION (pfile, wchar_precision);
-  else
-    width = CPP_OPTION (pfile, char_precision);
-  if (width < BITS_PER_CPPCHAR_T)
-    mask = ((cppchar_t) 1 << width) - 1;
-  else
-    mask = ~0;
-
-  c = *str++;
-  switch (c)
-    {
-    case '\\': case '\'': case '"': case '?': break;
-    case 'b': c = charconsts[1];  break;
-    case 'f': c = charconsts[3];  break;
-    case 'n': c = charconsts[4];  break;
-    case 'r': c = charconsts[5];  break;
-    case 't': c = charconsts[6];  break;
-    case 'v': c = charconsts[7];  break;
-
-    case '(': case '{': case '[': case '%':
-      /* '\(', etc, are used at beginning of line to avoid confusing Emacs.
-	 '\%' is used to prevent SCCS from getting confused.  */
-      unknown = CPP_PEDANTIC (pfile);
-      break;
-
-    case 'a':
-      if (CPP_WTRADITIONAL (pfile))
-	cpp_error (pfile, DL_WARNING,
-		   "the meaning of '\\a' is different in traditional C");
-      c = charconsts[0];
-      break;
-
-    case 'e': case 'E':
-      if (CPP_PEDANTIC (pfile))
-	cpp_error (pfile, DL_PEDWARN,
-		   "non-ISO-standard escape sequence, '\\%c'", (int) c);
-      c = charconsts[2];
-      break;
-
-    case 'u': case 'U':
-      ucn = maybe_read_ucn (pfile, &str);
-      if (ucn)
-	c = ucn;
-      else
-	unknown = true;
-      break;
-
-    case 'x':
-      if (CPP_WTRADITIONAL (pfile))
-	cpp_error (pfile, DL_WARNING,
-		   "the meaning of '\\x' is different in traditional C");
-
-      {
-	cppchar_t i = 0, overflow = 0;
-	int digits_found = 0;
-
-	while (str < limit)
-	  {
-	    c = *str;
-	    if (! ISXDIGIT (c))
-	      break;
-	    str++;
-	    overflow |= i ^ (i << 4 >> 4);
-	    i = (i << 4) + hex_digit_value (c);
-	    digits_found = 1;
-	  }
-
-	if (!digits_found)
-	  cpp_error (pfile, DL_ERROR,
-		       "\\x used with no following hex digits");
-
-	if (overflow | (i != (i & mask)))
-	  {
-	    cpp_error (pfile, DL_PEDWARN,
-		       "hex escape sequence out of range");
-	    i &= mask;
-	  }
-	c = i;
-      }
-      break;
-
-    case '0':  case '1':  case '2':  case '3':
-    case '4':  case '5':  case '6':  case '7':
-      {
-	size_t count = 0;
-	cppchar_t i = c - '0';
-
-	while (str < limit && ++count < 3)
-	  {
-	    c = *str;
-	    if (c < '0' || c > '7')
-	      break;
-	    str++;
-	    i = (i << 3) + c - '0';
-	  }
-
-	if (i != (i & mask))
-	  {
-	    cpp_error (pfile, DL_PEDWARN,
-		       "octal escape sequence out of range");
-	    i &= mask;
-	  }
-	c = i;
-      }
-      break;
-
-    default:
-      unknown = 1;
-      break;
-    }
-
-  if (unknown)
-    {
-      if (ISGRAPH (c))
-	cpp_error (pfile, DL_PEDWARN,
-		   "unknown escape sequence '\\%c'", (int) c);
-      else
-	cpp_error (pfile, DL_PEDWARN,
-		   "unknown escape sequence: '\\%03o'", (int) c);
-    }
-
-  if (c > mask)
-    {
-      cpp_error (pfile, DL_PEDWARN,
-		 "escape sequence out of range for its type");
-      c &= mask;
-    }
-
-  *pstr = str;
-  return c;
-}
-
-/* Interpret a (possibly wide) character constant in TOKEN.
-   WARN_MULTI warns about multi-character charconsts.  PCHARS_SEEN
-   points to a variable that is filled in with the number of
-   characters seen, and UNSIGNEDP to a variable that indicates whether
-   the result has signed type.  */
-cppchar_t
-cpp_interpret_charconst (cpp_reader *pfile, const cpp_token *token,
-			 unsigned int *pchars_seen, int *unsignedp)
-{
-  const unsigned char *str, *limit;
-  unsigned int chars_seen = 0;
-  size_t width, max_chars;
-  cppchar_t c, mask, result = 0;
-  bool unsigned_p;
-
-  str = token->val.str.text + 1 + (token->type == CPP_WCHAR);
-  limit = token->val.str.text + token->val.str.len - 1;
-
-  if (token->type == CPP_CHAR)
-    {
-      width = CPP_OPTION (pfile, char_precision);
-      max_chars = CPP_OPTION (pfile, int_precision) / width;
-      unsigned_p = CPP_OPTION (pfile, unsigned_char);
-    }
-  else
-    {
-      width = CPP_OPTION (pfile, wchar_precision);
-      max_chars = 1;
-      unsigned_p = CPP_OPTION (pfile, unsigned_wchar);
-    }
-
-  if (width < BITS_PER_CPPCHAR_T)
-    mask = ((cppchar_t) 1 << width) - 1;
-  else
-    mask = ~0;
-
-  while (str < limit)
-    {
-      c = *str++;
-
-      if (c == '\\')
-	c = cpp_parse_escape (pfile, &str, limit, token->type == CPP_WCHAR);
-
-#ifdef MAP_CHARACTER
-      if (ISPRINT (c))
-	c = MAP_CHARACTER (c);
-#endif
-
-      chars_seen++;
-
-      /* Truncate the character, scale the result and merge the two.  */
-      c &= mask;
-      if (width < BITS_PER_CPPCHAR_T)
-	result = (result << width) | c;
-      else
-	result = c;
-    }
-
-  if (chars_seen == 0)
-    cpp_error (pfile, DL_ERROR, "empty character constant");
-  else if (chars_seen > 1)
-    {
-      /* Multichar charconsts are of type int and therefore signed.  */
-      unsigned_p = 0;
-
-      if (chars_seen > max_chars)
-	{
-	  chars_seen = max_chars;
-	  cpp_error (pfile, DL_WARNING,
-		     "character constant too long for its type");
-	}
-      else if (CPP_OPTION (pfile, warn_multichar))
-	cpp_error (pfile, DL_WARNING, "multi-character character constant");
-    }
-
-  /* Sign-extend or truncate the constant to cppchar_t.  The value is
-     in WIDTH bits, but for multi-char charconsts it's value is the
-     full target type's width.  */
-  if (chars_seen > 1)
-    width *= max_chars;
-  if (width < BITS_PER_CPPCHAR_T)
-    {
-      mask = ((cppchar_t) 1 << width) - 1;
-      if (unsigned_p || !(result & (1 << (width - 1))))
-	result &= mask;
-      else
-	result |= ~mask;
-    }
-
-  *pchars_seen = chars_seen;
-  *unsignedp = unsigned_p;
-  return result;
-}
-
 /* Memory buffers.  Changing these three constants can have a dramatic
   effect on performance.  The values here are reasonable defaults,
   but might be tuned.  If you adjust them, be sure to test across a
--- a/gcc/cpplib.c
+++ b/gcc/cpplib.c
@ -106,7 +106,6 @@ static char *glue_header_name (cpp_reader *);
 static const char *parse_include (cpp_reader *, int *);
 static void push_conditional (cpp_reader *, int, int, const cpp_hashnode *);
 static unsigned int read_flag (cpp_reader *, unsigned int);
-static uchar *dequote_string (cpp_reader *, const uchar *, unsigned int);
 static int strtoul_for_line (const uchar *, unsigned int, unsigned long *);
 static void do_diagnostic (cpp_reader *, int, int);
 static cpp_hashnode *lex_macro_node (cpp_reader *);
@ -714,29 +713,6 @@ read_flag (cpp_reader *pfile, unsigned int last)
  return 0;
 }

-/* Subroutine of do_line and do_linemarker.  Returns a version of STR
-   which has a NUL terminator and all escape sequences converted to
-   their equivalents.  Temporary, hopefully.  */
-static uchar *
-dequote_string (cpp_reader *pfile, const uchar *str, unsigned int len)
-{
-  uchar *result = _cpp_unaligned_alloc (pfile, len + 1);
-  uchar *dst = result;
-  const uchar *limit = str + len;
-  cppchar_t c;
-
-  while (str < limit)
-    {
-      c = *str++;
-      if (c != '\\')
-	*dst++ = c;
-      else
-	*dst++ = cpp_parse_escape (pfile, &str, limit, 0);
-    }
-  *dst++ = '\0';
-  return result;
-}
-
 /* Subroutine of do_line and do_linemarker.  Convert a number in STR,
   of length LEN, to binary; store it in NUMP, and return 0 if the
   number was well-formed, 1 if not.  Temporary, hopefully.  */
@ -757,6 +733,21 @@ strtoul_for_line (const uchar *str, unsigned int len, long unsigned int *nump)
  return 0;
 }

+/* Subroutine of do_line and do_linemarker.  Convert escape sequences
+   in a string, but do not perform character set conversion.  */
+static bool
+interpret_string_notranslate (cpp_reader *pfile, const cpp_string *in,
+			      cpp_string *out)
+{
+  iconv_t save_narrow_cset_desc = pfile->narrow_cset_desc;
+  bool retval;
+
+  pfile->narrow_cset_desc = (iconv_t) -1;
+  retval = cpp_interpret_string (pfile, in, 1, out, false);
+  pfile->narrow_cset_desc = save_narrow_cset_desc;
+  return retval;
+}
+
 /* Interpret #line command.
   Note that the filename string (if any) is a true string constant
   (escapes are interpreted), unlike in #line.  */
@ -788,8 +779,9 @@ do_line (cpp_reader *pfile)
  token = cpp_get_token (pfile);
  if (token->type == CPP_STRING)
    {
-      new_file = (const char *) dequote_string (pfile, token->val.str.text + 1,
-						token->val.str.len - 2);
+      cpp_string s = { 0, 0 };
+      if (interpret_string_notranslate (pfile, &token->val.str, &s))
+	new_file = (const char *)s.text;
      check_eol (pfile);
    }
  else if (token->type != CPP_EOF)
@ -836,8 +828,10 @@ do_linemarker (cpp_reader *pfile)
  token = cpp_get_token (pfile);
  if (token->type == CPP_STRING)
    {
-      new_file = (const char *) dequote_string (pfile, token->val.str.text + 1,
-						token->val.str.len - 2);
+      cpp_string s = { 0, 0 };
+      if (interpret_string_notranslate (pfile, &token->val.str, &s))
+	new_file = (const char *)s.text;
+      
      new_sysp = 0;
      flag = read_flag (pfile, 0);
      if (flag == 1)
--- a/gcc/cpplib.h
+++ b/gcc/cpplib.h
@ -124,6 +124,7 @@ struct file_name_map_list;
  OP(CPP_ATSIGN,	"@")  /* used in Objective-C */ \
 \
  TK(CPP_NAME,		SPELL_IDENT)	/* word */			\
+  TK(CPP_AT_NAME,       SPELL_IDENT)    /* @word - Objective-C */       \
  TK(CPP_NUMBER,	SPELL_LITERAL)	/* 34_be+ta  */			\
 \
  TK(CPP_CHAR,		SPELL_LITERAL)	/* 'char' */			\
@ -132,6 +133,7 @@ struct file_name_map_list;
 \
  TK(CPP_STRING,	SPELL_LITERAL)	/* "string" */			\
  TK(CPP_WSTRING,	SPELL_LITERAL)	/* L"string" */			\
+  TK(CPP_OBJC_STRING,   SPELL_LITERAL)  /* @"string" - Objective-C */	\
  TK(CPP_HEADER_NAME,	SPELL_LITERAL)	/* <stdio.h> in #include */	\
 \
  TK(CPP_COMMENT,	SPELL_LITERAL)	/* Only if output comments.  */ \
@ -332,6 +334,12 @@ struct cpp_options
  /* True for traditional preprocessing.  */
  unsigned char traditional;

+  /* Holds the name of the target (execution) character set.  */
+  const char *narrow_charset;
+
+  /* Holds the name of the target wide character set.  */
+  const char *wide_charset;
+
  /* True to warn about precompiled header files we couldn't use.  */
  bool warn_invalid_pch;

@ -364,8 +372,9 @@ struct cpp_options
  /* True means chars (wide chars) are unsigned.  */
  bool unsigned_char, unsigned_wchar;

-  /* True if target is EBCDIC.  */
-  bool EBCDIC;
+  /* True if the most significant byte in a word has the lowest
+     address in memory.  */
+  bool bytes_big_endian;

  /* Nonzero means __STDC__ should have the value 0 in system headers.  */
  unsigned char stdc_0_in_system_headers;
@ -529,6 +538,9 @@ extern const char *cpp_read_main_file (cpp_reader *, const char *);
 /* Set up built-ins like __FILE__.  */
 extern void cpp_init_builtins (cpp_reader *, int);

+/* Set up translation to the target character set.  */
+extern void cpp_init_iconv (cpp_reader *);
+
 /* Call this to finish preprocessing.  If you requested dependency
   generation, pass an open stream to write the information to,
   otherwise NULL.  It is your responsibility to close the stream.
@ -560,6 +572,10 @@ extern void _cpp_backup_tokens (cpp_reader *, unsigned int);
 /* Evaluate a CPP_CHAR or CPP_WCHAR token.  */
 extern cppchar_t cpp_interpret_charconst (cpp_reader *, const cpp_token *,
 					  unsigned int *, int *);
+/* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens.  */
+extern bool cpp_interpret_string (cpp_reader *,
+				  const cpp_string *, size_t,
+				  cpp_string *, bool);

 /* Used to register macros and assertions, perhaps from the command line.
   The text is the same as the command line argument.  */
--- a/gcc/cppucnid.h
+++ b/gcc/cppucnid.h
@ -0,0 +1,336 @@
+/* Table of UCNs which are valid in identifiers.
+   Copyright (C) 2003 Free Software Foundation, Inc.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Automatically generated from cppucnid.tab, do not edit */
+
+/* This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
+   D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
+   the similar table from ISO/IEC 14882:1988 (C++98) Annex E, which is
+   a reproduction of ISO/IEC PDTR 10176.  Unfortunately these tables
+   are not identical.  */
+
+#ifndef CPPUCNID_H
+#define CPPUCNID_H
+
+#define C99 1
+#define CXX 2
+#define DIG 4
+
+struct ucnrange
+{
+  unsigned short lo, hi;
+  unsigned short flags;
+};
+
+static const struct ucnrange ucnranges[] = {
+  { 0x00aa, 0x00aa,     C99     },  /* Latin */
+  { 0x00b5, 0x00b5,     C99     },  /* Special characters */
+  { 0x00b7, 0x00b7,     C99     },
+  { 0x00ba, 0x00ba,     C99     },  /* Latin */
+  { 0x00c0, 0x00d6, CXX|C99     },
+  { 0x00d8, 0x00f6, CXX|C99     },
+  { 0x00f8, 0x01f5, CXX|C99     },
+  { 0x01fa, 0x0217, CXX|C99     },
+  { 0x0250, 0x02a8, CXX|C99     },
+  { 0x02b0, 0x02b8,     C99     },  /* Special characters */
+  { 0x02bb, 0x02bb,     C99     },
+  { 0x02bd, 0x02c1,     C99     },
+  { 0x02d0, 0x02d1,     C99     },
+  { 0x02e0, 0x02e4,     C99     },
+  { 0x037a, 0x037a,     C99     },
+  { 0x0384, 0x0384, CXX         },  /* Greek */
+  { 0x0386, 0x0386,     C99     },
+  { 0x0388, 0x038a, CXX|C99     },
+  { 0x038c, 0x038c, CXX|C99     },
+  { 0x038e, 0x03a1, CXX|C99     },
+  { 0x03a3, 0x03ce, CXX|C99     },
+  { 0x03d0, 0x03d6, CXX|C99     },
+  { 0x03da, 0x03da, CXX|C99     },
+  { 0x03dc, 0x03dc, CXX|C99     },
+  { 0x03de, 0x03de, CXX|C99     },
+  { 0x03e0, 0x03e0, CXX|C99     },
+  { 0x03e2, 0x03f3, CXX|C99     },
+  { 0x0401, 0x040c, CXX|C99     },  /* Cyrillic */
+  { 0x040d, 0x040d, CXX         },
+  { 0x040e, 0x040e,     C99     },
+  { 0x040f, 0x044f, CXX|C99     },
+  { 0x0451, 0x045c, CXX|C99     },
+  { 0x045e, 0x0481, CXX|C99     },
+  { 0x0490, 0x04c4, CXX|C99     },
+  { 0x04c7, 0x04c8, CXX|C99     },
+  { 0x04cb, 0x04cc, CXX|C99     },
+  { 0x04d0, 0x04eb, CXX|C99     },
+  { 0x04ee, 0x04f5, CXX|C99     },
+  { 0x04f8, 0x04f9, CXX|C99     },
+  { 0x0531, 0x0556, CXX|C99     },  /* Armenian */
+  { 0x0559, 0x0559,     C99     },  /* Special characters */
+  { 0x0561, 0x0587, CXX|C99     },  /* Armenian */
+  { 0x05b0, 0x05b9,     C99     },  /* Hebrew */
+  { 0x05bb, 0x05bd,     C99     },
+  { 0x05bf, 0x05bf,     C99     },
+  { 0x05c1, 0x05c2,     C99     },
+  { 0x05d0, 0x05ea, CXX|C99     },
+  { 0x05f0, 0x05f2, CXX|C99     },
+  { 0x05f3, 0x05f4, CXX         },
+  { 0x0621, 0x063a, CXX|C99     },  /* Arabic */
+  { 0x0640, 0x0652, CXX|C99     },
+  { 0x0660, 0x0669,     C99|DIG },  /* Digits */
+  { 0x0670, 0x06b7, CXX|C99     },  /* Arabic */
+  { 0x06ba, 0x06be, CXX|C99     },
+  { 0x06c0, 0x06ce, CXX|C99     },
+  { 0x06d0, 0x06dc,     C99     },
+  { 0x06e5, 0x06e7, CXX|C99     },
+  { 0x06e8, 0x06e8,     C99     },
+  { 0x06ea, 0x06ed,     C99     },
+  { 0x06f0, 0x06f9,     C99|DIG },  /* Digits */
+  { 0x0901, 0x0903,     C99     },  /* Devanagari */
+  { 0x0905, 0x0939, CXX|C99     },
+  { 0x093d, 0x093d,     C99     },  /* Special characters */
+  { 0x093e, 0x094d,     C99     },  /* Devanagari */
+  { 0x0950, 0x0952,     C99     },
+  { 0x0958, 0x0962, CXX|C99     },
+  { 0x0963, 0x0963,     C99     },
+  { 0x0966, 0x096f,     C99|DIG },  /* Digits */
+  { 0x0981, 0x0983,     C99     },  /* Bengali */
+  { 0x0985, 0x098c, CXX|C99     },
+  { 0x098f, 0x0990, CXX|C99     },
+  { 0x0993, 0x09a8, CXX|C99     },
+  { 0x09aa, 0x09b0, CXX|C99     },
+  { 0x09b2, 0x09b2, CXX|C99     },
+  { 0x09b6, 0x09b9, CXX|C99     },
+  { 0x09be, 0x09c4,     C99     },
+  { 0x09c7, 0x09c8,     C99     },
+  { 0x09cb, 0x09cd,     C99     },
+  { 0x09dc, 0x09dd, CXX|C99     },
+  { 0x09df, 0x09e1, CXX|C99     },
+  { 0x09e2, 0x09e3,     C99     },
+  { 0x09e6, 0x09ef,     C99|DIG },  /* Digits */
+  { 0x09f0, 0x09f1, CXX|C99     },  /* Bengali */
+  { 0x0a02, 0x0a02,     C99     },  /* Gurmukhi */
+  { 0x0a05, 0x0a0a, CXX|C99     },
+  { 0x0a0f, 0x0a10, CXX|C99     },
+  { 0x0a13, 0x0a28, CXX|C99     },
+  { 0x0a2a, 0x0a30, CXX|C99     },
+  { 0x0a32, 0x0a33, CXX|C99     },
+  { 0x0a35, 0x0a36, CXX|C99     },
+  { 0x0a38, 0x0a39, CXX|C99     },
+  { 0x0a3e, 0x0a42,     C99     },
+  { 0x0a47, 0x0a48,     C99     },
+  { 0x0a4b, 0x0a4d,     C99     },
+  { 0x0a59, 0x0a5c, CXX|C99     },
+  { 0x0a5e, 0x0a5e, CXX|C99     },
+  { 0x0a66, 0x0a6f,     C99|DIG },  /* Digits */
+  { 0x0a74, 0x0a74,     C99     },  /* Gurmukhi */
+  { 0x0a81, 0x0a83,     C99     },  /* Gujarati */
+  { 0x0a85, 0x0a8b, CXX|C99     },
+  { 0x0a8d, 0x0a8d, CXX|C99     },
+  { 0x0a8f, 0x0a91, CXX|C99     },
+  { 0x0a93, 0x0aa8, CXX|C99     },
+  { 0x0aaa, 0x0ab0, CXX|C99     },
+  { 0x0ab2, 0x0ab3, CXX|C99     },
+  { 0x0ab5, 0x0ab9, CXX|C99     },
+  { 0x0abd, 0x0ac5,     C99     },
+  { 0x0ac7, 0x0ac9,     C99     },
+  { 0x0acb, 0x0acd,     C99     },
+  { 0x0ad0, 0x0ad0,     C99     },
+  { 0x0ae0, 0x0ae0, CXX|C99     },
+  { 0x0ae6, 0x0aef,     C99|DIG },  /* Digits */
+  { 0x0b01, 0x0b03,     C99     },  /* Oriya */
+  { 0x0b05, 0x0b0c, CXX|C99     },
+  { 0x0b0f, 0x0b10, CXX|C99     },
+  { 0x0b13, 0x0b28, CXX|C99     },
+  { 0x0b2a, 0x0b30, CXX|C99     },
+  { 0x0b32, 0x0b33, CXX|C99     },
+  { 0x0b36, 0x0b39, CXX|C99     },
+  { 0x0b3d, 0x0b3d,     C99     },  /* Special characters */
+  { 0x0b3e, 0x0b43,     C99     },  /* Oriya */
+  { 0x0b47, 0x0b48,     C99     },
+  { 0x0b4b, 0x0b4d,     C99     },
+  { 0x0b5c, 0x0b5d, CXX|C99     },
+  { 0x0b5f, 0x0b61, CXX|C99     },
+  { 0x0b66, 0x0b6f,     C99|DIG },  /* Digits */
+  { 0x0b82, 0x0b83,     C99     },  /* Tamil */
+  { 0x0b85, 0x0b8a, CXX|C99     },
+  { 0x0b8e, 0x0b90, CXX|C99     },
+  { 0x0b92, 0x0b95, CXX|C99     },
+  { 0x0b99, 0x0b9a, CXX|C99     },
+  { 0x0b9c, 0x0b9c, CXX|C99     },
+  { 0x0b9e, 0x0b9f, CXX|C99     },
+  { 0x0ba3, 0x0ba4, CXX|C99     },
+  { 0x0ba8, 0x0baa, CXX|C99     },
+  { 0x0bae, 0x0bb5, CXX|C99     },
+  { 0x0bb7, 0x0bb9, CXX|C99     },
+  { 0x0bbe, 0x0bc2,     C99     },
+  { 0x0bc6, 0x0bc8,     C99     },
+  { 0x0bca, 0x0bcd,     C99     },
+  { 0x0be7, 0x0bef,     C99|DIG },  /* Digits */
+  { 0x0c01, 0x0c03,     C99     },  /* Telugu */
+  { 0x0c05, 0x0c0c, CXX|C99     },
+  { 0x0c0e, 0x0c10, CXX|C99     },
+  { 0x0c12, 0x0c28, CXX|C99     },
+  { 0x0c2a, 0x0c33, CXX|C99     },
+  { 0x0c35, 0x0c39, CXX|C99     },
+  { 0x0c3e, 0x0c44,     C99     },
+  { 0x0c46, 0x0c48,     C99     },
+  { 0x0c4a, 0x0c4d,     C99     },
+  { 0x0c60, 0x0c61, CXX|C99     },
+  { 0x0c66, 0x0c6f,     C99|DIG },  /* Digits */
+  { 0x0c82, 0x0c83,     C99     },  /* Kannada */
+  { 0x0c85, 0x0c8c, CXX|C99     },
+  { 0x0c8e, 0x0c90, CXX|C99     },
+  { 0x0c92, 0x0ca8, CXX|C99     },
+  { 0x0caa, 0x0cb3, CXX|C99     },
+  { 0x0cb5, 0x0cb9, CXX|C99     },
+  { 0x0cbe, 0x0cc4,     C99     },
+  { 0x0cc6, 0x0cc8,     C99     },
+  { 0x0cca, 0x0ccd,     C99     },
+  { 0x0cde, 0x0cde,     C99     },
+  { 0x0ce0, 0x0ce1, CXX|C99     },
+  { 0x0ce6, 0x0cef,     C99|DIG },  /* Digits */
+  { 0x0d02, 0x0d03,     C99     },  /* Malayalam */
+  { 0x0d05, 0x0d0c, CXX|C99     },
+  { 0x0d0e, 0x0d10, CXX|C99     },
+  { 0x0d12, 0x0d28, CXX|C99     },
+  { 0x0d2a, 0x0d39, CXX|C99     },
+  { 0x0d3e, 0x0d43,     C99     },
+  { 0x0d46, 0x0d48,     C99     },
+  { 0x0d4a, 0x0d4d,     C99     },
+  { 0x0d60, 0x0d61, CXX|C99     },
+  { 0x0d66, 0x0d6f,     C99|DIG },  /* Digits */
+  { 0x0e01, 0x0e30, CXX|C99     },  /* Thai */
+  { 0x0e31, 0x0e31,     C99     },
+  { 0x0e32, 0x0e33, CXX|C99     },
+  { 0x0e34, 0x0e3a,     C99     },
+  { 0x0e40, 0x0e46, CXX|C99     },
+  { 0x0e47, 0x0e49,     C99     },
+  { 0x0e50, 0x0e59, CXX|C99|DIG },  /* Digits */
+  { 0x0e5a, 0x0e5b, CXX|C99     },  /* Thai */
+  { 0x0e81, 0x0e82, CXX|C99     },  /* Lao */
+  { 0x0e84, 0x0e84, CXX|C99     },
+  { 0x0e87, 0x0e88, CXX|C99     },
+  { 0x0e8a, 0x0e8a, CXX|C99     },
+  { 0x0e8d, 0x0e8d, CXX|C99     },
+  { 0x0e94, 0x0e97, CXX|C99     },
+  { 0x0e99, 0x0e9f, CXX|C99     },
+  { 0x0ea1, 0x0ea3, CXX|C99     },
+  { 0x0ea5, 0x0ea5, CXX|C99     },
+  { 0x0ea7, 0x0ea7, CXX|C99     },
+  { 0x0eaa, 0x0eab, CXX|C99     },
+  { 0x0ead, 0x0eae, CXX|C99     },
+  { 0x0eaf, 0x0eaf, CXX         },
+  { 0x0eb0, 0x0eb0, CXX|C99     },
+  { 0x0eb1, 0x0eb1,     C99     },
+  { 0x0eb2, 0x0eb3, CXX|C99     },
+  { 0x0eb4, 0x0eb9,     C99     },
+  { 0x0ebb, 0x0ebc,     C99     },
+  { 0x0ebd, 0x0ebd, CXX|C99     },
+  { 0x0ec0, 0x0ec4, CXX|C99     },
+  { 0x0ec6, 0x0ec6, CXX|C99     },
+  { 0x0ec8, 0x0ecd,     C99     },
+  { 0x0ed0, 0x0ed9,     C99|DIG },  /* Digits */
+  { 0x0edc, 0x0edd,     C99     },  /* Lao */
+  { 0x0f00, 0x0f00,     C99     },  /* Tibetan */
+  { 0x0f18, 0x0f19,     C99     },
+  { 0x0f20, 0x0f33,     C99|DIG },  /* Digits */
+  { 0x0f35, 0x0f35,     C99     },  /* Tibetan */
+  { 0x0f37, 0x0f37,     C99     },
+  { 0x0f39, 0x0f39,     C99     },
+  { 0x0f3e, 0x0f47,     C99     },
+  { 0x0f49, 0x0f69,     C99     },
+  { 0x0f71, 0x0f84,     C99     },
+  { 0x0f86, 0x0f8b,     C99     },
+  { 0x0f90, 0x0f95,     C99     },
+  { 0x0f97, 0x0f97,     C99     },
+  { 0x0f99, 0x0fad,     C99     },
+  { 0x0fb1, 0x0fb7,     C99     },
+  { 0x0fb9, 0x0fb9,     C99     },
+  { 0x10a0, 0x10c5, CXX|C99     },  /* Georgian */
+  { 0x10d0, 0x10f6, CXX|C99     },
+  { 0x1100, 0x1159, CXX         },  /* Hangul */
+  { 0x1161, 0x11a2, CXX         },
+  { 0x11a8, 0x11f9, CXX         },
+  { 0x1e00, 0x1e9a, CXX|C99     },  /* Latin */
+  { 0x1e9b, 0x1e9b,     C99     },
+  { 0x1ea0, 0x1ef9, CXX|C99     },
+  { 0x1f00, 0x1f15, CXX|C99     },  /* Greek */
+  { 0x1f18, 0x1f1d, CXX|C99     },
+  { 0x1f20, 0x1f45, CXX|C99     },
+  { 0x1f48, 0x1f4d, CXX|C99     },
+  { 0x1f50, 0x1f57, CXX|C99     },
+  { 0x1f59, 0x1f59, CXX|C99     },
+  { 0x1f5b, 0x1f5b, CXX|C99     },
+  { 0x1f5d, 0x1f5d, CXX|C99     },
+  { 0x1f5f, 0x1f7d, CXX|C99     },
+  { 0x1f80, 0x1fb4, CXX|C99     },
+  { 0x1fb6, 0x1fbc, CXX|C99     },
+  { 0x1fbe, 0x1fbe,     C99     },  /* Special characters */
+  { 0x1fc2, 0x1fc4, CXX|C99     },  /* Greek */
+  { 0x1fc6, 0x1fcc, CXX|C99     },
+  { 0x1fd0, 0x1fd3, CXX|C99     },
+  { 0x1fd6, 0x1fdb, CXX|C99     },
+  { 0x1fe0, 0x1fec, CXX|C99     },
+  { 0x1ff2, 0x1ff4, CXX|C99     },
+  { 0x1ff6, 0x1ffc, CXX|C99     },
+  { 0x203f, 0x2040,     C99     },  /* Special characters */
+  { 0x207f, 0x207f,     C99     },  /* Latin */
+  { 0x2102, 0x2102,     C99     },  /* Special characters */
+  { 0x2107, 0x2107,     C99     },
+  { 0x210a, 0x2113,     C99     },
+  { 0x2115, 0x2115,     C99     },
+  { 0x2118, 0x211d,     C99     },
+  { 0x2124, 0x2124,     C99     },
+  { 0x2126, 0x2126,     C99     },
+  { 0x2128, 0x2128,     C99     },
+  { 0x212a, 0x2131,     C99     },
+  { 0x2133, 0x2138,     C99     },
+  { 0x2160, 0x2182,     C99     },
+  { 0x3005, 0x3007,     C99     },
+  { 0x3021, 0x3029,     C99     },
+  { 0x3041, 0x3093, CXX|C99     },  /* Hiragana */
+  { 0x3094, 0x3094, CXX         },
+  { 0x309b, 0x309c, CXX|C99     },
+  { 0x309d, 0x309e, CXX         },
+  { 0x30a1, 0x30f6, CXX|C99     },  /* Katakana */
+  { 0x30f7, 0x30fa, CXX         },
+  { 0x30fb, 0x30fc, CXX|C99     },
+  { 0x30fd, 0x30fe, CXX         },
+  { 0x3105, 0x312c, CXX|C99     },  /* Bopomofo */
+  { 0x4e00, 0x9fa5, CXX|C99     },  /* CJK Unified Ideographs */
+  { 0xac00, 0xd7a3,     C99     },  /* Hangul */
+  { 0xf900, 0xfa2d, CXX         },  /* CJK Unified Ideographs */
+  { 0xfb1f, 0xfb36, CXX         },
+  { 0xfb38, 0xfb3c, CXX         },
+  { 0xfb3e, 0xfb3e, CXX         },
+  { 0xfb40, 0xfb44, CXX         },
+  { 0xfb46, 0xfbb1, CXX         },
+  { 0xfbd3, 0xfd3f, CXX         },
+  { 0xfd50, 0xfd8f, CXX         },
+  { 0xfd92, 0xfdc7, CXX         },
+  { 0xfdf0, 0xfdfb, CXX         },
+  { 0xfe70, 0xfe72, CXX         },
+  { 0xfe74, 0xfe74, CXX         },
+  { 0xfe76, 0xfefc, CXX         },
+  { 0xff21, 0xff3a, CXX         },
+  { 0xff41, 0xff5a, CXX         },
+  { 0xff66, 0xffbe, CXX         },
+  { 0xffc2, 0xffc7, CXX         },
+  { 0xffca, 0xffcf, CXX         },
+  { 0xffd2, 0xffd7, CXX         },
+  { 0xffda, 0xffdc, CXX         },
+};
+
+#endif /* cppucnid.h */
--- a/gcc/cppucnid.pl
+++ b/gcc/cppucnid.pl
@ -0,0 +1,130 @@
+#! /usr/bin/perl -w
+use strict;
+
+# Convert cppucnid.tab to cppucnid.h.  We use two arrays of length
+# 65536 to represent the table, since this is nice and simple.  The
+# first array holds the tags indicating which ranges are valid in
+# which contexts.  The second array holds the language name associated
+# with each element.
+
+our(@tags, @names);
+@tags = ("") x 65536;
+@names = ("") x 65536;
+
+
+# Array mapping tag numbers to standard #defines
+our @stds;
+
+# Current standard and language
+our($curstd, $curlang);
+
+# First block of the file is a template to be saved for later.
+our @template;
+
+while (<>) {
+    chomp;
+    last if $_ eq '%%';
+    push @template, $_;
+};
+
+# Second block of the file is the UCN tables.
+# The format looks like this:
+#
+# [std]
+#
+# ; language
+# xxxx-xxxx xxxx xxxx-xxxx ....
+#
+# with comment lines starting with #.
+
+while (<>) {
+    chomp;
+    /^#/ and next;
+    /^\s*$/ and next;
+    /^\[(.+)\]$/ and do {
+	$curstd = $1;
+ 	next;
+    };
+    /^; (.+)$/ and do {
+	$curlang = $1;
+	next;
+    };
+
+    process_range(split);
+}
+
+# Print out the template, inserting as requested.
+$\ = "\n";
+for (@template) {
+    print("/* Automatically generated from cppucnid.tab, do not edit */"),
+        next if $_ eq "[dne]";
+    print_table(), next if $_ eq "[table]";
+    print;
+}
+
+sub print_table {
+    my($lo, $hi);
+    my $prevname = "";
+
+    for ($lo = 0; $lo <= $#tags; $lo = $hi) {
+	$hi = $lo;
+	$hi++ while $hi <= $#tags
+	    && $tags[$hi] eq $tags[$lo]
+	    && $names[$hi] eq $names[$lo];
+
+	# Range from $lo to $hi-1.
+	# Don't make entries for ranges that are not valid idchars.
+	next if ($tags[$lo] eq "");
+	my $tag = $tags[$lo];
+        $tag = "    ".$tag if $tag =~ /^C99/;
+
+	if ($names[$lo] eq $prevname) {
+	    printf("  { 0x%04x, 0x%04x, %-11s },\n",
+		   $lo, $hi-1, $tag);
+	} else {
+	    printf("  { 0x%04x, 0x%04x, %-11s },  /* %s */\n",
+		   $lo, $hi-1, $tag, $names[$lo]);
+	}
+	$prevname = $names[$lo];
+    }
+}
+
+# The line is a list of four-digit hexadecimal numbers or
+# pairs of such numbers.  Each is a valid identifier character
+# from the given language, under the given standard.
+sub process_range {
+    for my $range (@_) {
+	if ($range =~ /^[0-9a-f]{4}$/) {
+	    my $i = hex($range);
+	    if ($tags[$i] eq "") {
+		$tags[$i] = $curstd;
+	    } else {
+		$tags[$i] = $curstd . "|" . $tags[$i];
+	    }
+	    if ($names[$i] ne "" && $names[$i] ne $curlang) {
+		warn sprintf ("language overlap: %s/%s at %x (tag %d)",
+			      $names[$i], $curlang, $i, $tags[$i]);
+		next;
+	    }
+	    $names[$i] = $curlang;
+	} elsif ($range =~ /^ ([0-9a-f]{4}) - ([0-9a-f]{4}) $/x) {
+	    my ($start, $end) = (hex($1), hex($2));
+	    my $i;
+	    for ($i = $start; $i <= $end; $i++) {
+		if ($tags[$i] eq "") {
+		    $tags[$i] = $curstd;
+		} else {
+		    $tags[$i] = $curstd . "|" . $tags[$i];
+		}
+		if ($names[$i] ne "" && $names[$i] ne $curlang) {
+		    warn sprintf ("language overlap: %s/%s at %x (tag %d)",
+				  $names[$i], $curlang, $i, $tags[$i]);
+		    next;
+		}
+		$names[$i] = $curlang;
+	    }
+	} else {
+	    warn "malformed range expression $range";
+	}
+    }
+}
--- a/gcc/cppucnid.tab
+++ b/gcc/cppucnid.tab
@ -0,0 +1,239 @@
+/* Table of UCNs which are valid in identifiers.
+   Copyright (C) 2003 Free Software Foundation, Inc.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+[dne]
+
+/* This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
+   D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
+   the similar table from ISO/IEC 14882:1988 (C++98) Annex E, which is
+   a reproduction of ISO/IEC PDTR 10176.  Unfortunately these tables
+   are not identical.  */
+
+#ifndef CPPUCNID_H
+#define CPPUCNID_H
+
+#define C99 1
+#define CXX 2
+#define DIG 4
+
+struct ucnrange
+{
+  unsigned short lo, hi;
+  unsigned short flags;
+};
+
+static const struct ucnrange ucnranges[] = {
+[table]
+};
+
+#endif /* cppucnid.h */
+%%
+
+[C99]
+
+; Latin
+00aa 00ba 00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9b
+1ea0-1ef9 207f
+
+; Greek
+0386 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0
+03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
+1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3
+1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
+
+; Cyrillic
+0401-040c 040e-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
+04d0-04eb 04ee-04f5 04f8-04f9
+
+; Armenian
+0531-0556 0561-0587
+
+; Hebrew
+05b0-05b9 05bb-05bd 05bf 05c1-05c2 05d0-05ea 05f0-05f2
+
+; Arabic
+0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06d0-06dc 06e5-06e8
+06ea-06ed
+
+; Devanagari
+0901-0903 0905-0939 093e-094d 0950-0952 0958-0963
+
+; Bengali
+0981-0983 0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9
+09be-09c4 09c7-09c8 09cb-09cd 09dc-09dd 09df-09e3 09f0-09f1
+
+; Gurmukhi
+0a02 0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36
+0a38-0a39 0a3e-0a42 0a47-0a48 0a4b-0a4d 0a59-0a5c 0a5e 0a74
+
+; Gujarati
+0a81-0a83 0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3
+0ab5-0ab9 0abd-0ac5 0ac7-0ac9 0acb-0acd 0ad0 0ae0
+
+; Oriya
+0b01-0b03 0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39
+0b3e-0b43 0b47-0b48 0b4b-0b4d 0b5c-0b5d 0b5f-0b61
+
+; Tamil
+0b82-0b83 0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f
+0ba3-0ba4 0ba8-0baa 0bae-0bb5 0bb7-0bb9 0bbe-0bc2 0bc6-0bc8 0bca-0bcd
+
+; Telugu
+0c01-0c03 0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c3e-0c44
+0c46-0c48 0c4a-0c4d 0c60-0c61
+
+; Kannada
+0c82-0c83 0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0cbe-0cc4
+0cc6-0cc8 0cca-0ccd 0cde 0ce0-0ce1
+
+; Malayalam
+0d02-0d03 0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d3e-0d43 0d46-0d48
+0d4a-0d4d 0d60-0d61
+
+# CORRECTION: exclude 0e50-0e59 from the Thai range as it also appears
+# in the Digits range below.
+; Thai
+0e01-0e3a 0e40-0e49 0e5a-0e5b
+
+; Lao
+0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
+0ea7 0eaa-0eab 0ead-0eae 0eb0-0eb9 0ebb-0ebd 0ec0-0ec4 0ec6 0ec8-0ecd
+0edc-0edd
+
+; Tibetan
+0f00 0f18-0f19 0f35 0f37 0f39 0f3e-0f47 0f49-0f69 0f71-0f84 0f86-0f8b
+0f90-0f95 0f97 0f99-0fad 0fb1-0fb7 0fb9
+
+; Georgian
+10a0-10c5 10d0-10f6
+
+; Hiragana
+3041-3093 309b-309c
+
+; Katakana
+30a1-30f6 30fb-30fc
+
+; Bopomofo
+3105-312c
+
+; CJK Unified Ideographs
+4e00-9fa5
+
+; Hangul
+ac00-d7a3
+
+; Special characters
+00b5 00b7 02b0-02b8 02bb 02bd-02c1 02d0-02d1 02e0-02e4 037a 0559 093d
+0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
+212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
+
+[C99|DIG]
+; Digits
+0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
+0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
+
+[CXX]
+
+; Latin
+00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9a 1ea0-1ef9
+
+; Greek
+0384 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0 
+03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
+1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3 
+1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
+
+; Cyrillic
+0401-040d 040f-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
+04d0-04eb 04ee-04f5 04f8-04f9
+
+; Armenian
+0531-0556 0561-0587
+
+; Hebrew
+05d0-05ea 05f0-05f4
+
+; Arabic
+0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06e5-06e7
+
+; Devanagari
+0905-0939 0958-0962
+
+; Bengali
+0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9 09dc-09dd 
+09df-09e1 09f0-09f1
+
+; Gurmukhi
+0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36 0a38-0a39
+0a59-0a5c 0a5e
+
+; Gujarati
+0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3 0ab5-0ab9 0ae0
+
+; Oriya
+0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39 0b5c-0b5d
+0b5f-0b61
+
+; Tamil
+0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f 0ba3-0ba4 
+0ba8-0baa 0bae-0bb5 0bb7-0bb9
+
+; Telugu
+0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c60-0c61
+
+; Kannada
+0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0ce0-0ce1
+
+; Malayalam
+0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d60-0d61
+
+# CORRECTION: Exclude 0e50-0e59 from the Thai range and make a fake
+# Digits range for it, to match C99.  cppcharset.c knows that C++
+# doesn't distinguish digits from other UCNs valid in identifiers.
+; Thai
+0e01-0e30 0e32-0e33 0e40-0e46 0e4f-0e49 0e5a-0e5b
+
+; Digits
+0e50-0e59
+
+# CORRECTION: Change 0e0d to 0e8d (typo in standard; see C++ DR 131)
+; Lao
+0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
+0ea7 0eaa-0eab 0ead-0eb0 0eb2 0eb3 0ebd 0ec0-0ec4 0ec6
+
+; Georgian
+10a0-10c5 10d0-10f6
+
+; Hiragana
+3041-3094 309b-309e
+
+; Katakana
+30a1-30fe
+
+# CORRECTION: language spelled "Bopmofo" in C++98.
+; Bopomofo
+3105-312c
+
+; Hangul
+1100-1159 1161-11a2 11a8-11f9
+
+; CJK Unified Ideographs
+f900-fa2d fb1f-fb36 fb38-fb3c fb3e fb40-fb41 fb42-fb44 fb46-fbb1
+fbd3-fd3f fd50-fd8f fd92-fdc7 fdf0-fdfb fe70-fe72 fe74 fe76-fefc
+ff21-ff3a ff41-ff5a ff66-ffbe ffc2-ffc7 ffca-ffcf ffd2-ffd7
+ffda-ffdc 4e00-9fa5
+
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@ -104,6 +104,7 @@ useful on its own.

 Overview

+* Character sets::
 * Initial processing::
 * Tokenization::
 * The preprocessing language::
@ -233,11 +234,62 @@ manual refer to GNU CPP.
@c man end

@menu
+* Character sets::
 * Initial processing::
 * Tokenization::
 * The preprocessing language::
@end menu

+@node Character sets
+@section Character sets
+
+Source code character set processing in C and related languages is
+rather complicated.  The C standard discusses two character sets, but
+there are really at least four.
+
+The files input to CPP might be in any character set at all.  CPP's
+very first action, before it even looks for line boundaries, is to
+convert the file into the character set it uses for internal
+processing.  That set is what the C standard calls the @dfn{source}
+character set.  It must be isomorphic with ISO 10646, also known as
+Unicode.  CPP uses the UTF-8 encoding of Unicode.
+
+At present, GNU CPP does not implement conversion from arbitrary file
+encodings to the source character set.  Use of any encoding other than
+plain ASCII or UTF-8, except in comments, will cause errors.  Use of
+encodings that are not strict supersets of ASCII, such as Shift JIS,
+may cause errors even if non-ASCII characters appear only in comments. 
+We plan to fix this in the near future.
+
+All preprocessing work (the subject of the rest of this manual) is
+carried out in the source character set.  If you request textual
+output from the preprocessor with the @option{-E} option, it will be
+in UTF-8.
+
+After preprocessing is complete, string and character constants are
+converted again, into the @dfn{execution} character set.  This
+character set is under control of the user; the default is UTF-8,
+matching the source character set.  Wide string and character
+constants have their own character set, which is not called out
+specifically in the standard.  Again, it is under control of the user.
+The default is UTF-16 or UTF-32, whichever fits in the target's
+@code{wchar_t} type, in the target machine's byte
+order.@footnote{UTF-16 does not meet the requirements of the C
+standard for a wide character set, but the choice of 16-bit
+@code{wchar_t} is enshrined in some system ABIs so we cannot fix
+this.}  Octal and hexadecimal escape sequences do not undergo
+conversion; @t{'\x12'} has the value 0x12 regardless of the currently
+selected execution character set.  All other escapes are replaced by
+the character in the source character set that they represent, then
+converted to the execution character set, just like unescaped
+characters.
+
+GCC does not permit the use of characters outside the ASCII range, nor
+@samp{\u} and @samp{\U} escapes, in identifiers.  We hope this will
+change eventually, but there are problems with the standard semantics
+of such ``extended identifiers'' which must be resolved through the
+ISO C and C++ committees first.
+
@node Initial processing
@section Initial processing

@ -251,27 +303,19 @@ standard.

@enumerate
@item
-@cindex character sets
@cindex line endings
 The input file is read into memory and broken into lines.

-CPP expects its input to be a text file, that is, an unstructured
-stream of ASCII characters, with some characters indicating the end of a
-line of text.  Extended ASCII character sets, such as ISO Latin-1 or
-Unicode encoded in UTF-8, are also acceptable.  Character sets that are
-not strict supersets of seven-bit ASCII will not work.  We plan to add
-complete support for international character sets in a future release.
-
 Different systems use different conventions to indicate the end of a
 line.  GCC accepts the ASCII control sequences @kbd{LF}, @kbd{@w{CR
-LF}} and @kbd{CR} as end-of-line markers.  These
-are the canonical sequences used by Unix, DOS and VMS, and the
-classic Mac OS (before OSX) respectively.  You may therefore safely copy
-source code written on any of those systems to a different one and use
-it without conversion.  (GCC may lose track of the current line number
-if a file doesn't consistently use one convention, as sometimes happens
-when it is edited on computers with different conventions that share a
-network file system.)
+LF}} and @kbd{CR} as end-of-line markers.  These are the canonical
+sequences used by Unix, DOS and VMS, and the classic Mac OS (before
+OSX) respectively.  You may therefore safely copy source code written
+on any of those systems to a different one and use it without
+conversion.  (GCC may lose track of the current line number if a file
+doesn't consistently use one convention, as sometimes happens when it
+is edited on computers with different conventions that share a network
+file system.)

 If the last line of any input file lacks an end-of-line marker, the end
 of the file is considered to implicitly supply one.  The C standard says
@ -378,8 +422,9 @@ comment.
@end group
@end example

-Comments are not recognized within string literals.  @t{@w{"/* blah
-*/"}} is the string constant @samp{@w{/* blah */}}, not an empty string.
+Comments are not recognized within string literals.  
+@t{@w{"/* blah */"}} is the string constant @samp{@w{/* blah */}}, not
+an empty string.

 Line comments are not in the 1989 edition of the C standard, but they
 are recognized by GCC as an extension.  In C++ and in the 1999 edition
@ -3706,8 +3751,9 @@ and stick to it.
@item The mapping of physical source file multi-byte characters to the
 execution character set.

-Currently, GNU cpp only supports character sets that are strict supersets
-of ASCII, and performs no translation of characters.
+Currently, CPP requires its input to be ASCII or UTF-8.  The execution
+character set may be controlled by the user, with the
+@code{-ftarget-charset} and @code{-ftarget-wide-charset} options.

@item Identifier characters.
@anchor{Identifier characters}
--- a/gcc/doc/cppopts.texi
+++ b/gcc/doc/cppopts.texi
@ -498,6 +498,21 @@ correct column numbers in warnings or errors, even if tabs appear on the
 line.  If the value is less than 1 or greater than 100, the option is
 ignored.  The default is 8.

+@item -fexec-charset=@var{charset}
+@opindex fexec-charset
+Set the execution character set, used for string and character
+constants.  The default is UTF-8.  @var{charset} can be any encoding
+supported by the system's @code{iconv} library routine.
+
+@item -fwide-exec-charset=@var{charset}
+@opindex fwide-exec-charset
+Set the wide execution character set, used for wide string and
+character constants.  The default is UTF-32 or UTF-16, whichever
+corresponds to the width of @code{wchar_t}.  As with
+@option{-ftarget-charset}, @var{charset} can be any encoding supported
+by the system's @code{iconv} library routine; however, you will have
+problems with encodings that do not fit exactly in @code{wchar_t}.
+
@item -fno-show-column
@opindex fno-show-column
 Do not print column numbers in diagnostics.  This may be necessary if
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@ -439,7 +439,6 @@ extensions, accepted by GCC in C89 mode and in C++.
 * Empty Structures::    Structures with no members.
 * Variadic Macros::	Macros with a variable number of arguments.
 * Escaped Newlines::    Slightly looser rules for escaped newlines.
-* Multi-line Strings::  String literals with embedded newlines.
 * Subscripting::        Any array can be subscripted, even if not an lvalue.
 * Pointer Arith::       Arithmetic on @code{void}-pointers and function pointers.
 * Initializers::        Non-constant initializers.
@ -1529,27 +1528,14 @@ argument, these arguments are not macro expanded.

 Recently, the preprocessor has relaxed its treatment of escaped
 newlines.  Previously, the newline had to immediately follow a
-backslash.  The current implementation allows whitespace in the form of
-spaces, horizontal and vertical tabs, and form feeds between the
+backslash.  The current implementation allows whitespace in the form
+of spaces, horizontal and vertical tabs, and form feeds between the
 backslash and the subsequent newline.  The preprocessor issues a
 warning, but treats it as a valid escaped newline and combines the two
 lines to form a single logical line.  This works within comments and
-tokens, including multi-line strings, as well as between tokens.
-Comments are @emph{not} treated as whitespace for the purposes of this
-relaxation, since they have not yet been replaced with spaces.
-
-@node Multi-line Strings
-@section String Literals with Embedded Newlines
-@cindex multi-line string literals
-
-As an extension, GNU CPP permits string literals to cross multiple lines
-without escaping the embedded newlines.  Each embedded newline is
-replaced with a single @samp{\n} character in the resulting string
-literal, regardless of what form the newline took originally.
-
-CPP currently allows such strings in directives as well (other than the
-@samp{#include} family).  This is deprecated and will eventually be
-removed.
+tokens, as well as between tokens.  Comments are @emph{not} treated as
+whitespace for the purposes of this relaxation, since they have not
+yet been replaced with spaces.

@node Subscripting
@section Non-Lvalue Arrays May Have Subscripts
@ -4437,18 +4423,47 @@ This extension is not supported by GNU C++.

@node Function Names
@section Function Names as Strings
+@cindex @code{__func__} identifier
@cindex @code{__FUNCTION__} identifier
@cindex @code{__PRETTY_FUNCTION__} identifier
-@cindex @code{__func__} identifier

-GCC predefines two magic identifiers to hold the name of the current
-function.  The identifier @code{__FUNCTION__} holds the name of the function
-as it appears in the source.  The identifier @code{__PRETTY_FUNCTION__}
-holds the name of the function pretty printed in a language specific
-fashion.
+GCC provides three magic variables which hold the name of the current
+function, as a string.  The first of these is @code{__func__}, which
+is part of the C99 standard:

-These names are always the same in a C function, but in a C++ function
-they may be different.  For example, this program:
+@display
+The identifier @code{__func__} is implicitly declared by the translator
+as if, immediately following the opening brace of each function
+definition, the declaration
+
+@smallexample
+static const char __func__[] = "function-name";
+@end smallexample
+
+appeared, where function-name is the name of the lexically-enclosing
+function.  This name is the unadorned name of the function.
+@end display
+
+@code{__FUNCTION__} is another name for @code{__func__}.  Older
+versions of GCC recognize only this name.  However, it is not
+standardized.  For maximum portability, we recommend you use
+@code{__func__}, but provide a fallback definition with the
+preprocessor:
+
+@smallexample
+#if __STDC_VERSION__ < 199901L
+# if __GNUC__ >= 2
+#  define __func__ __FUNCTION__
+# else
+#  define __func__ "<unknown>"
+# endif
+#endif
+@end smallexample
+
+In C, @code{__PRETTY_FUNCTION__} is yet another name for
+@code{__func__}.  However, in C++, @code{__PRETTY_FUNCTION__} contains
+the type signature of the function as well as its bare name.  For
+example, this program:

@smallexample
 extern "C" @{
@ -4478,46 +4493,16 @@ gives this output:

@smallexample
 __FUNCTION__ = sub
-__PRETTY_FUNCTION__ = int  a::sub (int)
+__PRETTY_FUNCTION__ = void a::sub(int)
@end smallexample

-The compiler automagically replaces the identifiers with a string
-literal containing the appropriate name.  Thus, they are neither
-preprocessor macros, like @code{__FILE__} and @code{__LINE__}, nor
-variables.  This means that they catenate with other string literals, and
-that they can be used to initialize char arrays.  For example
-
-@smallexample
-char here[] = "Function " __FUNCTION__ " in " __FILE__;
-@end smallexample
-
-On the other hand, @samp{#ifdef __FUNCTION__} does not have any special
-meaning inside a function, since the preprocessor does not do anything
-special with the identifier @code{__FUNCTION__}.
-
-Note that these semantics are deprecated, and that GCC 3.2 will handle
-@code{__FUNCTION__} and @code{__PRETTY_FUNCTION__} the same way as
-@code{__func__}.  @code{__func__} is defined by the ISO standard C99:
-
-@display
-The identifier @code{__func__} is implicitly declared by the translator
-as if, immediately following the opening brace of each function
-definition, the declaration
-
-@smallexample
-static const char __func__[] = "function-name";
-@end smallexample
-
-appeared, where function-name is the name of the lexically-enclosing
-function.  This name is the unadorned name of the function.
-@end display
-
-By this definition, @code{__func__} is a variable, not a string literal.
-In particular, @code{__func__} does not catenate with other string
-literals.
-
-In @code{C++}, @code{__FUNCTION__} and @code{__PRETTY_FUNCTION__} are
-variables, declared in the same way as @code{__func__}.
+These identifiers are not preprocessor macros.  In GCC 3.3 and
+earlier, in C only, @code{__FUNCTION__} and @code{__PRETTY_FUNCTION__}
+were treated as string literals; they could be used to initialize
+@code{char} arrays, and they could be concatenated with other string
+literals.  GCC 3.4 and later treat them as variables, like
+@code{__func__}.  In C++, @code{__FUNCTION__} and
+@code{__PRETTY_FUNCTION__} have always been variables.

@node Return Address
@section Getting the Return or Frame Address of a Function
--- a/gcc/objc/objc-act.c
+++ b/gcc/objc/objc-act.c
@ -1274,18 +1274,18 @@ my_build_string (len, str)
  return fix_string_type (build_string (len, str));
 }

-/* Given a chain of STRING_CST's, build a static instance of
-   NXConstantString which points at the concatenation of those strings.
+/* Build a static instance of NXConstantString which points at the
+   string constant STRING.
   We place the string object in the __string_objects section of the
   __OBJC segment.  The Objective-C runtime will initialize the isa
   pointers of the string objects to point at the NXConstantString
   class object.  */

 tree
-build_objc_string_object (strings)
-     tree strings;
+build_objc_string_object (string)
+     tree string;
 {
-  tree string, initlist, constructor;
+  tree initlist, constructor;
  int length;

  if (lookup_interface (constant_string_id) == NULL_TREE)
@ -1297,22 +1297,6 @@ build_objc_string_object (strings)

  add_class_reference (constant_string_id);

-  if (TREE_CHAIN (strings))
-    {
-      varray_type vstrings;
-      VARRAY_TREE_INIT (vstrings, 32, "strings");
-
-      for (; strings ; strings = TREE_CHAIN (strings))
-	VARRAY_PUSH_TREE (vstrings, strings);
-
-      string = combine_strings (vstrings);
-    }
-  else
-    string = strings;
-
-  string = fix_string_type (string);
-
-  TREE_SET_CODE (string, STRING_CST);
  length = TREE_STRING_LENGTH (string) - 1;

  /* We could not properly create NXConstantString in synth_module_prologue,
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@ -1,3 +1,17 @@
+2003-07-04  Zack Weinberg  <zack@codesourcery.com>
+
+	* gcc.c-torture/execute/wchar_t-1.x: New file; XFAIL wchar_t-1.c
+	everywhere.
+	* gcc.dg/concat.c: Concatenation of string constants with
+	__FUNCTION__ / __PRETTY_FUNCTION__ is now a hard error.
+	* gcc.dg/wtr-strcat-1.c: Loosen dg-warning regexp.
+	* gcc.dg/cpp/escape-2.c: Use wide character constants where
+	necessary to avoid multi-character character constant warning.
+	* gcc.dg/cpp/escape.c: Likewise.
+	* gcc.dg/cpp/ucs.c: Likewise.
+	Remove backslashes from dg-bogus comments, as they confuse Tcl.
+	Fix a typo.
+
 2003-07-04  Kazu Hirata  <kazu@cs.umass.edu>

 	PR c/11428
--- a/gcc/testsuite/gcc.c-torture/execute/wchar_t-1.x
+++ b/gcc/testsuite/gcc.c-torture/execute/wchar_t-1.x
@ -0,0 +1,3 @@
+# Doesn't compile due to use of literal ISO8859.1 characters.  PR 11439.
+set torture_compile_xfail "*-*-*"
+return 0
--- a/gcc/testsuite/gcc.dg/concat.c
+++ b/gcc/testsuite/gcc.dg/concat.c
@ -2,15 +2,15 @@

 /* { dg-do compile } */

-/* Test we output a warning for concatenation of artificial strings.
+/* Test we output an error for concatenation of artificial strings.

   Neil Booth, 10 Dec 2001.  */

 void foo ()
 {
-  char str1[] = __FUNCTION__ ".";	/* { dg-warning "deprecated" } */
-  char str2[] = __PRETTY_FUNCTION__ ".";/* { dg-warning "deprecated" } */
-  char str3[] = "." __FUNCTION__;	/* { dg-warning "deprecated" } */
-  char str4[] = "." __PRETTY_FUNCTION__;/* { dg-warning "deprecated" } */
-  char str5[] = "." ".";	/* No warning.  */
+  char s1[] = __FUNCTION__".";	     /* { dg-error "(parse|syntax|invalid)" } */
+  char s2[] = __PRETTY_FUNCTION__".";/* { dg-error "(parse|syntax|invalid)" } */
+  char s3[] = "."__FUNCTION__;	     /* { dg-error "(parse|syntax|invalid)" } */
+  char s4[] = "."__PRETTY_FUNCTION__;/* { dg-error "(parse|syntax|invalid)" } */
+  char s5[] = "."".";                /* No error.  */
 }
--- a/gcc/testsuite/gcc.dg/cpp/escape-2.c
+++ b/gcc/testsuite/gcc.dg/cpp/escape-2.c
@ -10,11 +10,11 @@

 #if '\e'		/* { dg-warning "non-ISO" "non-ISO \\e" } */
 #endif
-#if '\u00a0'		/* { dg-bogus "unknown" "\\u is known in C99" } */
+#if L'\u00a0'		/* { dg-bogus "unknown" "\\u is known in C99" } */
 #endif

 void foo ()
 {
  int c = '\E';		/* { dg-warning "non-ISO" "non-ISO \\E" } */
-  c = '\u00a0';		/* { dg-bogus "unknown" "\\u is known in C99" } */
+  c = L'\u00a0';	/* { dg-bogus "unknown" "\\u is known in C99" } */
 }
--- a/gcc/testsuite/gcc.dg/cpp/escape.c
+++ b/gcc/testsuite/gcc.dg/cpp/escape.c
@ -13,7 +13,7 @@
 #if '\x1a' != 26	/* { dg-warning "traditional" "traditional hex" } */
 #error bad hex		/* { dg-bogus "bad" "bad hexadecimal evaluation" } */
 #endif
-#if '\u'		/* { dg-warning "unknown" "\u is unknown in C89" } */
+#if L'\u00a1'		/* { dg-warning "only valid" "\u is unknown in C89" } */
 #endif

 void foo ()
@ -21,5 +21,5 @@ void foo ()
  int c = '\a';		/* { dg-warning "traditional" "traditional bell" } */

  c = '\xa1';		/* { dg-warning "traditional" "traditional hex" } */
-  c = '\u';		/* { dg-warning "unknown" "\u is unknown in C89" } */
+  c = L'\u00a1';	/* { dg-warning "only valid" "\u is unknown in C89" } */
 }
--- a/gcc/testsuite/gcc.dg/cpp/ucs.c
+++ b/gcc/testsuite/gcc.dg/cpp/ucs.c
@ -35,12 +35,12 @@
 #undef long

 #if L'\u1234' != 0x1234
-#error bad short ucs	/* { dg-bogus "bad" "bad \u1234 evaluation" } */
+#error bad short ucs	/* { dg-bogus "bad" "bad u1234 evaluation" } */
 #endif

 #if WCHAR_MAX >= 0x7ffffff
 # if L'\U1234abcd' != 0x1234abcd
-#  error bad long ucs	/* { dg-bogus "bad" "bad \U1234abcd evaluation" } */
+#  error bad long ucs	/* { dg-bogus "bad" "bad U1234abcd evaluation" } */
 # endif
 #endif

@ -48,7 +48,7 @@ void foo ()
 {
  int c;

-  c = L'\ubad';		/* { dg-error "incomplete" "incompete UCN 1" } */
+  c = L'\ubad';		/* { dg-error "incomplete" "incomplete UCN 1" } */
  c = L"\U1234"[0];	/* { dg-error "incomplete" "incompete UCN 2" } */

  c = L'\u000x';	/* { dg-error "incomplete" "non-hex digit in UCN" } */
@ -58,7 +58,7 @@ void foo ()

  c = '\u0024';		/* { dg-bogus "invalid" "0024 is a valid UCN" } */
  c = "\u0040"[0];	/* { dg-bogus "invalid" "0040 is a valid UCN" } */
-  c = '\u00a0';		/* { dg-bogus "invalid" "00a0 is a valid UCN" } */
+  c = L'\u00a0';	/* { dg-bogus "invalid" "00a0 is a valid UCN" } */
  c = '\U00000060';	/* { dg-bogus "invalid" "0060 is a valid UCN" } */

  c = '\u0025';		/* { dg-error "not a valid" "0025 invalid UCN" } */
--- a/gcc/testsuite/gcc.dg/wtr-strcat-1.c
+++ b/gcc/testsuite/gcc.dg/wtr-strcat-1.c
@ -9,7 +9,7 @@ testfunc ()
 {
  const char *foo;
  
-  foo = "hello" "hello"; /* { dg-warning "string concatenation" "string concatenation" } */
+  foo = "hello" "hello"; /* { dg-warning "concatenation" "string concatenation" } */

 # 15 "sys-header.h" 3
 /* We are in system headers now, no -Wtraditional warnings should issue.  */
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@ -1,3 +1,16 @@
+2003-07-04  Zack Weinberg  <zack@codesourcery.com>
+
+	* testsuite/22_locale/collate/compare/wchar_t/2.cc
+	* testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc
+	* testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc
+	* testsuite/22_locale/collate/hash/wchar_t/2.cc
+	* testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc
+	* testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc
+	* testsuite/22_locale/collate/transform/wchar_t/2.cc
+	* testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc
+	* testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc:
+	XFAIL on all targets.
+
 2003-07-04  Benjamin Kosnik  <bkoz@redhat.com>

 	* acinclude.m4 (GLIBCPP_ENABLE_PCH): Fix missed variable.
--- a/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/2.cc
@ -18,6 +18,10 @@
 // Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307,
 // USA.

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 // 22.2.4.1.1 collate members

 #include <locale>
--- a/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_1
--- a/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_1
--- a/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/2.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <locale>
 #include <testsuite_hooks.h>

--- a/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_1
--- a/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_1
--- a/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/2.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <locale>
 #include <testsuite_hooks.h>

--- a/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_2
--- a/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc
+++ b/libstdc++-v3/testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc
@ -20,6 +20,10 @@

 // 22.2.4.1.1 collate members

+// Doesn't work due to use of literal ISO8859.1 characters.  PR 11439
+// { dg-do compile { xfail *-*-* } } should be run
+// { dg-excess-errors "" }
+
 #include <testsuite_hooks.h>

 #define main discard_main_2