cpplib.h (CPP_AT_NAME, [...]): New token types.
* cpplib.h (CPP_AT_NAME, CPP_OBJC_STRING): New token types.
(struct cpp_options): Add narrow_charset, wide_charset,
  bytes_big_endian fields. Remove EBCDIC field.
(cpp_init_iconv, cpp_interpret_string): New external interfaces.
* cpphash.h: Include <iconv.h> if we have it, otherwise provide a
  dummy definition of iconv_t.
(struct cpp_reader): Add narrow_cset_desc and wide_cset_desc fields.
(_cpp_valid_ucn): Update prototype.
(_cpp_destroy_iconv): New prototype.

* doc/cpp.texi: Document character set handling.
* doc/cppopts.texi: Document -fexec-charset= and -fexec-wide-charset=.
* doc/extend.texi: Delete entire section on multiline strings.
  Rewrite section on __FUNCTION__ etc now that these are variables in C.

* cppucnid.tab, cppucnid.pl: New files.
* cppucnid.h: New generated file.
* cppcharset.c: Include cppucnid.h. Lots of commentary added.
(iconv_open, iconv, iconv_close): Provide dummy definitions
  if !HAVE_ICONV.
(SOURCE_CHARSET, struct strbuf, init_iconv_desc, cpp_init_iconv,
  _cpp_destroy_iconv, convert_cset, width_to_mask, convert_ucn,
  emit_numeric_escape, convert_hex, convert_oct, convert_escape,
  cpp_interpret_string, narrow_str_to_charconst,
  wide_str_to_charconst): New.
(ucn_valid_in_identifier): Use a binary search through the ucnranges
  table defined in cppucnid.h, not a long chain of if statements.
(_cpp_valid_ucn): Add a limit pointer. Downgrade "universal character
  names are only valid in C++ and C99" to a warning. Issue the
  "meaning of \[uU] is different in traditional C" warning here.
  Take care not to let iconv see an invalid UCS value if we get a
  malformed UCN. Issue an error if we don't have iconv.
(cpp_interpret_charconst): Moved here from cpplex.c. Use
  cpp_interpret_string to do the heavy lifting.

* cppinit.c (cpp_create_reader): Initialize bytes_big_endian,
  narrow_charset, wide_charset fields of options structure.
(cpp_destroy): Call _cpp_destroy_iconv.
* cpplex.c (forms_identifier_p): Adjust call to _cpp_valid_ucn.
(maybe_read_ucn, hex_digit_value, cpp_parse_escape): Delete.
(cpp_interpret_charconst): Moved to cppcharset.c.
* cpplib.c (dequote_string): Delete.
(interpret_string_notranslate): New.
(do_line, do_linemarker): Use interpret_string_notranslate.

* Makefile.in (cppcharset.o): Depend on cppucnid.h.

* c-common.c (fname_string, combine_strings): Delete.
* c-common.h (fname_string, combine_strings): Delete prototypes.
* c-lex.c (ignore_escape_flag): Delete.
(cb_ident): Use cpp_interpret_string, not lex_string.
(get_nonpadding_token): New function.
(c_lex): Handle Objective-C @-prefixed identifiers and strings here.
  Adjust calls to lex_string. Don't write *value twice.
(lex_string): Now handles string constant concatenation.
  Most of the work handed off to cpp_interpret_string.
  Call fix_string_type here.
* c-parse.in (STRING_FUNC_NAME, VAR_FUNC_NAME): Replace with
  FUNC_NAME, throughout.
(OBJC_STRING): New token type.
(primary:STRING): No need to call fix_string_type here.
(primary:objc_string): Make that OBJC_STRING.
(objc_string nonterminal): Delete.
(yylexname): Delete code to handle fake string constants.
(yylexstring): Delete entirely.
(_yylex): Handle CPP_AT_NAME and CPP_OBJC_STRING. No need to handle
  CPP_ATSIGN.

* c.opt (-fexec-charset=, -fwide-exec-charset=): New options.
* c-opts.c (missing_arg, c_common_handle_option): Handle
  OPT_fexec_charset_ and OPT_fwide_exec_charset_.
(c_common_init): Set cpp_opts->bytes_big_endian, not
  cpp_opts->EBCDIC. Call cpp_init_iconv.
(print_help): Document -fexec-charset= and -fexec-wide-charset=.
(TARGET_EBCDIC): Delete default definition.

* objc/objc-act.c (build_objc_string_object): No need to handle
  string constant concatenation.

cp:
* parser.c (cp_lexer_read_token): No need to handle string constant
  concatenation.

testsuite:
* gcc.c-torture/execute/wchar_t-1.x: New file; XFAIL wchar_t-1.c
  everywhere.
* gcc.dg/concat.c: Concatenation of string constants with
  __FUNCTION__ / __PRETTY_FUNCTION__ is now a hard error.
* gcc.dg/wtr-strcat-1.c: Loosen dg-warning regexp.
* gcc.dg/cpp/escape-2.c: Use wide character constants where necessary
  to avoid multi-character character constant warning.
* gcc.dg/cpp/escape.c: Likewise.
* gcc.dg/cpp/ucs.c: Likewise. Remove backslashes from dg-bogus
  comments, as they confuse Tcl. Fix a typo.

libstdc++-v3:
* testsuite/22_locale/collate/compare/wchar_t/2.cc
* testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc
* testsuite/22_locale/collate/hash/wchar_t/2.cc
* testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc
* testsuite/22_locale/collate/transform/wchar_t/2.cc
* testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc:
  XFAIL on all targets.

From-SVN: r68952
parent
61aeb06fe5
commit
e6cc3a24c2
40 changed files with 2208 additions and 1441 deletions
|
@ -1,3 +1,88 @@
|
|||
2003-07-04 Zack Weinberg <zack@codesourcery.com>
|
||||
|
||||
* cpplib.h (CPP_AT_NAME, CPP_OBJC_STRING): New token types.
|
||||
(struct cpp_options): Add narrow_charset, wide_charset,
|
||||
bytes_big_endian fields. Remove EBCDIC field.
|
||||
(cpp_init_iconv, cpp_interpret_string): New external interfaces.
|
||||
|
||||
* cpphash.h: Include <iconv.h> if we have it, otherwise
|
||||
provide a dummy definition of iconv_t.
|
||||
(struct cpp_reader): Add narrow_cset_desc and wide_cset_desc fields.
|
||||
(_cpp_valid_ucn): Update prototype.
|
||||
(_cpp_destroy_iconv): New prototype.
|
||||
|
||||
* doc/cpp.texi: Document character set handling.
|
||||
* doc/cppopts.texi: Document -fexec-charset= and -fexec-wide-charset=.
|
||||
* doc/extend.texi: Delete entire section on multiline strings.
|
||||
Rewrite section on __FUNCTION__ etc now that these are
|
||||
variables in C.
|
||||
|
||||
* cppucnid.tab, cppucnid.pl: New files.
|
||||
* cppucnid.h: New generated file.
|
||||
* cppcharset.c: Include cppucnid.h. Lots of commentary added.
|
||||
(iconv_open, iconv, iconv_close): Provide dummy definitions
|
||||
if !HAVE_ICONV.
|
||||
(SOURCE_CHARSET, struct strbuf, init_iconv_desc, cpp_init_iconv,
|
||||
_cpp_destroy_iconv, convert_cset, width_to_mask, convert_ucn,
|
||||
emit_numeric_escape, convert_hex, convert_oct, convert_escape,
|
||||
cpp_interpret_string, narrow_str_to_charconst,
|
||||
wide_str_to_charconst): New.
|
||||
(ucn_valid_in_identifier): Use a binary search through the
|
||||
ucnranges table defined in cppucnid.h, not a long chain of if
|
||||
statements.
|
||||
(_cpp_valid_ucn): Add a limit pointer. Downgrade "universal
|
||||
character names are only valid in C++ and C99" to a warning.
|
||||
Issue the "meaning of \[uU] is different in traditional C"
|
||||
warning here. Take care not to let iconv see an invalid UCS
|
||||
value if we get a malformed UCN. Issue an error if we don't
|
||||
have iconv.
|
||||
(cpp_interpret_charconst): Moved here from cpplex.c. Use
|
||||
cpp_interpret_string to do the heavy lifting.
|
||||
|
||||
* cppinit.c (cpp_create_reader): Initialize bytes_big_endian,
|
||||
narrow_charset, wide_charset fields of options structure.
|
||||
(cpp_destroy): Call _cpp_destroy_iconv.
|
||||
* cpplex.c (forms_identifier_p): Adjust call to _cpp_valid_ucn.
|
||||
(maybe_read_ucn, hex_digit_value, cpp_parse_escape): Delete.
|
||||
(cpp_interpret_charconst): Moved to cppcharset.c.
|
||||
* cpplib.c (dequote_string): Delete.
|
||||
(interpret_string_notranslate): New.
|
||||
(do_line, do_linemarker): Use interpret_string_notranslate.
|
||||
|
||||
* Makefile.in (cppcharset.o): Depend on cppucnid.h.
|
||||
|
||||
* c-common.c (fname_string, combine_strings): Delete.
|
||||
* c-common.h (fname_string, combine_strings): Delete prototypes.
|
||||
* c-lex.c (ignore_escape_flag): Delete.
|
||||
(cb_ident): Use cpp_interpret_string, not lex_string.
|
||||
(get_nonpadding_token): New function.
|
||||
(c_lex): Handle Objective-C @-prefixed identifiers and strings here.
|
||||
Adjust calls to lex_string. Don't write *value twice.
|
||||
(lex_string): Now handles string constant concatenation.
|
||||
Most of the work handed off to cpp_interpret_string.
|
||||
Call fix_string_type here.
|
||||
* c-parse.in (STRING_FUNC_NAME, VAR_FUNC_NAME): Replace with
|
||||
FUNC_NAME, throughout.
|
||||
(OBJC_STRING): New token type.
|
||||
(primary:STRING): No need to call fix_string_type here.
|
||||
(primary:objc_string): Make that OBJC_STRING.
|
||||
(objc_string nonterminal): Delete.
|
||||
(yylexname): Delete code to handle fake string constants.
|
||||
(yylexstring): Delete entirely.
|
||||
(_yylex): Handle CPP_AT_NAME and CPP_OBJC_STRING. No need
|
||||
to handle CPP_ATSIGN.
|
||||
|
||||
* c.opt (-fexec-charset=, -fwide-exec-charset=): New options.
|
||||
* c-opts.c (missing_arg, c_common_handle_option): Handle
|
||||
OPT_fexec_charset_ and OPT_fwide_exec_charset_.
|
||||
(c_common_init): Set cpp_opts->bytes_big_endian, not
|
||||
cpp_opts->EBCDIC. Call cpp_init_iconv.
|
||||
(print_help): Document -fexec-charset= and -fexec-wide-charset=.
|
||||
(TARGET_EBCDIC): Delete default definition.
|
||||
|
||||
* objc/objc-act.c (build_objc_string_object): No need to
|
||||
handle string constant concatenation.
|
||||
|
||||
2003-07-04 Kazu Hirata <kazu@cs.umass.edu>
|
||||
|
||||
* doc/install.texi: Fix typos.
|
||||
|
|
|
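Note on the ucn_valid_in_identifier entry above: the ChangeLog says the function now does a binary search through the ucnranges table from cppucnid.h. The generated table is not shown in this part of the diff, so the following is only a minimal sketch of the technique. The struct layout, the name in_ranges, and the sample ranges are hypothetical illustrations, not the contents of cppucnid.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range record.  The real ucnranges entries may carry
   extra information (for example per-language flags).  */
struct ucn_range
{
  unsigned int lo;   /* first code point in the range, inclusive */
  unsigned int hi;   /* last code point in the range, inclusive */
};

/* Illustrative values only.  The table must be sorted by LO and the
   ranges must not overlap, otherwise the search below is invalid.  */
static const struct ucn_range sample_ranges[] = {
  { 0x00AA, 0x00AA },
  { 0x00C0, 0x00D6 },
  { 0x00D8, 0x00F6 },
  { 0x0388, 0x038A },
};

/* Return 1 if C falls inside one of the N sorted ranges in R, else 0.
   Searches the half-open index window [lo, hi).  */
static int
in_ranges (const struct ucn_range *r, size_t n, unsigned int c)
{
  size_t lo = 0, hi = n;

  assert (r != NULL || n == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (c < r[mid].lo)
        hi = mid;            /* C sorts before this range */
      else if (c > r[mid].hi)
        lo = mid + 1;        /* C sorts after this range */
      else
        return 1;            /* lo <= C <= hi: found */
    }
  return 0;
}
```

Compared with a long chain of if statements, the lookup cost grows with the logarithm of the table size, and the table can be regenerated (here presumably by cppucnid.pl from cppucnid.tab) without touching the search code.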
@ -2351,7 +2351,7 @@ libcpp.a: $(LIBCPP_OBJS)
|
|||
$(AR) $(AR_FLAGS) libcpp.a $(LIBCPP_OBJS)
|
||||
-$(RANLIB) libcpp.a
|
||||
|
||||
cppcharset.o: cppcharset.c $(LIBCPP_DEPS)
|
||||
cppcharset.o: cppcharset.c $(LIBCPP_DEPS) cppucnid.h
|
||||
cpperror.o: cpperror.c $(LIBCPP_DEPS)
|
||||
cppexp.o: cppexp.c $(LIBCPP_DEPS)
|
||||
cpplex.o: cpplex.c $(LIBCPP_DEPS)
|
||||
|
|
119
gcc/c-common.c
|
@ -1084,20 +1084,6 @@ fname_as_string (int pretty_p)
|
|||
return name;
|
||||
}
|
||||
|
||||
/* Return the text name of the current function, formatted as
|
||||
required by the supplied RID value. */
|
||||
|
||||
const char *
|
||||
fname_string (unsigned int rid)
|
||||
{
|
||||
unsigned ix;
|
||||
|
||||
for (ix = 0; fname_vars[ix].decl; ix++)
|
||||
if (fname_vars[ix].rid == rid)
|
||||
break;
|
||||
return fname_as_string (fname_vars[ix].pretty);
|
||||
}
|
||||
|
||||
/* Return the VAR_DECL for a const char array naming the current
|
||||
function. If the VAR_DECL has not yet been created, create it
|
||||
now. RID indicates how it should be formatted and IDENTIFIER_NODE
|
||||
|
@ -1190,111 +1176,6 @@ fix_string_type (tree value)
|
|||
TREE_STATIC (value) = 1;
|
||||
return value;
|
||||
}
|
||||
|
||||
/* Given a VARRAY of STRING_CST nodes, concatenate them into one
|
||||
STRING_CST. */
|
||||
|
||||
tree
|
||||
combine_strings (varray_type strings)
|
||||
{
|
||||
const int wchar_bytes = TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT;
|
||||
const int nstrings = VARRAY_ACTIVE_SIZE (strings);
|
||||
tree value, t;
|
||||
int length = 1;
|
||||
int wide_length = 0;
|
||||
int wide_flag = 0;
|
||||
int i;
|
||||
char *p, *q;
|
||||
|
||||
/* Don't include the \0 at the end of each substring. Count wide
|
||||
strings and ordinary strings separately. */
|
||||
for (i = 0; i < nstrings; ++i)
|
||||
{
|
||||
t = VARRAY_TREE (strings, i);
|
||||
|
||||
if (TREE_TYPE (t) == wchar_array_type_node)
|
||||
{
|
||||
wide_length += TREE_STRING_LENGTH (t) - wchar_bytes;
|
||||
wide_flag = 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
length += (TREE_STRING_LENGTH (t) - 1);
|
||||
if (C_ARTIFICIAL_STRING_P (t) && !in_system_header)
|
||||
warning ("concatenation of string literals with __FUNCTION__ is deprecated");
|
||||
}
|
||||
}
|
||||
|
||||
/* If anything is wide, the non-wides will be converted,
|
||||
which makes them take more space. */
|
||||
if (wide_flag)
|
||||
length = length * wchar_bytes + wide_length;
|
||||
|
||||
p = xmalloc (length);
|
||||
|
||||
/* Copy the individual strings into the new combined string.
|
||||
If the combined string is wide, convert the chars to ints
|
||||
for any individual strings that are not wide. */
|
||||
|
||||
q = p;
|
||||
for (i = 0; i < nstrings; ++i)
|
||||
{
|
||||
int len, this_wide;
|
||||
|
||||
t = VARRAY_TREE (strings, i);
|
||||
this_wide = TREE_TYPE (t) == wchar_array_type_node;
|
||||
len = TREE_STRING_LENGTH (t) - (this_wide ? wchar_bytes : 1);
|
||||
if (this_wide == wide_flag)
|
||||
{
|
||||
memcpy (q, TREE_STRING_POINTER (t), len);
|
||||
q += len;
|
||||
}
|
||||
else
|
||||
{
|
||||
const int nzeros = (TYPE_PRECISION (wchar_type_node)
|
||||
/ BITS_PER_UNIT) - 1;
|
||||
int j, k;
|
||||
|
||||
if (BYTES_BIG_ENDIAN)
|
||||
{
|
||||
for (k = 0; k < len; k++)
|
||||
{
|
||||
for (j = 0; j < nzeros; j++)
|
||||
*q++ = 0;
|
||||
*q++ = TREE_STRING_POINTER (t)[k];
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
for (k = 0; k < len; k++)
|
||||
{
|
||||
*q++ = TREE_STRING_POINTER (t)[k];
|
||||
for (j = 0; j < nzeros; j++)
|
||||
*q++ = 0;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* Nul terminate the string. */
|
||||
if (wide_flag)
|
||||
{
|
||||
for (i = 0; i < wchar_bytes; i++)
|
||||
*q++ = 0;
|
||||
}
|
||||
else
|
||||
*q = 0;
|
||||
|
||||
value = build_string (length, p);
|
||||
free (p);
|
||||
|
||||
if (wide_flag)
|
||||
TREE_TYPE (value) = wchar_array_type_node;
|
||||
else
|
||||
TREE_TYPE (value) = char_array_type_node;
|
||||
|
||||
return value;
|
||||
}
|
||||
|
||||
static int is_valid_printf_arglist (tree);
|
||||
static rtx c_expand_builtin (tree, rtx, enum machine_mode,
|
||||
|
|
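Note on the deleted combine_strings above: when any piece of a concatenation was wide, the old code widened each narrow byte by padding it with zero bytes, on the side chosen by BYTES_BIG_ENDIAN. The sketch below restates that inner loop as a standalone function so the byte layout is easier to see. The name widen_bytes and its parameters are hypothetical; it makes the same simplifying assumption as the removed code, namely that each narrow byte maps to the wide unit with the same numeric value, which is exactly the kind of assumption the iconv-based cpp_interpret_string is meant to replace.

```c
#include <assert.h>
#include <stddef.h>

/* Widen LEN narrow bytes from SRC into DST, using UNIT_BYTES bytes
   per wide unit.  DST must have room for LEN * UNIT_BYTES bytes.
   If BIG_ENDIAN is nonzero the value byte goes last in each unit,
   otherwise first.  Returns the number of bytes written.  */
static size_t
widen_bytes (unsigned char *dst, const unsigned char *src, size_t len,
             size_t unit_bytes, int big_endian)
{
  unsigned char *q = dst;
  size_t j, k;

  assert (unit_bytes >= 1);
  for (k = 0; k < len; k++)
    {
      if (big_endian)
        {
          /* Zero padding first, then the value byte.  */
          for (j = 0; j + 1 < unit_bytes; j++)
            *q++ = 0;
          *q++ = src[k];
        }
      else
        {
          /* Value byte first, then the zero padding.  */
          *q++ = src[k];
          for (j = 0; j + 1 < unit_bytes; j++)
            *q++ = 0;
        }
    }
  return (size_t) (q - dst);
}
```

For example, widening "AB" with 4-byte units yields 00 00 00 41 00 00 00 42 on a big-endian target and 41 00 00 00 42 00 00 00 on a little-endian one.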
|
@ -883,7 +883,6 @@ extern void start_fname_decls (void);
|
|||
extern void finish_fname_decls (void);
|
||||
extern const char *fname_as_string (int);
|
||||
extern tree fname_decl (unsigned, tree);
|
||||
extern const char *fname_string (unsigned);
|
||||
|
||||
extern void check_function_arguments (tree, tree);
|
||||
extern void check_function_arguments_recurse (void (*)
|
||||
|
@ -922,7 +921,6 @@ extern void c_expand_end_cond (void);
|
|||
extern tree check_case_value (tree);
|
||||
extern tree fix_string_type (tree);
|
||||
struct varray_head_tag;
|
||||
extern tree combine_strings (struct varray_head_tag *);
|
||||
extern void constant_expression_warning (tree);
|
||||
extern tree convert_and_check (tree, tree);
|
||||
extern void overflow_warning (tree);
|
||||
|
|
209
gcc/c-lex.c
|
@ -61,16 +61,13 @@ static splay_tree file_info_tree;
|
|||
int pending_lang_change; /* If we need to switch languages - C++ only */
|
||||
int c_header_level; /* depth in C headers - C++ only */
|
||||
|
||||
/* Nonzero tells yylex to ignore \ in string constants. */
|
||||
static int ignore_escape_flag;
|
||||
|
||||
static tree interpret_integer (const cpp_token *, unsigned int);
|
||||
static tree interpret_float (const cpp_token *, unsigned int);
|
||||
static enum integer_type_kind
|
||||
narrowest_unsigned_type (tree, unsigned int);
|
||||
static enum integer_type_kind
|
||||
narrowest_signed_type (tree, unsigned int);
|
||||
static tree lex_string (const cpp_string *);
|
||||
static enum cpp_ttype lex_string (const cpp_token *, tree *, bool);
|
||||
static tree lex_charconst (const cpp_token *);
|
||||
static void update_header_times (const char *);
|
||||
static int dump_one_header (splay_tree_node, void *);
|
||||
|
@ -184,8 +181,12 @@ cb_ident (cpp_reader *pfile ATTRIBUTE_UNUSED,
|
|||
if (! flag_no_ident)
|
||||
{
|
||||
/* Convert escapes in the string. */
|
||||
tree value ATTRIBUTE_UNUSED = lex_string (str);
|
||||
ASM_OUTPUT_IDENT (asm_out_file, TREE_STRING_POINTER (value));
|
||||
cpp_string cstr = { 0, 0 };
|
||||
if (cpp_interpret_string (pfile, str, 1, &cstr, false))
|
||||
{
|
||||
ASM_OUTPUT_IDENT (asm_out_file, cstr.text);
|
||||
free ((void *)cstr.text);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
@ -296,12 +297,10 @@ cb_undef (cpp_reader *pfile ATTRIBUTE_UNUSED, unsigned int line,
|
|||
(const char *) NODE_NAME (node));
|
||||
}
|
||||
|
||||
int
|
||||
c_lex (tree *value)
|
||||
static inline const cpp_token *
|
||||
get_nonpadding_token (void)
|
||||
{
|
||||
const cpp_token *tok;
|
||||
|
||||
retry:
|
||||
timevar_push (TV_CPP);
|
||||
do
|
||||
tok = cpp_get_token (parse_in);
|
||||
|
@ -310,10 +309,22 @@ c_lex (tree *value)
|
|||
|
||||
/* The C++ front end does horrible things with the current line
|
||||
number. To ensure an accurate line number, we must reset it
|
||||
every time we return a token. */
|
||||
every time we advance a token. */
|
||||
input_line = src_lineno;
|
||||
|
||||
*value = NULL_TREE;
|
||||
return tok;
|
||||
}
|
||||
|
||||
int
|
||||
c_lex (tree *value)
|
||||
{
|
||||
const cpp_token *tok;
|
||||
location_t atloc;
|
||||
|
||||
retry:
|
||||
tok = get_nonpadding_token ();
|
||||
|
||||
retry_after_at:
|
||||
switch (tok->type)
|
||||
{
|
||||
case CPP_NAME:
|
||||
|
@ -345,6 +356,37 @@ c_lex (tree *value)
|
|||
}
|
||||
break;
|
||||
|
||||
case CPP_ATSIGN:
|
||||
/* An @ may give the next token special significance in Objective-C. */
|
||||
atloc = input_location;
|
||||
tok = get_nonpadding_token ();
|
||||
if (c_dialect_objc ())
|
||||
{
|
||||
tree val;
|
||||
switch (tok->type)
|
||||
{
|
||||
case CPP_NAME:
|
||||
val = HT_IDENT_TO_GCC_IDENT (HT_NODE (tok->val.node));
|
||||
if (C_IS_RESERVED_WORD (val)
|
||||
&& OBJC_IS_AT_KEYWORD (C_RID_CODE (val)))
|
||||
{
|
||||
*value = val;
|
||||
return CPP_AT_NAME;
|
||||
}
|
||||
break;
|
||||
|
||||
case CPP_STRING:
|
||||
case CPP_WSTRING:
|
||||
return lex_string (tok, value, true);
|
||||
|
||||
default: break;
|
||||
}
|
||||
}
|
||||
|
||||
/* ... or not. */
|
||||
error ("%Hstray '@' in program", &atloc);
|
||||
goto retry_after_at;
|
||||
|
||||
case CPP_OTHER:
|
||||
{
|
||||
cppchar_t c = tok->val.str.text[0];
|
||||
|
@ -365,7 +407,7 @@ c_lex (tree *value)
|
|||
|
||||
case CPP_STRING:
|
||||
case CPP_WSTRING:
|
||||
*value = lex_string (&tok->val.str);
|
||||
return lex_string (tok, value, false);
|
||||
break;
|
||||
|
||||
/* These tokens should not be visible outside cpplib. */
|
||||
|
@ -374,7 +416,9 @@ c_lex (tree *value)
|
|||
case CPP_MACRO_ARG:
|
||||
abort ();
|
||||
|
||||
default: break;
|
||||
default:
|
||||
*value = NULL_TREE;
|
||||
break;
|
||||
}
|
||||
|
||||
return tok->type;
|
||||
|
@ -571,75 +615,100 @@ interpret_float (const cpp_token *token, unsigned int flags)
|
|||
return value;
|
||||
}
|
||||
|
||||
static tree
|
||||
lex_string (const cpp_string *str)
|
||||
/* Convert a series of STRING and/or WSTRING tokens into a tree,
|
||||
performing string constant concatenation. TOK is the first of
|
||||
these. VALP is the location to write the string into. OBJC_STRING
|
||||
indicates whether an '@' token preceded the incoming token.
|
||||
Returns the CPP token type of the result (CPP_STRING, CPP_WSTRING,
|
||||
or CPP_OBJC_STRING).
|
||||
|
||||
This is unfortunately more work than it should be. If any of the
|
||||
strings in the series has an L prefix, the result is a wide string
|
||||
(6.4.5p4). Whether or not the result is a wide string affects the
|
||||
meaning of octal and hexadecimal escapes (6.4.4.4p6,9). But escape
|
||||
sequences do not continue across the boundary between two strings in
|
||||
a series (6.4.5p7), so we must not lose the boundaries. Therefore
|
||||
cpp_interpret_string takes a vector of cpp_string structures, which
|
||||
we must arrange to provide. */
|
||||
|
||||
static enum cpp_ttype
|
||||
lex_string (const cpp_token *tok, tree *valp, bool objc_string)
|
||||
{
|
||||
bool wide;
|
||||
tree value;
|
||||
char *buf, *q;
|
||||
cppchar_t c;
|
||||
const unsigned char *p, *limit;
|
||||
bool wide = false;
|
||||
size_t count = 1;
|
||||
struct obstack str_ob;
|
||||
cpp_string istr;
|
||||
|
||||
wide = str->text[0] == 'L';
|
||||
p = str->text + 1 + wide;
|
||||
limit = str->text + str->len - 1;
|
||||
q = buf = alloca ((str->len + 1) * (wide ? WCHAR_BYTES : 1));
|
||||
/* Try to avoid the overhead of creating and destroying an obstack
|
||||
for the common case of just one string. */
|
||||
cpp_string str = tok->val.str;
|
||||
cpp_string *strs = &str;
|
||||
|
||||
while (p < limit)
|
||||
if (tok->type == CPP_WSTRING)
|
||||
wide = true;
|
||||
|
||||
tok = get_nonpadding_token ();
|
||||
if (c_dialect_objc () && tok->type == CPP_ATSIGN)
|
||||
{
|
||||
c = *p++;
|
||||
objc_string = true;
|
||||
tok = get_nonpadding_token ();
|
||||
}
|
||||
if (tok->type == CPP_STRING || tok->type == CPP_WSTRING)
|
||||
{
|
||||
gcc_obstack_init (&str_ob);
|
||||
obstack_grow (&str_ob, &str, sizeof (cpp_string));
|
||||
|
||||
if (c == '\\' && !ignore_escape_flag)
|
||||
c = cpp_parse_escape (parse_in, &p, limit, wide);
|
||||
do
|
||||
{
|
||||
count++;
|
||||
if (tok->type == CPP_WSTRING)
|
||||
wide = true;
|
||||
obstack_grow (&str_ob, &tok->val.str, sizeof (cpp_string));
|
||||
|
||||
/* Add this single character into the buffer either as a wchar_t,
|
||||
a multibyte sequence, or as a single byte. */
|
||||
tok = get_nonpadding_token ();
|
||||
if (c_dialect_objc () && tok->type == CPP_ATSIGN)
|
||||
{
|
||||
objc_string = true;
|
||||
tok = get_nonpadding_token ();
|
||||
}
|
||||
}
|
||||
while (tok->type == CPP_STRING || tok->type == CPP_WSTRING);
|
||||
strs = obstack_finish (&str_ob);
|
||||
}
|
||||
|
||||
/* We have read one more token than we want. */
|
||||
_cpp_backup_tokens (parse_in, 1);
|
||||
|
||||
if (count > 1 && !objc_string && warn_traditional && !in_system_header)
|
||||
warning ("traditional C rejects string constant concatenation");
|
||||
|
||||
if (cpp_interpret_string (parse_in, strs, count, &istr, wide))
|
||||
{
|
||||
value = build_string (istr.len, (char *)istr.text);
|
||||
free ((void *)istr.text);
|
||||
}
|
||||
else
|
||||
{
|
||||
/* Callers cannot generally handle error_mark_node in this context,
|
||||
so return the empty string instead. cpp_interpret_string has
|
||||
issued an error. */
|
||||
if (wide)
|
||||
{
|
||||
unsigned charwidth = TYPE_PRECISION (char_type_node);
|
||||
unsigned bytemask = (1 << charwidth) - 1;
|
||||
int byte;
|
||||
|
||||
for (byte = 0; byte < WCHAR_BYTES; ++byte)
|
||||
{
|
||||
int n;
|
||||
if (byte >= (int) sizeof (c))
|
||||
n = 0;
|
||||
value = build_string (TYPE_PRECISION (wchar_type_node)
|
||||
/ TYPE_PRECISION (char_type_node),
|
||||
"\0\0\0"); /* widest supported wchar_t
|
||||
is 32 bits */
|
||||
else
|
||||
n = (c >> (byte * charwidth)) & bytemask;
|
||||
if (BYTES_BIG_ENDIAN)
|
||||
q[WCHAR_BYTES - byte - 1] = n;
|
||||
else
|
||||
q[byte] = n;
|
||||
}
|
||||
q += WCHAR_BYTES;
|
||||
}
|
||||
else
|
||||
{
|
||||
*q++ = c;
|
||||
}
|
||||
value = build_string (1, "");
|
||||
}
|
||||
|
||||
/* Terminate the string value, either with a single byte zero
|
||||
or with a wide zero. */
|
||||
TREE_TYPE (value) = wide ? wchar_array_type_node : char_array_type_node;
|
||||
*valp = fix_string_type (value);
|
||||
|
||||
if (wide)
|
||||
{
|
||||
memset (q, 0, WCHAR_BYTES);
|
||||
q += WCHAR_BYTES;
|
||||
}
|
||||
else
|
||||
{
|
||||
*q++ = '\0';
|
||||
}
|
||||
if (strs != &str)
|
||||
obstack_free (&str_ob, 0);
|
||||
|
||||
value = build_string (q - buf, buf);
|
||||
|
||||
if (wide)
|
||||
TREE_TYPE (value) = wchar_array_type_node;
|
||||
else
|
||||
TREE_TYPE (value) = char_array_type_node;
|
||||
return value;
|
||||
return objc_string ? CPP_OBJC_STRING : wide ? CPP_WSTRING : CPP_STRING;
|
||||
}
|
||||
|
||||
/* Converts a (possibly wide) character constant token into a tree. */
|
||||
|
|
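Note on the new lex_string comment above: it gives two rules from the C standard that force the lexer to keep the boundaries between adjacent literals. One L-prefixed literal makes the whole concatenation wide, and an escape sequence never continues across the boundary between two literals. The following small C file is a sketch of what those rules mean for user code. The variable and function names are made up for illustration, and the mixed narrow/wide case assumes a C99-or-later compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One literal: the hex escape consumes both digits, so this is the
   single character 0x41.  */
static const char joined[] = "\x41";

/* Two adjacent literals: the escape stops at the literal boundary,
   so this is the character 0x04 followed by the character '1'.  */
static const char split[] = "\x4" "1";

/* A wide literal next to a narrow one: the result is a wide string
   holding 'a', 'b' and the terminating null.  */
static const wchar_t mixed[] = L"a" "b";

/* Check the three properties described above.  */
static void
check_boundaries (void)
{
  assert (strlen (joined) == 1 && joined[0] == 0x41);
  assert (strlen (split) == 2 && split[0] == 0x04 && split[1] == '1');
  assert (sizeof mixed / sizeof mixed[0] == 3);
  assert (mixed[0] == L'a' && mixed[1] == L'b' && mixed[2] == 0);
}
```

If the pieces were glued together before escapes were interpreted, split would wrongly come out identical to joined. That is why cpp_interpret_string takes a vector of cpp_string structures instead of one pre-joined buffer.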
26
gcc/c-opts.c
|
@ -46,10 +46,6 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
|
|||
# define TARGET_SYSTEM_ROOT NULL
|
||||
#endif
|
||||
|
||||
#ifndef TARGET_EBCDIC
|
||||
# define TARGET_EBCDIC 0
|
||||
#endif
|
||||
|
||||
static int saved_lineno;
|
||||
|
||||
/* CPP's options. */
|
||||
|
@ -143,6 +139,8 @@ missing_arg (enum opt_code code)
|
|||
case OPT_fdump_:
|
||||
case OPT_fname_mangling_version_:
|
||||
case OPT_ftabstop_:
|
||||
case OPT_fexec_charset_:
|
||||
case OPT_fwide_exec_charset_:
|
||||
case OPT_ftemplate_depth_:
|
||||
case OPT_iprefix:
|
||||
case OPT_iwithprefix:
|
||||
|
@ -892,6 +890,14 @@ c_common_handle_option (size_t scode, const char *arg, int value)
|
|||
cpp_opts->tabstop = value;
|
||||
break;
|
||||
|
||||
case OPT_fexec_charset_:
|
||||
cpp_opts->narrow_charset = arg;
|
||||
break;
|
||||
|
||||
case OPT_fwide_exec_charset_:
|
||||
cpp_opts->wide_charset = arg;
|
||||
break;
|
||||
|
||||
case OPT_ftemplate_depth_:
|
||||
max_tinst_depth = value;
|
||||
break;
|
||||
|
@ -1145,7 +1151,11 @@ c_common_init (void)
|
|||
cpp_opts->int_precision = TYPE_PRECISION (integer_type_node);
|
||||
cpp_opts->wchar_precision = TYPE_PRECISION (wchar_type_node);
|
||||
cpp_opts->unsigned_wchar = TREE_UNSIGNED (wchar_type_node);
|
||||
cpp_opts->EBCDIC = TARGET_EBCDIC;
|
||||
cpp_opts->bytes_big_endian = BYTES_BIG_ENDIAN;
|
||||
|
||||
/* This can't happen until after wchar_precision and bytes_big_endian
|
||||
are known. */
|
||||
cpp_init_iconv (parse_in);
|
||||
|
||||
if (flag_preprocess_only)
|
||||
{
|
||||
|
@ -1571,6 +1581,12 @@ Switches:\n\
|
|||
fputs (_("\
|
||||
-f[no-]preprocessed Treat the input file as already preprocessed\n\
|
||||
-ftabstop=<number> Distance between tab stops for column reporting\n\
|
||||
-ftarget-charset=<c> Convert all strings and character constants\n\
|
||||
to character set <c>\n\
|
||||
-ftarget-wide-charset=<c> Convert all wide strings and character constants\n\
|
||||
to character set <c>\n\
|
||||
"), stdout);
|
||||
fputs (_("\
|
||||
-isysroot <dir> Set <dir> to be the system root directory\n\
|
||||
-P Do not generate #line directives\n\
|
||||
-remap Remap file names when including files\n\
|
||||
|
|
136
gcc/c-parse.in
|
@ -151,9 +151,7 @@ do { \
|
|||
%token ATTRIBUTE EXTENSION LABEL
|
||||
%token REALPART IMAGPART VA_ARG CHOOSE_EXPR TYPES_COMPATIBLE_P
|
||||
%token PTR_VALUE PTR_BASE PTR_EXTENT
|
||||
|
||||
/* function name can be a string const or a var decl. */
|
||||
%token STRING_FUNC_NAME VAR_FUNC_NAME
|
||||
%token FUNC_NAME
|
||||
|
||||
/* Add precedence rules to solve dangling else s/r conflict */
|
||||
%nonassoc IF
|
||||
|
@ -183,6 +181,7 @@ do { \
|
|||
Objective C, so that the token codes are the same in both. */
|
||||
%token INTERFACE IMPLEMENTATION END SELECTOR DEFS ENCODE
|
||||
%token CLASSNAME PUBLIC PRIVATE PROTECTED PROTOCOL OBJECTNAME CLASS ALIAS
|
||||
%token OBJC_STRING
|
||||
|
||||
%type <code> unop
|
||||
%type <ttype> ENUM STRUCT UNION IF ELSE WHILE DO FOR SWITCH CASE DEFAULT
|
||||
|
@ -249,9 +248,9 @@ ifobjc
|
|||
%type <ttype> keywordexpr keywordarglist keywordarg
|
||||
%type <ttype> myparms myparm optparmlist reservedwords objcselectorexpr
|
||||
%type <ttype> selectorarg keywordnamelist keywordname objcencodeexpr
|
||||
%type <ttype> objc_string non_empty_protocolrefs protocolrefs identifier_list objcprotocolexpr
|
||||
%type <ttype> non_empty_protocolrefs protocolrefs identifier_list objcprotocolexpr
|
||||
|
||||
%type <ttype> CLASSNAME OBJECTNAME
|
||||
%type <ttype> CLASSNAME OBJECTNAME OBJC_STRING
|
||||
end ifobjc
|
||||
|
||||
%{
|
||||
|
@ -340,7 +339,6 @@ static bool parsing_iso_function_signature;
|
|||
static void yyprint PARAMS ((FILE *, int, YYSTYPE));
|
||||
static void yyerror PARAMS ((const char *));
|
||||
static int yylexname PARAMS ((void));
|
||||
static int yylexstring PARAMS ((void));
|
||||
static inline int _yylex PARAMS ((void));
|
||||
static int yylex PARAMS ((void));
|
||||
static void init_reswords PARAMS ((void));
|
||||
|
@ -657,8 +655,7 @@ primary:
|
|||
}
|
||||
| CONSTANT
|
||||
| STRING
|
||||
{ $$ = fix_string_type ($$); }
|
||||
| VAR_FUNC_NAME
|
||||
| FUNC_NAME
|
||||
{ $$ = fname_decl (C_RID_CODE ($$), $$); }
|
||||
| '(' typename ')' '{'
|
||||
{ start_init (NULL_TREE, NULL, 0);
|
||||
|
@ -763,22 +760,11 @@ ifobjc
|
|||
{ $$ = build_protocol_expr ($1); }
|
||||
| objcencodeexpr
|
||||
{ $$ = build_encode_expr ($1); }
|
||||
| objc_string
|
||||
| OBJC_STRING
|
||||
{ $$ = build_objc_string_object ($1); }
|
||||
end ifobjc
|
||||
;
|
||||
|
||||
ifobjc
|
||||
/* Produces an STRING_CST with perhaps more STRING_CSTs chained
|
||||
onto it, which is to be read as an ObjC string object. */
|
||||
objc_string:
|
||||
'@' STRING
|
||||
{ $$ = $2; }
|
||||
| objc_string '@' STRING
|
||||
{ $$ = chainon ($1, $3); }
|
||||
;
|
||||
end ifobjc
|
||||
|
||||
old_style_parm_decls:
|
||||
old_style_parm_decls_1
|
||||
{
|
||||
|
@ -3494,9 +3480,9 @@ static const short rid_to_yy[RID_MAX] =
|
|||
/* RID_CHOOSE_EXPR */ CHOOSE_EXPR,
|
||||
/* RID_TYPES_COMPATIBLE_P */ TYPES_COMPATIBLE_P,
|
||||
|
||||
/* RID_FUNCTION_NAME */ STRING_FUNC_NAME,
|
||||
/* RID_PRETTY_FUNCTION_NAME */ STRING_FUNC_NAME,
|
||||
/* RID_C99_FUNCTION_NAME */ VAR_FUNC_NAME,
|
||||
/* RID_FUNCTION_NAME */ FUNC_NAME,
|
||||
/* RID_PRETTY_FUNCTION_NAME */ FUNC_NAME,
|
||||
/* RID_C99_FUNCTION_NAME */ FUNC_NAME,
|
||||
|
||||
/* C++ */
|
||||
/* RID_BOOL */ TYPESPEC,
|
||||
|
@ -3627,22 +3613,9 @@ ifobjc
|
|||
&& (!OBJC_IS_PQ_KEYWORD (rid_code) || objc_pq_context))
|
||||
end ifobjc
|
||||
{
|
||||
int yycode = rid_to_yy[(int) rid_code];
|
||||
if (yycode == STRING_FUNC_NAME)
|
||||
{
|
||||
/* __FUNCTION__ and __PRETTY_FUNCTION__ get converted
|
||||
to string constants. */
|
||||
const char *name = fname_string (rid_code);
|
||||
|
||||
yylval.ttype = build_string (strlen (name) + 1, name);
|
||||
C_ARTIFICIAL_STRING_P (yylval.ttype) = 1;
|
||||
last_token = CPP_STRING; /* so yyerror won't choke */
|
||||
return STRING;
|
||||
}
|
||||
|
||||
/* Return the canonical spelling for this keyword. */
|
||||
yylval.ttype = ridpointers[(int) rid_code];
|
||||
return yycode;
|
||||
return rid_to_yy[(int) rid_code];
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -3671,57 +3644,6 @@ end ifobjc
|
|||
return IDENTIFIER;
|
||||
}
|
||||
|
||||
/* Concatenate strings before returning them to the parser. This isn't quite
|
||||
as good as having it done in the lexer, but it's better than nothing. */
|
||||
|
||||
static int
|
||||
yylexstring ()
|
||||
{
|
||||
enum cpp_ttype next_type;
|
||||
tree orig = yylval.ttype;
|
||||
|
||||
next_type = c_lex (&yylval.ttype);
|
||||
if (next_type == CPP_STRING
|
||||
|| next_type == CPP_WSTRING
|
||||
|| (next_type == CPP_NAME && yylexname () == STRING))
|
||||
{
|
||||
varray_type strings;
|
||||
|
||||
ifc
|
||||
static location_t last_location;
|
||||
if (warn_traditional && !in_system_header
|
||||
&& (input_location.line != last_location.line
|
||||
|| !last_location.file ||
|
||||
strcmp (last_location.file, input_location.file)))
|
||||
{
|
||||
warning ("traditional C rejects string concatenation");
|
||||
last_location = input_location;
|
||||
}
|
||||
end ifc
|
||||
|
||||
VARRAY_TREE_INIT (strings, 32, "strings");
|
||||
VARRAY_PUSH_TREE (strings, orig);
|
||||
|
||||
do
|
||||
{
|
||||
VARRAY_PUSH_TREE (strings, yylval.ttype);
|
||||
next_type = c_lex (&yylval.ttype);
|
||||
}
|
||||
while (next_type == CPP_STRING
|
||||
|| next_type == CPP_WSTRING
|
||||
|| (next_type == CPP_NAME && yylexname () == STRING));
|
||||
|
||||
yylval.ttype = combine_strings (strings);
|
||||
}
|
||||
else
|
||||
yylval.ttype = orig;
|
||||
|
||||
/* We will have always read one token too many. */
|
||||
_cpp_backup_tokens (parse_in, 1);
|
||||
|
||||
return STRING;
|
||||
}
|
||||
|
||||
static inline int
|
||||
_yylex ()
|
||||
{
|
||||
|
@ -3787,13 +3709,11 @@ _yylex ()
|
|||
return 0;
|
||||
|
||||
case CPP_NAME:
|
||||
{
|
||||
int ret = yylexname ();
|
||||
if (ret == STRING)
|
||||
return yylexstring ();
|
||||
else
|
||||
return ret;
|
||||
}
|
||||
return yylexname ();
|
||||
|
||||
case CPP_AT_NAME:
|
||||
/* This only happens in Objective-C; it must be a keyword. */
|
||||
return rid_to_yy [(int) C_RID_CODE (yylval.ttype)];
|
||||
|
||||
case CPP_NUMBER:
|
||||
case CPP_CHAR:
|
||||
|
@ -3802,30 +3722,10 @@ _yylex ()
|
|||
|
||||
case CPP_STRING:
|
||||
case CPP_WSTRING:
|
||||
return yylexstring ();
|
||||
return STRING;
|
||||
|
||||
/* This token is Objective-C specific. It gives the next token
|
||||
special significance. */
|
||||
case CPP_ATSIGN:
|
||||
ifobjc
|
||||
{
|
||||
tree after_at;
|
||||
enum cpp_ttype after_at_type;
|
||||
|
||||
after_at_type = c_lex (&after_at);
|
||||
|
||||
if (after_at_type == CPP_NAME
|
||||
&& C_IS_RESERVED_WORD (after_at)
|
||||
&& OBJC_IS_AT_KEYWORD (C_RID_CODE (after_at)))
|
||||
{
|
||||
yylval.ttype = after_at;
|
||||
last_token = after_at_type;
|
||||
return rid_to_yy [(int) C_RID_CODE (after_at)];
|
||||
}
|
||||
_cpp_backup_tokens (parse_in, 1);
|
||||
return '@';
|
||||
}
|
||||
end ifobjc
|
||||
case CPP_OBJC_STRING:
|
||||
return OBJC_STRING;
|
||||
|
||||
/* These tokens are C++ specific (and will not be generated
|
||||
in C mode, but let's be cautious). */
|
||||
|
|
|
@@ -368,6 +368,9 @@ C++ ObjC++
|
|||
fenum-int-equiv
|
||||
C++ ObjC++
|
||||
|
||||
fexec-charset=
|
||||
C ObjC C++ ObjC++ Joined RejectNegative
|
||||
|
||||
fexternal-templates
|
||||
C++ ObjC++
|
||||
|
||||
|
@@ -509,6 +512,9 @@ C++ ObjC++
|
|||
fweak
|
||||
C++ ObjC++
|
||||
|
||||
fwide-exec-charset=
|
||||
C ObjC C++ ObjC++ Joined RejectNegative
|
||||
|
||||
fxref
|
||||
C++ ObjC++
|
||||
|
||||
|
|
|
@@ -1,3 +1,8 @@
|
|||
2003-07-04 Zack Weinberg <zack@codesourcery.com>
|
||||
|
||||
* parser.c (cp_lexer_read_token): No need to handle string
|
||||
constant concatenation.
|
||||
|
||||
2003-07-03 Kaveh R. Ghazi <ghazi@caip.rutgers.edu>
|
||||
|
||||
* cp-tree.h (GCC_DIAG_STYLE, ATTRIBUTE_GCC_CXXDIAG): Define.
|
||||
|
|
|
@@ -479,54 +479,12 @@ cp_lexer_read_token (cp_lexer* lexer)
|
|||
/* Increment LAST_TOKEN. */
|
||||
lexer->last_token = cp_lexer_next_token (lexer, token);
|
||||
|
||||
/* The preprocessor does not yet do translation phase six, i.e., the
|
||||
combination of adjacent string literals. Therefore, we do it
|
||||
here. */
|
||||
if (token->type == CPP_STRING || token->type == CPP_WSTRING)
|
||||
{
|
||||
ptrdiff_t delta;
|
||||
int i;
|
||||
|
||||
/* When we grow the buffer, we may invalidate TOKEN. So, save
|
||||
the distance from the beginning of the BUFFER so that we can
|
||||
recalculate it. */
|
||||
delta = cp_lexer_token_difference (lexer, lexer->buffer, token);
|
||||
/* Make sure there is room in the buffer for another token. */
|
||||
cp_lexer_maybe_grow_buffer (lexer);
|
||||
/* Restore TOKEN. */
|
||||
token = lexer->buffer;
|
||||
for (i = 0; i < delta; ++i)
|
||||
token = cp_lexer_next_token (lexer, token);
|
||||
|
||||
VARRAY_PUSH_TREE (lexer->string_tokens, token->value);
|
||||
while (true)
|
||||
{
|
||||
/* Read the token after TOKEN. */
|
||||
cp_lexer_get_preprocessor_token (lexer, lexer->last_token);
|
||||
/* See whether it's another string constant. */
|
||||
if (lexer->last_token->type != token->type)
|
||||
{
|
||||
/* If not, then it will be the next real token. */
|
||||
lexer->last_token = cp_lexer_next_token (lexer,
|
||||
lexer->last_token);
|
||||
break;
|
||||
}
|
||||
|
||||
/* Chain the strings together. */
|
||||
VARRAY_PUSH_TREE (lexer->string_tokens,
|
||||
lexer->last_token->value);
|
||||
}
|
||||
|
||||
/* Create a single STRING_CST. Curiously we have to call
|
||||
combine_strings even if there is only a single string in
|
||||
order to get the type set correctly. */
|
||||
token->value = combine_strings (lexer->string_tokens);
|
||||
VARRAY_CLEAR (lexer->string_tokens);
|
||||
token->value = fix_string_type (token->value);
|
||||
/* Strings should have type `const char []'. Right now, we will
|
||||
have an ARRAY_TYPE that is constant rather than an array of
|
||||
constant elements. */
|
||||
if (flag_const_strings)
|
||||
constant elements.
|
||||
FIXME: Make fix_string_type get this right in the first place. */
|
||||
if ((token->type == CPP_STRING || token->type == CPP_WSTRING)
|
||||
&& flag_const_strings)
|
||||
{
|
||||
tree type;
|
||||
|
||||
|
@@ -534,12 +492,10 @@ cp_lexer_read_token (cp_lexer* lexer)
|
|||
type = TREE_TYPE (token->value);
|
||||
/* Use build_cplus_array_type to rebuild the array, thereby
|
||||
getting the right type. */
|
||||
type = build_cplus_array_type (TREE_TYPE (type),
|
||||
TYPE_DOMAIN (type));
|
||||
type = build_cplus_array_type (TREE_TYPE (type), TYPE_DOMAIN (type));
|
||||
/* Reset the type of the token. */
|
||||
TREE_TYPE (token->value) = type;
|
||||
}
|
||||
}
|
||||
|
||||
return token;
|
||||
}
|
||||
|
|
1236
gcc/cppcharset.c
File diff suppressed because it is too large
|
@@ -25,6 +25,13 @@ Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
|
|||
|
||||
#include "hashtable.h"
|
||||
|
||||
#ifdef HAVE_ICONV
#include <iconv.h>
#else
#define HAVE_ICONV 0
typedef int iconv_t; /* dummy */
#endif
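
The fallback above lets the rest of cpplib name `iconv_t` whether or not the host provides iconv. The following is a minimal sketch of the same pattern in isolation. `NO_CONVERSION` and `conversion_supported` are hypothetical names introduced for illustration and are not part of cpplib; the sketch also assumes that `HAVE_ICONV`, when defined by the build, has the value 1.

```c
/* Sketch only: mirrors the HAVE_ICONV fallback pattern from cpphash.h.
   NO_CONVERSION and conversion_supported are hypothetical names.  */
#ifdef HAVE_ICONV
#include <iconv.h>
#else
#define HAVE_ICONV 0
typedef int iconv_t;  /* dummy, so declarations still compile */
#endif

/* The "no conversion" sentinel described for the descriptor fields
   in struct cpp_reader.  */
#define NO_CONVERSION ((iconv_t) -1)

/* Nonzero when a real iconv was available at build time.  */
static int
conversion_supported (void)
{
  return HAVE_ICONV ? 1 : 0;
}
```

Code that only stores or compares descriptors can then be written once, and only the code that actually converts needs to test `HAVE_ICONV`.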
|
||||
|
||||
struct directive; /* Deliberately incomplete. */
|
||||
struct pending_option;
|
||||
struct op;
|
||||
|
@@ -362,6 +369,15 @@ struct cpp_reader
|
|||
unsigned char *macro_buffer;
|
||||
unsigned int macro_buffer_len;
|
||||
|
||||
/* Iconv descriptor for converting from the source character set
   to the execution character set. (iconv_t)-1 for no conversion. */
iconv_t narrow_cset_desc;

/* Iconv descriptor for converting from the execution character set
   to the wide execution character set. (iconv_t)-1 for no conversion
   other than zero-extending each character to the width of wchar_t. */
iconv_t wide_cset_desc;
|
||||
|
||||
/* Tree of other included files. See cppfiles.c. */
|
||||
struct splay_tree_s *all_include_files;
|
||||
|
||||
|
@@ -539,7 +555,8 @@ extern uchar *_cpp_copy_replacement_text (const cpp_macro *, uchar *);
|
|||
extern size_t _cpp_replacement_text_len (const cpp_macro *);
|
||||
|
||||
/* In cppcharset.c. */
|
||||
cppchar_t _cpp_valid_ucn (cpp_reader *, const uchar **, int identifer_p);
|
||||
cppchar_t _cpp_valid_ucn (cpp_reader *, const uchar **, const uchar *, int);
|
||||
void _cpp_destroy_iconv (cpp_reader *);
|
||||
|
||||
/* Utility routines and macros. */
|
||||
#define DSC(str) (const uchar *)str, sizeof str - 1
|
||||
|
|
|
@@ -157,6 +157,11 @@ cpp_create_reader (enum c_lang lang, hash_table *table)
|
|||
CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
|
||||
CPP_OPTION (pfile, unsigned_char) = 0;
|
||||
CPP_OPTION (pfile, unsigned_wchar) = 1;
|
||||
CPP_OPTION (pfile, bytes_big_endian) = 1; /* does not matter */
|
||||
|
||||
/* Default to no charset conversion. */
|
||||
CPP_OPTION (pfile, narrow_charset) = 0;
|
||||
CPP_OPTION (pfile, wide_charset) = 0;
|
||||
|
||||
/* Initialize the line map. Start at logical line 1, so we can use
|
||||
a line number of zero for special states. */
|
||||
|
@@ -227,6 +232,7 @@ cpp_destroy (cpp_reader *pfile)
|
|||
|
||||
_cpp_destroy_hashtable (pfile);
|
||||
_cpp_cleanup_includes (pfile);
|
||||
_cpp_destroy_iconv (pfile);
|
||||
|
||||
_cpp_free_buff (pfile->a_buff);
|
||||
_cpp_free_buff (pfile->u_buff);
|
||||
|
|
289
gcc/cpplex.c
|
@@ -64,10 +64,8 @@ static void create_literal (cpp_reader *, cpp_token *, const uchar *,
|
|||
unsigned int, enum cpp_ttype);
|
||||
static bool warn_in_comment (cpp_reader *, _cpp_line_note *);
|
||||
static int name_p (cpp_reader *, const cpp_string *);
|
||||
static cppchar_t maybe_read_ucn (cpp_reader *, const uchar **);
|
||||
static tokenrun *next_tokenrun (tokenrun *);
|
||||
|
||||
static unsigned int hex_digit_value (unsigned int);
|
||||
static _cpp_buff *new_buff (size_t);
|
||||
|
||||
|
||||
|
@@ -397,7 +395,7 @@ forms_identifier_p (cpp_reader *pfile, int first)
|
|||
&& (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
|
||||
{
|
||||
buffer->cur += 2;
|
||||
if (_cpp_valid_ucn (pfile, &buffer->cur, 1 + !first))
|
||||
if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first))
|
||||
return true;
|
||||
buffer->cur -= 2;
|
||||
}
|
||||
|
@@ -1316,291 +1314,6 @@ cpp_output_line (cpp_reader *pfile, FILE *fp)
|
|||
putc ('\n', fp);
|
||||
}
|
||||
|
||||
/* Returns the value of a hexadecimal digit. */
|
||||
static unsigned int
|
||||
hex_digit_value (unsigned int c)
|
||||
{
|
||||
if (hex_p (c))
|
||||
return hex_value (c);
|
||||
else
|
||||
abort ();
|
||||
}
|
||||
|
||||
/* Read a possible universal character name starting at *PSTR. */
|
||||
static cppchar_t
|
||||
maybe_read_ucn (cpp_reader *pfile, const uchar **pstr)
|
||||
{
|
||||
cppchar_t result, c = (*pstr)[-1];
|
||||
|
||||
result = _cpp_valid_ucn (pfile, pstr, false);
|
||||
if (result)
|
||||
{
|
||||
if (CPP_WTRADITIONAL (pfile))
|
||||
cpp_error (pfile, DL_WARNING,
|
||||
"the meaning of '\\%c' is different in traditional C",
|
||||
(int) c);
|
||||
|
||||
if (CPP_OPTION (pfile, EBCDIC))
|
||||
{
|
||||
cpp_error (pfile, DL_ERROR,
|
||||
"universal character with an EBCDIC target");
|
||||
result = 0x3f; /* EBCDIC invalid character */
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/* Returns the value of an escape sequence, truncated to the correct
|
||||
target precision. PSTR points to the input pointer, which is just
|
||||
after the backslash. LIMIT is how much text we have. WIDE is true
|
||||
if the escape sequence is part of a wide character constant or
|
||||
string literal. Handles all relevant diagnostics. */
|
||||
cppchar_t
|
||||
cpp_parse_escape (cpp_reader *pfile, const unsigned char **pstr,
|
||||
const unsigned char *limit, int wide)
|
||||
{
|
||||
/* Values of \a \b \e \f \n \r \t \v respectively. */
|
||||
static const uchar ascii[] = { 7, 8, 27, 12, 10, 13, 9, 11 };
|
||||
static const uchar ebcdic[] = { 47, 22, 39, 12, 21, 13, 5, 11 };
|
||||
|
||||
int unknown = 0;
|
||||
const unsigned char *str = *pstr, *charconsts;
|
||||
cppchar_t c, ucn, mask;
|
||||
unsigned int width;
|
||||
|
||||
if (CPP_OPTION (pfile, EBCDIC))
|
||||
charconsts = ebcdic;
|
||||
else
|
||||
charconsts = ascii;
|
||||
|
||||
if (wide)
|
||||
width = CPP_OPTION (pfile, wchar_precision);
|
||||
else
|
||||
width = CPP_OPTION (pfile, char_precision);
|
||||
if (width < BITS_PER_CPPCHAR_T)
|
||||
mask = ((cppchar_t) 1 << width) - 1;
|
||||
else
|
||||
mask = ~0;
|
||||
|
||||
c = *str++;
|
||||
switch (c)
|
||||
{
|
||||
case '\\': case '\'': case '"': case '?': break;
|
||||
case 'b': c = charconsts[1]; break;
|
||||
case 'f': c = charconsts[3]; break;
|
||||
case 'n': c = charconsts[4]; break;
|
||||
case 'r': c = charconsts[5]; break;
|
||||
case 't': c = charconsts[6]; break;
|
||||
case 'v': c = charconsts[7]; break;
|
||||
|
||||
case '(': case '{': case '[': case '%':
|
||||
/* '\(', etc, are used at beginning of line to avoid confusing Emacs.
|
||||
'\%' is used to prevent SCCS from getting confused. */
|
||||
unknown = CPP_PEDANTIC (pfile);
|
||||
break;
|
||||
|
||||
case 'a':
|
||||
if (CPP_WTRADITIONAL (pfile))
|
||||
cpp_error (pfile, DL_WARNING,
|
||||
"the meaning of '\\a' is different in traditional C");
|
||||
c = charconsts[0];
|
||||
break;
|
||||
|
||||
case 'e': case 'E':
|
||||
if (CPP_PEDANTIC (pfile))
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"non-ISO-standard escape sequence, '\\%c'", (int) c);
|
||||
c = charconsts[2];
|
||||
break;
|
||||
|
||||
case 'u': case 'U':
|
||||
ucn = maybe_read_ucn (pfile, &str);
|
||||
if (ucn)
|
||||
c = ucn;
|
||||
else
|
||||
unknown = true;
|
||||
break;
|
||||
|
||||
case 'x':
|
||||
if (CPP_WTRADITIONAL (pfile))
|
||||
cpp_error (pfile, DL_WARNING,
|
||||
"the meaning of '\\x' is different in traditional C");
|
||||
|
||||
{
|
||||
cppchar_t i = 0, overflow = 0;
|
||||
int digits_found = 0;
|
||||
|
||||
while (str < limit)
|
||||
{
|
||||
c = *str;
|
||||
if (! ISXDIGIT (c))
|
||||
break;
|
||||
str++;
|
||||
overflow |= i ^ (i << 4 >> 4);
|
||||
i = (i << 4) + hex_digit_value (c);
|
||||
digits_found = 1;
|
||||
}
|
||||
|
||||
if (!digits_found)
|
||||
cpp_error (pfile, DL_ERROR,
|
||||
"\\x used with no following hex digits");
|
||||
|
||||
if (overflow | (i != (i & mask)))
|
||||
{
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"hex escape sequence out of range");
|
||||
i &= mask;
|
||||
}
|
||||
c = i;
|
||||
}
|
||||
break;
|
||||
|
||||
case '0': case '1': case '2': case '3':
|
||||
case '4': case '5': case '6': case '7':
|
||||
{
|
||||
size_t count = 0;
|
||||
cppchar_t i = c - '0';
|
||||
|
||||
while (str < limit && ++count < 3)
|
||||
{
|
||||
c = *str;
|
||||
if (c < '0' || c > '7')
|
||||
break;
|
||||
str++;
|
||||
i = (i << 3) + c - '0';
|
||||
}
|
||||
|
||||
if (i != (i & mask))
|
||||
{
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"octal escape sequence out of range");
|
||||
i &= mask;
|
||||
}
|
||||
c = i;
|
||||
}
|
||||
break;
|
||||
|
||||
default:
|
||||
unknown = 1;
|
||||
break;
|
||||
}
|
||||
|
||||
if (unknown)
|
||||
{
|
||||
if (ISGRAPH (c))
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"unknown escape sequence '\\%c'", (int) c);
|
||||
else
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"unknown escape sequence: '\\%03o'", (int) c);
|
||||
}
|
||||
|
||||
if (c > mask)
|
||||
{
|
||||
cpp_error (pfile, DL_PEDWARN,
|
||||
"escape sequence out of range for its type");
|
||||
c &= mask;
|
||||
}
|
||||
|
||||
*pstr = str;
|
||||
return c;
|
||||
}
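
The removed `cpp_parse_escape` builds a mask from the target character width and uses it, together with a shift-based overflow check, to diagnose out-of-range hex escapes. The following is a minimal sketch of just that arithmetic, assuming a 32-bit stand-in for `cppchar_t`. All `sketch_` names are hypothetical, and the diagnostic is reduced to a flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a 32-bit stand-in for cppchar_t.  */
typedef uint32_t sketch_char_t;
#define SKETCH_CHAR_BITS 32

/* All-ones mask covering the low WIDTH bits.  */
static sketch_char_t
sketch_width_to_mask (size_t width)
{
  if (width < SKETCH_CHAR_BITS)
    return ((sketch_char_t) 1 << width) - 1;
  return ~(sketch_char_t) 0;
}

/* Value of hex digit C, or -1 if C is not a hex digit.  */
static int
sketch_hex_value (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Parse the hex digits in [STR, LIMIT) as the body of a \x escape for
   a character type of WIDTH bits.  Sets *OUT_OF_RANGE when the value
   overflowed the accumulator or does not fit in WIDTH bits, and
   returns the value truncated to WIDTH bits.  */
static sketch_char_t
sketch_parse_hex (const unsigned char *str, const unsigned char *limit,
                  size_t width, int *out_of_range)
{
  sketch_char_t i = 0, overflow = 0;
  sketch_char_t mask = sketch_width_to_mask (width);

  while (str < limit)
    {
      int digit = sketch_hex_value (*str);
      if (digit < 0)
        break;
      str++;
      /* Any bits lost by shifting left four places show up here.  */
      overflow |= i ^ ((sketch_char_t) (i << 4) >> 4);
      i = (sketch_char_t) (i << 4) + (sketch_char_t) digit;
    }

  *out_of_range = overflow != 0 || i != (i & mask);
  return i & mask;
}
```

With an 8-bit width, the digits `1ff` yield the truncated value 0xff with the out-of-range flag set, which corresponds to the "hex escape sequence out of range" pedwarn in the code above.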
|
||||
|
||||
/* Interpret a (possibly wide) character constant in TOKEN.
|
||||
WARN_MULTI warns about multi-character charconsts. PCHARS_SEEN
|
||||
points to a variable that is filled in with the number of
|
||||
characters seen, and UNSIGNEDP to a variable that indicates whether
|
||||
the result has signed type. */
|
||||
cppchar_t
|
||||
cpp_interpret_charconst (cpp_reader *pfile, const cpp_token *token,
|
||||
unsigned int *pchars_seen, int *unsignedp)
|
||||
{
|
||||
const unsigned char *str, *limit;
|
||||
unsigned int chars_seen = 0;
|
||||
size_t width, max_chars;
|
||||
cppchar_t c, mask, result = 0;
|
||||
bool unsigned_p;
|
||||
|
||||
str = token->val.str.text + 1 + (token->type == CPP_WCHAR);
|
||||
limit = token->val.str.text + token->val.str.len - 1;
|
||||
|
||||
if (token->type == CPP_CHAR)
|
||||
{
|
||||
width = CPP_OPTION (pfile, char_precision);
|
||||
max_chars = CPP_OPTION (pfile, int_precision) / width;
|
||||
unsigned_p = CPP_OPTION (pfile, unsigned_char);
|
||||
}
|
||||
else
|
||||
{
|
||||
width = CPP_OPTION (pfile, wchar_precision);
|
||||
max_chars = 1;
|
||||
unsigned_p = CPP_OPTION (pfile, unsigned_wchar);
|
||||
}
|
||||
|
||||
if (width < BITS_PER_CPPCHAR_T)
|
||||
mask = ((cppchar_t) 1 << width) - 1;
|
||||
else
|
||||
mask = ~0;
|
||||
|
||||
while (str < limit)
|
||||
{
|
||||
c = *str++;
|
||||
|
||||
if (c == '\\')
|
||||
c = cpp_parse_escape (pfile, &str, limit, token->type == CPP_WCHAR);
|
||||
|
||||
#ifdef MAP_CHARACTER
|
||||
if (ISPRINT (c))
|
||||
c = MAP_CHARACTER (c);
|
||||
#endif
|
||||
|
||||
chars_seen++;
|
||||
|
||||
/* Truncate the character, scale the result and merge the two. */
|
||||
c &= mask;
|
||||
if (width < BITS_PER_CPPCHAR_T)
|
||||
result = (result << width) | c;
|
||||
else
|
||||
result = c;
|
||||
}
|
||||
|
||||
if (chars_seen == 0)
|
||||
cpp_error (pfile, DL_ERROR, "empty character constant");
|
||||
else if (chars_seen > 1)
|
||||
{
|
||||
/* Multichar charconsts are of type int and therefore signed. */
|
||||
unsigned_p = 0;
|
||||
|
||||
if (chars_seen > max_chars)
|
||||
{
|
||||
chars_seen = max_chars;
|
||||
cpp_error (pfile, DL_WARNING,
|
||||
"character constant too long for its type");
|
||||
}
|
||||
else if (CPP_OPTION (pfile, warn_multichar))
|
||||
cpp_error (pfile, DL_WARNING, "multi-character character constant");
|
||||
}
|
||||
|
||||
/* Sign-extend or truncate the constant to cppchar_t. The value is
|
||||
in WIDTH bits, but for multi-char charconsts its value is the
|
||||
full target type's width. */
|
||||
if (chars_seen > 1)
|
||||
width *= max_chars;
|
||||
if (width < BITS_PER_CPPCHAR_T)
|
||||
{
|
||||
mask = ((cppchar_t) 1 << width) - 1;
|
||||
if (unsigned_p || !(result & (1 << (width - 1))))
|
||||
result &= mask;
|
||||
else
|
||||
result |= ~mask;
|
||||
}
|
||||
|
||||
*pchars_seen = chars_seen;
|
||||
*unsignedp = unsigned_p;
|
||||
return result;
|
||||
}
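
The tail of the old `cpp_interpret_charconst` packs each character into the result and then sign-extends or truncates it. The following is a minimal sketch of that folding step only, assuming a 32-bit accumulator and a width of at least one bit. `sketch_fold_charconst` is a hypothetical name, and escape handling and diagnostics are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS 32  /* assumed width of the accumulator type */

/* Fold N already-decoded characters into one constant.  WIDTH is the
   target character width in bits (assumed >= 1), MAX_CHARS the number
   of characters that fit in the target int, and UNSIGNED_P whether the
   character type is unsigned.  */
static uint32_t
sketch_fold_charconst (const unsigned char *chars, size_t n,
                       unsigned width, unsigned max_chars, int unsigned_p)
{
  uint32_t mask = width < SKETCH_BITS
                  ? ((uint32_t) 1 << width) - 1 : ~(uint32_t) 0;
  uint32_t result = 0;
  size_t k;

  for (k = 0; k < n; k++)
    {
      /* Truncate the character, scale the result and merge the two.  */
      uint32_t c = chars[k] & mask;
      if (width < SKETCH_BITS)
        result = (result << width) | c;
      else
        result = c;
    }

  if (n > 1)
    {
      /* Multichar constants have type int: signed, full int width.  */
      unsigned_p = 0;
      width *= max_chars;
    }

  if (width < SKETCH_BITS)
    {
      mask = ((uint32_t) 1 << width) - 1;
      if (unsigned_p || !(result & ((uint32_t) 1 << (width - 1))))
        result &= mask;
      else
        result |= ~mask;  /* sign-extend */
    }
  return result;
}
```

For 8-bit characters and a 32-bit int, the two characters `a` and `b` fold to 0x6162 on an ASCII host, while a single 0xff character becomes all ones when the character type is signed and stays 0xff when it is unsigned.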
|
||||
|
||||
/* Memory buffers. Changing these three constants can have a dramatic
|
||||
effect on performance. The values here are reasonable defaults,
|
||||
but might be tuned. If you adjust them, be sure to test across a
|
||||
|
|
50
gcc/cpplib.c
|
@@ -106,7 +106,6 @@ static char *glue_header_name (cpp_reader *);
|
|||
static const char *parse_include (cpp_reader *, int *);
|
||||
static void push_conditional (cpp_reader *, int, int, const cpp_hashnode *);
|
||||
static unsigned int read_flag (cpp_reader *, unsigned int);
|
||||
static uchar *dequote_string (cpp_reader *, const uchar *, unsigned int);
|
||||
static int strtoul_for_line (const uchar *, unsigned int, unsigned long *);
|
||||
static void do_diagnostic (cpp_reader *, int, int);
|
||||
static cpp_hashnode *lex_macro_node (cpp_reader *);
|
||||
|
@@ -714,29 +713,6 @@ read_flag (cpp_reader *pfile, unsigned int last)
|
|||
return 0;
|
||||
}
|
||||
|
||||
/* Subroutine of do_line and do_linemarker. Returns a version of STR
|
||||
which has a NUL terminator and all escape sequences converted to
|
||||
their equivalents. Temporary, hopefully. */
|
||||
static uchar *
|
||||
dequote_string (cpp_reader *pfile, const uchar *str, unsigned int len)
|
||||
{
|
||||
uchar *result = _cpp_unaligned_alloc (pfile, len + 1);
|
||||
uchar *dst = result;
|
||||
const uchar *limit = str + len;
|
||||
cppchar_t c;
|
||||
|
||||
while (str < limit)
|
||||
{
|
||||
c = *str++;
|
||||
if (c != '\\')
|
||||
*dst++ = c;
|
||||
else
|
||||
*dst++ = cpp_parse_escape (pfile, &str, limit, 0);
|
||||
}
|
||||
*dst++ = '\0';
|
||||
return result;
|
||||
}
|
||||
|
||||
/* Subroutine of do_line and do_linemarker. Convert a number in STR,
|
||||
of length LEN, to binary; store it in NUMP, and return 0 if the
|
||||
number was well-formed, 1 if not. Temporary, hopefully. */
|
||||
|
@@ -757,6 +733,21 @@ strtoul_for_line (const uchar *str, unsigned int len, long unsigned int *nump)
|
|||
return 0;
|
||||
}
|
||||
|
||||
/* Subroutine of do_line and do_linemarker. Convert escape sequences
   in a string, but do not perform character set conversion. */
static bool
interpret_string_notranslate (cpp_reader *pfile, const cpp_string *in,
                              cpp_string *out)
{
  iconv_t save_narrow_cset_desc = pfile->narrow_cset_desc;
  bool retval;

  pfile->narrow_cset_desc = (iconv_t) -1;
  retval = cpp_interpret_string (pfile, in, 1, out, false);
  pfile->narrow_cset_desc = save_narrow_cset_desc;
  return retval;
}
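
`interpret_string_notranslate` temporarily swaps the narrow descriptor for the "no conversion" value and restores it afterwards, so file names in `#line` are not translated to the execution character set. The following is a minimal sketch of that save, override, restore pattern. The reader structure and both functions are hypothetical stand-ins, not cpplib types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the reader: one descriptor, plus a field
   recording what the interpreter saw, so the effect is observable.  */
struct sketch_reader
{
  int narrow_desc;
  int seen_desc;
};

#define SKETCH_NO_CONVERSION (-1)

/* Stand-in for the real string interpreter; it only records which
   descriptor was active when it ran.  */
static bool
sketch_interpret (struct sketch_reader *r)
{
  r->seen_desc = r->narrow_desc;
  return true;
}

/* Run the interpreter with conversion disabled, then restore the
   caller's descriptor regardless of the result.  */
static bool
sketch_interpret_notranslate (struct sketch_reader *r)
{
  int saved = r->narrow_desc;
  bool ok;

  r->narrow_desc = SKETCH_NO_CONVERSION;
  ok = sketch_interpret (r);
  r->narrow_desc = saved;
  return ok;
}
```

The pattern keeps the interpreter's signature unchanged, at the cost of briefly mutating shared reader state.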
|
||||
|
||||
/* Interpret #line command.
|
||||
Note that the filename string (if any) is a true string constant
|
||||
(escapes are interpreted), unlike in #line. */
|
||||
|
@@ -788,8 +779,9 @@ do_line (cpp_reader *pfile)
|
|||
token = cpp_get_token (pfile);
|
||||
if (token->type == CPP_STRING)
|
||||
{
|
||||
new_file = (const char *) dequote_string (pfile, token->val.str.text + 1,
|
||||
token->val.str.len - 2);
|
||||
cpp_string s = { 0, 0 };
|
||||
if (interpret_string_notranslate (pfile, &token->val.str, &s))
|
||||
new_file = (const char *)s.text;
|
||||
check_eol (pfile);
|
||||
}
|
||||
else if (token->type != CPP_EOF)
|
||||
|
@@ -836,8 +828,10 @@ do_linemarker (cpp_reader *pfile)
|
|||
token = cpp_get_token (pfile);
|
||||
if (token->type == CPP_STRING)
|
||||
{
|
||||
new_file = (const char *) dequote_string (pfile, token->val.str.text + 1,
|
||||
token->val.str.len - 2);
|
||||
cpp_string s = { 0, 0 };
|
||||
if (interpret_string_notranslate (pfile, &token->val.str, &s))
|
||||
new_file = (const char *)s.text;
|
||||
|
||||
new_sysp = 0;
|
||||
flag = read_flag (pfile, 0);
|
||||
if (flag == 1)
|
||||
|
|
20
gcc/cpplib.h
|
@@ -124,6 +124,7 @@ struct file_name_map_list;
|
|||
OP(CPP_ATSIGN, "@") /* used in Objective-C */ \
|
||||
\
|
||||
TK(CPP_NAME, SPELL_IDENT) /* word */ \
|
||||
TK(CPP_AT_NAME, SPELL_IDENT) /* @word - Objective-C */ \
|
||||
TK(CPP_NUMBER, SPELL_LITERAL) /* 34_be+ta */ \
|
||||
\
|
||||
TK(CPP_CHAR, SPELL_LITERAL) /* 'char' */ \
|
||||
|
@@ -132,6 +133,7 @@ struct file_name_map_list;
|
|||
\
|
||||
TK(CPP_STRING, SPELL_LITERAL) /* "string" */ \
|
||||
TK(CPP_WSTRING, SPELL_LITERAL) /* L"string" */ \
|
||||
TK(CPP_OBJC_STRING, SPELL_LITERAL) /* @"string" - Objective-C */ \
|
||||
TK(CPP_HEADER_NAME, SPELL_LITERAL) /* <stdio.h> in #include */ \
|
||||
\
|
||||
TK(CPP_COMMENT, SPELL_LITERAL) /* Only if output comments. */ \
|
||||
|
@@ -332,6 +334,12 @@ struct cpp_options
|
|||
/* True for traditional preprocessing. */
|
||||
unsigned char traditional;
|
||||
|
||||
/* Holds the name of the target (execution) character set. */
|
||||
const char *narrow_charset;
|
||||
|
||||
/* Holds the name of the target wide character set. */
|
||||
const char *wide_charset;
|
||||
|
||||
/* True to warn about precompiled header files we couldn't use. */
|
||||
bool warn_invalid_pch;
|
||||
|
||||
|
@@ -364,8 +372,9 @@ struct cpp_options
|
|||
/* True means chars (wide chars) are unsigned. */
|
||||
bool unsigned_char, unsigned_wchar;
|
||||
|
||||
/* True if target is EBCDIC. */
|
||||
bool EBCDIC;
|
||||
/* True if the most significant byte in a word has the lowest
|
||||
address in memory. */
|
||||
bool bytes_big_endian;
|
||||
|
||||
/* Nonzero means __STDC__ should have the value 0 in system headers. */
|
||||
unsigned char stdc_0_in_system_headers;
|
||||
|
@@ -529,6 +538,9 @@ extern const char *cpp_read_main_file (cpp_reader *, const char *);
|
|||
/* Set up built-ins like __FILE__. */
|
||||
extern void cpp_init_builtins (cpp_reader *, int);
|
||||
|
||||
/* Set up translation to the target character set. */
|
||||
extern void cpp_init_iconv (cpp_reader *);
|
||||
|
||||
/* Call this to finish preprocessing. If you requested dependency
|
||||
generation, pass an open stream to write the information to,
|
||||
otherwise NULL. It is your responsibility to close the stream.
|
||||
|
@@ -560,6 +572,10 @@ extern void _cpp_backup_tokens (cpp_reader *, unsigned int);
|
|||
/* Evaluate a CPP_CHAR or CPP_WCHAR token. */
|
||||
extern cppchar_t cpp_interpret_charconst (cpp_reader *, const cpp_token *,
|
||||
unsigned int *, int *);
|
||||
/* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens. */
extern bool cpp_interpret_string (cpp_reader *,
                                  const cpp_string *, size_t,
                                  cpp_string *, bool);
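
`cpp_interpret_string` takes a vector of strings and a count, which is what lets the front ends hand over adjacent literals for concatenation instead of joining them in the parser. The following is a minimal sketch of the concatenation aspect only. It assumes each input is a narrow, double-quoted spelling and ignores escapes, wide strings and character set conversion. `struct sketch_string` and `sketch_concat` are hypothetical and do not reflect the real implementation in cppcharset.c, whose diff is suppressed above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counterpart of cpp_string: a length and a text pointer.  */
struct sketch_string
{
  size_t len;
  const unsigned char *text;
};

/* Concatenate the bodies of COUNT quoted spellings (each including its
   surrounding double quotes) into OUT.  OUT->text is malloc'd and
   NUL-terminated; OUT->len excludes the terminator.  Returns false on
   malformed input or allocation failure.  */
static bool
sketch_concat (const struct sketch_string *in, size_t count,
               struct sketch_string *out)
{
  size_t total = 0, i;
  unsigned char *buf, *p;

  for (i = 0; i < count; i++)
    {
      if (in[i].len < 2)
        return false;  /* not even a pair of quotes */
      total += in[i].len - 2;
    }

  buf = malloc (total + 1);
  if (!buf)
    return false;

  p = buf;
  for (i = 0; i < count; i++)
    {
      /* Skip the opening quote and stop before the closing one.  */
      memcpy (p, in[i].text + 1, in[i].len - 2);
      p += in[i].len - 2;
    }
  *p = '\0';

  out->len = total;
  out->text = buf;
  return true;
}
```

Passing the spellings `"foo"` and `"bar"` produces the single body `foobar`; the caller owns the returned buffer and frees it.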
|
||||
|
||||
/* Used to register macros and assertions, perhaps from the command line.
|
||||
The text is the same as the command line argument. */
|
||||
|
|
336
gcc/cppucnid.h
Normal file
|
@@ -0,0 +1,336 @@
|
|||
/* Table of UCNs which are valid in identifiers.
|
||||
Copyright (C) 2003 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
|
||||
|
||||
/* Automatically generated from cppucnid.tab, do not edit */
|
||||
|
||||
/* This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
|
||||
D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
|
||||
the similar table from ISO/IEC 14882:1998 (C++98) Annex E, which is
|
||||
a reproduction of ISO/IEC PDTR 10176. Unfortunately these tables
|
||||
are not identical. */
|
||||
|
||||
#ifndef CPPUCNID_H
#define CPPUCNID_H

#define C99 1
#define CXX 2
#define DIG 4

struct ucnrange
{
  unsigned short lo, hi;
  unsigned short flags;
};
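
The commit message says `ucn_valid_in_identifier` now does a binary search through `ucnranges` instead of a long chain of if statements. The following is a minimal sketch of such a lookup over a short excerpt of the table. `sketch_ranges` and `sketch_ucn_flags` are hypothetical names, and the real function in cppcharset.c (whose diff is suppressed above) may differ in detail.

```c
#include <stddef.h>

#define C99 1
#define CXX 2
#define DIG 4

struct ucnrange
{
  unsigned short lo, hi;
  unsigned short flags;
};

/* Excerpt only, sorted by code point; the generated table is much
   longer.  */
static const struct ucnrange sketch_ranges[] = {
  { 0x00aa, 0x00aa, C99 },
  { 0x00b5, 0x00b5, C99 },
  { 0x00b7, 0x00b7, C99 },
  { 0x00ba, 0x00ba, C99 },
  { 0x00c0, 0x00d6, CXX|C99 },
  { 0x00d8, 0x00f6, CXX|C99 },
  { 0x0660, 0x0669, C99|DIG },
};

/* Return the flags of the range containing C, or 0 if no range in the
   table contains it.  Relies on the ranges being sorted and disjoint.  */
static unsigned
sketch_ucn_flags (unsigned c)
{
  size_t lo = 0;
  size_t hi = sizeof sketch_ranges / sizeof sketch_ranges[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < sketch_ranges[mid].lo)
        hi = mid;
      else if (c > sketch_ranges[mid].hi)
        lo = mid + 1;
      else
        return sketch_ranges[mid].flags;
    }
  return 0;
}
```

A caller would then test the returned flags against the active language, for example requiring the `C99` bit in C99 mode, and could use `DIG` to reject a digit as the first character of an identifier.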
|
||||
|
||||
static const struct ucnrange ucnranges[] = {
|
||||
{ 0x00aa, 0x00aa, C99 }, /* Latin */
|
||||
{ 0x00b5, 0x00b5, C99 }, /* Special characters */
|
||||
{ 0x00b7, 0x00b7, C99 },
|
||||
{ 0x00ba, 0x00ba, C99 }, /* Latin */
|
||||
{ 0x00c0, 0x00d6, CXX|C99 },
|
||||
{ 0x00d8, 0x00f6, CXX|C99 },
|
||||
{ 0x00f8, 0x01f5, CXX|C99 },
|
||||
{ 0x01fa, 0x0217, CXX|C99 },
|
||||
{ 0x0250, 0x02a8, CXX|C99 },
|
||||
{ 0x02b0, 0x02b8, C99 }, /* Special characters */
|
||||
{ 0x02bb, 0x02bb, C99 },
|
||||
{ 0x02bd, 0x02c1, C99 },
|
||||
{ 0x02d0, 0x02d1, C99 },
|
||||
{ 0x02e0, 0x02e4, C99 },
|
||||
{ 0x037a, 0x037a, C99 },
|
||||
{ 0x0384, 0x0384, CXX }, /* Greek */
|
||||
{ 0x0386, 0x0386, C99 },
|
||||
{ 0x0388, 0x038a, CXX|C99 },
|
||||
{ 0x038c, 0x038c, CXX|C99 },
|
||||
{ 0x038e, 0x03a1, CXX|C99 },
|
||||
{ 0x03a3, 0x03ce, CXX|C99 },
|
||||
{ 0x03d0, 0x03d6, CXX|C99 },
|
||||
{ 0x03da, 0x03da, CXX|C99 },
|
||||
{ 0x03dc, 0x03dc, CXX|C99 },
|
||||
{ 0x03de, 0x03de, CXX|C99 },
|
||||
{ 0x03e0, 0x03e0, CXX|C99 },
|
||||
{ 0x03e2, 0x03f3, CXX|C99 },
|
||||
{ 0x0401, 0x040c, CXX|C99 }, /* Cyrillic */
|
||||
{ 0x040d, 0x040d, CXX },
|
||||
{ 0x040e, 0x040e, C99 },
|
||||
{ 0x040f, 0x044f, CXX|C99 },
|
||||
{ 0x0451, 0x045c, CXX|C99 },
|
||||
{ 0x045e, 0x0481, CXX|C99 },
|
||||
{ 0x0490, 0x04c4, CXX|C99 },
|
||||
{ 0x04c7, 0x04c8, CXX|C99 },
|
||||
{ 0x04cb, 0x04cc, CXX|C99 },
|
||||
{ 0x04d0, 0x04eb, CXX|C99 },
|
||||
{ 0x04ee, 0x04f5, CXX|C99 },
|
||||
{ 0x04f8, 0x04f9, CXX|C99 },
|
||||
{ 0x0531, 0x0556, CXX|C99 }, /* Armenian */
|
||||
{ 0x0559, 0x0559, C99 }, /* Special characters */
|
||||
{ 0x0561, 0x0587, CXX|C99 }, /* Armenian */
|
||||
{ 0x05b0, 0x05b9, C99 }, /* Hebrew */
|
||||
{ 0x05bb, 0x05bd, C99 },
|
||||
{ 0x05bf, 0x05bf, C99 },
|
||||
{ 0x05c1, 0x05c2, C99 },
|
||||
{ 0x05d0, 0x05ea, CXX|C99 },
|
||||
{ 0x05f0, 0x05f2, CXX|C99 },
|
||||
{ 0x05f3, 0x05f4, CXX },
|
||||
{ 0x0621, 0x063a, CXX|C99 }, /* Arabic */
|
||||
{ 0x0640, 0x0652, CXX|C99 },
|
||||
{ 0x0660, 0x0669, C99|DIG }, /* Digits */
|
||||
{ 0x0670, 0x06b7, CXX|C99 }, /* Arabic */
|
||||
{ 0x06ba, 0x06be, CXX|C99 },
|
||||
{ 0x06c0, 0x06ce, CXX|C99 },
|
||||
{ 0x06d0, 0x06dc, C99 },
|
||||
{ 0x06e5, 0x06e7, CXX|C99 },
|
||||
{ 0x06e8, 0x06e8, C99 },
|
||||
{ 0x06ea, 0x06ed, C99 },
|
||||
{ 0x06f0, 0x06f9, C99|DIG }, /* Digits */
|
||||
{ 0x0901, 0x0903, C99 }, /* Devanagari */
|
||||
{ 0x0905, 0x0939, CXX|C99 },
|
||||
{ 0x093d, 0x093d, C99 }, /* Special characters */
|
||||
{ 0x093e, 0x094d, C99 }, /* Devanagari */
|
||||
{ 0x0950, 0x0952, C99 },
|
||||
{ 0x0958, 0x0962, CXX|C99 },
|
||||
{ 0x0963, 0x0963, C99 },
|
||||
{ 0x0966, 0x096f, C99|DIG }, /* Digits */
|
||||
{ 0x0981, 0x0983, C99 }, /* Bengali */
|
||||
{ 0x0985, 0x098c, CXX|C99 },
|
||||
{ 0x098f, 0x0990, CXX|C99 },
|
||||
{ 0x0993, 0x09a8, CXX|C99 },
|
||||
{ 0x09aa, 0x09b0, CXX|C99 },
|
||||
{ 0x09b2, 0x09b2, CXX|C99 },
|
||||
{ 0x09b6, 0x09b9, CXX|C99 },
|
||||
{ 0x09be, 0x09c4, C99 },
|
||||
{ 0x09c7, 0x09c8, C99 },
|
||||
{ 0x09cb, 0x09cd, C99 },
|
||||
{ 0x09dc, 0x09dd, CXX|C99 },
|
||||
{ 0x09df, 0x09e1, CXX|C99 },
|
||||
{ 0x09e2, 0x09e3, C99 },
|
||||
{ 0x09e6, 0x09ef, C99|DIG }, /* Digits */
|
||||
{ 0x09f0, 0x09f1, CXX|C99 }, /* Bengali */
|
||||
{ 0x0a02, 0x0a02, C99 }, /* Gurmukhi */
|
||||
{ 0x0a05, 0x0a0a, CXX|C99 },
|
||||
{ 0x0a0f, 0x0a10, CXX|C99 },
|
||||
{ 0x0a13, 0x0a28, CXX|C99 },
|
||||
{ 0x0a2a, 0x0a30, CXX|C99 },
|
||||
{ 0x0a32, 0x0a33, CXX|C99 },
|
||||
{ 0x0a35, 0x0a36, CXX|C99 },
|
||||
{ 0x0a38, 0x0a39, CXX|C99 },
|
||||
{ 0x0a3e, 0x0a42, C99 },
|
||||
{ 0x0a47, 0x0a48, C99 },
|
||||
{ 0x0a4b, 0x0a4d, C99 },
|
||||
{ 0x0a59, 0x0a5c, CXX|C99 },
|
||||
{ 0x0a5e, 0x0a5e, CXX|C99 },
|
||||
{ 0x0a66, 0x0a6f, C99|DIG }, /* Digits */
|
||||
{ 0x0a74, 0x0a74, C99 }, /* Gurmukhi */
|
||||
{ 0x0a81, 0x0a83, C99 }, /* Gujarati */
|
||||
{ 0x0a85, 0x0a8b, CXX|C99 },
|
||||
{ 0x0a8d, 0x0a8d, CXX|C99 },
|
||||
{ 0x0a8f, 0x0a91, CXX|C99 },
|
||||
{ 0x0a93, 0x0aa8, CXX|C99 },
|
||||
{ 0x0aaa, 0x0ab0, CXX|C99 },
|
||||
{ 0x0ab2, 0x0ab3, CXX|C99 },
|
||||
{ 0x0ab5, 0x0ab9, CXX|C99 },
|
||||
{ 0x0abd, 0x0ac5, C99 },
|
||||
{ 0x0ac7, 0x0ac9, C99 },
|
||||
{ 0x0acb, 0x0acd, C99 },
|
||||
{ 0x0ad0, 0x0ad0, C99 },
|
||||
{ 0x0ae0, 0x0ae0, CXX|C99 },
|
||||
{ 0x0ae6, 0x0aef, C99|DIG }, /* Digits */
|
||||
{ 0x0b01, 0x0b03, C99 }, /* Oriya */
|
||||
{ 0x0b05, 0x0b0c, CXX|C99 },
|
||||
{ 0x0b0f, 0x0b10, CXX|C99 },
|
||||
{ 0x0b13, 0x0b28, CXX|C99 },
|
||||
{ 0x0b2a, 0x0b30, CXX|C99 },
|
||||
{ 0x0b32, 0x0b33, CXX|C99 },
|
||||
{ 0x0b36, 0x0b39, CXX|C99 },
|
||||
{ 0x0b3d, 0x0b3d, C99 }, /* Special characters */
|
||||
{ 0x0b3e, 0x0b43, C99 }, /* Oriya */
|
||||
{ 0x0b47, 0x0b48, C99 },
|
||||
{ 0x0b4b, 0x0b4d, C99 },
|
||||
{ 0x0b5c, 0x0b5d, CXX|C99 },
|
||||
{ 0x0b5f, 0x0b61, CXX|C99 },
|
||||
{ 0x0b66, 0x0b6f, C99|DIG }, /* Digits */
|
||||
{ 0x0b82, 0x0b83, C99 }, /* Tamil */
|
||||
{ 0x0b85, 0x0b8a, CXX|C99 },
|
||||
{ 0x0b8e, 0x0b90, CXX|C99 },
|
||||
{ 0x0b92, 0x0b95, CXX|C99 },
|
||||
{ 0x0b99, 0x0b9a, CXX|C99 },
|
||||
{ 0x0b9c, 0x0b9c, CXX|C99 },
|
||||
{ 0x0b9e, 0x0b9f, CXX|C99 },
|
||||
{ 0x0ba3, 0x0ba4, CXX|C99 },
|
||||
{ 0x0ba8, 0x0baa, CXX|C99 },
|
||||
{ 0x0bae, 0x0bb5, CXX|C99 },
|
||||
{ 0x0bb7, 0x0bb9, CXX|C99 },
|
||||
{ 0x0bbe, 0x0bc2, C99 },
|
||||
{ 0x0bc6, 0x0bc8, C99 },
|
||||
{ 0x0bca, 0x0bcd, C99 },
|
||||
{ 0x0be7, 0x0bef, C99|DIG }, /* Digits */
|
||||
{ 0x0c01, 0x0c03, C99 }, /* Telugu */
|
||||
{ 0x0c05, 0x0c0c, CXX|C99 },
|
||||
{ 0x0c0e, 0x0c10, CXX|C99 },
|
||||
{ 0x0c12, 0x0c28, CXX|C99 },
|
||||
{ 0x0c2a, 0x0c33, CXX|C99 },
|
||||
{ 0x0c35, 0x0c39, CXX|C99 },
|
||||
{ 0x0c3e, 0x0c44, C99 },
|
||||
{ 0x0c46, 0x0c48, C99 },
|
||||
{ 0x0c4a, 0x0c4d, C99 },
|
||||
{ 0x0c60, 0x0c61, CXX|C99 },
|
||||
{ 0x0c66, 0x0c6f, C99|DIG }, /* Digits */
|
||||
{ 0x0c82, 0x0c83, C99 }, /* Kannada */
|
||||
{ 0x0c85, 0x0c8c, CXX|C99 },
|
||||
{ 0x0c8e, 0x0c90, CXX|C99 },
|
||||
{ 0x0c92, 0x0ca8, CXX|C99 },
|
||||
{ 0x0caa, 0x0cb3, CXX|C99 },
|
||||
{ 0x0cb5, 0x0cb9, CXX|C99 },
|
||||
{ 0x0cbe, 0x0cc4, C99 },
|
||||
{ 0x0cc6, 0x0cc8, C99 },
|
||||
{ 0x0cca, 0x0ccd, C99 },
|
||||
{ 0x0cde, 0x0cde, C99 },
|
||||
{ 0x0ce0, 0x0ce1, CXX|C99 },
|
||||
{ 0x0ce6, 0x0cef, C99|DIG }, /* Digits */
|
||||
{ 0x0d02, 0x0d03, C99 }, /* Malayalam */
|
||||
{ 0x0d05, 0x0d0c, CXX|C99 },
|
||||
{ 0x0d0e, 0x0d10, CXX|C99 },
|
||||
{ 0x0d12, 0x0d28, CXX|C99 },
|
||||
{ 0x0d2a, 0x0d39, CXX|C99 },
|
||||
{ 0x0d3e, 0x0d43, C99 },
|
||||
{ 0x0d46, 0x0d48, C99 },
|
||||
{ 0x0d4a, 0x0d4d, C99 },
|
||||
{ 0x0d60, 0x0d61, CXX|C99 },
|
||||
{ 0x0d66, 0x0d6f, C99|DIG }, /* Digits */
|
||||
{ 0x0e01, 0x0e30, CXX|C99 }, /* Thai */
|
||||
{ 0x0e31, 0x0e31, C99 },
|
||||
{ 0x0e32, 0x0e33, CXX|C99 },
|
||||
{ 0x0e34, 0x0e3a, C99 },
|
||||
{ 0x0e40, 0x0e46, CXX|C99 },
|
||||
{ 0x0e47, 0x0e49, C99 },
|
||||
{ 0x0e50, 0x0e59, CXX|C99|DIG }, /* Digits */
|
||||
{ 0x0e5a, 0x0e5b, CXX|C99 }, /* Thai */
|
||||
{ 0x0e81, 0x0e82, CXX|C99 }, /* Lao */
|
||||
{ 0x0e84, 0x0e84, CXX|C99 },
|
||||
{ 0x0e87, 0x0e88, CXX|C99 },
|
||||
{ 0x0e8a, 0x0e8a, CXX|C99 },
|
||||
{ 0x0e8d, 0x0e8d, CXX|C99 },
|
||||
{ 0x0e94, 0x0e97, CXX|C99 },
|
||||
{ 0x0e99, 0x0e9f, CXX|C99 },
|
||||
{ 0x0ea1, 0x0ea3, CXX|C99 },
|
||||
{ 0x0ea5, 0x0ea5, CXX|C99 },
|
||||
{ 0x0ea7, 0x0ea7, CXX|C99 },
|
||||
{ 0x0eaa, 0x0eab, CXX|C99 },
|
||||
{ 0x0ead, 0x0eae, CXX|C99 },
|
||||
{ 0x0eaf, 0x0eaf, CXX },
|
||||
{ 0x0eb0, 0x0eb0, CXX|C99 },
|
||||
{ 0x0eb1, 0x0eb1, C99 },
|
||||
{ 0x0eb2, 0x0eb3, CXX|C99 },
|
||||
{ 0x0eb4, 0x0eb9, C99 },
|
||||
{ 0x0ebb, 0x0ebc, C99 },
|
||||
{ 0x0ebd, 0x0ebd, CXX|C99 },
|
||||
{ 0x0ec0, 0x0ec4, CXX|C99 },
|
||||
{ 0x0ec6, 0x0ec6, CXX|C99 },
|
||||
{ 0x0ec8, 0x0ecd, C99 },
|
||||
{ 0x0ed0, 0x0ed9, C99|DIG }, /* Digits */
|
||||
{ 0x0edc, 0x0edd, C99 }, /* Lao */
|
||||
{ 0x0f00, 0x0f00, C99 }, /* Tibetan */
|
||||
{ 0x0f18, 0x0f19, C99 },
|
||||
{ 0x0f20, 0x0f33, C99|DIG }, /* Digits */
|
||||
{ 0x0f35, 0x0f35, C99 }, /* Tibetan */
|
||||
{ 0x0f37, 0x0f37, C99 },
|
||||
{ 0x0f39, 0x0f39, C99 },
|
||||
{ 0x0f3e, 0x0f47, C99 },
|
||||
{ 0x0f49, 0x0f69, C99 },
|
||||
{ 0x0f71, 0x0f84, C99 },
|
||||
{ 0x0f86, 0x0f8b, C99 },
|
||||
{ 0x0f90, 0x0f95, C99 },
|
||||
{ 0x0f97, 0x0f97, C99 },
|
||||
{ 0x0f99, 0x0fad, C99 },
|
||||
{ 0x0fb1, 0x0fb7, C99 },
|
||||
{ 0x0fb9, 0x0fb9, C99 },
|
||||
{ 0x10a0, 0x10c5, CXX|C99 }, /* Georgian */
|
||||
{ 0x10d0, 0x10f6, CXX|C99 },
|
||||
{ 0x1100, 0x1159, CXX }, /* Hangul */
|
||||
{ 0x1161, 0x11a2, CXX },
|
||||
{ 0x11a8, 0x11f9, CXX },
|
||||
{ 0x1e00, 0x1e9a, CXX|C99 }, /* Latin */
|
||||
{ 0x1e9b, 0x1e9b, C99 },
|
||||
{ 0x1ea0, 0x1ef9, CXX|C99 },
|
||||
{ 0x1f00, 0x1f15, CXX|C99 }, /* Greek */
|
||||
{ 0x1f18, 0x1f1d, CXX|C99 },
|
||||
{ 0x1f20, 0x1f45, CXX|C99 },
|
||||
{ 0x1f48, 0x1f4d, CXX|C99 },
|
||||
{ 0x1f50, 0x1f57, CXX|C99 },
|
||||
{ 0x1f59, 0x1f59, CXX|C99 },
|
||||
{ 0x1f5b, 0x1f5b, CXX|C99 },
|
||||
{ 0x1f5d, 0x1f5d, CXX|C99 },
|
||||
{ 0x1f5f, 0x1f7d, CXX|C99 },
|
||||
{ 0x1f80, 0x1fb4, CXX|C99 },
|
||||
{ 0x1fb6, 0x1fbc, CXX|C99 },
|
||||
{ 0x1fbe, 0x1fbe, C99 }, /* Special characters */
|
||||
{ 0x1fc2, 0x1fc4, CXX|C99 }, /* Greek */
|
||||
{ 0x1fc6, 0x1fcc, CXX|C99 },
|
||||
{ 0x1fd0, 0x1fd3, CXX|C99 },
|
||||
{ 0x1fd6, 0x1fdb, CXX|C99 },
|
||||
{ 0x1fe0, 0x1fec, CXX|C99 },
|
||||
{ 0x1ff2, 0x1ff4, CXX|C99 },
|
||||
{ 0x1ff6, 0x1ffc, CXX|C99 },
|
||||
{ 0x203f, 0x2040, C99 }, /* Special characters */
|
||||
{ 0x207f, 0x207f, C99 }, /* Latin */
|
||||
{ 0x2102, 0x2102, C99 }, /* Special characters */
|
||||
{ 0x2107, 0x2107, C99 },
|
||||
{ 0x210a, 0x2113, C99 },
|
||||
{ 0x2115, 0x2115, C99 },
|
||||
{ 0x2118, 0x211d, C99 },
|
||||
{ 0x2124, 0x2124, C99 },
|
||||
{ 0x2126, 0x2126, C99 },
|
||||
{ 0x2128, 0x2128, C99 },
|
||||
{ 0x212a, 0x2131, C99 },
|
||||
{ 0x2133, 0x2138, C99 },
|
||||
{ 0x2160, 0x2182, C99 },
|
||||
{ 0x3005, 0x3007, C99 },
|
||||
{ 0x3021, 0x3029, C99 },
|
||||
{ 0x3041, 0x3093, CXX|C99 }, /* Hiragana */
|
||||
{ 0x3094, 0x3094, CXX },
|
||||
{ 0x309b, 0x309c, CXX|C99 },
|
||||
{ 0x309d, 0x309e, CXX },
|
||||
{ 0x30a1, 0x30f6, CXX|C99 }, /* Katakana */
|
||||
{ 0x30f7, 0x30fa, CXX },
|
||||
{ 0x30fb, 0x30fc, CXX|C99 },
|
||||
{ 0x30fd, 0x30fe, CXX },
|
||||
{ 0x3105, 0x312c, CXX|C99 }, /* Bopomofo */
|
||||
{ 0x4e00, 0x9fa5, CXX|C99 }, /* CJK Unified Ideographs */
|
||||
{ 0xac00, 0xd7a3, C99 }, /* Hangul */
|
||||
{ 0xf900, 0xfa2d, CXX }, /* CJK Unified Ideographs */
|
||||
{ 0xfb1f, 0xfb36, CXX },
|
||||
{ 0xfb38, 0xfb3c, CXX },
|
||||
{ 0xfb3e, 0xfb3e, CXX },
|
||||
{ 0xfb40, 0xfb44, CXX },
|
||||
{ 0xfb46, 0xfbb1, CXX },
|
||||
{ 0xfbd3, 0xfd3f, CXX },
|
||||
{ 0xfd50, 0xfd8f, CXX },
|
||||
{ 0xfd92, 0xfdc7, CXX },
|
||||
{ 0xfdf0, 0xfdfb, CXX },
|
||||
{ 0xfe70, 0xfe72, CXX },
|
||||
{ 0xfe74, 0xfe74, CXX },
|
||||
{ 0xfe76, 0xfefc, CXX },
|
||||
{ 0xff21, 0xff3a, CXX },
|
||||
{ 0xff41, 0xff5a, CXX },
|
||||
{ 0xff66, 0xffbe, CXX },
|
||||
{ 0xffc2, 0xffc7, CXX },
|
||||
{ 0xffca, 0xffcf, CXX },
|
||||
{ 0xffd2, 0xffd7, CXX },
|
||||
{ 0xffda, 0xffdc, CXX },
|
||||
};
|
||||
|
||||
#endif /* cppucnid.h */
|
130
gcc/cppucnid.pl
Normal file
|
@ -0,0 +1,130 @@
|
|||
#! /usr/bin/perl -w
use strict;

# Convert cppucnid.tab to cppucnid.h. We use two arrays of length
# 65536 to represent the table, since this is nice and simple. The
# first array holds the tags indicating which ranges are valid in
# which contexts. The second array holds the language name associated
# with each element.

our(@tags, @names);
@tags = ("") x 65536;
@names = ("") x 65536;


# Array mapping tag numbers to standard #defines
our @stds;

# Current standard and language
our($curstd, $curlang);

# First block of the file is a template to be saved for later.
our @template;

while (<>) {
    chomp;
    last if $_ eq '%%';
    push @template, $_;
};

# Second block of the file is the UCN tables.
# The format looks like this:
#
# [std]
#
# ; language
# xxxx-xxxx xxxx xxxx-xxxx ....
#
# with comment lines starting with #.

while (<>) {
    chomp;
    /^#/ and next;
    /^\s*$/ and next;
    /^\[(.+)\]$/ and do {
        $curstd = $1;
        next;
    };
    /^; (.+)$/ and do {
        $curlang = $1;
        next;
    };

    process_range(split);
}

# Print out the template, inserting as requested.
$\ = "\n";
for (@template) {
    print("/* Automatically generated from cppucnid.tab, do not edit */"),
        next if $_ eq "[dne]";
    print_table(), next if $_ eq "[table]";
    print;
}

sub print_table {
    my($lo, $hi);
    my $prevname = "";

    for ($lo = 0; $lo <= $#tags; $lo = $hi) {
        $hi = $lo;
        $hi++ while $hi <= $#tags
            && $tags[$hi] eq $tags[$lo]
            && $names[$hi] eq $names[$lo];

        # Range from $lo to $hi-1.
        # Don't make entries for ranges that are not valid idchars.
        next if ($tags[$lo] eq "");
        my $tag = $tags[$lo];
        $tag = " ".$tag if $tag =~ /^C99/;

        if ($names[$lo] eq $prevname) {
            printf(" { 0x%04x, 0x%04x, %-11s },\n",
                   $lo, $hi-1, $tag);
        } else {
            printf(" { 0x%04x, 0x%04x, %-11s }, /* %s */\n",
                   $lo, $hi-1, $tag, $names[$lo]);
        }
        $prevname = $names[$lo];
    }
}

# The line is a list of four-digit hexadecimal numbers or
# pairs of such numbers. Each is a valid identifier character
# from the given language, under the given standard.
sub process_range {
    for my $range (@_) {
        if ($range =~ /^[0-9a-f]{4}$/) {
            my $i = hex($range);
            if ($tags[$i] eq "") {
                $tags[$i] = $curstd;
            } else {
                $tags[$i] = $curstd . "|" . $tags[$i];
            }
            if ($names[$i] ne "" && $names[$i] ne $curlang) {
                # Tags are strings such as "CXX|C99", so print with %s.
                warn sprintf ("language overlap: %s/%s at %x (tag %s)",
                              $names[$i], $curlang, $i, $tags[$i]);
                next;
            }
            $names[$i] = $curlang;
        } elsif ($range =~ /^ ([0-9a-f]{4}) - ([0-9a-f]{4}) $/x) {
            my ($start, $end) = (hex($1), hex($2));
            my $i;
            for ($i = $start; $i <= $end; $i++) {
                if ($tags[$i] eq "") {
                    $tags[$i] = $curstd;
                } else {
                    $tags[$i] = $curstd . "|" . $tags[$i];
                }
                if ($names[$i] ne "" && $names[$i] ne $curlang) {
                    # Tags are strings such as "CXX|C99", so print with %s.
                    warn sprintf ("language overlap: %s/%s at %x (tag %s)",
                                  $names[$i], $curlang, $i, $tags[$i]);
                    next;
                }
                $names[$i] = $curlang;
            }
        } else {
            warn "malformed range expression $range";
        }
    }
}
|
239
gcc/cppucnid.tab
Normal file
|
@ -0,0 +1,239 @@
|
|||
/* Table of UCNs which are valid in identifiers.
|
||||
Copyright (C) 2003 Free Software Foundation, Inc.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the
|
||||
Free Software Foundation; either version 2, or (at your option) any
|
||||
later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
|
||||
|
||||
[dne]
|
||||
|
||||
/* This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
|
||||
D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
|
||||
the similar table from ISO/IEC 14882:1998 (C++98) Annex E, which is
|
||||
a reproduction of ISO/IEC PDTR 10176. Unfortunately these tables
|
||||
are not identical. */
|
||||
|
||||
#ifndef CPPUCNID_H
|
||||
#define CPPUCNID_H
|
||||
|
||||
#define C99 1
|
||||
#define CXX 2
|
||||
#define DIG 4
|
||||
|
||||
struct ucnrange
|
||||
{
|
||||
unsigned short lo, hi;
|
||||
unsigned short flags;
|
||||
};
|
||||
|
||||
static const struct ucnrange ucnranges[] = {
|
||||
[table]
|
||||
};
|
||||
|
||||
#endif /* cppucnid.h */
|
||||
%%
|
||||
|
||||
[C99]
|
||||
|
||||
; Latin
|
||||
00aa 00ba 00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9b
|
||||
1ea0-1ef9 207f
|
||||
|
||||
; Greek
|
||||
0386 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0
|
||||
03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
|
||||
1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3
|
||||
1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
|
||||
|
||||
; Cyrillic
|
||||
0401-040c 040e-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
|
||||
04d0-04eb 04ee-04f5 04f8-04f9
|
||||
|
||||
; Armenian
|
||||
0531-0556 0561-0587
|
||||
|
||||
; Hebrew
|
||||
05b0-05b9 05bb-05bd 05bf 05c1-05c2 05d0-05ea 05f0-05f2
|
||||
|
||||
; Arabic
|
||||
0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06d0-06dc 06e5-06e8
|
||||
06ea-06ed
|
||||
|
||||
; Devanagari
|
||||
0901-0903 0905-0939 093e-094d 0950-0952 0958-0963
|
||||
|
||||
; Bengali
|
||||
0981-0983 0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9
|
||||
09be-09c4 09c7-09c8 09cb-09cd 09dc-09dd 09df-09e3 09f0-09f1
|
||||
|
||||
; Gurmukhi
|
||||
0a02 0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36
|
||||
0a38-0a39 0a3e-0a42 0a47-0a48 0a4b-0a4d 0a59-0a5c 0a5e 0a74
|
||||
|
||||
; Gujarati
|
||||
0a81-0a83 0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3
|
||||
0ab5-0ab9 0abd-0ac5 0ac7-0ac9 0acb-0acd 0ad0 0ae0
|
||||
|
||||
; Oriya
|
||||
0b01-0b03 0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39
|
||||
0b3e-0b43 0b47-0b48 0b4b-0b4d 0b5c-0b5d 0b5f-0b61
|
||||
|
||||
; Tamil
|
||||
0b82-0b83 0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f
|
||||
0ba3-0ba4 0ba8-0baa 0bae-0bb5 0bb7-0bb9 0bbe-0bc2 0bc6-0bc8 0bca-0bcd
|
||||
|
||||
; Telugu
|
||||
0c01-0c03 0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c3e-0c44
|
||||
0c46-0c48 0c4a-0c4d 0c60-0c61
|
||||
|
||||
; Kannada
|
||||
0c82-0c83 0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0cbe-0cc4
|
||||
0cc6-0cc8 0cca-0ccd 0cde 0ce0-0ce1
|
||||
|
||||
; Malayalam
|
||||
0d02-0d03 0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d3e-0d43 0d46-0d48
|
||||
0d4a-0d4d 0d60-0d61
|
||||
|
||||
# CORRECTION: exclude 0e50-0e59 from the Thai range as it also appears
|
||||
# in the Digits range below.
|
||||
; Thai
|
||||
0e01-0e3a 0e40-0e49 0e5a-0e5b
|
||||
|
||||
; Lao
|
||||
0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
|
||||
0ea7 0eaa-0eab 0ead-0eae 0eb0-0eb9 0ebb-0ebd 0ec0-0ec4 0ec6 0ec8-0ecd
|
||||
0edc-0edd
|
||||
|
||||
; Tibetan
|
||||
0f00 0f18-0f19 0f35 0f37 0f39 0f3e-0f47 0f49-0f69 0f71-0f84 0f86-0f8b
|
||||
0f90-0f95 0f97 0f99-0fad 0fb1-0fb7 0fb9
|
||||
|
||||
; Georgian
|
||||
10a0-10c5 10d0-10f6
|
||||
|
||||
; Hiragana
|
||||
3041-3093 309b-309c
|
||||
|
||||
; Katakana
|
||||
30a1-30f6 30fb-30fc
|
||||
|
||||
; Bopomofo
|
||||
3105-312c
|
||||
|
||||
; CJK Unified Ideographs
|
||||
4e00-9fa5
|
||||
|
||||
; Hangul
|
||||
ac00-d7a3
|
||||
|
||||
; Special characters
|
||||
00b5 00b7 02b0-02b8 02bb 02bd-02c1 02d0-02d1 02e0-02e4 037a 0559 093d
|
||||
0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
|
||||
212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
|
||||
|
||||
[C99|DIG]
|
||||
; Digits
|
||||
0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
|
||||
0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
|
||||
|
||||
[CXX]
|
||||
|
||||
; Latin
|
||||
00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9a 1ea0-1ef9
|
||||
|
||||
; Greek
|
||||
0384 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0
|
||||
03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
|
||||
1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3
|
||||
1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
|
||||
|
||||
; Cyrillic
|
||||
0401-040d 040f-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
|
||||
04d0-04eb 04ee-04f5 04f8-04f9
|
||||
|
||||
; Armenian
|
||||
0531-0556 0561-0587
|
||||
|
||||
; Hebrew
|
||||
05d0-05ea 05f0-05f4
|
||||
|
||||
; Arabic
|
||||
0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06e5-06e7
|
||||
|
||||
; Devanagari
|
||||
0905-0939 0958-0962
|
||||
|
||||
; Bengali
|
||||
0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9 09dc-09dd
|
||||
09df-09e1 09f0-09f1
|
||||
|
||||
; Gurmukhi
|
||||
0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36 0a38-0a39
|
||||
0a59-0a5c 0a5e
|
||||
|
||||
; Gujarati
|
||||
0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3 0ab5-0ab9 0ae0
|
||||
|
||||
; Oriya
|
||||
0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39 0b5c-0b5d
|
||||
0b5f-0b61
|
||||
|
||||
; Tamil
|
||||
0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f 0ba3-0ba4
|
||||
0ba8-0baa 0bae-0bb5 0bb7-0bb9
|
||||
|
||||
; Telugu
|
||||
0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c60-0c61
|
||||
|
||||
; Kannada
|
||||
0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0ce0-0ce1
|
||||
|
||||
; Malayalam
|
||||
0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d60-0d61
|
||||
|
||||
# CORRECTION: Exclude 0e50-0e59 from the Thai range and make a fake
|
||||
# Digits range for it, to match C99. cppcharset.c knows that C++
|
||||
# doesn't distinguish digits from other UCNs valid in identifiers.
|
||||
; Thai
|
||||
0e01-0e30 0e32-0e33 0e40-0e46 0e4f-0e49 0e5a-0e5b
|
||||
|
||||
; Digits
|
||||
0e50-0e59
|
||||
|
||||
# CORRECTION: Change 0e0d to 0e8d (typo in standard; see C++ DR 131)
|
||||
; Lao
|
||||
0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
|
||||
0ea7 0eaa-0eab 0ead-0eb0 0eb2 0eb3 0ebd 0ec0-0ec4 0ec6
|
||||
|
||||
; Georgian
|
||||
10a0-10c5 10d0-10f6
|
||||
|
||||
; Hiragana
|
||||
3041-3094 309b-309e
|
||||
|
||||
; Katakana
|
||||
30a1-30fe
|
||||
|
||||
# CORRECTION: language spelled "Bopmofo" in C++98.
|
||||
; Bopomofo
|
||||
3105-312c
|
||||
|
||||
; Hangul
|
||||
1100-1159 1161-11a2 11a8-11f9
|
||||
|
||||
; CJK Unified Ideographs
|
||||
f900-fa2d fb1f-fb36 fb38-fb3c fb3e fb40-fb41 fb42-fb44 fb46-fbb1
|
||||
fbd3-fd3f fd50-fd8f fd92-fdc7 fdf0-fdfb fe70-fe72 fe74 fe76-fefc
|
||||
ff21-ff3a ff41-ff5a ff66-ffbe ffc2-ffc7 ffca-ffcf ffd2-ffd7
|
||||
ffda-ffdc 4e00-9fa5
|
||||
|
|
@ -104,6 +104,7 @@ useful on its own.
|
|||
|
||||
Overview
|
||||
|
||||
* Character sets::
|
||||
* Initial processing::
|
||||
* Tokenization::
|
||||
* The preprocessing language::
|
||||
|
@ -233,11 +234,62 @@ manual refer to GNU CPP.
|
|||
@c man end
|
||||
|
||||
@menu
|
||||
* Character sets::
|
||||
* Initial processing::
|
||||
* Tokenization::
|
||||
* The preprocessing language::
|
||||
@end menu
|
||||
|
||||
@node Character sets
@section Character sets

Source code character set processing in C and related languages is
rather complicated. The C standard discusses two character sets, but
there are really at least four.

The files input to CPP might be in any character set at all. CPP's
very first action, before it even looks for line boundaries, is to
convert the file into the character set it uses for internal
processing. That set is what the C standard calls the @dfn{source}
character set. It must be isomorphic with ISO 10646, also known as
Unicode. CPP uses the UTF-8 encoding of Unicode.

At present, GNU CPP does not implement conversion from arbitrary file
encodings to the source character set. Use of any encoding other than
plain ASCII or UTF-8, except in comments, will cause errors. Use of
encodings that are not strict supersets of ASCII, such as Shift JIS,
may cause errors even if non-ASCII characters appear only in comments.
We plan to fix this in the near future.

All preprocessing work (the subject of the rest of this manual) is
carried out in the source character set. If you request textual
output from the preprocessor with the @option{-E} option, it will be
in UTF-8.

After preprocessing is complete, string and character constants are
converted again, into the @dfn{execution} character set. This
character set is under control of the user; the default is UTF-8,
matching the source character set. Wide string and character
constants have their own character set, which is not called out
specifically in the standard. Again, it is under control of the user.
The default is UTF-16 or UTF-32, whichever fits in the target's
@code{wchar_t} type, in the target machine's byte
order.@footnote{UTF-16 does not meet the requirements of the C
standard for a wide character set, but the choice of 16-bit
@code{wchar_t} is enshrined in some system ABIs so we cannot fix
this.} Octal and hexadecimal escape sequences do not undergo
conversion; @t{'\x12'} has the value 0x12 regardless of the currently
selected execution character set. All other escapes are replaced by
the character in the source character set that they represent, then
converted to the execution character set, just like unescaped
characters.

GCC does not permit the use of characters outside the ASCII range, nor
@samp{\u} and @samp{\U} escapes, in identifiers. We hope this will
change eventually, but there are problems with the standard semantics
of such ``extended identifiers'' which must be resolved through the
ISO C and C++ committees first.
||||
|
||||
@node Initial processing
|
||||
@section Initial processing
|
||||
|
||||
|
@ -251,27 +303,19 @@ standard.
|
|||
|
||||
@enumerate
|
||||
@item
|
||||
@cindex character sets
|
||||
@cindex line endings
|
||||
The input file is read into memory and broken into lines.
|
||||
|
||||
CPP expects its input to be a text file, that is, an unstructured
|
||||
stream of ASCII characters, with some characters indicating the end of a
|
||||
line of text. Extended ASCII character sets, such as ISO Latin-1 or
|
||||
Unicode encoded in UTF-8, are also acceptable. Character sets that are
|
||||
not strict supersets of seven-bit ASCII will not work. We plan to add
|
||||
complete support for international character sets in a future release.
|
||||
|
||||
Different systems use different conventions to indicate the end of a
|
||||
line. GCC accepts the ASCII control sequences @kbd{LF}, @kbd{@w{CR
|
||||
LF}} and @kbd{CR} as end-of-line markers. These
|
||||
are the canonical sequences used by Unix, DOS and VMS, and the
|
||||
classic Mac OS (before OSX) respectively. You may therefore safely copy
|
||||
source code written on any of those systems to a different one and use
|
||||
it without conversion. (GCC may lose track of the current line number
|
||||
if a file doesn't consistently use one convention, as sometimes happens
|
||||
when it is edited on computers with different conventions that share a
|
||||
network file system.)
|
||||
LF}} and @kbd{CR} as end-of-line markers. These are the canonical
|
||||
sequences used by Unix, DOS and VMS, and the classic Mac OS (before
|
||||
OSX) respectively. You may therefore safely copy source code written
|
||||
on any of those systems to a different one and use it without
|
||||
conversion. (GCC may lose track of the current line number if a file
|
||||
doesn't consistently use one convention, as sometimes happens when it
|
||||
is edited on computers with different conventions that share a network
|
||||
file system.)
|
||||
|
||||
If the last line of any input file lacks an end-of-line marker, the end
|
||||
of the file is considered to implicitly supply one. The C standard says
|
||||
|
@ -378,8 +422,9 @@ comment.
|
|||
@end group
|
||||
@end example
|
||||
|
||||
Comments are not recognized within string literals. @t{@w{"/* blah
|
||||
*/"}} is the string constant @samp{@w{/* blah */}}, not an empty string.
|
||||
Comments are not recognized within string literals.
|
||||
@t{@w{"/* blah */"}} is the string constant @samp{@w{/* blah */}}, not
|
||||
an empty string.
|
||||
|
||||
Line comments are not in the 1989 edition of the C standard, but they
|
||||
are recognized by GCC as an extension. In C++ and in the 1999 edition
|
||||
|
@ -3706,8 +3751,9 @@ and stick to it.
|
|||
@item The mapping of physical source file multi-byte characters to the
|
||||
execution character set.
|
||||
|
||||
Currently, GNU cpp only supports character sets that are strict supersets
|
||||
of ASCII, and performs no translation of characters.
|
||||
Currently, CPP requires its input to be ASCII or UTF-8. The execution
character set may be controlled by the user, with the
@code{-fexec-charset} and @code{-fwide-exec-charset} options.
|
||||
|
||||
@item Identifier characters.
|
||||
@anchor{Identifier characters}
|
||||
|
|
|
@ -498,6 +498,21 @@ correct column numbers in warnings or errors, even if tabs appear on the
|
|||
line. If the value is less than 1 or greater than 100, the option is
|
||||
ignored. The default is 8.
|
||||
|
||||
@item -fexec-charset=@var{charset}
@opindex fexec-charset
Set the execution character set, used for string and character
constants. The default is UTF-8. @var{charset} can be any encoding
supported by the system's @code{iconv} library routine.

@item -fwide-exec-charset=@var{charset}
@opindex fwide-exec-charset
Set the wide execution character set, used for wide string and
character constants. The default is UTF-32 or UTF-16, whichever
corresponds to the width of @code{wchar_t}. As with
@option{-fexec-charset}, @var{charset} can be any encoding supported
by the system's @code{iconv} library routine; however, you will have
problems with encodings that do not fit exactly in @code{wchar_t}.
|
||||
|
||||
@item -fno-show-column
|
||||
@opindex fno-show-column
|
||||
Do not print column numbers in diagnostics. This may be necessary if
|
||||
|
|
|
@ -439,7 +439,6 @@ extensions, accepted by GCC in C89 mode and in C++.
|
|||
* Empty Structures:: Structures with no members.
|
||||
* Variadic Macros:: Macros with a variable number of arguments.
|
||||
* Escaped Newlines:: Slightly looser rules for escaped newlines.
|
||||
* Multi-line Strings:: String literals with embedded newlines.
|
||||
* Subscripting:: Any array can be subscripted, even if not an lvalue.
|
||||
* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers.
|
||||
* Initializers:: Non-constant initializers.
|
||||
|
@ -1529,27 +1528,14 @@ argument, these arguments are not macro expanded.
|
|||
|
||||
Recently, the preprocessor has relaxed its treatment of escaped
|
||||
newlines. Previously, the newline had to immediately follow a
|
||||
backslash. The current implementation allows whitespace in the form of
|
||||
spaces, horizontal and vertical tabs, and form feeds between the
|
||||
backslash. The current implementation allows whitespace in the form
|
||||
of spaces, horizontal and vertical tabs, and form feeds between the
|
||||
backslash and the subsequent newline. The preprocessor issues a
|
||||
warning, but treats it as a valid escaped newline and combines the two
|
||||
lines to form a single logical line. This works within comments and
|
||||
tokens, including multi-line strings, as well as between tokens.
|
||||
Comments are @emph{not} treated as whitespace for the purposes of this
|
||||
relaxation, since they have not yet been replaced with spaces.
|
||||
|
||||
@node Multi-line Strings
|
||||
@section String Literals with Embedded Newlines
|
||||
@cindex multi-line string literals
|
||||
|
||||
As an extension, GNU CPP permits string literals to cross multiple lines
|
||||
without escaping the embedded newlines. Each embedded newline is
|
||||
replaced with a single @samp{\n} character in the resulting string
|
||||
literal, regardless of what form the newline took originally.
|
||||
|
||||
CPP currently allows such strings in directives as well (other than the
|
||||
@samp{#include} family). This is deprecated and will eventually be
|
||||
removed.
|
||||
tokens, as well as between tokens. Comments are @emph{not} treated as
|
||||
whitespace for the purposes of this relaxation, since they have not
|
||||
yet been replaced with spaces.
|
||||
|
||||
@node Subscripting
|
||||
@section Non-Lvalue Arrays May Have Subscripts
|
||||
|
@ -4437,18 +4423,47 @@ This extension is not supported by GNU C++.
|
|||
|
||||
@node Function Names
|
||||
@section Function Names as Strings
|
||||
@cindex @code{__func__} identifier
|
||||
@cindex @code{__FUNCTION__} identifier
|
||||
@cindex @code{__PRETTY_FUNCTION__} identifier
|
||||
@cindex @code{__func__} identifier
|
||||
|
||||
GCC predefines two magic identifiers to hold the name of the current
|
||||
function. The identifier @code{__FUNCTION__} holds the name of the function
|
||||
as it appears in the source. The identifier @code{__PRETTY_FUNCTION__}
|
||||
holds the name of the function pretty printed in a language specific
|
||||
fashion.
|
||||
GCC provides three magic variables which hold the name of the current
|
||||
function, as a string. The first of these is @code{__func__}, which
|
||||
is part of the C99 standard:
|
||||
|
||||
These names are always the same in a C function, but in a C++ function
|
||||
they may be different. For example, this program:
|
||||
@display
|
||||
The identifier @code{__func__} is implicitly declared by the translator
|
||||
as if, immediately following the opening brace of each function
|
||||
definition, the declaration
|
||||
|
||||
@smallexample
|
||||
static const char __func__[] = "function-name";
|
||||
@end smallexample
|
||||
|
||||
appeared, where function-name is the name of the lexically-enclosing
|
||||
function. This name is the unadorned name of the function.
|
||||
@end display
|
||||
|
||||
@code{__FUNCTION__} is another name for @code{__func__}. Older
|
||||
versions of GCC recognize only this name. However, it is not
|
||||
standardized. For maximum portability, we recommend you use
|
||||
@code{__func__}, but provide a fallback definition with the
|
||||
preprocessor:
|
||||
|
||||
@smallexample
|
||||
#if __STDC_VERSION__ < 199901L
|
||||
# if __GNUC__ >= 2
|
||||
# define __func__ __FUNCTION__
|
||||
# else
|
||||
# define __func__ "<unknown>"
|
||||
# endif
|
||||
#endif
|
||||
@end smallexample
|
||||
|
||||
In C, @code{__PRETTY_FUNCTION__} is yet another name for
|
||||
@code{__func__}. However, in C++, @code{__PRETTY_FUNCTION__} contains
|
||||
the type signature of the function as well as its bare name. For
|
||||
example, this program:
|
||||
|
||||
@smallexample
|
||||
extern "C" @{
|
||||
|
@ -4478,46 +4493,16 @@ gives this output:
|
|||
|
||||
@smallexample
|
||||
__FUNCTION__ = sub
|
||||
__PRETTY_FUNCTION__ = int a::sub (int)
|
||||
__PRETTY_FUNCTION__ = void a::sub(int)
|
||||
@end smallexample
|
||||
|
||||
The compiler automagically replaces the identifiers with a string
|
||||
literal containing the appropriate name. Thus, they are neither
|
||||
preprocessor macros, like @code{__FILE__} and @code{__LINE__}, nor
|
||||
variables. This means that they catenate with other string literals, and
|
||||
that they can be used to initialize char arrays. For example
|
||||
|
||||
@smallexample
|
||||
char here[] = "Function " __FUNCTION__ " in " __FILE__;
|
||||
@end smallexample
|
||||
|
||||
On the other hand, @samp{#ifdef __FUNCTION__} does not have any special
|
||||
meaning inside a function, since the preprocessor does not do anything
|
||||
special with the identifier @code{__FUNCTION__}.
|
||||
|
||||
Note that these semantics are deprecated, and that GCC 3.2 will handle
|
||||
@code{__FUNCTION__} and @code{__PRETTY_FUNCTION__} the same way as
|
||||
@code{__func__}. @code{__func__} is defined by the ISO standard C99:
|
||||
|
||||
@display
|
||||
The identifier @code{__func__} is implicitly declared by the translator
|
||||
as if, immediately following the opening brace of each function
|
||||
definition, the declaration
|
||||
|
||||
@smallexample
|
||||
static const char __func__[] = "function-name";
|
||||
@end smallexample
|
||||
|
||||
appeared, where function-name is the name of the lexically-enclosing
|
||||
function. This name is the unadorned name of the function.
|
||||
@end display
|
||||
|
||||
By this definition, @code{__func__} is a variable, not a string literal.
|
||||
In particular, @code{__func__} does not catenate with other string
|
||||
literals.
|
||||
|
||||
In @code{C++}, @code{__FUNCTION__} and @code{__PRETTY_FUNCTION__} are
|
||||
variables, declared in the same way as @code{__func__}.
|
||||
These identifiers are not preprocessor macros. In GCC 3.3 and
|
||||
earlier, in C only, @code{__FUNCTION__} and @code{__PRETTY_FUNCTION__}
|
||||
were treated as string literals; they could be used to initialize
|
||||
@code{char} arrays, and they could be concatenated with other string
|
||||
literals. GCC 3.4 and later treat them as variables, like
|
||||
@code{__func__}. In C++, @code{__FUNCTION__} and
|
||||
@code{__PRETTY_FUNCTION__} have always been variables.
|
||||
|
||||
@node Return Address
|
||||
@section Getting the Return or Frame Address of a Function
|
||||
|
|
|
@ -1274,18 +1274,18 @@ my_build_string (len, str)
|
|||
return fix_string_type (build_string (len, str));
|
||||
}
|
||||
|
||||
/* Given a chain of STRING_CST's, build a static instance of
|
||||
NXConstantString which points at the concatenation of those strings.
|
||||
/* Build a static instance of NXConstantString which points at the
|
||||
string constant STRING.
|
||||
We place the string object in the __string_objects section of the
|
||||
__OBJC segment. The Objective-C runtime will initialize the isa
|
||||
pointers of the string objects to point at the NXConstantString
|
||||
class object. */
|
||||
|
||||
tree
|
||||
build_objc_string_object (strings)
|
||||
tree strings;
|
||||
build_objc_string_object (string)
|
||||
tree string;
|
||||
{
|
||||
tree string, initlist, constructor;
|
||||
tree initlist, constructor;
|
||||
int length;
|
||||
|
||||
if (lookup_interface (constant_string_id) == NULL_TREE)
|
||||
|
@ -1297,22 +1297,6 @@ build_objc_string_object (strings)
|
|||
|
||||
add_class_reference (constant_string_id);
|
||||
|
||||
if (TREE_CHAIN (strings))
|
||||
{
|
||||
varray_type vstrings;
|
||||
VARRAY_TREE_INIT (vstrings, 32, "strings");
|
||||
|
||||
for (; strings ; strings = TREE_CHAIN (strings))
|
||||
VARRAY_PUSH_TREE (vstrings, strings);
|
||||
|
||||
string = combine_strings (vstrings);
|
||||
}
|
||||
else
|
||||
string = strings;
|
||||
|
||||
string = fix_string_type (string);
|
||||
|
||||
TREE_SET_CODE (string, STRING_CST);
|
||||
length = TREE_STRING_LENGTH (string) - 1;
|
||||
|
||||
/* We could not properly create NXConstantString in synth_module_prologue,
|
||||
|
|
|
@@ -1,3 +1,17 @@
2003-07-04 Zack Weinberg <zack@codesourcery.com>

* gcc.c-torture/execute/wchar_t-1.x: New file; XFAIL wchar_t-1.c
everywhere.
* gcc.dg/concat.c: Concatenation of string constants with
__FUNCTION__ / __PRETTY_FUNCTION__ is now a hard error.
* gcc.dg/wtr-strcat-1.c: Loosen dg-warning regexp.
* gcc.dg/cpp/escape-2.c: Use wide character constants where
necessary to avoid multi-character character constant warning.
* gcc.dg/cpp/escape.c: Likewise.
* gcc.dg/cpp/ucs.c: Likewise.
Remove backslashes from dg-bogus comments, as they confuse Tcl.
Fix a typo.

2003-07-04 Kazu Hirata <kazu@cs.umass.edu>

PR c/11428
3
gcc/testsuite/gcc.c-torture/execute/wchar_t-1.x
Normal file
@@ -0,0 +1,3 @@
# Doesn't compile due to use of literal ISO8859.1 characters. PR 11439.
set torture_compile_xfail "*-*-*"
return 0
@@ -2,15 +2,15 @@

/* { dg-do compile } */

/* Test we output a warning for concatenation of artificial strings.
/* Test we output an error for concatenation of artificial strings.

Neil Booth, 10 Dec 2001. */

void foo ()
{
char str1[] = __FUNCTION__ "."; /* { dg-warning "deprecated" } */
char str2[] = __PRETTY_FUNCTION__ ".";/* { dg-warning "deprecated" } */
char str3[] = "." __FUNCTION__; /* { dg-warning "deprecated" } */
char str4[] = "." __PRETTY_FUNCTION__;/* { dg-warning "deprecated" } */
char str5[] = "." "."; /* No warning. */
char s1[] = __FUNCTION__"."; /* { dg-error "(parse|syntax|invalid)" } */
char s2[] = __PRETTY_FUNCTION__".";/* { dg-error "(parse|syntax|invalid)" } */
char s3[] = "."__FUNCTION__; /* { dg-error "(parse|syntax|invalid)" } */
char s4[] = "."__PRETTY_FUNCTION__;/* { dg-error "(parse|syntax|invalid)" } */
char s5[] = ".""."; /* No error. */
}
@@ -10,11 +10,11 @@

#if '\e' /* { dg-warning "non-ISO" "non-ISO \\e" } */
#endif
#if '\u00a0' /* { dg-bogus "unknown" "\\u is known in C99" } */
#if L'\u00a0' /* { dg-bogus "unknown" "\\u is known in C99" } */
#endif

void foo ()
{
int c = '\E'; /* { dg-warning "non-ISO" "non-ISO \\E" } */
c = '\u00a0'; /* { dg-bogus "unknown" "\\u is known in C99" } */
c = L'\u00a0'; /* { dg-bogus "unknown" "\\u is known in C99" } */
}
@@ -13,7 +13,7 @@
#if '\x1a' != 26 /* { dg-warning "traditional" "traditional hex" } */
#error bad hex /* { dg-bogus "bad" "bad hexadecimal evaluation" } */
#endif
#if '\u' /* { dg-warning "unknown" "\u is unknown in C89" } */
#if L'\u00a1' /* { dg-warning "only valid" "\u is unknown in C89" } */
#endif

void foo ()
@@ -21,5 +21,5 @@ void foo ()
int c = '\a'; /* { dg-warning "traditional" "traditional bell" } */

c = '\xa1'; /* { dg-warning "traditional" "traditional hex" } */
c = '\u'; /* { dg-warning "unknown" "\u is unknown in C89" } */
c = L'\u00a1'; /* { dg-warning "only valid" "\u is unknown in C89" } */
}
@@ -35,12 +35,12 @@
#undef long

#if L'\u1234' != 0x1234
#error bad short ucs /* { dg-bogus "bad" "bad \u1234 evaluation" } */
#error bad short ucs /* { dg-bogus "bad" "bad u1234 evaluation" } */
#endif

#if WCHAR_MAX >= 0x7ffffff
# if L'\U1234abcd' != 0x1234abcd
# error bad long ucs /* { dg-bogus "bad" "bad \U1234abcd evaluation" } */
# error bad long ucs /* { dg-bogus "bad" "bad U1234abcd evaluation" } */
# endif
#endif

@@ -48,7 +48,7 @@ void foo ()
{
int c;

c = L'\ubad'; /* { dg-error "incomplete" "incompete UCN 1" } */
c = L'\ubad'; /* { dg-error "incomplete" "incomplete UCN 1" } */
c = L"\U1234"[0]; /* { dg-error "incomplete" "incompete UCN 2" } */

c = L'\u000x'; /* { dg-error "incomplete" "non-hex digit in UCN" } */
@@ -58,7 +58,7 @@ void foo ()

c = '\u0024'; /* { dg-bogus "invalid" "0024 is a valid UCN" } */
c = "\u0040"[0]; /* { dg-bogus "invalid" "0040 is a valid UCN" } */
c = '\u00a0'; /* { dg-bogus "invalid" "00a0 is a valid UCN" } */
c = L'\u00a0'; /* { dg-bogus "invalid" "00a0 is a valid UCN" } */
c = '\U00000060'; /* { dg-bogus "invalid" "0060 is a valid UCN" } */

c = '\u0025'; /* { dg-error "not a valid" "0025 invalid UCN" } */
@@ -9,7 +9,7 @@ testfunc ()
{
const char *foo;

foo = "hello" "hello"; /* { dg-warning "string concatenation" "string concatenation" } */
foo = "hello" "hello"; /* { dg-warning "concatenation" "string concatenation" } */

# 15 "sys-header.h" 3
/* We are in system headers now, no -Wtraditional warnings should issue. */
@@ -1,3 +1,16 @@
2003-07-04 Zack Weinberg <zack@codesourcery.com>

* testsuite/22_locale/collate/compare/wchar_t/2.cc
* testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/compare/wchar_t/wrapped_locale.cc
* testsuite/22_locale/collate/hash/wchar_t/2.cc
* testsuite/22_locale/collate/hash/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/hash/wchar_t/wrapped_locale.cc
* testsuite/22_locale/collate/transform/wchar_t/2.cc
* testsuite/22_locale/collate/transform/wchar_t/wrapped_env.cc
* testsuite/22_locale/collate/transform/wchar_t/wrapped_locale.cc:
XFAIL on all targets.

2003-07-04 Benjamin Kosnik <bkoz@redhat.com>

* acinclude.m4 (GLIBCPP_ENABLE_PCH): Fix missed variable.
@@ -18,6 +18,10 @@
// Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307,
// USA.

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

// 22.2.4.1.1 collate members

#include <locale>
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_1
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_1
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <locale>
#include <testsuite_hooks.h>

@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_1
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_1
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <locale>
#include <testsuite_hooks.h>

@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_2
@@ -20,6 +20,10 @@

// 22.2.4.1.1 collate members

// Doesn't work due to use of literal ISO8859.1 characters. PR 11439
// { dg-do compile { xfail *-*-* } } should be run
// { dg-excess-errors "" }

#include <testsuite_hooks.h>

#define main discard_main_2