The various decls relating to rich_location are in
libcpp/include/line-map.h, but they don't relate to line maps.
Split them out to their own header: libcpp/include/rich-location.h
No functional change intended.
gcc/ChangeLog:
* Makefile.in (CPPLIB_H): Add libcpp/include/rich-location.h.
* coretypes.h (class rich_location): New forward decl.
gcc/analyzer/ChangeLog:
* analyzer.h: Include "rich-location.h".
gcc/c-family/ChangeLog:
* c-lex.cc: Include "rich-location.h".
gcc/cp/ChangeLog:
* mapper-client.cc: Include "rich-location.h".
gcc/ChangeLog:
* diagnostic.h: Include "rich-location.h".
* edit-context.h (class fixit_hint): New forward decl.
* gcc-rich-location.h: Include "rich-location.h".
* genmatch.cc: Likewise.
* pretty-print.h: Likewise.
gcc/rust/ChangeLog:
* rust-location.h: Include "rich-location.h".
libcpp/ChangeLog:
* Makefile.in (TAGS_SOURCES): Add "include/rich-location.h".
* include/cpplib.h (class rich_location): New forward decl.
* include/line-map.h (class range_label)
(enum range_display_kind, struct location_range)
(class semi_embedded_vec, class rich_location, class label_text)
(class range_label, class fixit_hint): Move to...
* include/rich-location.h: ...this new file.
* internal.h: Include "rich-location.h".
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
The PR requests an enhancement to the diagnostic issued for the use of a
poisoned identifier. Currently, we show the location of the usage, but not
the location which requested the poisoning, which would be helpful for the
user if the decision to poison an identifier was made externally, such as
in a library header.
In order to output this information, we need to remember a location_t for
each identifier that has been poisoned, and that data needs to be preserved
as well in a PCH. One option would be to add a field to struct cpp_hashnode,
but there is no convenient place to add it without increasing the size of
the struct for all identifiers. Given this facility will be needed rarely,
it seemed better to add a second hash map, which is handled PCH-wise the
same as the current one in gcc/stringpool.cc. This hash map associates a new
struct cpp_hashnode_extra with each identifier that needs one. Currently
that struct only contains the new location_t, but it could be extended in
the future if there is other ancillary data that may be convenient to put
there for other purposes.
libcpp/ChangeLog:
PR preprocessor/36887
* directives.cc (do_pragma_poison): Store in the extra hash map the
location from which an identifier has been poisoned.
* lex.cc (identifier_diagnostics_on_lex): When issuing a diagnostic
for the use of a poisoned identifier, also add a note indicating the
location from which it was poisoned.
* identifiers.cc (alloc_node): Convert to template function.
(_cpp_init_hashtable): Handle the new extra hash map.
(_cpp_destroy_hashtable): Likewise.
* include/cpplib.h (struct cpp_hashnode_extra): New struct.
(cpp_create_reader): Update prototype to...
* init.cc (cpp_create_reader): ...accept an argument for the extra
hash table and pass it to _cpp_init_hashtable.
* include/symtab.h (ht_lookup): New overload for convenience.
* internal.h (struct cpp_reader): Add EXTRA_HASH_TABLE member.
(_cpp_init_hashtable): Adjust prototype.
gcc/c-family/ChangeLog:
PR preprocessor/36887
* c-opts.cc (c_common_init_options): Pass new extra hash map
argument to cpp_create_reader().
gcc/ChangeLog:
PR preprocessor/36887
* toplev.h (ident_hash_extra): Declare...
* stringpool.cc (ident_hash_extra): ...this new global variable.
(init_stringpool): Handle ident_hash_extra as well as ident_hash.
(ggc_mark_stringpool): Likewise.
(ggc_purge_stringpool): Likewise.
(struct string_pool_data_extra): New struct.
(spd2): New GC root variable.
(gt_pch_save_stringpool): Use spd2 to handle ident_hash_extra,
analogous to how spd is used to handle ident_hash.
(gt_pch_restore_stringpool): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/36887
* c-c++-common/cpp/diagnostic-poison.c: New test.
* g++.dg/pch/pr36887.C: New test.
* g++.dg/pch/pr36887.Hs: New test.
When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.
libcpp/ChangeLog:
PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.
gcc/testsuite/ChangeLog:
PR c++/66290
* c-c++-common/cpp/macro-ranges.c: New test.
* c-c++-common/cpp/line-2.c: Adapt to check for column information
on macro-related libcpp warnings.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/pr58844-1.c: Likewise.
* c-c++-common/cpp/pr58844-2.c: Likewise.
* c-c++-common/cpp/warning-zero-location.c: Likewise.
* c-c++-common/pragma-diag-14.c: Likewise.
* c-c++-common/pragma-diag-15.c: Likewise.
* g++.dg/modules/macro-2_d.C: Likewise.
* g++.dg/modules/macro-4_d.C: Likewise.
* g++.dg/modules/macro-4_e.C: Likewise.
* g++.dg/spellcheck-macro-ordering.C: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/Wunused.c: Likewise.
* gcc.dg/cpp/redef2.c: Likewise.
* gcc.dg/cpp/redef3.c: Likewise.
* gcc.dg/cpp/redef4.c: Likewise.
* gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-11.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.
Stephan Bergmann reported that our -Wbidi-chars breaks the build
of LibreOffice because we warn about UCNs even when their usage
is correct: LibreOffice constructs strings piecewise, as in:
aText = u"\u202D" + aText;
and warning about that is overzealous. Since no editor (AFAIK)
interprets UCNs to show them as Unicode characters, there's less
risk in misinterpreting them, and so perhaps we shouldn't warn
about them by default. However, identifiers containing UCNs or
programs generating other programs could still cause confusion,
so I'm keeping the UCN checking. To turn it on, you just need
to use -Wbidi-chars=unpaired,ucn or -Wbidi-chars=any,ucn.
The implementation is done by using the new EnumSet feature.
PR preprocessor/104030
gcc/c-family/ChangeLog:
* c.opt (Wbidi-chars): Mark as EnumSet. Also accept =ucn.
gcc/ChangeLog:
* doc/invoke.texi: Update documentation for -Wbidi-chars.
libcpp/ChangeLog:
* include/cpplib.h (enum cpp_bidirectional_level): Add
bidirectional_ucn. Set values explicitly.
* internal.h (cpp_reader): Adjust warn_bidi_p.
* lex.cc (maybe_warn_bidi_on_close): Don't warn about UCNs
unless UCN checking is on.
(maybe_warn_bidi_on_char): Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/Wbidi-chars-10.c: Turn on UCN checking.
* c-c++-common/Wbidi-chars-11.c: Likewise.
* c-c++-common/Wbidi-chars-14.c: Likewise.
* c-c++-common/Wbidi-chars-16.c: Likewise.
* c-c++-common/Wbidi-chars-17.c: Likewise.
* c-c++-common/Wbidi-chars-4.c: Likewise.
* c-c++-common/Wbidi-chars-5.c: Likewise.
* c-c++-common/Wbidi-chars-6.c: Likewise.
* c-c++-common/Wbidi-chars-7.c: Likewise.
* c-c++-common/Wbidi-chars-8.c: Likewise.
* c-c++-common/Wbidi-chars-9.c: Likewise.
* c-c++-common/Wbidi-chars-ranges.c: Likewise.
* c-c++-common/Wbidi-chars-18.c: New test.
* c-c++-common/Wbidi-chars-19.c: New test.
* c-c++-common/Wbidi-chars-20.c: New test.
* c-c++-common/Wbidi-chars-21.c: New test.
* c-c++-common/Wbidi-chars-22.c: New test.
* c-c++-common/Wbidi-chars-23.c: New test.
As the testcase show, sometimes _Pragma is turned into CPP_PRAGMA
.. CPP_PRAGMA_EOL tokens, even when it might still need to be
stringized later on. We are then ICEing because we don't handle
stringification of CPP_PRAGMA or CPP_PRAGMA_EOL, but trying to
reconstruct the exact tokens with exact spacing after it has been
lowered is very hard. So, instead this patch ensures we don't
lower _Pragma during expand_arg calls, but only later when
cpp_get_token_1 is called outside of expand_arg.
2021-11-22 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tobias@codesourcery.com>
PR preprocessor/103165
libcpp/
* internal.h (struct lexer_state): Add ignore__Pragma field.
* macro.c (builtin_macro): Don't interpret _Pragma if
pfile->state.ignore__Pragma.
(expand_arg): Temporarily set pfile->state.ignore__Pragma to 1.
gcc/testsuite/
* c-c++-common/gomp/pragma-3.c: New test.
* c-c++-common/gomp/pragma-4.c: New test.
* c-c++-common/gomp/pragma-5.c: New test.
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From a link below:
"An issue was discovered in the Bidirectional Algorithm in the Unicode
Specification through 14.0. It permits the visual reordering of
characters via control sequences, which can be used to craft source code
that renders different logic than the logical ordering of tokens
ingested by compilers and interpreters. Adversaries can leverage this to
encode source code for compilers accepting Unicode such that targeted
vulnerabilities are introduced invisibly to human reviewers."
More info:
https://nvd.nist.gov/vuln/detail/CVE-2021-42574https://trojansource.codes/
This is not a compiler bug. However, to mitigate the problem, this patch
implements -Wbidi-chars=[none|unpaired|any] to warn about possibly
misleading Unicode bidirectional control characters the preprocessor may
encounter.
The default is =unpaired, which warns about improperly terminated
bidirectional control characters; e.g. a LRE without its corresponding PDF.
The level =any warns about any use of bidirectional control characters.
This patch handles both UCNs and UTF-8 characters. UCNs designating
bidi characters in identifiers are accepted since r204886. Then r217144
enabled -fextended-identifiers by default. Extended characters in C/C++
identifiers have been accepted since r275979. However, this patch still
warns about mixing UTF-8 and UCN bidi characters; there seems to be no
good reason to allow mixing them.
We warn in different contexts: comments (both C and C++-style), string
literals, character constants, and identifiers. Expectedly, UCNs are ignored
in comments and raw string literals. The bidirectional control characters
can nest so this patch handles that as well.
I have not included nor tested this at all with Fortran (which also has
string literals and line comments).
Dave M. posted patches improving diagnostic involving Unicode characters.
This patch does not make use of this new infrastructure yet.
PR preprocessor/103026
gcc/c-family/ChangeLog:
* c.opt (Wbidi-chars, Wbidi-chars=): New option.
gcc/ChangeLog:
* doc/invoke.texi: Document -Wbidi-chars.
libcpp/ChangeLog:
* include/cpplib.h (enum cpp_bidirectional_level): New.
(struct cpp_options): Add cpp_warn_bidirectional.
(enum cpp_warning_reason): Add CPP_W_BIDIRECTIONAL.
* internal.h (struct cpp_reader): Add warn_bidi_p member
function.
* init.c (cpp_create_reader): Set cpp_warn_bidirectional.
* lex.c (bidi): New namespace.
(get_bidi_utf8): New function.
(get_bidi_ucn): Likewise.
(maybe_warn_bidi_on_close): Likewise.
(maybe_warn_bidi_on_char): Likewise.
(_cpp_skip_block_comment): Implement warning about bidirectional
control characters.
(skip_line_comment): Likewise.
(forms_identifier_p): Likewise.
(lex_identifier): Likewise.
(lex_string): Likewise.
(lex_raw_string): Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/Wbidi-chars-1.c: New test.
* c-c++-common/Wbidi-chars-2.c: New test.
* c-c++-common/Wbidi-chars-3.c: New test.
* c-c++-common/Wbidi-chars-4.c: New test.
* c-c++-common/Wbidi-chars-5.c: New test.
* c-c++-common/Wbidi-chars-6.c: New test.
* c-c++-common/Wbidi-chars-7.c: New test.
* c-c++-common/Wbidi-chars-8.c: New test.
* c-c++-common/Wbidi-chars-9.c: New test.
* c-c++-common/Wbidi-chars-10.c: New test.
* c-c++-common/Wbidi-chars-11.c: New test.
* c-c++-common/Wbidi-chars-12.c: New test.
* c-c++-common/Wbidi-chars-13.c: New test.
* c-c++-common/Wbidi-chars-14.c: New test.
* c-c++-common/Wbidi-chars-15.c: New test.
* c-c++-common/Wbidi-chars-16.c: New test.
* c-c++-common/Wbidi-chars-17.c: New test.
This patch adds support to GCC's diagnostic subsystem for escaping certain
bytes and Unicode characters when quoting source code.
Specifically, this patch adds a new flag rich_location::m_escape_on_output
which is a hint from a diagnostic that non-ASCII bytes in the pertinent
lines of the user's source code should be escaped when printed.
The patch sets this for the following diagnostics:
- when complaining about stray bytes in the program (when these
are non-printable)
- when complaining about "null character(s) ignored");
- for -Wnormalized= (and generate source ranges for such warnings)
The escaping is controlled by a new option:
-fdiagnostics-escape-format=[unicode|bytes]
For example, consider a diagnostic involing a source line containing the
string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
(a stray UTF-8 trailing byte), followed by the string "after", where the
diagnostic highlights the U+03C0 character.
By default, this line will be printed verbatim to the user when
reporting a diagnostic at it, as:
beforeπXafter
^
(using X for the stray byte to avoid putting invalid UTF-8 in this
commit message)
If the diagnostic sets the "escape" flag, it will be printed as:
before<U+03C0><BF>after
^~~~~~~~
with -fdiagnostics-escape-format=unicode (the default), or as:
before<CF><80><BF>after
^~~~~~~~
if the user supplies -fdiagnostics-escape-format=bytes.
This only affects how the source is printed; it does not affect
how column numbers that are printed (as per -fdiagnostics-column-unit=
and -fdiagnostics-column-origin=).
gcc/c-family/ChangeLog:
* c-lex.c (c_lex_with_flags): When complaining about non-printable
CPP_OTHER tokens, set the "escape on output" flag.
gcc/ChangeLog:
* common.opt (fdiagnostics-escape-format=): New.
(diagnostics_escape_format): New enum.
(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
* diagnostic-format-json.cc (json_end_diagnostic): Add
"escape-source" attribute.
* diagnostic-show-locus.c
(exploc_with_display_col::exploc_with_display_col): Replace
"tabstop" param with a cpp_char_column_policy and add an "aspect"
param. Use these to compute m_display_col accordingly.
(struct char_display_policy): New struct.
(layout::m_policy): New field.
(layout::m_escape_on_output): New field.
(def_policy): New function.
(make_range): Update for changes to exploc_with_display_col ctor.
(default_print_decoded_ch): New.
(width_per_escaped_byte): New.
(escape_as_bytes_width): New.
(escape_as_bytes_print): New.
(escape_as_unicode_width): New.
(escape_as_unicode_print): New.
(make_policy): New.
(layout::layout): Initialize new fields. Update m_exploc ctor
call for above change to ctor.
(layout::maybe_add_location_range): Update for changes to
exploc_with_display_col ctor.
(layout::calculate_x_offset_display): Update for change to
cpp_display_width.
(layout::print_source_line): Pass policy
to cpp_display_width_computation. Capture cpp_decoded_char when
calling process_next_codepoint. Move printing of source code to
m_policy.m_print_cb.
(line_label::line_label): Pass in policy rather than context.
(layout::print_any_labels): Update for change to line_label ctor.
(get_affected_range): Pass in policy rather than context, updating
calls to location_compute_display_column accordingly.
(get_printed_columns): Likewise, also for cpp_display_width.
(correction::correction): Pass in policy rather than tabstop.
(correction::compute_display_cols): Pass m_policy rather than
m_tabstop to cpp_display_width.
(correction::m_tabstop): Replace with...
(correction::m_policy): ...this.
(line_corrections::line_corrections): Pass in policy rather than
context.
(line_corrections::m_context): Replace with...
(line_corrections::m_policy): ...this.
(line_corrections::add_hint): Update to use m_policy rather than
m_context.
(line_corrections::add_hint): Likewise.
(layout::print_trailing_fixits): Likewise.
(selftest::test_display_widths): New.
(selftest::test_layout_x_offset_display_utf8): Update to use
policy rather than tabstop.
(selftest::test_one_liner_labels_utf8): Add test of escaping
source lines.
(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
use policy rather than tabstop.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): New.
(selftest::test_escaping_bytes_2): New.
(selftest::diagnostic_show_locus_c_tests): Call the new tests.
* diagnostic.c (diagnostic_initialize): Initialize
context->escape_format.
(convert_column_unit): Update to use default character width policy.
(selftest::test_diagnostic_get_location_text): Likewise.
* diagnostic.h (enum diagnostics_escape_format): New enum.
(diagnostic_context::escape_format): New field.
* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
(-fdiagnostics-format=): Add "escape-source" attribute to examples
of JSON output, and document it.
* input.c (location_compute_display_column): Pass in "policy"
rather than "tabstop", passing to
cpp_byte_column_to_display_column.
(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
* input.h (class cpp_char_column_policy): New forward decl.
(location_compute_display_column): Pass in "policy" rather than
"tabstop".
* opts.c (common_handle_option): Handle
OPT_fdiagnostics_escape_format_.
* selftest.c (temp_source_file::temp_source_file): New ctor
overload taking a size_t.
* selftest.h (temp_source_file::temp_source_file): Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
"escape-source" attribute.
* c-c++-common/diagnostic-format-json-2.c: Likewise.
* c-c++-common/diagnostic-format-json-3.c: Likewise.
* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
* c-c++-common/diagnostic-format-json-5.c: Likewise.
* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
* gcc.dg/encoding-issues-bytes.c: New test.
* gcc.dg/encoding-issues-unicode.c: New test.
* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
"escape-source" attribute.
* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
* gfortran.dg/diagnostic-format-json-3.F90: Likewise.
libcpp/ChangeLog:
* charset.c (convert_escape): Use encoding_rich_location when
complaining about nonprintable unknown escape sequences.
(cpp_display_width_computation::::cpp_display_width_computation):
Pass in policy rather than tabstop.
(cpp_display_width_computation::process_next_codepoint): Add "out"
param and populate *out if non-NULL.
(cpp_display_width_computation::advance_display_cols): Pass NULL
to process_next_codepoint.
(cpp_byte_column_to_display_column): Pass in policy rather than
tabstop. Pass NULL to process_next_codepoint.
(cpp_display_column_to_byte_column): Pass in policy rather than
tabstop.
* errors.c (cpp_diagnostic_get_current_location): New function,
splitting out the logic from...
(cpp_diagnostic): ...here.
(cpp_warning_at): New function.
(cpp_pedwarning_at): New function.
* include/cpplib.h (cpp_warning_at): New decl for rich_location.
(cpp_pedwarning_at): Likewise.
(struct cpp_decoded_char): New.
(struct cpp_char_column_policy): New.
(cpp_display_width_computation::cpp_display_width_computation):
Replace "tabstop" param with "policy".
(cpp_display_width_computation::process_next_codepoint): Add "out"
param.
(cpp_display_width_computation::m_tabstop): Replace with...
(cpp_display_width_computation::m_policy): ...this.
(cpp_byte_column_to_display_column): Replace "tabstop" param with
"policy".
(cpp_display_width): Likewise.
(cpp_display_column_to_byte_column): Likewise.
* include/line-map.h (rich_location::escape_on_output_p): New.
(rich_location::set_escape_on_output): New.
(rich_location::m_escape_on_output): New.
* internal.h (cpp_diagnostic_get_current_location): New decl.
(class encoding_rich_location): New.
* lex.c (skip_whitespace): Use encoding_rich_location when
complaining about null characters.
(warn_about_normalization): Generate a source range when
complaining about improperly normalized tokens, rather than just a
point, and use encoding_rich_location so that the source code
is escaped on printing.
* line-map.c (rich_location::rich_location): Initialize
m_escape_on_output.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This defect really required building header-units and include translation
of pieces of the standard library. This adds smarts to the modules
test harness to do that -- accept .X files as the source file, but
provide '-x c++-system-header $HDR' in the options. The .X file will
be considered by the driver to be a linker script and ignored (with a
warning).
Using this we can add 2 tests that end up building list_initializer
and iostream, along with a test that iostream's build
include-translates list_initializer's #include. That discovered a set
of issues with the -flang-info-include-translate=HDR handling, also
fixed and documented here.
PR c++/99023
gcc/cp/
* module.cc (canonicalize_header_name): Use
cpp_probe_header_unit.
(maybe_translate_include): Fix note_includes comparison.
(init_modules): Fix note_includes string termination.
libcpp/
* include/cpplib.h (cpp_find_header_unit): Rename to ...
(cpp_probe_header_unit): ... this.
* internal.h (_cp_find_header_unit): Declare.
* files.c (cpp_find_header_unit): Break apart to ..
(test_header_unit): ... this, and ...
(_cpp_find_header_unit): ... and, or and ...
(cpp_probe_header_unit): ... this.
* macro.c (cpp_get_token_1): Call _cpp_find_header_unit.
gcc/
* doc/invoke.texi (flang-info-include-translate): Document header
lookup behaviour.
gcc/testsuite/
* g++.dg/modules/modules.exp: Bail on cross-testing. Add support
for .X files.
* g++.dg/modules/pr99023_a.X: New.
* g++.dg/modules/pr99023_b.X: New.
Deferred macros are needed for C++ modules. Header units may export
macro definitions and undefinitions. These are resolved lazily at the
point of (potential) use. (The language specifies that, it's not just
a useful optimization.) Thus, identifier nodes grow a 'deferred'
field, which fortunately doesn't expand the structure on 64-bit
systems as there was padding there. This is non-zero on NT_MACRO
nodes, if the macro is deferred. When such an identifier is lexed, it
is resolved via a callback that I added recently. That will either
provide the macro definition, or discover it there was an overriding
undef. Either way the identifier is no longer a deferred macro.
Notice it is now possible for NT_MACRO nodes to have a NULL macro
expansion.
libcpp/
* include/cpplib.h (struct cpp_hashnode): Add deferred field.
(cpp_set_deferred_macro): Define.
(cpp_get_deferred_macro): Declare.
(cpp_macro_definition): Reformat, add overload.
(cpp_macro_definition_location): Deal with deferred macro.
(cpp_alloc_token_string, cpp_compare_macro): Declare.
* internal.h (_cpp_notify_macro_use): Return bool
(_cpp_maybe_notify_macro_use): Likewise.
* directives.c (do_undef): Check macro is not undef before
warning.
(do_ifdef, do_ifndef): Deal with deferred macro.
* expr.c (parse_defined): Likewise.
* lex.c (cpp_allocate_token_string): Break out of ...
(create_literal): ... here. Call it.
(cpp_maybe_module_directive): Deal with deferred macro.
* macro.c (cpp_get_token_1): Deal with deferred macro.
(warn_of_redefinition): Deal with deferred macro.
(compare_macros): Rename to ...
(cpp_compare_macro): ... here. Make extern.
(cpp_get_deferred_macro): New.
(_cpp_notify_macro_use): Deal with deferred macro, return bool
indicating definedness.
(cpp_macro_definition): Deal with deferred macro.
This adds the capability to locate the main file on the user or system
include paths. That's extremely useful to users building header
units. Searching has to be requiested (plain header-unit compilation
will not search). Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.
libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file. Set main_loc
* files.c (cpp_retrofit_as_include): New.
In preparing module patch 7 I realized there was a cleanup I could
make to simplify it. This is that cleanup. Also, when doing the
cleanup I noticed some macros had been turned into inline functions,
but not renamed to the preprocessors internal namespace
(_cpp_$INTERNAL rather than cpp_$USER). Thus, this renames those
functions, deletes an internal field of the file structure, and
determines whether we're in the main file by comparing to
pfile->main_file, the _cpp_file of the main file.
libcpp/
* internal.h (cpp_in_system_header): Rename to ...
(_cpp_in_system_header): ... here.
(cpp_in_primary_file): Rename to ...
(_cpp_in_main_source_file): ... here. Compare main_file equality
and check main_search value.
* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
* macro.c (_cpp_builtin_macro_text): Likewise.
(replace_args): Likewise.
* directives.c (do_include_next): Likewise.
(do_pragma_once, do_pragma_system_header): Likewise.
* files.c (struct _cpp_file): Delete main_file field.
(pch_open): Check pfile->main_file equality.
(make_cpp_file): Drop cpp_reader parm, don't set main_file.
(_cpp_find_file): Adjust.
(_cpp_stack_file): Check pfile->main_file equality.
(struct report_missing_guard_data): Add cpp_reader field.
(report_missing_guard): Check pfile->main_file equality.
(_cpp_report_missing_guards): Adjust.
C++20 modules introduces a new kind of preprocessor directive -- a
module directive. These are directives but without the leading '#'.
We have to detect them by sniffing the start of a logical line. When
detected we replace the initial identifiers with unspellable tokens
and pass them through to the language parser the same way deferred
pragmas are. There's a PRAGMA_EOL at the logical end of line too.
One additional complication is that we have to do header-name lexing
after the initial tokens, and that requires changes in the macro-aware
piece of the preprocessor. The above sniffer sets a counter in the
lexer state, and that triggers at the appropriate point. We then do
the same header-name lexing that occurs on a #include directive or
has_include pseudo-macro. Except that the header name ends up in the
token stream.
A couple of token emitters need to deal with the new token possibility.
gcc/c-family/
* c-lex.c (c_lex_with_flags): CPP_HEADER_NAMEs can now be seen.
libcpp/
* include/cpplib.h (struct cpp_options): Add module_directives
option.
(NODE_MODULE): New node flag.
(struct cpp_hashnode): Make rid-code a bitfield, increase bits in
flags and swap with type field.
* init.c (post_options): Create module-directive identifier nodes.
* internal.h (struct lexer_state): Add directive_file_token &
n_modules fields. Add module node enumerator.
* lex.c (cpp_maybe_module_directive): New.
(_cpp_lex_token): Call it.
(cpp_output_token): Add '"' around CPP_HEADER_NAME token.
(do_peek_ident, do_peek_module): New.
(cpp_directives_only): Detect module-directive lines.
* macro.c (cpp_get_token_1): Deal with directive_file_token
triggering.
This patch adds LC_MODULE as a map kind, used to indicate a c++
module. Unlike a regular source file, it only contains a single
location, and the source locations in that module are represented by
ordinary locations whose 'included_from' location is the module.
It also exposes some entry points that modules will use to create
blocks of line maps.
In the original posting, I'd missed the deletion of the
linemap_enter_macro from internal.h. That's included here.
libcpp/
* include/line-map.h (enum lc_reason): Add LC_MODULE.
(MAP_MODULE_P): New.
(line_map_new_raw): Declare.
(linemap_enter_macro): Move declaration from internal.h
(linemap_module_loc, linemap_module_reparent)
(linemap_module_restore): Declare.
(linemap_lookup_macro_indec): Declare.
* internal.h (linemap_enter_macro): Moved to line-map.h.
* line-map.c (linemap_new_raw): New, broken out of ...
(new_linemap): ... here. Call it.
(LAST_SOURCE_LINE_LOCATION): New.
(liemap_module_loc, linemap_module_reparent)
(linemap_module_restore): New.
(linemap_lookup_macro_index): New, broken out of ...
(linemap_macro_map_lookup): ... here. Call it.
(linemap_dump): Add module dump.
Joseph pointed me at cb_get_source_date_epoch, which allows repeatable
builds and solves a FIXME I had on the modules branch. Unfortunately
it's used exclusively to generate __DATE__ and __TIME__ values, which
fallback to using a time(2) call. It'd be nicer if the preprocessor
made whatever time value it determined available to the rest of the
compiler. So this patch adds a new cpp_get_date function, which
abstracts the call to the get_source_date_epoch hook, or uses time
directly. The value is cached. Thus the timestamp I end up putting
on CMI files matches __DATE__ and __TIME__ expansions. That seems
worthwhile.
libcpp/
* include/cpplib.h (enum class CPP_time_kind): New.
(cpp_get_date): Declare.
* internal.h (struct cpp_reader): Replace source_date_epoch with
time_stamp and time_stamp_kind.
* init.c (cpp_create_reader): Initialize them.
* macro.c (_cpp_builtin_macro_text): Use cpp_get_date.
(cpp_get_date): Broken out from _cpp_builtin_macro_text and
genericized.
Our macro use hook passes a location, but doesn't recieve it from the
using location. This patch adds the extra location_t parameter and
passes it though.
A second cleanup is breaking out the macro comparison code from the
redefinition warning. That;ll turn out useful for modules.
Finally, there's a filename comparison needed for the location
optimization of rewinding from line 2 (occurs during the emission of
builtin macros).
libcpp/
* internal.h (_cpp_notify_macro_use): Add location parm.
(_cpp_maybe_notify_macro_use): Likewise.
* directives.c (_cpp_do_file_change): Check we've not changed file
when optimizing a rewind.
(do_ifdef): Pass location to _cpp_maybe_notify_macro_use.
(do_ifndef): Likewise. Delete obsolete comment about powerpc.
* expr.c (parse_defined): Pass location to
_cpp_maybe_notify_macro_use.
* macro.c (enter_macro_context): Likewise.
(warn_of_redefinition): Break out helper function. Call it.
(compare_macros): New function broken out of warn_of_redefinition.
(_cpp_new_macro): Zero all fields.
(_cpp_notify_macro_use): Add location parameter.
We inject EOF tokens between macro argument lists, but had
confused/stale logic in the non-fn invocation. Renamed the magic
'eof' token, as it's now only used for macro argument termination.
Always rewind the non-OPEN_PAREN token.
libcpp/
* internal.h (struct cpp_reader): Rename 'eof' field to 'endarg'.
* init.c (cpp_create_reader): Adjust.
* macro.c (collect_args): Use endarg for separator. Always rewind
in the not-fn case.
gcc/testsuite/
* c-c++-common/cpp/pr97471.c: New.
Using the tokenizer to sniff for an initial line marker for
preprocessed input is a little brittle, particularly with
-fdirectives-only. If there is no marker we'll happily munch initial
comments. This patch directly sniffs the buffer. This is safe
because the initial line marker was machine generated and must be
right at the beginning of the file. Anything else is not such a line
marker. The same is true for the initial directory marker. For that
tokenizing the string is simplest, but at that point it's either a
regular line marker or a directory marker. If it's a regular marker,
unwinding tokens is fine.
libcpp/
* internal.h (enum include_type): Rename IT_MAIN_INJECT to
IT_PRE_MAIN.
* init.c (cpp_read_main_file): If there is no line marker, adjust
the initial line marker.
(read_original_filename): Return bool, peek the buffer directly
before trying to tokenize.
(read_original_directory): Likewise. Directly prod the string
literal.
* files.c (_cpp_stack_file): Adjust for IT_PRE_MAIN change.
With C++ module header units it becomes important to distinguish
between macros defined in forced headers (& commandline & builtins)
from those defined in the header file being processed. We weren't
making that easy because we treated the builtins and command-line
locations somewhat file-like, with incrementing line numbers, and
showing them as included from line 1 of the main file. This patch does
3 things:
0) extend the idiom that 'line 0' of a file means 'the file as a whole'
1) builtins and command-line macros are shown as-if included from line zero.
2) when emitting preprocessed output we keep resetting the line number
so that re-reading that preprocessed output will get the same set of
locations for the command line etc.
For instance the new c-c++-common/cpp/line-2.c test, now emits
In file included from <command-line>:
./line-2.h:4:2: error: #error wrong
4 | #error wrong
| ^~~~~
line-2.c:3:11: error: macro "bill" passed 1 arguments, but takes just 0
3 | int bill(1);
| ^
In file included from <command-line>:
./line-2.h:3: note: macro "bill" defined here
3 | #define bill() 2
|
Before it told you about including from <command-line>:31.
the preprocessed output looks like:
...
(There's a new optimization in do_line_marker to stop each of these
line markers causing a new line map. We can simply rewind the
location, and keep using the same line map.)
libcpp/
* directives.c (do_linemarker): Optimize rewinding to line zero.
* files.c (_cpp_stack_file): Start on line zero when about to inject
headers.
(cpp_push_include, cpp_push_default_include): Use highest_line as
the location.
* include/cpplib.h (cpp_read_main_file): Add injecting parm.
* init.c (cpp_read_main_file): Likewise, inform _cpp_stack_file.
* internal.h (enum include_type): Add IT_MAIN_INJECT.
gcc/c-family/
* c-opts.c (c_common_post_options): Add 'injecting' arg to
cpp_read_main_file.
(c_finish_options): Add linemap_line_start calls for builtin and cmd
maps. Force token position to line_table's highest line.
* c-ppoutput.c (print_line_1): Refactor, print line zero.
(cb_define): Always increment source line.
gcc/testsuite/
* c-c++-common/cpp/line-2.c: New.
* c-c++-common/cpp/line-2.h: New.
* c-c++-common/cpp/line-3.c: New.
* c-c++-common/cpp/line-4.c: New.
* c-c++-common/cpp/line-4.h: New.
_cpp_find_file has 3 bool arguments, at most one of which is ever set.
Ripe for replacing with a 4-state enum. Also, this is C++, so
'typedef struct Foo Foo' is unnecessary.
* internal.h (typedef _cpp_file): Delete, unnecessary in C++.
(enum _cpp_find_file_kind): New.
(_cpp_find_file): Use it, not 3 bools.
* files.c (_cpp_find_file): Use _cpp_find_file_kind enum, not
bools.
(cpp_make_system_header): Break overly long line.
(_cpp_stack_include, _cpp_fake_include)
(_cpp_do_file_change, _cpp_compare_file_date, _cpp_has_header): Adjust.
* init.c (cpp_read_main): Adjust _cpp_find_file call.
This fixes a bunch of poorly formatted decls, marks some getters as
PURE, deletes some C-relevant bool hackery, and finally uses a
passed-in location rather than deducing a closely-related but not
necessarily the same location.
* include/cpplib.h (cpp_get_otions, cpp_get_callbacks)
(cpp_get_deps): Mark as PURE.
* include/line-map.h (get_combined_adhoc_loc)
(get_location_from_adhoc_loc, get_pure_location): Reformat decls.
* internal.h (struct lexer_state): Clarify comment.
* system.h: Remove now-unneeded bool hackery.
* files.c (_cpp_find_file): Store LOC not highest_location.
The existing directives-only code (a) punched a hole through the
libcpp interface and (b) didn't support raw string literals. This
reimplements this preprocessing mode. I added a proper callback
interface, and adjusted c-ppoutput to use it. Sadly I cannot get rid
of the libcpp/internal.h include for unrelated reasons.
The new scanner is in lex.x, and works doing some backwards scanning
when it finds a charater of interest. This reduces the number of
cases one has to deal with in forward scanning. It may have different
failure mode than forward scanning on bad tokenization.
Finally, Moved some cpp tests from the c-specific dg.gcc/cpp directory
to the c-c++-common/cpp shared directory,
libcpp/
* directives-only.c: Delete.
* Makefile.in (libcpp_a_OBJS, libcpp_a_SOURCES): Remove it.
* include/cpplib.h (enum CPP_DO_task): New enum.
(cpp_directive_only_preprocess): Declare.
* internal.h (_cpp_dir_only_callbacks): Delete.
(_cpp_preprocess_dir_only): Delete.
* lex.c (do_peek_backslask, do_peek_next, do_peek_prev): New.
(cpp_directives_only_process): New implementation.
gcc/c-family/
Reimplement directives only processing.
* c-ppoutput.c (token_streamer): Ne.
(directives_only_cb): New. Swallow ...
(print_lines_directives_only): ... this.
(scan_translation_unit_directives_only): Reimplment using the
published interface.
gcc/testsuite/
* gcc.dg/cpp/counter-[23].c: Move to c-c+_-common/cpp.
* gcc.dg/cpp/dir-only-*: Likewise.
* c-c++-common/cpp/dir-only-[78].c: New.
The clever hack of '#define __has_include __has_include' breaks -dD
and -fdirectives-only, because that emits definitions. This turns
__has_include into a proper builtin macro. Thus it's never emitted
via -dD, and because use outside of directive processing is undefined,
we can just expand it anywhere.
PR preprocessor/93452
* internal.h (struct spec_nodes): Drop n__has_include{,_next}.
* directives.c (lex_macro_node): Don't check __has_include redef.
* expr.c (eval_token): Drop __has_include eval.
(parse_has_include): Move to ...
* macro.c (builtin_has_include): ... here.
(_cpp_builtin_macro_text): Eval __has_include{,_next}.
* include/cpplib.h (enum cpp_builtin_type): Add BT_HAS_INCLUDE{,_NEXT}.
* init.c (builtin_array): Add them.
(cpp_init_builtins): Drop __has_include{,_next} init here ...
* pch.c (cpp_read_state): ... and here.
* traditional.c (enum ls): Drop has_include states ...
(_cpp_scan_out_logical_line): ... and here.
__has_include is funky in that it is macro-like from the POV of #ifdef and
friends, but lexes its parenthesize argument #include-like. We were
failing the second part of that, because we used a forwarding macro to an
internal name, and hence always lexed the argument in macro-parameter
context. We componded that by not setting the right flag when lexing, so
it didn't even know. Mostly users got lucky.
This reimplements the handline.
1) Remove the forwarding, but declare object-like macros that
expand to themselves. This satisfies the #ifdef requirement
2) Correctly set angled_brackets when lexing the parameter. This tells
the lexer (a) <...> is a header name and (b) "..." is too (not a string).
3) Remove the in__has_include lexer state, just tell find_file that that's
what's happenning, so it doesn't emit an error.
We lose the (undocumented) ability to #undef __has_include. That may well
have been an accident of implementation. There are no tests for it.
We gain __has_include behaviour for all users of the preprocessors -- not
just the C-family ones that defined a forwarding macro.
libcpp/
PR preprocessor/80005
* include/cpplib.h (BT_HAS_ATTRIBUTE): Fix comment.
* internal.h (struct lexer_state): Delete in__has_include field.
(struct spec_nodes): Rename n__has_include{,_next}__ fields.
(_cpp_defined_macro_p): New.
(_cpp_find_file): Add has_include parm.
* directives.c (lex_macro_node): Combine defined,
__has_inline{,_next} checking.
(do_ifdef, do_ifndef): Use _cpp_defined_macro_p.
(_cpp_init_directives): Refactor.
* expr.c (parse_defined): Use _cpp_defined_macro_p.
(eval_token): Adjust parse_has_include calls.
(parse_has_include): Add OP parameter. Reimplement.
* files.c (_cpp_find_file): Add HAS_INCLUDE parm. Use it to
inhibit error message.
(_cpp_stack_include): Adjust _cpp_find_file call.
(_cpp_fake_include, _cpp_compare_file_date): Likewise.
(open_file_failed): Remove in__has_include check.
(_cpp_has_header): Adjust _cpp_find_file call.
* identifiers.c (_cpp_init_hashtable): Don't init
__has_include{,_next} here ...
* init.c (cpp_init_builtins): ... init them here. Define as
macros.
(cpp_read_main_file): Adjust _cpp_find_file call.
* pch.c (cpp_read_state): Adjust __has_include{,_next} access.
* traditional.c (_cpp_scan_out_locgical_line): Likewise.
gcc/c-family/
PR preprocessor/80005
* c-cppbuiltins.c (c_cpp_builtins): Don't define __has_include{,_next}.
gcc/testsuite/
PR preprocessor/80005
* g++.dg/cpp1y/feat-cxx14.C: Adjust.
* g++.dg/cpp1z/feat-cxx17.C: Adjust.
* g++.dg/cpp2a/feat-cxx2a.C: Adjust.
* g++.dg/cpp/pr80005.C: New.
libcpp/ChangeLog
2019-09-19 Lewis Hyatt <lhyatt@gmail.com>
PR c/67224
* charset.c (_cpp_valid_utf8): New function to help lex UTF-8 tokens.
* internal.h (_cpp_valid_utf8): Declare.
* lex.c (forms_identifier_p): Use it to recognize UTF-8 identifiers.
(_cpp_lex_direct): Handle UTF-8 in identifiers and CPP_OTHER tokens.
Do all work in "default" case to avoid slowing down typical code paths.
Also handle $ and UCN in the default case for consistency.
gcc/Changelog
2019-09-19 Lewis Hyatt <lhyatt@gmail.com>
PR c/67224
* doc/cpp.texi: Document support for extended characters in
identifiers.
* doc/cppopts.texi: Likewise.
gcc/testsuite/ChangeLog
2019-09-19 Lewis Hyatt <lhyatt@gmail.com>
PR c/67224
* c-c++-common/cpp/ucnid-2011-1-utf8.c: New test.
* g++.dg/cpp/ucnid-1-utf8.C: New test.
* g++.dg/cpp/ucnid-2-utf8.C: New test.
* g++.dg/cpp/ucnid-3-utf8.C: New test.
* g++.dg/cpp/ucnid-4-utf8.C: New test.
* g++.dg/other/ucnid-1-utf8.C: New test.
* gcc.dg/cpp/ucnid-1-utf8.c: New test.
* gcc.dg/cpp/ucnid-10-utf8.c: New test.
* gcc.dg/cpp/ucnid-11-utf8.c: New test.
* gcc.dg/cpp/ucnid-12-utf8.c: New test.
* gcc.dg/cpp/ucnid-13-utf8.c: New test.
* gcc.dg/cpp/ucnid-14-utf8.c: New test.
* gcc.dg/cpp/ucnid-15-utf8.c: New test.
* gcc.dg/cpp/ucnid-2-utf8.c: New test.
* gcc.dg/cpp/ucnid-3-utf8.c: New test.
* gcc.dg/cpp/ucnid-4-utf8.c: New test.
* gcc.dg/cpp/ucnid-6-utf8.c: New test.
* gcc.dg/cpp/ucnid-7-utf8.c: New test.
* gcc.dg/cpp/ucnid-9-utf8.c: New test.
* gcc.dg/ucnid-1-utf8.c: New test.
* gcc.dg/ucnid-10-utf8.c: New test.
* gcc.dg/ucnid-11-utf8.c: New test.
* gcc.dg/ucnid-12-utf8.c: New test.
* gcc.dg/ucnid-13-utf8.c: New test.
* gcc.dg/ucnid-14-utf8.c: New test.
* gcc.dg/ucnid-15-utf8.c: New test.
* gcc.dg/ucnid-16-utf8.c: New test.
* gcc.dg/ucnid-2-utf8.c: New test.
* gcc.dg/ucnid-3-utf8.c: New test.
* gcc.dg/ucnid-4-utf8.c: New test.
* gcc.dg/ucnid-5-utf8.c: New test.
* gcc.dg/ucnid-6-utf8.c: New test.
* gcc.dg/ucnid-7-utf8.c: New test.
* gcc.dg/ucnid-8-utf8.c: New test.
* gcc.dg/ucnid-9-utf8.c: New test.
From-SVN: r275979
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01971.html
* internal.h (enum include_type): Add IT_MAIN, IT_DIRECTIVE_HWM,
IT_HEADER_HWM.
(_cpp_stack_file): Take include_type, not a bool.
* files.c (_cpp_find_file): Refactor to not hide an if inside a
for conditional.
(should_stack_file): Break apart to ...
(is_known_idempotent_file, has_unique_contents): ... these.
(_cpp_stack_file): Replace IMPORT boolean with include_type enum.
Refactor to use new predicates. Do linemap compensation here ...
(_cpp_stack_include): ... not here.
* init.c (cpp_read_main_file): Pass IT_MAIN to _cpp_stack_file.
From-SVN: r275034
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01904.html
* directives-only.c (_cpp_preprocess_dir_only): Use false, not
zero for _cpp_handle_directive call.
* directives.c (_cpp_handle_directive): Indented is bool.
* files.c (struct _cpp_file): Make bools 1 bit bitfields.
* internal.h (enum include_type): Reformat and comment.
(struct cpp_buffer): Make flags 1 bit bitfields.
(_cpp_handle_directive): Indented is bool.
From-SVN: r274999