gcc/contrib/unicode
David Malcolm 4f01ae3761 diagnostics: add support for "text art" diagrams
Existing text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance.  This makes it
hard to implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc).

This patch adds more flexible ways of creating text output:
- a canvas class, which can be "painted" to via random-access (rather
that sequentially)
- a table class for 2D grid layout, supporting items that span
multiple rows/columns
- a widget class for organizing diagrams hierarchically.

The patch also expands GCC's diagnostics subsystem so that diagnostics
can have "text art" diagrams - think ASCII art, but potentially
including some Unicode characters, such as box-drawing chars.

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

The patch adds a new "-fdiagnostics-text-art-charset=VAL" option, with
values:
- "none": don't emit diagrams (added to -fdiagnostics-plain-output)
- "ascii": use pure ASCII in diagrams
- "unicode": allow for conservative use of unicode drawing characters
(such as box-drawing characters).
- "emoji" (the default): as "unicode", but potentially allow for
conservative use of emoji in the output (such as U+26A0 WARNING SIGN).
I made it possible to disable emoji separately from unicode as I believe
there's a generation gap in acceptance of these characters (some older
programmers have a visceral reaction against them, whereas younger
programmers may have no problem with them).

Diagrams are emitted to stderr by default.  With SARIF output they are
captured as a location in "relatedLocations", with the diagram as a
code block in Markdown within a "markdown" property of a message.

This patch doesn't add any such diagram usage to GCC, saving that for
followups, apart from adding a plugin to the test suite to exercise the
functionality.

contrib/ChangeLog:
	* unicode/gen-box-drawing-chars.py: New file.
	* unicode/gen-combining-chars.py: New file.
	* unicode/gen-printable-chars.py: New file.

gcc/ChangeLog:
	* Makefile.in (OBJS-libcommon): Add text-art/box-drawing.o,
	text-art/canvas.o, text-art/ruler.o, text-art/selftests.o,
	text-art/style.o, text-art/styled-string.o, text-art/table.o,
	text-art/theme.o, and text-art/widget.o.
	* color-macros.h (COLOR_FG_BRIGHT_BLACK): New.
	(COLOR_FG_BRIGHT_RED): New.
	(COLOR_FG_BRIGHT_GREEN): New.
	(COLOR_FG_BRIGHT_YELLOW): New.
	(COLOR_FG_BRIGHT_BLUE): New.
	(COLOR_FG_BRIGHT_MAGENTA): New.
	(COLOR_FG_BRIGHT_CYAN): New.
	(COLOR_FG_BRIGHT_WHITE): New.
	(COLOR_BG_BRIGHT_BLACK): New.
	(COLOR_BG_BRIGHT_RED): New.
	(COLOR_BG_BRIGHT_GREEN): New.
	(COLOR_BG_BRIGHT_YELLOW): New.
	(COLOR_BG_BRIGHT_BLUE): New.
	(COLOR_BG_BRIGHT_MAGENTA): New.
	(COLOR_BG_BRIGHT_CYAN): New.
	(COLOR_BG_BRIGHT_WHITE): New.
	* common.opt (fdiagnostics-text-art-charset=): New option.
	(diagnostic-text-art.h): New SourceInclude.
	(diagnostic_text_art_charset) New Enum and EnumValues.
	* configure: Regenerate.
	* configure.ac (gccdepdir): Add text-art to loop.
	* diagnostic-diagram.h: New file.
	* diagnostic-format-json.cc (json_emit_diagram): New.
	(diagnostic_output_format_init_json): Wire it up to
	context->m_diagrams.m_emission_cb.
	* diagnostic-format-sarif.cc: Include "diagnostic-diagram.h" and
	"text-art/canvas.h".
	(sarif_result::on_nested_diagnostic): Move code to...
	(sarif_result::add_related_location): ...this new function.
	(sarif_result::on_diagram): New.
	(sarif_builder::emit_diagram): New.
	(sarif_builder::make_message_object_for_diagram): New.
	(sarif_emit_diagram): New.
	(diagnostic_output_format_init_sarif): Set
	context->m_diagrams.m_emission_cb to sarif_emit_diagram.
	* diagnostic-text-art.h: New file.
	* diagnostic.cc: Include "diagnostic-text-art.h",
	"diagnostic-diagram.h", and "text-art/theme.h".
	(diagnostic_initialize): Initialize context->m_diagrams and
	call diagnostics_text_art_charset_init.
	(diagnostic_finish): Clean up context->m_diagrams.m_theme.
	(diagnostic_emit_diagram): New.
	(diagnostics_text_art_charset_init): New.
	* diagnostic.h (text_art::theme): New forward decl.
	(class diagnostic_diagram): Likewise.
	(diagnostic_context::m_diagrams): New field.
	(diagnostic_emit_diagram): New decl.
	* doc/invoke.texi (Diagnostic Message Formatting Options): Add
	-fdiagnostics-text-art-charset=.
	(-fdiagnostics-plain-output): Add
	-fdiagnostics-text-art-charset=none.
	* gcc.cc: Include "diagnostic-text-art.h".
	(driver_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
	* opts-common.cc (decode_cmdline_options_to_array): Add
	"-fdiagnostics-text-art-charset=none" to expanded_args for
	-fdiagnostics-plain-output.
	* opts.cc: Include "diagnostic-text-art.h".
	(common_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
	* pretty-print.cc (pp_unicode_character): New.
	* pretty-print.h (pp_unicode_character): New decl.
	* selftest-run-tests.cc: Include "text-art/selftests.h".
	(selftest::run_tests): Call text_art_tests.
	* text-art/box-drawing-chars.inc: New file, generated by
	contrib/unicode/gen-box-drawing-chars.py.
	* text-art/box-drawing.cc: New file.
	* text-art/box-drawing.h: New file.
	* text-art/canvas.cc: New file.
	* text-art/canvas.h: New file.
	* text-art/ruler.cc: New file.
	* text-art/ruler.h: New file.
	* text-art/selftests.cc: New file.
	* text-art/selftests.h: New file.
	* text-art/style.cc: New file.
	* text-art/styled-string.cc: New file.
	* text-art/table.cc: New file.
	* text-art/table.h: New file.
	* text-art/theme.cc: New file.
	* text-art/theme.h: New file.
	* text-art/types.h: New file.
	* text-art/widget.cc: New file.
	* text-art/widget.h: New file.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-none.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c: New test.
	* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: New test plugin.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.

libcpp/ChangeLog:
	* charset.cc (get_cppchar_property): New function template, based
	on...
	(cpp_wcwidth): ...this function.  Rework to use the above.
	Include "combining-chars.inc".
	(cpp_is_combining_char): New function
	Include "printable-chars.inc".
	(cpp_is_printable_char): New function
	* combining-chars.inc: New file, generated by
	contrib/unicode/gen-combining-chars.py.
	* include/cpplib.h (cpp_is_combining_char): New function decl.
	(cpp_is_printable_char): New function decl.
	* printable-chars.inc: New file, generated by
	contrib/unicode/gen-printable-chars.py.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-21 21:49:00 -04:00
..
from_glibc Update copyright years. 2023-01-16 11:52:17 +01:00
DerivedCoreProperties.txt libcpp: Update cpp_wcwidth() to Unicode 15 2023-03-13 07:40:50 -04:00
DerivedNormalizationProps.txt libcpp: Update cpp_wcwidth() to Unicode 15 2023-03-13 07:40:50 -04:00
EastAsianWidth.txt libcpp: Update cpp_wcwidth() to Unicode 15 2023-03-13 07:40:50 -04:00
gen-box-drawing-chars.py diagnostics: add support for "text art" diagrams 2023-06-21 21:49:00 -04:00
gen-combining-chars.py diagnostics: add support for "text art" diagrams 2023-06-21 21:49:00 -04:00
gen-printable-chars.py diagnostics: add support for "text art" diagrams 2023-06-21 21:49:00 -04:00
gen_wcwidth.py
NameAliases.txt contrib: Update instructions regarding Unicode updates 2023-03-16 10:28:25 +01:00
PropList.txt libcpp: Update cpp_wcwidth() to Unicode 15 2023-03-13 07:40:50 -04:00
README contrib: Update instructions regarding Unicode updates 2023-03-16 10:28:25 +01:00
unicode-license.txt
UnicodeData.txt libcpp: Update cpp_wcwidth() to Unicode 15 2023-03-13 07:40:50 -04:00
utf8-dump.py

This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c),
as well as a mechanism to update the information about codepoints permitted in
identifiers, which is encoded in libcpp/ucnid.h, and mapping between Unicode
names and codepoints, which is encoded in libcpp/uname2c.h.

The idea is to produce the necessary lookup tables
(../../libcpp/{ucnid.h,uname2c.h,generated_cpp_wcwidth.h}) in a reproducible
way, starting from the following files that are distributed by the Unicode
Consortium:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt

These files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's.  To that end, we also put in this directory, in the from_glibc/
directory, the glibc python code that implements their logic.  This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
source code repository.  The files copied from that respository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py

And the most recent versions added to GCC are from glibc git commit:
4c721f24fc190d1dc935eb0bab283de7cf13182e

The script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require.  This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code.  Similarly, makeucnid.cc in ../../libcpp contains the logic to
produce ucnid.h.

The procedure to update GCC's Unicode support is the following:

1.  Update the five Unicode data files from the above URLs.

2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
    the commit number above in this README.

3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
    (where X.Y is the version of the Unicode standard corresponding to the
    Unicode data files being used, most recently, 15.0.0).

4.  Update Unicode Copyright years in libcpp/makeucnid.cc and in
    libcpp/makeuname2c.cc up to the year in which the Unicode
    standard has been released.

5.  Compile makeucnid, e.g. with:
      g++ -O2 ../../libcpp/makeucnid.cc -o ../../libcpp/makeucnid

6.  Generate ucnid.h as follows:
      ../../libcpp/makeucnid ../../libcpp/ucnid.tab UnicodeData.txt \
	DerivedNormalizationProps.txt DerivedCoreProperties.txt \
	> ../../libcpp/ucnid.h

7.  Read the corresponding Unicode's standard and update correspondingly
    generated_ranges table in libcpp/makeuname2c.cc (in Unicode 15 all
    the needed information was in Table 4-8).

8.  Compile makeuname2c, e.g. with:
      g++ -O2 ../../libcpp/makeuname2c.cc -o ../../libcpp/makeuname2c

9:  Generate uname2c.h as follows:
      ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
	> ../../libcpp/uname2c.h