gcc/contrib
Jonathan Wakely 37a4c5c23a libstdc++: Add Unicode-aware width estimation for std::format
This implements the requirements in the following proposals, which
dictate how std::format deals with non-ASCII strings:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1868r1.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2572r1.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2675r1.pdf

There are two parts to this. The width estimation for strings must only
count the width of the first character in an extended grapheme cluster.
That requires implementing the algorithm for detecting cluster breaks,
which requires a number of lookup tables of the grapheme cluster break
properties (and Indic_Conjunct_Break and Extended_Pictographic
properties) of every code point. Additionally, some characters have a
field width of 2, which requires another lookup table of field widths
for every code point.  The tables added in this commit do not contain
entries for every code point from 0 to 0x10FFFF as that would be very
inefficient and use too much memory. Instead the tables only contain the
code points that form an "edge" for a property, omitting all the code
points that have the same property as the preceding one. We can use a
binary search to find the closest code point in the table that is not
greater than the one we're looking for.
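
As a rough illustration of the lookup described above, here is a minimal
sketch in Python. It is not the libstdc++ implementation: the table name,
the function name, and the table contents are hypothetical and only show how
an "edge" table can be searched with a binary search.

```python
from bisect import bisect_right

# Hypothetical edge table: each entry is (first code point of a run, width).
# Only the code points where the width changes are stored. The values are
# illustrative and are not copied from the generated libstdc++ tables.
WIDTH_EDGES = [
    (0x0000, 1),
    (0x1100, 2),
    (0x1160, 1),
    (0xAC00, 2),
    (0xD7A4, 1),
]

# Keys only, so bisect can compare plain integers.
EDGE_CODE_POINTS = [cp for cp, _ in WIDTH_EDGES]


def field_width(code_point):
    """Return the width of the run that contains code_point."""
    # bisect_right gives the index of the first edge greater than
    # code_point, so the entry just before it is the closest edge that is
    # not greater than code_point.
    i = bisect_right(EDGE_CODE_POINTS, code_point) - 1
    return WIDTH_EDGES[i][1]
```

Because the first edge is code point 0, every non-negative code point falls
into some run, and the lookup cost grows with the logarithm of the number of
edges rather than with the number of code points.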

The tables are generated by a new Python script added to the
contrib/unicode directory, and a new data file downloaded from the
Unicode Consortium website.
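
The following sketch shows one way a generator could collapse per-range
property data into such an edge table. It is an assumption for illustration
only and does not reproduce gen_libstdcxx_unicode_data.py; the function name
build_edges and its parameters are hypothetical.

```python
def build_edges(ranges, default, last=0x10FFFF):
    """Collapse sorted, non-overlapping (first, last, value) ranges into
    a list of (code point, value) edges, filling gaps with default."""
    edges = []

    def emit(code_point, value):
        # Skip an edge whose value matches the preceding run, so that
        # adjacent ranges with the same property merge into one run.
        if not edges or edges[-1][1] != value:
            edges.append((code_point, value))

    next_cp = 0  # first code point not yet covered by an edge
    for first, end, value in ranges:
        if first > next_cp:
            emit(next_cp, default)  # gap before this range
        emit(first, value)
        next_cp = end + 1
    if next_cp <= last:
        emit(next_cp, default)  # tail after the final range
    return edges
```

For example, two ranges with value 2 and a default of 1 produce five edges,
one at each point where the value changes, starting from code point 0.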

The rules for extended grapheme cluster breaking are implemented for the
latest Unicode standard, version 15.1.0.

libstdc++-v3/ChangeLog:

	* include/Makefile.am: Add new headers.
	* include/Makefile.in: Regenerate.
	* include/bits/unicode.h: New file.
	* include/bits/unicode-data.h: New file.
	* include/std/format: Include <bits/unicode.h>.
	(__literal_encoding_is_utf8): Move to <bits/unicode.h>.
	(_Spec::_M_fill): Change type to char32_t.
	(_Spec::_M_parse_fill_and_align): Read a Unicode scalar value
	instead of a single character.
	(__write_padded): Change __fill_char parameter to char32_t and
	encode it into the output.
	(__formatter_str::format): Use new __unicode::__field_width and
	__unicode::__truncate functions.
	* include/std/ostream: Adjust namespace qualification for
	__literal_encoding_is_utf8.
	* include/std/print: Likewise.
	* src/c++23/print.cc: Add [[unlikely]] attribute to error path.
	* testsuite/ext/unicode/view.cc: New test.
	* testsuite/std/format/functions/format.cc: Add missing examples
	from the standard demonstrating alignment with non-ASCII
	characters. Add examples checking correct handling of extended
	grapheme clusters.

contrib/ChangeLog:

	* unicode/README: Add notes about generating libstdc++ tables.
	* unicode/GraphemeBreakProperty.txt: New file.
	* unicode/emoji-data.txt: New file.
	* unicode/gen_libstdcxx_unicode_data.py: New file.
2024-01-08 01:14:50 +00:00
gcc-changelog contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
header-tools Daily bump. 2022-09-01 00:17:39 +00:00
legacy Update copyright years. 2024-01-03 12:19:35 +01:00
mdcompact contrib: add mdcompact 2023-10-05 17:41:54 +02:00
reghunt Update copyright years. 2024-01-03 12:19:35 +01:00
regression Update copyright years. 2024-01-03 12:19:35 +01:00
testsuite-management Update copyright years. 2024-01-03 12:19:35 +01:00
unicode libstdc++: Add Unicode-aware width estimation for std::format 2024-01-08 01:14:50 +00:00
vim-gcc-dev Update copyright years. 2024-01-03 12:19:35 +01:00
analyze_brprob.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
analyze_brprob_spec.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
bench-stringop Update copyright years. 2024-01-03 12:19:35 +01:00
ChangeLog Daily bump. 2024-01-06 00:18:04 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
check-internal-format-escaping.py Fix flake8 errors. 2021-11-02 10:27:27 +01:00
check-MAINTAINERS.py Update copyright years. 2024-01-03 12:19:35 +01:00
check-params-in-docs.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
check_GNU_style.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
check_GNU_style.sh Update copyright years. 2024-01-03 12:19:35 +01:00
check_GNU_style_lib.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
check_makefile_deps.sh Update copyright years. 2024-01-03 12:19:35 +01:00
check_warning_flags.sh Update copyright years. 2024-01-03 12:19:35 +01:00
clang-format Update copyright years. 2024-01-03 12:19:35 +01:00
compare-all-tests Update copyright years. 2024-01-03 12:19:35 +01:00
compare-debug Update copyright years. 2024-01-03 12:19:35 +01:00
compare-lto Update copyright years. 2024-01-03 12:19:35 +01:00
compare_tests compare_tests: distinguish c-c++-common results by tool 2023-12-21 01:02:41 -03:00
compare_two_ftime_report_sets Update copyright years. 2024-01-03 12:19:35 +01:00
compareSumTests3 Update copyright years. 2024-01-03 12:19:35 +01:00
config-list.mk Tweak language choice in config-list.mk 2023-12-02 13:49:53 +00:00
dg-cmp-results.sh Update copyright years. 2024-01-03 12:19:35 +01:00
dg-extract-results.py Update copyright years. 2024-01-03 12:19:35 +01:00
dg-extract-results.sh Update copyright years. 2024-01-03 12:19:35 +01:00
dg-out-generator.pl Update copyright years. 2024-01-03 12:19:35 +01:00
dglib.pm Update copyright years. 2024-01-03 12:19:35 +01:00
download_prerequisites Update copyright years. 2024-01-03 12:19:35 +01:00
filter-clang-warnings.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
filter_gcc_for_doxygen contrib: port doxygen script to Python3 2023-04-28 16:42:17 +02:00
filter_knr2ansi.pl
filter_params.py contrib: port doxygen script to Python3 2023-04-28 16:42:17 +02:00
gcc-git-customization.sh contrib: add git gcc-style alias 2023-12-20 18:08:16 -05:00
gcc.doxy contrib: doxygen: add gcc/analyzer subdirectory to INPUT 2022-12-06 13:26:56 -05:00
gcc_build Update copyright years. 2024-01-03 12:19:35 +01:00
gcc_update libgrust: Add libproc_macro and build system 2023-12-14 13:58:57 +01:00
gen_autofdo_event.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
gennews Update copyright years. 2024-01-03 12:19:35 +01:00
git-add-user-branch.sh contrib: Change 'remote' for personal branches and add branch creation script 2020-01-24 14:38:16 +00:00
git-add-vendor-branch.sh contrib: script to create a new vendor branch 2020-01-22 10:06:50 +00:00
git-backport.py Update copyright years. 2024-01-03 12:19:35 +01:00
git-commit-mklog.py Update copyright years. 2024-01-03 12:19:35 +01:00
git-descr.sh gcc-descr: by default show revision for HEAD 2022-07-04 12:27:43 +02:00
git-fetch-vendor.sh contrib: Fix a typo in contrib/git-fetch-vendor.sh 2022-08-18 17:34:58 +02:00
git-fix-changelog.py Update copyright years. 2024-01-03 12:19:35 +01:00
git-undescr.sh contrib: Fix up git-descr.sh regression [PR102664] 2022-03-10 09:42:03 +01:00
gthr_supp_vxw_5x.c
index-prop
jit-coverage-report.py Update copyright years. 2024-01-03 12:19:35 +01:00
make-obstacks-texi.pl
make_sunver.pl contrib: Fix make_sunver.pl warning 2023-02-17 13:33:25 +01:00
mark_spam.py contrib: Remove C-style comments from Python files 2024-01-05 13:57:05 +00:00
mklog.py Update copyright years. 2024-01-03 12:19:35 +01:00
paranoia.cc Change references of .c files to .cc files 2022-01-17 22:12:07 +01:00
patch_tester.sh Update copyright years. 2024-01-03 12:19:35 +01:00
prepare-commit-msg Update copyright years. 2024-01-03 12:19:35 +01:00
prepare_patch.sh Update copyright years. 2024-01-03 12:19:35 +01:00
prerequisites.md5 *: add modern gettext 2023-11-14 00:47:11 +01:00
prerequisites.sha512 *: add modern gettext 2023-11-14 00:47:11 +01:00
repro_fail contrib: Fix nonportable shell syntax in "test" and "[" commands [PR105831] 2023-05-18 14:01:40 +01:00
test_installed Update copyright years. 2024-01-03 12:19:35 +01:00
test_mklog.py Update copyright years. 2024-01-03 12:19:35 +01:00
test_recheck
test_summary
texi2pod.pl Update copyright years. 2024-01-03 12:19:35 +01:00
uninclude
unused_functions.py Update copyright years. 2024-01-03 12:19:35 +01:00
update-copyright.py Small tweaks for update-copyright.py 2024-01-03 12:11:32 +01:00
vimrc Update copyright years. 2024-01-03 12:19:35 +01:00
warn_summary