FreeChainXenon/gcc: GCC modified for the FreeChainXenon project

GCC modified for the FreeChainXenon project

Find a file

Jonathan Wakely 37a4c5c23a libstdc++: Add Unicode-aware width estimation for std::format This implements the requirements in the following proposals, which dictate how std::format deals with non-ASCII strings: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1868r1.html https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2572r1.html https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2675r1.pdf There are two parts to this. The width estimation for strings must only count the width of the first character in an extended grapheme cluster. That requires implementing the algorithm for detecting cluster breaks, which requires a number of lookup tables of the grapheme cluster break properties (and Indic_Conjunct_Break and Extended_Pictographic properties) of every code point. Additionally, some characters have a field width of 2, which requires another lookup table of field widths for every code point. The tables added in this commit do not contain entries for every code point from 0 to 0x10FFFF as that would be very inefficient and use too much memory. Instead the tables only contain the code points that form an "edge" for a property, omitting all the code points that have the same property as the preceding one. We can use a binary search to find the closest code point in the table that is not greater than the one we're looking for. The tables are generated by a new Python script added to the contrib/unicode directory, and a new data file downloaded from the Unicode Consortium website. The rules for extended grapheme cluster breaking are implemented for the latest Unicode standard, version 15.1.0. libstdc++-v3/ChangeLog: * include/Makefile.am: Add new headers. * include/Makefile.in: Regenerate. * include/bits/unicode.h: New file. * include/bits/unicode-data.h: New file. * include/std/format: Include <bits/unicode.h>. (__literal_encoding_is_utf8): Move to <bits/unicode.h>. (_Spec::_M_fill): Change type to char32_t. (_Spec::_M_parse_fill_and_align): Read a Unicode scalar value instead of a single character. (__write_padded): Change __fill_char parameter to char32_t and encode it into the output. (__formatter_str::format): Use new __unicode::__field_width and __unicode::__truncate functions. * include/std/ostream: Adjust namespace qualification for __literal_encoding_is_utf8. * include/std/print: Likewise. * src/c++23/print.cc: Add [[unlikely]] attribute to error path. * testsuite/ext/unicode/view.cc: New test. * testsuite/std/format/functions/format.cc: Add missing examples from the standard demonstrating alignment with non-ASCII characters. Add examples checking correct handling of extended grapheme clusters. contrib/ChangeLog: * unicode/README: Add notes about generating libstdc++ tables. * unicode/GraphemeBreakProperty.txt: New file. * unicode/emoji-data.txt: New file. * unicode/gen_libstdcxx_unicode_data.py: New file.		2024-01-08 01:14:50 +00:00
.github
c++tools
config
contrib	libstdc++: Add Unicode-aware width estimation for std::format	2024-01-08 01:14:50 +00:00
fixincludes
gcc	RISC-V: Fix avl-type operand index error for ZVBC	2024-01-08 00:55:57 +00:00
gnattools
gotools
include
INSTALL
libada
libatomic
libbacktrace	Update copyright years.	2024-01-05 08:54:28 +01:00
libcc1
libcody
libcpp	Daily bump.	2024-01-05 00:18:48 +00:00
libdecnumber
libffi
libgcc
libgfortran	Daily bump.	2024-01-08 00:16:43 +00:00
libgm2	Daily bump.	2024-01-06 00:18:04 +00:00
libgo
libgomp	Daily bump.	2024-01-07 00:17:43 +00:00
libgrust
libiberty
libitm	Daily bump.	2024-01-04 00:18:45 +00:00
libobjc
libphobos	Update copyright years.	2024-01-05 08:54:28 +01:00
libquadmath	Daily bump.	2024-01-04 00:18:45 +00:00
libsanitizer
libssp
libstdc++-v3	libstdc++: Add Unicode-aware width estimation for std::format	2024-01-08 01:14:50 +00:00
libvtv
lto-plugin
maintainer-scripts
zlib
.dir-locals.el
.gitattributes
.gitignore
ABOUT-NLS
ar-lib
ChangeLog
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess
config.rpath
config.sub
configure
configure.ac
COPYING
COPYING.LIB
COPYING.RUNTIME
COPYING3
COPYING3.LIB
depcomp
install-sh
libtool-ldflags
libtool.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
lt~obsolete.m4
MAINTAINERS
Makefile.def
Makefile.in
Makefile.tpl
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
SECURITY.txt
symlink-tree
test-driver
ylwrap

README

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.