
The following patch updates GCC from Unicode 15.0, which we picked up last
year, to Unicode 15.1. The plaintext below is only a pseudo-patch: the parts
of the wget-downloaded or regenerated files that are too big have been
replaced with ...; the full patch is attached compressed.

Apparently Unicode forgot to add a new range to Table 4-8, which we are using,
but the other files make it clear what should have been added; I've filed a
bug report against Unicode.

2023-11-14  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Adjust glibc git commit hash, number of Unicode
	data files to be updated and latest Unicode version.
	* unicode/from_glibc/utf8_gen.py: Update from glibc.
	* unicode/UnicodeData.txt: Update from Unicode 15.1.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/DerivedCoreProperties.txt: Likewise.
	* unicode/PropList.txt: Likewise.
libcpp/
	* makeucnid.cc (write_copyright): Update copyright year.
	* makeuname2c.cc (write_copyright): Likewise.
	(struct generated): Update latest Unicode version.
	(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH range,
	which was forgotten in Table 4-8 but is clearly expected to be there
	given the 15.1 additions.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.
	* generated_cpp_wcwidth.h: Regenerated.
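The generated_ranges change concerns code points whose names are derived from
a rule instead of being listed one by one. As a rough illustration only, the
Python sketch below applies the usual "CJK UNIFIED IDEOGRAPH-" plus uppercase
hex naming pattern to the new 2ebf0-2ee5d range. This is not the
makeuname2c.cc code; the function names generated_name and
lookup_generated_name are hypothetical.

```python
# Sketch only: names for a "generated" range are derived from the code point.
# NEW_RANGE is the range named in the ChangeLog; the helpers are hypothetical.
NEW_RANGE = (0x2EBF0, 0x2EE5D)
PREFIX = "CJK UNIFIED IDEOGRAPH-"


def generated_name(codepoint, first=NEW_RANGE[0], last=NEW_RANGE[1],
                   prefix=PREFIX):
    """Return the derived name for codepoint, or None if it is out of range."""
    if first <= codepoint <= last:
        return f"{prefix}{codepoint:X}"
    return None


def lookup_generated_name(name, first=NEW_RANGE[0], last=NEW_RANGE[1],
                          prefix=PREFIX):
    """Inverse mapping: return the code point for a derived name, or None."""
    if not name.startswith(prefix):
        return None
    digits = name[len(prefix):]
    try:
        codepoint = int(digits, 16)
    except ValueError:
        return None
    # Accept only the canonical spelling (uppercase hex, no prefix or padding).
    if f"{codepoint:X}" != digits:
        return None
    return codepoint if first <= codepoint <= last else None
```

For example, generated_name(0x2EBF0) yields "CJK UNIFIED IDEOGRAPH-2EBF0", and
a code point just past the end of the range yields None.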
This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.cc),
as well as a mechanism to update the information about codepoints permitted in
identifiers, which is encoded in libcpp/ucnid.h, and the mapping between Unicode
names and codepoints, which is encoded in libcpp/uname2c.h.

The idea is to produce the necessary lookup tables
(../../libcpp/{ucnid.h,uname2c.h,generated_cpp_wcwidth.h}) in a reproducible
way, starting from the following files that are distributed by the Unicode
Consortium:

  ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
  ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
  ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
  ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
  ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
  ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt

These files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.
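
Refreshing the six files by hand is easy to get wrong, so a small script can
help. The following Python sketch is one possible way to fetch them; it is not
part of GCC. The function names are hypothetical, and it assumes the URLs above
are still served (pass a different base_url if your mirror differs).

```python
# Sketch only: download the six Unicode data files into a directory.
import shutil
import urllib.request
from pathlib import Path

# Base URL as listed above; any replacement mirror is the caller's assumption.
BASE_URL = "ftp://ftp.unicode.org/Public/UNIDATA/"

DATA_FILES = [
    "UnicodeData.txt",
    "EastAsianWidth.txt",
    "PropList.txt",
    "DerivedNormalizationProps.txt",
    "DerivedCoreProperties.txt",
    "NameAliases.txt",
]


def data_file_urls(base_url=BASE_URL):
    """Return the full URL of every data file, in DATA_FILES order."""
    return [base_url + name for name in DATA_FILES]


def fetch_all(dest_dir=".", base_url=BASE_URL):
    """Download every data file into dest_dir, overwriting existing copies."""
    dest = Path(dest_dir)
    for name, url in zip(DATA_FILES, data_file_urls(base_url)):
        with urllib.request.urlopen(url) as response, \
                open(dest / name, "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling fetch_all() from this directory would replace the checked-in copies, so
review the result with git diff before regenerating any headers.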

In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's. To that end, the from_glibc/ subdirectory of this directory holds
the glibc Python code that implements their logic. This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
source code repository. The files copied from that repository are:

  localedata/unicode-gen/unicode_utils.py
  localedata/unicode-gen/utf8_gen.py

And the most recent versions added to GCC are from glibc git commit:
  71de3aead9fffe89556e80ebc94aa918d8ee7bca

The script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require. This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code. Similarly, makeucnid.cc in ../../libcpp contains the logic to
produce ucnid.h, and makeuname2c.cc the logic to produce uname2c.h.
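
To give a feel for the kind of input these scripts start from, here is a
minimal Python sketch that parses EastAsianWidth.txt-style lines
("codepoint-or-range ; value # comment") and merges adjacent ranges. It is an
illustration only and not the logic of gen_wcwidth.py or the glibc scripts;
both function names are hypothetical.

```python
# Sketch only: read "XXXX;V" or "XXXX..YYYY;V" lines and merge adjacent runs.


def parse_property_line(line):
    """Return (first, last, value), or None for a comment or blank line."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    codepoints, value = (field.strip() for field in data.split(";")[:2])
    if ".." in codepoints:
        first, last = codepoints.split("..")
    else:
        first = last = codepoints
    return int(first, 16), int(last, 16), value


def merge_ranges(entries):
    """Merge neighbouring ranges that carry the same value into single runs."""
    merged = []
    for first, last, value in sorted(entries):
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1] = (merged[-1][0], last, value)
        else:
            merged.append((first, last, value))
    return merged
```

Merging matters because a compact lookup table wants as few ranges as possible;
the real generators may apply further rules on top of the raw property values.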

The procedure to update GCC's Unicode support is the following:

1. Update the six Unicode data files from the above URLs.

2. Update the two glibc files in from_glibc/ from glibc's git. Update
   the commit hash above in this README.

3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
   (where X.Y is the version of the Unicode standard corresponding to the
   Unicode data files being used, most recently, 15.1.0).

4. Update Unicode Copyright years in libcpp/makeucnid.cc and in
   libcpp/makeuname2c.cc up to the year in which the Unicode
   standard has been released.

5. Compile makeucnid, e.g. with:
     g++ -O2 ../../libcpp/makeucnid.cc -o ../../libcpp/makeucnid

6. Generate ucnid.h as follows:
     ../../libcpp/makeucnid ../../libcpp/ucnid.tab UnicodeData.txt \
       DerivedNormalizationProps.txt DerivedCoreProperties.txt \
       > ../../libcpp/ucnid.h

7. Read the corresponding version of the Unicode standard and update the
   generated_ranges table in libcpp/makeuname2c.cc accordingly (in
   Unicode 15 all the needed information was in Table 4-8).

8. Compile makeuname2c, e.g. with:
     g++ -O2 ../../libcpp/makeuname2c.cc -o ../../libcpp/makeuname2c

9. Generate uname2c.h as follows:
     ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
       > ../../libcpp/uname2c.h
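
The mechanical steps above (3, 5, 6, 8 and 9) can be chained in a small
driver. The Python sketch below is one way to do that, under the assumption
that it is run from this directory; it is not shipped with GCC, and the names
build_steps and run_steps are hypothetical. Steps 1, 2, 4 and 7 still need a
human.

```python
# Sketch only: run the regeneration commands from the procedure in order.
import os
import subprocess

LIBCPP = "../../libcpp"


def build_steps(unicode_version):
    """Return (argv, output_file_or_None) pairs for steps 3, 5, 6, 8 and 9."""
    return [
        (["./gen_wcwidth.py", unicode_version],
         f"{LIBCPP}/generated_cpp_wcwidth.h"),
        (["g++", "-O2", f"{LIBCPP}/makeucnid.cc", "-o", f"{LIBCPP}/makeucnid"],
         None),
        ([f"{LIBCPP}/makeucnid", f"{LIBCPP}/ucnid.tab", "UnicodeData.txt",
          "DerivedNormalizationProps.txt", "DerivedCoreProperties.txt"],
         f"{LIBCPP}/ucnid.h"),
        (["g++", "-O2", f"{LIBCPP}/makeuname2c.cc", "-o",
          f"{LIBCPP}/makeuname2c"],
         None),
        ([f"{LIBCPP}/makeuname2c", "UnicodeData.txt", "NameAliases.txt"],
         f"{LIBCPP}/uname2c.h"),
    ]


def run_steps(steps):
    """Run each command; redirected output replaces its target only on success."""
    for argv, output in steps:
        if output is None:
            subprocess.run(argv, check=True)
            continue
        tmp = output + ".tmp"
        with open(tmp, "w") as out:
            subprocess.run(argv, check=True, stdout=out)
        os.replace(tmp, output)
```

A call such as run_steps(build_steps("15.1.0")) would then regenerate all three
headers. Writing to a temporary file first is a design choice of this sketch:
a failing generator does not leave a truncated header behind.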