[PATCH 2/2] Add new C.UTF-8 locale (Bug 17318)
Andreas Schwab
schwab@linux-m68k.org
Mon Jun 29 07:54:29 GMT 2020
On Jun 29 2020, Carlos O'Donell via Libc-alpha wrote:
> @@ -125,67 +146,122 @@ def process_charmap(flines, outfile):
>
> <U0010> /x10 DATA LINK ESCAPE
> <U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A>
> - %<UD800> /xed/xa0/x80 <Non Private Use High Surrogate, First>
> - %<UDB7F> /xed/xad/xbf <Non Private Use High Surrogate, Last>
> + <UD800> /xed/xa0/x80 <Non Private Use High Surrogate, First>
> + <UDB7F> /xed/xad/xbf <Non Private Use High Surrogate, Last>
> <U0010FFC0>..<U0010FFFD> /xf4/x8f/xbf/x80 <Plane 16 Private Use>
>
> + Note that old glibc UTF-8 charmap left the surrogates commented out.
> + We keep the surrogate entries because we want to be able to sort the
> + invalid values into a consistent location.
> +
> '''
> fields_start = []
> + fields_end = []
> for line in flines:
> fields = line.split(";")
> - # Some characters have “<control>” as their name. We try to
> - # use the “Unicode 1.0 Name” (10th field in
> - # UnicodeData.txt) for them.
> - #
> - # The Characters U+0080, U+0081, U+0084 and U+0099 have
> - # “<control>” as their name but do not even have aa
> - # ”Unicode 1.0 Name”. We could write code to take their
> - # alternate names from NameAliases.txt.
> + # Some characters have “<control>” as their name. We try to
> + # use the “Unicode 1.0 Name” (10th field in
> + # UnicodeData.txt) for them.
> + #
> + # The Characters U+0080, U+0081, U+0084 and U+0099 have
> + # “<control>” as their name but do not even have aa
s/aa/a/
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."