[PATCH 2/2] Add new C.UTF-8 locale (Bug 17318)

Andreas Schwab schwab@linux-m68k.org
Mon Jun 29 07:54:29 GMT 2020


On Jun 29 2020, Carlos O'Donell via Libc-alpha wrote:

> @@ -125,67 +146,122 @@ def process_charmap(flines, outfile):
>  
>      <U0010>     /x10 DATA LINK ESCAPE
>      <U3400>..<U343F>     /xe3/x90/x80 <CJK Ideograph Extension A>
> -    %<UD800>     /xed/xa0/x80 <Non Private Use High Surrogate, First>
> -    %<UDB7F>     /xed/xad/xbf <Non Private Use High Surrogate, Last>
> +    <UD800>     /xed/xa0/x80 <Non Private Use High Surrogate, First>
> +    <UDB7F>     /xed/xad/xbf <Non Private Use High Surrogate, Last>
>      <U0010FFC0>..<U0010FFFD>     /xf4/x8f/xbf/x80 <Plane 16 Private Use>
>  
> +    Note that old glibc UTF-8 charmap left the surrogates commented out.
> +    We keep the surrogate entries because we want to be able to sort the
> +    invalid values into a consistent location.
> +
>      '''
>      fields_start = []
> +    fields_end = []
>      for line in flines:
>          fields = line.split(";")
> -         # Some characters have “<control>” as their name. We try to
> -         # use the “Unicode 1.0 Name” (10th field in
> -         # UnicodeData.txt) for them.
> -         #
> -         # The Characters U+0080, U+0081, U+0084 and U+0099 have
> -         # “<control>” as their name but do not even have aa
> -         # ”Unicode 1.0 Name”. We could write code to take their
> -         # alternate names from NameAliases.txt.
> +        # Some characters have “<control>” as their name. We try to
> +        # use the “Unicode 1.0 Name” (10th field in
> +        # UnicodeData.txt) for them.
> +        #
> +        # The Characters U+0080, U+0081, U+0084 and U+0099 have
> +        # “<control>” as their name but do not even have aa

s/aa/a/

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
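The quoted hunk describes two things without showing the code that does them. The first is the charmap line format: a symbol, the UTF-8 bytes, and a name, with the surrogates now kept rather than commented out. The second is the fallback to the "Unicode 1.0 Name" for `<control>` characters.

Below is a minimal sketch of both, under some assumptions:

- It handles single-code-point lines only. Range lines such as `<U3400>..<U343F>` are out of scope.
- The helper names (`ucs_symbol`, `charmap_bytes`, `charmap_name`, `charmap_line`) are hypothetical. They are not taken from the patch.
- The 4/8 hex-digit symbol width is inferred from the `<U0010>` and `<U0010FFC0>` spellings in the quoted docstring.

```python
from __future__ import annotations


def ucs_symbol(code_point: int) -> str:
    # Hypothetical helper: 4 hex digits inside the BMP, 8 beyond it,
    # matching the <U0010> / <U0010FFC0> spellings in the quoted docstring.
    if code_point < 0x10000:
        return f"<U{code_point:04X}>"
    return f"<U{code_point:08X}>"


def charmap_bytes(code_point: int) -> str:
    # "surrogatepass" lets Python apply the generic three-byte bit pattern
    # to U+D800..U+DFFF, which a strict UTF-8 encoder would reject.
    encoded = chr(code_point).encode("utf-8", "surrogatepass")
    return "".join(f"/x{byte:02x}" for byte in encoded)


def charmap_name(fields: list[str]) -> str:
    # Field 1 is the character name; field 10 (counting from 0) is the
    # "Unicode 1.0 Name" used as a fallback for "<control>" entries.
    name = fields[1]
    if name == "<control>" and fields[10]:
        return fields[10]
    # U+0080, U+0081, U+0084 and U+0099 have no Unicode 1.0 Name, so this
    # sketch keeps "<control>" (NameAliases.txt could supply alternatives).
    return name


def charmap_line(unicode_data_line: str) -> str:
    # Hypothetical: turn one UnicodeData.txt line into one charmap line.
    fields = unicode_data_line.split(";")
    code_point = int(fields[0], 16)
    return f"{ucs_symbol(code_point)}     {charmap_bytes(code_point)} {charmap_name(fields)}"


print(charmap_line("0010;<control>;Cc;0;BN;;;;;N;DATA LINK ESCAPE;;;;"))
# <U0010>     /x10 DATA LINK ESCAPE
print(charmap_line("D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;"))
# <UD800>     /xed/xa0/x80 <Non Private Use High Surrogate, First>
print(charmap_line("0080;<control>;Cc;0;BN;;;;;N;;;;;"))
# <U0080>     /xc2/x80 <control>
```

Encoding the surrogates with the ordinary bit pattern gives them byte sequences that sort in a predictable place. This matches the rationale the quoted docstring gives for keeping those entries.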


More information about the Libc-alpha mailing list