This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, Zack Weinberg <zackw at panix dot com>
- Cc: Mike FABIAN <mfabian at redhat dot com>
- Date: Fri, 4 Aug 2017 11:14:44 +0200 (CEST)
- Subject: Re: Improved check-localedef script
- Authentication-results: sourceware.org; auth=none
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com>
- Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>
3.08.2017 23:17 Zack Weinberg <zackw@panix.com> wrote:
>
> [...]
> ... and finds dozens and dozens of errors. The full list is attached,
> [...]
Thank you, Zack. The list is huge and it will take time to process
properly, but here are comments on a few of the errors:
> localedata/locales/br_FR... (charset: iso8859-1)
> localedata/locales/br_FR:122: string not representable in iso8859-1:
> 006D 0065 0072 0063 02BC 0068 0065 0072
> [...]
Most probably this is because of <U02BC>, the Unicode modifier letter
apostrophe. To be representable in iso8859-1 it needs to be converted
to the ASCII apostrophe <U0027>. Can we please have this in the
conversion script? It is really necessary, as br_FR must be converted
to both UTF-8 and ISO 8859-1.
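
A minimal sketch of the substitution I have in mind, in Python. The names
FALLBACKS and to_charset are hypothetical and not part of the existing
script; the fallback is applied only when the target charset cannot encode
the original string, so the UTF-8 output keeps <U02BC>:

```python
# Hypothetical fallback table: code points to substitute when the
# target charset cannot represent the original character.
FALLBACKS = {0x02BC: 0x0027}  # MODIFIER LETTER APOSTROPHE -> APOSTROPHE


def to_charset(text, charset):
    """Encode text to charset, applying FALLBACKS only if needed."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        # Retry with the substitutions; this still raises if other
        # unrepresentable characters remain.
        return text.translate(FALLBACKS).encode(charset)


word = "merc\u02bcher"  # 006D 0065 0072 0063 02BC 0068 0065 0072
print(to_charset(word, "iso8859-1"))  # b"merc'her"
print(to_charset(word, "utf-8"))      # <U02BC> is preserved here
```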
> localedata/locales/ca_ES... (charset: iso8859-1)
> localedata/locales/ca_ES:87: string not representable in iso8859-1:
> 20AC
This is the euro (€) sign. Can we replace it with anything else?
"EUR"? Probably not. Should we stop supporting ca_ES in iso8859-1
and support iso8859-15 only since it includes euro? On the other hand
we have this:
> localedata/locales/ca_ES@euro... (charset: iso8859-15) OK
But do we still need "@euro" variants for countries which adopted
the euro long enough ago? Weren't they supposed to be used only in
the transition period (1999-2002), when both the old currencies
and the euro were in use?
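
For reference, the difference between the two charsets for the euro sign
can be checked with a few lines of Python. This is only an illustration
using Python's own codecs, not glibc's charmaps, and the helper name
representable is made up:

```python
EURO = "\u20ac"  # EURO SIGN, the 20AC reported for ca_ES:87


def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


for charset in ("iso8859-1", "iso8859-15", "utf-8"):
    print(charset, representable(EURO, charset))
```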
> localedata/locales/cs_CZ... (charset: iso8859-2)
> localedata/locales/cs_CZ:477: string not representable in iso8859-2:
> 00C6 00C6
> localedata/locales/cs_CZ:478: string not representable in iso8859-2:
> 00C6 00C6
> [cut the rest]
These are the collating tables. They are necessary for UTF-8 but I'm
not sure what to do with them in an 8-bit charset. I think the
conversion scripts should skip the unrepresentable characters.
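
Skipping could look roughly like the sketch below. It assumes the
collating entries are already available as Python strings; the function
names and the sample list are invented for illustration (only the 00C6 00C6
entry comes from the report above):

```python
def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def filter_collation(entries, charset):
    """Keep only the collating entries that charset can encode."""
    return [entry for entry in entries if representable(entry, charset)]


# Made-up sample: U+00C1, then 00C6 00C6 as in cs_CZ:477, then U+010C.
entries = ["\u00c1", "\u00c6\u00c6", "\u010c"]
print(filter_collation(entries, "iso8859-2"))
```

With iso8859-2 the 00C6 00C6 entry is dropped and the other two are kept;
with UTF-8 nothing is dropped.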
> localedata/locales/da_DK... (charset: iso8859-1)
> localedata/locales/da_DK:145: string not representable in iso8859-1:
> 0041 0308
This is a false positive: 0308 is a combining diaeresis, so
0041 0308 produces A with diaeresis (Ä), which is representable in
iso8859-1 as C4. Even a standalone diaeresis is representable, as A8.
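
One possible way to avoid this kind of false positive is to compose the
string (NFC) before testing it. A rough Python sketch, with a hypothetical
helper name, and not a claim about how the script currently works:

```python
import unicodedata


def representable_nfc(text, charset):
    """Check representability after composing combining sequences (NFC)."""
    composed = unicodedata.normalize("NFC", text)
    try:
        composed.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


seq = "\u0041\u0308"  # A + COMBINING DIAERESIS, as in da_DK:145
print(unicodedata.normalize("NFC", seq).encode("iso8859-1"))  # b'\xc4'
print(representable_nfc(seq, "iso8859-1"))                    # True
```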
To be continued.
Regards,
Rafal