This is the mail archive of the
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, Zack Weinberg <zackw at panix dot com>
- Cc: Mike FABIAN <mfabian at redhat dot com>
- Date: Fri, 4 Aug 2017 11:14:44 +0200 (CEST)
- Subject: Re: Improved check-localedef script
- Authentication-results: sourceware.org; auth=none
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com>
- Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>
3.08.2017 23:17 Zack Weinberg <firstname.lastname@example.org> wrote:
> ... and finds dozens and dozens of errors. The full list is attached,
Thank you, Zack. This list is huge and it will take time to process
it properly, but here are comments on just some of the errors:
> localedata/locales/br_FR... (charset: iso8859-1)
> localedata/locales/br_FR:122: string not representable in iso8859-1:
> 006D 0065 0072 0063 02BC 0068 0065 0072
Most probably this is because of <U02BC> which is a Unicode apostrophe.
In order to be representable in iso8859-1 it needs to be converted
to an ASCII apostrophe <U0027>. Can we please have this in the conversion
script? This is really necessary as br_FR must be converted to both
UTF-8 and ISO 8859-1.
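To make the request concrete, here is a minimal Python sketch of such a fallback. The names FALLBACKS and to_charset are hypothetical and are not taken from the actual conversion or check script; the sketch only assumes that the substitution should be applied when the direct conversion fails, so that UTF-8 output keeps <U02BC>.

```python
# Hypothetical helper, not part of the real glibc scripts.
# Maps MODIFIER LETTER APOSTROPHE (U+02BC) to the ASCII apostrophe (U+0027).
FALLBACKS = {0x02BC: 0x0027}

def to_charset(text, charset):
    """Encode text; apply the fallback table only if direct encoding fails."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return text.translate(FALLBACKS).encode(charset)

# The code points reported for br_FR line 122.
codepoints = "006D 0065 0072 0063 02BC 0068 0065 0072"
word = "".join(chr(int(cp, 16)) for cp in codepoints.split())

latin1 = to_charset(word, "iso8859-1")  # apostrophe replaced
utf8 = to_charset(word, "utf-8")        # U+02BC preserved
print(latin1, utf8)
```

If a string contains some other unrepresentable character, the second encode still raises UnicodeEncodeError, so real errors are not hidden.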
> localedata/locales/ca_ES... (charset: iso8859-1)
> localedata/locales/ca_ES:87: string not representable in iso8859-1:
This is the euro (€) sign. Can we replace it with anything else?
"EUR"? Probably not. Should we stop supporting ca_ES in iso8859-1
and support iso8859-15 only since it includes euro? On the other hand
we have this:
> localedata/locales/ca_ES@euro... (charset: iso8859-15) OK
But do we still need "@euro" variants for countries which adopted
the euro long enough ago? Weren't they supposed to be
used in the transition period (1999-2002), when both the old currencies
and the euro were in use?
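The charset difference behind the two ca_ES results can be reproduced with Python's own codecs. This is only an illustration; the helper name representable is hypothetical and this is not how the check script itself is implemented.

```python
EURO = "\u20ac"  # EURO SIGN

def representable(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

in_latin1 = representable(EURO, "iso8859-1")   # no euro sign in ISO 8859-1
in_latin9 = representable(EURO, "iso8859-15")  # ISO 8859-15 added it
euro_byte = EURO.encode("iso8859-15")
print(in_latin1, in_latin9, euro_byte)
```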
> localedata/locales/cs_CZ... (charset: iso8859-2)
> localedata/locales/cs_CZ:477: string not representable in iso8859-2:
> 00C6 00C6
> localedata/locales/cs_CZ:478: string not representable in iso8859-2:
> 00C6 00C6
> [cut the rest]
These are the collating tables. They are necessary for UTF-8 but I'm not sure
what to do with them in an 8-bit charset. I think the conversion scripts
should skip the unrepresentable characters.
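A rough sketch of what "skip the unrepresentable characters" could mean, in Python. The function name and the sample entries are made up for illustration; real collation entries have more structure than plain strings, and whether skipping them is safe for the resulting locale is exactly the open question.

```python
def split_by_representability(strings, charset):
    """Separate strings that can be encoded in charset from those that cannot."""
    kept, skipped = [], []
    for s in strings:
        try:
            s.encode(charset)
        except UnicodeEncodeError:
            skipped.append(s)
        else:
            kept.append(s)
    return kept, skipped

# Illustrative entries only: U+010C exists in ISO 8859-2, U+00C6 does not.
entries = ["\u010c", "\u00c6\u00c6", "ch"]
kept, skipped = split_by_representability(entries, "iso8859-2")
print(kept, skipped)
```

Returning the skipped entries instead of dropping them silently would let the script report them as warnings rather than errors.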
> localedata/locales/da_DK... (charset: iso8859-1)
> localedata/locales/da_DK:145: string not representable in iso8859-1:
> 0041 0308
This is a false positive: 0308 is a combining diaeresis, so
0041 0308 produces A with diaeresis (Ä), which is representable in
iso8859-1 as C4. Even a standalone diaeresis is representable as A8.
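One possible way for a check to avoid this false positive is to compose the sequence (Unicode NFC) before testing it. The sketch below uses Python's standard unicodedata module; the function name is hypothetical, and whether localedef itself would accept the composed form in this context is an assumption that needs to be verified separately.

```python
import unicodedata

def representable_after_nfc(text, charset):
    """Compose combining sequences first, then test the encoding."""
    composed = unicodedata.normalize("NFC", text)
    try:
        composed.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

seq = "\u0041\u0308"  # A followed by COMBINING DIAERESIS
composed_bytes = unicodedata.normalize("NFC", seq).encode("iso8859-1")
ok = representable_after_nfc(seq, "iso8859-1")
standalone = "\u00a8".encode("iso8859-1")  # spacing DIAERESIS
print(ok, composed_bytes, standalone)
```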
To be continued.