This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Improved check-localedef script

From: Rafal Luzynski <digitalfreak at lingonborough dot com>
To: GNU C Library <libc-alpha at sourceware dot org>, Zack Weinberg <zackw at panix dot com>
Cc: Mike FABIAN <mfabian at redhat dot com>
Date: Fri, 4 Aug 2017 11:14:44 +0200 (CEST)
Subject: Re: Improved check-localedef script
Authentication-results: sourceware.org; auth=none
References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com>
Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>

3.08.2017 23:17 Zack Weinberg <zackw@panix.com> wrote:
>
> [...]
> ... and finds dozens and dozens of errors. The full list is attached,
> [...]

Thank you, Zack.  This list is huge and it will take time to process
it properly, but here are comments on just some of the errors:

> localedata/locales/br_FR... (charset: iso8859-1)
>   localedata/locales/br_FR:122: string not representable in iso8859-1:
>       006D 0065 0072 0063 02BC 0068 0065 0072
> [...]

Most probably this is because of <U02BC>, which is a Unicode apostrophe.
In order to be representable in iso8859-1 it needs to be converted
to an ASCII apostrophe <U0027>.  Can we please have this in the conversion
script?  This is really necessary as br_FR must be converted to both
UTF-8 and ISO 8859-1.
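
To illustrate the substitution, here is a minimal Python sketch.  It is
not the actual conversion script; the helper name ascii_apostrophe is
hypothetical, and the code points are the ones reported above for
br_FR line 122.

```python
# Hypothetical helper, not part of glibc or of the check-localedef script.
def ascii_apostrophe(text: str) -> str:
    """Replace U+02BC (modifier letter apostrophe) with U+0027."""
    return text.replace("\u02bc", "\u0027")


# The code points reported for localedata/locales/br_FR line 122.
reported = [0x006D, 0x0065, 0x0072, 0x0063, 0x02BC, 0x0068, 0x0065, 0x0072]
original = "".join(chr(cp) for cp in reported)
converted = ascii_apostrophe(original)

# encode() raises UnicodeEncodeError if anything is still unrepresentable.
encoded = converted.encode("iso8859-1")
print(converted, encoded)
```

Without the replacement, encoding the original string to iso8859-1
fails on <U02BC>.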

> localedata/locales/ca_ES... (charset: iso8859-1)
>   localedata/locales/ca_ES:87: string not representable in iso8859-1:
>       20AC

This is the euro (€) sign.  Can we replace it with anything else?
"EUR"?  Probably not.  Should we stop supporting ca_ES in iso8859-1
and support iso8859-15 only since it includes euro?  On the other hand
we have this:

> localedata/locales/ca_ES@euro... (charset: iso8859-15) OK

But do we still need "@euro" variants for countries which adopted
the euro long enough ago?  Weren't they supposed to be
used in the transition period (1999-2002), when both the old currencies
and the euro were in use?
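
As a quick check of the charset difference, the following Python sketch
uses Python's own codec tables (an assumption: they stand in for the
glibc charmaps here) to test whether <U20AC> is representable.  The
function name representable is hypothetical.

```python
# Hypothetical helper; Python codecs stand in for the glibc charmaps.
def representable(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


EURO = "\u20ac"

in_latin1 = representable(EURO, "iso8859-1")
in_latin9 = representable(EURO, "iso8859-15")
print(in_latin1, in_latin9)
```

In iso8859-15 the euro sign occupies byte A4.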

> localedata/locales/cs_CZ... (charset: iso8859-2)
>   localedata/locales/cs_CZ:477: string not representable in iso8859-2:
>       00C6 00C6
>   localedata/locales/cs_CZ:478: string not representable in iso8859-2:
>       00C6 00C6
> [cut the rest]

These are the collating tables.  They are necessary for UTF-8 but I'm not
sure what to do with them in an 8-bit charset.  I think the conversion
scripts should skip the unrepresentable characters.
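
A rough Python sketch of what "skip the unrepresentable characters"
could mean.  This is only an illustration under assumptions: the
function name keep_representable and the sample list of code points are
made up, and Python's iso8859-2 codec stands in for the glibc charmap.

```python
# Hypothetical filter; not taken from any existing glibc script.
def keep_representable(code_points, charset):
    """Return only the code points that can be encoded in charset."""
    kept = []
    for cp in code_points:
        try:
            chr(cp).encode(charset)
        except UnicodeEncodeError:
            continue  # skip characters the 8-bit charset cannot hold
        kept.append(cp)
    return kept


# Made-up sample: 00C1 and 010C exist in iso8859-2, 00C6 does not.
sample = [0x00C1, 0x00C6, 0x010C]
kept = keep_representable(sample, "iso8859-2")
print([f"{cp:04X}" for cp in kept])
```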

> localedata/locales/da_DK... (charset: iso8859-1)
>   localedata/locales/da_DK:145: string not representable in iso8859-1:
>       0041 0308

This is a false positive: 0308 is a combining diaeresis character so
0041 0308 produces A with diaeresis (Ä), which is representable in
iso8859-1 as C4.  Even a standalone diaeresis is representable as A8.
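
One possible way for a checker to avoid this false positive is to
compose the sequence (NFC normalization) before testing it.  The Python
sketch below only demonstrates the idea with the standard unicodedata
module; whether the check script should normalize this way is an
assumption on my part.

```python
import unicodedata

# The sequence reported for localedata/locales/da_DK line 145.
decomposed = "\u0041\u0308"

# NFC composes A + combining diaeresis into the single character U+00C4.
composed = unicodedata.normalize("NFC", decomposed)
encoded = composed.encode("iso8859-1")

# The standalone (spacing) diaeresis U+00A8 is also in iso8859-1.
diaeresis = "\u00a8".encode("iso8859-1")
print(f"{ord(composed):04X}", encoded, diaeresis)
```

Encoding the decomposed form directly fails, because the combining
character <U0308> on its own is not in iso8859-1.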

To be continued.

Regards,

Rafal

Follow-Ups:
- Re: Improved check-localedef script
  - From: Mike FABIAN
- Re: Improved check-localedef script
  - From: Zack Weinberg

References:
- Improved check-localedef script
  - From: Zack Weinberg
