This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 4 Aug 2017 11:25:16 +0200 (CEST)
- Subject: Re: Improved check-localedef script
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9d60e3bspn.fsf@redhat.com>
- Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>
4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:
> [...]
> I am not sure what to do about this one:
>
> ca_ES:87: string not representable in iso8859-1:
> 20AC
I've just written another email about it. :-)
> This is the euro symbol, the line from the source file is:
>
> currency_symbol "<U20AC>"
>
> SUPPORTED contains:
>
> ca_ES.UTF-8/UTF-8 \
> ca_ES/ISO-8859-1 \
> ca_ES@euro/ISO-8859-15 \
>
> But even though U+20AC cannot be converted to ISO-8859-1, the
> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>
> $ LC_ALL=ca_ES locale -k currency_symbol charmap
> currency_symbol="EUR"
> charmap="ISO-8859-1"
>
> So this does not cause an actual problem.
So the "€" character is effectively representable in ISO-8859-1,
because it can be transliterated to "EUR". This looks like a false
positive, then.
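The transliteration fallback described above can be sketched in Python. This is only an illustration of the idea, not how localedef or glibc implements it: the FALLBACKS table and the encode_with_fallback helper are hypothetical names invented for this example, and the single "EUR" entry is taken from the locale output quoted above.

```python
# Hypothetical fallback table for illustration only; glibc keeps its real
# transliteration rules in its locale data, not in a table like this.
FALLBACKS = {"\u20ac": "EUR"}


def encode_with_fallback(text, charset):
    """Encode text, substituting a fallback string for unencodable characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in FALLBACKS:
                raise  # no fallback known: genuinely not representable
            out += FALLBACKS[ch].encode(charset)
    return bytes(out)


euro = "\u20ac"
print(encode_with_fallback(euro, "iso-8859-1"))   # falls back to b'EUR'
print(encode_with_fallback(euro, "iso-8859-15"))  # encoded directly as b'\xa4'
```

A checker that only tries the direct conversion would report U+20AC as not representable in ISO-8859-1, while one that also considers fallbacks would not.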
> The ca_ES source file is not ASCII, it has
>
> % català
> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>
> So maybe I could just convert the file to UTF-8
> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
> to get rid of the check-localedef warning.
>
> Would that be OK?
I don't think that is OK. If I understand correctly, the statement
"the source file is ASCII" means that the individual characters used
to write it, '<', '2', '0', 'A', 'C', '>', are ASCII. They may describe
something more complex, such as <U00E0>. But even that is not UTF-8:
in UTF-8, U+00E0 would be the two bytes <C3> <A0> (UTF-8 uses 8-bit
code units). The closest charset would be UCS-2, or simply generic
Unicode code points.
Caution: we are mixing metalevels here: what characters we describe
vs what characters we use to describe. :-)
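The two levels can be shown with a small Python sketch. The decode_symbolic helper is a hypothetical name made up for this example, and it handles only the <Uxxxx> notation seen in the lang_name line quoted above, not the full locale source syntax.

```python
import re


def decode_symbolic(value):
    """Turn a string like '<U0063><U00E0>' into the characters it describes."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda m: chr(int(m.group(1), 16)),
        value,
    )


# The characters we use to describe: plain ASCII in the source file.
source = "<U0063><U0061><U0074><U0061><U006C><U00E0>"
print(all(ord(c) < 128 for c in source))  # True

# The characters we describe: Unicode code points, independent of any encoding.
name = decode_symbolic(source)
print(name)  # català

# The same code point U+00E0 becomes different bytes in different charsets.
print("\u00e0".encode("utf-8").hex())       # c3a0
print("\u00e0".encode("iso-8859-1").hex())  # e0
```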
Regards,
Rafal