This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Improved check-localedef script

From: Rafal Luzynski <digitalfreak at lingonborough dot com>
To: Mike FABIAN <mfabian at redhat dot com>, Zack Weinberg <zackw at panix dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Fri, 4 Aug 2017 11:25:16 +0200 (CEST)
Subject: Re: Improved check-localedef script
References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9d60e3bspn.fsf@redhat.com>
Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>

4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:
> [...]
> I am not sure what to do about this one:
>
> ca_ES:87: string not representable in iso8859-1:
> 20AC

I've just written another email about it. :-)

> This is the euro symbol, the line from the source file is:
>
> currency_symbol "<U20AC>"
>
> SUPPORTED contains:
>
> ca_ES.UTF-8/UTF-8 \
> ca_ES/ISO-8859-1 \
> ca_ES@euro/ISO-8859-15 \
>
> But even though U+20AC cannot be converted to ISO-8859-1, the
> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>
> $ LC_ALL=ca_ES locale -k currency_symbol charmap
> currency_symbol="EUR"
> charmap="ISO-8859-1"
>
> So this does not cause an actual problem.

So the "€" character, although it has no code point in ISO-8859-1,
is effectively representable there because it is transliterated to
"EUR".  Looks like a false positive then.
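To make the distinction concrete, here is a minimal Python sketch. It illustrates the idea and is not localedef's actual logic. The euro sign fails a plain ISO-8859-1 encoding but succeeds in ISO-8859-15, which is why ca_ES@euro uses that charset. A fallback table then turns it into "EUR" for ISO-8859-1. The TRANSLIT table and both helper names are hypothetical stand-ins for glibc's own transliteration data.

```python
# Sketch: is a locale string representable in a charset, and what happens
# if unencodable characters fall back to a transliteration?
# TRANSLIT is a hypothetical stand-in, NOT glibc's real translit tables.
TRANSLIT = {"\u20ac": "EUR"}  # EURO SIGN -> "EUR" (assumed mapping)


def representable(text: str, charset: str) -> bool:
    """Return True if every character of text encodes in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def encode_with_translit(text: str, charset: str) -> bytes:
    """Encode text, replacing unencodable characters via TRANSLIT."""
    out = []
    for ch in text:
        try:
            out.append(ch.encode(charset))
        except UnicodeEncodeError:
            # Raises KeyError if no transliteration is known for ch.
            out.append(TRANSLIT[ch].encode(charset))
    return b"".join(out)


currency_symbol = "\u20ac"  # <U20AC> from the ca_ES source
print(representable(currency_symbol, "iso8859-1"))        # False
print(representable(currency_symbol, "iso8859-15"))       # True
print(currency_symbol.encode("iso8859-15"))               # b'\xa4'
print(encode_with_translit(currency_symbol, "iso8859-1"))  # b'EUR'
```

A strict check like representable() reports the warning, while the transliterating path yields "EUR", which matches the `locale -k` output quoted above.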

> The ca_ES source file is not ASCII, it has
>
> % català
> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>
> So maybe I could just convert the file to UTF-8
> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
> to get rid of the check-localedef warning.
>
> Would that be OK?

I don't think it's OK.  If I understand correctly, "the source file is
ASCII" means that the individual characters '<', '2', '0', 'A', 'C',
'>' are ASCII.  Together they may describe something beyond ASCII,
like <U00E0>.  But even that is not UTF-8: in UTF-8, U+00E0 would be
the two bytes <C3> <A0> (UTF-8 is an 8-bit encoding).  The closest
charset would be UCS-2 or simply generic Unicode.
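Here is a small Python sketch of the two metalevels. It assumes a hypothetical helper, decode_ucs_symbols(), that expands <Uxxxx> names roughly the way localedef's symbolic character names are interpreted. The description itself is pure ASCII. The character it describes, "à", needs the two UTF-8 bytes C3 A0.

```python
import re

# Hypothetical helper: expand localedef-style <Uxxxx> symbolic names
# into the Unicode characters they stand for.
_UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs_symbols(src: str) -> str:
    """Replace each <Uxxxx> name with the character it names."""
    return _UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), src)


src = "<U0063><U0061><U0074><U0061><U006C><U00E0>"  # lang_name from ca_ES
print(src.isascii())          # True: the characters we use to describe
name = decode_ucs_symbols(src)
print(name)                   # català
print(name.isascii())         # False: the characters we describe
print(name[-1].encode("utf-8").hex(" ").upper())  # C3 A0
```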

Caution: we are mixing metalevels here: the characters we describe
vs. the characters we use to describe them. :-)

Regards,

Rafal

Follow-Ups:
- Re: Improved check-localedef script
  - From: Mike FABIAN

References:
- Improved check-localedef script
  - From: Zack Weinberg
- Re: Improved check-localedef script
  - From: Mike FABIAN
