This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Improved check-localedef script

From: Mike FABIAN <mfabian at redhat dot com>
To: Rafal Luzynski <digitalfreak at lingonborough dot com>
Cc: Zack Weinberg <zackw at panix dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Fri, 04 Aug 2017 11:50:00 +0200
Subject: Re: Improved check-localedef script
References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9d60e3bspn.fsf@redhat.com> <26692227.553011.1501838716734@poczta.nazwa.pl>

Rafal Luzynski <digitalfreak@lingonborough.com> wrote:

> 4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:

>> But even though U+20AC cannot be converted to ISO-8859-1, the
>> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>>
>> $ LC_ALL=ca_ES locale -k currency_symbol charmap
>> currency_symbol="EUR"
>> charmap="ISO-8859-1"
>>
>> So this does not cause an actual problem.
>
> So the "€" character is actually representable in ISO-8859-1 because
> we can convert it to "EUR".  Looks like a false positive then.

Yes.
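
A rough Python sketch of why this is a false positive. It is not how localedef or glibc handle this internally. The `TRANSLIT` table and the `to_charset` helper are hypothetical names made up for this illustration, and the one-entry table only mirrors the "EUR" result shown above.

```python
# Hypothetical one-entry transliteration table (an assumption for this
# sketch); glibc's real transliteration rules are far more extensive.
TRANSLIT = {"\u20ac": "EUR"}


def to_charset(text, charset):
    """Encode text, using TRANSLIT for characters the charset lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # A KeyError here means no transliteration is known either.
            out += TRANSLIT[ch].encode(charset)
    return bytes(out)


def directly_encodable(ch, charset):
    """Return True if ch has its own code point in the charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(directly_encodable("\u20ac", "iso-8859-1"))  # no direct mapping
    print(to_charset("\u20ac", "iso-8859-1"))          # falls back to EUR
```

A check that only asks whether U+20AC is directly encodable reports a problem, while the transliterated result is still usable in ISO-8859-1.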

>> The ca_ES source file is not ASCII, it has
>>
>> % català
>> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>>
>> So maybe I could just convert the file to UTF-8
>> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
>> to get rid of the check-localedef warning.
>>
>> Would that be OK?
>
> I think that no, it's not OK.  If I understand correctly the
> "source file is ASCII" sentence means that the individual characters:
> '<', '2', '0', 'A', 'C', '>' are ASCII.

Yes.

> They may describe something more complex like <U00E0>.  But even this
> is not UTF-8 because UTF-8 would be <C3> <A0> (UTF-8 is 8-bit).  The
> closest charset would be UCS-2 or simply a generic Unicode.

My understanding at the moment is that the “% Charset: ...” comment
indicates the encoding used to write the source file itself. So something
like “<U20AC>” is definitely ASCII, because every character used to spell
it is ASCII. Non-ASCII characters in locale source files seem to exist
only in comments at the moment.
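
To make the distinction concrete, here is a small Python sketch. The `expand_symbols` helper and the regular expression are assumptions for illustration, not part of check-localedef. It shows that the lang_name line from ca_ES is pure ASCII as written, and only becomes non-ASCII once the <Uxxxx> symbols are expanded.

```python
import re

# Assumed pattern for <Uxxxx> style symbols (4 to 8 hex digits).
UCS_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def expand_symbols(text):
    """Replace each <Uxxxx> symbol with the character it names."""
    return UCS_SYMBOL.sub(lambda m: chr(int(m.group(1), 16)), text)


def is_ascii(text):
    """Return True if every character in text is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)


line = 'lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"'

if __name__ == "__main__":
    print(is_ascii(line))                  # the source line as written
    expanded = expand_symbols(line)
    print(expanded)                        # the value the line describes
    print(is_ascii(expanded))              # the described value
    print("\u00e0".encode("utf-8").hex())  # UTF-8 bytes of U+00E0
```

The encoding of the file and the characters the file describes are two separate things. The “% Charset:” comment, as I understand it, is about the former.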

-- 
Mike FABIAN <mfabian@redhat.com>

Follow-Ups:
- Re: Improved check-localedef script
  - From: Rafal Luzynski

References:
- Improved check-localedef script
  - From: Zack Weinberg
- Re: Improved check-localedef script
  - From: Mike FABIAN
- Re: Improved check-localedef script
  - From: Rafal Luzynski
