This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improved check-localedef script


Rafal Luzynski <digitalfreak@lingonborough.com> wrote:

> 4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:

>> But even though U+20AC cannot be converted to ISO-8859-1, the
>> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>>
>> $ LC_ALL=ca_ES locale -k currency_symbol charmap
>> currency_symbol="EUR"
>> charmap="ISO-8859-1"
>>
>> So this does not cause an actual problem.
>
> So the "€" character is actually representable in ISO-8859-1 because
> we can convert it to "EUR".  Looks like a false positive then.

Yes.
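
To illustrate the point above, here is a minimal Python sketch. It only uses
Python's own codecs, not glibc. The FALLBACK table and both function names are
hypothetical and exist only for this example; glibc takes its transliteration
rules from the locale data. The sketch shows that U+20AC has no direct
ISO-8859-1 encoding, while an ASCII replacement such as "EUR" does.

```python
# Hypothetical fallback table for illustration only; glibc takes its
# transliteration rules from the locale data, not from a table like this.
FALLBACK = {"\u20ac": "EUR"}


def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def encode_with_fallback(text, charset):
    """Encode text, substituting FALLBACK entries for unencodable characters."""
    out = []
    for ch in text:
        if representable(ch, charset):
            out.append(ch)
        else:
            out.append(FALLBACK[ch])  # KeyError if no fallback is known
    return "".join(out).encode(charset)


print(representable("\u20ac", "iso-8859-1"))         # False
print(encode_with_fallback("\u20ac", "iso-8859-1"))  # b'EUR'
```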

>> The ca_ES source file is not ASCII, it has
>>
>> % català
>> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>>
>> So maybe I could just convert the file to UTF-8
>> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
>> to get rid of the check-localedef warning.
>>
>> Would that be OK?
>
> No, I don't think that's OK.  If I understand correctly, the
> "source file is ASCII" statement means that the individual characters
> '<', '2', '0', 'A', 'C', '>' are ASCII.

Yes.

> They may describe something more complex like <U00E0>.  But even this
> is not UTF-8 because UTF-8 would be <C3> <A0> (UTF-8 is 8-bit).  The
> closest charset would be UCS-2 or simply a generic Unicode.

My current understanding is that the “% Charset: ...” comment
indicates the encoding used to write the source file. So something like
“<U20AC>” is definitely ASCII. At the moment, non-ASCII text in locale
source files seems to exist only in comments.
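
A small Python sketch of that distinction, assuming the <Uxxxx> notation
always carries 4 to 8 hex digits (the regular expression and the function name
decode_ucode are my own, not part of localedef or check-localedef). It shows
that the lang_name line as written in the source file is pure ASCII, that the
text it describes is not, and that U+00E0 would be the bytes C3 A0 in UTF-8.

```python
import re

# Matches the <Uxxxx> notation; the 4-to-8 hex digit range is an assumption.
UCODE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucode(text):
    """Replace each <Uxxxx> sequence with the character it names."""
    return UCODE.sub(lambda m: chr(int(m.group(1), 16)), text)


line = 'lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"'
decoded = decode_ucode(line)

print(line.isascii())             # True: the source line itself is ASCII
print(decoded)                    # lang_name "català"
print(decoded.isascii())          # False: the described text is not ASCII
print("\u00e0".encode("utf-8"))   # b'\xc3\xa0'
```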

-- 
Mike FABIAN <mfabian@redhat.com>

