This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Rafal Luzynski <digitalfreak at lingonborough dot com>
- Cc: Zack Weinberg <zackw at panix dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 04 Aug 2017 10:27:57 +0200
- Subject: Re: Improved check-localedef script
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9dfud7j0kc.fsf@redhat.com> <1301063485.547479.1501834008015@poczta.nazwa.pl>
Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> This "Charset: CP1256" is just a comment. Is it used anywhere? I don't
> think so. I think that localedata/SUPPORTED file is relevant and it
> requires ur_PK (and ur_IN as well) to be converted to UTF-8 only.
This comment is supposed to indicate the encoding the locale source file
is written in.
>> [...]
>> So I think we should replace
>>
>> % Charset: CP1256
>>
>> with
>>
>> % Charset: UTF-8
>>
>> in ur_PK.
>
> The file currently is in pure 7-bit ASCII. Do we need this line
> at all? What about removing it? If it should not be removed then
> maybe let's consider ASCII. UTF-8 is good if ASCII cannot be used.
> Actually, CP1256 is also true but misleading: the file uses the ASCII
> charset, which is a common subset of many other charsets. The only
> problem is that CP1256 is misleading and causes those false positives.
> TL;DR: my suggestions are (in the order of my preference):
>
> - remove this line,
> - replace with % Charset: ASCII
> - replace with % Charset: UTF-8
> - leave unchanged,
> - feel free to post your own suggestion.
I would change it to “% Charset: UTF-8”. You are right that the
file is pure ASCII at the moment, but in an ur_PK file it might
make sense to use Arabic script in comments, and if we do that
we should use UTF-8. So even though the file is currently ASCII,
“% Charset: UTF-8” shows our intention better.
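
To illustrate the point, here is a minimal Python sketch of how a declared
charset can be compared with the actual bytes of a locale source file. This is
not the check-localedef script under discussion; the names CHARSET_RE,
declared_charset and check_charset are hypothetical and exist only for this
example. It assumes the header has the form "% Charset: NAME" and that Python
knows a codec for NAME.

```python
import re

# Hypothetical pattern for a header comment such as "% Charset: UTF-8".
CHARSET_RE = re.compile(rb"^%\s*Charset:\s*([A-Za-z0-9_.-]+)", re.MULTILINE)


def declared_charset(data: bytes):
    """Return the charset named in the header comment, or None if absent."""
    match = CHARSET_RE.search(data)
    return match.group(1).decode("ascii") if match else None


def check_charset(data: bytes):
    """Return (declared, is_ascii, decodes_as_declared) for a file's bytes."""
    declared = declared_charset(data)
    # Pure 7-bit ASCII means every byte is below 0x80.
    is_ascii = all(byte < 0x80 for byte in data)
    if declared is None:
        return None, is_ascii, False
    try:
        data.decode(declared)
        decodes = True
    except (UnicodeDecodeError, LookupError):
        # LookupError: Python has no codec under the declared name.
        decodes = False
    return declared, is_ascii, decodes


# A pure-ASCII file decodes under CP1256, ASCII and UTF-8 alike.
ascii_file = b"% Charset: CP1256\nLC_IDENTIFICATION\n"
print(check_charset(ascii_file))

# A file with an Urdu comment is no longer ASCII, but is valid UTF-8.
utf8_file = "% Charset: UTF-8\n% \u0627\u0631\u062f\u0648\n".encode("utf-8")
print(check_charset(utf8_file))
```

A pure-ASCII file passes under any of the three declarations, which is why
the header alone cannot settle the question. The UTF-8 declaration is the one
that stays correct once non-ASCII comments are added.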
--
Mike FABIAN <mfabian@redhat.com>