This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Improved check-localedef script
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 4 Aug 2017 10:06:47 +0200 (CEST)
- Subject: Re: Improved check-localedef script
- Authentication-results: sourceware.org; auth=none
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9dfud7j0kc.fsf@redhat.com>
- Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>
4.08.2017 08:42 Mike FABIAN <mfabian@redhat.com> wrote:
>
>
> Zack Weinberg <zackw@panix.com> wrote:
>
> [...]
> > ... and finds dozens and dozens of errors. The full list is attached,
> > but here's a small sample:
> >
> > localedata/locales/ur_PK... (charset: cp1256)
> > localedata/locales/ur_PK:114: string not representable in cp1256:
> > 062C 0646 0648 0631 06CC
> > localedata/locales/ur_PK:115: string not representable in cp1256:
> > 0641 0631 0648 0631 06CC
> > localedata/locales/ur_PK:117: string not representable in cp1256:
> > 0627 067E 0631 06CC 0644
> >
> > These are the abmon strings, so I think it really would be a problem...
>
> This is the first abmon string:
>
> abmon "جنوری";/
>
> The last letter in this string, ی U+06CC ARABIC LETTER FARSI YEH
> is not convertible to CP1256.
> [...]
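The report quoted above can be reproduced outside glibc with a short sketch. It assumes Python's built-in "cp1256" codec as a stand-in for the charmap the script checks against (the two tables are not guaranteed to be identical), and the helper name `unrepresentable` is made up for illustration. The code point strings are the three quoted from the ur_PK report.

```python
import unicodedata

def unrepresentable(hex_codepoints, charset):
    """Return (code point, Unicode name) for each character in a
    space-separated hex string that `charset` cannot encode.

    Hypothetical helper; uses Python's codec table, not glibc's charmap.
    """
    bad = []
    for cp in hex_codepoints.split():
        ch = chr(int(cp, 16))
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append((cp, unicodedata.name(ch, "<unnamed>")))
    return bad

# The three abmon strings quoted in the report for ur_PK.
SAMPLES = [
    "062C 0646 0648 0631 06CC",
    "0641 0631 0648 0631 06CC",
    "0627 067E 0631 06CC 0644",
]

for sample in SAMPLES:
    print(sample, "->", unrepresentable(sample, "cp1256"))
```

Passing "utf-8" instead of "cp1256" returns an empty list for every sample, since UTF-8 can encode any of these code points.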
This "Charset: CP1256" line is just a comment. Is it used anywhere? I don't
think so. I think the localedata/SUPPORTED file is what matters, and it
specifies that ur_PK (and ur_IN as well) is built for UTF-8 only.
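As a rough sketch of how one might look up the charsets that localedata/SUPPORTED lists for a locale: the code below assumes entries of the form "name/CHARSET \", and the embedded sample is illustrative text in that style, not copied from the real file. The function name `charsets_for` is hypothetical.

```python
# Illustrative excerpt in the assumed SUPPORTED format (not the real file).
SAMPLE = r"""SUPPORTED-LOCALES=\
ur_IN/UTF-8 \
ur_PK/UTF-8 \
"""

def charsets_for(locale_name, supported_text):
    """Collect the charsets listed for `locale_name`.

    Assumes each entry looks like 'name/CHARSET \\' and that an optional
    '.codeset' suffix on the name can be ignored when matching.
    """
    charsets = set()
    for raw in supported_text.splitlines():
        # Drop the trailing line-continuation backslash and whitespace.
        entry = raw.rstrip("\\ \t").strip()
        if not entry or entry.startswith("#") or "/" not in entry:
            continue
        name, charset = entry.split("/", 1)
        if name.split(".", 1)[0] == locale_name:
            charsets.add(charset)
    return charsets

print(charsets_for("ur_PK", SAMPLE))
```

To run it against a real checkout, read the text from localedata/SUPPORTED instead of using the embedded sample.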
> [...]
> So I think we should replace
>
> % Charset: CP1256
>
> with
>
> % Charset: UTF-8
>
> in ur_PK.
The file is currently pure 7-bit ASCII. Do we need this line
at all? What about removing it? If it should not be removed, then
let's consider ASCII; UTF-8 is appropriate only if ASCII cannot be used.
Strictly speaking, CP1256 is also true, because the file uses only ASCII,
which is a common subset of many other charsets. The only problem is that
CP1256 is misleading and causes those false positives.
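The "common subset" point can be illustrated with a small sketch. It assumes the file spells non-ASCII characters in <UXXXX> notation; the sample line is a made-up illustration built from the code points in the report, and `decodes_identically` is a hypothetical helper. Python's codecs stand in for the charsets named in this thread.

```python
# Illustrative abmon line using <UXXXX> escapes (assumed spelling, not
# copied from the ur_PK file).
SAMPLE_LINE = b'abmon "<U062C><U0646><U0648><U0631><U06CC>";/'

def decodes_identically(data, charsets):
    """True if `data` decodes to the same text under every charset given."""
    decoded = {data.decode(cs) for cs in charsets}
    return len(decoded) == 1

# Every byte is below 0x80, so the line is pure 7-bit ASCII ...
print(SAMPLE_LINE.isascii())
# ... and therefore reads the same as ASCII, UTF-8, or CP1256.
print(decodes_identically(SAMPLE_LINE, ["ascii", "utf-8", "cp1256"]))
```

This is why any of the three labels is technically true for the file as stored; the label only becomes misleading when a tool treats it as the charset the locale's strings must be representable in.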
TL;DR: my suggestions are (in order of preference):
- remove this line,
- replace it with % Charset: ASCII,
- replace it with % Charset: UTF-8,
- leave it unchanged,
- feel free to post your own suggestion.
Regards,
Rafal