This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
- Date: Fri, 04 Aug 2017 11:58:53 +0200
- Subject: Re: Improved check-localedef script
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com>
Zack Weinberg <zackw@panix.com> wrote:
> Here is an improved version of the check-localedef script I posted the
> other week. It now takes only about 1.5 seconds to process all the
> files in localedata/locales/ (instead of seven seconds with the old
> parser), which is fast enough that I think it would be reasonable to
> run it during 'make check'. Also, many bugs have been fixed.
> Especially, the "can we encode this string in the charset that the
> file is annotated with" test now actually _runs_...
om_ET:15: unknown charset '(Under'
om_ET:15: unknown charset 'Qubee'
om_ET:15: unknown charset 'conventions)'
om_KE:15: unknown charset '(Under'
om_KE:15: unknown charset 'Qubee'
om_KE:15: unknown charset 'conventions)'
This seems to be a failure of the script to parse the following header correctly:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Oromo language locale for Ethiopia.
%
% Charset: UTF-8 (Under Qubee conventions)
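For illustration, here is a minimal sketch of how such a header line could be split into a charset name and an optional parenthesised note, so that "(Under Qubee conventions)" is not treated as three more charset names. This is not the code from the check-localedef script; the names CHARSET_RE, parse_charset_line and is_known_charset are made up for this example, and using Python's codec registry as the list of "known" charsets is an assumption as well.

```python
import codecs
import re

# Hypothetical pattern: "% Charset: NAME (optional note)".
# NAME may contain spaces (e.g. "SLS 1326:2008") but no parentheses.
CHARSET_RE = re.compile(
    r"^%\s*Charset:\s*(?P<name>[^()]*?)\s*(?:\((?P<note>.*)\))?\s*$"
)

def parse_charset_line(line):
    """Return (name, note) for a '% Charset:' comment line, else None."""
    m = CHARSET_RE.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("note")

def is_known_charset(name):
    """Assumption: 'known' means Python's codec registry recognises it."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

name, note = parse_charset_line("% Charset: UTF-8 (Under Qubee conventions)")
print(name, "|", note, "|", is_known_charset(name))
```

With this approach the trailing note is kept separate from the name, so only "UTF-8" is looked up.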
This one, on the other hand, seems to be an error in the source file:
lt_LT:35: unknown charset 'BALTIC'
The line
% Charset: BALTIC
does look weird indeed.
The other “unknown charset” errors come from:
ta_LK:
% Charset: SLS 1326:2008
th_TH:
% Charset: TIS-620.2533:1990
tt_RU:
% Charset: TATAR-CYR
I am not sure whether these “% Charset: ” lines are useful for anything
at the moment.
Maybe we should remove them all, or replace them all with “% Charset: UTF-8”
and make sure that all locale source files are UTF-8 encoded?
(That is, only the POSIX portable character set (an ASCII subset) in the
actual data, but UTF-8 allowed in comments.)
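
A rough sketch of what such a check might look like, under some simplifying assumptions: a comment is any line whose first non-blank character is "%" (trailing comments are not handled), and plain ASCII is used as a stand-in for the POSIX portable character set. The function name check_locale_source is invented for this example.

```python
def check_locale_source(data: bytes, comment_char: str = "%"):
    """Return a list of problems found in one locale source file.

    Assumptions: the file should decode as UTF-8, and non-ASCII
    characters are acceptable only on whole-line comments.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(comment_char):
            continue  # comment line: UTF-8 is allowed here
        if not line.isascii():
            problems.append(f"{lineno}: non-ASCII character outside a comment")
    return problems

sample = "% Charset: UTF-8\n% Türkçe comment\nLC_IDENTIFICATION\n".encode("utf-8")
print(check_locale_source(sample))
```

A real check would also have to honour the comment_char and escape_char declarations at the top of each file rather than hard-coding "%".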
--
Mike FABIAN <mfabian@redhat.com>