This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Improved check-localedef script

From: Mike FABIAN <mfabian at redhat dot com>
To: Zack Weinberg <zackw at panix dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
Date: Fri, 04 Aug 2017 11:58:53 +0200
Subject: Re: Improved check-localedef script
References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com>

Zack Weinberg <zackw@panix.com> wrote:

> Here is an improved version of the check-localedef script I posted the
> other week.  It now takes only about 1.5 seconds to process all the
> files in localedata/locales/ (instead of seven seconds with the old
> parser), which is fast enough that I think it would be reasonable to
> run it during 'make check'.  Also, many bugs have been fixed.
> Especially, the "can we encode this string in the charset that the
> file is annotated with" test now actually _runs_...

    om_ET:15: unknown charset '(Under'
    om_ET:15: unknown charset 'Qubee'
    om_ET:15: unknown charset 'conventions)'
    om_KE:15: unknown charset '(Under'
    om_KE:15: unknown charset 'Qubee'
    om_KE:15: unknown charset 'conventions)'

The script seems to fail to parse this header comment correctly:

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %
    % Oromo language locale for Ethiopia.
    %
    % Charset: UTF-8 (Under Qubee conventions)
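
A minimal sketch of one way such a line could be parsed. This is not taken from the actual check-localedef script: the helper name `parse_charset_comment` and the rule "the charset is the first whitespace-delimited token, the rest is a free-form remark" are assumptions made only for illustration.

```python
import re

# Hypothetical helper, not part of the real check-localedef script.
# Assumption: the charset name is the first whitespace-delimited token
# after "Charset:", and anything after it is a free-form remark.
CHARSET_RE = re.compile(r"^%\s*Charset:\s*(\S+)(?:\s+(.*\S))?\s*$")


def parse_charset_comment(line):
    """Return (charset, remark) for a '% Charset:' line, or None."""
    m = CHARSET_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


# The om_ET / om_KE header line: the parenthetical becomes a remark
# instead of three separate "charsets".
print(parse_charset_comment("% Charset: UTF-8 (Under Qubee conventions)"))
```

Under this rule a name that itself contains a space, such as "SLS 1326:2008", would still be split in the wrong place, so the sketch only addresses the om_ET and om_KE case.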

This seems to be an error in the source file:

    lt_LT:35: unknown charset 'BALTIC'

    % Charset: BALTIC

That annotation does look weird indeed.

The other “unknown charset” errors are from

    ta_LK:
    % Charset: SLS 1326:2008

    th_TH:
    % Charset:   TIS-620.2533:1990

    tt_RU:
    % Charset: TATAR-CYR
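
A quick way to see which of these names a codec registry recognises at all, sketched with Python's standard `codecs.lookup`. Whether the check-localedef script performs its lookup this way is an assumption, and `is_known_charset` is a hypothetical helper name.

```python
import codecs


def is_known_charset(name):
    """True if Python's codec registry recognises NAME.

    Assumption: the real script may use a different lookup mechanism.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Names copied from the "% Charset:" annotations quoted above.
for name in ["UTF-8", "BALTIC", "SLS 1326:2008",
             "TIS-620.2533:1990", "TATAR-CYR"]:
    print(name, is_known_charset(name))
```

A name failing this lookup does not by itself mean that the locale file is wrong, only that the annotation is not a charset name the registry knows.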

I am not sure whether these “% Charset:” annotations are useful for
anything at the moment.

Maybe we should remove them all or replace them all by “% Charset: UTF-8”
and make sure that all locale source files are UTF-8 encoded?

(That is, only the POSIX portable character set (an ASCII subset) in the
actual data, with UTF-8 allowed in comments.)
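
Such a rule could be checked mechanically. The following is only a sketch under stated assumptions: `check_locale_source` is a hypothetical helper, comments are taken to be whole lines starting with `%` (the comment character would really have to be read from the file), and plain ASCII is tested rather than the narrower POSIX portable character set.

```python
def check_locale_source(data, comment_char="%"):
    """Return a list of (line_number, message) problems for raw bytes DATA.

    Assumptions: comments are whole lines that start with COMMENT_CHAR,
    and "ASCII only" stands in for the POSIX portable character set.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [(None, "not valid UTF-8: %s" % exc)]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(comment_char):
            continue  # comment lines may contain any UTF-8 text
        if not line.isascii():
            problems.append((lineno, "non-ASCII character in locale data"))
    return problems


# Made-up sample input, not taken from a real locale file.
sample = '% Afaan Oromoo \u2013 a comment\ntitle "Or\u00f3mo"\n'.encode("utf-8")
print(check_locale_source(sample))
```

The sample has a non-ASCII dash in a comment line, which is accepted, and a non-ASCII letter in a data line, which is reported.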

-- 
Mike FABIAN <mfabian@redhat.com>

References:
- Improved check-localedef script
  - From: Zack Weinberg
