This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: localedata linting revised again

From: Mike FABIAN <mfabian at redhat dot com>
To: Zack Weinberg <zackw at panix dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
Date: Tue, 29 Aug 2017 10:24:05 +0200
Subject: Re: localedata linting revised again
References: <CAKCAbMiGM6kN1QNrVuaqBQLFf6a4b_2t1frT_DZqwsH5NBDuAg@mail.gmail.com>

Zack Weinberg <zackw@panix.com> wrote:

> I've revised my localedata linter to use iconv instead of python's
> built-in codecs, and to only complain about strings being
> unrepresentable if transliteration doesn't help.
>
> All of the remaining complaints are about strings that aren't NFC
> (full list at bottom of this message).  Most, but not all, of these
> appear to be LC_COLLATE specifications for decomposed accented
> characters, which I would have expected to be handled generically for
> all languages (if there is a canonical equivalence between two
> codepoint sequences, then it seems intuitively obvious to me that they
> should always be treated the same for collation, perhaps with the
> actual code points used as a tiebreaker).  But given the contents of
> the various files, apparently it isn't, and I think that's a bug.
>
> zw
>
> ---

[...]

> localedata/locales/de_DE:50: string not normalized:
>   source: 0041 0308
>      nfc: 00C4

Many of these come from custom transliteration rules.
In this case, the rule is:

LC_CTYPE
copy "i18n"

translit_start

include "translit_combining";""

% German umlauts.
% LATIN CAPITAL LETTER A WITH DIAERESIS.
<U00C4> "<U0041><U0308>";"<U0041><U0045>"

That seems correct, doesn’t it?
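
For reference, the check behind that complaint can be reproduced in a few
lines of Python. This is only a sketch, not the actual linter: the helper
names decode_ucs and nfc_report are hypothetical, and it assumes that
<Uxxxx> is the only escape notation in the string.

```python
import re
import unicodedata

# Matches the <Uxxxx> notation used in locale source files.
UCS_ESCAPE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs(s):
    """Turn notation such as '<U0041><U0308>' into a Python str."""
    return UCS_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def nfc_report(s):
    """Return (source, nfc) as code point strings if s is not in NFC, else None.

    Hypothetical helper: mimics the 'source:' / 'nfc:' lines of the report.
    """
    text = decode_ucs(s)
    nfc = unicodedata.normalize("NFC", text)
    if nfc == text:
        return None

    def fmt(t):
        return " ".join(f"{ord(c):04X}" for c in t)

    return fmt(text), fmt(nfc)


# The three strings from the de_DE translit rule quoted above.
print(nfc_report("<U00C4>"))         # already NFC
print(nfc_report("<U0041><U0308>"))  # decomposed form, flagged
print(nfc_report("<U0041><U0045>"))  # plain "AE", already NFC
```

Only the second string is flagged: A followed by COMBINING DIAERESIS
composes to 00C4 under NFC. A check like this cannot tell whether the
decomposed sequence is deliberate, so a translit replacement written in
decomposed form would be reported as well.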

-- 
Mike FABIAN <mfabian@redhat.com>

References:
- localedata linting revised again
  - From: Zack Weinberg
