This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: localedata linting revised again


Zack Weinberg <zackw@panix.com> wrote:

> I've revised my localedata linter to use iconv instead of python's
> built-in codecs, and to only complain about strings being
> unrepresentable if transliteration doesn't help.
>
> All of the remaining complaints are about strings that aren't NFC
> (full list at bottom of this message).  Most, but not all, of these
> appear to be LC_COLLATE specifications for decomposed accented
> characters, which I would have expected to be handled generically for
> all languages (if there is a canonical equivalence between two
> codepoint sequences, then it seems intuitively obvious to me that they
> should always be treated the same for collation, perhaps with the
> actual code points used as a tiebreaker).  But given the contents of
> the various files, apparently it isn't, and I think that's a bug.
>
> zw
>
> ---

[...]

> localedata/locales/de_DE:50: string not normalized:
>   source: 0041 0308
>      nfc: 00C4
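
The complaint quoted above can be reproduced with Python's standard
unicodedata module. This is only a minimal sketch, not the linter's
actual code; the helper name nfc_report and its input format are
assumptions made for illustration.

```python
import unicodedata


def nfc_report(codepoints):
    """Return (source, nfc) as space-separated hex code points.

    `codepoints` is a string such as "0041 0308", mirroring the
    "source:" line of the linter output (hypothetical input format).
    """
    # Build the actual string from the hex code points.
    source = "".join(chr(int(cp, 16)) for cp in codepoints.split())
    # Compose it into Normalization Form C.
    nfc = unicodedata.normalize("NFC", source)

    def fmt(s):
        return " ".join(f"{ord(c):04X}" for c in s)

    return fmt(source), fmt(nfc)


print(nfc_report("0041 0308"))  # ('0041 0308', '00C4')
```

A string is flagged as "not normalized" exactly when the two values differ.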

Many of these come from custom transliteration rules.
In this case it is:

LC_CTYPE
copy "i18n"

translit_start

include "translit_combining";""

% German umlauts.
% LATIN CAPITAL LETTER A WITH DIAERESIS.
<U00C4> "<U0041><U0308>";"<U0041><U0045>"

That seems correct, doesn’t it?
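
A rough Python sketch of why the decomposed string looks intentional
in this rule: the first fallback is simply the canonical decomposition
of the source character, and the second is the plain ASCII "AE". The
helpers decode and parse_translit_rule are hypothetical and handle only
simple one-line rules like the one above; they are not part of glibc or
of the linter.

```python
import re
import unicodedata


def decode(text):
    """Turn locale-source notation like "<U0041><U0308>" into a str."""
    hex_values = re.findall(r"<U([0-9A-Fa-f]{4,8})>", text)
    return "".join(chr(int(h, 16)) for h in hex_values)


def parse_translit_rule(line):
    """Split a simple rule into (source, [alternatives]).

    Assumes the form: <Uxxxx> "alt1";"alt2";...
    """
    lhs, rhs = line.split(None, 1)
    alternatives = [decode(alt) for alt in rhs.split(";")]
    return decode(lhs), alternatives


rule = '<U00C4> "<U0041><U0308>";"<U0041><U0045>"'
source, alts = parse_translit_rule(rule)

# The first alternative is the NFD form of the source character ...
print(unicodedata.normalize("NFD", source) == alts[0])
# ... so composing it gives back the source character.
print(unicodedata.normalize("NFC", alts[0]) == source)
# The second alternative is the ASCII fallback.
print(alts[1])
```

Under that reading, a lint for non-NFC strings would probably want to
skip the right-hand side of translit rules.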

-- 
Mike FABIAN <mfabian@redhat.com>

