This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: localedata linting revised again
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
- Date: Tue, 29 Aug 2017 10:33:08 +0200
- Subject: Re: localedata linting revised again
- References: <CAKCAbMiGM6kN1QNrVuaqBQLFf6a4b_2t1frT_DZqwsH5NBDuAg@mail.gmail.com>
Zack Weinberg <zackw@panix.com> wrote:
> I've revised my localedata linter to use iconv instead of python's
> built-in codecs, and to only complain about strings being
> unrepresentable if transliteration doesn't help.
>
> All of the remaining complaints are about strings that aren't NFC
> (full list at bottom of this message). Most, but not all, of these
> appear to be LC_COLLATE specifications for decomposed accented
> characters, which I would have expected to be handled generically for
> all languages (if there is a canonical equivalence between two
> codepoint sequences, then it seems intuitively obvious to me that they
> should always be treated the same for collation, perhaps with the
> actual code points used as a tiebreaker). But given the contents of
> the various files, apparently it isn't, and I think that's a bug.
>
> zw
>
> ---
[...]
> localedata/locales/sgs_LT:85: string not normalized:
> source: 0070 0065 0074 006E 0069 0304 010D 0117
> nfc: 0070 0065 0074 006E 012B 010D 0117
This is:
day "nedielės dëna";/
"panedielis";/
"oterninks";/
"sereda";/
"četvergs";/
"petnīčė";/
petnīčė <- source
petnīčė <- NFC
The NFC version seems better to me because it always renders correctly
for me, whereas the current source version renders correctly only with
some fonts. For example, in gedit it renders correctly when using “DejaVu
Sans Book” or “DejaVu Sans Mono Book” but not when using “Liberation
Sans Regular” or “Liberation Mono Regular”. That is probably a bug in
the Liberation fonts, but I think the NFC version has a better chance
of rendering correctly, so I think we should fix this in our locale sources.
> localedata/locales/sgs_LT:117: string not normalized:
> source: 0074 0227 0304 0070
> nfc: 0074 01E1 0070
> localedata/locales/sgs_LT:118: string not normalized:
> source: 006E 0065 0304
> nfc: 006E 0113
Same here: using the NFC version increases the chances of correct
rendering, I think.
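For anyone who wants to reproduce the three sgs_LT reports above without
running the linter, here is a minimal sketch using only Python's standard
unicodedata module. It is not Zack's linter; the helper names
(parse_codepoints, format_codepoints, check) and the REPORTED list are
made up for illustration, and the code point sequences are copied from
the linter output quoted above.

```python
import unicodedata


def parse_codepoints(hex_seq):
    """Turn a space-separated list of hex code points into a str."""
    return "".join(chr(int(cp, 16)) for cp in hex_seq.split())


def format_codepoints(text):
    """Format a str as space-separated 4-digit uppercase hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# (source, nfc) pairs as reported for localedata/locales/sgs_LT
# lines 85, 117 and 118 in the quoted linter output.
REPORTED = [
    ("0070 0065 0074 006E 0069 0304 010D 0117",
     "0070 0065 0074 006E 012B 010D 0117"),
    ("0074 0227 0304 0070", "0074 01E1 0070"),
    ("006E 0065 0304", "006E 0113"),
]


def check(source_hex):
    """Return (already_nfc, nfc_hex) for a hex code point sequence."""
    source = parse_codepoints(source_hex)
    nfc = unicodedata.normalize("NFC", source)
    return source == nfc, format_codepoints(nfc)


if __name__ == "__main__":
    for source_hex, expected_nfc in REPORTED:
        already_nfc, nfc_hex = check(source_hex)
        status = "ok" if already_nfc else "not normalized"
        print(f"{source_hex} -> {nfc_hex} ({status})")
```

In each case the base letter followed by U+0304 COMBINING MACRON is
replaced by a single precomposed code point, which is the form I am
suggesting we put into the locale source.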
[...]
--
Mike FABIAN <mfabian@redhat.com>