This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: localedata linting revised again


Zack Weinberg <zackw@panix.com> wrote:

> I've revised my localedata linter to use iconv instead of python's
> built-in codecs, and to only complain about strings being
> unrepresentable if transliteration doesn't help.
>
> All of the remaining complaints are about strings that aren't NFC
> (full list at bottom of this message).  Most, but not all, of these
> appear to be LC_COLLATE specifications for decomposed accented
> characters, which I would have expected to be handled generically for
> all languages (if there is a canonical equivalence between two
> codepoint sequences, then it seems intuitively obvious to me that they
> should always be treated the same for collation, perhaps with the
> actual code points used as a tiebreaker).  But given the contents of
> the various files, apparently it isn't, and I think that's a bug.
>
> zw
>
> ---

[...]


> localedata/locales/sgs_LT:85: string not normalized:
>   source: 0070 0065 0074 006E 0069 0304 010D 0117
>      nfc: 0070 0065 0074 006E 012B 010D 0117

This is:

day     "nedielės dëna";/
        "panedielis";/
        "oterninks";/
        "sereda";/
        "četvergs";/
        "petnīčė";/

petnīčė <- source
petnīčė <- NFC

The NFC version seems better to me because it always renders correctly
for me, whereas the current source version renders correctly only with
some fonts. For example, in gedit it renders correctly when using “DejaVu
Sans Book” or “DejaVu Sans Mono Book” but not when using “Liberation
Sans Regular” or “Liberation Mono Regular”. That is probably a bug in
the Liberation fonts, but I think the NFC version has a better chance of
rendering correctly, so I think we should fix this in our locale sources.

> localedata/locales/sgs_LT:117: string not normalized:
>   source: 0074 0227 0304 0070
>      nfc: 0074 01E1 0070


> localedata/locales/sgs_LT:118: string not normalized:
>   source: 006E 0065 0304
>      nfc: 006E 0113

Same here: using the NFC version increases the chances of correct
rendering, I think.
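
For anyone who wants to reproduce the three complaints quoted above, here
is a minimal sketch using Python's standard unicodedata module. It is not
the linter itself (which I have not reproduced here); the helper names
codepoints and to_nfc are made up for illustration, and the strings are
typed in from the "source:" lines of the report.

```python
import unicodedata

def codepoints(s):
    """Format a string as space-separated 4-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in s)

def to_nfc(s):
    """Return the NFC (canonical composition) form of s."""
    return unicodedata.normalize("NFC", s)

# The three sgs_LT strings from the report, in their decomposed source form.
samples = [
    "petni\u0304\u010d\u0117",   # line 85
    "t\u0227\u0304p",            # line 117
    "ne\u0304",                  # line 118
]

for s in samples:
    nfc = to_nfc(s)
    if s != nfc:
        print("string not normalized:")
        print("  source:", codepoints(s))
        print("     nfc:", codepoints(nfc))
```

The "nfc:" lines this prints should match the ones in the report, since
U+0069 U+0304 composes to U+012B, U+0227 U+0304 to U+01E1, and
U+0065 U+0304 to U+0113.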

[...]

-- 
Mike FABIAN <mfabian@redhat.com>

