This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
- From: Marko Myllynen <myllynen at redhat dot com>
- To: Egor Kobylkin <egor at kobylkin dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org
- Cc: mfabian at redhat dot com, "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
- Date: Mon, 15 Oct 2018 14:04:52 +0300
- Subject: Re: [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
- References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <d5582688-819b-90c2-3f4a-0d19c932d487@kobylkin.com> <165238610.582597.1539392357757@poczta.nazwa.pl> <e072a70c-9962-4087-93c2-06ec3c9a0b1f@kobylkin.com>
- Reply-to: Marko Myllynen <myllynen at redhat dot com>
Hi,
On 2018-10-13 19:58, Egor Kobylkin wrote:
> On 13.10.2018 02:59, Rafal Luzynski wrote:
>
>> Regarding the tests, I think there is no complete transliteration
>> test suite at the moment. Probably the only test is
>> localedata/bug-iconv-trans.c. You can also see the collation tests
>> placed in the same directory, they use those multiple *.UTF-8.in
>> files.
>>
>> You can skip the tests for now.
>
> First I thought they could just be added, but not all locales
> transliterate umlauts, so just extending the current test won't do as it
> will fail for those locales.
I still think a one-time check against uconv(1) (part of Unicode's ICU
project) for discrepancies would be worthwhile.
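
A one-time comparison of this kind could be scripted roughly as follows.
This is only a sketch under assumptions: the helper names are made up for
illustration, the ICU transform ID "Cyrillic-Latin; Latin-ASCII" is a guess
at a reasonable reference and not something agreed on in this thread, and
the iconv(1) result depends on the LC_CTYPE locale the script runs under.

```python
import subprocess


def run_filter(cmd, text):
    """Pipe UTF-8 text through an external command and return its stdout."""
    # No check=True: iconv may exit non-zero when it cannot transliterate
    # a character, and we still want to see whatever output it produced.
    result = subprocess.run(cmd, input=text.encode("utf-8"),
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return result.stdout.decode("utf-8", errors="replace")


def glibc_translit(text):
    # glibc iconv(1); //TRANSLIT enables the locale's transliteration rules.
    return run_filter(["iconv", "-f", "UTF-8", "-t", "ASCII//TRANSLIT"], text)


def icu_translit(text):
    # ICU uconv(1); the transform ID is an assumption, see the note above.
    return run_filter(["uconv", "-f", "UTF-8", "-t", "UTF-8",
                       "-x", "Cyrillic-Latin; Latin-ASCII"], text)


def discrepancies(chars, first, second):
    """Return (char, first_result, second_result) wherever the two differ."""
    found = []
    for ch in chars:
        a, b = first(ch), second(ch)
        if a != b:
            found.append((ch, a, b))
    return found


def report():
    """Print every code point in the basic Cyrillic block where the tools disagree."""
    cyrillic = [chr(cp) for cp in range(0x0400, 0x0500)]
    for ch, a, b in discrepancies(cyrillic, glibc_translit, icu_translit):
        print(f"U+{ord(ch):04X} {ch}: iconv={a!r} uconv={b!r}")
```

Calling report() requires both iconv and uconv to be installed. The
comparison itself is kept separate from the subprocess calls so it can be
exercised with any two callables.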
>>> [...]
>>> diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
>>> --- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
>>> +++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
>>> @@ -1394,6 +1394,7 @@
>>>  <U137A> <U0060><U0039><U0030>
>>>  <U137B> <U0060><U0031><U0030><U0030>
>>>  <U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
>>> +include "translit_cyrillic";""
>>>  translit_end
>>>  %
>>>  END LC_CTYPE
>>
>> Shouldn't “include "translit_cyrillic";""” be placed before the
>> custom rules, together with other includes? The same in more files,
>> I will not mention them all.
>
> If I recall correctly it is because of the
> "translit_end
> END LC_CTYPE"
> part at the end of the translit_cyrillic. This way it works for any
> locale, regardless of whether it has translit itself or not. And being at
> the end it does not supersede any previous transliteration that may be
> there for a reason.
I suspect one problem would be that the later rule wins, so if there
are locale-specific rules, then any translit_* inclusions would
override them unless they are included before the locale-specific rules.
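
To make the ordering concern concrete, here is a toy model of a "later
definition wins" policy. It only illustrates the suspicion above; it does
not claim to reproduce how glibc actually resolves transliteration rules,
and the rules shown are hypothetical, not taken from any real locale file.

```python
def build_table(rule_groups):
    """Merge rule groups in file order; a later group overwrites earlier ones."""
    table = {}
    for group in rule_groups:
        table.update(group)
    return table


# Hypothetical rules, for illustration only.
locale_rules = {"ї": "yi"}               # locale-specific rule
included_rules = {"ї": "i", "ж": "zh"}   # stand-in for an included translit_* table

# Include placed after the locale-specific rules: the include overrides them.
include_last = build_table([locale_rules, included_rules])

# Include placed before the locale-specific rules: the locale's rule survives.
include_first = build_table([included_rules, locale_rules])

print(include_last["ї"])    # i
print(include_first["ї"])   # yi
```

Under this assumed policy, placing the include first keeps the
locale-specific mapping while still picking up the shared rules.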
Cheers,
--
Marko Myllynen