This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2
- From: Marko Myllynen <myllynen at redhat dot com>
- To: Rafal Luzynski <digitalfreak at lingonborough dot com>, Egor Kobylkin <egor at kobylkin dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org, mfabian at redhat dot com
- Cc: "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
- Date: Thu, 11 Oct 2018 16:10:49 +0300
- Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2
- References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <bcb7fcd8-7f71-4f7e-6804-7c3f07d6d3ee@kobylkin.com> <180516689.458569.1539255868196@poczta.nazwa.pl>
- Reply-to: Marko Myllynen <myllynen at redhat dot com>
Hi,
On 2018-10-11 14:04, Rafal Luzynski wrote:
>
> First of all, I think that such a large patch should also include
> the tests. Please see how automatic tests are performed in locale
> data and write your own.
>
> 11.10.2018 00:29 Egor Kobylkin <egor@kobylkin.com> wrote:
>
> Also I can see some gaps in the range. Are you going to fill them
> or maybe for now just mention that they exist?
>
> <U040D> is missing here. Can we add it already?
>
> Sure, I'm not going to stop you from pushing these changes just because
> there are missing characters. I will consider adding them later.
>
> <U0400> is missing here. Are you going to leave it for now?
Please check https://sourceware.org/ml/libc-alpha/2018-10/msg00160.html.
>> +% CYRILLIC CAPITAL LETTER U
>> +<U0423> <U0055>
>> +% CYRILLIC UNDEFINED
>> +<U0423><U0301> <U00DA>;"<U0055><U0060>"
>
> This still makes me wonder.
>
> Does it work at all?
No, see the above link.
More importantly, I realized that the ICU uconv(1) tool I mentioned
earlier should make a great reference for this data: the output of the
currently included transliteration rules should match uconv(1) output.
If it does not, either the patch or uconv(1) might have an issue. If the
outputs match, we should be able to safely assume the patch is OK.
We could also consider using uconv(1) output as a reference for how to
handle the currently missing characters.
(uconv(1) is part of the icu package on Fedora/CentOS/RHEL/openSUSE.)
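A rough sketch of how such a comparison could be scripted, in Python.
It parses rule lines of the form used in the patch, picks the first
all-ASCII candidate for each source sequence, and reports entries where
a given transliterator disagrees. The helper names (parse_rule,
ascii_choice, mismatches, uconv_translit) are made up for illustration,
and the ICU transform ID "Cyrillic-Latin; Latin-ASCII" is only my
assumption about which uconv(1) transform is the right counterpart.

```python
import re
import subprocess

_UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(symbols):
    """Turn a run of <Uxxxx> symbols into a Python string."""
    return "".join(chr(int(h, 16)) for h in _UCS.findall(symbols))


def parse_rule(line):
    """Parse one rule line into (source, [candidates]).

    Returns None for blank lines, '%' comments, and lines without a target.
    """
    line = line.strip()
    if not line or line.startswith("%"):
        return None
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    source, targets = parts
    # Candidates are separated by ';'; quotes around strings are ignored.
    return decode(source), [decode(t) for t in targets.split(";")]


def ascii_choice(candidates):
    """Return the first candidate that is pure ASCII, or None."""
    for candidate in candidates:
        if candidate.isascii():
            return candidate
    return None


def uconv_translit(text, transform="Cyrillic-Latin; Latin-ASCII"):
    """Run text through uconv(1); the transform ID is an assumption."""
    result = subprocess.run(
        ["uconv", "-f", "utf-8", "-t", "utf-8", "-x", transform],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


def mismatches(rule_lines, translit):
    """List (source, table_output, reference_output) where the two differ."""
    found = []
    for line in rule_lines:
        rule = parse_rule(line)
        if rule is None:
            continue
        source, candidates = rule
        expected = ascii_choice(candidates)
        actual = translit(source)
        if actual != expected:
            found.append((source, expected, actual))
    return found
```

With uconv(1) installed, calling mismatches(rule_lines, uconv_translit)
on the lines between translit_start and translit_end would list the
entries worth a closer look. Taking the transliterator as a parameter
keeps the comparison usable with other references as well.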
Thanks,
--
Marko Myllynen