This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2


Hi,

On 2018-10-11 14:04, Rafal Luzynski wrote:
> 
> First of all, I think that such a large patch should also include
> the tests.  Please see how automatic tests are performed in locale
> data and write your own.
> 
> 11.10.2018 00:29 Egor Kobylkin <egor@kobylkin.com> wrote:
> 
> Also I can see some gaps in the range.  Are you going to fill them
> or maybe for now just mention that they exist?
>
> <U040D> is missing here.  Can we add it already?
>
> Sure, I'm not going to stop you from pushing these changes just because
> there are missing characters.  I will consider adding them later.
> 
> <U0400> is missing here.  Are you going to leave it for now?

See the check at https://sourceware.org/ml/libc-alpha/2018-10/msg00160.html.

>> +% CYRILLIC CAPITAL LETTER U
>> +<U0423> <U0055>
>> +% CYRILLIC UNDEFINED
>> +<U0423><U0301> <U00DA>;"<U0055><U0060>"
> 
> This still makes me wonder.
> 
> Does it work at all?

No, see the above link.
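To make the question concrete, below is a toy Python model of how such an entry is presumably meant to be read. The left-hand side is a sequence of code points (here U+0423 followed by the combining acute accent U+0301). The right-hand side lists alternatives separated by ';', tried in order until one is fully representable in the target charset. The helper names (`decode`, `parse_entry`, `transliterate`) are hypothetical. This is not glibc's implementation, and per the check linked above glibc does not currently apply this entry.

```python
from __future__ import annotations

import re

# One <UXXXX> (or <UXXXXXXXX>) symbol as used in glibc locale sources.
_UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(symbols: str) -> str:
    """Turn e.g. '<U0055><U0060>' into the Python string 'U`'."""
    return "".join(chr(int(h, 16)) for h in _UCS.findall(symbols))


def parse_entry(line: str) -> tuple[str, list[str]]:
    """Split 'SRC ALT1;ALT2;...' into the source sequence and its alternatives.

    Alternatives may be bare (<U00DA>) or quoted ("<U0055><U0060>").
    Toy parser: assumes no literal ';' inside quoted alternatives.
    """
    src, rhs = line.split(None, 1)
    alts = [decode(alt.strip().strip('"')) for alt in rhs.split(";")]
    return decode(src), alts


def transliterate(text, table, representable=lambda c: ord(c) < 128, default="?"):
    """Longest-match lookup; take the first alternative whose characters
    are all representable in the target charset (ASCII by default)."""
    max_len = max(map(len, table), default=1)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            alts = table.get(text[i:i + n], [])
            choice = next((a for a in alts if all(map(representable, a))), None)
            if choice is not None:
                out.append(choice)
                i += n
                break
        else:
            # No usable entry: keep representable characters, replace the rest.
            out.append(text[i] if representable(text[i]) else default)
            i += 1
    return "".join(out)


# The two entries quoted in this thread.
TABLE = dict(parse_entry(e) for e in (
    "<U0423> <U0055>",
    '<U0423><U0301> <U00DA>;"<U0055><U0060>"',
))
```

In this model, "\u0423\u0301" becomes "U`" when the target is ASCII and "\u00da" when the target is Latin-1. That is the behaviour the entry appears to aim for.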

More importantly, I realized that the ICU uconv(1) tool I mentioned
earlier should make a great reference for this data: the output of the
currently included transliteration rules should match the uconv(1)
output. If it does not, either the patch or uconv(1) might have an
issue. If the outputs match, we should be able to safely assume the
patch is OK.
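A minimal sketch of such a comparison follows. It pipes each character through glibc iconv(1) with //TRANSLIT and through uconv(1) with an ICU transform, then reports the differences. Several things are assumptions here: the exact command lines, the transform ID "Any-Latin; Latin-ASCII", the sample range U+0400..U+045F, and the hypothetical helper names. glibc's //TRANSLIT result also depends on the LC_CTYPE locale in effect, so this needs a locale built from the patched sources.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Callable

# Assumed command lines; adjust to the system and to the intended romanization.
GLIBC_CMD = ["iconv", "-f", "UTF-8", "-t", "ASCII//TRANSLIT"]
ICU_CMD = ["uconv", "-f", "UTF-8", "-t", "UTF-8", "-x", "Any-Latin; Latin-ASCII"]


def run_filter(cmd: list[str], text: str) -> str:
    """Pipe UTF-8 text through an external converter and return its stdout."""
    result = subprocess.run(cmd, input=text.encode("utf-8"),
                            capture_output=True, check=False)
    return result.stdout.decode("utf-8", errors="replace")


def compare(samples, a: Callable[[str], str],
            b: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (sample, a_output, b_output) for every sample whose outputs differ."""
    mismatches = []
    for s in samples:
        out_a, out_b = a(s), b(s)
        if out_a != out_b:
            mismatches.append((s, out_a, out_b))
    return mismatches


def main() -> None:
    missing = [cmd[0] for cmd in (GLIBC_CMD, ICU_CMD) if shutil.which(cmd[0]) is None]
    if missing:
        print("not found:", ", ".join(missing))
        return
    # Assumed sample range for illustration: U+0400..U+045F.
    samples = [chr(cp) for cp in range(0x0400, 0x0460)]
    for s, g, i in compare(samples,
                           lambda t: run_filter(GLIBC_CMD, t),
                           lambda t: run_filter(ICU_CMD, t)):
        print(f"U+{ord(s):04X}: glibc={g!r} icu={i!r}")


if __name__ == "__main__":
    main()
```

Each reported line is a candidate for review. It may point to a bug in the patch, a deliberate difference in romanization, or an ICU quirk.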

The uconv(1) output could also be used as a reference for how to handle
the currently missing characters.

(uconv(1) is part of the icu package on Fedora/CentOS/RHEL/openSUSE.)
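Once a reference string for a missing character has been taken from uconv(1), it still has to be written in the locale source syntax. The sketch below formats such a pair in the style of the entries quoted above: a comment with the character name, then a bare symbol for a single-character replacement or a quoted sequence for a longer one. The function names are hypothetical. The replacement text must come from uconv(1) or from a reviewer, not from this code.

```python
from __future__ import annotations

import unicodedata


def ucs(s: str) -> str:
    """Encode a string as glibc locale <UXXXX> / <UXXXXXXXX> symbols."""
    return "".join(f"<U{ord(c):04X}>" if ord(c) <= 0xFFFF else f"<U{ord(c):08X}>"
                   for c in s)


def translit_line(src: str, replacement: str) -> str:
    """Build a commented translit entry: bare RHS for one character,
    quoted RHS for a multi-character replacement."""
    name = " / ".join(unicodedata.name(c, "UNKNOWN") for c in src)
    rhs = ucs(replacement) if len(replacement) == 1 else f'"{ucs(replacement)}"'
    return f"% {name}\n{ucs(src)} {rhs}"
```

For example, translit_line("\u0423", "U") reproduces the CYRILLIC CAPITAL LETTER U entry from the patch.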

Thanks,

-- 
Marko Myllynen

