This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29


Hi,

On 2018-10-10 01:08, Rafal Luzynski wrote:
> 9.10.2018 18:10 Marko Myllynen <myllynen@redhat.com> wrote:
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>> If you refer to languages other than Russian which also use the Cyrillic
>>> alphabet but need different transliteration rules than Russian for
>>> the same characters, then it is OK for me now. I am afraid that the iconv
>>> algorithm does not handle such a case. Of course, we should add this missing
>>> feature eventually but I do not volunteer to do it now.
>>
>> Yes, this would be needed for correct transliteration of different
>> languages, and this might be quite a bit of work. There's also the case
>> of transliteration and character sets, consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>>
>> Russian: Борис Николаевич Ельцин
>> Int'l: Boris Nikolaevič Elʹcin
>> Finnish: Boris Nikolajevitš Jeltsin
>> French: Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
> 
> No, I did not mean the transcription using the rules of the destination
> locale using Latin but that the rules of transliteration may be different
> depending on the language of the source text.

Yes, I mentioned this case in my earlier email:

https://sourceware.org/ml/libc-alpha/2018-10/msg00083.html

> this Cyrillic string: "нъг" (I'm not saying that it is actually used
> in any existing word but it still must be handled).  By our transliteration
> rules it will be transliterated as "n``g".  But this is fine for Russian;
> if we knew that the source string is Ukrainian it would be transliterated
> as "n``h"; if it was Bulgarian it would be transliterated as "năg".

And according to SFS 4900, in fi_FI this string would be transliterated
as "ng" for Russian, "nh" for Ukrainian, and "năg" for Bulgarian.

> Unfortunately, I think that distinction of the source language is impossible
> at the moment so let's assume that we fall back to Russian if there is
> any ambiguity.

Yeah, it's not optimal, but it is probably the most reasonable compromise for now.
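
To make the fallback concrete, here is a rough sketch of what source-language-aware
transliteration could look like. It is not glibc or iconv code: the table names,
the language codes and the transliterate() helper are all hypothetical, and the
tables cover only the three letters of the "нъг" example quoted above.

```python
# Hypothetical tables, covering only the letters of the "нъг" example.
# Russian rules act as the default, as proposed in this thread.
RUSSIAN_RULES = {"н": "n", "ъ": "``", "г": "g"}

# Per-language overrides applied on top of the Russian rules
# when the source language is known (assumed language codes).
OVERRIDES = {
    "uk": {"г": "h"},  # Ukrainian
    "bg": {"ъ": "ă"},  # Bulgarian
}


def transliterate(text, source_lang=None):
    """Transliterate text, falling back to Russian rules when the
    source language is unknown or has no overrides."""
    rules = dict(RUSSIAN_RULES)
    rules.update(OVERRIDES.get(source_lang, {}))
    # Characters without a rule are passed through unchanged.
    return "".join(rules.get(ch, ch) for ch in text)
```

With these assumed tables, transliterate("нъг") follows the Russian rules, while
passing "uk" or "bg" as the source language selects the Ukrainian or Bulgarian
variant. The point is only that the caller has to supply the source language
somehow, which the current mechanism has no way to express.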

Thanks,

-- 
Marko Myllynen

