This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
On 2018-10-10 01:08, Rafal Luzynski wrote:
> 9.10.2018 18:10 Marko Myllynen <email@example.com> wrote:
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>> If you are referring to languages other than Russian that also use the
>>> Cyrillic alphabet but need different transliteration rules for the same
>>> characters, then it is OK for me now. I am afraid that the iconv
>>> algorithm does not handle such a case. Of course, we should add this
>>> missing feature eventually, but I do not volunteer to do it now.
>> Yes, this would be needed for correct transliteration of different
>> languages, and it might be quite a bit of work. There's also the case
>> of transliteration and character sets; consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>> Russian: Борис Николаевич Ельцин
>> Int'l: Boris Nikolaevič Elʹcin
>> Finnish: Boris Nikolajevitš Jeltsin
>> French: Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
> No, I did not mean transcription according to the rules of a destination
> locale that uses Latin script, but that the transliteration rules may
> differ depending on the language of the source text.
Yes, I mentioned this case in my earlier email:
> this Cyrillic string: "нъг" (I'm not saying that it is actually used
> in any existing word, but it still must be handled). By our transliteration
> rules it will be transliterated as "n``g". But this is fine for Russian;
> if we knew that the source string is Ukrainian it would be transliterated
> as "n``h"; if it was Bulgarian it would be transliterated as "năg".
And according to SFS 4900, in fi_FI this string would become ng for
Russian, nh for Ukrainian, and năg for Bulgarian.
> Unfortunately, I think that distinguishing the source language is
> impossible at the moment, so let's assume that we fall back to Russian if
> there is any ambiguity.
Yeah, it's not optimal but probably the most decent compromise for now.
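
To make the compromise concrete, here is a minimal Python sketch of
source-language-aware transliteration with a Russian fallback. It is only
an illustration and not how iconv or the glibc locale tables work: the
function name, the language codes, and the three-letter tables are
hypothetical, and they cover only the "нъг" example quoted above.

```python
# Hypothetical base table (Russian rules), limited to the letters in "нъг".
RUSSIAN = {"н": "n", "ъ": "``", "г": "g"}

# Hypothetical per-language overrides, taken from the example in this thread.
OVERRIDES = {
    "uk": {"г": "h"},  # Ukrainian: г -> h
    "bg": {"ъ": "ă"},  # Bulgarian: ъ -> ă
}


def transliterate(text, source_lang=None):
    """Transliterate text, falling back to Russian rules when the
    source language is unknown or has no override for a letter."""
    table = dict(RUSSIAN)
    table.update(OVERRIDES.get(source_lang, {}))
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("нъг"))        # Russian fallback
print(transliterate("нъг", "uk"))  # Ukrainian rules
print(transliterate("нъг", "bg"))  # Bulgarian rules
```

Without knowledge of the source language, only the first call is possible,
which is the fallback agreed on above.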