This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

From: Marko Myllynen <myllynen at redhat dot com>
To: Rafal Luzynski <digitalfreak at lingonborough dot com>, Egor Kobylkin <egor at kobylkin dot com>, Keld Simonsen <keld at keldix dot com>
Cc: libc-alpha at sourceware dot org, libc-locales at sourceware dot org, "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
Date: Wed, 10 Oct 2018 14:21:46 +0300
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com> <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com> <20181003091949.GA21486@rap.rap.dk> <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com> <1485772360.805333.1538731225156@poczta.nazwa.pl> <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com> <1028447684.826961.1539036295224@poczta.nazwa.pl> <63fb4fae-a93b-7aff-13df-4452cbc8853f@redhat.com> <1984104697.413415.1539122936119@poczta.nazwa.pl>
Reply-to: Marko Myllynen <myllynen at redhat dot com>

Hi,

On 2018-10-10 01:08, Rafal Luzynski wrote:
> 9.10.2018 18:10 Marko Myllynen <myllynen@redhat.com> wrote:
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>> If you refer to languages other than Russian which also use the Cyrillic
>>> alphabet but need different transliteration rules than Russian for
>>> the same characters then it is OK for me now. I am afraid that the iconv
>>> algorithm does not handle such a case. Of course, we should add this
>>> missing feature eventually but I do not volunteer to do it now.
>>
>> Yes, this would be needed for correct transliteration of different
>> languages, and this might be quite a bit of work. There's also the case
>> of transliteration and character sets; consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>>
>> Russian: Борис Николаевич Ельцин
>> Int'l: Boris Nikolaevič Elʹcin
>> Finnish: Boris Nikolajevitš Jeltsin
>> French: Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
> 
> No, I did not mean transcription into Latin script using the rules of the
> destination locale, but that the rules of transliteration may be different
> depending on the language of the source text.

Yes, I mentioned this case in my earlier email:

https://sourceware.org/ml/libc-alpha/2018-10/msg00083.html

> this Cyrillic string: "нъг" (I'm not saying that it is actually used
> in any existing word, but it still must be handled).  By our transliteration
> rules it will be transliterated as "n``g".  But this is fine for Russian;
> if we knew that the source string is Ukrainian it would be transliterated
> as "n``h"; if it was Bulgarian it would be transliterated as "năg".

And according to SFS 4900, in fi_FI this string would be transliterated
as "ng" for Russian, "nh" for Ukrainian, and "năg" for Bulgarian.

> Unfortunately, I think that distinguishing the source language is
> impossible at the moment, so let's assume that we fall back to Russian if
> there is any ambiguity.

Yeah, it's not optimal but probably the most decent compromise for now.
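
To make the fallback concrete, here is a minimal sketch in Python of
source-language-dependent transliteration for the "нъг" example quoted
above. It is only an illustration and not how iconv or the glibc locale
tables work: the names RUSSIAN, OVERRIDES and transliterate() are
hypothetical, and the tables cover just the three letters discussed in
this thread, with the values taken from the quoted rules.

```python
# Hypothetical tables for illustration only; they cover just the three
# letters of the "нъг" example, with values as quoted in this thread.
RUSSIAN = {"н": "n", "ъ": "``", "г": "g"}

# Per-source-language differences from the Russian rules (assumed layout).
OVERRIDES = {
    "ru": {},
    "uk": {"г": "h"},   # Ukrainian: г -> h
    "bg": {"ъ": "ă"},   # Bulgarian: ъ -> ă
}


def transliterate(text, source_lang=None):
    """Transliterate text, using Russian rules when the source language
    is unknown or has no overrides."""
    table = {**RUSSIAN, **OVERRIDES.get(source_lang, {})}
    # Characters without a rule are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("нъг"))        # source language unknown: Russian rules
    print(transliterate("нъг", "uk"))  # Ukrainian source
    print(transliterate("нъг", "bg"))  # Bulgarian source
```

With no source language given, the sketch behaves like the compromise
described above: every ambiguous letter gets the Russian rule.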

Thanks,

-- 
Marko Myllynen
