This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

From: Egor Kobylkin <egor at kobylkin dot com>
To: Marko Myllynen <myllynen at redhat dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, Keld Simonsen <keld at keldix dot com>
Cc: libc-alpha at sourceware dot org, libc-locales at sourceware dot org, "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
Date: Tue, 9 Oct 2018 18:22:22 +0200
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com> <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com> <20181003091949.GA21486@rap.rap.dk> <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com> <1485772360.805333.1538731225156@poczta.nazwa.pl> <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com> <1028447684.826961.1539036295224@poczta.nazwa.pl> <63fb4fae-a93b-7aff-13df-4452cbc8853f@redhat.com>

In the hope of being helpful: what you describe below from
https://fi.wikipedia.org/wiki/Siirtokirjoitus is called _transcription_,
not transliteration.

Transliteration is what we have done with ISO 9 or GOST 7.79 System A,
and it could indeed be the same for all languages.

Transcription can be phonetic or serve other purposes, and it depends on
the target language or use case. We have used GOST 7.79 System B.

Egor

On 09.10.2018 18:10, Marko Myllynen wrote:
> Hi,
> 
> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>
>> Particularly, I think that those rules will not be helpful at all for
>> the languages which use neither Latin nor Cyrillic alphabet.
> 
> This is certainly a very good point.
> 
>> If you refer to languages other than Russian which also use the Cyrillic
>> alphabet but need different transliteration rules than Russian for
>> the same characters, then it is OK for me now.  I am afraid that the iconv
>> algorithm does not handle such a case.  Of course, we should add this missing
>> feature eventually, but I do not volunteer to do it now.
> 
> Yes, this would be needed for correct transliteration of different
> languages, and this might be quite a bit of work. There's also the case
> of transliteration and character sets; consider the transliteration
> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
> 
> Russian:        Борис Николаевич Ельцин
> Int'l:          Boris Nikolaevič Elʹcin
> Finnish:        Boris Nikolajevitš Jeltsin
> French:         Boris Nikolaïevitch Ieltsine
> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
> 
> For French you'll get the correct transliteration with iconv by using -t
> ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT, but it's
> not so obvious how to get the above kind of transliteration for the ISO 9
> international case or especially for the phonetic case.
> 
> One thing that might help here is something like:
> 
> $ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
> ž
> 
> That is, force transliteration of each character (if defined) even if
> it's part of the target character set. AFAICS this is not currently
> possible.
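The difference between the existing //TRANSLIT suffix and the proposed //TRANSLIT_FORCE can be modelled in a few lines of Python. This is a hypothetical sketch, not glibc code: the one-entry table (ж -> "ž", with "zh" as an assumed fallback), the convert() helper and its force flag are illustrative only, and substituting "?" for characters that have no rule and cannot be encoded is a choice of this sketch.

```python
# Hypothetical one-entry table: ж (U+0436) -> "ž", with "zh" as an
# assumed fallback for targets that cannot hold "ž".
TABLE = {"\u0436": ["\u017e", "zh"]}


def representable(s, target):
    """Return True if every character of s can be encoded in target."""
    try:
        s.encode(target)
        return True
    except UnicodeEncodeError:
        return False


def convert(text, target, force=False):
    """Model //TRANSLIT (force=False) and the proposed //TRANSLIT_FORCE."""
    out = []
    for ch in text:
        # Plain //TRANSLIT consults the table only for characters the target
        # charset lacks; the forced variant consults it for every character.
        if (force or not representable(ch, target)) and ch in TABLE:
            # Take the first alternative the target charset can hold.
            alt = next((a for a in TABLE[ch] if representable(a, target)), "?")
            out.append(alt)
        elif representable(ch, target):
            out.append(ch)
        else:
            out.append("?")  # no rule and not representable
    return "".join(out)


print(convert("ж", "utf-8"))              # ж  (UTF-8 can hold it, so no change)
print(convert("ж", "utf-8", force=True))  # ž
print(convert("ж", "ascii"))              # zh
```

With a UTF-8 target, the non-forced mode leaves ж untouched because nothing is missing from the target set, which is why the echo example above cannot be reproduced with the existing suffix.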
> 
>> But, while we are at it, is there anything that stops us from adding transliteration
>> rules for additional Cyrillic characters not used in Russian but used in
>> other languages?
> 
> This would probably make sense.
> 
> FWIW, for Finnish, the Russian-specific diff to be applied in the locale
> on top of the translit_cyrillic (ISO 9) rules would be something like the
> one below. I still need to check whether rules are needed for languages
> other than Russian that could be added (I hope to submit a proper patch
> against fi_FI shortly after translit_cyrillic has landed):
> 
> <U0446> "<U0074><U0073>"
> <U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
> <U0448> "<U0161>";"<U0073><U0068>"
> <U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
> <U044A> ""
> <U044C> ""
> <U044D> "<U0065>"
> <U044E> "<U006A><U0075>"
> <U044F> "<U006A><U0061>"
> <U0451> "<U006A><U006F>"
> 
> Thanks,
>
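In a locale translit table, strings separated by ";" are alternatives that iconv tries in order, using the first one the target character set can represent. A rough Python model of the Finnish overrides quoted above follows; the dict layout and the transliterate() helper are hypothetical (this is not how localedef stores the data), and characters without an override are simply passed through here instead of falling back to translit_cyrillic.

```python
# The Finnish overrides quoted above, keyed by character. Each list holds
# the ";"-separated alternatives in order; "" means the letter is dropped.
FI_OVERRIDES = {
    "\u0446": ["ts"],                      # ц
    "\u0447": ["t\u0161", "tsh"],          # ч
    "\u0448": ["\u0161", "sh"],            # ш
    "\u0449": ["\u0161t\u0161", "shtsh"],  # щ
    "\u044a": [""],                        # ъ
    "\u044c": [""],                        # ь
    "\u044d": ["e"],                       # э
    "\u044e": ["ju"],                      # ю
    "\u044f": ["ja"],                      # я
    "\u0451": ["jo"],                      # ё
}


def pick(ch, target):
    """Return the first alternative for ch that target can encode, or None."""
    for alt in FI_OVERRIDES.get(ch, []):
        try:
            alt.encode(target)
            return alt
        except UnicodeEncodeError:
            continue
    return None


def transliterate(text, target):
    out = []
    for ch in text:
        alt = pick(ch, target)
        # No usable override: keep ch (the real locale would fall back to
        # the translit_cyrillic rules instead).
        out.append(ch if alt is None else alt)
    return "".join(out)


print(transliterate("чья", "iso8859_15"))  # tšja
print(transliterate("чья", "ascii"))       # tshja
```

ISO-8859-15 contains š, so the first alternative wins there; plain ASCII does not, so the second alternative is used instead.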

Follow-Ups:
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Marko Myllynen

References:
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Keld Simonsen
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Rafal Luzynski
- [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Rafal Luzynski
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Marko Myllynen
