This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

From: Marko Myllynen <myllynen at redhat dot com>
To: Rafal Luzynski <digitalfreak at lingonborough dot com>, Egor Kobylkin <egor at kobylkin dot com>, Keld Simonsen <keld at keldix dot com>
Cc: libc-alpha at sourceware dot org, libc-locales at sourceware dot org, "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
Date: Tue, 9 Oct 2018 19:10:26 +0300
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com> <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com> <20181003091949.GA21486@rap.rap.dk> <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com> <1485772360.805333.1538731225156@poczta.nazwa.pl> <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com> <1028447684.826961.1539036295224@poczta.nazwa.pl>
Reply-to: Marko Myllynen <myllynen at redhat dot com>

Hi,

On 2018-10-09 01:04, Rafal Luzynski wrote:
> 
> Particularly, I think that those rules will not be helpful at all for
> the languages which use neither Latin nor Cyrillic alphabet.

This is certainly a very good point.

> If you refer to languages other than Russian which also use the Cyrillic
> alphabet but need different transliteration rules than Russian for
> the same characters then it is OK for me now.  I am afraid that the iconv
> algorithm does not handle such a case.  Of course, we should add this missing
> feature eventually but I do not volunteer to do it now.

Yes, this would be needed for correct transliteration of different
languages, and it might be quite a bit of work. There's also the question
of how transliteration interacts with character sets; consider the
transliteration examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:

Russian:        Борис Николаевич Ельцин
Int'l:          Boris Nikolaevič Elʹcin
Finnish:        Boris Nikolajevitš Jeltsin
French:         Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]

For French you'll get the correct transliteration with iconv by using -t
ISO-8859-1//TRANSLIT, and for Finnish with -t ISO-8859-15//TRANSLIT, but
it's not so obvious how to get the above kind of transliteration for ISO 9
international or, especially, for the phonetic case.

One thing that might be helpful here could be something like:

$ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
ž

That is, force transliteration of each character (if defined) even if
it's part of the target character set. AFAICS this is not currently
possible.
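
To make the intended semantics concrete, here is a rough Python sketch of
the difference between plain //TRANSLIT and the TRANSLIT_FORCE idea above.
It is only a model, not how iconv is implemented: the RULES table, the
"zh" fallback, the "?" placeholder and the translit() helper are all
made up for illustration.

```python
# Hypothetical rule table: source character -> candidates in order of preference.
# Only the single rule from the example above; "zh" is an assumed ASCII fallback.
RULES = {"\u0436": ["\u017e", "zh"]}  # ж -> ž, then zh


def encodable(text, charset):
    """Return True if text can be represented in the target charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, charset, force=False):
    """Model of //TRANSLIT (force=False) and the proposed TRANSLIT_FORCE (force=True)."""
    out = []
    for ch in text:
        keep = encodable(ch, charset)
        if keep and not force:
            # Plain TRANSLIT: characters present in the target set pass through.
            out.append(ch)
            continue
        # Otherwise take the first candidate the target charset can represent.
        repl = next((c for c in RULES.get(ch, []) if encodable(c, charset)), None)
        if repl is None:
            # No usable rule: keep the character if possible, else a placeholder.
            repl = ch if keep else "?"
        out.append(repl)
    return "".join(out)
```

With this model, translit("ж", "utf-8") leaves the character untouched,
while translit("ж", "utf-8", force=True) applies the rule even though the
character exists in the target character set.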

> But, while at this, is there anything that stops us from adding transliteration
> rules for additional Cyrillic characters not used in Russian but used in
> other languages?

This would probably make sense.

FWIW, for Finnish the diff for Russian to be applied in the locale on
top of the translit_cyrillic (ISO 9) rules would be something like the one
below. I still need to check whether rules for languages other than
Russian should be added, and I hope to submit a proper patch against
fi_FI shortly after translit_cyrillic has landed:

<U0446> "<U0074><U0073>"
<U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
<U0448> "<U0161>";"<U0073><U0068>"
<U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
<U044A> ""
<U044C> ""
<U044D> "<U0065>"
<U044E> "<U006A><U0075>"
<U044F> "<U006A><U0061>"
<U0451> "<U006A><U006F>"
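
For anyone who wants to eyeball what these rules expand to, below is a
small Python sketch that decodes the <Uxxxx> notation and picks, for each
character, the first alternative that fits the target character set. It
assumes the ";"-separated alternatives are tried in order; parse_rules()
and pick() are helper names invented for this sketch, not part of glibc.

```python
import re

# The rules from the diff above, verbatim.
RULES_TEXT = '''\
<U0446> "<U0074><U0073>"
<U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
<U0448> "<U0161>";"<U0073><U0068>"
<U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
<U044A> ""
<U044C> ""
<U044D> "<U0065>"
<U044E> "<U006A><U0075>"
<U044F> "<U006A><U0061>"
<U0451> "<U006A><U006F>"
'''


def decode(symbols):
    """Turn a run of <Uxxxx> symbols into the characters they name."""
    return re.sub(r"<U([0-9A-Fa-f]{4})>", lambda m: chr(int(m.group(1), 16)), symbols)


def parse_rules(text):
    """Map each source character to its list of alternatives, in order."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        src, alts = line.split(None, 1)
        # Each alternative is a double-quoted string; "" means "drop the character".
        rules[decode(src)] = [decode(a) for a in re.findall(r'"([^"]*)"', alts)]
    return rules


def pick(ch, rules, charset):
    """Return the first alternative encodable in charset, or None if there is none."""
    for cand in rules.get(ch, []):
        try:
            cand.encode(charset)
            return cand
        except UnicodeEncodeError:
            continue
    return None


rules = parse_rules(RULES_TEXT)
for ch in "цчшщъьэюяё":
    print(ch, repr(pick(ch, rules, "iso8859-15")), repr(pick(ch, rules, "ascii")))
```

The loop prints, per character, the choice for ISO-8859-15 next to the
choice for ASCII, which shows where the second alternative kicks in once
š is not available in the target character set.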

Thanks,

-- 
Marko Myllynen
