[PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

Rafal Luzynski digitalfreak@lingonborough.com
Fri Oct 5 09:20:00 GMT 2018


3.10.2018 11:32 Egor Kobylkin <egor@kobylkin.com> wrote:
>
> On 03.10.2018 11:19, Keld Simonsen wrote:
> > Hi
> >
> > Please note that transliteration of Cyrillic to Latin is not universal.
> > There are different schemes for, for example, German, English and Danish, and
> > there is also an ISO standard for it.
>
> Thanks for your feedback, Keld!
>
> Could the locale maintainers that wouldn't like to include this patch
> explicitly state so here?

I think this is about me, so I must reply.  I am sorry about that; the sole
reason is my lack of time.  I'm just a volunteer here, which means it's not
my regular job to work on locale data, nor on anything else in glibc or in any
other open source project.  I do these things only in my free time, of which I
don't have much.  Of course you will see my contributions here and there, but
they are either trivial or take me months to complete.  Your patches are on my
radar but I can't give any ETA for them.  Of course, there are other people
around here and they are all welcome to come and join.

> That is:
> - In the case that there is a different preferred cyrillic
> transliteration table for any specific locale their maintainers may want
> to point me to it so I can supply a separate table/patch.
> - Or they could state explicitly that for some reason they would like to
> exclude their locale from the patch for a default cyrillic
> transliteration altogether.

As Keld wrote, there are probably separate rules for every language, so
I don't think you should treat your rules as universal and include them
in every locale.  At first sight, it seems to me they work only for English
(as a destination locale).  Also, although it is called "transliteration
from Cyrillic", it seems to cover only the Russian alphabet.  What about
other languages which use the Cyrillic alphabet but add their own diacritic
characters?  Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more.  What about languages which use the
Cyrillic alphabet but transliterate their respective letters differently
from Russian?  For example, Russian "Ъ" is (I think) usually skipped
in transliteration, and I think you propose "``" for it, but when
transliterating from Bulgarian it is usually rendered as "ă".
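
To illustrate the point about "Ъ", here is a minimal Python sketch of
per-language tables.  It is not taken from the patch: the names RUSSIAN,
BULGARIAN, TABLES and transliterate() are hypothetical, the tables cover only
the letters needed for the examples, and the mappings are illustrative rather
than any official scheme.

```python
# Hypothetical, deliberately tiny tables: illustrative values only.
RUSSIAN = {
    "а": "a", "б": "b", "г": "g", "е": "e", "и": "i", "к": "k",
    "л": "l", "о": "o", "р": "r", "т": "t", "я": "ya",
    "ъ": "",   # assumption: the hard sign is simply dropped for Russian
}

# Bulgarian reuses the shared letters but overrides "ъ", which is a vowel there.
BULGARIAN = {**RUSSIAN, "ъ": "ă"}

TABLES = {"ru": RUSSIAN, "bg": BULGARIAN}


def transliterate(text, lang):
    """Transliterate text using the table chosen by source language."""
    table = TABLES[lang]
    out = []
    for ch in text:
        lower = ch.lower()
        repl = table.get(lower, ch)      # unknown characters pass through
        if ch != lower and repl:         # keep the capitalisation of the input
            repl = repl[0].upper() + repl[1:]
        out.append(repl)
    return "".join(out)


print(transliterate("България", "bg"))   # Bălgariya
print(transliterate("България", "ru"))   # Blgariya (vowel lost)
print(transliterate("объект", "ru"))     # obekt
```

With a single shared table one of the two languages necessarily gets the wrong
result, which is why the table would have to be chosen per source language.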

A few remarks:

* I think you transliterate "щ" as "shh"; wouldn't "shch" be better?
* You transliterate "ц" as "cz"; wouldn't "ts" be better?  By the way,
  in Polish "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j"; this is fine in many languages but wouldn't
  "y" be better in English?
* In the case of "е": how will you know whether it is correct to
  transliterate it as "e" or "ie" or "je" or "ye"?
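
The last remark can be made concrete with a short Python sketch.  It assumes
one possible rule for an English target: "ye" at the start of a word and after
a vowel or "й", "ъ", "ь", and "e" elsewhere.  This is only one convention among
several, it is not what the patch does, and the names VOWELS, SIGNS and
romanize_e() are hypothetical.

```python
# Assumed context rule for Russian "е" (one convention among several).
VOWELS = set("аеёиоуыэюя")
SIGNS = set("йъь")


def romanize_e(word):
    """Return the romanization chosen for each "е" in a single word."""
    choices = []
    prev = None                          # None marks the start of the word
    for ch in word.lower():
        if ch == "е":
            if prev is None or prev in VOWELS or prev in SIGNS:
                choices.append("ye")
            else:
                choices.append("e")
        prev = ch
    return choices


print(romanize_e("Ереван"))    # ['ye', 'e']
print(romanize_e("поезд"))     # ['ye']
print(romanize_e("нет"))       # ['e']
```

The choice depends on the preceding character, so a table that maps each
character independently of its neighbours cannot express such a rule.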

These remarks are obviously incomplete; your patch deserves a much more
thorough review.

Best regards,

Rafal


