This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

From: Rafal Luzynski <digitalfreak at lingonborough dot com>
To: Egor Kobylkin <egor at kobylkin dot com>, Keld Simonsen <keld at keldix dot com>, Marko Myllynen <myllynen at redhat dot com>
Cc: libc-alpha at sourceware dot org, libc-locales at sourceware dot org, "Dmitry V. Levin" <ldv at altlinux dot org>, Volodymyr Lisivka <vlisivka at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>, Max Kutny <mkutny at gmail dot com>, danilo at gnome dot org
Date: Tue, 9 Oct 2018 00:04:55 +0200 (CEST)
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com> <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com> <20181003091949.GA21486@rap.rap.dk> <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com> <1485772360.805333.1538731225156@poczta.nazwa.pl> <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com>
Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>

5.10.2018 12:36 Egor Kobylkin <egor@kobylkin.com> wrote:
> [...]
> I see three options:
> 1. those locale maintainers that are fine with using ISO
> 9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
> in their locales. https://sourceware.org/bugzilla/attachment.cgi?id=11289
> 2. those that want to have a differing table can create their own
> variety based on the spreadsheet I have prepared
> https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
> this patch.
> 3. those that want to omit a cyrillic transliteration altogether for now
> state so and just carry over the bug #2872 from the year 2006.
>
> Does this make sense to you?

The problem is that we don't have a separate maintainer for each locale;
we have only 2 maintainers for about 200 locales and we must represent
them all.  Sometimes a locale may happen to be our own native locale or
that of someone on this list, or it may be the locale of a language which
we happen to speak as a foreign language, or we may have friends who
speak it.  Or it may be totally unknown and we still must somehow handle it.

I think that these transliteration rules should be included in multiple
locales on an "opt-in" basis rather than "opt-out".  I mean, we should not
include them in every locale except where someone explicitly provides
different rules.  Instead, I think we should add them (maybe with
modifications) only to those locales where we have a good reason to think
they will work.

In particular, I think that those rules will not be helpful at all for
languages which use neither the Latin nor the Cyrillic alphabet.

> [...]
> The fact that the patch is reflecting Russian variety of ISO
> 9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
> available and can be helpful to a majority of cyrillic users b) I have
> access to it including via being proficient in Russian.

I took a look at these standards.  At first I doubted they could be
correct for the English language, but now I understand they were created
for Russian users.  Therefore I think it is correct to include them
in the Russian locale data.  Would it be OK if we say that it is only for
the Russian language?  Would that be satisfactory for you and/or your users?

> It is offered to all the respective locale maintainers as a stopgap
> solution. Stopgap in the sense that it is better to have some
> transliteration than not to have any at all and carry over the bug from
> 2006. That it may be a somewhat officially correct transliteration for
> ru_RU is a bonus. In that sense I would dub the discussion on the
> correctness for other languages "offtopic". Let me know if this is not OK.

If you are referring to languages other than Russian which also use the
Cyrillic alphabet but need different transliteration rules than Russian
for the same characters, then it is OK for me for now.  I am afraid that
the iconv algorithm does not handle such a case.  Of course, we should add
this missing feature eventually but I do not volunteer to do it now.

> [...]
> P.S. specifically as to how address languages other than Ru included in
> GOST_7.79_System_B: we can take the first option left to right from that
> table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
> locales/languages but with errors where Ru supersedes their own variants.

Makes sense, as long as we cannot select the source language now.

But, while we are at it, is there anything that stops us from adding
transliteration rules for additional Cyrillic characters not used in
Russian but used in other languages?
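
To make the "first option left to right" idea quoted above concrete, here
is a minimal Python sketch.  It is only an illustration, not how glibc or
iconv implement transliteration.  The column order (Ru, By, Uk, Bg, Mk)
comes from the quoted proposal; the table excerpt, its values and the
function names are hypothetical and have not been checked against
GOST 7.79 System B.

```python
# Column order as given in the quoted proposal: Ru, By, Uk, Bg, Mk.
LANGS = ("ru", "by", "uk", "bg", "mk")

# Hypothetical excerpt of a per-language table.  The values are
# illustrative only and are NOT verified against GOST 7.79 System B.
TABLE = {
    "ж": {"ru": "zh", "by": "zh", "uk": "zh", "bg": "zh", "mk": "zh"},
    "щ": {"ru": "shh", "uk": "shh", "bg": "sth"},  # Ru wins over Bg
    "і": {"by": "i", "uk": "i"},                   # not used in Russian
    "ј": {"mk": "j"},                              # not used in Russian
}


def first_option(ch):
    """Return the first variant found scanning LANGS left to right."""
    variants = TABLE.get(ch)
    if variants is None:
        return None
    for lang in LANGS:
        if lang in variants:
            return variants[lang]
    return None


def transliterate(text):
    """Replace each character that has a rule; keep all others as-is."""
    out = []
    for ch in text:
        repl = first_option(ch)
        out.append(repl if repl is not None else ch)
    return "".join(out)
```

With a single table like this, the Russian variant supersedes the others
whenever it exists, while characters that Russian does not use simply
fall through to the first language that defines them.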

Regards,

Rafal

Follow-Ups:
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Zack Weinberg
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Marko Myllynen

References:
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Keld Simonsen
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
- Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Rafal Luzynski
- [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
  - From: Egor Kobylkin
