This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: "Diego (Egor) Kobylkin" <egor at kobylkin dot com>, Marko Myllynen <myllynen at redhat dot com>, Carlos O'Donell <codonell at redhat dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "libc-locales at sourceware dot org" <libc-locales at sourceware dot org>, Siddhesh Poyarekar <siddhesh at gotplt dot org>
- Cc: Mike Fabian <mfabian at redhat dot com>
- Date: Thu, 6 Jun 2019 01:49:04 +0200 (CEST)
- Subject: Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
- References: <DDiRMB942zU2NTs_1xTsb-zTgRD2L6AOaaJW-a0-0YJ3O5voZt2GeTjQJQ0c_hExTwcJKvBMiXIeyHsdieM2Q1m61oOpU27Msj09zowycVM=@kobylkin.com>
5.06.2019 08:47 "Diego (Egor) Kobylkin" <egor@kobylkin.com> wrote:
>
> ping
>
> Egor Kobylkin
I second these pings. Marko, Carlos, Siddhesh, Mike, is there anything
else I can do here?
Since the questions may sound overwhelming, I'd like to focus on
a single issue:
How should we handle letter case when a single upper-case Cyrillic
letter is transliterated to a Latin digraph (or trigraph, etc.)?
Possible answers (Cyrillic -> Latin Extended -> ASCII):

1. "Ш" -> "Š" -> "SH"
   e.g.: "Шема"  -> "Šema" -> "SHema"
         "Схема" ----------> "Shema"

2. "Ш" -> "Š" -> "Sh"
   e.g.: "Шема"  -> "Šema" -> "Shema"
         "Схема" ----------> "Shema"
Personally, I don't like answer 1 because "SHema" looks weird
to me. Egor, in turn, does not like answer 2 because the output
string becomes ambiguous: both "Шема" and "Схема" yield "Shema".
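
To make the two answers concrete, here is a minimal Python sketch. It is
not the glibc locale data and not the table from the patch: TABLE is a
tiny hypothetical subset covering only the letters in the examples, and
translit() and its digraph_case parameter are names made up for this
illustration.

```python
# Hypothetical subset of a Cyrillic -> ASCII table (illustration only,
# not the mapping proposed in the patch).
TABLE = {
    "Ш": "SH", "ш": "sh",
    "С": "S", "с": "s",
    "Х": "H", "х": "h",
    "Е": "E", "е": "e",
    "М": "M", "м": "m",
    "А": "A", "а": "a",
}


def translit(text, digraph_case="upper"):
    """Transliterate text letter by letter.

    digraph_case="upper" models answer 1 ("Ш" -> "SH"),
    digraph_case="title" models answer 2 ("Ш" -> "Sh").
    """
    out = []
    for ch in text:
        latin = TABLE.get(ch, ch)  # unknown characters pass through
        if digraph_case == "title" and len(latin) > 1 and ch.isupper():
            latin = latin.capitalize()  # "SH" -> "Sh"
        out.append(latin)
    return "".join(out)


# Answer 1: the two words stay distinguishable.
print(translit("Шема", "upper"), translit("Схема", "upper"))
# Answer 2: both words collapse to the same ASCII string.
print(translit("Шема", "title"), translit("Схема", "title"))
```

With answer 1 the two inputs give "SHema" and "Shema"; with answer 2
both give "Shema", which is the ambiguity Egor objects to.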
Should we perhaps have a smarter algorithm that selects title case or
upper case for the output characters depending on the context within
the word? Note that this would not resolve the problem of the output
text being ambiguous.
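
As a rough sketch of what such a context-dependent rule could look like
(in Python, purely for illustration): an upper-case letter that maps to
a digraph stays fully upper case only when its neighbouring letters are
upper case too, and becomes title case otherwise. The rule itself, the
TABLE subset, and the name translit_smart() are assumptions of mine,
not something from the patch or from glibc.

```python
# Hypothetical subset of a Cyrillic -> ASCII table (illustration only).
TABLE = {
    "Ш": "SH", "ш": "sh",
    "С": "S", "с": "s",
    "Х": "H", "х": "h",
    "Е": "E", "е": "e",
    "М": "M", "м": "m",
    "А": "A", "а": "a",
}


def translit_smart(text):
    """Pick "SH" or "Sh" for an upper-case letter based on its neighbours."""
    out = []
    for i, ch in enumerate(text):
        latin = TABLE.get(ch, ch)  # unknown characters pass through
        if len(latin) > 1 and ch.isupper():
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            neighbours = [c for c in (prev_ch, next_ch) if c.isalpha()]
            # Assumed rule: keep all caps only inside an all-caps context;
            # a letter with no neighbouring letters gets title case.
            all_caps = bool(neighbours) and all(c.isupper() for c in neighbours)
            if not all_caps:
                latin = latin.capitalize()  # "SH" -> "Sh"
        out.append(latin)
    return "".join(out)


print(translit_smart("ШЕМА"))   # all-caps word keeps "SH"
print(translit_smart("Шема"))   # capitalized word gets "Sh"
print(translit_smart("Схема"))  # same ASCII output as "Шема"
```

This avoids "SHema" in ordinary capitalized words while keeping
"SHEMA" for all-caps input, but "Шема" and "Схема" still both come out
as "Shema".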
Regards,
Rafal