This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]


5.06.2019 08:47 "Diego (Egor) Kobylkin" <egor@kobylkin.com> wrote:
> 
> ping
> 
> Egor Kobylkin

I second these pings.  Marko, Carlos, Siddhesh, Mike, is there anything
else I can do here?

Since the questions may sound overwhelming, I'd like to focus on
a single issue:

How should we handle the upper/lower case when a single Cyrillic letter
is transliterated to a Latin digraph (trigraph, etc.)?

Possible answers (Cyrillic -> Latin Extended -> ASCII):

1. "Ш" -> "Š" -> "SH"

   e.g.: "Шема" -> "Šema" -> "SHema"
         "Схема" ----------> "Shema"

2. "Ш" -> "Š" -> "Sh"

   e.g.: "Шема" -> "Šema" -> "Shema"
         "Схема" ----------> "Shema"

Personally, I don't like answer 1 because "SHema" looks weird to me.
Egor, in turn, does not like answer 2 because the output string becomes
ambiguous: "Шема" and "Схема" both end up as "Shema".
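
To make the trade-off concrete, here is a minimal Python sketch of the
two answers.  It only illustrates the behaviour; it is not how the
locale tables in the patch work, and the mapping `BASE` and the function
`translit` are hypothetical names covering just the letters used in the
examples above.

```python
# Hypothetical, tiny mapping for illustration only; it covers just the
# letters of the two example words and is not the table from the patch.
BASE = {"ш": "sh", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}


def translit(text, upper_digraph):
    """Transliterate letter by letter, without looking at context.

    upper_digraph=True  models answer 1 ("Ш" -> "SH").
    upper_digraph=False models answer 2 ("Ш" -> "Sh").
    """
    out = []
    for ch in text:
        latin = BASE.get(ch.lower())
        if latin is None:
            out.append(ch)  # pass unknown characters through unchanged
        elif ch.isupper():
            out.append(latin.upper() if upper_digraph else latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


words = ("Шема", "Схема")
answer1 = [translit(w, upper_digraph=True) for w in words]
answer2 = [translit(w, upper_digraph=False) for w in words]
print(answer1)  # ['SHema', 'Shema']
print(answer2)  # ['Shema', 'Shema']
```

With this toy mapping, answer 1 keeps the two capitalized words apart,
while answer 2 merges them.  All-lowercase input ("шема" vs. "схема")
collides under both answers.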

Should we maybe have a smart algorithm that selects title case or
upper case for the output characters depending on the context in the
word?  Note that this would not resolve the problem of the output text
being ambiguous.
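
One possible shape of such a context-dependent rule, as a rough Python
sketch rather than a proposal for the actual implementation: an
upper-case Cyrillic letter becomes an all-caps digraph only when a
neighbouring letter is also upper case, and a title-case digraph
otherwise.  The names `TABLE` and `smart_translit` and the neighbour
rule itself are my assumptions for illustration.

```python
# Hypothetical mapping for illustration only, not the table from the patch.
TABLE = {"ш": "sh", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}


def smart_translit(text):
    """Pick "SH" or "Sh" for an upper-case letter depending on its neighbours."""
    out = []
    for i, ch in enumerate(text):
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # pass unknown characters through unchanged
            continue
        if not ch.isupper():
            out.append(latin)
            continue
        # Assumed rule: an adjacent upper-case letter means an all-caps context.
        prev_upper = i > 0 and text[i - 1].isupper()
        next_upper = i + 1 < len(text) and text[i + 1].isupper()
        if prev_upper or next_upper:
            out.append(latin.upper())
        else:
            out.append(latin.capitalize())
    return "".join(out)


print(smart_translit("ШЕМА"))   # SHEMA
print(smart_translit("Шема"))   # Shema
print(smart_translit("Схема"))  # Shema
```

The output looks natural in both all-caps and capitalized words, but
"Шема" and "Схема" still collide, so the ambiguity remains.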

Regards,

Rafal

