This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]

From: Marko Myllynen <myllynen at redhat dot com>
To: Egor Kobylkin <egor at kobylkin dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org
Date: Mon, 19 Nov 2018 09:13:55 +0200
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <b82fe65b-b880-a2b5-c97d-2a6aae9c1165@kobylkin.com> <837001401.21346.1542406647888@poczta.nazwa.pl> <bef63562-09d1-3306-aae9-20002ccf4130@kobylkin.com>
Reply-to: Marko Myllynen <myllynen at redhat dot com>

Hi,

On 17/11/2018 20.34, Egor Kobylkin wrote:
> 
> Looks like we have three issues:
> 1. lack of explicit control over which transformation to use (System A
> or System B) via //TRANSLIT
> 2. possible collisions in System B if capital letters are transcribed
> as capital + lowercase digraphs
> 3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
> System B because its equivalent 'X'/'x' from System A is always present
> and takes precedence.
> 
> As a solution shouldn't we only keep System B in a new file
> transcribe_cyrillic and put it in place as the explicit ASCII
> transcription for targeted locales (as opposed to transliteration)?
> 
> We would keep System A as translit_cyrillic but not include it in
> this patch. Once the issue of having two conflicting rule sets but
> only one key, //TRANSLIT, has been resolved, System A could be added back.
> 
> The SH/Sh question can be decided either way; it seems like an easy
> change anyway.
> 
> I have a question then: isn't this more of a hack than the right thing
> to do?
> 
> Shouldn't we have two explicit rules for transcription and
> transliteration not dependent on a destination character set?
> 
> This would contradict ISO 9:1995 (System A).
> System A was added at Marko's request (so I am putting him in To:). To
> be clear, I am neutral on keeping it or dropping it.
> 
> This particular rule with h/x would make sense on its own.
> But again, it would contradict the standards.
> On the other hand, for my personal needs I care less about standards
> than about current functionality and the data loss caused by
> transcription being missing altogether due to BZ #2872.

Given the number of open questions above, I think the way forward is to
follow the relevant standards as closely as possible and also to check what
other implementations (e.g., uconv(1)) do. For example, checking the
case mentioned earlier may or may not give some hints:

$ echo Шема  | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1  ICU 50.1.2
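
To make the collision concern concrete, here is a minimal Python sketch.
Both tables are toy assumptions covering only the letters in the two test
words; they are not the tables from the patch. DIACRITIC reproduces the
uconv output shown above, and ASCII_DIGRAPH is a hypothetical ASCII-only
variant that differs only in writing 'Ш' as "Sh".

```python
# Toy, partial tables for illustration only (assumed, not taken from the patch).
# DIACRITIC matches the uconv cyrillic-latin output quoted in this message.
DIACRITIC = {"Ш": "Š", "С": "S", "х": "h", "е": "e", "м": "m", "а": "a"}

# Hypothetical ASCII-only variant: same table, but 'Ш' becomes the digraph "Sh".
ASCII_DIGRAPH = {**DIACRITIC, "Ш": "Sh"}


def romanize(word, table):
    """Replace each character via the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in word)


if __name__ == "__main__":
    for word in ("Шема", "Схема"):
        print(word, romanize(word, DIACRITIC), romanize(word, ASCII_DIGRAPH))
    # With DIACRITIC the two words stay distinct (Šema vs. Shema);
    # with ASCII_DIGRAPH both become "Shema", so the mapping is not reversible.
```

Under these assumptions the digraph table maps two different Cyrillic words
to the same ASCII string, which is the kind of ambiguity that a choice of
'h' versus 'x' for 'х' would need to take into account.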

Thanks,

-- 
Marko Myllynen

