This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
- From: Marko Myllynen <myllynen at redhat dot com>
- To: Rafal Luzynski <digitalfreak at lingonborough dot com>, Egor Kobylkin <egor at kobylkin dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org
- Cc: Mike Fabian <mfabian at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>
- Date: Mon, 10 Dec 2018 23:20:33 +0200
- Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
- References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <b82fe65b-b880-a2b5-c97d-2a6aae9c1165@kobylkin.com> <837001401.21346.1542406647888@poczta.nazwa.pl> <bef63562-09d1-3306-aae9-20002ccf4130@kobylkin.com> <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com> <1441622134.517912.1543702039942@poczta.nazwa.pl> <2f6fc82c-77ba-d331-ae5d-e2373e122a88@kobylkin.com> <1361059722.707244.1544231740358@poczta.nazwa.pl>
- Reply-to: Marko Myllynen <myllynen at redhat dot com>
Hi,
On 08/12/2018 03.15, Rafal Luzynski wrote:
> 17.11.2018 19:34 Egor Kobylkin <egor@kobylkin.com> wrote:
>>
>> The SH/Sh question can be decided either way; it seems like an easy
>> change anyway.
>
> I'm in favor of "Sh" because it will work fine for titlecased words
> (where only the first letter is uppercase) but I'm aware it would be
> a problem for uppercased words. Unfortunately, I think we are unable
> to satisfy both cases.
I think I'm in favor of "Sh" as well. Although not perfect, I'd assume
it's going to be correct in more cases than "SH".
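To illustrate the trade-off, here is a minimal Python sketch. The
three-letter table and the translit() helper are hypothetical and only
mimic a context-free, per-character table; they are not part of the patch.

```python
# Hypothetical excerpt of a transliteration table (illustration only).
TABLE = {
    "\u0428": "Sh",  # CYRILLIC CAPITAL LETTER SHA
    "\u0448": "sh",  # CYRILLIC SMALL LETTER SHA
    "\u0423": "U",   # CYRILLIC CAPITAL LETTER U
    "\u0443": "u",   # CYRILLIC SMALL LETTER U
    "\u041c": "M",   # CYRILLIC CAPITAL LETTER EM
    "\u043c": "m",   # CYRILLIC SMALL LETTER EM
}


def translit(text, table=TABLE):
    """Replace each character independently; no look at neighbours."""
    return "".join(table.get(ch, ch) for ch in text)


titlecased = "\u0428\u0443\u043c"  # capital SHA, small U, small EM
uppercased = "\u0428\u0423\u041c"  # capital SHA, capital U, capital EM

print(translit(titlecased))  # Shum
print(translit(uppercased))  # ShUM
```

With "Sh" the titlecased word comes out right and the uppercased one gets
mixed case; with "SH" it would be the other way around.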
>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>
> I don't think I saw Marko's request, but I'm in favor of keeping
> System A, too.
>
> Marko, it would be good to hear your opinion about System A vs. System B
> again.
I think System A is the better option, as it should be the same as ISO 9
and perhaps also produces more expected results than System B in some
cases (if the Wikipedia ISO 9 article is to be believed).
Wrt BZ #2872, I think it's good to keep it in mind, but IMHO we can also
deviate from it if needed. However, shouldn't System A + ASCII fallback
definitions satisfy the RFE as well?
> 19.11.2018 20:35 Marko Myllynen <myllynen@redhat.com> wrote:
>> [...]
>> In any case once your patch lands I'm going to submit a follow-up patch
>> for fi_FI to make it compliant with the applicable national standard
>> (SFS 4900) which defines how to do Cyrillic transliteration /
>> transcription in the context of Finnish.
>
> I totally agree. As far as I can see, SFS 4900 is more similar to
> System A (ISO 9) rather than System B, that is, it transliterates to Latin
> characters with diacritics rather than plain ASCII. Marko, what is your
> opinion about possible implementation of SFS 4900 in these cases:
>
> * When the destination charset does not contain required Latin diacritic
> characters (e.g., it is plain ASCII)?
This would be according to http://jkorpela.fi/iso9.html8, so for example
zh instead of ž and shtsh instead of štš.
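As a rough sketch of such a fallback step in Python: the two-entry table
below is derived only from the two examples above, and the to_ascii() name
is hypothetical. A real table would need every diacritic letter the
standard uses.

```python
# Assumed fallback pairs, inferred from the two examples in this message.
ASCII_FALLBACK = {
    "\u017e": "zh",  # LATIN SMALL LETTER Z WITH CARON
    "\u0161": "sh",  # LATIN SMALL LETTER S WITH CARON
}


def to_ascii(latin, fallback=ASCII_FALLBACK):
    """Replace letters with diacritics by their ASCII digraphs."""
    return "".join(fallback.get(ch, ch) for ch in latin)


print(to_ascii("\u017e"))             # zh
print(to_ascii("\u0161t\u0161"))      # shtsh
```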
> * When the output is ambiguous, that means, when two different Cyrillic
> strings produce the same Latin (or ASCII) output?
This is a good point and one I haven't considered, but I'm not sure
there is anything we can do about this (at least without major locale
system internals work). Do you have any rough idea how frequently this
could happen, or is this more of a theoretical issue? (Sorry if I've
missed earlier comments about this, it's been a long thread.)
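One way to get a rough idea of the frequency would be to run a word list
through the table and group the inputs by output. The Python sketch below
is only an illustration: the collisions() helper is hypothetical, and the
three mappings are assumptions (SHA to "sh" follows the ASCII fallback
discussed above, ES to "s" and HA to "h" are assumed for the example).

```python
from collections import defaultdict

# Assumed mappings for illustration only, not taken from the patch.
TABLE = {
    "\u0448": "sh",  # CYRILLIC SMALL LETTER SHA (ASCII fallback)
    "\u0441": "s",   # CYRILLIC SMALL LETTER ES (assumption)
    "\u0445": "h",   # CYRILLIC SMALL LETTER HA (assumption)
}


def translit(text, table=TABLE):
    """Context-free, per-character replacement."""
    return "".join(table.get(ch, ch) for ch in text)


def collisions(words, table=TABLE):
    """Return outputs that are produced by more than one distinct input."""
    groups = defaultdict(set)
    for word in words:
        groups[translit(word, table)].add(word)
    return {out: srcs for out, srcs in groups.items() if len(srcs) > 1}


# SHA alone and ES followed by HA both come out as "sh".
sample = ["\u0448", "\u0441\u0445", "\u0441"]
print(collisions(sample))
```

Feeding a real dictionary into collisions() instead of the three-item
sample would show whether the ambiguity is common or mostly theoretical.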
>> The same with having both System A and System B. Initially I went along
>> with the suggestion to include the system A but it is clear now that it
>> doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
>> to set it aside for the moment and use the v10 without the system A.
>> That is the whole reason I have submitted it, to be superclear on that.
>
> OK, I think that now I understand your reason to drop System A better.
> But still I'd like to rethink implementing System A somehow and drop
> (or rather: implement only partially) System B.
Yes, I also think System A AKA ISO 9 would be a better choice but I'll
leave the final decision for you two (and others who might weigh in).
Thanks,
--
Marko Myllynen