This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v10] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
On 19.12.18 23:41, Rafal Luzynski wrote:
> 8.12.2018 22:51 Egor Kobylkin <egor@kobylkin.com> wrote:
>>
>> Rafal, Dmitry, Marko, Mike
>>
>> On 08.12.18 00:35, Rafal Luzynski wrote:
>>> 19.11.2018 12:10 Egor Kobylkin <egor@kobylkin.com> wrote:
>>>>
>>>> Changelog v10: * Removed ISO 9:1995 / GOST 7.79-2000 System A
>>>> (transliteration to Latin with diacritics) as conflicting with
>>>> System B within glibc mechanics and not solving BZ #2872
>>>
>>> I'm in favor of implementing System A and dropping System B instead.
>>
>> The BZ #2872 bug name is explicitly "Transliteration Cyrillic -> ASCII
>> fails". The ISO 9 System A does not map to ASCII so it is not a solution
>> to BZ #2872 at all.
>
> I did not mean implementing System A and nothing more. I meant implementing
> System A and a fallback for ASCII which can be similar to System B but
> we wouldn't be able to call it "System B" because it would differ in
> few cases.
Just for the record, I have no objection to that (using System A as a
basis for the ASCII fallback as well).
But I am no longer sure that inserting a translit table into every
locale is the right solution to the ASCII problem, especially because
distributions may not include any locale but C.
>
>> I was scratching my head as to how we can avoid the explosion of the
>> scope for this patch. And then it occurred to me that it was wrong to
>> target all the present locales for the ASCII translit. This seems to be
>> the root cause of these prolonged A vs. B discussions. The proper target
>> for my table is actually the C locale translit file
>> (locale/C-translit.h.in). I will submit a proper patch shortly.
>
> I saw your patch v11 and now I must say I'm sorry for making noise because
> it was me who said that I didn't mind adding Cyrillic -> ASCII
> transliteration to C locale. I said so before taking a look at the current
> contents of transliteration in C locale. When I looked at this I realized
> that it does not support any national characters, even from modified Latin
> alphabets (like used in most of western European languages). It only
> contains mathematical, physical, commercial, diacritical etc. characters.
> So I'm no longer sure it should support Cyrillic -> ASCII. But maybe again
> I'm wrong, maybe it should support but just nobody implemented it yet.
Actually there are quite a few letters already transliterated in
locale/C-translit.h.in. (Note the CAPCAP transliteration style for the
capitals, i.e. LATIN CAPITAL LETTER AE is mapped to AE, not to Ae.)
"\x00c6" "AE" /* <U00C6> LATIN CAPITAL LETTER AE */
"\x00d7" "x" /* <U00D7> MULTIPLICATION SIGN */
"\x00df" "ss" /* <U00DF> LATIN SMALL LETTER SHARP S */
"\x00e6" "ae" /* <U00E6> LATIN SMALL LETTER AE */
"\x0132" "IJ" /* <U0132> LATIN CAPITAL LIGATURE IJ */
"\x0133" "ij" /* <U0133> LATIN SMALL LIGATURE IJ */
"\x0149" "'n" /* <U0149> LATIN SMALL LETTER N PRECEDED BY APOSTROPHE */
"\x0152" "OE" /* <U0152> LATIN CAPITAL LIGATURE OE */
"\x0153" "oe" /* <U0153> LATIN SMALL LIGATURE OE */
"\x017f" "s" /* <U017F> LATIN SMALL LETTER LONG S */
"\x01c7" "LJ" /* <U01C7> LATIN CAPITAL LETTER LJ */
"\x01c8" "Lj" /* <U01C8> LATIN CAPITAL LETTER L WITH SMALL LETTER J */
"\x01c9" "lj" /* <U01C9> LATIN SMALL LETTER LJ */
"\x01ca" "NJ" /* <U01CA> LATIN CAPITAL LETTER NJ */
"\x01cb" "Nj" /* <U01CB> LATIN CAPITAL LETTER N WITH SMALL LETTER J */
"\x01cc" "nj" /* <U01CC> LATIN SMALL LETTER NJ */
"\x01f1" "DZ" /* <U01F1> LATIN CAPITAL LETTER DZ */
"\x01f2" "Dz" /* <U01F2> LATIN CAPITAL LETTER D WITH SMALL LETTER Z */
"\x01f3" "dz" /* <U01F3> LATIN SMALL LETTER DZ */
>> My focus is super sharp on helping with Cyrillic -> ASCII translit
>> availability for a default installation with glibc.
>
> I understand your aim and I agree to support ASCII. Our disagreements are:
>
> * whether to support conversion Cyrillic -> extended Latin as well,
no contest on my side
> * which standard to implement,
no contest on my side
> * what to do if the standard is ambiguous or if some details cannot be
> implemented for technical reasons.
no contest on my side either
I just think we can sidestep all those decisions with a smaller, pure
ASCII patch first (which is also more useful if it covers the C locale).
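
To illustrate what the C-translit.h.in entries quoted above amount to,
here is a minimal sketch of a code point -> ASCII string lookup. This is
not glibc's actual lookup code: the struct, the table name and
translit_lookup() are hypothetical names made up for this example, and
only a handful of the quoted entries are copied in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of a few C-translit.h.in entries.
   glibc generates its own table from that file; this layout is
   only an assumption for illustration. */
struct translit_entry {
    uint32_t from;    /* Unicode code point */
    const char *to;   /* ASCII replacement string */
};

/* Must stay sorted by code point for the binary search below. */
static const struct translit_entry translit_table[] = {
    { 0x00c6, "AE" },  /* LATIN CAPITAL LETTER AE (CAPCAP style) */
    { 0x00d7, "x"  },  /* MULTIPLICATION SIGN */
    { 0x00df, "ss" },  /* LATIN SMALL LETTER SHARP S */
    { 0x00e6, "ae" },  /* LATIN SMALL LETTER AE */
    { 0x0132, "IJ" },  /* LATIN CAPITAL LIGATURE IJ */
    { 0x0133, "ij" },  /* LATIN SMALL LIGATURE IJ */
    { 0x01c7, "LJ" },  /* LATIN CAPITAL LETTER LJ */
    { 0x01c8, "Lj" },  /* LATIN CAPITAL LETTER L WITH SMALL LETTER J */
    { 0x01c9, "lj" },  /* LATIN SMALL LETTER LJ */
};

/* Return the ASCII replacement for cp, or NULL if the table has
   no entry for that code point. */
const char *translit_lookup(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(translit_table) / sizeof(translit_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (translit_table[mid].from == cp)
            return translit_table[mid].to;
        if (translit_table[mid].from < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

In this sketch translit_lookup(0x00c6) finds the "AE" entry, while a
Cyrillic code point such as 0x0416 gets NULL simply because the table has
no entry for it. A pure ASCII patch would, roughly speaking, add such
entries.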