This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
- From: Marko Myllynen <myllynen at redhat dot com>
- To: Egor Kobylkin <egor at kobylkin dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org, Carlos O'Donell <carlos at redhat dot com>, Siddhesh Poyarekar <siddhesh at gotplt dot org>
- Cc: Mike Fabian <mfabian at redhat dot com>
- Date: Mon, 7 Jan 2019 22:37:24 +0200
- Subject: Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
- References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <a1db6ae3-2847-1482-b849-dd383e8c85aa@kobylkin.com> <2124833400.35614.1546698902753@poczta.nazwa.pl> <908ed415-cfe4-804c-f421-4351ef062edc@kobylkin.com>
- Reply-to: Marko Myllynen <myllynen at redhat dot com>
Hi,
On 05/01/2019 23.12, Egor Kobylkin wrote:
> On 05.01.19 15:35, Rafal Luzynski wrote:
>> 2.01.2019 19:38 Egor Kobylkin <egor@kobylkin.com> wrote:
>>>
>>> Changelog v12:
>>> [...]
>>>
>>> Changelog v11:
>>> * Re-targeted the patch against locale/C-translit.h.in as the proper
>>> file for the ASCII translit table.
>>> * Correspondingly, the patch now contains only the additional
>>> Cyrillic-ASCII strings in the format of the locale/C-translit.h.in table.
>>> The 'include "translit_cyrillic";""' directives are not necessary in the
>>> locale files and they are now all left intact.
>>> * Also, the file translit_cyrillic is no longer needed and is omitted.
>>> * Edited the email below and the commit message.
>>> [...]
>>
>> I have tested this and, unfortunately, now this transliteration
>> works *only* in C locale, that is, only when no locale is set or when
>> it is explicitly set to C (C.UTF8, POSIX). It does not work when locale
>> is set to anything different, including en_US, ru_RU, etc.
>
> Good catch! Should we maybe split this into two patches, one for C and
> the other for "country" locales? They involve different code and
> functionality, so it looks like it would be easier to keep each one focused.
That would probably make sense: the standard C/POSIX locale won't
support System A, so splitting also narrows down the solution
alternatives for it. (If the C.UTF-8 locale (see
https://sourceware.org/bugzilla/show_bug.cgi?id=17318) materializes one
day, I'm not sure whether transliteration would be applicable in that context.)
> My understanding is that locale/C-translit.h.in is still the proper
> place for the sole ASCII translit table. It is also the only solution
> for many use cases where no locale is available (not compiled or
> not set).
Correct. As Siddhesh mentioned, those rules will end up in the built-in
C/POSIX locale, which is ASCII and will be used if no other locale is
available or properly set. The translit_* files won't affect it.
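The locale dependence Rafal observed can be checked with a small program. The following is a minimal sketch, assuming glibc's iconv(3) with the "//TRANSLIT" suffix, which picks its transliteration rules from the LC_CTYPE category of the current locale; the helper name translit_in_locale is hypothetical and not part of glibc. Calling it once with "C" and once with an installed locale such as "ru_RU.UTF-8" shows whether the built-in C-translit.h.in table or the locale's own translit_* rules are in effect.

```c
#include <iconv.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: switch to the named locale, then transliterate a
   UTF-8 string to ASCII with glibc's "//TRANSLIT" suffix. The rules used
   come from the LC_CTYPE category of that locale. Returns 0 on success,
   -1 if the locale is not installed or the conversion fails. */
static int translit_in_locale(const char *locale_name, const char *in,
                              char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    out[0] = '\0';

    /* setlocale() returns NULL if the locale has not been compiled. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;      /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;  /* reserve one byte for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    *outbuf = '\0';
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : 0;
}
```

For example, translit_in_locale("C", "Привет", buf, sizeof buf) uses only the built-in C rules, where characters without a rule typically come out as '?', while the same call with "ru_RU.UTF-8" uses whatever that locale's translit_start section includes.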
> "Country" locales in localedata/locales/ can then have the exact same
> translit table included or they can have any other flavor - I don't see
> a problem here.
Indeed, and since those files are not limited to ASCII, perhaps we could
now reconsider the v9 approach for them, i.e., prefer System A where
possible and otherwise use System B / ASCII (we just need to make sure
that their ASCII fallback matches the built-in C ASCII rules)?
Thanks,
--
Marko Myllynen