This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30
- From: Marko Myllynen <myllynen at redhat dot com>
- To: Egor Kobylkin <egor at kobylkin dot com>, libc-alpha at sourceware dot org, libc-locales at sourceware dot org, Carlos O'Donell <carlos at redhat dot com>
- Cc: Rafal Luzynski <digitalfreak at lingonborough dot com>, Siddhesh Poyarekar <siddhesh at gotplt dot org>, Mike Fabian <mfabian at redhat dot com>
- Date: Thu, 14 Feb 2019 18:48:51 +0200
- Subject: Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30
- References: <firstname.lastname@example.org> <20180412224352.GB2911@altlinux.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com>
- Reply-to: Marko Myllynen <myllynen at redhat dot com>
Hi Carlos, Mike, Rafal,
It seems clear that you are all currently too busy to have a look at
this, but would you have any estimate of when you might be able to
review this so that we could consider merging?
FWIW, I chatted with Egor off-list and we're on the same page wrt the
following; hopefully this gives you a bit of a jump start on this
subject when you have time to dig deeper:
1) The built-in C locale doesn't read/use any translit_* files, can't
have any fallback mechanisms, and only supports ASCII. Using GOST 7.79
System B in locale/C-translit.h.in (as per patch v12) would therefore
seem to be the appropriate way to implement Cyrillic transliteration for
the built-in C locale (it adds some 8KB to the binary).
2) Other locales read/use translit_* files, and with them fallbacks and
non-ASCII output are possible, so it would seem preferable to first try
ISO 9 / GOST 7.79 System A and to use GOST 7.79 System B only if that
fails (in which case the end result should match the built-in C locale).
For this the translit_cyrillic file should be added (as per patch v9 +
changes mentioned in patches v10 and v12).
3) Individual locale files can then be updated to use translit_cyrillic
as appropriate (see patch v9), and language- or nation-specific
conventions (e.g., SFS 4900 for fi_FI) can be applied on a per-locale
basis.
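
To illustrate the ordering described in 1) and 2), here is a minimal
Python sketch (not glibc code). It assumes a per-character fallback:
keep the character if the target charset can represent it, otherwise try
the System A form, and only then the System B (ASCII) form. The tables
are small illustrative subsets, lowercase only, and are not taken from
the patch; the names SYSTEM_A, SYSTEM_B, COMMON, encodable and translit
are hypothetical.

```python
# Illustrative subsets only; NOT the tables from the patch.
# ISO 9 / GOST 7.79 System A uses diacritics; System B is pure ASCII.
SYSTEM_A = {"ж": "ž", "ч": "č", "ш": "š", "щ": "ŝ", "ю": "û", "я": "â"}
SYSTEM_B = {"ж": "zh", "ч": "ch", "ш": "sh", "щ": "shh", "ю": "yu", "я": "ya"}
# Letters whose romanization is the same in both systems (subset).
COMMON = {"а": "a", "к": "k", "м": "m", "о": "o", "т": "t", "у": "u"}


def encodable(text, charset):
    """Return True if text can be represented in the target charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, charset):
    """Transliterate only the characters the target charset lacks."""
    out = []
    for ch in text:
        if encodable(ch, charset):
            # Representable as-is: no transliteration needed.
            out.append(ch)
            continue
        shared = COMMON.get(ch)
        # Ordered alternatives: System A first, System B as the fallback.
        candidates = (SYSTEM_A.get(ch, shared), SYSTEM_B.get(ch, shared))
        chosen = next(
            (c for c in candidates if c and encodable(c, charset)), "?"
        )
        out.append(chosen)
    return "".join(out)
```

With these tables, translit("шум", "ascii") returns "shum" (the System B
result, which is all an ASCII-only target can hold), while
translit("шум", "iso-8859-2") returns "šum" because that charset can
represent the System A form.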
On 04/02/2019 09.14, Egor Kobylkin wrote:
> are you comfortable to pick this up again this month?
> I would really love to have a reliable action plan to get this committed
> for 2.30. Maybe cut out a subset that is undisputed and commit only that
> first. It looks kinda like an eternal moving target otherwise.
> for your reference:
> Egor Kobylkin
> On 09.01.19 21:03, Marko Myllynen wrote:
>> On 09/01/2019 02.46, Egor Kobylkin wrote:
>>> On 07.01.19 21:37, Marko Myllynen wrote:
>>>> On 05/01/2019 23.12, Egor Kobylkin wrote:
>>>>> Good catch! Should we maybe split this into two patches, one for C and
>>>>> the other for "country" locales? They have different codes and
>>>>> functionality so it looks like it would be easier to keep focus.
>>>> That would probably make sense, the standard C/POSIX locale won't
>>>> support System A so it also narrows down solution alternatives with it.
>>>>> "Country" locales in localedata/locales/ can then have the exact same
>>>>> translit table included or they can have any other flavor - I don't
>>>>> see a problem here.
>>>> Indeed, and since those files are not limited to ASCII, perhaps we
>>>> could now reconsider the v9 approach for them, i.e., prefer System A if
>>>> possible, otherwise use System B / ASCII (just need to make sure that
>>>> the ASCII fall-back for them will match the built-in C ASCII rule)?
>>> Happy to hear the split seems to be a clear cut one.
>>> How about I rename the "[PATCH v12]...[BZ #2872]" to "[PATCH v1]...
>>> C/POSIX [BZ #2872]" and the "[PATCH v9]" gets its own bug-report
>>> (number) and title for clarity in communication?
>> I'm not sure a new BZ is really needed for such an addition, perhaps a
>> NEWS entry might be more appropriate (with the full details explained in
>> the commit messages of course) but I'll leave this to others to decide.
>>> This way it would probably be easier to have the decision making process
>>> tied up for both patches (separately). We may want to get the v12 POSIX
>>> out of the door in 2.30 then and can take all the time we need to set up
>>> the rules for "Countries" locales as you need them to be.
>> Perhaps Rafal or Carlos have better suggestions but I would think we
>> could have a patch series where the patch 1/3 adds the C/POSIX locale
>> part (that would be what you posted as v12), then patch 2/3 adds
>> translit_cyrillic (based on your v9 so supports ISO 9:1995 / GOST 7.79
>> System A and GOST 7.79 System B as a fall-back (which would match the
>> C/POSIX rules)), and finally the patch 3/3 updates locales to use
>> translit_cyrillic as appropriate. But as said, Rafal or Carlos may have
>> alternative suggestions so it might be best to wait for their feedback
>> before doing anything yet (it's unfortunate you've had to do so many
>> iterations around this already but I think we've all learned something
>> during the process and the end result will be more correct than any of
>> the earlier versions).