This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Is it OK to write ASCII strings directly into locale source files?
On 07/25/2017 02:20 AM, Mike FABIAN wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>
>> My only argument is that when you are forced to use <Uxxx> encoding it
>> is empirically less likely you'll make a mistake. Like reading a sentence
>> backwards to catch errors since it prevents your brain from filling in
>> the missing information.
>
> But there are also many mistakes because somebody mistyped code points.
> Several weird typos in things like month names look as if somebody
> mistyped code points.
Ultimately I defer to your judgement as localedata maintainer to create
a workflow that is easy for you and benefits your work.
However, I caution against giving up our locales' compatibility with
POSIX, which doesn't seem to allow raw UTF-8 in locale definition files.
I would suggest the following:
(a) Documentation:
File an Austin Group bug to adjust the text of the standard to allow
what we want, effectively documenting the de facto glibc standard of
using UTF-8.
(b) New process:
Post-process the locale source before commit, and enforce that there
is an auto-generated comment containing either the UTF-8 text or the
code points, for the author to review before commit. If we wrote UTF-8
in a special markup comment and auto-generated the locale entry with
code points, then we would remain mostly compatible with POSIX and with
what we have today (less churn for user tools).
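To make proposal (b) more concrete, here is a minimal Python sketch of such a post-processing step. It is not an existing glibc tool: the function names and the review-comment format `% keyword: "value";...` are hypothetical, and it assumes the locale file declares `comment_char %`, as many glibc locale sources do. `to_code_points` produces the `<Uxxxx>` form for the committed entry (4 hex digits in the BMP, 8 above it). `from_code_points` decodes that form back into readable text for review and rejects malformed names such as a three-digit `<U061>`, which is the kind of mistyped code point mentioned above.

```python
import re

# Matches one glibc-style symbolic character name, e.g. <U0041> or <U0001F600>.
# Assumption: 4 hex digits for BMP characters, 8 hex digits otherwise.
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})>")


def to_code_points(text: str) -> str:
    """Encode every character of text as a <Uxxxx> symbolic name."""
    return "".join(
        f"<U{ord(ch):04X}>" if ord(ch) <= 0xFFFF else f"<U{ord(ch):08X}>"
        for ch in text
    )


def from_code_points(encoded: str) -> str:
    """Decode a run of <Uxxxx> names; raise ValueError on anything malformed."""
    chars = []
    pos = 0
    while pos < len(encoded):
        m = UCS_NAME.match(encoded, pos)
        if m is None:
            raise ValueError(
                f"malformed code point at offset {pos}: {encoded[pos:pos + 12]!r}"
            )
        chars.append(chr(int(m.group(1), 16)))
        pos = m.end()
    return "".join(chars)


def generate_entry(keyword: str, values: list[str]) -> str:
    """Emit a hypothetical review comment (UTF-8) plus the code-point entry.

    Real locale files wrap long lists with the escape character; this
    sketch always emits a single line per entry.
    """
    comment = f"% {keyword}: " + ";".join(f'"{v}"' for v in values)
    entry = f"{keyword} " + ";".join(f'"{to_code_points(v)}"' for v in values)
    return comment + "\n" + entry


if __name__ == "__main__":
    print(generate_entry("abmon", ["Jan", "Feb"]))
    # % abmon: "Jan";"Feb"
    # abmon "<U004A><U0061><U006E>";"<U0046><U0065><U0062>"
```

A pre-commit check built on this would regenerate the comment from the committed code points (or the code points from the UTF-8 markup) and fail if the two disagree, so the author always reviews a human-readable form.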
--
Cheers,
Carlos.