This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Is it OK to write ASCII strings directly into locale source files?
* Carlos O'Donell:
> On 07/25/2017 02:20 AM, Mike FABIAN wrote:
>> Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> My only argument is that when you are forced to use <Uxxx> encoding it
>>> is empirically less likely you'll make a mistake. Like reading a sentence
>>> backwards to catch errors since it prevents your brain from filling in
>>> the missing information.
>>
>> But there are also many mistakes because somebody mistyped code points.
>> Several weird typos in things like month names look as if somebody
>> mistyped code points.
>
> Ultimately I defer to your judgement as localedata maintainer to create
> a workflow that is easy for you and benefits your work.
>
> However, I caution against throwing away the compatibility of our locales
> with POSIX, which doesn't seem to allow UTF-8 in the specification.

It does, to some extent:

| A character in the portable character set can be represented by the
| character itself, in which case the value of the character is
| implementation-defined. (Implementations may allow other characters
| to be represented as themselves, but such locale definitions are not
| portable.)

You'll need a very hostile interpretation to say that this doesn't
allow multi-byte character sequences in localedef input.
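
For reference, the two spellings under discussion differ only in how each character is written down. A minimal Python sketch of the mapping follows. The helper name to_ucs_notation is hypothetical, and the field width (4 hex digits, or 8 above U+FFFF) is my assumption about the usual convention in the locale sources, not something stated in the quoted POSIX text:

```python
def to_ucs_notation(text: str) -> str:
    """Spell every character of `text` as a <Uxxxx> symbolic name.

    Assumption: 4 uppercase hex digits for code points up to U+FFFF,
    8 hex digits above that.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        width = 4 if cp <= 0xFFFF else 8
        parts.append(f"<U{cp:0{width}X}>")
    return "".join(parts)


# "März" written with symbolic names instead of literal characters.
print(to_ucs_notation("M\u00e4rz"))  # <U004D><U00E4><U0072><U007A>
```

A single mistyped hex digit in such a string silently names a different character, which is the kind of typo Mike describes above.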

But I found this in the guts of localedef:

  /* The standards leave it up to the implementation to decide
     what to do with character which stand for themself.  We
     could jump through hoops to find out the value relative to
     the charmap and the repertoire map, but instead we leave
     it up to the locale definition author to write a better
     definition.  We assume here that every character which
     stands for itself is encoded using ISO 8859-1.  Using the
     escape character is allowed.  */

So we currently hard-code ISO 8859-1 (not UTF-8) to avoid the
bootstrapping problem.
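
To make the consequence concrete, here is a small Python sketch of what that assumption would do to UTF-8 input. It only models the rule stated in the comment (one byte of a literal character equals one ISO 8859-1 character). It is not localedef's actual code path, and read_as_latin1 is a hypothetical helper:

```python
def read_as_latin1(source_bytes: bytes) -> str:
    """Interpret each input byte as one ISO 8859-1 character."""
    return source_bytes.decode("iso-8859-1")


# A literal "März" saved in a UTF-8 encoded locale source file.
raw = "M\u00e4rz".encode("utf-8")  # b'M\xc3\xa4rz'

# Under the ISO 8859-1 assumption, the two bytes of U+00E4 turn into
# two separate characters.
seen = read_as_latin1(raw)
print([f"U+{ord(c):04X}" for c in seen])
# ['U+004D', 'U+00C3', 'U+00A4', 'U+0072', 'U+007A']
```

Pure ASCII strings are unaffected, since ASCII bytes mean the same thing in both encodings. Only characters outside ASCII would be misread.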