This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Is it OK to write ASCII strings directly into locale source files?


* Carlos O'Donell:

> On 07/25/2017 02:20 AM, Mike FABIAN wrote:
>> Carlos O'Donell <carlos@redhat.com> wrote:
>> 
>>> My only argument is that when you are forced to use <Uxxx> encoding it
>>> is empirically less likely you'll make a mistake. Like reading a sentence
>>> backwards to catch errors since it prevents your brain from filling in
>>> the missing information.
>> 
>> But there are also many mistakes because somebody mistyped code points.
>> Several weird typos in things like month names look as if somebody
>> mistyped code points.
>
> Ultimately I defer to your judgement as localedata maintainer to create
> a workflow that is easy for you and benefits your work.
>
> However, I caution against throwing away the compatibility of our locales
> with POSIX, which doesn't seem to allow UTF-8 in the specification.

It does, to some extent:

| A character in the portable character set can be represented by the
| character itself, in which case the value of the character is
| implementation-defined. (Implementations may allow other characters
| to be represented as themselves, but such locale definitions are not
| portable.)

You'll need a very hostile interpretation to say that this doesn't
allow multi-byte character sequences in localedef input.

But I found this in the guts of localedef:

	      /* The standards leave it up to the implementation to decide
		 what to do with character which stand for themself.  We
		 could jump through hoops to find out the value relative to
		 the charmap and the repertoire map, but instead we leave
		 it up to the locale definition author to write a better
		 definition.  We assume here that every character which
		 stands for itself is encoded using ISO 8859-1.  Using the
		 escape character is allowed.  */

So we currently hard-code ISO 8859-1 (not UTF-8) to avoid the
bootstrapping problem.
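
To illustrate what that assumption means for UTF-8 input, here is a minimal sketch. It is not the localedef code: `latin1_to_ucs4` is a hypothetical helper that only mimics the rule stated in the comment above, namely that every byte standing for itself is taken as one ISO 8859-1 character, so its code point equals its byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of localedef.  Interpret each input
   byte as one ISO 8859-1 character: the resulting code point is simply
   the byte value.  OUT must have room for LEN entries.  Returns the
   number of code points written, which is always LEN.  */
size_t
latin1_to_ucs4 (const unsigned char *in, size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < len; ++i)
    out[i] = in[i];
  return len;
}
```

If a locale source file contained a literal U+00E4 (a with diaeresis) saved as UTF-8, the two bytes 0xC3 0xA4 would come out of such a mapping as two separate characters, U+00C3 and U+00A4, rather than the single character the author intended. Characters in the ASCII range are unaffected, since their byte values are the same in both encodings.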

