This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] en_CA, es_AR, es_ES: Define yesstr and nostr.


On Mon, Apr 08, 2013 at 01:23:59AM +0200, Petr Baudis wrote:
>   Hi!
> 
> On Mon, Apr 08, 2013 at 01:14:51AM +0200, Keld Simonsen wrote:
> > On Sun, Apr 07, 2013 at 11:02:06PM +0200, Petr Baudis wrote:
> > >   (Though I'm not particularly fond of having the ASCII contents of the
> > > datapoint sequence repeated in the comment, as all data duplication adds
> > > a potential for inconsistencies. Ideally, we would just actually write
> > > the characters right in the values instead of the codepoints; I didn't
> > > find any technical reason to insist on the <U...> syntax for all
> > > characters. But then again, I'm personally unlikely to gather the
> > > momentum to do such a change, mainly to verify that it really is 100%
> > > safe.)
> > 
> > The locales are character set independent, so they will run with utf-8, iso-8859-1, iso-8859-15
> > and even EBCDIC. They are written in ASCII only, to improve portability between systems with
> > different character sets.
> 
>   But it's 2013. I claim that portability of locale source files to
> EBCDIC is totally irrelevant in glibc and whoever cares should bear the
> burden of writing the conversion tools.

No, it is not. We are discussing EBCDIC in the Austin group.
Anyway, we still need to be character set independent. Then the EBCDIC support comes for free.
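
As a rough illustration of what "character set independent" means here, the sketch below resolves a value written in <Uxxxx> notation to abstract characters and only then encodes it for a target charset. It is not how localedef is implemented; the helper name symbols_to_text and the sample value are assumptions made for the example, and cp037 merely stands in for "some EBCDIC code page".

```python
import re

# Hypothetical helper (not part of glibc): turn <Uxxxx> symbols into characters.
_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4,8})>")

def symbols_to_text(value: str) -> str:
    """Replace each <Uxxxx> symbol with the character at that code point."""
    return _SYMBOL.sub(lambda m: chr(int(m.group(1), 16)), value)

# Sample value chosen for illustration; it spells "yes" in ASCII code points.
yes = symbols_to_text("<U0079><U0065><U0073>")

# The same source line yields different bytes depending on the target charset.
for charset in ("utf-8", "iso-8859-1", "iso-8859-15", "cp037", "utf-16-be"):
    print(f"{charset:12} {yes.encode(charset).hex()}")
```

The source file itself stays pure ASCII in every case; only the last step, the encoding, depends on the charset.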

>   I don't think it would be a big fuss if we just UTF8-encoded locale
> files, but even if we only embrace the ASCII (!) and substitute 7bit
> codepoint markups with the actual ASCII characters, that would be a
> huge practical step forward already.

We should not just do UTF-8, that would be a major mistake.
We have embedded systems, we have UTF-16, we have 8-bit systems, EBCDIC.

> 
>   The only thing is, I'm not 100% sure if there are any other tools
> looking at the locale source files that would break if we did this,
> and if it's a big deal to break these tools in case there are any.
> 
> > Originally I wrote many locales using some mnemonic scheme, that
> > made them easier to read, such as <A> for <U0041>, <B> for <U0042>, <b> for <U0062> etc,
> > but Ulrich Drepper did not like that and recoded all the locales to use the <Uxxxx> notation.
> > Some of the mnemonics were a bit complex, but IMHO they were far easier to
> > proofread than the <Uxxxx> notation, and some came directly from the POSIX standard.
> > They were documented in the POSIX.2 standard from 1992, and also in TR 14652.
> 
>   Indeed, I have seen some of these locale files I think. But if you
> mean <U0041>, why write even <A> if you can write A?

To be character set independent. 
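
To make the relation between the two notations concrete, here is a small sketch that converts between the mnemonic form and the <Uxxxx> form. Only the three mappings quoted above (<A>, <B>, <b>) come from this thread; extending the table to all ASCII letters is an assumption for the example, and the function names are hypothetical.

```python
import re
import string

# Assumption: every ASCII letter has a one-letter mnemonic, as in the
# <A> -> <U0041>, <B> -> <U0042>, <b> -> <U0062> examples quoted above.
MNEMONICS = {c: f"<U{ord(c):04X}>" for c in string.ascii_letters}

_MNEMONIC = re.compile(r"<([A-Za-z])>")
_UCS = re.compile(r"<U([0-9A-Fa-f]{4})>")

def mnemonic_to_ucs(value: str) -> str:
    """Rewrite one-letter mnemonics such as <A> as <U0041>."""
    return _MNEMONIC.sub(lambda m: MNEMONICS[m.group(1)], value)

def ucs_to_mnemonic(value: str) -> str:
    """Rewrite <Uxxxx> symbols as mnemonics where the table has one."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        # Leave symbols without a known mnemonic untouched.
        return f"<{ch}>" if ch in MNEMONICS else m.group(0)
    return _UCS.sub(repl, value)

print(mnemonic_to_ucs("<A><B><b>"))
print(ucs_to_mnemonic("<U0079><U0065><U0073>"))
```

Both forms remain symbolic names written in ASCII, which is the point of the exchange above: neither commits the source file to a particular encoding of the character itself.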

Best regards
keld

