This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Use Unicode code points for country_isbn

From: Keld Simonsen <keld at keldix dot com>
To: Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, libc-locales at sourceware dot org
Date: Tue, 21 Jul 2015 13:58:52 +0200
Subject: Re: [PATCH] Use Unicode code points for country_isbn
Authentication-results: sourceware.org; auth=none
References: <5571B8C2 dot 8000108 at redhat dot com> <20150609071130 dot GA26925 at domone> <5576BC13 dot 5020001 at redhat dot com> <20150721081840 dot GE12267 at vapier> <20150721084006 dot GB29742 at www5 dot open-std dot org> <20150721092217 dot GG12267 at vapier>

On Tue, Jul 21, 2015 at 05:22:17AM -0400, Mike Frysinger wrote:
> On 21 Jul 2015 10:40, keld@keldix.com wrote:
> > On Tue, Jul 21, 2015 at 04:18:40AM -0400, Mike Frysinger wrote:
> > > On 09 Jun 2015 13:12, Marko Myllynen wrote:
> > > > On 2015-06-09 10:11, Ondřej Bílka wrote:
> > > > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote:
> > > > >> make country_isbn definitions consistent across locales by using
> > > > >> Unicode code points not numerals everywhere. The code in
> > > > >> locale/categories.def and locale/programs/ld-address.c already
> > > > >> handles strings.
> > > > >>
> > > > >> Please apply.
> > > > >
> > > > > Possible but why, when these are numbers which are easier to read than
> > > > > strings?
> > > > 
> > > > that's true, and I don't feel too strongly about this, but currently
> > > > some locales are using numbers and some are using Unicode code points so
> > > > there's a bit of inconsistency, also it's not that hard to read these
> > > > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last
> > > > digit matters.
> > > 
> > > i find many of the U markers pointlessly obscure, especially when they're used
> > > for characters that are in the ASCII standard.  if we're standardizing on UTF8
> > > encodings in general, why can't we convert these files as well ?  keep in mind
> > > that i'm ignorant of the tooling around these files ;).
> > 
> > The use of Unicode code points helps make the locales portable, e.g.
> > when cross-compiling for different architectures, including embedded systems, EBCDIC
> > systems, UTF-16 systems and UTF-8 systems, when you are on a different host platform.
> 
> i'm referring to the tools we use -- either inside of the source repo
> (i.e. ones we wrote/maintain), or external ones that operate on our
> files directly (i.e. gcc).  what actual problems do you see here ?
> vague references like "cross-compiling is magic" aren't really that
> interesting.

It would mean that you cannot use the locale sources for cross-compiling when
the host and target machines use different character sets,
e.g. if you are building embedded systems on iOS or Windows or other UTF-16 machines
for a UTF-8 target, or building for Android. Or the other way round, if you are
on a UTF-8 host and generate locales for a UTF-16 target such as a UTF-16 embedded
system or an iPhone or iPad.
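
To make the host/target concern concrete, here is a minimal sketch (not part of the patch or of any glibc tooling) showing how the same two characters "12" come out as different byte sequences under three encodings. The helper name hex_bytes is hypothetical, and Python's cp037 codec is used only as one example of an EBCDIC code page.

```python
# Illustrative only: the literal text "12" as raw bytes in three encodings.
# hex_bytes is a hypothetical helper, not something from glibc.

def hex_bytes(text, encoding):
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")

if __name__ == "__main__":
    # cp037 is one EBCDIC code page; others exist and differ in detail.
    for enc in ("utf-8", "utf-16-le", "cp037"):
        print(f"{enc:10} {hex_bytes('12', enc)}")
```

A locale source that holds literal digits therefore depends on the encoding of the file it lives in, whereas a symbolic name does not.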

I suggest you use the POSIX character names instead, e.g. 12 becomes "<1><2>".
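
For comparison, the notation used in the patch under discussion maps each character to its Unicode code point, so 12 becomes "<U0031><U0032>". A rough sketch of that mapping and its inverse follows; the function names to_ucs_names and from_ucs_names are hypothetical and are not part of localedef or any glibc script. The sketch assumes four hex digits for code points up to U+FFFF and eight above that.

```python
import re

# Hypothetical helpers, not glibc tools: convert between literal text and
# the <Uxxxx> symbolic notation seen in locale source files.

def to_ucs_names(text):
    """Map each character to <UXXXX> (or <UXXXXXXXX> above U+FFFF)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        width = 4 if cp <= 0xFFFF else 8  # assumed digit widths
        parts.append(f"<U{cp:0{width}X}>")
    return "".join(parts)

def from_ucs_names(symbolic):
    """Inverse mapping: turn a run of <Uxxxx> names back into text."""
    codes = re.findall(r"<U([0-9A-Fa-f]{4,8})>", symbolic)
    return "".join(chr(int(code, 16)) for code in codes)

if __name__ == "__main__":
    print(to_ucs_names("12"))                # <U0031><U0032>
    print(from_ucs_names("<U0031><U0032>"))  # 12
```

As noted earlier in the thread, for ASCII digits only the last hex digit of each name differs from 003x, which is why such strings stay readable once the pattern is known.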

> keep in mind we already use (and agreed to standardize on) UTF8 in
> things like *.c and *.h and ChangeLog and READMEs and info pages.

That is not related. Of course we have our sources in a specific encoding,
and when sources are moved between platforms (aka portability) the
source text may be converted from one representation to another,
which happens e.g. when you move our sources to an iOS or Windows platform.

Best regards
Keld
