This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Use Unicode code points for country_isbn
- From: keld at keldix dot com
- To: Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, libc-locales at sourceware dot org
- Date: Tue, 21 Jul 2015 10:40:06 +0200
- Subject: Re: [PATCH] Use Unicode code points for country_isbn
- References: <5571B8C2 dot 8000108 at redhat dot com> <20150609071130 dot GA26925 at domone> <5576BC13 dot 5020001 at redhat dot com> <20150721081840 dot GE12267 at vapier>
On Tue, Jul 21, 2015 at 04:18:40AM -0400, Mike Frysinger wrote:
> On 09 Jun 2015 13:12, Marko Myllynen wrote:
> > On 2015-06-09 10:11, Ondřej Bílka wrote:
> > > On Fri, Jun 05, 2015 at 05:57:06PM +0300, Marko Myllynen wrote:
> > >> make country_isbn definitions consistent across locales by using
> > >> Unicode code points not numerals everywhere. The code in
> > >> locale/categories.def and locale/programs/ld-address.c already
> > >> handles strings.
> > >>
> > >> Please apply.
> > >
> > > Possible but why, when these are numbers which are easier to read than
> > > strings?
> >
> > that's true, and I don't feel too strongly about this, but currently
> > some locales are using numbers and some are using Unicode code points so
> > there's a bit of inconsistency, also it's not that hard to read these
> > once one sees that e.g. 12 becomes "<U0031><U0032>" i.e. only the last
> > digit matters.
>
> i find many of the U markers pointlessly obscure, especially when they're used
> for characters that are in the ASCII standard. if we're standardizing on UTF8
> encodings in general, why can't we convert these files as well ? keep in mind
> that i'm ignorant of the tooling around these files ;).
Using Unicode code points helps make the locales portable, e.g. when
cross-compiling on one host platform for a different target, including
embedded systems and EBCDIC, UTF-16, and UTF-8 systems.
For the ASCII characters, one could instead use the symbolic character names
from the POSIX locale (e.g. <one><two> rather than <U0031><U0032>). They are
much more readable than the Unicode code points, IMHO.
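To make the two notations concrete, here is a small sketch in Python (hypothetical helpers, not part of glibc or localedef). It renders a numeric country_isbn value such as 12 both as Unicode code point symbols and as POSIX portable character set names. It does not check whether a given charmap used by localedef accepts the POSIX names inside string values.

```python
# Hypothetical helpers (not glibc code): render a numeric country_isbn value
# in the two string notations discussed in this thread.

# POSIX portable character set symbolic names for the ASCII digits 0-9.
POSIX_DIGIT_NAMES = ["zero", "one", "two", "three", "four",
                     "five", "six", "seven", "eight", "nine"]


def to_ucs_symbols(value: int) -> str:
    """Encode each decimal digit as a <UXXXX> code point symbol."""
    if value < 0:
        raise ValueError("country_isbn is assumed to be non-negative")
    return "".join(f"<U{ord(ch):04X}>" for ch in str(value))


def to_posix_names(value: int) -> str:
    """Encode each decimal digit with its POSIX symbolic character name."""
    if value < 0:
        raise ValueError("country_isbn is assumed to be non-negative")
    return "".join(f"<{POSIX_DIGIT_NAMES[int(ch)]}>" for ch in str(value))


if __name__ == "__main__":
    print(f'country_isbn "{to_ucs_symbols(12)}"')  # country_isbn "<U0031><U0032>"
    print(f'country_isbn "{to_posix_names(12)}"')  # country_isbn "<one><two>"
```

For ASCII digits, only the last hex digit of each code point varies (U0030 to U0039). That is why the <UXXXX> form stays readable once you know the pattern, while the POSIX names need no decoding at all.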
Best regards
Keld