This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Use Unicode code points for country_isbn

From: Joseph Myers <joseph at codesourcery dot com>
To: Keld Simonsen <keld at keldix dot com>
Cc: Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, <libc-locales at sourceware dot org>
Date: Wed, 22 Jul 2015 20:02:23 +0000
Subject: Re: [PATCH] Use Unicode code points for country_isbn
Authentication-results: sourceware.org; auth=none
References: <5571B8C2 dot 8000108 at redhat dot com> <20150609071130 dot GA26925 at domone> <5576BC13 dot 5020001 at redhat dot com> <20150721081840 dot GE12267 at vapier> <20150721084006 dot GB29742 at www5 dot open-std dot org> <20150721092217 dot GG12267 at vapier> <20150721115852 dot GA24115 at rap dot rap dot dk> <alpine dot DEB dot 2 dot 10 dot 1507221719420 dot 21570 at digraph dot polyomino dot org dot uk> <20150722190228 dot GA18489 at www5 dot open-std dot org>

On Wed, 22 Jul 2015, Keld Simonsen wrote:

> > On the build system on which glibc is built, we can always assume that the 
> > glibc sources are the exact sequences of octets provided by the glibc 
> > project, not converted into another character set and without any 
> > conversions of line endings.  Furthermore, on any system using glibc and 
> > executing tools such as localedef with the installed locale source files, 
> > it can be assumed that those source files are the files shipped with 
> > glibc, not those files after conversion into another character set.  Use 
> > of glibc source files after conversion into another character set is 
> > outside the scope of the glibc project - glibc is not expected to build 
> > with such converted source files.
> 
> Sounds strange. glibc is the library for the GNU C language. Standard 

No, it's not.  It's the C library for the GNU system.  glibc has a range of 
requirements, including ELF, TLS, an MMU, two's complement integers, 
32-bit int, 32-bit or 64-bit long, 32-bit UTF-32 wchar_t, IEEE binary32 
float, IEEE binary64 double, various GNU tools present on the build system 
as documented in install.texi, ....
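
As a rough illustration only (this is not code from glibc), the type-related 
items in that list can be written down as C11 compile-time checks.  The 
function name and the assertion messages below are made up for the sketch, 
and requirements such as ELF, TLS or an MMU cannot be checked this way.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>
#include <wchar.h>

/* Each check mirrors one type-related item from the list above. */
static_assert((-1 & 3) == 3, "two's complement integers");
static_assert(sizeof(int) * CHAR_BIT == 32, "32-bit int");
static_assert(sizeof(long) * CHAR_BIT == 32 || sizeof(long) * CHAR_BIT == 64,
              "32-bit or 64-bit long");
static_assert(sizeof(wchar_t) * CHAR_BIT == 32, "32-bit wchar_t");
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float has the IEEE binary32 precision and exponent range");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double has the IEEE binary64 precision and exponent range");

/* Hypothetical helper: returns 1 if the implementation declares, via the
   standard macro __STDC_ISO_10646__, that wchar_t values are ISO 10646
   (Unicode) code points, and 0 otherwise. */
int wchar_t_holds_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A translation unit like this simply fails to compile on a system that makes 
different implementation-defined choices.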

> ISO C is coded character set independent, as is also POSIX. Why would 
> the glibc project not follow ISO C and POSIX design goals? Why would 

Because glibc makes particular implementation choices in areas that are 
implementation-defined.  It's an implementation, not a meta-implementation 
that tries to cover the range of permitted implementation choices.  
Meta-implementations (at least of the language part of ISO C) exist, but 
they exist in the field of formal systems used to reason about C programs.

> glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux 
> and UNIX systems?

It's about 15-20 years since glibc was usable as a replacement C library 
for systems with an existing native non-free C library.  Those systems are 
not relevant to glibc nowadays (Apple and Microsoft systems fail the basic 
requirement of using ELF, which is assumed all over glibc).  UTF-16 is 
supported in iconv (only), just like EBCDIC.  Non-UTF-8 locales are 
supported, but deprecated (new non-UTF-8 locales should not be added, and 
any existing non-UTF-8 locales should have a UTF-8 counterpart), and to be 
usable in a POSIX-compliant way must have a character set that includes 
ASCII.

Given sufficiently many GNU tools built on a non-GNU build system, it 
should be possible to cross-compile glibc there - but localedef itself is 
only ever linked against glibc and run on a system using glibc (the 
cross-localedef functionality checked in to glibc is limited to allowing 
one glibc system to generate locales for another system with the same 
glibc version but a different endianness).

> > Now, it's true that the installed localedef utility should be usable in 
> > locale A to generate locale B, for any pair (A, B) of installed locales - 
> > rather than only being able to generate locales as part of the glibc build 
> > / install process.  If localedef interprets locale sources in the 
> > character set of the locale in which it runs, that may mean the installed 
> > locale sources do need to be in ASCII.  How does localedef determine the 
> > character set in which to interpret the textual locale source files?
> 
> Yes, that is why we use UCS symbolic code points. I would then rather to be

"Yes" does not answer my question about how localedef determines the 
character set of its input.

> fully consistent use UCS symbolic code points all the way thru a locale 
> source, it is a bit more cumbersome, but I would rather be consistent. 

I'd rather have some extension to allow a locale source file to declare 
that it is in UTF-8, and then use UTF-8 throughout except for control 
characters or combining characters used in isolation.
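
To make the two notations concrete, here is a minimal sketch (not part of 
localedef; the function names are hypothetical) that rewrites a UTF-8 string 
into the symbolic form, leaving ASCII bytes alone.  It assumes the <UXXXX> 
spelling with four hex digits in the BMP and eight digits above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s.  Stores the code point in *cp and returns
   the number of bytes consumed, or 0 if the sequence is malformed. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    unsigned long min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; i++) {
        /* A terminating NUL is not a continuation byte, so this check also
           stops us from reading past the end of a truncated string. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Write the symbolic form of the UTF-8 string `in` into `out`.
   Returns 0 on success, -1 on malformed input or if `out` is too small. */
int utf8_to_ucs_symbols(const char *in, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    /* The output is never shorter than the input, so fail early. */
    if (outsz == 0 || strlen(in) >= outsz) return -1;
    out[0] = '\0';

    while (*s != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        int w;

        if (n == 0) return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsz - used, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            w = snprintf(out + used, outsz - used, "<U%04lX>", cp);
        else
            w = snprintf(out + used, outsz - used, "<U%08lX>", cp);
        if (w < 0 || (size_t)w >= outsz - used) return -1;
        used += (size_t)w;
        s += n;
    }
    return 0;
}
```

For example, the UTF-8 bytes for "K", U+00F8, "benhavn" come out as 
K<U00F8>benhavn.  Under the extension suggested above, a source file would 
carry the UTF-8 form directly and keep the symbolic form only for control 
characters and isolated combining characters.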

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Ondřej Bílka
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Keld Simonsen

References:
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Mike Frysinger
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: keld
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Mike Frysinger
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Keld Simonsen
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Joseph Myers
- Re: [PATCH] Use Unicode code points for country_isbn
  - From: Keld Simonsen
