This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.
Re: [PATCH] Use Unicode code points for country_isbn
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Keld Simonsen <keld at keldix dot com>
- Cc: Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, <libc-locales at sourceware dot org>
- Date: Wed, 22 Jul 2015 20:02:23 +0000
- Subject: Re: [PATCH] Use Unicode code points for country_isbn
On Wed, 22 Jul 2015, Keld Simonsen wrote:
> > On the build system on which glibc is built, we can always assume that the
> > glibc sources are the exact sequences of octets provided by the glibc
> > project, not converted into another character set and without any
> > conversions of line endings. Furthermore, on any system using glibc and
> > executing tools such as localedef with the installed locale source files,
> > it can be assumed that those source files are the files shipped with
> > glibc, not those files after conversion into another character set. Use
> > of glibc source files after conversion into another character set is
> > outside the scope of the glibc project - glibc is not expected to build
> > with such converted source files.
>
> Sounds strange. glibc is the library for the GNU C language. Standard

No, it's not. It's the C library for the GNU system. glibc has a range of
requirements, including ELF, TLS, an MMU, two's complement integers,
32-bit int, 32-bit or 64-bit long, 32-bit UTF-32 wchar_t, IEEE binary32
float, IEEE binary64 double, various GNU tools present on the build system
as documented in install.texi, ....
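
Several of the requirements listed above can be expressed as compile-time checks. The following is only an illustrative sketch, not anything taken from the glibc sources: the function name build_assumptions_hold is hypothetical, and the <float.h> tests are necessary conditions for IEEE binary32/binary64 rather than a proof of conformance. ELF, TLS and the MMU are not checked at all.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>
#include <wchar.h>

/* Integer assumptions: 32-bit int, 32-bit or 64-bit long, two's complement. */
static_assert(sizeof(int) * CHAR_BIT == 32, "int must be 32 bits");
static_assert(sizeof(long) * CHAR_BIT == 32 || sizeof(long) * CHAR_BIT == 64,
              "long must be 32 or 64 bits");
static_assert(INT_MIN == -INT_MAX - 1, "int must be two's complement");

/* wchar_t assumption: 32 bits wide, holding ISO 10646 code points. */
static_assert(sizeof(wchar_t) * CHAR_BIT == 32, "wchar_t must be 32 bits");
#ifndef __STDC_ISO_10646__
#error "wchar_t values are not documented as ISO 10646 code points"
#endif

/* Floating-point parameters consistent with IEEE binary32 and binary64. */
static_assert(FLT_RADIX == 2, "binary floating point expected");
static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128 && FLT_MIN_EXP == -125,
              "float does not look like binary32");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024 && DBL_MIN_EXP == -1021,
              "double does not look like binary64");

/* Hypothetical helper: if this translation unit compiled, the checks held. */
int build_assumptions_hold(void)
{
    return 1;
}
```

On a platform that fails one of these assumptions the file does not compile, which is the point: the checks document implementation choices rather than probe for them at run time.
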
> ISO C is coded character set independent, as is also POSIX. Why would
> the glibc project not follow ISO C and POSIX design goals? Why would

Because glibc makes particular implementation choices in areas that are
implementation-defined. It's an implementation, not a meta-implementation
that tries to cover the range of permitted implementation choices.
Meta-implementations (at least of the language part of ISO C) exist, but
they exist in the field of formal systems used to reason about C programs.

> glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux
> and UNIX systems?

It's about 15-20 years since glibc was usable as a replacement C library
for systems with an existing native non-free C library. Those systems are
not relevant to glibc nowadays (Apple and Microsoft systems fail the basic
requirement of using ELF, which is assumed all over glibc). UTF-16 is
supported in iconv (only), just like EBCDIC. Non-UTF-8 locales are
supported, but deprecated (new non-UTF-8 locales should not be added, and
any existing non-UTF-8 locales should have a UTF-8 counterpart), and to be
usable in a POSIX-compliant way must have a character set that includes
ASCII.
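
To illustrate UTF-16 being available through iconv rather than as a locale character set, here is a minimal sketch that converts a UTF-8 buffer to UTF-16LE with the standard iconv_open/iconv/iconv_close interface. The function name utf8_to_utf16le is hypothetical, and the choice of the "UTF-16LE" encoding name (to avoid a byte order mark in the output) is an assumption made for this example.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of UTF-8 to UTF-16LE.
   Returns the number of bytes written to out, or (size_t)-1 on error
   (unsupported conversion, invalid input, or output buffer too small). */
size_t utf8_to_utf16le(const char *in, size_t in_len,
                       char *out, size_t out_size)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes non-const pointers but does not modify the input. */
    char *src = (char *)in;
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - dst_left;
}
```

The conversion is a property of the data passing through iconv; nothing here requires the process to run in a UTF-16 locale.
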
Given sufficiently many GNU tools built on a non-GNU build system, it
should be possible to cross-compile glibc there - but localedef itself is
only ever linked against glibc and run on a system using glibc (the
cross-localedef functionality checked in to glibc is limited to allowing
one glibc system to generate locales for another system with the same
glibc version but a different endianness).

> > Now, it's true that the installed localedef utility should be usable in
> > locale A to generate locale B, for any pair (A, B) of installed locales -
> > rather than only being able to generate locales as part of the glibc build
> > / install process. If localedef interprets locale sources in the
> > character set of the locale in which it runs, that may mean the installed
> > locale sources do need to be in ASCII. How does localedef determine the
> > character set in which to interpret the textual locale source files?
>
> Yes, that is why we use UCS symbolic code points.

"Yes" does not answer my question about how localedef determines the
character set of its input.

> I would then rather, to be fully consistent, use UCS symbolic code points
> all the way through a locale source; it is a bit more cumbersome, but I
> would rather be consistent.

I'd rather have some extension to allow a locale source file to declare
that it is in UTF-8, and then use UTF-8 throughout except for control
characters or combining characters used in isolation.
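
For readers unfamiliar with the notation under discussion, the sketch below renders code points in the <UXXXX> symbolic form used in locale sources, which stays pure ASCII whatever the text is. The function names ucs_symbol and ucs_symbols are hypothetical, and the use of eight hex digits for code points above U+FFFF is an assumption based on the form seen in glibc's own files; this is not code from localedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write the symbolic name of one code point, e.g. U+00E9 -> "<U00E9>".
   Returns the number of characters written (excluding the terminator),
   or -1 if the code point is out of range or the buffer is too small. */
int ucs_symbol(uint32_t cp, char *buf, size_t size)
{
    assert(buf != NULL);
    if (cp > 0x10FFFF)
        return -1;

    int n;
    if (cp <= 0xFFFF)
        n = snprintf(buf, size, "<U%04X>", (unsigned int)cp);
    else
        n = snprintf(buf, size, "<U%08X>", (unsigned int)cp);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Write the symbolic names of count code points back to back.
   Returns the total length, or -1 on error. */
int ucs_symbols(const uint32_t *cps, size_t count, char *out, size_t size)
{
    assert(cps != NULL && out != NULL);
    if (size == 0)
        return -1;

    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = ucs_symbol(cps[i], out + used, size - used);
        if (n < 0)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

A three-letter word costs 21 bytes in this form against 3 in UTF-8, which is the "more cumbersome" side of the trade-off; the benefit is that the file reads the same in any ASCII-compatible character set.
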
--
Joseph S. Myers
joseph@codesourcery.com