[PATCH] Use Unicode code points for country_isbn
Joseph Myers
joseph@codesourcery.com
Wed Jul 22 20:04:00 GMT 2015
On Wed, 22 Jul 2015, Keld Simonsen wrote:
> > On the build system on which glibc is built, we can always assume that the
> > glibc sources are the exact sequences of octets provided by the glibc
> > project, not converted into another character set and without any
> > conversions of line endings. Furthermore, on any system using glibc and
> > executing tools such as localedef with the installed locale source files,
> > it can be assumed that those source files are the files shipped with
> > glibc, not those files after conversion into another character set. Use
> > of glibc source files after conversion into another character set is
> > outside the scope of the glibc project - glibc is not expected to build
> > with such converted source files.
>
> Sounds strange. glibc is the library for the GNU C language. Standard
No it's not. It's the C library for the GNU system. glibc has a range of
requirements, including ELF, TLS, an MMU, two's complement integers,
32-bit int, 32-bit or 64-bit long, 32-bit UTF-32 wchar_t, IEEE binary32
float, IEEE binary64 double, various GNU tools present on the build system
as documented in install.texi, and so on.
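
To make a few of those requirements concrete, here is a minimal sketch of
how they could be written as compile-time checks. This is illustrative
only and is not taken from the glibc sources; the choice of checks and the
messages are my own assumptions, and ELF, TLS and the MMU cannot be tested
this way at all.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>
#include <wchar.h>

/* Hypothetical checks, not glibc code: each one fails the build if the
   corresponding assumption does not hold on the target. */
static_assert(sizeof(int) * CHAR_BIT == 32, "int must be 32 bits");
static_assert(sizeof(long) * CHAR_BIT == 32 || sizeof(long) * CHAR_BIT == 64,
              "long must be 32 or 64 bits");
static_assert(sizeof(wchar_t) * CHAR_BIT == 32,
              "wchar_t must be 32 bits (UTF-32)");
/* Consistent with two's complement: one more negative value than positive. */
static_assert(INT_MIN == -INT_MAX - 1, "int must be two's complement");
/* Mantissa and exponent ranges matching IEEE binary32 and binary64. */
static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float must be IEEE binary32");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double must be IEEE binary64");
```

A translation unit containing these declarations simply fails to compile
on a target that does not meet the assumptions.
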
> ISO C is coded character set independent, as is also POSIX. Why would
> the glibc project not follow ISO C and POSIX design goals? Why would
Because glibc makes particular implementation choices in areas that are
implementation-defined. It's an implementation, not a meta-implementation
that tries to cover the range of permitted implementation choices.
Meta-implementations (at least of the language part of ISO C) exist, but
they exist in the field of formal systems used to reason about C programs.
> glibc exclude itself from Apple and Microsoft (utf16) and non-utf8 Linux
> and UNIX systems?
It's about 15-20 years since glibc was usable as a replacement C library
for systems with an existing native non-free C library. Those systems are
not relevant to glibc nowadays (Apple and Microsoft systems fail the basic
requirement of using ELF, which is assumed throughout glibc). UTF-16 is
supported only in iconv, just like EBCDIC. Non-UTF-8 locales are
supported, but deprecated (new non-UTF-8 locales should not be added, and
any existing non-UTF-8 locales should have a UTF-8 counterpart), and to be
usable in a POSIX-compliant way must have a character set that includes
ASCII.
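
As an illustration of UTF-16 being available through iconv even though it
is not usable as a locale character set, here is a small sketch using the
standard iconv interface. The helper name utf8_to_utf16le is hypothetical,
and the sketch assumes the installed iconv knows the names "UTF-8" and
"UTF-16LE" (glibc's does).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UTF-16LE.
   Returns the number of bytes written to out, or (size_t) -1 on error
   (unknown charset, malformed input, or output buffer too small). */
size_t utf8_to_utf16le(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");   /* to, from */
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    char *inp = (char *) in;        /* iconv takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t) -1)
        return (size_t) -1;
    return outsize - outleft;       /* bytes actually produced */
}
```

Naming the byte order explicitly ("UTF-16LE") avoids a byte order mark in
the output, which keeps the result easy to inspect.
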
Given sufficiently many GNU tools built on a non-GNU build system, it
should be possible to cross-compile glibc there - but localedef itself is
only ever linked against glibc and run on a system using glibc (the
cross-localedef functionality checked in to glibc is limited to allowing
one glibc system to generate locales for another system with the same
glibc version but a different endianness).
> > Now, it's true that the installed localedef utility should be usable in
> > locale A to generate locale B, for any pair (A, B) of installed locales -
> > rather than only being able to generate locales as part of the glibc build
> > / install process. If localedef interprets locale sources in the
> > character set of the locale in which it runs, that may mean the installed
> > locale sources do need to be in ASCII. How does localedef determine the
> > character set in which to interpret the textual locale source files?
>
> Yes, that is why we use UCS symbolic code points. I would then rather to be
"Yes" does not answer my question about how localedef determines the
character set of its input.
> fully consistent use UCS symbolic code points all the way thru a locale
> source, it is a bit more cumbersome, but I would rather be consistent.
I'd rather have some extension to allow a locale source file to declare
that it is in UTF-8, and then use UTF-8 throughout except for control
characters or combining characters used in isolation.
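
For comparison of the two notations, here is a rough sketch of a converter
from UTF-8 text to UCS symbolic names. The function names are hypothetical
and this is not localedef code. It assumes the <UXXXX> form with four hex
digits for code points up to U+FFFF and eight digits above that, and its
UTF-8 decoder is deliberately minimal (it does not reject overlong forms
or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp.  Returns the number of bytes
   consumed, or 0 on malformed input.  A NUL inside a multi-byte sequence
   fails the continuation-byte test, so we never read past the terminator. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical helper: write the UCS symbolic form of the UTF-8 string in
   to out.  Returns 0 on success, -1 on malformed input or lack of space. */
int utf8_to_ucs_symbols(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);
    const unsigned char *s = (const unsigned char *) in;
    size_t used = 0;
    out[0] = '\0';
    while (*s != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        if (n == 0)
            return -1;
        int w;
        if (cp <= 0xFFFF)
            w = snprintf(out + used, outsize - used, "<U%04lX>", cp);
        else
            w = snprintf(out + used, outsize - used, "<U%08lX>", cp);
        if (w < 0 || (size_t) w >= outsize - used)
            return -1;              /* output would not fit */
        used += (size_t) w;
        s += n;
    }
    return 0;
}
```

Each character in the input becomes at least seven characters of symbolic
name, which is the cumbersomeness being discussed.
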
--
Joseph S. Myers
joseph@codesourcery.com