This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Use Unicode code points for country_isbn
- From: Keld Simonsen <keld at keldix dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, libc-locales at sourceware dot org
- Date: Sat, 25 Jul 2015 15:18:13 +0200
- Subject: Re: [PATCH] Use Unicode code points for country_isbn
- Authentication-results: sourceware.org; auth=none
- References: <5576BC13 dot 5020001 at redhat dot com> <20150721081840 dot GE12267 at vapier> <20150721084006 dot GB29742 at www5 dot open-std dot org> <20150721092217 dot GG12267 at vapier> <20150721115852 dot GA24115 at rap dot rap dot dk> <alpine dot DEB dot 2 dot 10 dot 1507221719420 dot 21570 at digraph dot polyomino dot org dot uk> <20150722190228 dot GA18489 at www5 dot open-std dot org> <alpine dot DEB dot 2 dot 10 dot 1507221951100 dot 19567 at digraph dot polyomino dot org dot uk> <20150724104349 dot GC10515 at rap dot rap dot dk> <alpine dot DEB dot 2 dot 10 dot 1507241504250 dot 5465 at digraph dot polyomino dot org dot uk>
On Fri, Jul 24, 2015 at 03:11:15PM +0000, Joseph Myers wrote:
> On Fri, 24 Jul 2015, Keld Simonsen wrote:
> > > Because glibc makes particular implementation choices in areas that are
> > > implementation-defined. It's an implementation, not a meta-implementation
> > > that tries to cover the range of permitted implementation choices.
> > > Meta-implementations (at least of the language part of ISO C) exist, but
> > > they exist in the field of formal systems used to reason about C programs.
> > I am also active in C standardization. I think it is a good goal to not
> > deviate and restrict an implementation more than necessary. And at least
> > not restrict it further than already implemented. That would lead to a loss
> > of functionality.
> The point of things being implementation-defined is to allow
> implementations flexibility in what is convenient for those
> implementations. glibc duly uses that flexibility to adopt particular
> choices for implementation-defined behavior (some depending on the
> architecture, but most being globally fixed for all glibc configurations,
> so that all glibc code is free to rely on those choices).
Yes, of course the implementation-defined allowance is there to be used.
But I also wear another hat, as I am involved in writing the standards.
There I have to take a generic point of view, and from the user's point of view
implementation-defined items are bad for portability: you cannot be sure
of your independence. You are bound to the implementation whose
implementation-defined choices you relied on.
I don't know about the goals of the glibc project, but there are a number
of ways to reach a bigger audience. The locales are mostly used
for end-user apps, and glibc has an end-user audience that could be made bigger.
E.g. both the Apple end-user community and the Android user community
are far bigger than the glibc end-user community, and they could be a target for
at least the glibc locales. I believe both Apple and Google use POSIX-derived localization,
including the locale model. I, at least as the editor of ISO TR 30112, need to keep those
communities in sight. I have been cooperating with the glibc community, especially
Ulrich, but also with the FSF, as I have donated many locale and charmap specs to them.
And I am using the glibc i18n locale as the locale source in the standard.
So I would welcome it if glibc adhered to the design goal of character set independence
that both POSIX and 30112 have, a design goal also shared by Unicode Inc.
> > I thought cygwin was a GNU implementation for windows, and that it also
> > implemented glibc. I now understand that the cygwin libc is different from
> > glibc. But how different? Do they use glibc locales, or are they able to?
> I don't think there's any use of glibc locales by newlib as Cygwin's libc.
If that is true, then I believe they use something based on my earlier locales,
which I released to X/Open many years ago. Those were widely used in the industry,
as they were the only freely available locales around, and the most comprehensive.
They were also the basis for many of the glibc locales. I think the glibc locales
have the potential to take that position today.
> > I would like the glibc locales to also be usable in other libc environments.
> > Most of all because they IMHO are the most comprehensive set of locales available.
> > So that would benefit users also outside glibc. Why not have this in mind
> > also for our project?
> I think CLDR is more likely to be the most comprehensive set of locales
> (it certainly claims to be "the largest and most extensive standard
> repository of locale data available"), and unlike glibc's locales is
> intended for wider use. Even if we did want wider use for glibc's locales
> (beyond use by glibc's locale-dependent functions after having been
> compiled into binary form by glibc's localedef program from the same
> version of glibc) I think we should still say: UTF-8 is the way of the
> present and future, other multibyte character sets are legacy. And, just
> as we require a range of GNU tools to build glibc, so we can rely on
> features of one part of the GNU system when working on another part, so we
> should require GNU localedef to build glibc's locales.
CLDR locales are not POSIX-like locales; they are in XML. Also, I believe they
are not of the same quality as the glibc locales. I for one had the experience with Unicode that
they would not take my specs, even though I represented Danish Standards. The result
was that their Danish spec did not adhere to Danish Standards or to the official
Danish orthography rules. I then gave up contact with them.
> > > I'd rather have some extension to allow a locale source file to declare
> > > that it is in UTF-8, and then use UTF-8 throughout except for control
> > > characters or combining characters used in isolation.
> > That would make it difficult to maintain in environments that are not using UTF-8.
> It would make the locales easier to maintain for people using UTF-8, the
> number of which (among people concerned with i18n) can be presumed to be
> much greater than the number using legacy character sets.
Yes, but you are excluding some communities. So: easier for the majority,
impossible for a number of diverse minorities, which together have the potential
to be much larger than the current user base.
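
To make the two notations concrete, here is a minimal sketch, assuming
Python 3 and the <Uxxxx> symbolic-name form used in locale sources (four hex
digits, or eight for code points beyond U+FFFF). The function names are
hypothetical and not part of glibc or localedef. It only illustrates that the
ASCII-only form and the UTF-8 form can be converted into each other
mechanically.

```python
import re

def to_symbolic(text):
    """Encode every character as an ASCII-only <Uxxxx> symbolic name."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Four hex digits inside the BMP, eight beyond it.
        out.append("<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp)
    return "".join(out)

# Matches <U> followed by exactly four or exactly eight hex digits.
_SYMBOLIC = re.compile(r"<U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})>")

def from_symbolic(text):
    """Decode <Uxxxx> names back to characters; other text is left alone."""
    return _SYMBOLIC.sub(lambda m: chr(int(m.group(1), 16)), text)
```

For example, to_symbolic("æ") gives "<U00E6>", and from_symbolic() turns
that back into "æ", so a source kept in one form can be viewed or edited in
the other.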