This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Use Unicode code points for country_isbn
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: Keld Simonsen <keld at keldix dot com>, Marko Myllynen <myllynen at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, libc-locales at sourceware dot org
- Date: Fri, 24 Jul 2015 00:27:05 +0200
- Subject: Re: [PATCH] Use Unicode code points for country_isbn
- References: <5571B8C2 dot 8000108 at redhat dot com> <20150609071130 dot GA26925 at domone> <5576BC13 dot 5020001 at redhat dot com> <20150721081840 dot GE12267 at vapier> <20150721084006 dot GB29742 at www5 dot open-std dot org> <20150721092217 dot GG12267 at vapier> <20150721115852 dot GA24115 at rap dot rap dot dk> <alpine dot DEB dot 2 dot 10 dot 1507221719420 dot 21570 at digraph dot polyomino dot org dot uk> <20150722190228 dot GA18489 at www5 dot open-std dot org> <alpine dot DEB dot 2 dot 10 dot 1507221951100 dot 19567 at digraph dot polyomino dot org dot uk>
On Wed, Jul 22, 2015 at 08:02:23PM +0000, Joseph Myers wrote:
> > > Now, it's true that the installed localedef utility should be usable in
> > > locale A to generate locale B, for any pair (A, B) of installed locales -
> > > rather than only being able to generate locales as part of the glibc build
> > > / install process. If localedef interprets locale sources in the
> > > character set of the locale in which it runs, that may mean the installed
> > > locale sources do need to be in ASCII. How does localedef determine the
> > > character set in which to interpret the textual locale source files?
> >
> > Yes, that is why we use UCS symbolic code points. I would then rather to be
>
> "Yes" does not answer my question about how localedef determines the
> character set of its input.
>
> > fully consistent use UCS symbolic code points all the way thru a locale
> > source, it is a bit more cumbersome, but I would rather be consistent.
>
> I'd rather have some extension to allow a locale source file to declare
> that it is in UTF-8, and then use UTF-8 throughout except for control
> characters or combining characters used in isolation.
>
I second that. It would be technically easy to do, so it's mostly a matter
of selecting the proper interface. We would require some UTF-8 locale
(C.UTF-8 if we decide to add it, otherwise, for example, en_US.UTF-8).
Then it would be a matter of switching to that locale for files marked,
say, by having UTF8 on the first line. A sample implementation would be:
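char first_line[5];
if (fgets (first_line, sizeof first_line, locale) != NULL
    && strncmp (first_line, "UTF8", 4) == 0)
  setlocale (LC_ALL, "en_US.UTF-8");
else
  /* Not marked: go back so the first line is parsed normally.  */
  rewind (locale);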
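A slightly fuller sketch of the same idea, for illustration only. It assumes a hypothetical marker line consisting of exactly "UTF8", and the helper names locale_source_is_utf8 and select_input_locale are made up here; neither exists in localedef. Unlike the short snippet, it consumes the whole marker line and tolerates a CR before the newline. Which UTF-8 locale to switch to is still an open question in this thread, so the sketch simply tries C.UTF-8 and falls back to en_US.UTF-8.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker: a locale source whose first line is exactly
   "UTF8" declares that the rest of the file is UTF-8.  */
#define UTF8_MARKER "UTF8"

/* Return 1 and leave FP positioned after the marker line if the file
   starts with the marker; otherwise rewind FP and return 0.  */
static int
locale_source_is_utf8 (FILE *fp)
{
  char line[64];

  if (fgets (line, sizeof line, fp) != NULL)
    {
      /* Drop the trailing newline (and a CR, for DOS line endings).  */
      line[strcspn (line, "\r\n")] = '\0';
      if (strcmp (line, UTF8_MARKER) == 0)
        return 1;
    }
  /* No marker: go back so the parser also sees the first line.  */
  rewind (fp);
  return 0;
}

/* For marked files, switch LC_CTYPE (the category that governs
   multibyte conversion) to a UTF-8 locale.  Returns the LC_CTYPE locale
   now in effect, or NULL if no UTF-8 locale could be selected.  */
static const char *
select_input_locale (FILE *fp)
{
  if (!locale_source_is_utf8 (fp))
    return setlocale (LC_CTYPE, NULL); /* query only; keep current locale */

  const char *loc = setlocale (LC_CTYPE, "C.UTF-8");
  if (loc == NULL)
    loc = setlocale (LC_CTYPE, "en_US.UTF-8");
  return loc;
}
```

Changing only LC_CTYPE rather than LC_ALL is a deliberate choice in this sketch: it affects how the input bytes are decoded without also changing collation, messages or number formatting inside localedef itself. A NULL return means neither UTF-8 locale is installed, and a caller would then have to report an error rather than misread the file.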