[PATCH] Use Unicode code points for country_isbn

Wed Jul 22 19:34:00 GMT 2015

On Wed, Jul 22, 2015 at 05:25:04PM +0000, Joseph Myers wrote:
> On Tue, 21 Jul 2015, Keld Simonsen wrote:
> 
> > It would mean that you cannot use the locale sources for cross-compiling 
> > when the host and target machines use different character sets, e.g. if 
> > you are building embedded systems on iOS, Windows, or other UTF-16 
> > machines for a UTF-8 target, or building for Android. Or the other way 
> > round, if you are on a UTF-8 host and generate locales for a UTF-16 
> > target such as a UTF-16 embedded system or an iPhone or iPad system.
> 
> On the build system on which glibc is built, we can always assume that the 
> glibc sources are the exact sequences of octets provided by the glibc 
> project, not converted into another character set and without any 
> conversions of line endings.  Furthermore, on any system using glibc and 
> executing tools such as localedef with the installed locale source files, 
> it can be assumed that those source files are the files shipped with 
> glibc, not those files after conversion into another character set.  Use 
> of glibc source files after conversion into another character set is 
> outside the scope of the glibc project - glibc is not expected to build 
> with such converted source files.

Sounds strange. glibc is the GNU C library. ISO standard C is independent
of the coded character set, as is POSIX. Why would the glibc project not
follow the ISO C and POSIX design goals? Why would glibc exclude itself
from Apple and Microsoft (UTF-16) systems and from non-UTF-8 Linux and
UNIX systems?

Maybe we should clone glibc to make it available on platforms other
than those using UTF-8. Or maybe you are not correct. I have not been
watching the glibc project closely enough to tell.

> Now, it's true that the installed localedef utility should be usable in 
> locale A to generate locale B, for any pair (A, B) of installed locales - 
> rather than only being able to generate locales as part of the glibc build 
> / install process.  If localedef interprets locale sources in the 
> character set of the locale in which it runs, that may mean the installed 
> locale sources do need to be in ASCII.  How does localedef determine the 
> character set in which to interpret the textual locale source files?

Yes, that is why we use UCS symbolic code points. To be fully consistent,
I would rather use UCS symbolic code points all the way through a locale
source. It is a bit more cumbersome, but I would rather be consistent, and
it would facilitate the cross-compiling I wrote about. I don't think there
is a mix of locales where it matters on Linux boxes. Oh well, some
conceivable scenarios: Apple or Windows users on a Linux box, or Linux
users on Apple or Windows boxes. Some mix with EBCDIC is more unlikely but
still conceivable: a big mainframe and number-cruncher environment, the
mainframe being an IBM mainframe running VM/CMS and the number cruncher
being a Linux supercomputer, e.g. in a financial institution.
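
To illustrate what "UCS symbolic code points all the way through" would
mean for a string value, here is a minimal Python sketch. It assumes the
<Uxxxx> notation seen in locale sources (four hex digits inside the BMP,
eight beyond it); the function names are hypothetical and are not part of
glibc or localedef.

```python
import re


def to_ucs_symbolic(text: str) -> str:
    """Write every character of text as a <Uxxxx> symbolic code point."""
    parts = []
    for ch in text:
        cp = ord(ch)
        # Assumption: 4 hex digits for BMP characters, 8 for the rest.
        width = 4 if cp <= 0xFFFF else 8
        parts.append(f"<U{cp:0{width}X}>")
    return "".join(parts)


def from_ucs_symbolic(symbolic: str) -> str:
    """Turn <Uxxxx> sequences back into the characters they name."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda m: chr(int(m.group(1), 16)),
        symbolic,
    )


if __name__ == "__main__":
    encoded = to_ucs_symbolic("ISBN")
    print(encoded)
    print(from_ucs_symbolic(encoded))
```

The encoded form contains only ASCII characters, so the source reads the
same whatever character set the host uses for non-ASCII text.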

Keld