This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

From: Keld Simonsen <keld at keldix dot com>
To: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
Cc: libc-locales at sourceware dot org
Date: Fri, 17 Nov 2017 01:39:01 +0200
Subject: Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
Authentication-results: sourceware.org; auth=none
References: <bug-22387-716@http.sourceware.org/bugzilla/> <bug-22387-716-yyc8wyqhYw@http.sourceware.org/bugzilla/>

On Wed, Nov 15, 2017 at 10:15:31AM +0000, maiku.fabian at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
> (In reply to Egmont Koblinger from comment #27)
> > (In reply to keld@keldix.com from comment #25)
> > 
> > > This commit is highly problematic, damaging the portability of glibc locales.
> > 
> > If this kind of portability is really a concern, someone could come up with
> > a script that converts from the new version to the old one. It could even be
> > integrated with the build system to the level where these generated files
> > are actually placed under BUILD and then further processed.
> 
> Yes, if that is really a concern, we could easily convert it to different
> formats.
> I really doubt that this can cause problems though. If the file contained
> "<a>", one still has to be able to read the ascii characters "<", "a", and ">"
> to interpret the file, I don't see anything which is lost by just writing "a"
> instead. If one cannot read an ascii file, one would not be able to read the
> keywords in the file either. So if something else than ascii like EBCDIC
> is needed, one would need some conversion anyway. Using "a" instead of "<a>"
> does not make such conversion any harder.

I have explained earlier that not using symbolic character names will generate
wrong results in situations where the source and target coded character sets
encode the ASCII characters differently.

The locales as they have come from my hand even preserve portability when some
characters in the ASCII character set have different encodings, which happens
on EBCDIC systems with different national EBCDIC character sets. These are still
in use on big banking and aviation systems AFAIK.
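
A minimal Python sketch of this effect, assuming the cp037 and cp500 codecs
from Python's standard library as stand-ins for two EBCDIC variants (the choice
of code pages and sample characters is illustrative, not taken from the locale
sources):

```python
# Compare how a few ASCII characters are encoded in two EBCDIC code pages.
# cp037 and cp500 are standard-library codecs; treating them as
# representative "national variants" is an assumption of this sketch.
SAMPLE = "a[]!<>"
CODE_PAGES = ("cp037", "cp500")


def encoding_table(chars, code_pages):
    """Map each character to its byte value in every code page."""
    return {ch: {cp: ch.encode(cp)[0] for cp in code_pages} for ch in chars}


def differing_chars(chars, code_pages):
    """Return the characters whose byte value is not the same in all code pages."""
    table = encoding_table(chars, code_pages)
    return [ch for ch in chars if len(set(table[ch].values())) > 1]


if __name__ == "__main__":
    for ch, row in encoding_table(SAMPLE, CODE_PAGES).items():
        print(repr(ch), {cp: hex(b) for cp, b in row.items()})
    print("differ between code pages:", differing_chars(SAMPLE, CODE_PAGES))
```

A character written literally in a file carries whatever byte value the
file's code page gives it, while a symbolic name is resolved through the
charmap of the target system.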

As an editor of multiple ISO standards on POSIX/Linux locales I do strive for general specs
and portability. I can understand that this is not an issue for glibc people.
I have just been happy that glibc has been using the ISO specs, and that I as
ISO editor could use the glibc specs in return. This is not the case anymore with the recent
patch.

I do have a great concern for the readability of the locales. That is why I made
an elaborate set of symbolic character names that were much easier to proofread
than the <Uxxxx> names, such as the <a> and Greek <a*> names, Japanese kana, Arabic,
Hebrew etc. Thus the locales were both portable over almost all known platforms, and
readable to some extent. I was quite happy when I saw that the Arabic name for the
10th month was something like "octobr" - it meant that I, as someone who could not
read Arabic at all, could write and maintain an Arabic locale with some confidence.

Also, I cannot edit Japanese or Arabic characters in UTF-8, as I don't know them, and
I think this is also the case for many maintainers of glibc locales. They may be fluent
in their own locale, but locales from other cultures may be beyond their capability
to edit in raw UTF-8.

I wish that we could have some arrangement so that we can have mutual exchange again
of locale specs.
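
One possible shape for such an arrangement is a mechanical conversion between
the two notations. The Python sketch below is only an illustration: the
function names `to_ucs_names` and `from_ucs_names` are hypothetical, and it
handles the contents of a single string value only. A real converter would
also have to leave keywords, comments and escape characters in a locale file
untouched.

```python
import re

# Matches symbolic names of the form <Uxxxx> (4 to 8 hex digits).
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def to_ucs_names(text):
    """Spell every character of `text` as a <UXXXX> symbolic name."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def from_ucs_names(text):
    """Replace every <UXXXX> symbolic name in `text` by the character itself."""
    return UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    literal = "Oct"
    symbolic = to_ucs_names(literal)
    print(symbolic)
    print(from_ucs_names(symbolic))
```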

Best regards
keld

References:
- [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: claude at 2xlibre dot net
- [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: maiku.fabian at gmail dot com
