[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

keld at keldix dot com sourceware-bugzilla@sourceware.org
Thu Nov 16 23:39:00 GMT 2017


https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #31 from keld at keldix dot com <keld at keldix dot com> ---
On Wed, Nov 15, 2017 at 10:15:31AM +0000, maiku.fabian at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
> (In reply to Egmont Koblinger from comment #27)
> > (In reply to keld@keldix.com from comment #25)
> > 
> > > This commit is highly problematic, damaging the portability of glibc locales.
> > 
> > If this kind of portability is really a concern, someone could come up with
> > a script that converts from the new version to the old one. It could even be
> > integrated with the build system to the level where these generated files
> > are actually placed under BUILD and then further processed.
> 
> Yes, if that is really a concern, we could easily convert it to different
> formats.
> I really doubt that this can cause problems though. If the file contained
> "<a>", one still has to be able to read the ascii characters "<", "a", and ">"
> to interpret the file, I don't see anything which is lost by just writing "a"
> instead. If one cannot read an ascii file, one would not be able to read the
> keywords in the file either. So if something else than ascii like EBCDIC
> is needed, one would need some conversion anyway. Using "a" instead of "<a>"
> does not make such conversion any harder.

I have explained earlier that not using symbolic character names will generate
wrong results in situations where the source and target coded character sets
have different encodings of ASCII characters.

The locales as they have come from my hand even preserve portability when some
characters in the ASCII character set have different encodings, which happens
on EBCDIC systems with different national EBCDIC character sets. These are
still in use on big banking and aviation systems AFAIK.
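
To illustrate the point about variant encodings, here is a minimal sketch in
Python. It assumes that the cp037 and cp500 codecs shipped with Python can
stand in for two "national" EBCDIC character sets; the helper names are
made up for this example and are not part of glibc or of any ISO spec.

```python
# Printable ASCII characters to compare across coded character sets.
SAMPLE = "a<>[]!|"

# Assumption: cp037 and cp500 (two EBCDIC code pages available as Python
# codecs) stand in for "different national EBCDIC character sets".
CODE_PAGES = ("ascii", "cp037", "cp500")


def encodings_of(ch):
    """Return {code page name: byte value} for a single character."""
    return {cp: ch.encode(cp)[0] for cp in CODE_PAGES}


def is_variant(ch):
    """True if the character's byte value differs between the two EBCDIC pages."""
    return ch.encode("cp037") != ch.encode("cp500")


if __name__ == "__main__":
    for ch in SAMPLE:
        row = encodings_of(ch)
        cells = "  ".join(f"{cp}=0x{val:02X}" for cp, val in row.items())
        marker = "variant" if is_variant(ch) else "invariant"
        print(f"{ch!r}: {cells}  ({marker})")
```

A letter such as "a" has the same byte value in both EBCDIC pages, while a
character such as "[" does not. A raw byte copied from one system to the
other can therefore end up meaning a different character, whereas a symbolic
name is resolved through the charmap of whichever system processes it.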

As an editor of multiple ISO standards on POSIX/Linux locales I do strive for
general specs and portability. I can understand that this is not an issue for
glibc people. I just have been happy that glibc has been using the ISO specs,
and that I as ISO editor could use the glibc specs in return. This is not the
case anymore with the recent patch.

I do have a great concern for the readability of the locales. That is why I
made an elaborate set of symbolic character names that were much easier to
proofread than the <Uxxxx> names, such as the <a> and Greek <a*> names,
Japanese kana, Arabic, Hebrew etc. Thus the locales were both portable over
almost all known platforms, and readable to some extent. I was quite happy
when I saw that the Arabic name for the 10th month was something like
"octobr" - it meant that I, as someone who could not read Arabic at all, could
write and maintain an Arabic locale with some confidence.

Also, I cannot edit Japanese or Arabic characters in UTF-8, as I don't know
them, and I think this is also the case for many maintainers of glibc locales.
They may be fluent in their own locale, but locales from other cultures may be
beyond their capability to edit in raw UTF-8.

I wish that we could have some arrangement so that we can have mutual exchange
of locale specs again.
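
One possible arrangement, along the lines of the conversion script suggested
in comment #27, might look like the sketch below. It only maps plain text to
and from the <Uxxxx> notation; it does not know the mnemonic names such as
<a>, which would need a lookup table from a charmap. The function names are
hypothetical and nothing like this is claimed to exist in the glibc tree.

```python
import re


def to_ucs_names(text):
    """Spell out every character of `text` as a <Uxxxx> symbolic name.

    Code points above U+FFFF are written with eight hex digits.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        parts.append(f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>")
    return "".join(parts)


# Matches <Uxxxx> or <Uxxxxxxxx>; the eight-digit form is tried first.
_UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})>")


def from_ucs_names(text):
    """Replace every <Uxxxx> name in `text` with the character it denotes."""
    return _UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(to_ucs_names("Mon"))
    print(from_ucs_names("<U004D><U006F><U006E>"))
```

Since the two functions are inverses for plain text, a build step could
generate one form from the other, so that both sides keep working with the
representation they prefer.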

Best regards
keld
