This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.

[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: libc-locales at sourceware dot org
Date: Wed, 15 Nov 2017 10:15:31 +0000
Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
Auto-submitted: auto-generated
References: <bug-22387-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Egmont Koblinger from comment #27)
> (In reply to keld@keldix.com from comment #25)
> 
> > This commit is highly problematic, damaging the portability of glibc locales.
> 
> If this kind of portability is really a concern, someone could come up with
> a script that converts from the new version to the old one. It could even be
> integrated with the build system to the level where these generated files
> are actually placed under BUILD and then further processed.

Yes, if that really is a concern, we could easily convert the files to a
different format.
I really doubt that this can cause problems, though. If the file contained
“<a>”, one would still have to be able to read the ASCII characters “<”, “a”,
and “>” to interpret it, so I don’t see that anything is lost by just writing
“a” instead. Anyone who cannot read an ASCII file would not be able to read
the keywords in the file either. So if something other than ASCII, such as
EBCDIC, is needed, some conversion is required anyway. Using “a” instead of
“<a>” does not make that conversion any harder.
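
As a rough illustration of how small such a converter could be, here is a
minimal Python sketch that rewrites the contents of one quoted string from the
new notation back to the old all-<Uxxxx> notation. The function name to_uxxxx
is hypothetical, and the sketch assumes it is handed only the text between the
double quotes; it does not handle escape_char sequences, comments, or keywords,
which a real conversion script would have to leave alone.

```python
import re

# Matches an existing symbolic code point such as <U00E9> or <U0001F600>.
_UCS = re.compile(r"<U[0-9A-Fa-f]{4,8}>")


def to_uxxxx(text: str) -> str:
    """Rewrite every literal character as <Uxxxx>; keep existing <Uxxxx> tokens."""
    out = []
    pos = 0
    while pos < len(text):
        m = _UCS.match(text, pos)
        if m:
            # Already in the old notation: copy the token unchanged.
            out.append(m.group(0))
            pos = m.end()
        else:
            # Literal character: emit its code point in <Uxxxx> form.
            out.append(f"<U{ord(text[pos]):04X}>")
            pos += 1
    return "".join(out)


print(to_uxxxx("h<U00E9>tf<U0151>"))
# prints <U0068><U00E9><U0074><U0066><U0151>
```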

> I wish the current change even pushed it further, towards raw UTF-8 at least
> for printable and "non-problematic" (to some vague, arbitrary definition)
> characters.

I agree. In the long run this would be even better. Readability of the
source is useful. Let’s see what our experience with using ASCII directly
is; if no problems occur, we can think about using UTF-8 for “non-problematic”
characters.

> I have on a few occasions made some minor edits to affected parts of a
> locale file, dealing with the <Uxxxx> notation was a nightmare. Working with
> a string like "h<U00E9>tf<U0151>" is already much better than
> "<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.

Yes, I also found the <Uxxxx> notation annoying when browsing the files; it
makes errors much harder to spot.
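
For browsing, something like the following Python sketch can render the
<Uxxxx> tokens as real characters so that a string can be checked by eye. The
helper name show_utf8 is hypothetical and this is only a reading aid, not part
of glibc or its build system; it assumes every <Uxxxx> token holds a valid
Unicode code point.

```python
import re

# Captures the hexadecimal digits of a symbolic code point such as <U0151>.
_UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def show_utf8(text: str) -> str:
    """Replace each <Uxxxx> token with the character it names."""
    return _UCS.sub(lambda m: chr(int(m.group(1), 16)), text)


print(show_utf8("<U0068><U00E9><U0074><U0066><U0151>"))
# prints hétfő
```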

Follow-Ups:
- Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: Keld Simonsen

References:
- [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: claude at 2xlibre dot net
