This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Tue, 14 Nov 2017 13:19:49 +0000
- Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- Auto-submitted: auto-generated
- References: <bug-22387-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=22387
--- Comment #27 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #25)
> This commit is highly problematic, damaging the portability of glibc locales.
If this kind of portability is really a concern, someone could come up with a
script that converts from the new version to the old one. It could even be
integrated into the build system, to the point where these generated files are
placed under BUILD and processed further from there.
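The core of such a converter could look roughly like the following minimal C sketch. It rests on assumptions of its own: it handles only the body of a single quoted locale string, assumes that the raw characters left after the change are printable ASCII, and ignores the locale file's escape character. The names `to_ucs_notation` and `ucs_token_len` are hypothetical and not part of glibc. Each raw character is rewritten as a <U00XX> token and existing tokens are copied unchanged, so "h<U00E9>tf<U0151>" becomes "<U0068><U00E9><U0074><U0066><U0151>".

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Length of a <Uxxxx> or <Uxxxxxxxx> token starting at s, or 0 if none. */
static size_t ucs_token_len(const char *s)
{
    size_t n = 0;

    if (s[0] != '<' || s[1] != 'U')
        return 0;
    while (isxdigit((unsigned char)s[2 + n]))
        n++;
    return ((n == 4 || n == 8) && s[2 + n] == '>') ? n + 3 : 0;
}

/* Hypothetical "new -> old" converter for the body of one locale string.
 * Assumes raw bytes are printable ASCII; escape characters are NOT handled.
 * Returns the output length, or -1 if `out` (capacity `outsz`, including
 * the terminating NUL) is too small. */
int to_ucs_notation(const char *in, char *out, size_t outsz)
{
    size_t pos = 0;

    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        size_t len = ucs_token_len(in);

        if (len > 0) {
            /* Already in <Uxxxx> form: copy the token unchanged. */
            if (len >= outsz - pos)
                return -1;
            memcpy(out + pos, in, len);
            in += len;
        } else {
            /* Raw ASCII character: emit its code point as <U00XX>. */
            int n = snprintf(out + pos, outsz - pos, "<U%04X>",
                             (unsigned char)*in);
            if (n < 0 || (size_t)n >= outsz - pos)
                return -1;
            len = (size_t)n;
            in++;
        }
        pos += len;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

A real tool would additionally have to parse the locale file's syntax (keywords, comments, escape characters) so that only string contents are rewritten.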
I wish the current change had pushed it even further, toward raw UTF-8, at least
for printable and "non-problematic" (by some vague, arbitrary definition)
characters.
On a few occasions I have made minor edits to the affected parts of a locale
file, and dealing with the <Uxxxx> notation was a nightmare. Working with a string
like "h<U00E9>tf<U0151>" is already much better than
"<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.
Source code is meant to be human-readable, and all these <Uxxxx>s most
certainly are not.
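Producing the readable form from the <Uxxxx> notation is equally mechanical. The following C sketch (the function names are hypothetical, not glibc API) replaces each <Uxxxx> or <Uxxxxxxxx> token with its UTF-8 encoding; tokens that do not name an encodable scalar value (surrogates, values above U+10FFFF, and U+0000) are left as they are. Applied to "h<U00E9>tf<U0151>" or to the fully spelled-out form, it yields the UTF-8 bytes of "hétfő".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of a <Uxxxx> or <Uxxxxxxxx> token starting at s, or 0 if none. */
static size_t ucs_token_len(const char *s)
{
    size_t n = 0;

    if (s[0] != '<' || s[1] != 'U')
        return 0;
    while (isxdigit((unsigned char)s[2 + n]))
        n++;
    return ((n == 4 || n == 8) && s[2 + n] == '>') ? n + 3 : 0;
}

/* Encode cp as UTF-8 into buf; returns 1-4 bytes, or 0 if not encodable. */
static size_t utf8_encode(unsigned long cp, char *buf)
{
    if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;  /* NUL, surrogates and out-of-range values are skipped */
    if (cp < 0x80) {
        buf[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (char)(0xF0 | (cp >> 18));
    buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical readable view: <Uxxxx> tokens become UTF-8, all other bytes
 * are copied as-is. Returns the output length, or -1 if `out` is too small. */
int from_ucs_notation(const char *in, char *out, size_t outsz)
{
    size_t pos = 0;

    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        char enc[4];
        const char *src = in;  /* bytes to copy */
        size_t n = 1;          /* how many of them */
        size_t tok = ucs_token_len(in);

        if (tok > 0) {
            /* The hex digits sit between "<U" and ">". */
            size_t enc_len = utf8_encode(strtoul(in + 2, NULL, 16), enc);
            if (enc_len > 0) {
                src = enc;
                n = enc_len;
            } else {
                n = tok;  /* not encodable: keep the token verbatim */
            }
        }
        if (n >= outsz - pos)
            return -1;
        memcpy(out + pos, src, n);
        pos += n;
        in += (tok > 0) ? tok : 1;
    }
    out[pos] = '\0';
    return (int)pos;
}
```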
There's a reason people write code like
printf("Hello world!\n");
and not
printf("\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a");
If for whatever reason the latter, hard-to-read (and hard-to-write) form is
required, it should be auto-generated from the former, easy-to-read (and
easy-to-write) one.
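Applied to the printf example, the same principle fits in a few lines. This C sketch (the function name `to_hex_literal` is hypothetical) generates the escaped literal from the readable text by writing every byte as a \xNN escape. Escaping every byte, rather than only some, sidesteps a C pitfall: a hex escape consumes all following hex digits, so "\x48e" would not mean "He".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: write `text` as a quoted C string literal in which
 * every byte is a \xNN escape. Returns the output length, or -1 if `out`
 * (capacity `outsz`) is too small. */
int to_hex_literal(const char *text, char *out, size_t outsz)
{
    size_t pos = 0;
    const unsigned char *p = (const unsigned char *)text;

    /* Need: 2 quotes + 4 characters per byte + terminating NUL. */
    if (outsz < strlen(text) * 4 + 3)
        return -1;
    out[pos++] = '"';
    for (; *p != '\0'; p++)
        pos += (size_t)sprintf(out + pos, "\\x%02x", *p);
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

For "Hello world!\n" it produces exactly the escaped string literal passed to the second printf call above.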