This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.

[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Tue, 14 Nov 2017 13:19:49 +0000
Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
Auto-submitted: auto-generated
References: <bug-22387-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #27 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #25)

> This commit is highly problematic, damaging the portability of glibc locales.

If this kind of portability is really a concern, someone could come up with a
script that converts from the new version to the old one. It could even be
integrated with the build system, to the point where the generated files are
placed under BUILD and processed further from there.

I wish the current change had pushed it even further, towards raw UTF-8, at
least for printable and "non-problematic" (by some vague, arbitrary definition)
characters.

On a few occasions I have made minor edits to the affected parts of a locale
file, and dealing with the <Uxxxx> notation was a nightmare. Working with a
string like "h<U00E9>tf<U0151>" is already much better than
"<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.

Source code is meant to be human-readable, which all these <Uxxxx>s most
certainly are not.

There's a reason people write code like
  printf("Hello world!\n");
and not
  printf("\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a");

If for whatever reason the latter, hard-to-read (and hard-to-write) form is
required, it should be auto-generated from the former, easy-to-read (and
easy-to-write) one.


References:
- [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: claude at 2xlibre dot net
