[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

egmont at gmail dot com sourceware-bugzilla@sourceware.org
Thu Nov 2 20:40:00 GMT 2017


https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #6 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Andreas Schwab from comment #5)

> See
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.
> html#tag_07_03> for the full rules.

Bullet point 2 here says "Within a string, the double-quote character, the
escape character, and the right angle bracket character shall be escaped [...]"

Why not the left angle bracket too? Otherwise you can't tell for sure whether
"<U+0020>" stands for a space, or for literal
lessthan-you-plus-oh-oh-two-oh-greaterthan.

I think it doesn't hurt to remain a bit safer with special characters, e.g.
escape the comma, semicolon, less-than, greater-than, backshash, and whatever
the escape character (typically overridden to slash in locale files)
everywhere.

---

On the other hand, what about non-ASCII characters? Are they allowed as raw
UTF-8, or do they still need to be escaped? Allowing raw UTF-8, such as a
weekday name of "hétfő" rather than "h<U00E9>tf<U0151>" would highly improve
readability of the file.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the Libc-locales mailing list