[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

keld at keldix dot com sourceware-bugzilla@sourceware.org
Thu Nov 9 10:19:00 GMT 2017


https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #20 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
> 
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.

Yes, all source files should be converted from ASCII to the EBCDIC encoding in
question. This is also the case on UTF-16 systems: the source files should be
converted from some ASCII-compatible encoding to UTF-16. The same applies in the
other direction, if you move sources from a non-ASCII-compatible system to an
ASCII-compatible system.

This process can be done automatically, e.g. using iconv.
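
As a rough illustration of such a conversion (a sketch only, not part of any
glibc tooling), the following Python snippet round-trips a fragment of locale
source text between ASCII and one EBCDIC code page. The choice of cp037 and the
sample text are assumptions made for the example.

```python
# Sketch: re-encode locale source text from ASCII into an EBCDIC code page
# and back. cp037 is just one EBCDIC variant that Python ships; which EBCDIC
# encoding a real system would use is an assumption here.
source_text = "<U0020>"

ebcdic_bytes = source_text.encode("cp037")   # ASCII-compatible text -> EBCDIC bytes
restored_text = ebcdic_bytes.decode("cp037") # EBCDIC bytes -> text again

# The bytes differ from the ASCII form, but the text itself is unchanged.
print(source_text.encode("ascii").hex(), ebcdic_bytes.hex(), restored_text)
```

The text content survives the round trip unchanged; only the byte
representation of "<", "U", the digits and ">" differs between the two
encodings.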

> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.

No, they are not Unicode (or UCS) codepoints. When you compile the locale into a
binary format, you apply an EBCDIC charmap, and the symbolic <Uxxxx> character
names get encoded according to the EBCDIC encoding given by the localedef -f
option in question.
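
A minimal sketch of that idea, in Python rather than in localedef itself: the
same symbolic names resolve to different bytes depending on which charmap is
applied. The helper names build_charmap and resolve are hypothetical, the
charmap is generated from a Python codec instead of being read from a real
charmap file, and cp037 stands in for "some EBCDIC encoding".

```python
import re

def build_charmap(codec):
    """Hypothetical stand-in for a charmap file: symbolic name -> encoded bytes.

    Covers only the printable ASCII range, which is enough for illustration.
    """
    return {f"<U{cp:04X}>": chr(cp).encode(codec) for cp in range(0x20, 0x7F)}

def resolve(text, charmap):
    """Replace each symbolic <Uxxxx> name with the bytes the charmap assigns."""
    names = re.findall(r"<U[0-9A-F]{4}>", text)
    return b"".join(charmap[name] for name in names)

locale_source = "<U0041><U0020>"

ascii_result = resolve(locale_source, build_charmap("ascii"))
ebcdic_result = resolve(locale_source, build_charmap("cp037"))

print(ascii_result.hex(), ebcdic_result.hex())
```

With the ASCII map the two names come out as bytes 41 20; with the cp037 map
they come out as c1 40. The locale source is the same in both cases, only the
applied charmap changes.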

> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.

No, only one level of conversion is needed and that can be fully automated.

> This does not become any harder or any more complicated by allowing plain ASCII
> characters.

Well, that does not hold if you operate in an environment whose source encoding
differs from the EBCDIC target encoding, or vice versa.

best regards
Keld
