[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
keld at keldix dot com
sourceware-bugzilla@sourceware.org
Thu Nov 9 10:19:00 GMT 2017
https://sourceware.org/bugzilla/show_bug.cgi?id=22387
--- Comment #20 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
>
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
>
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.
Yes, all source files should be converted from ASCII to the EBCDIC in question.
This is also the case on UTF-16 systems: the source files should be converted
from some sort of ASCII-compatible encoding to UTF-16. Or the other way, if you
move sources from a non-ASCII-compatible system to an ASCII-compatible system.
This process can be done automatically using e.g. iconv.
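
To illustrate the kind of automatic conversion meant here, the following is a minimal sketch in Python rather than an actual iconv invocation. The code page "cp037" and the sample line are assumptions picked only for the example; any EBCDIC code page would behave the same way for this purpose.

```python
# Toy illustration of re-encoding a locale source line, in the spirit of iconv.
# "cp037" (one EBCDIC code page) and SOURCE_LINE are assumptions for the example.
SOURCE_LINE = 'space "<U0020>"'


def convert(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Re-encode bytes from one character encoding to another."""
    return data.decode(from_enc).encode(to_enc)


ascii_bytes = SOURCE_LINE.encode("ascii")

# ASCII source moved to an EBCDIC system, and to a UTF-16 system.
ebcdic_bytes = convert(ascii_bytes, "ascii", "cp037")
utf16_bytes = convert(ascii_bytes, "ascii", "utf-16-le")

# Moving the source back the other way recovers the original bytes.
round_trip = convert(ebcdic_bytes, "cp037", "ascii")

# The bytes differ, but read in its own encoding the text is unchanged.
print(ebcdic_bytes != ascii_bytes)
print(ebcdic_bytes.decode("cp037") == SOURCE_LINE)
```

The byte values change with the encoding, but the text of the source file, including the symbolic names, stays the same when read in the encoding of the system it lives on.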
> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.
No, they are not Unicode (or UCS) codepoints. When you compile the locale into a
binary format, you apply an EBCDIC charmap, and the symbolic <Uxxxx> character
names get encoded according to the EBCDIC encoding in question, as applied by
the localedef -f option.
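
A rough sketch of that idea, in Python: the same symbolic names resolve to different bytes depending on which charmap is applied at compile time. This is a toy model and not how localedef is implemented; build_charmap and compile_string are hypothetical helpers, and the charmaps are derived from Python's "ascii" and "cp037" codecs purely as stand-ins for real charmap files.

```python
import re

# Matches symbolic character names of the form <Uxxxx>.
SYMBOL = re.compile(r"<U[0-9A-F]{4}>")


def build_charmap(codec: str) -> dict[str, bytes]:
    """Hypothetical toy charmap: symbolic name -> bytes in the target encoding.

    Only covers the printable ASCII range, which is all this example needs.
    """
    return {f"<U{cp:04X}>": chr(cp).encode(codec) for cp in range(0x20, 0x7F)}


def compile_string(symbolic: str, charmap: dict[str, bytes]) -> bytes:
    """Resolve each symbolic name through the charmap and join the results."""
    return b"".join(charmap[name] for name in SYMBOL.findall(symbolic))


source = "<U0041><U0042>"  # symbolic names for "A" and "B"

ascii_out = compile_string(source, build_charmap("ascii"))
ebcdic_out = compile_string(source, build_charmap("cp037"))

print(ascii_out.hex())   # bytes under the ASCII-style charmap
print(ebcdic_out.hex())  # bytes under the EBCDIC-style charmap
```

In this model the symbolic name is only a key into whichever charmap is selected, so the source text needs no second conversion step to target a different encoding.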
> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.
No, only one level of conversion is needed and that can be fully automated.
> This does not become any harder or any more complicated by allowing plain ASCII
> characters.
Well, not so, if you operate in an environment with a source encoding different
from the EBCDIC target encoding, and vice versa.
best regards
Keld