Unicode 3.2 support (4)
Bruno Haible
bruno@clisp.org
Thu Apr 18 16:02:00 GMT 2002
Anthony Fok writes:
> 1. GB18030 is intended to be an UTF: Just as UTF-8 is ASCII-compatible
> and can map to all unassigned-yet-legal codepoints in Unicode, so is
> GB18030 GB2312/GBK-compatible and can map to all unassigned-yet-legal
> codepoints in Unicode.
Currently (i.e. with or without yesterday's proposed patch), the
GB18030 converter has the following problems:
a) It doesn't handle unassigned codepoints < 0x10000, thus violating
Anthony's request 1.
b) It treats all non-ASCII characters as having width 2, i.e. not
only the characters that have "ambiguous width" in Unicode 3.1/3.2,
but even the zero-width characters! This should be enough to make
GB18030 unusable in all terminal emulators.
Ulrich, can you explain the rationale for this patch to the width
table, from Yu Shao, which you accepted in January? I cannot find it
in the public archives.
c) Problem (a) creates an artificial distinction between characters
< 0x10000 and >= 0x10000 in Unicode.
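To illustrate the distinction in (a) and (c): in GB18030, codepoints >= 0x10000 are reached by plain arithmetic on the four-byte form, so every one of them converts whether or not Unicode has assigned it. The following is a minimal sketch of that arithmetic in Python, not the glibc converter's code; the function name gb18030_four_byte is made up for this example.

```python
def gb18030_four_byte(code_point: int) -> bytes:
    """Encode a codepoint in U+10000..U+10FFFF as four GB18030 bytes.

    Hypothetical helper for illustration only; it does not cover the
    BMP (< 0x10000), which GB18030 maps through tables and ranges.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only codepoints >= 0x10000 use this linear mapping")

    linear = code_point - 0x10000

    # Mixed-radix split, least significant byte first:
    # byte 4 has 10 values (0x30..0x39), byte 3 has 126 (0x81..0xFE),
    # byte 2 has 10 values (0x30..0x39), byte 1 starts at 0x90.
    b4 = 0x30 + linear % 10
    linear //= 10
    b3 = 0x81 + linear % 126
    linear //= 126
    b2 = 0x30 + linear % 10
    linear //= 10
    b1 = 0x90 + linear

    return bytes([b1, b2, b3, b4])


if __name__ == "__main__":
    print(gb18030_four_byte(0x10000).hex())   # 90308130
    print(gb18030_four_byte(0x10FFFF).hex())  # e3329a35
```

No assignment check appears anywhere in this mapping. Treating unassigned codepoints < 0x10000 the same way would remove the asymmetry.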
Bruno