This is the mail archive of the
mailing list for the glibc project.
Re: New GB18030 gconv module for glibc (from ThizLinux Laboratory)
- From: Yu Shao <yshao at redhat dot com>
- To: Markus Scherer <markus dot scherer at us dot ibm dot com>
- Cc: Ulrich Drepper <drepper at redhat dot com>, libc-alpha at sources dot redhat dot com, kevin at thizlinux dot com, fai at thizlinux dot com, sunnygu at thizgroup dot com, suzhe at gnuchina dot org, Markus Scherer <markus dot scherer at us dot ibm dot com>, Bruno Haible <haible at ilog dot fr>
- Date: Fri, 18 Jan 2002 10:47:38 +1000
- Subject: Re: New GB18030 gconv module for glibc (from ThizLinux Laboratory)
- References: <OFA37FFE2D.E724FA8B-ON88256B44.006732B8@raleigh.ibm.com>
There is no definition of Unicode range U+10000..U+10ffff in the
standard book published on March 17, 2000.
Markus Scherer wrote:
>I agree with what Anthony said about mapping code points: Even if they do
>not have assigned characters, their mappings are defined. This is true for
>all Unicode code points except _single_ surrogate code points.
>Mapping _from_ GB 18030 may sometimes result in "unassigned" handling
>because some 4-byte GB 18030 sequences are defined but do not have
>mappings to Unicode.
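To illustrate the point that supplementary code points have defined mappings whether or not a character is assigned to them, here is a minimal sketch of the linear four-byte mapping as it is commonly described for GB 18030 (U+10000 corresponds to 0x90 0x30 0x81 0x30, and each following code point to the next four-byte sequence). The function name gb18030_supplementary is hypothetical and is not taken from the glibc module under discussion.

```python
def gb18030_supplementary(cp):
    """Return the four-byte GB 18030 sequence for a supplementary code point.

    Assumes the linear mapping that starts at U+10000 = 90 30 81 30.
    Bytes 1 and 3 each have 126 possible values, bytes 2 and 4 have 10.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%04X" % cp)
    offset = cp - 0x10000
    b4 = 0x30 + offset % 10    # fourth byte: 0x30..0x39
    offset //= 10
    b3 = 0x81 + offset % 126   # third byte: 0x81..0xFE
    offset //= 126
    b2 = 0x30 + offset % 10    # second byte: 0x30..0x39
    offset //= 10
    b1 = 0x90 + offset         # first byte: 0x90..0xE3 for this range
    return bytes([b1, b2, b3, b4])


print(gb18030_supplementary(0x10000).hex())   # 90308130
print(gb18030_supplementary(0x10FFFF).hex())  # e3329a35
```

The computation never consults a table of assigned characters, which is why every code point in the range gets a byte sequence.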
>Dirk and my publications on this are based on a printed version of the GB
>18030 standard from 2000 (plus the published electronic mapping tables),
>and from following discussions about the standard as much as possible. (I
>do not read/speak Chinese, but Dirk does; our companies had Chinese
>representatives that were in frequent discussion with the Chinese
>Note that the supplementary Unicode code points U+10000..U+10ffff were
>_designated_ in Unicode 2.0 (1996), with the pseudo-assignment of
>128*1024-4 of those code points (U+f0000..U+ffffd and U+100000..U+10fffd)
>as a Private-Use Area.
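The figure 128*1024-4 quoted above can be checked with a few lines of arithmetic. This is only a sketch; the variable names are made up for illustration.

```python
# Private-use ranges in planes 15 and 16, as given in the quoted text.
pua_plane_15 = range(0xF0000, 0xFFFFD + 1)
pua_plane_16 = range(0x100000, 0x10FFFD + 1)

# Each plane has 65536 code points; the last two of each plane
# (xFFFE and xFFFF) are excluded from the private-use ranges.
pua_total = len(pua_plane_15) + len(pua_plane_16)
print(pua_total)                  # 131068
print(pua_total == 128 * 1024 - 4)  # True

# Size of the whole supplementary range U+10000..U+10FFFF (16 planes).
supplementary_total = 0x10FFFF - 0x10000 + 1
print(supplementary_total)        # 1048576
```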
>Unicode 3.1 did not invent this supplementary range but was "only" the
>first Unicode version that assigned "real" characters to such code points
>(and assigned >40000 of them).
>Note also that formally GB 18030 defines mappings to ISO 10646, not
>Unicode. One of the differences is the publication schedule. Supplementary
>character assignments were published only in December 2001 with ISO 10646
>part 2, which synchronized with Unicode 3.1 several months after its
>Markus Scherer IBM GCoC-Unicode/ICU San José, CA
>email@example.com (also for SameTime)
Red Hat Asia-Pacific
+61 7 3872 4835