This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Unicode 3.1 support (4)
- To: libc-alpha at sources dot redhat dot com
- Subject: Unicode 3.1 support (4)
- From: Bruno Haible <haible at ilog dot fr>
- Date: Wed, 30 May 2001 20:08:46 +0200 (CEST)
The converters that profit most from Unicode 3.1 are EUC-TW and
ISO-2022-CN-EXT, because Unicode 3.1 now covers most of CNS 11643
planes 1 to 7.
This patch updates the CNS 11643 converters (to include planes 3 to 7 and
15, and to remove plane 14), the EUC-TW charmap, and the EUC-TW and
ISO-2022-CN-EXT converters.
A note about plane 14. According to Ken Lunde's "CJKV Information Processing"
book, pp. 93-96, CNS 11643 - 1986 was made up of four planes (1, 2, 14, 15),
with plane 14 being a latecomer (published in 1988). CNS 11643 - 1992 then
renamed plane 14 to plane 3 and added four more planes. I think the fact that
glibc's EUC-TW and ISO-2022-CN-EXT converters have until now assumed the
existence of a plane 14 is a bug, because:
- According to Ken Lunde, pp. 162-165, EUC-TW uses CNS 11643 - 1992, not 1986.
- The RFC which defines ISO-2022-CN-EXT, namely RFC 1922, talks about
planes 1 to 7. That implies it assumes CNS 11643 - 1992, not 1986.
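To make the plane numbering concrete, here is a minimal sketch of how a
CNS 11643 position maps to EUC-TW bytes, assuming the usual EUC-TW layout
(plane 1 in two bytes; any plane in four bytes as SS2, plane byte, row byte,
column byte). The function euctw_encode is a hypothetical helper written for
illustration; it is not part of glibc or of this patch.

```c
#include <assert.h>   /* only needed if you add assert-based checks */
#include <stddef.h>

/* Hypothetical helper, not glibc code: encode a CNS 11643 position
   (plane 1..16, row 1..94, column 1..94) as EUC-TW bytes.
   Returns the number of bytes written (2 or 4), or 0 on invalid input.  */
static size_t
euctw_encode (unsigned int plane, unsigned int row, unsigned int col,
              unsigned char out[4])
{
  if (plane < 1 || plane > 16 || row < 1 || row > 94 || col < 1 || col > 94)
    return 0;

  if (plane == 1)
    {
      /* Code set 1: plane 1 has a two-byte short form.  */
      out[0] = (unsigned char) (0xA0 + row);
      out[1] = (unsigned char) (0xA0 + col);
      return 2;
    }

  /* Code set 2: SS2 (0x8E), then a plane byte in 0xA1..0xB0.  */
  out[0] = 0x8E;
  out[1] = (unsigned char) (0xA0 + plane);
  out[2] = (unsigned char) (0xA0 + row);
  out[3] = (unsigned char) (0xA0 + col);
  return 4;
}
```

With this layout, a character in plane 3 begins with 0x8E 0xA3, whereas the
same character under the old plane 14 numbering would have begun with
0x8E 0xAE. That is why the charmap change replaces 0x8EAExxxx mappings with
0x8EA3xxxx ones.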
2001-05-29 Bruno Haible <haible@clisp.cons.org>
* iconvdata/cns11643l1.c: Update to Unicode 3.1.
(__cns11643l1_to_ucs4_tab): Regenerated.
(__cns11643l1_from_ucs4_tab12): Regenerated.
* iconvdata/cns11643.c: Update to Unicode 3.1.
(__cns11643l14_to_ucs4_tab): Remove array.
(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New arrays.
(__cns11643_from_ucs4p0_tab): Renamed from __cns11643_from_ucs4_tab.
(__cns11643_from_ucs4p2_tab): New array.
* iconvdata/cns11643.h (__cns11643l14_to_ucs4_tab): Remove declaration.
(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New declarations.
(cns11643_to_ucs4): Treat planes 3, 4, 5, 6, 7, 15 instead of 14.
(__cns11643_from_ucs4_tab): Remove declaration.
(__cns11643_from_ucs4p0_tab, __cns11643_from_ucs4p2_tab): New
declarations.
(ucs4_to_cns11643): Update for new arrays. Treat U+3400..U+4DFF and
U+20000..U+2A6D6.
* iconvdata/cns11643l2.h (__cns11643_from_ucs4_tab): Remove
declaration.
(__cns11643_from_ucs4p0_tab): New declaration.
(ucs4_to_cns11643l2): Update for new arrays.
* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Handle planes
3 to 7.
(BODY for TO_LOOP): Handle planes 3 to 7, instead of plane 14.
* iconvdata/EUC-TW.irreversible: New file.
* iconvdata/tst-table.sh: Use it.
* iconvdata/Makefile (distribute): Add CP1255.irreversible,
CP1258.irreversible, EUC-TW.irreversible.
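
As a rough illustration of the two new Unicode ranges mentioned for
ucs4_to_cns11643 above, the following sketch classifies a UCS-4 code point
by range. The enum and the function classify_ucs4 are hypothetical names
chosen for this example; the real converter's table layout and lookup logic
are in the patch itself and are not reproduced here.

```c
#include <assert.h>   /* only needed if you add assert-based checks */
#include <stdint.h>

/* Hypothetical classification, for illustration only.  */
enum ucs4_cjk_range
{
  RANGE_OTHER,               /* anything outside the two ranges below */
  RANGE_BMP_3400_4DFF,       /* U+3400..U+4DFF */
  RANGE_PLANE2_20000_2A6D6   /* U+20000..U+2A6D6 */
};

/* Return which of the two newly handled ranges contains CH, if any.  */
static enum ucs4_cjk_range
classify_ucs4 (uint32_t ch)
{
  if (ch >= 0x3400 && ch <= 0x4DFF)
    return RANGE_BMP_3400_4DFF;
  if (ch >= 0x20000 && ch <= 0x2A6D6)
    return RANGE_PLANE2_20000_2A6D6;
  return RANGE_OTHER;
}
```

A converter could use such a test to decide which lookup table to consult
and to compute an index relative to the start of the range, for example
ch - 0x20000 for the second range.
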
2001-05-29 Bruno Haible <haible@clisp.cons.org>
* charmaps/EUC-TW: Update to Unicode 3.1. Add mappings for
<U4EA0>, <U51AB>, <U52F9>. Remove 0x8EAExxxx mappings. Add
0x8EA3xxxx, 0x8EA4xxxx, 0x8EA5xxxx, 0x8EA6xxxx, 0x8EA7xxxx,
0x8EAFxxxx mappings instead.
The patch is too big for this mailing list or private mail; please
fetch it from ftp://ftp.ilog.fr/pub/Users/haible/gnu/CNS11643.diff.bz2
Bruno