This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Unicode 3.1 support (4)



The converters that profit most from Unicode 3.1 are EUC-TW and
ISO-2022-CN-EXT, because Unicode 3.1 now covers most of CNS 11643
planes 1 to 7.

This patch updates the CNS 11643 converters (to include planes 3 to 7 and
15, and to remove plane 14), the EUC-TW charmap, and the EUC-TW and
ISO-2022-CN-EXT converters.

A note about plane 14. According to Ken Lunde's "CJKV Information Processing"
book, pp. 93-96, CNS 11643 - 1986 was made up of four planes (1, 2, 14, 15),
with plane 14 being a latecomer (published in 1988). CNS 11643 - 1992 then
renamed plane 14 to plane 3 and added four more planes. I think the fact that
glibc's EUC-TW and ISO-2022-CN-EXT converters have until now assumed the
existence of a plane 14 is a bug, because:

  - According to Ken Lunde, pp. 162-165, EUC-TW uses CNS 11643 - 1992, not 1986.

  - The RFC which defines ISO-2022-CN-EXT, namely RFC 1922, talks about
    planes 1 to 7. That implies it assumes CNS 11643 - 1992, not 1986.


2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* iconvdata/cns11643l1.c: Update to Unicode 3.1.
	(__cns11643l1_to_ucs4_tab): Regenerated.
	(__cns11643l1_from_ucs4_tab12): Regenerated.
	* iconvdata/cns11643.c: Update to Unicode 3.1.
	(__cns11643l14_to_ucs4_tab): Remove array.
	(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
	__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
	__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New arrays.
	(__cns11643_from_ucs4p0_tab): Renamed from __cns11643_from_ucs4_tab.
	(__cns11643_from_ucs4p2_tab): New array.
	* iconvdata/cns11643.h (__cns11643l14_to_ucs4_tab): Remove declaration.
	(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
	__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
	__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New declarations.
	(cns11643_to_ucs4): Treat planes 3, 4, 5, 6, 7, 15 instead of 14.
	(__cns11643_from_ucs4_tab): Remove declaration.
	(__cns11643_from_ucs4p0_tab, __cns11643_from_ucs4p2_tab): New
	declarations.
	(ucs4_to_cns11643): Update for new arrays. Treat U+3400..U+4DFF and
	U+20000..U+2A6D6.
	* iconvdata/cns11643l2.h (__cns11643_from_ucs4_tab): Remove
	declaration.
	(__cns11643_from_ucs4p0_tab): New declaration.
	(ucs4_to_cns11643l2): Update for new arrays.
	* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Handle planes
	3 to 7.
	(BODY for TO_LOOP): Handle planes 3 to 7, instead of plane 14.
	* iconvdata/EUC-TW.irreversible: New file.
	* iconvdata/tst-table.sh: Use it.
	* iconvdata/Makefile (distribute): Add CP1255.irreversible,
	CP1258.irreversible, EUC-TW.irreversible.

2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* charmaps/EUC-TW: Update to Unicode 3.1. Add mappings for
	<U4EA0>, <U51AB>, <U52F9>. Remove 0x8EAExxxx mappings. Add
	0x8EA3xxxx, 0x8EA4xxxx, 0x8EA5xxxx, 0x8EA6xxxx, 0x8EA7xxxx,
	0x8EAFxxxx mappings instead.
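
To illustrate what the 0x8EAExxxx and 0x8EA3xxxx ... 0x8EAFxxxx prefixes in the
charmap entry mean, here is a minimal sketch of how a four-byte EUC-TW sequence
splits into a CNS 11643 plane, row and column. It is not code from the patch:
the names euctw4_decode, struct cns_pos and plane_supported_after_patch are
hypothetical, and the plane check only restates the plane list from the
ChangeLog above (planes 1 to 7 and 15, with plane 14 removed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not glibc code.  A four-byte EUC-TW sequence
   has the form 0x8E, 0xA0 + plane, 0xA0 + row, 0xA0 + column, with plane
   in 1..16 and row and column in 1..94.  */
struct cns_pos
{
  int plane;
  int row;
  int col;
};

/* Split the four bytes at S into plane, row and column.
   Returns 0 on success, -1 if the bytes are not a well-formed sequence.  */
static int
euctw4_decode (const unsigned char *s, struct cns_pos *out)
{
  assert (s != NULL && out != NULL);

  if (s[0] != 0x8E)                     /* SS2 introduces the sequence.  */
    return -1;
  if (s[1] < 0xA1 || s[1] > 0xB0)       /* Plane byte.  */
    return -1;
  if (s[2] < 0xA1 || s[2] > 0xFE || s[3] < 0xA1 || s[3] > 0xFE)
    return -1;

  out->plane = s[1] - 0xA0;
  out->row = s[2] - 0xA0;
  out->col = s[3] - 0xA0;
  return 0;
}

/* Planes listed in the ChangeLog above as handled after this patch:
   1 to 7 and 15.  Plane 14 is no longer accepted.  */
static int
plane_supported_after_patch (int plane)
{
  return (plane >= 1 && plane <= 7) || plane == 15;
}
```

With this sketch, the bytes 0x8E 0xA3 0xA1 0xA1 decode to plane 3, row 1,
column 1, while a 0x8E 0xAE prefix decodes to plane 14, which the plane check
rejects.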

The patch is too big for this mailing list or private mail. Please
fetch it from ftp://ftp.ilog.fr/pub/Users/haible/gnu/CNS11643.diff.bz2

Bruno

