This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/20865] New: iconv: cp950 does not contain EUDC/PUA mappings
- From: "arthur200126 at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Fri, 25 Nov 2016 04:34:51 +0000
- Subject: [Bug localedata/20865] New: iconv: cp950 does not contain EUDC/PUA mappings
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
Bug ID: 20865
Summary: iconv: cp950 does not contain EUDC/PUA mappings
Product: glibc
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: arthur200126 at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---
Microsoft's cp950 mapping contains sequential mappings from Big5's
End-User-Defined Characters (EUDC) to the Unicode Private Use Area (PUA). Such
mappings are used by a number of Big5 extensions, including HKSCS, which uses
these PUA code points when a character is not yet available in the target UCS
version.
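To make "sequential" concrete, here is a minimal sketch of the decoding
direction for the first EUDC block. The base value U+EEB8 for 0x8140 is taken
from the libiconv output quoted in this report. The count of 157 trail bytes
per lead byte follows from the Big5 layout (0x40..0x7E and 0xA1..0xFE). The
name eudc_to_pua is hypothetical, and the range of lead bytes covered
(0x81..0x8D) is an assumption that should be checked against the libiconv
table linked at the end of this report.

```python
# Hypothetical helper for illustration; not part of glibc or libiconv.
TRAILS_PER_LEAD = 157  # 63 trail bytes in 0x40..0x7E plus 94 in 0xA1..0xFE


def eudc_to_pua(lead: int, trail: int) -> int:
    """Map a two-byte EUDC code to a PUA code point, assuming a sequential layout."""
    # Assumption: lead bytes 0x81..0x8D form one contiguous block starting at U+EEB8.
    if not 0x81 <= lead <= 0x8D:
        raise ValueError("lead byte outside the range this sketch covers")
    if 0x40 <= trail <= 0x7E:
        col = trail - 0x40
    elif 0xA1 <= trail <= 0xFE:
        col = trail - 0xA1 + 63
    else:
        raise ValueError("invalid Big5 trail byte")
    return 0xEEB8 + TRAILS_PER_LEAD * (lead - 0x81) + col


print(hex(eudc_to_pua(0x81, 0x40)))  # 0xeeb8
```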
The following sessions come from GNU bash running in a UTF-8 console. $''
denotes bash's ANSI C-style quoting, where \xhh generates a raw byte with hex
value hh and \uhhhh generates the representation of U+hhhh under the current
locale.
Currently glibc's cp950 implementation does not contain these mappings:
# iconv (Ubuntu GLIBC 2.23-0ubuntu4) 2.23
ubuntu$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
iconv: illegal input sequence at position 0
ubuntu$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0
The desired behavior for decoding can be seen in libiconv:
# iconv (GNU libiconv 1.14)
cygwin$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
00000000 b8 ee 00 00 0a 00 00 00 |........|
00000008
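As a reading aid for the hexdump above, the following sketch decodes the same
eight bytes as UTF-32LE. It only reinterprets the bytes shown and does not
call iconv.

```python
# The eight bytes printed by hexdump in the libiconv session.
dump = bytes.fromhex("b8 ee 00 00 0a 00 00 00")

# Each group of four little-endian bytes is one code point.
text = dump.decode("utf-32-le")

print([f"U+{ord(ch):04X}" for ch in text])  # ['U+EEB8', 'U+000A']
```

The first code point is the PUA character U+EEB8. The second is the newline
that the bash here-string (<<<) appends to its input.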
Note that libiconv does not perform the reverse (encoding) conversion either:
cygwin$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0
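For the encoding direction, a sketch of what an encoder could do under the same
sequential layout is given below. The name pua_to_eudc is hypothetical. The
covered range (13 lead bytes, 0x81..0x8D, starting at U+EEB8) is an assumption
based on the single data point above and should be verified against
Microsoft's table before being relied on.

```python
# Hypothetical helper for illustration; not part of glibc or libiconv.
BASE = 0xEEB8          # PUA code point observed for cp950 bytes 0x81 0x40
TRAILS_PER_LEAD = 157  # 63 trail bytes in 0x40..0x7E plus 94 in 0xA1..0xFE
LEAD_COUNT = 13        # assumption: lead bytes 0x81..0x8D


def pua_to_eudc(code_point: int) -> bytes:
    """Map a PUA code point back to a two-byte EUDC code, assuming a sequential layout."""
    offset = code_point - BASE
    if not 0 <= offset < LEAD_COUNT * TRAILS_PER_LEAD:
        raise ValueError("code point outside the range this sketch covers")
    row, col = divmod(offset, TRAILS_PER_LEAD)
    # The first 63 columns use trail bytes 0x40..0x7E, the rest use 0xA1..0xFE.
    trail = 0x40 + col if col < 63 else 0xA1 + (col - 63)
    return bytes([0x81 + row, trail])


print(pua_to_eudc(0xEEB8).hex())  # 8140
```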
libiconv's mapping:
http://git.savannah.gnu.org/cgit/libiconv.git/tree/lib/cp950.h#n72