This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


[Bug localedata/20865] New: iconv: cp950 does not contain EUDC/PUA mappings


https://sourceware.org/bugzilla/show_bug.cgi?id=20865

            Bug ID: 20865
           Summary: iconv: cp950 does not contain EUDC/PUA mappings
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: arthur200126 at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Microsoft's cp950 mapping contains sequential mappings from Big5's Extended
User-defined Characters (EUDC) to Unicode PUA. Such mappings are used by a
number of Big5 extensions, including HKSCS, which uses these PUA code points
when a character is not yet available in the target UCS version.

The following sessions come from GNU bash running in a UTF-8 console. $''
denotes bash's ANSI C-style quoting, where \xhh generates a raw hex byte and
\uhhhh generates the representation of U+hhhh under the current locale.

Currently glibc's cp950 implementation does not contain these mappings, so
both decoding and encoding fail:

# iconv (Ubuntu GLIBC 2.23-0ubuntu4) 2.23
ubuntu$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
iconv: illegal input sequence at position 0
ubuntu$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0

The desired behavior for decoding can be seen in libiconv:

# iconv (GNU libiconv 1.14)
cygwin$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
00000000  b8 ee 00 00 0a 00 00 00                           |........|
00000008

Note that libiconv does not perform the reverse (encoding) mapping:

cygwin$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0

libiconv's mapping:
http://git.savannah.gnu.org/cgit/libiconv.git/tree/lib/cp950.h#n72
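
For illustration only, the sequential layout can be sketched as plain
arithmetic. This is not glibc or libiconv code. The function name
eudc_to_pua is hypothetical, and the sketch assumes that each lead byte
covers 157 trail bytes (0x40-0x7E followed by 0xA1-0xFE) and that lead bytes
0x81-0x8D form one block starting at U+EEB8, the value seen for 0x81 0x40 in
the libiconv session above. The other EUDC blocks are deliberately left out.

```shell
#!/bin/sh
# Sketch: map a cp950 EUDC byte pair to the PUA code point that a
# sequential layout would assign.
# Assumptions (not taken from the report): 157 trail bytes per lead byte,
# and lead bytes 0x81-0x8D starting at U+EEB8.
eudc_to_pua() (
    lead=$(($1))
    trail=$(($2))

    # Only the block anchored by the observed 0x8140 -> U+EEB8 is handled.
    if [ "$lead" -lt 129 ] || [ "$lead" -gt 141 ]; then
        echo "lead byte outside the sketched range 0x81-0x8D" >&2
        exit 1
    fi

    # Trail bytes come in two runs: 0x40-0x7E (63 codes), 0xA1-0xFE (94 codes).
    if [ "$trail" -ge 64 ] && [ "$trail" -le 126 ]; then
        col=$((trail - 0x40))
    elif [ "$trail" -ge 161 ] && [ "$trail" -le 254 ]; then
        col=$((trail - 0xA1 + 63))
    else
        echo "invalid trail byte" >&2
        exit 1
    fi

    printf 'U+%04X\n' $((0xEEB8 + 157 * (lead - 0x81) + col))
)

eudc_to_pua 0x81 0x40   # prints U+EEB8, matching the libiconv output
```

A real implementation would also need the reverse direction, which libiconv
does not provide.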
