Bugzilla Bug 1386
  iconv incorrectly converts bytes 1A, 1C and 7F for IBM943 and IBM942
  Last modified: 2006-04-28 03:30
Bug#: 1386
Reporter: George Rhoten <grhoten@gmail.com>
Status: RESOLVED
Resolution: FIXED
Assigned To: GOTO Masanori <gotom@debian.or.jp>

Description:   Opened: 2005-09-28 03:48
The conversion tables for IBM943 and IBM942 are incorrect in iconv. The byte
values 1A, 1C and 7F do not round-trip to Unicode (UTF-8) and back to these
Shift-JIS codepages. Normally, Unicode 1A round-trips to Shift-JIS 7F, Unicode
7F round-trips to Shift-JIS 1C, and Unicode 1C round-trips to Shift-JIS 1A.

iconv does not have this behavior. For example, iconv converts Unicode 1F to
Shift-JIS 1C, and Shift-JIS 1C to Unicode 1A.
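
A minimal sketch of the round-trip check described above, in Python. The expected pairs and the single reported decoder entry (Shift-JIS 1C to Unicode 1A) come from this description; the `failing_bytes` helper and the table names are illustrative only and are not part of iconv or ICU. The tables cover just the three bytes under discussion.

```python
# Expected round-trip pairs from the description
# (Unicode code point -> Shift-JIS byte).
EXPECTED_FROM_UNICODE = {0x1A: 0x7F, 0x7F: 0x1C, 0x1C: 0x1A}

# A round-trip-safe decoder is simply the inverse of the encoder.
EXPECTED_TO_UNICODE = {sjis: uni for uni, sjis in EXPECTED_FROM_UNICODE.items()}


def failing_bytes(to_unicode, from_unicode):
    """Return the bytes that do not survive byte -> Unicode -> byte, sorted."""
    return sorted(b for b, u in to_unicode.items() if from_unicode.get(u) != b)


# With inverse tables every byte round-trips.
print(failing_bytes(EXPECTED_TO_UNICODE, EXPECTED_FROM_UNICODE))  # []

# The one decoder entry reported in this bug: Shift-JIS 1C -> Unicode 1A.
# Encoding Unicode 1A yields 7F, not 1C, so byte 1C fails to round-trip.
print(failing_bytes({0x1C: 0x1A}, EXPECTED_FROM_UNICODE))  # [28]
```

The same check could be run over a full 256-entry table pair to find every byte that fails to round-trip.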

If you would like mapping tables generated from IBM's official repository of
coded character sets, I recommend you look at these tables and use them as the
basis for iconv.

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-942_P12A-1999.ucm
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-943_P15A-2003.ucm

For reference, here are other tables that can be used for the same CCSID (coded
character set identifier).
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-942_P120-1999.ucm
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-942_P12A-1998.ucm
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-943_P130-1999.ucm
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-943_P14A-1998.ucm
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/charset/data/ucm/ibm-943_P14A-1999.ucm
 
(full disclosure) I work for IBM, and I am a part of the ICU project.

------- Additional Comment #1 From Ulrich Drepper 2005-10-14 06:12 -------
We don't have an ibm942 conversion module and the ibm943 module was generated by
IBM.  I have no reason to believe the ICU tables more than those used to
generate the module.

I'll leave the bug open; maybe the module author will comment.  If this doesn't
happen, I'll close it sometime soon.

------- Additional Comment #2 From George Rhoten 2005-10-15 00:05 -------
(In reply to comment #1)

You're right. iconv doesn't have ibm-942. I meant ibm-932. Sorry about that.

The ibm-* tables from ICU's charset repository are generated directly from IBM's
CDRA. I'm sure that the ibm943 iconv module was also generated from IBM, but
this seems to be a typo in the iconv module.

The main issue is not whether \u007F goes to \x7F or \x1C. Both mapping
behaviors are considered valid in IBM's CDRA. The problem is that those bytes
don't map back to the original Unicode character. You have to round-trip your
data three times to get your original data back.
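
One way to see how three round trips could be needed is sketched below in Python. It assumes, hypothetically, that the decoder applies the same three-way rotation as the encoder instead of its inverse. That assumption is consistent with the one decoder entry reported in this bug (Shift-JIS 1C to Unicode 1A), but the tables are an illustration, not the actual contents of the glibc module.

```python
# Encoder pairs from the bug description
# (Unicode code point -> Shift-JIS byte).
FROM_UNICODE = {0x1A: 0x7F, 0x7F: 0x1C, 0x1C: 0x1A}

# ASSUMPTION: a faulty decoder that reuses the same rotation
# rather than inverting it (Shift-JIS byte -> Unicode code point).
FAULTY_TO_UNICODE = dict(FROM_UNICODE)


def round_trip(code_point):
    """Unicode -> Shift-JIS -> Unicode using the tables above."""
    return FAULTY_TO_UNICODE[FROM_UNICODE[code_point]]


def trips_to_recover(code_point):
    """Count round trips until the original code point comes back."""
    current, trips = round_trip(code_point), 1
    while current != code_point:
        current, trips = round_trip(current), trips + 1
    return trips


for cp in sorted(FROM_UNICODE):
    print(f"U+{cp:04X}: {trips_to_recover(cp)} round trips")
```

Under this assumption each round trip advances the value two steps around a three-element cycle, so the original value returns only after the third round trip.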

------- Additional Comment #3 From Ulrich Drepper 2005-10-16 08:06 -------
It's pointless to argue here.  Talk to the author of the modules.  I'm
suspending the bug until that has happened.

------- Additional Comment #4 From George Rhoten 2005-10-17 01:25 -------
(In reply to comment #3)

Since I am unfamiliar with the authors of the module, where, or to whom, should
I be reporting this problem?

------- Additional Comment #5 From Bruno Haible 2005-12-28 15:36 -------
(In reply to comment #4) 
 
George Rhoten, you can find the authors of the conversion modules in the glibc
source files and ChangeLogs. For both iconvdata/ibm932.c and
iconvdata/ibm943.c, it is Masahide Washizawa <washi@jp.ibm.com>.
 

------- Additional Comment #6 From Ulrich Drepper 2006-04-26 06:14 -------
No reply in almost 4 months.  Reopen if you get real information.

------- Additional Comment #7 From George Rhoten 2006-04-26 18:32 -------
I have contacted the writer of this code, and he has created a patch to fix the
code.

------- Additional Comment #8 From George Rhoten 2006-04-27 14:57 -------
Quote from Masahide Washizawa, "I have just sent the patch to Ulrich-san who is 
the glibc maintainer, and he applied it to the glibc tree immediately."

So if the patch is applied, then I'm happy.

------- Additional Comment #9 From GOTO Masanori 2006-04-28 03:30 -------
The patch was applied to CVS.  I am closing the bug.
