Bug 10392 - iconv's CP932 doesn't recognize apparently valid input
Summary: iconv's CP932 doesn't recognize apparently valid input
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.9
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-15 08:03 UTC by Michael Mohr
Modified: 2014-07-01 07:49 UTC
CC List: 1 user

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build: x86_64-pc-linux-gnu
Last reconfirmed:
fweimer: security-


Attachments
test case for failed input (874 bytes, text/plain)
2009-07-15 08:08 UTC, Michael Mohr

Description Michael Mohr 2009-07-15 08:03:38 UTC
I'm running sys-libs/glibc-2.9_p20081201-r2 on Gentoo Linux.  I'm attempting to
use iconv to convert SJIS/CP932 input into UTF-8 output.  However, some
apparently valid input causes iconv to fail with EILSEQ.  It is worth noting
that Firefox can display this content on both Linux and a Windows Vista test
box, and so can IE8.  This suggests a problem within iconv, but none of the
other charsets listed here matches the input:

http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html
Comment 1 Michael Mohr 2009-07-15 08:08:15 UTC
Created attachment 4054
test case for failed input

Open this file in Firefox and manually set the encoding to Shift_JIS.  Then run it
through iconv and notice that it fails.
Comment 2 Michael Mohr 2009-07-16 07:34:34 UTC
It appears that the first 64 indices of row 81 are invalid:

http://web.mit.edu/shutkin/MacData_1124b/afs/sipb/project/dia/src/libunicode-0.4/msft/cp932.h

But then why does this character still render, apparently correctly, elsewhere?
Comment 3 Ulrich Drepper 2009-07-17 05:55:36 UTC
What code positions do you mean?  When I read

   It appears that the first 64 indices of row 81 are invalid:

I hope you don't mean 0x81,0x00 to 0x81,0x3f.  These are of course invalid. 
Just look at

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

This table is used to generate glibc's tables.
Comment 4 Ulrich Drepper 2009-10-30 05:53:35 UTC
No reply in more than 3 months.  Closing.