bugs in CP949 converter
Bruno Haible
haible@ilog.fr
Mon Jul 31 08:26:00 GMT 2000
The iconv converter module for Korean CP949 (a.k.a. UHC) has three bugs:
- In the Unicode to CP949 direction, ASCII characters are converted but
produce no output, i.e. the output pointer is not advanced.
- In the CP949 to Unicode direction, two-byte sequences of the form
0x81FF, 0x82FF, ..., 0xA0FF are not rejected.
- In the Unicode to CP949 direction, ASCII 0x7f is rejected, although the
inverse direction accepts it, as do all other CP949 converters and tables
I know of.
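
To make the first and third points concrete, here is a minimal standalone sketch. It is not taken from the glibc sources: the names store_buggy and store_fixed are hypothetical and only mimic the ASCII branch of the loop body, returning the number of bytes a caller would see as produced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ASCII branch of the conversion loop.
   Each walks n code points and returns outptr - out, i.e. the number
   of output bytes the caller would observe.  Non-ASCII input is simply
   skipped here; the real converter handles it in other branches.  */

size_t
store_buggy (const uint32_t *in, size_t n, unsigned char *out)
{
  unsigned char *outptr = out;
  for (size_t i = 0; i < n; i++)
    if (in[i] < 0x7f)           /* 0x7f itself falls through.  */
      *outptr = in[i];          /* Byte stored, pointer not advanced.  */
  return outptr - out;
}

size_t
store_fixed (const uint32_t *in, size_t n, unsigned char *out)
{
  unsigned char *outptr = out;
  for (size_t i = 0; i < n; i++)
    if (in[i] <= 0x7f)          /* 0x7f is accepted.  */
      *outptr++ = in[i];        /* Byte stored and pointer advanced.  */
  return outptr - out;
}
```

With the buggy variant each ASCII byte overwrites the previous one at the same position and the reported output length stays zero.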
Here is a fix.
2000-07-30 Bruno Haible <haible@clisp.cons.org>
* iconvdata/uhc.c (BODY for FROM_LOOP): Reject ch2 == 0xff as invalid.
(BODY for TO_LOOP): Accept 0x7f. Increment outptr as needed.
*** glibc-20000729/iconvdata/uhc.c.bak Mon Jul 3 16:39:27 2000
--- glibc-20000729/iconvdata/uhc.c Sun Jul 30 23:19:08 2000
***************
*** 3118,3123 ****
--- 3118,3124 ----
{ \
if (__builtin_expect (ch, 0xc5) > 0xc6 \
|| __builtin_expect (ch2, 0x41) < 0x41 \
+ || __builtin_expect (ch2, 0x41) > 0xfe \
|| (__builtin_expect (ch2, 0x41) > 0x5a && ch2 < 0x61) \
|| (__builtin_expect (ch2, 0x41) > 0x7a && ch2 < 0x81) \
|| (__builtin_expect (ch, 0xc5) == 0xc6 && ch2 > 0x52)) \
***************
*** 3194,3202 ****
{ \
uint32_t ch = get32 (inptr); \
\
! if (ch < 0x7f) \
/* XXX Think about 0x5c ; '\'. */ \
! *outptr = ch; \
else if (ch >= 0xac00 && ch <= 0xd7a3) \
{ \
const char *s = uhc_hangul_from_ucs[ch - 0xac00]; \
--- 3195,3203 ----
{ \
uint32_t ch = get32 (inptr); \
\
! if (ch <= 0x7f) \
/* XXX Think about 0x5c ; '\'. */ \
! *outptr++ = ch; \
else if (ch >= 0xac00 && ch <= 0xd7a3) \
{ \
const char *s = uhc_hangul_from_ucs[ch - 0xac00]; \
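
For readers who want to try the first hunk in isolation, the patched rejection test can be sketched as a plain predicate. The helper name uhc_pair_rejected is hypothetical; the condition is copied from the hunk above with the __builtin_expect hints dropped, and it assumes the caller has already decided that this test applies to the byte pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patched condition from the first
   hunk.  ch is the lead byte, ch2 the trail byte.  Returns true when
   the pair is to be rejected as invalid.  */
bool
uhc_pair_rejected (uint32_t ch, uint32_t ch2)
{
  return (ch > 0xc6
          || ch2 < 0x41
          || ch2 > 0xfe                   /* Added by the patch.  */
          || (ch2 > 0x5a && ch2 < 0x61)
          || (ch2 > 0x7a && ch2 < 0x81)
          || (ch == 0xc6 && ch2 > 0x52));
}
```

Without the added line, a pair such as 0x81 0xff passes every other clause and is not rejected.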