fixes for UTF-8 decoder
Bruno Haible
haible@ilog.fr
Fri Jan 21 10:07:00 GMT 2000
Hi Ulrich,
There are two problems with the UTF-8 decoder in iconv/gconv_simple.c.
Bug 1: The "break;" statement inside the "for (i = 1; i < cnt; ++i)" loop
terminates only that loop, whereas it should terminate the outer loop.
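
A minimal sketch of the control-flow problem in Bug 1, assuming a simplified
stand-alone loop rather than the actual macro body in gconv_simple.c. The
names sketch_consume, check_after_loop, SKETCH_OK and SKETCH_ILLEGAL_INPUT are
hypothetical and exist only for this illustration:

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
enum { SKETCH_OK = 0, SKETCH_ILLEGAL_INPUT = 1 };

/* Walks the input one sequence at a time and returns the number of bytes
   consumed.  With check_after_loop == 0 the function mimics the old control
   flow; with check_after_loop != 0 it tests i < cnt after the inner loop.
   Only 1-, 2- and 3-byte sequences are recognised; every other lead byte is
   treated as a single byte to keep the sketch short.  */
size_t
sketch_consume (const unsigned char *in, size_t len,
                int check_after_loop, int *result)
{
  size_t pos = 0;

  *result = SKETCH_OK;
  while (pos < len)
    {
      unsigned char lead = in[pos];
      size_t cnt = 1;
      size_t i;

      if (lead >= 0xc2 && lead < 0xe0)
        cnt = 2;
      else if ((lead & 0xf0) == 0xe0)
        cnt = 3;

      if (pos + cnt > len)
        break;                  /* Incomplete input, stop here.  */

      for (i = 1; i < cnt; ++i)
        if ((in[pos + i] & 0xc0) != 0x80)
          {
            *result = SKETCH_ILLEGAL_INPUT;
            break;              /* Leaves only this for loop.  */
          }

      if (check_after_loop && i < cnt)
        break;                  /* Leaves the while loop.  */

      pos += cnt;               /* Reached even after the inner break.  */
    }
  return pos;
}
```

For the input bytes 0xc3 0x41 0x42, the call with check_after_loop == 0
records the error but still consumes all three bytes, while the call with
check_after_loop != 0 stops at offset 0.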
Bug 2: The UTF-8 decoder currently accepts multibyte representations that
are longer than necessary. RFC 2279 says that "It is important to note
that ... there is only one valid way to encode a given UCS-4 character."
As Markus Kuhn pointed out, this could some day become security relevant:
if a malformed UTF-8 sequence yields special ASCII characters (like
backquote, slash, escape) after UTF-8 -> UCS-4 conversion, but those
characters were not visible in the byte stream before the conversion, many
programs could become vulnerable.
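
To make the concern in Bug 2 concrete, here is a small sketch of a two-byte
decoder in a lenient and a strict variant. Both functions are hypothetical
helpers written for this illustration, not code from glibc; the lenient one
uses the lead-byte test (ch & 0xe0) == 0xc0, the strict one uses the range
0xc2..0xdf:

```c
#include <stdint.h>

/* Both helpers return UINT32_MAX when they reject the sequence.  */

/* Accepts any lead byte 0xc0..0xdf, including the overlong 0xc0 and 0xc1.  */
uint32_t
sketch_decode2_lenient (unsigned char b0, unsigned char b1)
{
  if ((b0 & 0xe0) != 0xc0 || (b1 & 0xc0) != 0x80)
    return UINT32_MAX;
  return ((uint32_t) (b0 & 0x1f) << 6) | (uint32_t) (b1 & 0x3f);
}

/* Accepts only lead bytes 0xc2..0xdf, so the result is always >= 0x80.  */
uint32_t
sketch_decode2_strict (unsigned char b0, unsigned char b1)
{
  if (b0 < 0xc2 || b0 >= 0xe0 || (b1 & 0xc0) != 0x80)
    return UINT32_MAX;
  return ((uint32_t) (b0 & 0x1f) << 6) | (uint32_t) (b1 & 0x3f);
}
```

The lenient variant turns the bytes 0xc0 0xaf into 0x2f (slash) and
0xc1 0xa0 into 0x60 (backquote), although neither ASCII byte occurs in the
input. The strict variant rejects both sequences and still decodes
0xc3 0xa9 to 0xe9.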
The appended patch fixes both.
Bruno
* iconv/gconv_simple.c (utf8_internal_loop): Reject invalid UTF-8
input.
*** glibc/iconv/gconv_simple.c.bak Sun Apr 25 20:06:02 1999
--- glibc/iconv/gconv_simple.c Fri Jan 21 01:14:39 2000
***************
*** 255,262 ****
} \
else \
{ \
! if ((ch & 0xe0) == 0xc0) \
{ \
cnt = 2; \
ch &= 0x1f; \
} \
--- 255,265 ----
} \
else \
{ \
! if (ch >= 0xc2 && ch < 0xe0) \
{ \
+ /* We expect two bytes. The first byte cannot be 0xc0 or 0xc1, \
+ otherwise the wide character could have been represented \
+ using a single byte. */ \
cnt = 2; \
ch &= 0x1f; \
} \
***************
*** 304,318 ****
uint32_t byte = inptr[i]; \
\
if ((byte & 0xc0) != 0x80) \
! { \
! /* This is an illegal encoding. */ \
! result = GCONV_ILLEGAL_INPUT; \
! break; \
! } \
\
ch <<= 6; \
ch |= byte & 0x3f; \
} \
inptr += cnt; \
} \
\
--- 307,329 ----
uint32_t byte = inptr[i]; \
\
if ((byte & 0xc0) != 0x80) \
! /* This is an illegal encoding. */ \
! break; \
\
ch <<= 6; \
ch |= byte & 0x3f; \
} \
+ \
+ /* If i < cnt, some trail byte was not >= 0x80, < 0xc0. \
+ If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+ have been represented with fewer than cnt bytes. */ \
+ if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)) \
+ { \
+ /* This is an illegal encoding. */ \
+ result = GCONV_ILLEGAL_INPUT; \
+ break; \
+ } \
+ \
inptr += cnt; \
} \
\
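
The shortest-form test in the second hunk can be tried in isolation. The
following is a sketch, assuming 2 <= cnt <= 6 as in RFC 2279; the function
name sketch_is_overlong is hypothetical. A cnt-byte sequence with cnt > 2 is
needed only for values of at least 2^(5*cnt-4), i.e. 0x800, 0x10000, 0x200000
and 0x4000000 for cnt = 3, 4, 5 and 6. The two-byte case is covered by the
lead-byte range in the first hunk, so it is excluded here:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if ch, decoded from a cnt-byte sequence, could have been
   represented with fewer than cnt bytes.  Mirrors the expression
   (cnt > 2 && (ch >> (5 * cnt - 4)) == 0) from the patch.  */
int
sketch_is_overlong (uint32_t ch, int cnt)
{
  assert (cnt >= 2 && cnt <= 6);        /* Assumption of this sketch.  */
  return cnt > 2 && (ch >> (5 * cnt - 4)) == 0;
}
```

For example, 0x7ff decoded from three bytes is flagged as overlong, while
0x800 decoded from three bytes is not.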