The iconv infopage says the following:

  `EILSEQ'
    The conversion stopped because of an invalid byte sequence in the
    input. After the call, `*INBUF' points at the first byte of the
    invalid byte sequence.

However, this is clearly not the case when an //IGNORE target charset is specified:

#include <iconv.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int main()
{
  iconv_t i = iconv_open("ascii//IGNORE", "utf-8");
  char inbuf[10000];
  char outbuf[10000];
  char *in = inbuf;
  char *out = outbuf;
  size_t inleft = 10000;   /* iconv takes size_t *, not int * */
  size_t outleft = 10000;
  size_t s;

  memset(inbuf, 0x77, 10000);
  /* U+00A2 (cent sign): valid UTF-8, but not representable in ASCII */
  inbuf[0] = (char)0xC2;
  inbuf[1] = (char)0xA2;
  s = iconv(i, &in, &inleft, &out, &outleft);
  printf("s = %d, errno = %d, in[0] = %x, inleft = %d\n",
         (int)s, errno, (unsigned char)*in, (int)inleft);
  return 0;
}

Outputs the following:

s = -1, errno = 84, in[0] = 77, inleft = 1839

'iconv' appears to have gobbled up another ~8000 bytes after the invalid byte sequence before returning EILSEQ (84), so `*INBUF' does not point at the offending sequence as documented.

The documentation here cannot possibly be correct if we want 'IGNORE' to actually do anything. So we have two options:

1. Claim that the semantics of EILSEQ change when the magic //IGNORE flag is specified, and require user code to work around it properly. This is what the '-c' flag in iconv_prog.c does, by magically "converting" these errors into E2BIG errors and re-running iconv appropriately.

2. Claim that this API is wrong, and modify it such that an iconv operating on an //IGNORE character set *never* returns EILSEQ (what one might expect, since IGNORE is supposed to allow us to ignore sequences that are illegal in the target). This would make glibc's iconv implementation consistent with libiconv's.

I favor (2), since it makes client code considerably simpler and easier to implement correctly.
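For reference, here is roughly what option (1) would demand of every caller. This is only a sketch of the workaround, not what iconv_prog.c literally does: the function name convert_skipping_unconvertible is made up for illustration, and it assumes the behavior observed above, namely that with //IGNORE an EILSEQ return that has advanced the input pointer just means "something was skipped, call again".

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): drive iconv() on a
   descriptor opened with an //IGNORE target until the input is consumed.
   Returns the number of bytes written to out, or (size_t)-1 on a real
   error, with errno left as set by iconv(). */
size_t convert_skipping_unconvertible(iconv_t cd, char *in, size_t inleft,
                                      char *out, size_t outleft)
{
  char *out_start = out;

  while (inleft > 0) {
    char *before = in;
    size_t r = iconv(cd, &in, &inleft, &out, &outleft);

    if (r != (size_t)-1)
      break;                      /* everything converted cleanly */

    /* Assumption: EILSEQ plus forward progress means a character was
       dropped because of //IGNORE, so simply carry on. */
    if (errno == EILSEQ && in != before)
      continue;

    /* E2BIG, EINVAL, or EILSEQ without progress: give up rather than
       loop forever. */
    return (size_t)-1;
  }

  /* Flush any shift state; a no-op for stateless targets such as ASCII. */
  if (iconv(cd, NULL, NULL, &out, &outleft) == (size_t)-1)
    return (size_t)-1;

  return (size_t)(out - out_start);
}
```

The progress check is the awkward part: the caller can no longer tell a skipped character from a genuine failure by the return value alone, which is the main argument for option (2).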
*** Bug 260998 has been marked as a duplicate of this bug. ***
Created attachment 9861 HTMLPurifier.standalone
Created attachment 9862 HTMLPurifier.standalone
I agree that option (2) (never return EILSEQ with //IGNORE) makes the most sense.