iconv seems to truncate its output at around 8157 bytes when the input contains characters that are invalid in the target character set, even if //IGNORE is specified.

Steps to reproduce:
1. Download iconv.html:
   ezyang@javelin:~$ wget http://www.oppcharts.com/iconv.html
2. Attempt to convert it from UTF-8 to iso-8859-1//IGNORE.

Expected behavior (from libiconv-1.14):
   ezyang@javelin:~/Dev/glibc/build$ ~/Desktop/libiconv-1.14/src/iconv_no_i18n -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
   15312

Actual behavior (from latest Git, glibc-2.14-567-ga4647e7):
   ezyang@javelin:~/Dev/glibc/build$ ./testrun.sh iconv/iconv_prog -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
   iconv/iconv_prog: illegal input sequence at position 8168
   8157
Created attachment 6117: Alpha, followed by a lot of x's

Here's a better, more minimal test case.

glibc:
   ezyang@javelin:~$ Dev/glibc/build/testrun.sh Dev/glibc/build/iconv/iconv_prog -f utf-8 -t ascii//IGNORE < test.txt | wc -c
   Dev/glibc/build/iconv/iconv_prog: illegal input sequence at position 8161
   8159

libiconv-1.14:
   ezyang@javelin:~$ Desktop/libiconv-1.14/src/iconv_no_i18n -f utf-8 -t ascii//IGNORE < test.txt | wc -c
   11059
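For anyone without access to the attachment, a file of the same shape can be generated locally. This is only a sketch under assumptions: the name test.txt matches the commands in this comment, but the count of 11000 x's (and the absence of a trailing newline) is a guess, not the attachment's actual contents.

```shell
#!/bin/sh
# Hypothetical stand-in for attachment 6117: one Greek alpha (U+03B1,
# UTF-8 bytes 0xCE 0xB1) followed by many ASCII 'x' characters.
# The count of 11000 is an assumption; the real attachment may differ.
{
    printf '\316\261'                  # the alpha, written as octal escapes
    printf '%11000s' '' | tr ' ' 'x'   # 11000 spaces, turned into x's
} > test.txt

wc -c < test.txt   # 2 bytes for the alpha + 11000 for the x's = 11002
```

Because the only unconvertible character sits at offset 0, any output shorter than 11000 bytes from an "ignore invalid characters" conversion to ASCII means that valid x's were dropped as well.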
The iconv program cannot be used with the magic //IGNORE suffix. You have to use the -c parameter.
I think there is still a bug here. If //IGNORE is not supported by iconv_prog, then -t with //IGNORE and -t without it should behave the same. However, this is not the case:

   ezyang@javelin:~$ Dev/glibc/build/testrun.sh Dev/glibc/build/iconv/iconv_prog -f utf-8 -t ascii//IGNORE < test.txt | wc -c
   Dev/glibc/build/iconv/iconv_prog: illegal input sequence at position 8161
   8159

   ezyang@javelin:~$ Dev/glibc/build/testrun.sh Dev/glibc/build/iconv/iconv_prog -f utf-8 -t ascii < test.txt | wc -c
   Dev/glibc/build/iconv/iconv_prog: illegal input sequence at position 0
   0

For reference, here is iconv running with an invalid extra flag:

   ezyang@javelin:~$ Dev/glibc/build/testrun.sh Dev/glibc/build/iconv/iconv_prog -f utf-8 -t ascii//FOOBAR < test.txt | wc -c
   Dev/glibc/build/iconv/iconv_prog: illegal input sequence at position 0
   0
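The comparison in this comment can be wrapped in a small helper that reports the output size for each way of invoking iconv, including the -c option recommended in the previous comment. This is a rough sketch: compare_modes is a hypothetical helper name, and it assumes an iconv binary on PATH (the installed one, not the testrun.sh wrapper used above).

```shell
# compare_modes FILE (hypothetical helper): print how many bytes iconv
# emits for FILE when converting UTF-8 -> ASCII in three different modes.
# iconv's own diagnostics go to stderr and are left visible.
# The body runs in a subshell ( ... ) so its variables do not leak.
compare_modes() (
    f=$1
    # -c: drop characters that cannot be converted
    n_c=$(iconv -c -f utf-8 -t ascii < "$f" | wc -c | tr -d ' ')
    # //IGNORE suffix on the target charset, as used in this report
    n_ignore=$(iconv -f utf-8 -t ascii//IGNORE < "$f" | wc -c | tr -d ' ')
    # no suffix and no -c: stops at the first unconvertible character
    n_plain=$(iconv -f utf-8 -t ascii < "$f" | wc -c | tr -d ' ')
    printf 'with -c: %s\n' "$n_c"
    printf 'with //IGNORE: %s\n' "$n_ignore"
    printf 'plain: %s\n' "$n_plain"
)
```

Usage: `compare_modes test.txt`. If //IGNORE were simply an unsupported suffix, the "with //IGNORE" and "plain" lines would match, as they do for //FOOBAR; in the runs above they do not.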
OK, I think I understand the underlying issue better. I'll file a new bug.