The following equivalent iconv invocations lead to a SIGSEGV:

$ echo -en "\x00\xfe" | iconv -f EUC-KR -t "UTF-8//IGNORE"
$ echo -en "\x00\xfe" | iconv -c -f EUC-KR -t "UTF-8"
Fixed in master:
https://sourceware.org/git/?p=glibc.git;a=commit;h=ee7a3144c9922808181009b7b3e50e852fb4999b

Author: Andreas Schwab <schwab@suse.de>
Date:   Mon Dec 21 08:56:43 2020 +0530

    Fix buffer overrun in EUC-KR conversion module (bz #24973)

    The byte 0xfe as input to the EUC-KR conversion denotes a user-defined
    area and is not allowed.  The from_euc_kr function used to skip two bytes
    when told to skip over the unknown designation, potentially running over
    the buffer end.
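To check whether an installed iconv still crashes on this input, something like the following sketch may help. It assumes a POSIX shell and the usual convention that an exit status above 128 means the process was terminated by signal (status - 128); SIGSEGV is signal 11 on Linux, so a crash would show up as 139. The helper name describe_status is made up for this example and is not part of glibc or iconv.

```shell
# Hypothetical helper: turn a shell exit status into a short description.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}

# Feed the two bytes from the report (0x00 0xfe, written in octal for
# portability) to iconv and capture its exit status.
status=0
printf '\000\376' | iconv -c -f EUC-KR -t UTF-8 >/dev/null 2>&1 || status=$?
describe_status "$status"
```

On an affected glibc this should report signal 11. On a fixed one, iconv should exit normally, possibly with a non-zero status because the input cannot be converted.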
iconv produces corrupted output when the "-c" flag is used on input in which characters that *can* and *cannot* be converted appear together. The issue only manifests for rather large inputs (presumably > 32K).

Run in bash:
>export LANG=C
>perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c

Expected output:
>15000 XâX�XâXXâX�XâX

Actual output:
> 1
> 2 XXâX�XâX
> 2 XâX�XXâX
> 2 XâX�XâX
> 1 XâX�XâXX
> 2 XâX�XâXXâX�X�XâXXâX�XâX
> 14917 XâX�XâXXâX�XâX

As can be seen, many lines just disappear: the counts 14917+2+1+2+2+2+1 add up to 14927, not 15000. The specific input does not matter, as long as it has a mix of convertible and non-convertible characters.

Reducing the number of input lines (e.g. to 1000) makes everything work as expected:
>1000 XâX�XâXXâX�XâX

I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results. Using piconv (the Perl variant of iconv) instead of iconv produces correct results.
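The line-count check in the report can be automated by summing the first column of the `uniq -c` output. The sketch below assumes a POSIX shell with awk available; sum_counts is a name invented for this example. It is fed the counts from the actual output reported above.

```shell
# Hypothetical helper: sum the first column (the counts) of `uniq -c`
# output read from stdin and print the total.
sum_counts() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Counts copied from the "Actual output" in the report.
printf '%s\n' 1 2 2 2 1 2 14917 | sum_counts   # prints 14927
```

Appending `| sum_counts` to the original pipeline in place of reading the counts by eye should print 15000 when no lines are lost.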
Please file a separate bug for it.