The following equivalent iconv invocations lead to a SIGSEGV:
$ echo -en "\x00\xfe" | iconv -f EUC-KR -t "UTF-8//IGNORE"
$ echo -en "\x00\xfe" | iconv -c -f EUC-KR -t "UTF-8"
Fixed in master:
Author: Andreas Schwab <email@example.com>
Date: Mon Dec 21 08:56:43 2020 +0530
Fix buffer overrun in EUC-KR conversion module (bz #24973)
The byte 0xfe as input to the EUC-KR conversion denotes a user-defined
area and is not allowed. The from_euc_kr function used to skip two bytes
when told to skip over the unknown designation, potentially running over
the buffer end.
With the "-c" flag, iconv produces corrupted output when the input mixes characters that *can* and *cannot* be converted.
The issue only manifests for rather large inputs (presumably > 32K).
Run in bash:
$ perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c
2 XXâX�XâX
2 XâX�XXâX
2 XâX�XâX
1 XâX�XâXX
2 XâX�XâXXâX�X�XâXXâX�XâX
14917 XâX�XâXXâX�XâX
As can be seen, many lines simply disappear: the counts (14917+2+1+2+2+2+1) do not add up to 15000.
The specific input does not matter, as long as it mixes convertible and non-convertible characters.
With a smaller number of input lines (e.g. 1000), everything works as expected.
I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results.
Using piconv (Perl variant of iconv) instead of iconv produces correct results.
Please file a separate bug for it.