Bug 24973 (CVE-2019-25013)

Summary: iconv encounters segmentation fault when converting 0x00 0xfe in EUC-KR to UTF-8 (CVE-2019-25013)
Product: glibc
Reporter: Arjun Shankar <arjun.is>
Component: locale
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: carnil, fweimer, siddhesh, soko246
Priority: P2
Flags: fweimer: security+
Version: 2.30
Target Milestone: 2.33
Host:
Target:
Build:
Last reconfirmed:

Description Arjun Shankar 2019-09-06 10:47:28 UTC
The following equivalent iconv invocations lead to a SIGSEGV:

$ echo -en "\x00\xfe" | iconv -f EUC-KR -t "UTF-8//IGNORE"

$ echo -en "\x00\xfe" | iconv -c -f EUC-KR -t "UTF-8"
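One way to confirm that a run ended in SIGSEGV rather than an ordinary conversion error is to inspect the exit status of iconv. The following is a minimal sketch, not part of the original report: `describe_status` is a hypothetical helper, `printf '\000\376'` is used as a portable stand-in for `echo -en "\x00\xfe"`, and the 128 + N convention for signals is assumed (it holds for bash and dash). On an affected glibc the status is expected to be 139 (128 + 11, SIGSEGV on Linux); on a fixed glibc it will be a small non-zero or zero value instead.

```shell
# Hypothetical helper: describe a shell exit status.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Assumption: the shell reports "killed by signal N" as 128 + N.
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# iconv is the last command in the pipeline, so the pipeline's exit
# status is iconv's exit status.
status=0
printf '\000\376' | iconv -c -f EUC-KR -t UTF-8 >/dev/null 2>&1 || status=$?
describe_status "$status"
```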
Comment 1 Siddhesh Poyarekar 2020-12-21 03:37:02 UTC
Fixed in master:

https://sourceware.org/git/?p=glibc.git;a=commit;h=ee7a3144c9922808181009b7b3e50e852fb4999b

Author: Andreas Schwab <schwab@suse.de>
Date:   Mon Dec 21 08:56:43 2020 +0530

    Fix buffer overrun in EUC-KR conversion module (bz #24973)
    
    The byte 0xfe as input to the EUC-KR conversion denotes a user-defined
    area and is not allowed.  The from_euc_kr function used to skip two bytes
    when told to skip over the unknown designation, potentially running over
    the buffer end.
Comment 2 soko246 2021-09-30 17:45:15 UTC
Using iconv with the "-c" flag produces corrupted output when the input mixes characters that *can* and *cannot* be converted.
The issue only manifests for rather large inputs (presumably > 32K).

Run in bash:
>export LANG=C
>perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c

Expected output:
>15000 XâX�XâXXâX�XâX

Actual output:
> 1
> 2 XXâX�XâX
> 2 XâX�XXâX
> 2 XâX�XâX
> 1 XâX�XâXX
> 2 XâX�XâXXâX�X�XâXXâX�XâX
> 14917 XâX�XâXXâX�XâX

As can be seen, many lines just disappear (14917+2+1+2+2+2+1 = 14927, not 15000).
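The shortfall can be checked with a few lines of shell arithmetic. This is only a sketch of the sum quoted above: the counts are copied from the "Actual output" listing as printed by `uniq -c`, and 15000 is the line count the reporter expected.

```shell
# Counts copied from the "Actual output" listing (uniq -c column).
total=0
for n in 1 2 2 2 1 2 14917; do
    total=$((total + n))
done
echo "lines counted: $total"                # 14927
echo "lines missing: $((15000 - total))"    # 73
```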

The specific input does not matter, as long as it contains a mix of convertible and non-convertible characters.
Reducing the number of input lines (e.g. to 1000) makes everything work as expected:
>1000 XâX�XâXXâX�XâX
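The two input sizes can be compared against the guessed 32K threshold with a rough calculation. This is a sketch under stated assumptions: each repeated string in the perl one-liner is 16 payload bytes plus the embedded "\n", `say` appends one further newline at the very end, and `input_bytes` is a hypothetical helper name. The 32K figure itself is the reporter's guess and is not confirmed anywhere in this report.

```shell
# Hypothetical helper: size in bytes of the perl-generated input.
# 17 = 16 payload bytes + the embedded "\n"; +1 for the newline from `say`.
input_bytes() {
    echo $(( $1 * 17 + 1 ))
}

echo "1000 lines:  $(input_bytes 1000) bytes"     # 17001
echo "15000 lines: $(input_bytes 15000) bytes"    # 255001
echo "32K:         $((32 * 1024)) bytes"          # 32768
```

The 1000-line case stays below 32768 bytes and the 15000-line case is well above it, which is consistent with (but does not prove) a size-dependent trigger.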

I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results.

Using piconv (Perl variant of iconv) instead of iconv produces correct results.
Comment 3 Siddhesh Poyarekar 2021-10-01 02:03:48 UTC
Please file a separate bug for it.