Bug 30883 - with a field width, print/sprintf may output an additional space character in multibyte locales like ps_AF
Status: UNCONFIRMED
Product: glibc
Classification: Unclassified
Component: stdio
Version: 2.37
Importance: P2 critical
Assignee: Not yet assigned to anyone
Reported: 2023-09-25 00:10 UTC by Vincent Lefèvre
Modified: 2023-09-25 13:09 UTC
CC List: 1 user

Flags: fweimer: security-


Description Vincent Lefèvre 2023-09-25 00:10:56 UTC
The ISO C standard says about the field width: "If the converted value has fewer characters than the field width, it is padded with spaces (by default) on the left (or right, if the left adjustment flag, described later, has been given) to the field width."

Here "characters" means bytes, even in multibyte locales such as UTF-8 ones (glibc normally follows this, as can be seen with %s). But with %g, a multibyte decimal-point character yields (at least) one additional space in the padding. Since the field width can be used to fix or limit the size of the output, this can trigger a buffer overflow.

Example:

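#include <stdio.h>
#include <string.h>
#include <locale.h>

/* Print 0.1 with a field width of 8, then print the number of bytes
   that sprintf produces for the same conversion.  */
static void f (void)
{
  char s[256];
  double x = .1;
  printf ("[%8g]\n", x);
  sprintf (s, "%8g", x);
  printf ("%zu\n", strlen (s));
}

int main (void)
{
  f ();  /* "C" locale: the decimal-point character is '.' (1 byte).  */
  if (setlocale (LC_ALL, "ps_AF") == NULL)
    {
      fprintf (stderr, "ps_AF locale not available\n");
      return 1;
    }
  f ();  /* ps_AF: the decimal-point character is U+066B (2 bytes).  */
  return 0;
}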

With the ps_AF locale available, where the decimal-point character is U+066B ARABIC DECIMAL SEPARATOR, encoded as D9 AB (UTF-8), this gives on my Debian machine:

[     0.1]
8
[     0٫1]
9

With x being 0.1, the converted value "0٫1" takes 4 bytes ('0', D9 AB, '1'), so %8g should pad it with 4 spaces and output 8 bytes. But in the ps_AF locale, 5 spaces of padding are output, i.e. 9 bytes for %8g!
Comment 1 Florian Weimer 2023-09-25 12:39:52 UTC
See bug 28943.

Your output shows why the current behavior makes sense: It achieves column alignment. Padding based on byte width makes little sense for multi-byte locales.
Comment 2 Vincent Lefèvre 2023-09-25 13:09:40 UTC
(In reply to Florian Weimer from comment #1)
> Your output shows why the current behavior makes sense: It achieves column
> alignment. Padding based on byte width makes little sense for multi-byte
> locales.

That is one point of view, but it is not what was decided for ISO C.

And note that glibc already behaves differently for %s, as I've said:

  printf ("[%8s]\n", "éèê");
  printf ("[%8s]\n", "eee");

gives

[  éèê]
[     eee]

(both 8 bytes). There is no alignment.