30883 – with a field width, printf/sprintf may output an additional space character in multibyte locales like ps_AF

Status:	UNCONFIRMED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	stdio
Version:	2.37

Importance:	P2 critical
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2023-09-25 00:10 UTC by Vincent Lefèvre
Modified:	2023-09-25 13:09 UTC
CC List:	1 user

See Also:	28943

Flags:	fweimer: security-

Description Vincent Lefèvre 2023-09-25 00:10:56 UTC

The ISO C standard says about the field width: "If the converted value has fewer characters than the field width, it is padded with spaces (by default) on the left (or right, if the left adjustment flag, described later, has been given) to the field width."

Here, "characters" means bytes, even in multibyte locales such as those using UTF-8 (glibc normally follows this, as can be seen with %s). But with %g, a multibyte decimal-point character causes (at least) one additional space to be output in the padding. Since the field width can be used to fix or limit the size of the output, this can trigger a buffer overflow.
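
To check whether a given locale can be affected, one can look at the decimal-point string that localeconv() reports. This is only a minimal sketch; decimal_point_bytes is a hypothetical helper name, not something taken from glibc or from this report:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the decimal-point string of
   the current LC_NUMERIC locale.  A result greater than 1 means the
   decimal point is a multibyte character.  */
static size_t decimal_point_bytes (void)
{
  const struct lconv *lc = localeconv ();
  return strlen (lc->decimal_point);
}
```

In the default "C" locale this returns 1. After a successful setlocale to a locale whose decimal point is a multibyte character, it would be expected to return a larger value.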

Example:

#include <stdio.h>
#include <float.h>
#include <string.h>
#include <locale.h>

static void f (void)
{
  char s[256];
  double x = .1;
  printf ("[%8g]\n", x);
  sprintf (s, "%8g", x);
  printf ("%zu\n", strlen (s));
}

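int main (void)
{
  f ();
  /* Without this check, a missing ps_AF locale silently leaves the
     "C" locale in place and the bug appears not to reproduce.  */
  if (setlocale (LC_ALL, "ps_AF") == NULL)
    {
      fprintf (stderr, "ps_AF locale not available\n");
      return 1;
    }
  f ();
  return 0;
}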

With the ps_AF locale available, where the decimal-point character is U+066B ARABIC DECIMAL SEPARATOR, encoded as D9 AB (UTF-8), this gives on my Debian machine:

[     0.1]
8
[     0٫1]
9

With x being 0.1, the %g output should normally fit in 8 bytes, but in the ps_AF locale, 9 bytes are output for %g!
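
Independently of how this report is resolved, code that relies on the field width to size a buffer can guard against the extra byte by calling snprintf with the buffer size and checking its return value. A minimal sketch, assuming the "%8g" format from the example above; format_width8 is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper: format x with "%8g" into buf (size bytes).
   Returns the number of bytes written, excluding the terminating
   null, or -1 if the output was truncated or an error occurred.  */
static int format_width8 (char *buf, size_t size, double x)
{
  int n = snprintf (buf, size, "%8g", x);
  if (n < 0 || (size_t) n >= size)
    return -1;
  return n;
}
```

With a 9-byte buffer and x = 0.1, the call succeeds in the "C" locale (8 bytes plus the terminating null). If the formatted value took 9 bytes, as in the ps_AF output above, the helper would report truncation instead of writing past the end of the buffer.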

Comment 1 Florian Weimer 2023-09-25 12:39:52 UTC

See bug 28943.

Your output shows why the current behavior makes sense: It achieves column alignment. Padding based on byte width makes little sense for multi-byte locales.

Comment 2 Vincent Lefèvre 2023-09-25 13:09:40 UTC

(In reply to Florian Weimer from comment #1)
> Your output shows why the current behavior makes sense: It achieves column
> alignment. Padding based on byte width makes little sense for multi-byte
> locales.

This is a point of view, but this is not what was decided for ISO C.

And note that glibc already behaves differently for %s, as I've said:

  printf ("[%8s]\n", "éèê");
  printf ("[%8s]\n", "eee");

gives

[  éèê]
[     eee]

(both 8 bytes). There is no alignment.
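
The two counting rules can be compared directly by measuring a UTF-8 string both in bytes and in code points. A minimal sketch, assuming valid UTF-8 input; utf8_code_points is a hypothetical helper and is not how glibc computes padding:

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a valid UTF-8 string by
   counting the bytes that are not continuation bytes (10xxxxxx).  */
static size_t utf8_code_points (const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

For "éèê" this gives 3 code points against a strlen of 6 bytes. Padding to a width of 8 therefore takes 2 spaces when counting bytes, which matches the %s output above, but would take 5 spaces when counting characters.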