Bug 23432 - incorrect printf output for integers with thousands separator and precision field larger than the number of digits (needing leading zeros)
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: stdio
Version: 2.27
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-19 09:17 UTC by Vincent Lefèvre
Modified: 2023-02-08 01:12 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
Flags: fweimer: security-


Description Vincent Lefèvre 2018-07-19 09:17:39 UTC
When printf outputs leading zeros for an integer because of the precision field, the thousands' grouping characters are missing from the part before the first non-zero digit.

Consider the following program:

#include <stdio.h>
#include <locale.h>

int main(int argc, char **argv)
{
  setlocale (LC_ALL, "");
  printf ("%'.17d\n", 123456789);
  return 0;
}

With LC_ALL=en_US.utf8 I get

000000123,456,789

instead of

000,000,123,456,789

POSIX http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html says:

  An optional precision that gives the minimum number of digits to appear for the d, i, o, u, x, and X conversion specifiers[...]

and

  (The <apostrophe>.) The integer portion of the result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G ) shall be formatted with thousands' grouping characters. [...]

It does not mention any special treatment of the leading zeros compared to the other digits. Even though the thousands' grouping character does nothing for readability before the first non-zero digit, I assume that it may be important for alignment with other numbers and in other cases, such as implementing multiple precision in radix 1000^n (say, each radix-1000^3 digit is output with "%'.9d"). Thus I don't think there is a defect in POSIX: the real intent is to output these thousands' grouping characters.

Note: When using 0 padding, e.g. "%017d", I assume that the current behavior is correct, because this is just padding, not additional digits.
Comment 1 Vincent Lefèvre 2023-01-10 11:47:09 UTC
This bug is still present in glibc 2.36, and it is even worse than I described: some leading zeros are also missing, probably because glibc counts the bytes of the string including the thousands separators instead of just the number of digits. Consider the following new testcase:

#include <stdio.h>
#include <locale.h>

int main (void)
{
  volatile int m = 1234567;
  volatile long n = 1234567890;
  if (setlocale (LC_ALL, ""))
    for (int i = 0; i < 2; i++)
      {
        printf ("%.17d\n", m);
        printf ("%'.17d\n", m);
        printf ("%.17ld\n", n);
        printf ("%'.17ld\n", n);
        m = -m;
        n = -n;
      }
  return 0;
}

zira% LC_ALL=en_US.utf8 ./tst
00000000001234567
000000001,234,567
00000001234567890
00001,234,567,890
-00000000001234567
-000000001,234,567
-00000001234567890
-00001,234,567,890

zira% LC_ALL=fr_FR.utf8 ./tst
00000000001234567
00001 234 567
00000001234567890
1 234 567 890
-00000000001234567
-00001 234 567
-00000001234567890
-1 234 567 890

In the fr_FR.utf8 locale, the space is U+202F NARROW NO-BREAK SPACE, which takes 3 bytes in UTF-8: e2 80 af. This is probably why there are even fewer leading zeros in the output.

As can be seen with this testcase, only the thousands separator is handled incorrectly; the minus sign of negative numbers is not (it does not affect the number of leading zeros).

Note: this might have been partially fixed recently in master (though I couldn't see any mention related to the thousands' grouping character or the width field in the Git log), since a change of behavior triggered a failure in the GNU MPFR testsuite:

https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00001.html
https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00002.html

(the "expected" value is actually incorrect as it was based on the incorrect behavior of glibc; so this is currently a bug in the MPFR testsuite). But note that there is still a missing thousands' grouping character.
Comment 2 Vincent Lefèvre 2023-01-10 11:49:38 UTC
(In reply to Vincent Lefèvre from comment #1)
> Note: this might have been partially fixed recently in master (though I
> couldn't see any mention related to the thousands' grouping character or the
> width field in the Git log), [...]

I actually meant "precision field" (I did a search on "precision").
Comment 3 Vincent Lefèvre 2023-01-10 12:32:32 UTC
Note: This concerns only integers (e.g. %d), since for floating-point numbers a precision cannot yield leading zeros. The POSIX rule is different with zero padding ("If the '0' and <apostrophe> flags both appear, the grouping characters are inserted before zero padding."), which is due to the width field rather than the precision field.
Comment 4 Andreas Schwab 2023-01-17 12:15:17 UTC
> https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00001.html
> https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00002.html

These links no longer work (infinite redirection).
Comment 5 Vincent Lefèvre 2023-01-17 12:51:58 UTC
(In reply to Andreas Schwab from comment #4)
> > https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00001.html
> > https://sympa.inria.fr/sympa/arc/mpfr/2023-01/msg00002.html
> 
> These links no longer work (infinite redirection).

I don't have any issue with the Firefox and Opera web browsers. I suggest removing cookies associated with the inria.fr domain (obsolete cookies can sometimes yield strange behavior).
Comment 6 Andreas Schwab 2023-01-17 13:32:41 UTC
I don't have any cookies.
Comment 7 Vincent Lefèvre 2023-02-02 01:22:09 UTC
Though I fixed the MPFR testsuite, there's still a failure due to a new bug in glibc from git and 2.37. See bug 30068 (which I've just reported) for this new bug.
Comment 8 Vincent Lefèvre 2023-02-06 13:58:21 UTC
Note that I was wrong in my initial bug report. For

  printf ("%'.17d\n", 123456789);

I said that it should output "000,000,123,456,789". Actually, the behavior of glibc 2.36 and earlier has two issues: the thousands' grouping characters are missing among the leading zeros, and the number of digits is wrong (15 instead of 17). So the correct output should be "00,000,000,123,456,789".
Comment 9 Sourceware Commits 2023-02-06 15:21:52 UTC
The master branch has been updated by Carlos O'Donell <carlos@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c980549cc6a1c03c23cc2fe3e7b0fe626a0364b0

commit c980549cc6a1c03c23cc2fe3e7b0fe626a0364b0
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Thu Jan 19 12:50:20 2023 +0100

    Account for grouping in printf width (bug 30068)
    
    This is a partial fix for mishandling of grouping when formatting
    integers.  It properly computes the width in the presence of grouping
    characters when the width is larger than the number of significant
    digits. The precision related issue is documented in bug 23432.
    
    Co-authored-by: Andreas Schwab <schwab@suse.de>
Comment 10 Sourceware Commits 2023-02-08 01:12:33 UTC
The release/2.37/master branch has been updated by Carlos O'Donell <carlos@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=07b9521fc6369d000216b96562ff7c0ed32a16c4

commit 07b9521fc6369d000216b96562ff7c0ed32a16c4
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Thu Jan 19 12:50:20 2023 +0100

    Account for grouping in printf width (bug 30068)
    
    This is a partial fix for mishandling of grouping when formatting
    integers.  It properly computes the width in the presence of grouping
    characters when the width is larger than the number of significant
    digits. The precision related issue is documented in bug 23432.
    
    Co-authored-by: Andreas Schwab <schwab@suse.de>
    (cherry picked from commit c980549cc6a1c03c23cc2fe3e7b0fe626a0364b0)