Bug 30647 - scanf functions wrong on nan()
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: stdio
Version: 2.37
Importance: P2 normal
Target Milestone: 2.41
Assignee: Avinal Kumar
Reported: 2023-07-17 15:30 UTC by Vincent Lefèvre
Modified: 2024-10-25 18:11 UTC

Last reconfirmed: 2024-04-30 00:00:00
Description Vincent Lefèvre 2023-07-17 15:30:06 UTC
On a string input containing "nan()" with parentheses (possibly with n-char-sequence), the scanf functions assume that the subject sequence is just "nan". Note that strtod is correct, i.e. it takes the parentheses into account.

Consider the following testcase:

#include <stdio.h>
#include <stdlib.h>

static void test_strtod (const char *s)
{
  char *endptr;
  double d;

  printf ("strtod test on %s\n", s);
  d = strtod (s, &endptr);
  printf ("d = %g \"%s\"\n", d, endptr);
}

int main (void)
{
  int r;
  double a, b, c;

  test_strtod ("nan*");
  test_strtod ("nan()*");

  r = sscanf ("nan nan() 1", "%lf%lf%lf", &a, &b, &c);
  printf ("sscanf return value: %d\n", r);
  if (r >= 1)
    printf ("a = %g\n", a);
  if (r >= 2)
    printf ("b = %g\n", b);
  if (r >= 3)
    printf ("c = %g\n", c);

  r = fscanf (stdin, "%lf%lf%lf", &a, &b, &c);
  printf ("fscanf return value: %d\n", r);
  if (r >= 1)
    printf ("a = %g\n", a);
  if (r >= 2)
    printf ("b = %g\n", b);
  if (r >= 3)
    printf ("c = %g\n", c);
  return 0;
}

I get the following output with GNU libc 2.31 and 2.37 on Debian:

$ printf "nan nan() 1" | ./naninput
strtod test on nan*
d = nan "*"
strtod test on nan()*
d = nan "*"
sscanf return value: 2
a = nan
b = nan
fscanf return value: 2
a = nan
b = nan

instead of

strtod test on nan*
d = nan "*"
strtod test on nan()*
d = nan "*"
sscanf return value: 3
a = nan
b = nan
c = 1
fscanf return value: 3
a = nan
b = nan
c = 1

(as obtained with macOS 12.6 and Android 13).
Comment 1 Vincent Lefèvre 2023-07-18 11:55:54 UTC
Note that if the string starts with "nan(" but does not match nan(n-char-sequence_opt), then scanf must reject the conversion (after reading the longest prefix).

Examples:
* "nan(foo" (no closing parenthesis)
* "nan(a b)" (the space is not valid in n-char-sequence)

Currently glibc does not reject these inputs, because it stops at "nan" instead of reading the longest prefix. These cases are similar to the issues mentioned in bug 12701, but for now this is a separate bug.
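
The rule above can be sketched in C as follows. This is only an illustration under stated assumptions: nan_subject_length is a hypothetical helper, not glibc code, and it assumes that n-char-sequence means ASCII letters, digits and underscore. It returns the length of the subject sequence, 0 if the input does not start with "nan", and -1 where scanf would have to report a matching failure.

```c
#include <stddef.h>

/* Hypothetical helper (not part of glibc).  ASCII-only on purpose, so the
   result does not depend on the current locale.  */
static int is_n_char (unsigned char c)
{
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Return the length of the NaN subject sequence at the start of S,
   0 if S does not start with "nan" (case-insensitive), or -1 if S starts
   with "nan(" but the parenthesized part is malformed.  */
long nan_subject_length (const char *s)
{
  static const char lower[] = "nan";
  static const char upper[] = "NAN";
  size_t i;

  for (i = 0; i < 3; i++)
    if (s[i] != lower[i] && s[i] != upper[i])
      return 0;

  if (s[3] != '(')
    return 3;                   /* plain "nan" */

  /* Skip the optional n-char-sequence.  */
  for (i = 4; is_n_char ((unsigned char) s[i]); i++)
    ;

  /* Only a closing parenthesis makes the long form valid.  */
  return s[i] == ')' ? (long) (i + 1) : -1;
}
```

With this sketch, "nan*" gives 3 and "nan()*" gives 5, while both examples above, "nan(foo" and "nan(a b)", give -1.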
Comment 2 Carlos O'Donell 2024-04-30 16:42:25 UTC
I agree this does look like a conformance issue with the scanf family of functions, which use the __vfscanf_internal() implementation.
Comment 3 Avinal Kumar 2024-07-19 08:40:49 UTC
I am trying to debug the issue. I have an additional question. For the examples, strtod returns:

* "nan(foo"      : nan "(foo"
* "nan(foo bar)" : nan "(foo bar)"

And sscanf should return conversion error for both of these cases, effectively no output, am I right?
Comment 4 Vincent Lefèvre 2024-07-19 13:59:53 UTC
(In reply to Avinal Kumar from comment #3)
> And sscanf should return conversion error for both of these cases,
> effectively no output, am I right?

Yes, a conversion error after reading the longest prefix.
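
For contrast, here is a small sketch of how much strtod accepts on the same inputs. strtod_consumed is a hypothetical helper introduced only for this illustration; the values quoted after the code come from comment 3 and from the testcase in the description.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: number of characters strtod accepts from S.
   strtod sees the whole string, so it can fall back to the shorter
   subject sequence "nan" when the parenthesized part is malformed.  */
size_t strtod_consumed (const char *s)
{
  char *endptr;

  (void) strtod (s, &endptr);
  return (size_t) (endptr - s);
}
```

On glibc this gives 3 for both "nan(foo" and "nan(foo bar)", and 5 for "nan()*". scanf cannot fall back in the same way once it has read past "nan", so these inputs are a conversion error there.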
Comment 5 Vincent Lefèvre 2024-07-19 14:20:52 UTC
And for fscanf, footnote 289 of ISO C17 says: "fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf."

The bug on the general behavior of parsing numbers is bug 12701.

So, to completely fix the glibc behavior on NaN strings, both this bug 30647 and bug 12701 need to be fixed.
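
A rough sketch of a reader that respects this one-character pushback limit. It is not the glibc implementation: read_nan_token is a hypothetical name, and n-char-sequence is assumed to be ASCII letters, digits and underscore. The function only reads forward and calls ungetc at most once, so a malformed "nan(..." leaves the stream positioned after the consumed prefix and is reported as a matching failure.

```c
#include <stdio.h>

/* Hypothetical helper (not glibc code): ASCII n-char test.
   EOF is negative, so it is never accepted.  */
static int is_n_char_ascii (int c)
{
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Consume one NaN token from FP.  Return 1 if a valid "nan", "nan()" or
   "nan(n-char-sequence)" was read, 0 on a matching failure.  At most one
   character is pushed back, as fscanf is allowed to do.  */
int read_nan_token (FILE *fp)
{
  static const char lower[] = "nan";
  static const char upper[] = "NAN";
  int c;

  for (int i = 0; i < 3; i++)
    {
      c = getc (fp);
      if (c != lower[i] && c != upper[i])
        {
          if (c != EOF)
            ungetc (c, fp);     /* only the offending character */
          return 0;
        }
    }

  c = getc (fp);
  if (c != '(')
    {
      if (c != EOF)
        ungetc (c, fp);
      return 1;                 /* plain "nan" */
    }

  /* Read the optional n-char-sequence.  */
  do
    c = getc (fp);
  while (is_n_char_ascii (c));

  if (c == ')')
    return 1;                   /* "nan()" or "nan(n-char-sequence)" */

  /* "nan(" has been consumed and cannot be given back.  */
  if (c != EOF)
    ungetc (c, fp);
  return 0;
}
```

On a stream holding "nan(a b)", the sketch returns 0 and the next getc yields the space; on "nan() 1" it returns 1 and the next character is the space before "1".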
Comment 6 Avinal Kumar 2024-10-18 11:32:20 UTC
I have submitted a patch to fix this bug along with tests, please take a look: https://sourceware.org/pipermail/libc-alpha/2024-October/160708.html
Comment 7 Sourceware Commits 2024-10-25 18:10:53 UTC
The master branch has been updated by Adhemerval Zanella <azanella@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=04e8698fcca7d1e932bc54f5b60e1bbce2e87601

commit 04e8698fcca7d1e932bc54f5b60e1bbce2e87601
Author: Avinal Kumar <avinal.xlvii@gmail.com>
Date:   Fri Oct 25 15:48:27 2024 +0530

    stdio-common: Fix scanf parsing for NaN types [BZ #30647]
    
    The scanf family of functions like sscanf and fscanf currently
    ignore nan() and nan(n-char-sequence).  This happens because
    __vfscanf_internal only checks for 'nan'.
    
    This commit adds support for all valid nan types i.e.  nan, nan()
    and nan(n-char-sequence), where n-char-sequence can be
    [a-zA-Z0-9_]+, thus fixing the bug 30647.  Any other representation
    of NaN should result in conversion error.
    
    New tests are also added to verify the correct parsing of NaN types for
    float, double and long double formats.
    
    Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 8 Adhemerval Zanella 2024-10-25 18:11:20 UTC
Fixed on 2.41.