12701 – scanf accepts non-matching input

Bug 12701 - scanf accepts non-matching input

Summary: scanf accepts non-matching input

Status:	REOPENED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	stdio (show other bugs)
Version:	unspecified

Importance:	P2 critical
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (4):	1765 12437 22672 23274 (view as bug list)
Depends on:
Blocks:

Reported:	2011-04-25 15:13 UTC by Rich Felker
Modified:	2024-10-29 16:50 UTC (History)
CC List:	14 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security+

Attachments
scanf test cases (2.05 KB, text/plain) 2012-04-18 05:21 UTC, Rich Felker	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Rich Felker 2011-04-25 15:13:20 UTC

glib'c scanf function incorrectly handles cases where it reads a sequence of characters which are an initial subsequence of a matching sequence, but not actually a matching sequence, for the conversion specifier. Examples include:

sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is correct) and leaves no way for the caller to know buf[3] is unfilled.

sscanf("0xz", "%x%c", &x, &c) returns 2 instead of 0.

sscanf("1.0e+!", "%f%c", &x, &c) returns 2 instead of 0.

etc.

Comment 1 Ulrich Drepper 2011-05-02 01:40:09 UTC

All of these cases are correctly handled.

scanf is badly designed, just don't use it if you cannot live with these results.

Comment 2 Rich Felker 2011-05-02 02:35:40 UTC

They are not correctly handled. Please refer to C99, 7.19.6.2, paragraph 9, which defines an input item as:

"the longest sequence of input characters which does not exceed any speciﬁed ﬁeld width and which is, or is a preﬁx of, a matching input sequence"

Paragraph 10 then reads:

"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure."

Clearly in the case of sscanf("0xz", "%x%c", &x, &c), the first "input item" is "0x", and it is not a matching sequence for the %x conversion (see the specification of strtoul, in terms of which scanf %x is specified), so the result must be a matching failure.

If you're going to wrongly mark this bug as "RESOLVED", at least mark it "WONTFIX" rather than "INVALID" and acknowledge that it's a bug that you're unwilling to fix, and that glibc is intentionally non-conformant in this matter.

Comment 3 Ulrich Drepper 2011-05-03 00:30:31 UTC

They are handled correctly.  You don't understand the limit of push backs.

Comment 4 Rich Felker 2011-05-03 00:40:26 UTC

Yes I understand pushbacks.

Scanning "0xz" for %x results in an input item of "0x" with "z" pushed back into the unread buffer. The bug has nothing to do with pushbacks, because the right data is pushed back. The bug is that a non-matching input item is treated as a match rather than a matching error.

Perhaps you thought I was saying the input item should be "0", successfully converted, with "x" as the next unread character in the buffer. Of course this is wrong and I do not believe such a thing.

Perhaps you should try reading the actual language standard rather than assuming you're right.

Comment 5 Ulrich Drepper 2011-05-03 01:13:29 UTC

(In reply to comment #4)
> Yes I understand pushbacks.

You apparently don't.   This is no place to get a free education.

Don't reopen the bug, there will be no change.

Comment 6 Rich Felker 2011-05-03 02:22:50 UTC

OK if you insist that I don't reopen it, I'm fixing the resolution to "WONTFIX".

Comment 7 Rich Felker 2011-09-25 04:42:31 UTC

Reopening since I found a statement from an official source (Fred J. Tydeman, Vice-char of PL22.11) that the glibc behavior is incorrect:

http://newsgroups.derkeiler.com/Archive/Comp/comp.std.c/2009-09/msg00045.html

Sorry I don't have a better newsgroup archive link.

Comment 8 Ulrich Drepper 2011-10-29 17:14:29 UTC

What on earth are you talking about.  Fred said exactly the same: 0xz causes the z to be rejected for the %x and therefore used for the %c.  Stop wasting my time.

Comment 9 Rich Felker 2011-10-29 21:24:08 UTC

Apparently you only read the first quoted paragraph and not the second:

> > - the input item "0x" is not a matching sequence, so the execution of
> > the whole directive fails;
> 
> Correct

What part of "the execution of the whole directive fails" are you not understanding? When a directive fails, scanf stops and returns the number of directives successfully converted and stored. This number is zero, not two. The %c is never processed. glibc is wrong. Please fix it.

If you insist on keeping compatibility with hypothetical existing binaries that depend on the wrong behavior, that's what glibc has symbol versioning for...

Comment 10 Ulrich Drepper 2011-10-29 21:36:38 UTC

The behavior is correct and wanted.  Now stop wasting people's time.

Comment 11 Rich Felker 2011-10-30 05:42:51 UTC

Fred Tydeman (vice chair of PL22.11/J11) has stated as clearly and directly that the current glibc behavior is NOT correct. Whether it's wanted is a more subjective question, but I have not seen anyone but yourself who wants scanf to behave incorrectly in this manner. Please fix this bug.

Comment 12 Rich Felker 2012-03-17 20:39:24 UTC

Ping. Would somebody other than Mr. Drepper be willing to review this bug report?

Comment 13 Joseph Myers 2012-03-18 14:28:19 UTC

This bug report appears to be correct, and the erroneous behavior described still present with current glibc (tested x86_64).

Comment 14 Rich Felker 2012-04-18 05:21:24 UTC

Created attachment 6345 [details]
scanf test cases

I recently wrote a set of test cases for verifying my scanf implementation, and running it against glibc reproduces A LOT of instances of this bug... See attached test program.

Comment 15 paxdiablo 2012-11-26 08:26:04 UTC

I think this bug report is correct, at least in relation to the '%x/0xz' sample.

There's a big difference between an input item, which *may* be an initial subset of a properly scanned directive, and the *properly scanned directive* itself.

Pushback controls how far you can back up the "input stream pointer" and is the reason why scanf is usually not used by professionals, who prefer a fgets/sscanf combo so they can bak up to the start of the line themselves. However, the pushback is only relevant here in that context. The failure of '0x' when scanning '%x' will not be able to push back all the way to the '0' because of this limitation.

The function call sscanf ("a0xz", "%c%x%c") should return 1, not 3.

The controlling part of the standard is the bit dealing with the 'x' directive itself:

=====
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument.
=====

The strtoul stuff states:

=====
If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 6.4.4.1, optionally preceded by a plus or minus sign, but not including an integer suffix. If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix. The letters from a (or A) through z (or Z) are ascribed the values 10 through 35; only letters and digits whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.
=====

The controlling part there would be "a sequence of letters and digits representing an integer" - you may argue that such a sequence may consist of zero characters but I don't think anyone in their right mind would suggest that definition represented an integer. In any case, the '0x' string fails on strtoul:

    char *x;
    int rc = 42;
    rc = strtoul ("0x", &x, 16);
    printf ("%d [%s]/n", rc, x);
produces:

    0 [0x]
So even though rc is set to 0, the fact that the pointer points to the first bad character means that the '0x' itself is not a valid hex number.

Putting in '0x5' as the string gives you:

    5 []
so that the first bad character is the end of the string (ie, there WERE no bad characters).

Comment 16 Florian Weimer 2014-06-13 14:54:27 UTC

(In reply to Rich Felker from comment #0)

> sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is
> correct) and leaves no way for the caller to know buf[3] is unfilled.

So this is an information leak.

Comment 17 Florian Weimer 2014-06-27 14:00:25 UTC

*** Bug 12437 has been marked as a duplicate of this bug. ***

Comment 18 Florian Weimer 2018-01-04 12:17:34 UTC

*** Bug 22672 has been marked as a duplicate of this bug. ***

Comment 19 Joseph Myers 2018-06-12 11:15:17 UTC

*** Bug 23274 has been marked as a duplicate of this bug. ***

Comment 20 Joseph Myers 2018-06-12 11:16:01 UTC

*** Bug 1765 has been marked as a duplicate of this bug. ***

Comment 21 Mark winds 2021-02-10 19:30:14 UTC Comment hidden (spam)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.19/master has been updated
       via  10d268070a8aa9a878668e7f060e92ed668de146 (commit)
       via  c08e8bd0ef1d16d0139dbc80a976e2cbf2517f02 (commit)
      from  https://www.targetedwebtraffic.com/ 
  762aafec34478bcef01a16acf1959732ab8bb2b6 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

Comment 22 mark 2021-03-05 15:27:45 UTC Comment hidden (spam)

Yes I understand pushbacks.

Scanning "0xz" for %x results in an input item of "0x" with "z" pushed back into the unread buffer. The bug has nothing to do with pushbacks, because the right data is pushed back. The bug is that a non-matching input item is treated as a match rather than a matching error.

Perhaps you thought I was saying the input item should be "0", successfully converted, with "x" as the next unread character in the buffer. Of course this is wrong and I do not believe such a thing.
https://www.targetedwebtraffic.com/product-category/website-traffic/
Apparently you only read the first quoted paragraph and not the second:

> > - the input item "0x" is not a matching sequence, so the execution of
> > the whole directive fails;
> 
> Correct

Comment 23 namboru 2021-09-15 02:40:36 UTC Comment hidden (spam)

jasa seo https://seohandal.id/jasa-seo/

Comment 24 Vincent Lefèvre 2023-07-18 12:01:35 UTC

Note that scanf also accepts "nan(" while it shouldn't (because "nan()" is valid), but for a different reason. See bug 30647 for issues related to scanf with nan.

Comment 25 Vincent Lefèvre 2024-07-19 14:11:40 UTC

(In reply to Rich Felker from comment #7)
> Reopening since I found a statement from an official source (Fred J.
> Tydeman, Vice-char of PL22.11) that the glibc behavior is incorrect:
> 
> http://newsgroups.derkeiler.com/Archive/Comp/comp.std.c/2009-09/msg00045.html

It no longer exists, but here's another link:

https://comp.std.c.narkive.com/7JdevQ08/fscanf-strtol-and-the-parsing-of-numbers

Comment 26 Avinal Kumar 2024-10-29 07:55:06 UTC

I was able to fix scanf behavior on NaN in bug 30647 and have been looking into this issue for a few weeks now. The opinion seems divided, but I would like to get a final nod before I start working on it.

To put everything in simple words:

- the input should always match the given specifier that is:
 - For "%[width]specifier" the input must be wide enough, or it is a failure
 - For "%[x, other special specifier] where input has some extra prefix, i.e. 0x, it should fully match, 0x0 is valid 0x is not
- to support the width requirement the input "0x123" should fail on "%2x" specifier or is the format wrong?

Other questions:
- Would fixing this bug break existing applications that might be depending on this buggy behavior?

Comment 27 Florian Weimer 2024-10-29 08:19:39 UTC

(In reply to Avinal Kumar from comment #26)
> Other questions:
> - Would fixing this bug break existing applications that might be depending
> on this buggy behavior?

We already have scanf variants per C standard. One possibility is to make the change for recently added variants only, so that they impact recently built applications only (which are presumably more easily fixed than historic applications).

Comment 28 Andreas Schwab 2024-10-29 09:01:18 UTC

The field width is a maximum, not a minimum.  If the input stream ends before width characters have been read, those characters read so far become the input item.  If the width is 2 then the input item can only consist one or two characters.  If the input stream contains 0x123 and the directive is %2x then the input item is 0x and this becomes a matching failure.

Comment 29 Joseph Myers 2024-10-29 16:50:21 UTC

I think this is simply a bug that should be fixed as such (for all standard versions), not something applications are at all likely to be relying on.

(We'll need to add more strtol/scanf versions for C2Y because of 0o / 0O octal input, but I'd expect that, and any other incompatible changes in C2Y scanf in future, to be the only difference in how those new versions behave.)