glib'c scanf function incorrectly handles cases where it reads a sequence of characters which are an initial subsequence of a matching sequence, but not actually a matching sequence, for the conversion specifier. Examples include:
sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is correct) and leaves no way for the caller to know buf is unfilled.
sscanf("0xz", "%x%c", &x, &c) returns 2 instead of 0.
sscanf("1.0e+!", "%f%c", &x, &c) returns 2 instead of 0.
All of these cases are correctly handled.
scanf is badly designed, just don't use it if you cannot live with these results.
They are not correctly handled. Please refer to C99, 126.96.36.199, paragraph 9, which defines an input item as:
"the longest sequence of input characters which does not exceed any speciﬁed ﬁeld width and which is, or is a preﬁx of, a matching input sequence"
Paragraph 10 then reads:
"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure."
Clearly in the case of sscanf("0xz", "%x%c", &x, &c), the first "input item" is "0x", and it is not a matching sequence for the %x conversion (see the specification of strtoul, in terms of which scanf %x is specified), so the result must be a matching failure.
If you're going to wrongly mark this bug as "RESOLVED", at least mark it "WONTFIX" rather than "INVALID" and acknowledge that it's a bug that you're unwilling to fix, and that glibc is intentionally non-conformant in this matter.
They are handled correctly. You don't understand the limit of push backs.
Yes I understand pushbacks.
Scanning "0xz" for %x results in an input item of "0x" with "z" pushed back into the unread buffer. The bug has nothing to do with pushbacks, because the right data is pushed back. The bug is that a non-matching input item is treated as a match rather than a matching error.
Perhaps you thought I was saying the input item should be "0", successfully converted, with "x" as the next unread character in the buffer. Of course this is wrong and I do not believe such a thing.
Perhaps you should try reading the actual language standard rather than assuming you're right.
(In reply to comment #4)
> Yes I understand pushbacks.
You apparently don't. This is no place to get a free education.
Don't reopen the bug, there will be no change.
OK if you insist that I don't reopen it, I'm fixing the resolution to "WONTFIX".
Reopening since I found a statement from an official source (Fred J. Tydeman, Vice-char of PL22.11) that the glibc behavior is incorrect:
Sorry I don't have a better newsgroup archive link.
What on earth are you talking about. Fred said exactly the same: 0xz causes the z to be rejected for the %x and therefore used for the %c. Stop wasting my time.
Apparently you only read the first quoted paragraph and not the second:
> > - the input item "0x" is not a matching sequence, so the execution of
> > the whole directive fails;
What part of "the execution of the whole directive fails" are you not understanding? When a directive fails, scanf stops and returns the number of directives successfully converted and stored. This number is zero, not two. The %c is never processed. glibc is wrong. Please fix it.
If you insist on keeping compatibility with hypothetical existing binaries that depend on the wrong behavior, that's what glibc has symbol versioning for...
The behavior is correct and wanted. Now stop wasting people's time.
Fred Tydeman (vice chair of PL22.11/J11) has stated as clearly and directly that the current glibc behavior is NOT correct. Whether it's wanted is a more subjective question, but I have not seen anyone but yourself who wants scanf to behave incorrectly in this manner. Please fix this bug.
Ping. Would somebody other than Mr. Drepper be willing to review this bug report?
This bug report appears to be correct, and the erroneous behavior described still present with current glibc (tested x86_64).
Created attachment 6345 [details]
scanf test cases
I recently wrote a set of test cases for verifying my scanf implementation, and running it against glibc reproduces A LOT of instances of this bug... See attached test program.
I think this bug report is correct, at least in relation to the '%x/0xz' sample.
There's a big difference between an input item, which *may* be an initial subset of a properly scanned directive, and the *properly scanned directive* itself.
Pushback controls how far you can back up the "input stream pointer" and is the reason why scanf is usually not used by professionals, who prefer a fgets/sscanf combo so they can bak up to the start of the line themselves. However, the pushback is only relevant here in that context. The failure of '0x' when scanning '%x' will not be able to push back all the way to the '0' because of this limitation.
The function call sscanf ("a0xz", "%c%x%c") should return 1, not 3.
The controlling part of the standard is the bit dealing with the 'x' directive itself:
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument.
The strtoul stuff states:
If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 188.8.131.52, optionally preceded by a plus or minus sign, but not including an integer suffix. If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix. The letters from a (or A) through z (or Z) are ascribed the values 10 through 35; only letters and digits whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.
The controlling part there would be "a sequence of letters and digits representing an integer" - you may argue that such a sequence may consist of zero characters but I don't think anyone in their right mind would suggest that definition represented an integer. In any case, the '0x' string fails on strtoul:
int rc = 42;
rc = strtoul ("0x", &x, 16);
printf ("%d [%s]/n", rc, x);
So even though rc is set to 0, the fact that the pointer points to the first bad character means that the '0x' itself is not a valid hex number.
Putting in '0x5' as the string gives you:
so that the first bad character is the end of the string (ie, there WERE no bad characters).
(In reply to Rich Felker from comment #0)
> sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is
> correct) and leaves no way for the caller to know buf is unfilled.
So this is an information leak.
*** Bug 12437 has been marked as a duplicate of this bug. ***
*** Bug 22672 has been marked as a duplicate of this bug. ***
*** Bug 23274 has been marked as a duplicate of this bug. ***
*** Bug 1765 has been marked as a duplicate of this bug. ***