Bug 19391 - strnlen invokes UB by adding maxlen to str
Summary: strnlen invokes UB by adding maxlen to str
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: string (show other bugs)
Version: 2.24
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-21 22:51 UTC by Pascal Cuoq
Modified: 2015-12-30 10:33 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pascal Cuoq 2015-12-21 22:51:06 UTC
Consider the function strnlen:

size_t strnlen (const char *str, size_t maxlen);

The POSIX standard does not mandate that maxlen bytes are valid to access from the pointer str:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/strlen.html

The maxlen argument is used to limit the number of chars accessed (and the length returned) but there is no constraint that all bytes between str + 0 and  str + maxlen - 1  are part of a same object.

When maxlen is larger than the number of bytes that are part of an object including str, the addition str + maxlen invokes undefined behavior:

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strnlen.c;h=d2bb843fddbf93eebb857cd0896cb3441bafa431;hb=HEAD#l36

The comparison end_ptr < str is nonsensical: it is always false when the pointer arithmetic str + maxlen is defined. An optimizing compiler is allowed to treat this expression as false:

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strnlen.c;h=d2bb843fddbf93eebb857cd0896cb3441bafa431;hb=HEAD#l43

Glibc is only intended to be compiled with GCC. Unfortunately, it is GCC that made headlines in 2008 for optimizing “end_ptr < str”-type pointer overflow checks to false:

https://lwn.net/Articles/278137/
Comment 1 Florian Weimer 2015-12-23 16:21:17 UTC
Isn't strnlen (p, -1) equivalent to strlen (p)?  It's difficult to tell from the specification.
Comment 2 Pascal Cuoq 2015-12-28 14:05:26 UTC
Florian,

my interpretation of the standards is that in all the strn* functions, as well as memchr (as an explicit exception among the mem* functions), the size argument limits the number of characters read but is allowed to be arbitrarily larger than the number of characters that would be valid to read (if reading stops because '\0' or the searched character is found).

In other words, yes, I think that strnlen (p, -1) should not invoke UB and be equivalent to strlen (p).

In other words, this report is part of the same wave as https://sourceware.org/bugzilla/show_bug.cgi?id=19390 and https://sourceware.org/bugzilla/show_bug.cgi?id=19387 , except that those describe concrete problems in assembly versions whereas this one is a theoretical problem in the C version.

FWIW, the idea of looking at very large size arguments for standard functions started with memchr, for which they are very explicitly allowed in POSIX and C11, and it was this remark of Jed Davis that incited me to look at the strn* functions:

https://twitter.com/xlerb/status/678963983756333056
Comment 3 Alexander Cherepanov 2015-12-28 14:18:05 UTC
(In reply to Florian Weimer from comment #1)
> Isn't strnlen (p, -1) equivalent to strlen (p)?  It's difficult to tell from
> the specification.

1. I'm not sure what you mean. IMHO, yes, according to the specifications[1][2], strnlen(p, -1) is equivalent to strlen(p) (modulo their behavior on an array of SIZE_MAX non-NUL bytes and if objects can't be larger than SIZE_MAX bytes). There is no explicit language requiring accesses to all bytes up to maxlen to be valid hence an implementation can't assume it. And IIUC this bug report filed based on this POV.

Is there a chance that strnlen("abc", -1) and, by generalization, strncmp("ab", "abc", 5) are considered invalid?

Anyway, strnlen(p, -1) is in glibc tests[3].

The glibc manual is indeed silent on some aspects of the strn* functions -- I files pr19407 (and pr19406 while at it). But it's not applicable to strnlen.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/strlen.html#tag_16_577_03
[2] https://www.gnu.org/software/libc/manual/html_node/String-Length.html#index-strnlen
[3] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/tst-strlen.c;h=8376831de559c12057a7a432dc91953fb4e9624f;hb=HEAD#l43

2. No matter what, a comparison of the form "buffer + len < buffer" (where buffer is a pointer and len is a non-negative integer) is always a sign of problems. According to the C standard it's either false or UB. If the code does something when the result of the comparison is true it means two things:
1) such a case is expected;
2) the code to handle the case is invalid (invokes UB). 

The straight comparison "buffer + len < buffer" is optimized into "(ptrdiff_t)len < 0" right now by gcc (and by clang) which is not what a programmer intended. The variant with an intermediary variable -- "end_ptr = buffer + len; if (end_ptr < buffer)..." -- seems not to be optimized by current gcc (but is optimized by clang). To depend on this is not the safest thing to do. Please note that the current tests with n=SIZE_MAX will not catch this optimization (because (ptrdiff_t)SIZE_MAX < 0).

Overall, very handy and robust way to find invalid pointer arithmetic. But, to iterate, the comparison itself is only a sign of it and it's not enough to fix the comparison alone.

3. I guess (and hope) that, in the end, this is considered a bug by the glibc community even if a very low priority one (but who knows, given that "the GNU C Library is implemented in GNU C, not in ISO C"). So I went and filed several more similar bugs: pr19411, pr19412, pr19413.
Comment 4 Alexander Cherepanov 2015-12-30 10:33:48 UTC
On 2015-12-28 17:18, cherepan at mccme dot ru wrote:
> (In reply to Florian Weimer from comment #1)
>> Isn't strnlen (p, -1) equivalent to strlen (p)?  It's difficult to tell from
>> the specification.
>
> 1. I'm not sure what you mean.

Ok, I think I found the reason of the confusion. I was looking at glibc 
manual as available on https://www.gnu.org/ but manual/string.texi was 
modified recently. In particular, the description of strnlen was changed 
by [1]. The current description[2] indeed talks about "the array @var{s} 
of size @var{maxlen}". IMHO this is plainly wrong (and contradicts the 
use of strnlen in glibc itself).

[1] 
https://sourceware.org/git/?p=glibc.git;a=commit;h=2cc4b9cce51643ec299e97450ccde4deeecfb083
[2] 
https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/string.texi;h=016fd0b9a05f3126a1594e942c5e6da0669d0410;hb=HEAD#l313