This is the mail archive of the
mailing list for the glibc project.
Re: Undefined behavior in glibc
- From: Alexander Cherepanov <ch3root at openwall dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: Dwight Guth <dwight dot guth at runtimeverification dot com>, libc-alpha at sourceware dot org
- Date: Fri, 5 Feb 2016 21:24:35 +0300
- Subject: Re: Undefined behavior in glibc
- Authentication-results: sourceware.org; auth=none
- References: <27c31890079f41775175b94a4abedb0c dot squirrel at server316 dot webhostingpad dot com> <alpine dot DEB dot 2 dot 10 dot 1601282115100 dot 6102 at digraph dot polyomino dot org dot uk> <CACLXh_1_dQ5D1QrKQN0pVPzt001WmS4BgwcKZkULK8XnbEMb+g at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1601282246340 dot 6102 at digraph dot polyomino dot org dot uk> <CACLXh_3rAudocTEbtZQpVoDcWgm_ww3KcX6j9XCkSRTZVPTUMg at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1601282251350 dot 6102 at digraph dot polyomino dot org dot uk> <20160128225845 dot GE14840 at vapier dot lan> <alpine dot DEB dot 2 dot 10 dot 1601282311351 dot 6102 at digraph dot polyomino dot org dot uk> <20160128234356 dot GH14840 at vapier dot lan> <56AAB6CE dot 8060101 at openwall dot com> <20160129005816 dot GK14840 at vapier dot lan> <56AC16C8 dot 4030202 at openwall dot com> <alpine dot DEB dot 2 dot 10 dot 1601311559210 dot 31071 at digraph dot polyomino dot org dot uk> <56B1F294 dot 5020105 at openwall dot com> <alpine dot DEB dot 2 dot 10 dot 1602031234280 dot 31480 at digraph dot polyomino dot org dot uk>
On 2016-02-03 16:07, Joseph Myers wrote:
If something involves undefined behavior in GNU C,
GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
described in  + GCC extensions, right?
Plus undocumented features
The moment you say "undocumented" everything falls apart. One cannot be
sure whether some observed behavior is an undocumented feature or
something that will be broken by the next gcc release.
and a good understanding of aspects of how C
relates to the underlying machines (that is, you cannot assume e.g. that a
C addition corresponds to a machine add instruction, but you can assume
that memory is made of pages and that if it's not visible to the compiler
how memory was allocated, having particular access to any byte in a page
means having such access to all bytes in that page, subject to avoiding
data races on write).
1. This rules out all modern hardening/debugging solutions like
Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant
2. You seem to be concerned with crashes only but there are other
problems too, e.g., info leaks (think Heartbleed). Saying that the code
is free to access any accessible memory means that one cannot catch info
(but the non-ISO aspects of GNU C as used in low-level system software
such as glibc and the Linux kernel should be assumed to be much more
wide-ranging than the issues discussed there).
Yes, I have seen it. I have even linked to your reply to this survey in
my last email.
1) strlen and other string functions read memory by chunks which is aliasing
2) strlen accesses the OOB memory.
Both of those fall under deliberate use of separate compilation, i.e. it's
not visible to the compiler what the original effective type of the memory
was or its original size, so a whole page can be accessed provided the
types used don't involve aliasing violations within the source visible to
I thought that this falls under "there is no plausible optimization [due
to the info not available to a compiler]". Well, ok.
Then let's look at strlen (&co.) from another angle. It starts accessing
its parameter as chars and then continues accessing it as longs, like this:
char *p = ...;
Is there a situation where this is valid in GNU C? For ISO C the
question seems quite moot but I think Question 16 in DR 017 plus DR 260
say that it's not valid. Maybe there are no restrictions in GNU C on
intra-object pointer arithmetic / accesses at all (apart from aliasing
I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
The todo list basically has ideas that one person once thought might be
useful. In general, you can't assume that things there have consensus,
and nor do you need consensus to put something there. I think
facilitating LTO of the standard C library is a very hard whole-system
issue, not something that could be considered for glibc in isolation;
you'd probably need several different forms of source annotation that act
as barriers to particular forms of LTO, saying that calls to a function
can only be optimized with reference to its semantics and not with
reference to the contents of a particular implementation of it.
Isn't noinline+noclone enough for this right now?
Well, before I go and file the bugs, perhaps it's worth skimming through the
list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
ISO C in this topic. Should the issues be grouped by type, by file, by subdir
or in some other way?
I think such cases should generally be considered bugs except for limited
cases such as string functions that deliberately work in terms of pages or
aligned units smaller than pages, and so may go slightly outside the
original object (before or after) but obviously cannot wrap around the
address space to affect the results of comparisons.
The natural division is based on whether it might make sense to fix two
issues separately - so any issues in parts of glibc for which different
people might have expertise, or that differ in how clear it is that there
is a problem or how clear it is what the correct fix would be, should be
filed separately. Of course, it's even better if fixes (following the
contribution checklist) for such issues are submitted as well.
Capping pointers at the end of the address space
Some functions in C11 have a parameter which limits a number of characters
written. IMHO it's clear that C11 doesn't intend this parameter to be a real
size of output buffer and, thus, it could validly be SIZE_MAX. But we have
seen differing opinions in the recent thread about strlcpy. Perhaps you can
clear the question.
Given the dispute in the Austin Group over snprintf, a C11 DR is clearly
Structs at NULL
Plausible optimization: this is unconditional UB, hence mark the code as
unreachable and remove everything around.
I think such offsetof-type code should be considered valid GNU C, although
it would still be better to use offsetof. At least some are inside
"#ifndef offsetof" (some of these examples are code shared with gnulib or
gettext) and so obviously not bugs in glibc (given that <stddef.h> is
Yeah, and I managed to include some that are used to get a size of a
field instead of an offset and so are fine even in ISO C. Have to filter
them out in the future.
Pointers used after free (without dereferencing)
(IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
FTR: this one seems to be a false. fp->_IO_read_base could be accessed
on line 1063 after a free on line 1020 but it's set inside _IO_setg on
I'd say such comparisons should be avoided, but that's more a cleanup than
a bug fix.
BTW is there a formal method to tell test files from important files? I
see .../tst-* and .../test-* files but io/pwd.c looks like a test too.