This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Undefined behavior in glibc -- was: Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.


On Wed, 3 Feb 2016, Alexander Cherepanov wrote:

> But those parts which cannot be implemented in ISO C would be implemented in
> asm? AIUI there is no much magic in compiling C parts of glibc so they should

No, not necessarily.  Written in GNU C with a good understanding of what 
transformations are actually possible given the code visible to the 
compiler and the rules of GNU C.

> > If something involves undefined behavior in GNU C,
> 
> GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
> described in [1] + GCC extensions[2], right?

Plus undocumented features and a good understanding of aspects of how C 
relates to the underlying machines (that is, you cannot assume e.g. that a 
C addition corresponds to a machine add instruction, but you can assume 
that memory is made of pages and that if it's not visible to the compiler 
how memory was allocated, having particular access to any byte in a page 
means having such access to all bytes in that page, subject to avoiding 
data races on write).  Cf 
<https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html> 
(but the non-ISO aspects of GNU C as used in low-level system software 
such as glibc and the Linux kernel should be assumed to be much more 
wide-ranging than the issues discussed there).

> 1) strlen and other string functions read memory by chunks which is aliasing
> violation;
> 2) strlen accesses the OOB memory.

Both of those fall under deliberate use of separate compilation, i.e. it's 
not visible to the compiler what the original effective type of the memory 
was or its original size, so a whole page can be accessed provided the 
types used don't involve aliasing violations within the source visible to 
the compiler.

> > Where glibc code relies on separate compilation to avoid undefined
> > behavior, this is not a bug; use of LTO on glibc's own code is not
> > supported.  For example, where functions with different C types but
> > compatible ABIs are aliased, or a function is otherwise defined with a
> > different type from that used when it is called.  Similarly, asm may be
> > used to limit code movement, and that could e.g. mean that aliasing is not
> > undefined behavior where it would otherwise be undefined.
> 
> Not sure what you mean. Is it specific to calling functions or it's supposed
> to cover the case of strlen too?

It covers strlen.  strlen is well-defined in isolation, so can be presumed 
to be compiled to an ABI-conforming object file, and such an object file, 
if it would work for programs where it would be valid for that strlen 
implementation to be part of the program (under a different name), would 
also work for other programs that are valid with ISO C strlen.

> I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
> there?

The todo list basically has ideas that one person once thought might be 
useful.  In general, you can't assume that things there have consensus, 
and nor do you need consensus to put something there.  I think 
facilitating LTO of the standard C library is a very hard whole-system 
issue, not something that could be considered for glibc in isolation; 
you'd probably need several different forms of source annotation that act 
as barriers to particular forms of LTO, saying that calls to a function 
can only be optimized with reference to its semantics and not with 
reference to the contents of a particular implementation of it.

> Well, before I go and file the bugs, perhaps it's worth skimming through the
> list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
> ISO C in this topic. Should the issues be grouped by type, by file, by subdir
> or in some other way?

I think such cases should generally be considered bugs except for limited 
cases such as string functions that deliberately work in terms of pages or 
aligned units smaller than pages, and so may go slightly outside the 
original object (before or after) but obviously cannot wrap around the 
address space to affect the results of comparisons.

The natural division is based on whether it might make sense to fix two 
issues separately - so any issues in parts of glibc for which different 
people might have expertise, or that differ in how clear it is that there 
is a problem or how clear it is what the correct fix would be, should be 
filed separately.  Of course, it's even better if fixes (following the 
contribution checklist) for such issues are submitted as well.

> Capping pointers at the end of the address space
> ------------------------------------------------
> 
> Some functions in C11 have a parameter which limits a number of characters
> written. IMHO it's clear that C11 doesn't intend this parameter to be a real
> size of output buffer and, thus, it could validly be SIZE_MAX. But we have
> seen differing opinions in the recent thread about strlcpy. Perhaps you can
> clear the question.

Given the dispute in the Austin Group over snprintf, a C11 DR is clearly 
needed.

> Structs at NULL
> ---------------
> 
> Plausible optimization: this is unconditional UB, hence mark the code as
> unreachable and remove everything around.

I think such offsetof-type code should be considered valid GNU C, although 
it would still be better to use offsetof.  At least some are inside 
"#ifndef offsetof" (some of these examples are code shared with gnulib or 
gettext) and so obviously not bugs in glibc (given that <stddef.h> is 
included).

> Pointers used after free (without dereferencing)
> ------------------------------------------------
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
> (IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
> be swapped.)
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41

I'd say such comparisons should be avoided, but that's more a cleanup than 
a bug fix.

> Memory allocations without check
> --------------------------------
> 
> Not UB but while we are at it...
> 
> AIUI the policy in glibc is to check results of malloc no matter how small the
> requested amount of memory is. Right?

Yes.

> So mallocs like in https://sourceware.org/bugzilla/show_bug.cgi?id=19416
> should be reported?

Yes.

-- 
Joseph S. Myers
joseph@codesourcery.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]