[PATCH RFC] __builtin_dynamic_object_size with -D_FORTIFY_SOURCE=3

Jakub Jelinek jakub@redhat.com
Mon Nov 30 09:27:32 GMT 2020


On Mon, Nov 30, 2020 at 10:11:51AM +0100, Florian Weimer wrote:
> > For example at the moment the simple case of memcpy, memmove,
> > etc. where they're implemented with compiler builtins like
> > __builtin___memcpy_chk, clang is able to generate compact code that
> > passes the expression to __memcpy_chk or just generates a memcpy when
> > possible.  In the non-builtin cases though, where the size evaluates
> > to an expression, one may see patterns like:
> >
> > 	cmpq	$-1, %rbx
> > 	je	.LBB0_2
> > 	callq	fortified_chk
> > 	jmp	.LBB0_3
> > .LBB0_2:                                # %if.else
> > 	callq	fortified
> >
> > since the compiler isn't smart enough yet to reduce that condition.
> > This can be fixed (I'm working on that right now) in the simple case
> > of a direct comparison and thus work for _FORTIFY_SOURCE=3, but it may
> > be harder to evaluate for __builtin_dynamic_object_size in general
> > where the comparison happens indirectly.  This is also why
> > __builtin_dynamic_object_size isn't exactly a drop-in replacement for
> > __builtin_object_size in all cases.
> 
> I think this should be fixed in the compiler, and the level 3 should be
> dropped.  A compiler bug is not a good reason to change the external
> interface, especially if it is just a performance bug.  The 2 vs 3
> choice isn't something that's useful to developers.

What bug do you mean?  The fact that __builtin_object_size is required to be
a constant has been a fundamental requirement of the whole _FORTIFY_SOURCE
design.  Without that, it adds completely unbounded runtime overhead,
ultimately turning the program into yet another bounded-pointers
implementation.  In the worst case, every pointer arithmetic operation will
need to be accompanied by tracking of its runtime length, using
maximum/minimum, saturating arithmetic, etc.
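
To make the constant-size requirement concrete, here is a minimal sketch of
the general shape of a fortified wrapper.  It is not glibc's implementation:
copy_with_bound and copy_checked are hypothetical names, and the sketch
returns -1 where a real _chk function would abort.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical checked copy: BOUND is the destination size as seen by the
   compiler, or (size_t) -1 when it could not be determined.  */
int
copy_with_bound (void *dst, const void *src, size_t n, size_t bound)
{
  if (bound != (size_t) -1 && n > bound)
    return -1;			/* Would overflow DST; refuse to copy.  */
  memcpy (dst, src, n);
  return 0;
}

/* Hypothetical wrapper macro: the builtin has to be evaluated where DST is
   visible, which is why fortify wrappers are macros or always-inline
   functions.  */
#define copy_checked(dst, src, n) \
  copy_with_bound ((dst), (src), (n), __builtin_object_size ((dst), 0))
```

When the bound is a compile-time constant, the comparison can often be folded
away or reduced to a single check.  When the bound is a runtime expression,
both the comparison and the computation of the bound stay in the generated
code.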

Consider, just for a start:

#include <stdlib.h>
void foo (void *, void *, void *);

size_t
bar (size_t x, size_t y, size_t z, size_t w, size_t v)
{
  char *p = malloc (x + 15);
  char *q = calloc (y + 12, z + 31);
  char *r = w ? p : q;
  if (v > 23)
    r += v;
  foo (r, p, q);
  size_t ret = __builtin_dynamic_object_size (r, 0);
  free (p);
  free (q);
  return ret;
}

LLVM seems to handle this by adding the equivalent of
  size_t len1 = x + 15;
  size_t len2 = (y + 12) * (z + 31);
/* Note, no overflow checking here, though as calloc would return NULL
   on overflow perhaps it is not needed.  */
  size_t len3 = w ? len1 : len2;
  size_t len4 = v > 23 ? (len3 - v /* saturating */) : len3;
Now consider the involved pointer being conditionally incremented in some
loop...
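
The bookkeeping can be written out by hand to see what it costs.  The
following is only a model of the shadow length that has to travel with r; it
is not what any compiler literally emits, and sat_sub, model_bar_size and
model_loop_size are hypothetical names.  The subtraction uses the amount the
pointer was actually advanced by (v), and the loop variant shows the length
being updated on every conditional increment.

```c
#include <stddef.h>

/* Saturating subtraction: clamps at zero instead of wrapping around.  */
static size_t
sat_sub (size_t a, size_t b)
{
  return a > b ? a - b : 0;
}

/* Hypothetical model of the length carried alongside r in bar ().  */
size_t
model_bar_size (size_t x, size_t y, size_t z, size_t w, size_t v)
{
  size_t len1 = x + 15;			/* malloc (x + 15) */
  size_t len2 = (y + 12) * (z + 31);	/* calloc (y + 12, z + 31) */
  size_t len3 = w ? len1 : len2;	/* r = w ? p : q */
  return v > 23 ? sat_sub (len3, v) : len3;	/* if (v > 23) r += v */
}

/* Loop variant: the pointer is advanced by steps[i] only when that step
   exceeds 23, so the shadow length must be updated in lockstep.  */
size_t
model_loop_size (size_t len, const size_t *steps, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (steps[i] > 23)
      len = sat_sub (len, steps[i]);
  return len;
}
```

Each pointer update gains a compare and a subtract, and each select on
pointers gains a matching select on lengths, none of which exists in the
original program.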

__builtin_object_size is only exact if mode 0 returns the same value as
mode 2 (or 1 vs. 3); otherwise it is an upper bound.
__builtin_dynamic_object_size is a bounded-pointer implementation (for
selected pointers).
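
A small sketch of the mode 0 vs. mode 2 distinction, assuming GCC or clang.
The struct bounds type and pick_bounds are hypothetical names.  The pointer
may refer to either of two buffers, so the maximum and the minimum differ and
neither is exact.

```c
#include <stddef.h>

struct bounds
{
  size_t max;	/* Mode 0: upper bound, (size_t) -1 if unknown.  */
  size_t min;	/* Mode 2: lower bound, 0 if unknown.  */
};

/* One of two buffers of different sizes is selected at run time.  */
struct bounds
pick_bounds (int which)
{
  static char small[8];
  static char large[32];
  char *p = which ? small : large;
  struct bounds b = {
    __builtin_object_size (p, 0),
    __builtin_object_size (p, 2)
  };
  return b;
}
```

With optimization enabled GCC typically reports 32 and 8 here.  Without
optimization, or whenever the compiler cannot track the pointer, the builtin
falls back to (size_t) -1 and 0.  In every case max is at least the size of
the larger buffer and min is at most the size of the smaller one, and both
are constants rather than values computed at run time.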

I think users should have a choice whether they want a slowdown of up to 2x
in their code or not.

	Jakub


