This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Potential issue with strstr on x86 with sse4.2 in glibc-2.18


On Mon, Aug 19, 2013 at 11:34:30PM -0400, Rich Felker wrote:
> > I would have assumed that it is gcc's responsibility to ensure alignment
> > if it decides to use SSE and our responsibility if our functions
> > explicitly use SSE.   Is that being too naive?
> 
> If by "explicitly use SSE" you mean using the intrinsics, alignment
> _should_ be GCC's responsibility just as if GCC had chosen to use SSE
> itself. However I don't know if the reality is like this. The only way
> I can see that GCC would not be expected to take care of alignment is
> when the SSE code resides in inline assembly.
> 
> Actually, it's not really the use of SSE, but the use of automatic
> objects with 16-byte-alignment requirements that should cause GCC to
> align the stack. For example, if you have a char array declared with
> __attribute__((aligned(16))) with the intent to pass it to an external
> function that uses SSE, GCC needs to ensure its alignment.
> 
> I'm unclear on what GCC's capabilities are in this area; that's why I
> asked.

I just did some tests, and it seems that with the above options, GCC
generates a prologue to realign the stack in all non-leaf functions.
This is definitely unacceptable overhead for global usage.

What may be viable is globally using -mpreferred-stack-boundary=2
along with the force_align_arg_pointer attribute on individual
functions that need to make callbacks to application code. From my
experiments, it seems that when -mpreferred-stack-boundary=2 is in
use, GCC will generate code to align the stack only in functions whose
automatic objects need alignment greater than 4 bytes.
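
For illustration, here is a minimal sketch of how that attribute is
meant to be used on a function that calls back into application code.
The names call_with_aligned_stack and check_local_alignment are
hypothetical, not glibc functions, and the macro guard is my own
assumption so the file still builds on non-x86 targets. The sketch
only shows where the attribute goes; it does not demonstrate whether
GCC actually emits the realignment under any particular
-mpreferred-stack-boundary setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* force_align_arg_pointer is an x86-specific GCC attribute; expand to
   nothing elsewhere so the sketch still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY
#endif

typedef int (*callback_fn)(void *);

/* Hypothetical library entry point: it may be entered with only
   4-byte stack alignment, but it calls application code that may
   assume 16-byte alignment, so it asks GCC to realign on entry. */
REALIGN_ON_ENTRY
int call_with_aligned_stack(callback_fn cb, void *arg)
{
    assert(cb != NULL);
    return cb(arg);
}

/* Hypothetical application callback with an automatic object that
   needs 16-byte alignment. Returns 1 if the object is aligned. */
int check_local_alignment(void *unused)
{
    char buf[16] __attribute__((aligned(16)));

    (void)unused;
    buf[0] = 0;
    return ((uintptr_t)buf & 15) == 0;
}
```

Calling call_with_aligned_stack(check_local_alignment, NULL) should
return 1. Note that the check passes because GCC aligns buf itself;
the interesting difference between the options is in the generated
prologues, which you can compare with gcc -S.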

Unfortunately, force_align_arg_pointer seems to be a no-op with
-mpreferred-stack-boundary=2, so I think to make a method like this
work, it would need to be combined with the attributes to override
optimization/misc options for a single function.

This all looks like a big mess, and it's all GCC's fault. With such a
nasty incompatible ABI change, they should have added a minimally
invasive way to build code that interoperates: not assuming the stack
pointer is aligned on entry, but preserving the alignment on calls
(i.e. keeping it the same mod 16 as it was on entry) so that both of
these cases work:

1. Caller is using old 4-byte alignment.

2. Caller is using 16-byte alignment and needs its callbacks to be
   called with 16-byte alignment.

At present, the only way to get GCC to support both of these usages
seems to be imposing LARGE prologue overhead on every single function.
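
To make the "same mod 16" idea concrete, here is a toy arithmetic
model, not anything GCC implements. The function name and the notion
of a single frame_bytes figure (everything this function pushes
between its own entry and its callee's entry) are assumptions made up
for the example. If that total is always rounded up to a multiple of
16, the callee sees the same stack pointer mod 16 that this function
saw, whichever convention the original caller used.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 16. */
static uintptr_t round_up16(uintptr_t n)
{
    return (n + 15) & ~(uintptr_t)15;
}

/* Toy model: the stack pointer the callee sees, given the stack
   pointer on entry to this function and the total bytes this function
   needs below it. The stack grows downward. Because the adjustment is
   a multiple of 16, sp mod 16 is preserved rather than assumed. */
uintptr_t sp_at_callee_entry(uintptr_t sp_on_entry, uintptr_t frame_bytes)
{
    uintptr_t adjust = round_up16(frame_bytes);

    assert(adjust <= sp_on_entry);  /* the model's stack must not underflow */
    return sp_on_entry - adjust;
}
```

With an old-style caller (entry sp of 0x1004, only 4-byte aligned) the
callee sees sp mod 16 equal to 4, same as on entry. With a 16-byte
aligned caller (entry sp of 0x1000) the callee sees sp mod 16 equal to
0, so callbacks keep the alignment they need.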

Rich

