This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: New optimized string routines for Intel and alignment of stack.
- From: Florian Weimer <fweimer at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Tue, 7 Jun 2016 11:52:04 +0200
- Subject: Re: New optimized string routines for Intel and alignment of stack.
- References: <57566200 dot 2040203 at redhat dot com>
On 06/07/2016 07:56 AM, Carlos O'Donell wrote:
> H.J.,
>
> We have had several users that have built legacy applications
> for 32-bit x86 with stack alignment that does not match the
> ABI.
Let's say the GNU project broke the i386 ABI, which is more accurate.
The stack pointer alignment requirement is a recent change.
> In all of these cases it has to do with the application
> having been compiled with -falign-stack=assume-4-byte which
> violates the ABI, usually with icc. However, if you're careful
> it all just works.
It will get worse with increased vectorization and GCC 6. We already
saw this on x86_64 with the non-compliant malloc in tcsh, where GCC 6
used vector instructions to copy a struct dirstream object. I assume
this could easily happen with any stack-to-stack copy with SSE2 enabled.
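To make the alignment question concrete, here is a minimal sketch (not part of the original message) of how code can test whether an address meets a given alignment. The helper names is_aligned and misalignment are hypothetical, chosen only for illustration. An aligned SSE2 load or store needs a 16-byte-aligned address, whereas a stack that is only kept 4-byte aligned guarantees no more than 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return nonzero if P is aligned to ALIGN bytes.
   ALIGN must be a nonzero power of two.  */
static int
is_aligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical helper: number of bytes P lies past the previous
   ALIGN-byte boundary (0 if P is already aligned).  */
static size_t
misalignment (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size_t) ((uintptr_t) p & (align - 1));
}
```

Calling is_aligned (&local, 16) on a local variable in a binary built with a 4-byte stack assumption may well return 0, which is exactly the case where an aligned vector access would fault.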
Currently, GCC does not seem to exploit the fact that it knows the
alignment of stack objects. I played with this:
struct fields
{
  double a, b;
};

struct fields get (void);
void put (struct fields *, struct fields *);

void
copy (void)
{
  struct fields f1 = get ();
  struct fields f2 = f1;
  put (&f1, &f2);
}
And: gcc -m32 -O3 -msse2 -march=westmere -mtune=westmere -o- -S stack-align.c
I expected to see an SSE load/store for the copy, but that's not what I got.
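For illustration only, the kind of SSE2 copy in question can be written by hand with intrinsics. This is a sketch under assumptions, not what GCC emits: the function name copy_fields_unaligned is hypothetical, and it deliberately uses the unaligned forms (_mm_loadu_pd and _mm_storeu_pd), which work at any address. The aligned forms (_mm_load_pd and _mm_store_pd) require a 16-byte-aligned address and would fault on a stack object that is only 4-byte aligned, which is the hazard discussed here. A memcpy fallback is included for builds without SSE2.

```c
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

struct fields
{
  double a, b;
};

/* Hypothetical helper: copy *SRC to *DST without assuming that either
   object is 16-byte aligned.  */
static void
copy_fields_unaligned (struct fields *dst, const struct fields *src)
{
#ifdef __SSE2__
  /* One 16-byte unaligned load and store covers both doubles.  */
  __m128d v = _mm_loadu_pd (&src->a);
  _mm_storeu_pd (&dst->a, v);
#else
  /* Portable fallback when SSE2 is not enabled.  */
  memcpy (dst, src, sizeof *dst);
#endif
}
```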
I think we need to decide whether we want to roll back the ABI change
before GCC learns about this optimization, because eventually it will
not just be a matter of string routines. Any glibc code optimized for
32-bit x86 CPUs with SSE2 enabled could be affected.
Florian