This is the mail archive of the
pthreads-win32@sourceware.org
mailing list for the pthreads-win32 project.
Re: Re: pthreads-win32 2.8.0, stack alignment, and SSE code
Sébastien Kunz-Jacques wrote:
Ramiro Polla wrote:
[...]
Imagine someone wants to use that ATLAS library but, instead of
starting a new thread, wants to call the function that needs SSE
directly (no, I haven't checked whether that is possible in this case,
but it could happen in theory). And imagine that person is using MSVC++
to call that function. MSVC++ only aligns the stack to 4 bytes (and
again, that is valid). That function would also crash, regardless of
your patch.
So in your specific case I think it is the ATLAS functions that should
be aligned (which would also help when using the library with other
compilers).
[...]
Actually, I have tried calling ATLAS from MSVC, and it appears to work.
I suspect that the ATLAS interface functions already realign the stack,
but I didn't check this (I am going to ask the ATLAS maintainer about
it). The problem that made ATLAS crash without the above fix is that
some internal ATLAS functions get started through pthreads, and those
definitely do not realign the stack.
Then I suspect it is only those functions that should need force_align.
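As an illustration of what realigning only the thread-started functions could look like, here is a minimal sketch. It assumes that "force_align" refers to GCC's `force_align_arg_pointer` function attribute on x86, which is not confirmed in this thread. The names `thread_call`, `aligned_thread_entry`, and `echo_routine` are hypothetical and are not part of pthreads-win32 or ATLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC-compatible compiler targeting x86. Elsewhere the
 * macro expands to nothing and the function is an ordinary one. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define FORCE_ALIGN __attribute__((force_align_arg_pointer))
#else
#define FORCE_ALIGN
#endif

/* Hypothetical bundle of the user's routine and its argument. */
struct thread_call {
    void *(*routine)(void *);
    void *arg;
};

/* Hypothetical thread entry point. With the attribute applied, the
 * compiler emits a prologue that realigns the stack before the body
 * runs, so the routine called from here starts on an aligned stack. */
FORCE_ALIGN void *aligned_thread_entry(void *p)
{
    struct thread_call *call = (struct thread_call *)p;
    assert(call != NULL && call->routine != NULL);
    return call->routine(call->arg);
}

/* Trivial routine used only to exercise the entry point. */
void *echo_routine(void *arg)
{
    return arg;
}
```

In this sketch only the entry point pays for the realignment. The routine it calls is compiled normally and keeps all of its registers available.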
[...]
>> Your patch can also be seen as a way to always sufficiently align the
>> stack so that any thread started by pthreads-win32 is ok for SSE
>> instructions (the same way glibc does I think). In that case I don't
>> have a strong opinion about it. The overhead really is negligible.
>> Starting the thread takes much longer.
[...]
Regarding your last comment, are you implying that the stack
realignment is slow? From the disassemblies I saw, it saves %esp in
another register, aligns %esp (andl $-16, %esp), and restores it in the
function epilogue. The main performance penalty therefore comes from
tying up one register, and this is a reason to do the alignment in a
function like threadStart instead of in the called function, if the
latter does some register-intensive task.
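The effect of `andl $-16, %esp` can be sketched in plain C: in two's complement, -16 has the same bit pattern as `~15`, so the AND clears the low four bits and rounds the address down to a multiple of 16. The helper names `align_down` and `align_padding` and the sample address used below are illustrative assumptions, not taken from pthreads-win32.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of alignment, which must be a power
 * of two. For alignment == 16 this computes addr & ~15, the same
 * operation as "andl $-16, %esp" applied to the stack pointer. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Number of bytes skipped by the rounding; always less than alignment. */
uintptr_t align_padding(uintptr_t addr, uintptr_t alignment)
{
    return addr - align_down(addr, alignment);
}
```

For example, a 4-byte-aligned stack pointer such as 0x0012FF7C becomes 0x0012FF70, skipping 12 bytes. The stack grows downward on x86, so rounding down never moves the pointer into space that is already in use.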
I didn't express myself very well then. I meant to say: "The overhead
really is negligible. Starting the thread takes much longer, so the
overhead in aligning the stack gets hidden away in the delay to start
the thread".
Ramiro Polla