This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Align the stack in __tls_get_addr [BZ #21609]
* Jakub Jelinek:
> On Fri, Jul 07, 2017 at 08:13:49PM +0300, Alexander Monakov wrote:
>> On Fri, 7 Jul 2017, Carlos O'Donell wrote:
>> > My apologies, when I wrote 'hot path' I was thinking of process startup
>> > where the first call to __tls_get_addr (for a given dtv entry) always goes
>> > through the slow path.
>> >
>> > The reason I still want to highlight this is that there is a non-zero
>> > cost paid, and it adds up over time with other decisions we make.
>>
>> In this case you're talking about literally 5 instructions on paths that
>> involve syscalls and thousands of other instructions.
>
> And it could be even 2 (have the fast path done for when __tls_get_addr
> has been called on that TLS object already first, then
> testb $15, %spl
> je __tls_get_addr_slow
> ! Now do the actual stack realignment and __tls_get_addr_slow call
>
> (or testl $15, %esp; whatever is faster).
I think the condition needs to be reversed, at the very least.
I don't know the exact nature of the GCC bug. If it only causes
misalignment by 8 bytes, this could work:

    testl $8, %esp
    /* Jump if the stack is already properly aligned for the function entry. */
    jne __tls_get_addr_slow
    /* Otherwise, __tls_get_addr was called with a misaligned stack, but this
       means the stack is properly aligned for a call instruction. */
    call __tls_get_addr_slow
    ret

But it really assumes that the stack pointer is congruent to either 0 or 8
(mod 16) on entry to the __tls_get_addr function.