This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10


On 11/27/2013 04:32 AM, Andreas Krebbel wrote:
> Hi Carlos,
> 
> On 27/11/13 07:03, Carlos O'Donell wrote:
>> Yes, _dl_sym and _dl_vsym use this e.g. ->do_sym->_dl_tls_symaddr.
>>
>> An asm with clobbers is the only way of making this work reliably.
>>
>> The asm could set r12 to GOT, call __tls_get_offset, restore r12
>> (or list it clobbered along with all the register __tls_get_offset
>> touches, and __tls_get_addr), save result, then the rest can be
>> in C and does the math required to compute the result.
>>
>> You have to avoid the compiler using r12 for something else in the
>> middle and the asm is the only way to avoid that.
> 
> When building with -fpic r12 is fixed and cannot be used by the compiler freely. So this is not
> supposed to happen.

I don't know the s390 backend well enough to comment, but if you say it
is true then I believe you. I was only commenting on issues I've seen
with other ports. Not to mention that future improvements could invalidate
what you've said. Why can't the compiler eventually reuse r12 if it finds
no PIC accesses via r12 are requried? As long as r12 is reloaded to point
to the GOT before a future use? I'm urging caution here, and that we design
a robust solution that is future proof so we don't need to come back to this
at a later stage.

> ...
>>>>  #ifdef PIC
>>>>  # define TLS_IE(x) \
>>>> -  ({ unsigned long __offset;						      \
>>>> +  ({ unsigned long __offset, __save;					      \
>>>>       asm ("bras %0,1f\n"						      \
>>>>  	  "0:\t.quad " #x "@gotntpoff\n"				      \
>>>> -	  "1:\tlg %0,0(%0)\n\t"						      \
>>>> -	  "lg %0,0(%0,%%r12):tls_load:" #x				      \
>>>> -	  : "=&a" (__offset) : : "cc" );				      \
>>>> +	  "1:\tlgr %1,%%r12\n\t"					      \
>>>> +	  "larl %%r12,_GLOBAL_OFFSET_TABLE_\n\t"			      \
>>>> +	  "lg %0,0(%0)\n\t"						      \
>>>> +	  "lg %0,0(%0,%%r12):tls_load:" #x "\n\t"			      \
>>>> +	  "lgr %%r12,%1"						      \
>>>> +	  : "=&a" (__offset), "=&a" (__save) : : "cc" );		      \
>>>>       (int *) (__builtin_thread_pointer() + __offset); })
>>>>  #else
>>>>  # define TLS_IE(x) \
>>>>
>>>
>>> For this code to work the GOT pointer is not required to be in r12. Can't you just set it up in a
>>> compiler chosen reg? Something like this:
>>
>> Why? This code attempts to emulate exactly what the compiler is going to do
>> and should be representative of the instruction sequences you'd see being
>> generated for TLS accesses. These macros are only ever used in testing and
>> never anywhere else so their speed is not important.
>>
>> I suggest leaving the longer sequences that Siddhesh has here that save/restore
>> r12 and follow the ABI.
> 
> All this happens inside the macro so I don't think the ABI is relevant here. It is only needed to
> add the GOT pointer to an offset.  Since we cannot do the add with a single instruction we need a
> scratch register. This can be r12 or any other GPR (except r0 of course).  I would prefer the
> shorter variant because well ... it's shorter :)

This macro is only ever used in testing TLS, thus shorter sequences
don't really impact application performance.

>From a maintenance perspective I would rather see these sequences
logically identical to what the compiler would generate for valid
ABI conforming code sequences (so I can compare it to disassembly
dumps of TLS variable accesses).

Does that make sense? It's a stronger argument than shorter for
shorter's sake? :-)

Cheers,
Carlos.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]