This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10


On Mon, Nov 25, 2013 at 11:28:43AM -0500, Carlos O'Donell wrote:
> On 11/25/2013 12:50 AM, Siddhesh Poyarekar wrote:
> > Hi,
> > 
> > -march=z10 eliminates GOT pointer loads for static variables whenever
> > it can, which exposes a bug in the TLS implementation in s390x.  The
> > TLS implementation assumes that the compiler will load the GOT pointer
> > (%r12) and implements __tls_get_offset with that assumption.  This
> > however is true only in cases where the compiler sets up TLS for a
> > __thread.  For cases where we call TLS_GET_ADDR directly (like in
> > dl-sym.c and in some test cases), we have to explicitly ensure that
> > %r12 is set up.
> 
> How are you ensuring r12 is "setup?"
> 
> The bug here seems to be that r12 != GOT, not that the compiler has
> failed to do anything special. This might have been a binutils
> issue or a compiler issue that moved r12 away from GOT to optimize
> loads. I can see that in tls-macros.h you need to artificially setup
> r12, but that's not a bug fix, that's simply fixing the test cases ;-)

The bug here is that the value that is added in __tls_get_offset
before it jumps to __tls_get_addr is not the same value that was
subtracted earlier.  In dl-sym.c, the value subtracted was the GOT
pointer and the value added was 0 (since r12 was not set to the GOT
pointer).  Likewise in the test case.  So we're fixing both, a valid
use case *and* a test case.

> >  
> > +# ifdef __s390x__
> > +#  define LOAD_R12(r12) asm ("lgr %0,%%r12" : "=r" (r12) ::)
> > +# elif defined __s390__
> > +#  define LOAD_R12(r12) asm ("lr %0,%%r12" : "=r" (r12) ::)
> > +# endif
> 
> You should add an `else ... error ..' case here in the even the compiler
> ever breaks these defines. If you don't want to do that I'd just put
> the __s390__ case in the else as is done in string.h for s390.
> 
> I know there is at least one other case of this kind of idef/elif/endif
> without an error case in dl-tls.h for s390, but that should also be made
> more robust or just accept that !defined __s390x__ == defined __s390__.

I'll do that for both cases in this file as a separate patch if it is
OK.  But then your suggestion to use __asm__("%r12") should eliminate
this?

> > +
> >  # define GET_ADDR_OFFSET \
> >    (ti->ti_offset - (unsigned long) __builtin_thread_pointer ())
> >  
> > +/* Subtract contents of r12 from __TI.  We don't really care if this is the
> > +   GOT pointer.  */
> 
> This comment is a bit obscure and oblique. Why don't you care that r12 is the
> GOT pointer? Is it just because the ABI says to use r12 and it wouldn't
> really matter what it pointed at as long as you had a register to use for an
> offset? 
> 
> I suggest rewriting this comment to explain why we use r12.
> 
> e.g.
> 
> /* The s390/s390x ABIs use r12 as a pointer to access the GOT and other data
>    via register+offset addressing. Their TLS ABI also uses r12 to access the
>    tls_index structure via r12+offset in __tls_get_offset. Thus knowing the
>    final address of the tls_index variable in __TLS_GET_ADDR we are forced
>    to subtract r12 here to get an offset which is what __tls_get_offset expects.
>    We don't use _GLOBAL_OFFSET_TABLE_ because that's wrong, r12 is not the GOT,
>    it is simply a register near the GOT to be used for register+offset
>    addressing.  */
> 
> Did I understand this correctly?

No.  in terms of the strict definition (which is relevant for
interoperability between gcc-generated code for __thread access and
glibc) %r12 must be set to the GOT pointer before calling
__tls_get_offset - it does not have to be something else.  The
compiler will generate the code to load the GOT pointer in %r12 and
generate the offset to pass to __tls_get_offset.

In our case in __TLS_GET_ADDR (ignoring TLS_IE for a moment since it
is just a test case fix), the compiler is not aware of a TLS access
and is hence under no obligation to load the GOT pointer in %r12, nor
is it responsible to generate the offset to the tls_index.  We have to
explicitly do that by subtracting the GOT pointer from the tls_index
structure passed to the __TLS_GET_ADDR macro and then setting up %r12
to the GOT pointer so that __tls_get_offset adds it back to the
offset.

However, since we have full control over these operations, we can
avoid loading the GOT pointer in %r12 and just get away with
subtracting whatever is there in %r12 so that adding it back in
__tls_get_offset gives us back the tls_index address.

> >  # define __TLS_GET_ADDR(__ti) \
> > -  ({ extern char _GLOBAL_OFFSET_TABLE_[] attribute_hidden;		  \
> > -     (void *) __tls_get_offset ((char *) (__ti) - _GLOBAL_OFFSET_TABLE_)  \
> > -     + (unsigned long) __builtin_thread_pointer (); })
> > +({									      \
> > +  unsigned long int r12;						      \
> > +  LOAD_R12 (r12);							      \
> 
> register unsigned long int r12 __asm__ ("%r12"); ?

That looks nicer too, will try it, thanks.

> > +  ((void *) __tls_get_offset ((unsigned long int) (__ti) - r12)		      \
> > +   + (unsigned long) __builtin_thread_pointer ());			      \
> > +})
> >  
> >  #endif
> >  
> > diff --git a/sysdeps/s390/s390-32/tls-macros.h b/sysdeps/s390/s390-32/tls-macros.h
> > index 8a0ad58..0a11998 100644
> > --- a/sysdeps/s390/s390-32/tls-macros.h
> > +++ b/sysdeps/s390/s390-32/tls-macros.h
> > @@ -8,12 +8,17 @@
> >  
> >  #ifdef PIC
> >  # define TLS_IE(x) \
> > -  ({ unsigned long __offset;						      \
> > +  ({ unsigned long __offset, __save;					      \
> >       asm ("bras %0,1f\n"						      \
> > -	  "0:\t.long " #x "@gotntpoff\n"				      \
> > -	  "1:\tl %0,0(%0)\n\t"						      \
> > -	  "l %0,0(%0,%%r12):tls_load:" #x				      \
> 
> This old code looks like it expects r12 to be setup to a specific
> value, but it's probably not given the optimizations you point
> out initially.

It expects GOT pointer to be loaded in %r12.

> > -	  : "=&a" (__offset) : : "cc" );				      \
> > +	  "0:\t.long _GLOBAL_OFFSET_TABLE_-0b\n\t"			      \
> > +	  ".long " #x "@gotntpoff\n"					      \
> 
> Compute the offset from the GOT to this literal pool.
> 
> > +	  "1:\tlr %1,%%r12\n\t"						      \
> 
> Save r12.
> 
> > +	  "l %%r12,0(%0)\n\t"                                                 \
> > +	  "la %%r12,0(%%r12,%0)\n\t"					      \
> > +	  "l %0,4(%0)\n\t"						      \
> > +	  "l %0,0(%0,%%r12):tls_load:" #x "\n\t"			      \
> 
> Set r12 to the GOT, set %0 to the offset of the GOT entry that holds the
> TLS variable offset and then do the load to get the value?
> 

Set r12 to the GOT and set %0 to x@gotntpoff.  So basr loads the
address of the word after itself (which is the location of .long
_GLOBAL_OFFSET_TABLE_-0b) into %0.  Dereference that and add that
value to its own address to get the GOT address.  The word after the
address in %0 is x@gotntpoff, i.e. the offset to compute the location
of the TLS variable.  Finally, add %r12 and the offset and dereference
to get the value.

> I reviewed the other macros and noticed they setup r12, so they are OK.
> However, they are artificially making r12 == GOT, but it need not be,
> it's just for the test.

No, they need to be == GOT, because the compiler will generate offsets
relative to GOT.

Siddhesh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]