This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org, Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>
- Date: Tue, 26 Nov 2013 15:27:50 +0530
- Subject: Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10
- Authentication-results: sourceware.org; auth=none
- References: <20131125055009 dot GK19834 at spoyarek dot pnq dot redhat dot com> <52937ABB dot 20009 at redhat dot com>
On Mon, Nov 25, 2013 at 11:28:43AM -0500, Carlos O'Donell wrote:
> On 11/25/2013 12:50 AM, Siddhesh Poyarekar wrote:
> > Hi,
> >
> > -march=z10 eliminates GOT pointer loads for static variables whenever
> > it can, which exposes a bug in the TLS implementation in s390x. The
> > TLS implementation assumes that the compiler will load the GOT pointer
> > (%r12) and implements __tls_get_offset with that assumption. This
> > however is true only in cases where the compiler sets up TLS for a
> > __thread. For cases where we call TLS_GET_ADDR directly (like in
> > dl-sym.c and in some test cases), we have to explicitly ensure that
> > %r12 is set up.
>
> How are you ensuring r12 is "setup?"
>
> The bug here seems to be that r12 != GOT, not that the compiler has
> failed to do anything special. This might have been a binutils
> issue or a compiler issue that moved r12 away from GOT to optimize
> loads. I can see that in tls-macros.h you need to artificially setup
> r12, but that's not a bug fix, that's simply fixing the test cases ;-)
The bug here is that the value that is added in __tls_get_offset
before it jumps to __tls_get_addr is not the same value that was
subtracted earlier. In dl-sym.c, the value subtracted was the GOT
pointer and the value added was 0 (since r12 was not set to the GOT
pointer). Likewise in the test case. So we're fixing both, a valid
use case *and* a test case.
> >
> > +# ifdef __s390x__
> > +# define LOAD_R12(r12) asm ("lgr %0,%%r12" : "=r" (r12) ::)
> > +# elif defined __s390__
> > +# define LOAD_R12(r12) asm ("lr %0,%%r12" : "=r" (r12) ::)
> > +# endif
>
> You should add an `else ... error ..' case here in the even the compiler
> ever breaks these defines. If you don't want to do that I'd just put
> the __s390__ case in the else as is done in string.h for s390.
>
> I know there is at least one other case of this kind of idef/elif/endif
> without an error case in dl-tls.h for s390, but that should also be made
> more robust or just accept that !defined __s390x__ == defined __s390__.
I'll do that for both cases in this file as a separate patch if it is
OK. But then your suggestion to use __asm__("%r12") should eliminate
this?
> > +
> > # define GET_ADDR_OFFSET \
> > (ti->ti_offset - (unsigned long) __builtin_thread_pointer ())
> >
> > +/* Subtract contents of r12 from __TI. We don't really care if this is the
> > + GOT pointer. */
>
> This comment is a bit obscure and oblique. Why don't you care that r12 is the
> GOT pointer? Is it just because the ABI says to use r12 and it wouldn't
> really matter what it pointed at as long as you had a register to use for an
> offset?
>
> I suggest rewriting this comment to explain why we use r12.
>
> e.g.
>
> /* The s390/s390x ABIs use r12 as a pointer to access the GOT and other data
> via register+offset addressing. Their TLS ABI also uses r12 to access the
> tls_index structure via r12+offset in __tls_get_offset. Thus knowing the
> final address of the tls_index variable in __TLS_GET_ADDR we are forced
> to subtract r12 here to get an offset which is what __tls_get_offset expects.
> We don't use _GLOBAL_OFFSET_TABLE_ because that's wrong, r12 is not the GOT,
> it is simply a register near the GOT to be used for register+offset
> addressing. */
>
> Did I understand this correctly?
No. in terms of the strict definition (which is relevant for
interoperability between gcc-generated code for __thread access and
glibc) %r12 must be set to the GOT pointer before calling
__tls_get_offset - it does not have to be something else. The
compiler will generate the code to load the GOT pointer in %r12 and
generate the offset to pass to __tls_get_offset.
In our case in __TLS_GET_ADDR (ignoring TLS_IE for a moment since it
is just a test case fix), the compiler is not aware of a TLS access
and is hence under no obligation to load the GOT pointer in %r12, nor
is it responsible to generate the offset to the tls_index. We have to
explicitly do that by subtracting the GOT pointer from the tls_index
structure passed to the __TLS_GET_ADDR macro and then setting up %r12
to the GOT pointer so that __tls_get_offset adds it back to the
offset.
However, since we have full control over these operations, we can
avoid loading the GOT pointer in %r12 and just get away with
subtracting whatever is there in %r12 so that adding it back in
__tls_get_offset gives us back the tls_index address.
> > # define __TLS_GET_ADDR(__ti) \
> > - ({ extern char _GLOBAL_OFFSET_TABLE_[] attribute_hidden; \
> > - (void *) __tls_get_offset ((char *) (__ti) - _GLOBAL_OFFSET_TABLE_) \
> > - + (unsigned long) __builtin_thread_pointer (); })
> > +({ \
> > + unsigned long int r12; \
> > + LOAD_R12 (r12); \
>
> register unsigned long int r12 __asm__ ("%r12"); ?
That looks nicer too, will try it, thanks.
> > + ((void *) __tls_get_offset ((unsigned long int) (__ti) - r12) \
> > + + (unsigned long) __builtin_thread_pointer ()); \
> > +})
> >
> > #endif
> >
> > diff --git a/sysdeps/s390/s390-32/tls-macros.h b/sysdeps/s390/s390-32/tls-macros.h
> > index 8a0ad58..0a11998 100644
> > --- a/sysdeps/s390/s390-32/tls-macros.h
> > +++ b/sysdeps/s390/s390-32/tls-macros.h
> > @@ -8,12 +8,17 @@
> >
> > #ifdef PIC
> > # define TLS_IE(x) \
> > - ({ unsigned long __offset; \
> > + ({ unsigned long __offset, __save; \
> > asm ("bras %0,1f\n" \
> > - "0:\t.long " #x "@gotntpoff\n" \
> > - "1:\tl %0,0(%0)\n\t" \
> > - "l %0,0(%0,%%r12):tls_load:" #x \
>
> This old code looks like it expects r12 to be setup to a specific
> value, but it's probably not given the optimizations you point
> out initially.
It expects GOT pointer to be loaded in %r12.
> > - : "=&a" (__offset) : : "cc" ); \
> > + "0:\t.long _GLOBAL_OFFSET_TABLE_-0b\n\t" \
> > + ".long " #x "@gotntpoff\n" \
>
> Compute the offset from the GOT to this literal pool.
>
> > + "1:\tlr %1,%%r12\n\t" \
>
> Save r12.
>
> > + "l %%r12,0(%0)\n\t" \
> > + "la %%r12,0(%%r12,%0)\n\t" \
> > + "l %0,4(%0)\n\t" \
> > + "l %0,0(%0,%%r12):tls_load:" #x "\n\t" \
>
> Set r12 to the GOT, set %0 to the offset of the GOT entry that holds the
> TLS variable offset and then do the load to get the value?
>
Set r12 to the GOT and set %0 to x@gotntpoff. So basr loads the
address of the word after itself (which is the location of .long
_GLOBAL_OFFSET_TABLE_-0b) into %0. Dereference that and add that
value to its own address to get the GOT address. The word after the
address in %0 is x@gotntpoff, i.e. the offset to compute the location
of the TLS variable. Finally, add %r12 and the offset and dereference
to get the value.
> I reviewed the other macros and noticed they setup r12, so they are OK.
> However, they are artificially making r12 == GOT, but it need not be,
> it's just for the test.
No, they need to be == GOT, because the compiler will generate offsets
relative to GOT.
Siddhesh