This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10


On 11/25/2013 12:50 AM, Siddhesh Poyarekar wrote:
> Hi,
> 
> -march=z10 eliminates GOT pointer loads for static variables whenever
> it can, which exposes a bug in the TLS implementation in s390x.  The
> TLS implementation assumes that the compiler will load the GOT pointer
> (%r12) and implements __tls_get_offset with that assumption.  This
> however is true only in cases where the compiler sets up TLS for a
> __thread.  For cases where we call TLS_GET_ADDR directly (like in
> dl-sym.c and in some test cases), we have to explicitly ensure that
> %r12 is set up.

How are you ensuring r12 is "setup?"

The bug here seems to be that r12 != GOT, not that the compiler has
failed to do anything special. This might have been a binutils
issue or a compiler issue that moved r12 away from GOT to optimize
loads. I can see that in tls-macros.h you need to artificially setup
r12, but that's not a bug fix, that's simply fixing the test cases ;-)

This is similar to other machines where a small data pointer or a
plt pointer is setup by the static linker to point near the GOT/PLT
to use for register+offset addressing to access as much data as 
optimally as possible. For example on hppa we have r19 as the GOT/PLT
pointer and the static linker moves it based on the size of the GOT
or PLT to optimize accesses and the final value (without load offset)
is stored into the binary in DT_PLTGOT.

> Attached patch does this for s390 and s390x.  It wasn't needed on
> s390, but I did it anyway to prevent a possible breakage in future.
> Tested and verified that the test cases are fixed and no new
> regressions were seen.  OK to commit?
> 
> Siddhesh
> 
> 	[BZ #16214]
> 	* sysdeps/s390/dl-tls.h (LOAD_R12): New macro.
> 	(__TLS_GET_ADDR): Use it.
> 	* sysdeps/s390/s390-32/tls-macros.h [PIC] (TLS_IE): Load GOT
> 	pointer.
> 	* sysdeps/s390/s390-64/tls-macros.h [PIC] (TLS_IE): Likewise.
> 
> diff --git a/sysdeps/s390/dl-tls.h b/sysdeps/s390/dl-tls.h
> index 68a5af4..92c32e6 100644
> --- a/sysdeps/s390/dl-tls.h
> +++ b/sysdeps/s390/dl-tls.h
> @@ -65,13 +65,24 @@ __tls_get_offset:\n\
>  #  endif
>  # endif
>  
> +# ifdef __s390x__
> +#  define LOAD_R12(r12) asm ("lgr %0,%%r12" : "=r" (r12) ::)
> +# elif defined __s390__
> +#  define LOAD_R12(r12) asm ("lr %0,%%r12" : "=r" (r12) ::)
> +# endif

You should add an `else ... error ..' case here in the even the compiler
ever breaks these defines. If you don't want to do that I'd just put
the __s390__ case in the else as is done in string.h for s390.

I know there is at least one other case of this kind of idef/elif/endif
without an error case in dl-tls.h for s390, but that should also be made
more robust or just accept that !defined __s390x__ == defined __s390__.

> +
>  # define GET_ADDR_OFFSET \
>    (ti->ti_offset - (unsigned long) __builtin_thread_pointer ())
>  
> +/* Subtract contents of r12 from __TI.  We don't really care if this is the
> +   GOT pointer.  */

This comment is a bit obscure and oblique. Why don't you care that r12 is the
GOT pointer? Is it just because the ABI says to use r12 and it wouldn't
really matter what it pointed at as long as you had a register to use for an
offset? 

I suggest rewriting this comment to explain why we use r12.

e.g.

/* The s390/s390x ABIs use r12 as a pointer to access the GOT and other data
   via register+offset addressing. Their TLS ABI also uses r12 to access the
   tls_index structure via r12+offset in __tls_get_offset. Thus knowing the
   final address of the tls_index variable in __TLS_GET_ADDR we are forced
   to subtract r12 here to get an offset which is what __tls_get_offset expects.
   We don't use _GLOBAL_OFFSET_TABLE_ because that's wrong, r12 is not the GOT,
   it is simply a register near the GOT to be used for register+offset
   addressing.  */

Did I understand this correctly?

>  # define __TLS_GET_ADDR(__ti) \
> -  ({ extern char _GLOBAL_OFFSET_TABLE_[] attribute_hidden;		  \
> -     (void *) __tls_get_offset ((char *) (__ti) - _GLOBAL_OFFSET_TABLE_)  \
> -     + (unsigned long) __builtin_thread_pointer (); })
> +({									      \
> +  unsigned long int r12;						      \
> +  LOAD_R12 (r12);							      \

register unsigned long int r12 __asm__ ("%r12"); ?

> +  ((void *) __tls_get_offset ((unsigned long int) (__ti) - r12)		      \
> +   + (unsigned long) __builtin_thread_pointer ());			      \
> +})
>  
>  #endif
>  
> diff --git a/sysdeps/s390/s390-32/tls-macros.h b/sysdeps/s390/s390-32/tls-macros.h
> index 8a0ad58..0a11998 100644
> --- a/sysdeps/s390/s390-32/tls-macros.h
> +++ b/sysdeps/s390/s390-32/tls-macros.h
> @@ -8,12 +8,17 @@
>  
>  #ifdef PIC
>  # define TLS_IE(x) \
> -  ({ unsigned long __offset;						      \
> +  ({ unsigned long __offset, __save;					      \
>       asm ("bras %0,1f\n"						      \
> -	  "0:\t.long " #x "@gotntpoff\n"				      \
> -	  "1:\tl %0,0(%0)\n\t"						      \
> -	  "l %0,0(%0,%%r12):tls_load:" #x				      \

This old code looks like it expects r12 to be setup to a specific
value, but it's probably not given the optimizations you point
out initially.

> -	  : "=&a" (__offset) : : "cc" );				      \
> +	  "0:\t.long _GLOBAL_OFFSET_TABLE_-0b\n\t"			      \
> +	  ".long " #x "@gotntpoff\n"					      \

Compute the offset from the GOT to this literal pool.

> +	  "1:\tlr %1,%%r12\n\t"						      \

Save r12.

> +	  "l %%r12,0(%0)\n\t"                                                 \
> +	  "la %%r12,0(%%r12,%0)\n\t"					      \
> +	  "l %0,4(%0)\n\t"						      \
> +	  "l %0,0(%0,%%r12):tls_load:" #x "\n\t"			      \

Set r12 to the GOT, set %0 to the offset of the GOT entry that holds the
TLS variable offset and then do the load to get the value?

> +	  "lr %%r12, %1\n"						      \

Restore r12.

> +	  : "=&a" (__offset), "=&a" (__save) : : "cc" );		      \
>       (int *) (__builtin_thread_pointer() + __offset); })

And lastly add TP + offset to the TLS variable address.

>  #else
>  # define TLS_IE(x) \
> diff --git a/sysdeps/s390/s390-64/tls-macros.h b/sysdeps/s390/s390-64/tls-macros.h
> index be8aa6c..a65047c 100644
> --- a/sysdeps/s390/s390-64/tls-macros.h
> +++ b/sysdeps/s390/s390-64/tls-macros.h
> @@ -8,12 +8,15 @@
>  
>  #ifdef PIC
>  # define TLS_IE(x) \
> -  ({ unsigned long __offset;						      \
> +  ({ unsigned long __offset, __save;					      \
>       asm ("bras %0,1f\n"						      \
>  	  "0:\t.quad " #x "@gotntpoff\n"				      \
> -	  "1:\tlg %0,0(%0)\n\t"						      \
> -	  "lg %0,0(%0,%%r12):tls_load:" #x				      \
> -	  : "=&a" (__offset) : : "cc" );				      \
> +	  "1:\tlgr %1,%%r12\n\t"					      \
> +	  "larl %%r12,_GLOBAL_OFFSET_TABLE_\n\t"			      \
> +	  "lg %0,0(%0)\n\t"						      \
> +	  "lg %0,0(%0,%%r12):tls_load:" #x "\n\t"			      \
> +	  "lgr %%r12,%1"						      \
> +	  : "=&a" (__offset), "=&a" (__save) : : "cc" );		      \

Similar to above. OK.

>       (int *) (__builtin_thread_pointer() + __offset); })
>  #else
>  # define TLS_IE(x) \
> 

I reviewed the other macros and noticed they setup r12, so they are OK.
However, they are artificially making r12 == GOT, but it need not be,
it's just for the test.

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]