ELF TLS technical details

Alan Modra amodra@bigpond.net.au
Fri Mar 31 23:19:00 GMT 2006


On Wed, Mar 29, 2006 at 05:31:13PM -0600, Steve Munroe wrote:
> The 32-bit TLS design was similar. Alan, Paul, do you have the PowerPC
> 32-bit TLS documentation?

Attached.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre
-------------- next part --------------
PowerPC Specific Thread Local Storage ABI
For insertion in http://people.redhat.com/drepper/tls.pdf


3.4.x  PowerPC32 Specific
-------------------------

The PowerPC32 TLS ABI is similar to the PowerPC64 model.  The thread-local
storage data structures follow variant I.  The TCB is 8 bytes, with the
first 4 bytes containing the pointer to the dynamic thread vector.
tlsoffset calculations and definition of __tls_get_addr are identical to
PowerPC64.  r2 is the thread pointer, and points 0x7000 past the end of the
thread control block.  Dynamic thread vector pointers point 0x8000 past the
start of each TLS block.  (*)  This allows the first 64K of each block to
be addressed from a dtv pointer using fewer machine instructions.  The tp
offset allows for efficient addressing of the TCB and up to 4K-8 of other
thread library information.

(*) For implementation reasons the actual value stored in dtv may point to
the start of a block, however values returned by accessor functions will be
offset by 0x8000.
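
The offsets above can be checked with a little arithmetic.  The following
is only an illustrative sketch: the function names and the example address
are made up, and it assumes the first TLS block starts immediately after
the TCB with no alignment padding.

```python
# Illustrative model only; these names are not part of the ABI.
TCB_SIZE = 8          # the TCB is 8 bytes
TP_OFFSET = 0x7000    # r2 points this far past the end of the TCB
DTV_OFFSET = 0x8000   # dtv pointers point this far past the block start


def thread_pointer(tcb_start):
    """Value of r2 for a TCB that starts at tcb_start."""
    return tcb_start + TCB_SIZE + TP_OFFSET


def d_form_range(base):
    """Lowest and highest address reachable from base with one signed
    16-bit displacement."""
    return base - 0x8000, base + 0x7fff


tcb = 0x10000000            # hypothetical TCB address
block = tcb + TCB_SIZE      # assumed: first TLS block follows the TCB

# A biased dtv pointer reaches exactly the first 64K of its block.
assert d_form_range(block + DTV_OFFSET) == (block, block + 0xffff)

# From r2, a single displacement reaches the TCB plus some space below it.
lowest, highest = d_form_range(thread_pointer(tcb))
below_tcb = tcb - lowest    # bytes reachable below the TCB: 4K-8
```

Here below_tcb works out to 4088 bytes, the "4K-8" quoted above.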


4.1.x  PowerPC32 General Dynamic TLS Model
------------------------------------------

The PowerPC32 general dynamic access model is similar to that for PowerPC64. 
The __tls_get_addr function is called with one parameter which is a pointer
to an object of type tls_index.  In the following code it is assumed that
register r31 points to the GOT.  Different registers may well be used.

Code sequence		Reloc			Sym
 addi 3,31,x@got@tlsgd	R_PPC_GOT_TLSGD16	x
 bl __tls_get_addr	R_PPC_REL24		__tls_get_addr

GOT[n]			R_PPC_DTPMOD32		x
GOT[n+1]		R_PPC_DTPREL32		x

The relocation specifier @got@tlsgd causes the linker to create an object
of type tls_index in the GOT.  The address of this object is loaded into
the first argument register with the addi instruction, then a standard
function call is made.
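
As a rough model of what the call computes, the sketch below treats the
dtv as a mapping from module index to block start address, matching the
footnote in 3.4.x about the stored value.  TlsIndex, make_tls_index and
tls_get_addr are hypothetical names used for illustration, and 32-bit
wraparound is ignored.

```python
from collections import namedtuple

DTV_OFFSET = 0x8000

# Mirrors the two GOT words: R_PPC_DTPMOD32 followed by R_PPC_DTPREL32.
TlsIndex = namedtuple("TlsIndex", ["module", "offset"])


def make_tls_index(module, sym_offset_in_block):
    """GOT pair after relocation: (x@dtpmod, x@dtprel)."""
    return TlsIndex(module, sym_offset_in_block - DTV_OFFSET)


def tls_get_addr(dtv, ti):
    """Model of __tls_get_addr; dtv maps module index to block start."""
    return dtv[ti.module] + DTV_OFFSET + ti.offset


dtv = {1: 0x20000000}                 # hypothetical block address
ti = make_tls_index(1, 0x24)          # x lives 0x24 bytes into the block
addr_of_x = tls_get_addr(dtv, ti)     # block start + 0x24
```

The 0x8000 bias subtracted when the GOT entry is built is added back by
the accessor, so the caller receives the plain address of x.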


4.2.x  PowerPC32 Local Dynamic TLS Model
----------------------------------------

This is similar to other architectures.  Two different sequences may be
used, depending on the size of the offset to the variable.

Code sequence		Reloc			Sym
 addi 3,31,x1@got@tlsld	R_PPC_GOT_TLSLD16	x1
 bl __tls_get_addr	R_PPC_REL24		__tls_get_addr
..
 addi 9,3,x1@dtprel	R_PPC_DTPREL16		x1
..
 addis 9,3,x2@dtprel@ha	R_PPC_DTPREL16_HA	x2
 addi 9,9,x2@dtprel@l	R_PPC_DTPREL16_LO	x2

GOT[n]			R_PPC_DTPMOD32		x1
GOT[n+1]		0

@got@tlsld in the first instruction causes the linker to generate a
tls_index object in the GOT with a fixed 0 offset.  The code shown assumes
that x1 is in the first 64k of the thread storage block, while x2 isn't.
If we wanted to load the values of x1 and x2 instead of the address, then
we could access int variables with

..
 lwz 0,x1@dtprel(3)	R_PPC_DTPREL16		x1
..
 addis 9,3,x2@dtprel@ha	R_PPC_DTPREL16_HA	x2
 lwz 0,x2@dtprel@l(9)	R_PPC_DTPREL16_LO	x2
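
The choice between the one- and two-instruction forms can be sketched as
follows.  This is an illustrative model with made-up names and addresses;
it makes a single __tls_get_addr call with a zero offset and then adds
each variable's dtprel value.

```python
DTV_OFFSET = 0x8000


def tls_get_addr(dtv, module, offset):
    """Model of __tls_get_addr; dtv maps module index to block start."""
    return dtv[module] + DTV_OFFSET + offset


def dtprel(offset_in_block):
    """x@dtprel for a variable at the given offset in its block."""
    return offset_in_block - DTV_OFFSET


def local_dynamic(dtv, module, offsets_in_block):
    """Return {name: (address, sequence)} using one call for the module."""
    base = tls_get_addr(dtv, module, 0)     # GOT[n+1] is 0
    result = {}
    for name, off in offsets_in_block.items():
        d = dtprel(off)
        short = -0x8000 <= d <= 0x7fff      # fits R_PPC_DTPREL16?
        result[name] = (base + d, "addi" if short else "addis+addi")
    return result


# x1 is inside the first 64K of the block, x2 is not.
r = local_dynamic({1: 0x20000000}, 1, {"x1": 0x10, "x2": 0x12000})
```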


4.3.x  PowerPC32 Initial Exec TLS Model
---------------------------------------

Code sequence		Reloc			Sym
 lwz 9,x@got@tprel(31)	R_PPC_GOT_TPREL16	x
 add 9,9,x@tls		R_PPC_TLS		x

GOT[n]			R_PPC_TPREL32		x

@got@tprel in the first instruction causes the linker to generate a GOT
entry with a relocation that the dynamic linker will replace with the
offset for x relative to the thread pointer.  x@tls tells the assembler to
use an r2 form of the instruction (i.e. add 9,9,2 in this case), and tag the
instruction with a reloc that indicates it belongs to a TLS sequence.  This
may later be used by the linker when optimizing TLS code.

To read the contents of the variable instead of calculating its address,
the "add 9,9,x@tls" instruction might be replaced with "lwzx 0,9,x@tls".
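
A minimal sketch of the arithmetic behind the two instructions, using
32-bit wraparound as the hardware would.  The helper names and addresses
are hypothetical, chosen only for illustration.

```python
TP_OFFSET = 0x7000
MASK32 = 0xffffffff


def tprel32(sym_addr, tcb_end):
    """Word stored in GOT[n] for R_PPC_TPREL32, as 32-bit two's
    complement."""
    tp = tcb_end + TP_OFFSET
    return (sym_addr - tp) & MASK32


def initial_exec_address(got_word, tp):
    """Effect of: lwz 9,x@got@tprel(31) ; add 9,9,2"""
    return (got_word + tp) & MASK32


tcb_end = 0x10000008              # hypothetical end of the TCB
tp = tcb_end + TP_OFFSET          # value held in r2
x = tcb_end + 0x10                # x is 0x10 bytes into the first block
got_word = tprel32(x, tcb_end)    # negative offset, stored as 0xffff9010
```

Because x lies below the biased thread pointer, the GOT word is a
negative displacement; adding it to r2 modulo 2**32 yields the address
of x.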


4.4.x  PowerPC32 Local Exec TLS Model
-------------------------------------

Two different sequences may be used, depending on the size of the offset to
the variable.  The first one handles offsets within 60K of the start of the
first TLS block (remember that r2 points 28K past the end of the TCB, which
is immediately prior to the first TLS block).

Code sequence		Reloc			Sym
 addi 9,2,x1@tprel	R_PPC_TPREL16		x1
..
 addis 9,2,x2@tprel@ha	R_PPC_TPREL16_HA	x2
 addi 9,9,x2@tprel@l	R_PPC_TPREL16_LO	x2
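
The 60K limit follows from the 0x7000 bias on r2.  The sketch below is an
illustration with made-up names; it measures offsets from the start of
the first TLS block and assumes that block directly follows the TCB.

```python
TP_OFFSET = 0x7000   # r2 bias past the end of the TCB


def tprel(offset_in_first_block):
    """x@tprel for a variable at the given offset in the first block."""
    return offset_in_first_block - TP_OFFSET


def local_exec_form(offset_in_first_block):
    """Pick the one- or two-instruction local exec sequence."""
    if -0x8000 <= tprel(offset_in_first_block) <= 0x7fff:
        return "addi"            # R_PPC_TPREL16 fits
    return "addis+addi"          # needs R_PPC_TPREL16_HA/_LO


last_short = 0xefff              # last byte below 60K
first_long = 0xf000              # 60K exactly
```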


5.x  PowerPC32 Linker Optimizations
-----------------------------------

The linker transformations for PowerPC32 are quite straightforward, since
all the relevant code sequences are two instructions long.

5.x.1  General Dynamic To Initial Exec
--------------------------------------

Code sequence		Reloc			Sym
 addi 3,31,x@got@tlsgd	R_PPC_GOT_TLSGD16	x
 bl __tls_get_addr	R_PPC_REL24		__tls_get_addr

GOT[n]			R_PPC_DTPMOD32		x
GOT[n+1]		R_PPC_DTPREL32		x

is replaced by

 lwz 3,x@got@tprel(31)	R_PPC_GOT_TPREL16	x
 add 3,3,2

GOT[n]			R_PPC_TPREL32		x

The linker relies on this sequence being emitted without intervening
instructions.  A register other than r31 may be used as the GOT pointer.

5.x.2  General Dynamic To Local Exec
------------------------------------

Code sequence		Reloc			Sym
 addi 3,31,x@got@tlsgd	R_PPC_GOT_TLSGD16	x
 bl __tls_get_addr	R_PPC_REL24		__tls_get_addr

GOT[n]			R_PPC_DTPMOD32		x
GOT[n+1]		R_PPC_DTPREL32		x

is replaced by

 addis 3,2,x@tprel@ha	R_PPC_TPREL16_HA	x
 addi 3,3,x@tprel@l	R_PPC_TPREL16_LO	x

The linker relies on this sequence being emitted without intervening
instructions.  A register other than r31 may be used as the GOT pointer.

5.x.3  Local Dynamic to Local Exec
----------------------------------

In this case, the function call is replaced with an equivalent code
sequence.  As shown, following dtprel sequences are left unchanged.

Code sequence		Reloc			Sym
 addi 3,31,x1@got@tlsld	R_PPC_GOT_TLSLD16	x1
 bl __tls_get_addr	R_PPC_REL24		__tls_get_addr
..
 addi 9,3,x1@dtprel	R_PPC_DTPREL16		x1
..
 addis 9,3,x2@dtprel@ha	R_PPC_DTPREL16_HA	x2
 addi 9,9,x2@dtprel@l	R_PPC_DTPREL16_LO	x2

GOT[n]			R_PPC_DTPMOD32		x1
GOT[n+1]		0

is replaced by

 addis 3,2,L@tprel@ha	R_PPC_TPREL16_HA	linker generated local sym
 addi 3,3,L@tprel@l	R_PPC_TPREL16_LO	linker generated local sym
..
 addi 9,3,x1@dtprel	R_PPC_DTPREL16		x1
..
 addis 9,3,x2@dtprel@ha	R_PPC_DTPREL16_HA	x2
 addi 9,9,x2@dtprel@l	R_PPC_DTPREL16_LO	x2

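The "linker generated local sym" points to the start of the thread storage
block plus 0x8000, the value __tls_get_addr would have returned, so that the
following dtprel sequences (whose values are biased by 0x8000) remain
valid.  In practice, a section symbol with a suitable offset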
will be used.  The linker relies on code for the __tls_get_addr call being
emitted without intervening instructions.  A register other than r31 may
be used as the GOT pointer.

5.x.4  Initial Exec To Local Exec
---------------------------------

Code sequence		Reloc			Sym
 lwz 9,x@got@tprel(31)	R_PPC_GOT_TPREL16	x
 add 9,9,x@tls		R_PPC_TLS		x

GOT[n]			R_PPC_TPREL32		x

is replaced by

 addis 9,2,x@tprel@ha	R_PPC_TPREL16_HA	x
 addi 9,9,x@tprel@l	R_PPC_TPREL16_LO	x

Other sizes and types of thread-local variables may use any of the X-FORM
indexed loads or stores.  The "lwz" and "add" instructions in this case may
have intervening code inserted by the compiler.

An example showing access to the contents of a variable:

Code sequence		Reloc			Sym
 lwz 9,x@got@tprel(31)	R_PPC_GOT_TPREL16	x
 lbzx 10,9,x@tls	R_PPC_TLS		x
 addi 10,10,1
 stbx 10,9,x@tls	R_PPC_TLS		x

GOT[n]			R_PPC_TPREL32		x

is replaced by

 addis 9,2,x@tprel@ha	R_PPC_TPREL16_HA	x
 lbz 10,x@tprel@l(9)	R_PPC_TPREL16_LO	x
 addi 10,10,1
 stb 10,x@tprel@l(9)	R_PPC_TPREL16_LO	x


6.x  New PowerPC32 ELF Definitions
----------------------------------

Reloc Name                  Value  Field         Expression
R_PPC_TLS                   67     none          (sym+add)@tls
R_PPC_DTPMOD32              68     word32        (sym+add)@dtpmod
R_PPC_TPREL16               69     half16*       (sym+add)@tprel
R_PPC_TPREL16_LO            70     half16        (sym+add)@tprel@l
R_PPC_TPREL16_HI            71     half16        (sym+add)@tprel@h
R_PPC_TPREL16_HA            72     half16        (sym+add)@tprel@ha
R_PPC_TPREL32               73     word32        (sym+add)@tprel
R_PPC_DTPREL16              74     half16*       (sym+add)@dtprel
R_PPC_DTPREL16_LO           75     half16        (sym+add)@dtprel@l
R_PPC_DTPREL16_HI           76     half16        (sym+add)@dtprel@h
R_PPC_DTPREL16_HA           77     half16        (sym+add)@dtprel@ha
R_PPC_DTPREL32              78     word32        (sym+add)@dtprel
R_PPC_GOT_TLSGD16           79     half16*       (sym+add)@got@tlsgd
R_PPC_GOT_TLSGD16_LO        80     half16        (sym+add)@got@tlsgd@l
R_PPC_GOT_TLSGD16_HI        81     half16        (sym+add)@got@tlsgd@h
R_PPC_GOT_TLSGD16_HA        82     half16        (sym+add)@got@tlsgd@ha
R_PPC_GOT_TLSLD16           83     half16*       (sym+add)@got@tlsld
R_PPC_GOT_TLSLD16_LO        84     half16        (sym+add)@got@tlsld@l
R_PPC_GOT_TLSLD16_HI        85     half16        (sym+add)@got@tlsld@h
R_PPC_GOT_TLSLD16_HA        86     half16        (sym+add)@got@tlsld@ha
R_PPC_GOT_TPREL16           87     half16*       (sym+add)@got@tprel
R_PPC_GOT_TPREL16_LO        88     half16        (sym+add)@got@tprel@l
R_PPC_GOT_TPREL16_HI        89     half16        (sym+add)@got@tprel@h
R_PPC_GOT_TPREL16_HA        90     half16        (sym+add)@got@tprel@ha

(sym+add)@tls
Merely causes the R_PPC_TLS marker reloc to be emitted.

(sym+add)@dtpmod
Computes the load module index of the load module that contains the
definition of sym.  The addend, if present, is ignored.

(sym+add)@dtprel
Computes a dtv-relative displacement, the difference between the value
of sym+add and the base address of the thread-local storage block that
contains the definition of sym, minus 0x8000.  The minus 0x8000 is because
dtv elements point to the start of the storage block plus 0x8000.

(sym+add)@tprel
Computes a tp-relative displacement, the difference between the value of
sym+add and the value of the thread pointer (r2).

(sym+add)@got@tlsgd
Allocates two contiguous entries in the GOT to hold a tls_index structure,
with values (sym+add)@dtpmod and (sym+add)@dtprel, and computes the offset
of the first entry within the GOT.

(sym+add)@got@tlsld
Allocates two contiguous entries in the GOT to hold a tls_index structure,
with values (sym+add)@dtpmod and zero, and computes the offset of the first
entry within the GOT.

(sym+add)@got@tprel
Allocates an entry in the GOT with value (sym+add)@tprel, and computes the
offset of the entry within the GOT.

@l, @h
These modifiers affect the value computed, returning respectively the low
16 bits or the high 16 bits of a 32-bit value.

@ha
This modifier is like the corresponding @h modifier, except it adjusts for
@l being treated as a signed number.

Relocations not using these modifiers (those flagged with `*' above) will
trigger a relocation failure if the value computed does not fit in the
field specified.
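
The @l, @h and @ha modifiers can be modelled as below.  This is a sketch
with made-up function names; addis_addi mimics an addis/addi pair, both of
which sign-extend their 16-bit immediate, to show why @ha rather than @h
must be paired with @l.

```python
MASK32 = 0xffffffff


def at_l(v):
    """Low 16 bits of v."""
    return v & 0xffff


def at_h(v):
    """High 16 bits of v."""
    return (v >> 16) & 0xffff


def at_ha(v):
    """High 16 bits, adjusted for @l being treated as signed."""
    return ((v + 0x8000) >> 16) & 0xffff


def sext16(h):
    """Sign-extend a 16-bit field."""
    return h - 0x10000 if h & 0x8000 else h


def addis_addi(base, v):
    """Effect of: addis r,base,v@ha ; addi r,r,v@l"""
    r = (base + (sext16(at_ha(v)) << 16)) & MASK32
    return (r + sext16(at_l(v))) & MASK32


v = 0x12348000    # low half has its top bit set
```

For this v, @h gives 0x1234 while @ha gives 0x1235; the extra one
compensates for the low half being added as -0x8000.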


