s390/s390x TLS ABI.

Martin Schwidefsky schwidefsky@de.ibm.com
Mon Jan 20 13:51:00 GMT 2003


Hi,
the TLS ABI for s390/s390x is finally done. It works, I got the glibc running
with TLS support switched on and TLS testcases succeed. We still can
change details in the ABI and I'd be happy if someone could review the
ABI document. It is appended and is meant for inclusion into the tls.pdf
by Ulrich Drepper.

blue skies,
  Martin.

\documentclass{article}

\begin{document}

\setcounter{section}{3}
\setcounter{subsection}{4}
\setcounter{subsubsection}{6}
\subsubsection{s390 specific}
The s390 ABI uses variant II of the thread-local storage data structures.
The size of the TCB does not matter for the ABI.
The thread pointer is stored in access register {\tt \%a0} and needs to
get extracted into a general purpose register before it can be used as
an address. One way to get the thread pointer from {\tt \%a0} to e.g.
{\tt \%r1} is by use of the {\tt ear} instruction:
\begin{verbatim}
   ear %r1, %a0
\end{verbatim}

The TLS blocks of the modules present at startup are allocated according to
variant II of the data structure layout and the offsets are computed with
the same formulas. The $tlsoffset_i$ values must be subtracted from the
thread register value.
\begin{eqnarray*}
tlsoffset_1 & = & {\tt round}(tlssize_1, align_1)\\
tlsoffset_{m+1} & = & {\tt round}(tlsoffset_m + tlssize_{m+1}, align_{m+1})
\end{eqnarray*}
for all $m$ in $1 \le m < M$ where $M$ is the total number of modules.
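
\medskip
As a minimal sketch of the formulas above, the following C fragment computes
the $tlsoffset_m$ values for the modules present at startup. The names
{\tt round\_up}, {\tt compute\_tls\_offsets} and {\tt tls\_block\_address}
are made up for this illustration and are not part of the ABI, and
$tlsoffset_0$ is assumed to be zero so that the first loop iteration
reproduces the formula for $tlsoffset_1$.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of align (align must not be zero). */
unsigned long round_up(unsigned long value, unsigned long align)
{
    assert(align != 0);
    return ((value + align - 1) / align) * align;
}

/* Fill tlsoffset[0..count-1]; array index i describes module i+1.
   Assumption for this sketch: tlsoffset_0 is 0. */
void compute_tls_offsets(const unsigned long *tlssize,
                         const unsigned long *align,
                         unsigned long *tlsoffset, size_t count)
{
    unsigned long previous = 0;
    for (size_t i = 0; i < count; i++) {
        tlsoffset[i] = round_up(previous + tlssize[i], align[i]);
        previous = tlsoffset[i];
    }
}

/* Variant II: the offset is subtracted from the thread pointer value. */
unsigned long tls_block_address(unsigned long thread_pointer,
                                unsigned long tlsoffset)
{
    return thread_pointer - tlsoffset;
}
```

For three modules with sizes 20, 8 and 100 and alignments 8, 4 and 16 the
sketch yields the offsets 24, 32 and 144.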

\medskip
The s390 ABI defines the {\tt \_\_tls\_get\_offset} function instead of
the standard {\tt \_\_tls\_get\_addr} function. The prototype is:
\begin{verbatim}
   extern unsigned long __tls_get_offset (unsigned long offset);
\end{verbatim}
The function has a second, hidden parameter. The caller needs to set up the
GOT register {\tt \%r12} to contain the address of the global offset table
of the caller's module. The {\tt offset} parameter when added to the value
of the GOT register yields the address of a {\tt tls\_index} structure
located in the caller's global offset table.  The type {\tt tls\_index}
is defined as
\begin{verbatim}
   typedef struct
     {
       unsigned long int ti_module;
       unsigned long int ti_offset;
     } tls_index;
\end{verbatim}
The return value of {\tt \_\_tls\_get\_offset} is an offset to the thread
pointer. To get the address of the requested variable the thread pointer
needs to be added to the return value. The use of {\tt \_\_tls\_get\_offset}
might seem more complicated than the standard {\tt \_\_tls\_get\_addr}
but for s390 the use of {\tt \_\_tls\_get\_offset} allows for better code
sequences.
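
\medskip
The contract of {\tt \_\_tls\_get\_offset} can be modelled in plain C. This
is only a sketch of the semantics described above, not the glibc
implementation: the GOT address that is passed in {\tt \%r12} becomes an
explicit parameter, and the array {\tt block\_base} (address of the TLS
block of each module for the current thread) as well as the function names
are assumptions made for the illustration.

```c
#include <assert.h>
#include <string.h>

typedef struct
  {
    unsigned long int ti_module;
    unsigned long int ti_offset;
  } tls_index;

/* Model of the call: got stands for the hidden GOT register parameter,
   offset is the GOT offset of the tls_index structure. The hypothetical
   block_base[m] holds the TLS block address of module m. */
unsigned long model_tls_get_offset(const unsigned char *got,
                                   unsigned long offset,
                                   const unsigned long *block_base,
                                   unsigned long thread_pointer)
{
    tls_index ti;

    assert(got != NULL && block_base != NULL);
    memcpy(&ti, got + offset, sizeof ti);    /* tls_index at GOT + offset */
    /* The result is relative to the thread pointer, not an address. */
    return block_base[ti.ti_module] + ti.ti_offset - thread_pointer;
}

/* What the caller does with the result: add the thread pointer. */
unsigned long model_variable_address(unsigned long thread_pointer,
                                     unsigned long returned_offset)
{
    return thread_pointer + returned_offset;
}
```

The returned value is usually "negative" for variant II; the unsigned
arithmetic wraps around and the addition of the thread pointer yields the
correct address again.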

\subsubsection{s390x specific}
The s390x ABI is a close match to the s390 ABI. The thread-local storage
data structures follow variant II. The size of the TCB does not matter for
the ABI. The thread pointer is stored in the pair of access registers
{\tt \%a0} and {\tt \%a1} with the higher 32 bits of the thread pointer in
{\tt \%a0} and the lower 32 bits in {\tt \%a1}. One way to get the thread
pointer into e.g. register {\tt \%r1} is to use the following sequence of
instructions:
\begin{verbatim}
   ear  %r1,%a0
   sllg %r1,%r1,32
   ear  %r1,%a1
\end{verbatim}
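
The effect of the three instructions can be written down in C as a small
model. The function name {\tt combine\_access\_registers} is hypothetical;
the sketch relies on the fact that {\tt ear} only replaces the lower 32 bits
of the target register and leaves the upper 32 bits untouched.

```c
#include <assert.h>
#include <stdint.h>

/* a0 holds the upper half, a1 the lower half of the thread pointer. */
uint64_t combine_access_registers(uint32_t a0, uint32_t a1)
{
    uint64_t r1 = a0;                        /* ear  %r1,%a0            */
    r1 <<= 32;                               /* sllg %r1,%r1,32         */
    r1 = (r1 & 0xffffffff00000000ULL) | a1;  /* ear  %r1,%a1 (low word) */
    return r1;
}
```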

The TLS block allocation of the modules present at startup uses the same
formulas for $tlsoffset_m$ as s390 and the s390x ABI uses the same
{\tt \_\_tls\_get\_offset} interface as s390.

\setcounter{section}{4}
\setcounter{subsection}{1}
\setcounter{subsubsection}{6}
\subsubsection{s390 General Dynamic TLS Model}
For the s390 general dynamic access model the compiler has to set up the
GOT register {\tt \%r12} before it can call {\tt \_\_tls\_get\_offset}.
The {\tt \_\_tls\_get\_offset} function gets one parameter which is
a GOT offset to an object of type {\tt tls\_index}. The return value
of the function call has to be added to the thread pointer to get the address
of the requested variable.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf General Dynamic Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bas & \tt \%r14,0(\%r6,\%r13) &
                \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x@tlsgd} &
                \tt R\_390\_TLS\_GD32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
& \multicolumn{2}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_DTPMOD & \tt x \\
& \multicolumn{2}{l|}{\tt GOT[n+1]} & \tt R\_390\_TLS\_DTPOFF & \tt x \\
\end{tabular}
}
\medskip

The {\tt R\_390\_TLS\_GD32} relocation created for the literal pool entry
{\tt x@tlsgd} instructs the linker to allocate a {\tt tls\_index} structure
in the GOT, occupying two consecutive GOT entries. These two GOT entries have
the relocations {\tt R\_390\_TLS\_DTPMOD} and {\tt R\_390\_TLS\_DTPOFF}
associated with them.
The {\tt R\_390\_TLS\_GDCALL} relocation tags the function call instruction
to {\tt \_\_tls\_get\_offset}. This instruction is subject to TLS model
optimization. The tag is necessary because the linker needs to know the
location of the call to be able to replace it with an instruction of a
different TLS model. How the instruction tag is specified in the assembler
syntax is up to the assembler implementation.

The instruction sequence is divided into four parts. The first part extracts
the thread pointer from {\tt \%a0} and loads the branch offset to
{\tt \_\_tls\_get\_offset}. The first part can be reused for other TLS
accesses. A second TLS access doesn't have to repeat these two instructions,
but can use {\tt \%r6} and {\tt \%r7} if these registers have not been
clobbered between the two TLS accesses. The second part is the core of the
TLS access. For every variable that is accessed by the general dynamic access
model these two instructions have to be present. The first loads the GOT offset
to the variable's {\tt tls\_index} structure from the literal pool and the
second calls {\tt \_\_tls\_get\_offset}. The third part uses the extracted
thread pointer in {\tt \%r7} and the offset in {\tt \%r2} returned by the
call to {\tt \_\_tls\_get\_offset} to perform an operation on the variable.
In the example the address of {\tt x} is loaded to register {\tt \%r8}. The
compiler can choose any other suitable instruction to access {\tt x},
for example a ``{\tt l~\%r8,0(\%r2,\%r7)}'' would load the content of
{\tt x} to {\tt \%r8}. That leaves room for optimizations in the
compiler. The fourth part is the literal pool that
needs to have an entry for the {\tt x@tlsgd} offset.

All the instructions in the general dynamic access model for s390 can be
scheduled freely by the compiler as long as the obvious data dependencies
are fulfilled and the registers {\tt \%r0} - {\tt \%r5} do not contain any
information that is still needed after the {\tt bas} instruction (they get
clobbered by the function call). Registers {\tt \%r6}, {\tt \%r7} and
{\tt \%r8} are not fixed; they can be replaced by any other suitable
register.

\subsubsection{s390x General Dynamic TLS Model}
The general dynamic access model for s390x is more or less a copy of the
general dynamic model for s390. The main differences are the more complicated
code for the thread pointer extraction, the use of the {\tt brasl}
instruction instead of {\tt bas} and the fact that s390x uses 64 bit
offsets.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf General Dynamic Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brasl & \tt \%r14,\_\_tls\_get\_offset@plt &
                       \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@tlsgd} &
                \tt R\_390\_TLS\_GD64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
& \multicolumn{2}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_DTPMOD & \tt x \\
& \multicolumn{2}{l|}{\tt GOT[n+1]} & \tt R\_390\_TLS\_DTPOFF & \tt x \\
\end{tabular}
}
\medskip

The relocations {\tt R\_390\_TLS\_GD64}, {\tt R\_390\_TLS\_DTPMOD} and
{\tt R\_390\_TLS\_DTPOFF} do the same as their s390 counterparts; only the
bit size of the relocation target is 64 bit instead of 32 bit.

\setcounter{section}{4}
\setcounter{subsection}{2}
\setcounter{subsubsection}{6}
\subsubsection{s390 Local Dynamic TLS Model}
The code sequence of the local dynamic TLS model for s390 does not provide
any advantage over the general dynamic model if only a single variable
is accessed. It is even slightly worse because an additional literal pool
entry is needed ({\tt x@tlsldm} and {\tt x@dtpoff} instead of just
{\tt x@tlsgd}) that has to get loaded and added to the return value of the
{\tt \_\_tls\_get\_offset} function call. The local dynamic model is much
better than the general dynamic model if more than a single local variable
is accessed because for every additional variable only a simple literal
pool load is needed instead of a full-blown function call.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Local Dynamic Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bas & \tt \%r14,0(\%r6,\%r13) &
                \tt R\_390\_TLS\_LDCALL & \tt x1 \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt l & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt l & \tt \%r9,.L4-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x1@tlsldm} &
                \tt R\_390\_TLS\_LDM32 & \tt x1 \\
\tt .L3: & \multicolumn{2}{l|}{\tt .long x1@dtpoff} &
                \tt R\_390\_TLS\_LDO32 & \tt x1 \\
\tt .L4: & \multicolumn{2}{l|}{\tt .long x2@dtpoff} &
                \tt R\_390\_TLS\_LDO32 & \tt x2 \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
& \multicolumn{2}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_DTPMOD & \tt x \\
\end{tabular}
}
\medskip

As in the IA-32 local dynamic TLS model, the {\tt x1@tlsldm}
expression in the literal pool instructs the assembler to emit an
{\tt R\_390\_TLS\_LDM32} relocation. The linker will create a special
{\tt tls\_index} object on the GOT for it with the {\tt ti\_offset}
element set to zero. The {\tt ti\_module} element will be filled with the
module ID of the module the code is in when it processes the
{\tt R\_390\_TLS\_LDM32} relocation. The literal pool entries
{\tt x1@dtpoff} and {\tt x2@dtpoff} are translated by the assembler
into {\tt R\_390\_TLS\_LDO32} relocations. The linker will calculate the
offsets for {\tt x1} and {\tt x2} in the TLS block for the module
and will write them to the literal pool.

The instruction sequence is divided into four parts. The first part is
analogous to the first part of the general dynamic model. The second part
calls {\tt \_\_tls\_get\_offset} with the GOT offset to the special
{\tt tls\_index} object created through the {\tt x1@tlsldm} entry in
the literal pool. The GOT register {\tt \%r12} has to be set up before
the call. After the third instruction in the second part of the code
sequence {\tt \%r8} contains the address of the thread local memory for
the module the code is in. Part three of the code sequence shows how the
addresses of the thread local variables {\tt x1} and {\tt x2} are
calculated. Part four shows the literal pool entries needed by the
code sequence.

All the instructions of the local dynamic code sequence can be scheduled
freely by the compiler as long as the obvious data dependencies are
fulfilled and the function call semantic of the {\tt bas} instruction
is taken into account.
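
\medskip
The arithmetic behind parts two and three of the local dynamic sequence can
be sketched in C. The function names are invented for this example; the
value {\tt ldm\_result} stands for whatever {\tt \_\_tls\_get\_offset}
returned for the special {\tt tls\_index} object.

```c
#include <assert.h>

/* Part two: a single call yields the start of this module's TLS block
   once the thread pointer has been added to the returned offset. */
unsigned long ld_module_base(unsigned long thread_pointer,
                             unsigned long ldm_result)
{
    return thread_pointer + ldm_result;
}

/* Part three: every further variable costs one literal pool load
   (its @dtpoff value) and one addition, but no function call. */
unsigned long ld_variable_address(unsigned long module_base,
                                  unsigned long dtpoff)
{
    return module_base + dtpoff;
}
```

With this split the cost of the function call is paid once per code
sequence and not once per variable.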

\subsubsection{s390x Local Dynamic TLS Model}
The local dynamic access model for s390x is similar to the s390 version.
The same differences as between the two general dynamic models for s390 vs.
s390x are present. The extraction of the thread pointer requires three
instructions instead of one, the branch to {\tt \_\_tls\_get\_offset} is
done with the {\tt brasl} instruction and the offsets are 64 bit wide
instead of 32 bit.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Local Dynamic Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brasl & \tt \%r14,\_\_tls\_get\_offset@plt &
                 \tt R\_390\_TLS\_LDCALL & \tt x1 \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt lg & \tt \%r9,.L2-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt lg & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x1@tlsldm} &
                \tt R\_390\_TLS\_LDM64 & \tt x1 \\
\tt .L2: & \multicolumn{2}{l|}{\tt .quad x1@dtpoff} &
                \tt R\_390\_TLS\_LDO64 & \tt x1 \\
\tt .L3: & \multicolumn{2}{l|}{\tt .quad x2@dtpoff} &
                \tt R\_390\_TLS\_LDO64 & \tt x2 \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_DTPMOD & \tt x \\
\end{tabular}
}
\medskip

\setcounter{section}{4}
\setcounter{subsection}{3}
\setcounter{subsubsection}{6}
\subsubsection{s390 Initial Exec TLS Model}
The code for the initial exec model is small and fast. The code has to
get the offset relative to the thread pointer from the GOT and add it
to the thread pointer. There are three different variants.
The position independent variant with a small GOT (-fpic) is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r9,x@gotntpoff(\%r12) &
                \tt R\_390\_TLS\_GOTIE12 & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \hline
\hspace{2em} & & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF32 & \tt x \\
\end{tabular}
}
\medskip

The {\tt R\_390\_TLS\_GOTIE12} relocation created for the expression
{\tt x@gotntpoff} causes the linker to generate a GOT entry with a
{\tt R\_390\_TLS\_TPOFF} relocation. {\tt x@gotntpoff} is replaced by
the linker with the 12 bit offset from the start of the GOT to the
generated GOT entry. The {\tt R\_390\_TLS\_TPOFF} relocation is
processed at program startup time by the dynamic linker.

\medskip
The position independent variant with a large GOT (-fPIC) is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt l & \tt \%r9,0(\%r8,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF32 & \tt x \\
\end{tabular}
}
\medskip

The {\tt R\_390\_TLS\_GOTIE32} relocation does the same as
{\tt R\_390\_TLS\_GOTIE12}; the difference is that the linker replaces
the {\tt x@gotntpoff} expression with a 32 bit GOT offset instead of
a 12 bit one.

\medskip
The variant without GOT pointer is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt l & \tt \%r9,0(\%r8) & \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@indntpoff} &
                \tt R\_390\_TLS\_IE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF32 & \tt x \\
\end{tabular}
}
\medskip

The {\tt R\_390\_TLS\_IE32} relocation instructs the linker
to create the same GOT entry as for {\tt R\_390\_TLS\_GOTIE\{12,32\}}
but the linker replaces the {\tt x@indntpoff} expression with the absolute
address of the created GOT entry. This makes the variant without GOT
pointer unsuitable for position independent code.
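
\medskip
All three variants compute the same thing and differ only in how the GOT
entry is located. A rough C model of the two position independent variants,
with the hypothetical function name {\tt ie\_variable\_address} and the GOT
passed as a byte array, looks like this:

```c
#include <assert.h>
#include <string.h>

/* got_offset is the byte offset of the GOT entry (the value the linker
   substitutes for x@gotntpoff). The entry itself holds the offset of the
   variable relative to the thread pointer and is filled in at program
   startup by the dynamic linker. */
unsigned long ie_variable_address(unsigned long thread_pointer,
                                  const unsigned char *got,
                                  unsigned long got_offset)
{
    unsigned long tpoff;

    assert(got != NULL);
    memcpy(&tpoff, got + got_offset, sizeof tpoff);  /* load the GOT entry */
    return thread_pointer + tpoff;                   /* the final la       */
}
```

No function call is involved, which is why the initial exec code is small
and fast.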

\subsubsection{s390x Initial Exec TLS Model}
The initial exec model for s390x works like the initial exec model
for s390. The position independent variant with a small GOT (-fpic) is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r9,x@gotntpoff(\%r12) &
                \tt R\_390\_TLS\_GOTIE12 & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \hline
\hspace{2em} & & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF64 & \tt x \\
\end{tabular}
}
\medskip

The position independent variant with a large GOT (-fPIC) is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt lg & \tt \%r9,0(\%r8,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF64 & \tt x \\
\end{tabular}
}
\medskip

The linker will replace {\tt x@gotntpoff} for {\tt R\_390\_TLS\_GOTIE64}
with a 64 bit GOT offset.

\medskip
The variant without GOT pointer is:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Initial Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt larl & \tt \%r8,x@indntpoff &
                \tt R\_390\_TLS\_IEENT & \tt x \\
& \tt lg & \tt \%r9,0(\%r8) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \hline
\hspace{2em} & & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF64 & \tt x \\
\end{tabular}
}
\medskip

The {\tt R\_390\_TLS\_IEENT} relocation causes {\tt x@indntpoff} to
be replaced with the relative offset from the {\tt larl} instruction
to the GOT entry. Because the instruction is pc relative the variant
without GOT pointer can be used in position independent code as well.

\setcounter{section}{4}
\setcounter{subsection}{4}
\setcounter{subsubsection}{6}
\subsubsection{s390 Local Exec TLS Model}
The local exec model for s390 consists only of adding the offset, which is
available as an immediate value, to the thread pointer. In general the
offset can be 32 bits wide, which requires a literal pool entry.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Local Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\ \cline{1-3}
& \tt la & \tt \%r9,0(\%r8,\%r7) \# \%r9 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

The linker resolves the {\tt R\_390\_TLS\_LE32} relocation to a
negative offset to the thread pointer.
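
\medskip
Why the offset is negative follows from the variant II layout. The sketch
below is an assumption derived from that layout and not a description of
the linker source: the TLS block starts {\tt tlsoffset} bytes below the
thread pointer, so a variable that lies {\tt offset\_in\_block} bytes into
the block ends up below the thread pointer as well. Both function names are
hypothetical.

```c
#include <assert.h>

/* Value a linker could store for x@ntpoff under the variant II layout. */
long le_ntpoff(unsigned long tlsoffset, unsigned long offset_in_block)
{
    /* The variable lies inside the block, hence below the thread pointer. */
    assert(offset_in_block < tlsoffset);
    return (long)offset_in_block - (long)tlsoffset;
}

/* The la instruction: thread pointer plus the (negative) offset. */
unsigned long le_variable_address(unsigned long thread_pointer, long ntpoff)
{
    return thread_pointer + (unsigned long)ntpoff;
}
```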

\subsubsection{s390x Local Exec TLS Model}
The local exec model for s390x differs from the s390 model only in the
thread pointer extraction and the size of the offset.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf Local Exec Model Code Sequence}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\ \cline{1-3}
& \tt la & \tt \%r9,0(\%r8,\%r7) \# \%r9 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

\setcounter{section}{5}
\setcounter{subsection}{5}
\subsection{s390 Linker Optimizations}
The s390 ABI defines the same four linker optimizations as IA-32. The
optimizations explain why the ABI uses the {\tt \_\_tls\_get\_offset}
function. All code sequences for s390 consist of basically three things:
1) extract the thread pointer, 2) get the offset of the requested variable
from the thread pointer and 3) perform an operation on the variable with an
index/base operand that combines the thread pointer and the offset (e.g.
{\tt la \%rx,0(\%ry,\%rz)}). All the optimizations have to do is to
change how the offset is acquired.

\subsubsection*{General Dynamic To Initial Exec}
The general dynamic access model is the most expensive one which makes
this transition the most important one. For the general dynamic access
model the code has to load a GOT offset from the literal pool and then
call {\tt \_\_tls\_get\_offset} to get back the offset of the variable
from the thread pointer. For the initial exec access model the code has
to load a GOT entry that contains the offset of the variable from the
thread pointer. One of the initial exec code variants uses a literal
pool entry for the GOT offset. This makes the transition simple: the
function call instruction is replaced by a load instruction and the
literal pool constant {\tt x@tlsgd} is replaced with {\tt x@gotntpoff}:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf GD $\rightarrow$ IE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bas & \tt \%r14,0(\%r6,\%r13) &
                \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x@tlsgd} &
                \tt R\_390\_TLS\_GD32 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt l & \tt \%r2,0(\%r2,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF32 & \tt x \\
\end{tabular}
}
\medskip

\subsubsection*{General Dynamic To Local Exec}
The optimization that turns the general dynamic code sequence into the
local exec code sequence is as simple as the general dynamic to initial
exec transition. The local exec code sequence loads the offset of
the variable to the thread pointer directly from the literal pool. The
function call instruction of the general dynamic code sequence is
turned into a nop and the literal pool constant {\tt x@tlsgd} is
replaced with {\tt x@ntpoff}:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf GD $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bas & \tt \%r14,0(\%r6,\%r13) &
                \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x@tlsgd} &
                \tt R\_390\_TLS\_GD32 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bc & \tt 0,0 \# nop & & \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip
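
The instruction rewrite of this transition can be illustrated with a few
lines of C. This is only a sketch of what a linker might do at the location
tagged by {\tt R\_390\_TLS\_GDCALL}, not code taken from an existing linker;
the function name is hypothetical. It uses the instruction encodings of the
architecture: {\tt bas} is a 4 byte instruction with opcode {\tt 0x4d} and
{\tt bc 0,0} is encoded as {\tt 47 00 00 00}. The replacement of the literal
pool constant is handled by the relocation of the literal pool entry and is
not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { OPCODE_BAS = 0x4d };   /* first byte of a "bas" instruction */

/* Overwrite the tagged 4 byte call with "bc 0,0". Returns 0 on success
   and -1 if the bytes at insn do not start with the bas opcode. */
int patch_gd_call_to_nop(uint8_t *insn)
{
    static const uint8_t nop_bc_0_0[4] = { 0x47, 0x00, 0x00, 0x00 };

    assert(insn != NULL);
    if (insn[0] != OPCODE_BAS)
        return -1;            /* not the expected call, leave it alone */
    memcpy(insn, nop_bc_0_0, sizeof nop_bc_0_0);
    return 0;
}
```

Because the nop has the same length as the call, no other instruction or
literal pool entry has to move.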

\subsubsection*{Local Dynamic To Local Exec}
The local dynamic to local exec code transition is a bit more complicated.
To get the address of a thread local variable in the local dynamic model
three things need to be added: the thread pointer, the (negative) offset to
the TLS block of the module the code is in and the offset to the variable
in the TLS block. The local exec code just has to add the thread pointer
to the (negative) offset to the variable from the thread pointer. The
transition is done by replacing the function call with a nop, the literal
pool constant {\tt x1@tlsldm} with 0 and the {\tt @dtpoff} constants with
{\tt @ntpoff}:

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf LD $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bas & \tt \%r14,0(\%r6,\%r13) &
                \tt R\_390\_TLS\_LDCALL & \tt x1 \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt l & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt l & \tt \%r9,.L4-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long x1@tlsldm} &
                \tt R\_390\_TLS\_LDM32 & \tt x1 \\
\tt .L3: & \multicolumn{2}{l|}{\tt .long x1@dtpoff} &
                \tt R\_390\_TLS\_LDO32 & \tt x1 \\
\tt .L4: & \multicolumn{2}{l|}{\tt .long x2@dtpoff} &
                \tt R\_390\_TLS\_LDO32 & \tt x2 \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt l & \tt \%r6,.L1-.L0(\%r13) & & \\
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r2,.L2-.L0(\%r13) & & \\
& \tt bc & \tt 0,0 \# nop & & \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt l & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt l & \tt \%r9,.L4-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long \_\_tls\_get\_offset@plt-.L0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .long 0} & & \\
\tt .L3: & \multicolumn{2}{l|}{\tt .long x1@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x1 \\
\tt .L4: & \multicolumn{2}{l|}{\tt .long x2@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x2 \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_DTPMOD & \tt x \\
\end{tabular}
}
\medskip
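
That the unchanged {\tt la} instructions still compute the right address
after the transition can be checked with a small C model. The function name
and parameter names are assumptions for the illustration: {\tt r2\_value}
is the content of {\tt \%r2} when the first {\tt la} executes, which is the
return value of {\tt \_\_tls\_get\_offset} before the transition and the
literal 0 afterwards, because the nop leaves the loaded literal in place.

```c
#include <assert.h>

/* Address produced by the two la instructions of the sequence. */
unsigned long ld_sequence_address(unsigned long thread_pointer,
                                  unsigned long r2_value,
                                  unsigned long variable_literal)
{
    unsigned long module_base = thread_pointer + r2_value; /* first la  */
    return module_base + variable_literal;                 /* second la */
}
```

Before the transition the two summands are the (negative) module offset and
the {\tt @dtpoff} value, afterwards they are 0 and the {\tt @ntpoff} value.
Both pairs have the same sum, so the result is the same address.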

\subsubsection*{Initial Exec To Local Exec}
The code transition from initial exec to local exec doesn't improve the
execution speed, but for two of the three initial exec variants one GOT
entry less is needed.

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf IE $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt l & \tt \%r9,0(\%r8,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE32 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt lr & \tt \%r9,\%r8 ; bcr 0,\%r0 & & \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf IE $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt l & \tt \%r9,0(\%r8) & \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@indntpoff} &
                \tt R\_390\_TLS\_GOTIE32 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\ \cline{1-3}
& \tt l & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt lr & \tt \%r9,\%r8 ; bcr 0,\%r0 & & \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .long x@ntpoff} &
                \tt R\_390\_TLS\_LE32 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

There is no IE $\rightarrow$ LE code transition for the small GOT
case because no literal pool entry exists where the modified constant
{\tt x@ntpoff} could be stored. For this case a slot in the GOT is
used for the constant.
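
\medskip
The effect of the two transitions shown above can be sketched in C.
This is only an illustrative model: the function names are hypothetical,
registers are modeled as plain integers, and the GOT is passed in as an
ordinary array instead of being addressed through {\tt \%r12}.
\begin{verbatim}
#include <stdint.h>

/* Initial exec: the literal pool holds a GOT offset (x@gotntpoff);
   the offset of x relative to the thread pointer is loaded from
   that GOT slot at run time. */
static uintptr_t ie_address(uintptr_t tp, const uintptr_t *got,
                            uintptr_t got_offset)
{
    /* l %r9,0(%r8,%r12) */
    uintptr_t tp_offset = got[got_offset / sizeof(uintptr_t)];
    /* la %r10,0(%r9,%r7); unsigned wrap-around models the
       negative offset */
    return tp + tp_offset;
}

/* Local exec: the literal pool holds the offset itself (x@ntpoff),
   so the load from the GOT becomes a register copy. */
static uintptr_t le_address(uintptr_t tp, uintptr_t ntpoff)
{
    uintptr_t tp_offset = ntpoff;   /* lr %r9,%r8 */
    return tp + tp_offset;          /* la %r10,0(%r9,%r7) */
}
\end{verbatim}
Both functions yield the same address when the GOT slot and the literal
pool constant hold the same offset, which is why the GOT entry is no
longer needed after the transition.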

\subsection{s390x Linker Optimizations}
The same four optimizations as for s390 are available for s390x. The
optimizations follow the same principles but use 64-bit instructions
instead of 32-bit instructions. The 6-byte {\tt brasl} instruction
is replaced with either the 6-byte {\tt lg} load instruction or the
6-byte {\tt brcl 0,.} nop. The 6-byte {\tt lg} instruction is replaced
with the 6-byte three-operand {\tt sllg} shift by 0 bits, which is used
instead of the more natural {\tt lgr} because {\tt lgr} is unfortunately
only 4 bytes long.
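
\medskip
Because every replacement has the same length as the instruction it
replaces, a linker can patch the code in place. The following C fragment
is a minimal sketch of that idea for the {\tt brasl} call site; it is not
taken from any linker. The helper name {\tt relax\_gdcall} is
hypothetical, and the byte encodings are written out from the
z/Architecture instruction formats and should be checked against the
architecture manual before being relied on.
\begin{verbatim}
#include <stdint.h>
#include <string.h>

enum { TLS_INSN_LEN = 6 };

/* GD -> IE: lg %r2,0(%r2,%r12) */
static const uint8_t LG_R2_GOT[TLS_INSN_LEN] =
    { 0xe3, 0x22, 0xc0, 0x00, 0x00, 0x04 };
/* GD -> LE and LD -> LE: brcl 0,. (a 6-byte nop) */
static const uint8_t BRCL_NOP[TLS_INSN_LEN] =
    { 0xc0, 0x04, 0x00, 0x00, 0x00, 0x00 };

/* Patch the 6-byte slot tagged with R_390_TLS_GDCALL.
   Returns 0 on success, -1 if the slot does not hold
   brasl %r14,<target>. */
static int relax_gdcall(uint8_t *insn, int to_local_exec)
{
    if (insn[0] != 0xc0 || insn[1] != 0xe5)
        return -1;
    memcpy(insn, to_local_exec ? BRCL_NOP : LG_R2_GOT, TLS_INSN_LEN);
    return 0;
}
\end{verbatim}
The check on the first two bytes reflects the role of the tag
relocation: it marks the instruction to be rewritten, and the rewrite is
only valid if that instruction has the expected form.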

\subsubsection*{General Dynamic to Initial Exec}

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf GD $\rightarrow$ IE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brasl & \tt \%r14,\_\_tls\_get\_offset@plt &
                       \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@tlsgd} &
                \tt R\_390\_TLS\_GD64 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt lg & \tt \%r2,0(\%r2,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\multicolumn{3}{l|}{\tt GOT[n]} & \tt R\_390\_TLS\_TPOFF64 & \tt x \\
\end{tabular}
}
\medskip
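
The three-instruction prologue {\tt ear}, {\tt sllg}, {\tt ear} used in
all s390x sequences assembles the 64-bit thread pointer from the two
32-bit access registers {\tt \%a0} and {\tt \%a1}. As a minimal sketch,
assuming that {\tt ear} replaces only the low 32 bits of the target
register, the computation can be modeled in C as follows (the function
name is hypothetical):
\begin{verbatim}
#include <stdint.h>

static uint64_t thread_pointer(uint32_t a0, uint32_t a1)
{
    uint64_t r7 = a0;   /* ear  %r7,%a0: low half = high word of TP */
    r7 <<= 32;          /* sllg %r7,%r7,32: move it to the top */
    /* ear %r7,%a1: low half = low word of TP, high half kept */
    r7 = (r7 & 0xffffffff00000000ULL) | a1;
    return r7;
}
\end{verbatim}
For example, {\tt \%a0 = 0x12345678} and {\tt \%a1 = 0x9abcdef0} give
the thread pointer {\tt 0x123456789abcdef0}.
\medskip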

\subsubsection*{General Dynamic to Local Exec}

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf GD $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brasl & \tt \%r14,\_\_tls\_get\_offset@plt &
                       \tt R\_390\_TLS\_GDCALL & \tt x \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@tlsgd} &
                \tt R\_390\_TLS\_GD64 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brcl & \tt 0,. & & \\ \cline{1-3}
& \tt la & \tt \%r8,0(\%r2,\%r7) \# \%r8 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

\subsubsection*{Local Dynamic to Local Exec}

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf LD $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brasl & \tt \%r14,\_\_tls\_get\_offset@plt &
                 \tt R\_390\_TLS\_LDCALL & \tt x1 \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt lg & \tt \%r9,.L2-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt lg & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x1@tlsldm} &
                \tt R\_390\_TLS\_LDM64 & \tt x1 \\
\tt .L2: & \multicolumn{2}{l|}{\tt .quad x1@dtpoff} &
                \tt R\_390\_TLS\_LDO64 & \tt x1 \\
\tt .L3: & \multicolumn{2}{l|}{\tt .quad x2@dtpoff} &
                \tt R\_390\_TLS\_LDO64 & \tt x2 \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r2,.L1-.L0(\%r13) & & \\
& \tt brcl & \tt 0,. & & \\
& \tt la & \tt \%r8,0(\%r2,\%r7) & & \\ \cline{1-3}
& \tt lg & \tt \%r9,.L2-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x1 & & \\
& \tt lg & \tt \%r9,.L3-.L0(\%r13) & & \\
& \tt la & \tt \%r10,0(\%r9,\%r8) \# \%r10 = \&x2 & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad 0} & & \\
\tt .L2: & \multicolumn{2}{l|}{\tt .quad x1@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x1 \\
\tt .L3: & \multicolumn{2}{l|}{\tt .quad x2@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x2 \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip
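
The zeroed literal pool entry {\tt .L1} is what keeps the rest of the
sequence unchanged: after the call has been replaced by a nop,
{\tt \%r2} holds 0, so {\tt \%r8} is simply the thread pointer and each
{\tt @ntpoff} constant is added to it directly. A minimal C model of the
address arithmetic before and after the transition, with hypothetical
function names and registers modeled as plain integers:
\begin{verbatim}
#include <stdint.h>

/* Local dynamic: block_off is the value returned by the call,
   the offset of the module's TLS block relative to the thread
   pointer. */
static uintptr_t ld_address(uintptr_t tp, uintptr_t block_off,
                            uintptr_t dtpoff)
{
    uintptr_t r8 = tp + block_off;  /* la %r8,0(%r2,%r7) */
    return r8 + dtpoff;             /* la %r10,0(%r9,%r8) */
}

/* After LD -> LE: .L1 holds 0 and the call is a nop. */
static uintptr_t le_from_ld_address(uintptr_t tp, uintptr_t ntpoff)
{
    uintptr_t r2 = 0;               /* lg %r2,.L1-.L0(%r13) */
    uintptr_t r8 = tp + r2;         /* la %r8,0(%r2,%r7) */
    return r8 + ntpoff;             /* la %r10,0(%r9,%r8) */
}
\end{verbatim}
The two functions agree whenever {\tt ntpoff} equals
{\tt block\_off + dtpoff}, which is the value the linker stores in the
literal pool in this model.
\medskip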

\subsubsection*{Initial Exec to Local Exec}

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf IE $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt lg & \tt \%r9,0(\%r8,\%r12) &
                \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@gotntpoff} &
                \tt R\_390\_TLS\_GOTIE64 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt sllg & \tt \%r9,\%r8,0 & & \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

\medskip
\centerline{\small
\begin{tabular}{ll@{ }l|lc}
\multicolumn{3}{l}{\bf IE $\rightarrow$ LE Code Transition}
                        & \bf Initial Relocation & \bf Symbol \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt lg & \tt \%r9,0(\%r8) & \tt R\_390\_TLS\_LOAD & \tt x \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@indntpoff} &
                \tt R\_390\_TLS\_GOTIE64 & \tt x \\ \hline
\multicolumn{3}{c}{$\Downarrow$} & \multicolumn{1}{c}{$\Downarrow$} &
                $\Downarrow$ \\ \hline
& \tt ear & \tt \%r7,\%a0 & & \\
& \tt sllg & \tt \%r7,\%r7,32 & & \\
& \tt ear & \tt \%r7,\%a1 & & \\ \cline{1-3}
& \tt lg & \tt \%r8,.L1-.L0(\%r13) & & \\
& \tt sllg & \tt \%r9,\%r8,0 & & \\ \cline{1-3}
& \tt la & \tt \%r10,0(\%r9,\%r7) \# \%r10 = \&x & & \\ \cline{1-3}
\tt .L0: & \multicolumn{2}{l|}{\tt \# literal pool, address in \%r13} & & \\
\tt .L1: & \multicolumn{2}{l|}{\tt .quad x@ntpoff} &
                \tt R\_390\_TLS\_LE64 & \tt x \\ \hline
& & & \multicolumn{2}{c}{\bf Outstanding Relocations} \\
\end{tabular}
}
\medskip

\newpage

\setcounter{section}{6}
\setcounter{subsection}{5}
\subsection{New s390/s390x ELF Definitions}

\begin{verbatim}
#define R_390_TLS_LOAD    37 /* Tag for load insn in TLS code */
#define R_390_TLS_GDCALL  38 /* Tag for call insn in TLS code */
#define R_390_TLS_LDCALL  39 /* Tag for call insn in TLS code */
#define R_390_TLS_GD32    40 /* Direct 32 bit for general dynamic
                                thread local data */
#define R_390_TLS_GD64    41 /* Direct 64 bit for general dynamic
                                thread local data */
#define R_390_TLS_GOTIE12 42 /* 12 bit GOT offset for static TLS
                                block offset */
#define R_390_TLS_GOTIE32 43 /* 32 bit GOT offset for static TLS
                                block offset */
#define R_390_TLS_GOTIE64 44 /* 64 bit GOT offset for static TLS
                                block offset */
#define R_390_TLS_LDM32   45 /* Direct 32 bit for local dynamic
                                thread local data in LE code */
#define R_390_TLS_LDM64   46 /* Direct 64 bit for local dynamic
                                thread local data in LE code */
#define R_390_TLS_IE32    47 /* 32 bit address of GOT entry for
                                negated static TLS block offset */
#define R_390_TLS_IE64    48 /* 64 bit address of GOT entry for
                                negated static TLS block offset */
#define R_390_TLS_IEENT   49 /* 32 bit rel. offset to GOT entry for
                                negated static TLS block offset */
#define R_390_TLS_LE32    50 /* 32 bit negated offset relative
                                to static TLS block */
#define R_390_TLS_LE64    51 /* 64 bit negated offset relative
                                to static TLS block */
#define R_390_TLS_LDO32   52 /* 32 bit offset relative to TLS
                                block */
#define R_390_TLS_LDO64   53 /* 64 bit offset relative to TLS
                                block */
#define R_390_TLS_DTPMOD  54 /* ID of module containing symbol */
#define R_390_TLS_DTPOFF  55 /* Offset in TLS block */
#define R_390_TLS_TPOFF   56 /* Negated offset in static TLS
                                block */
\end{verbatim}
The operators used in the code sequences are defined as follows:
\begin{list}{}{\leftmargin2em\itemindent-2em}
\item {\tt @tlsgd} Allocate two contiguous entries in the GOT to hold a
{\tt tls\_index} structure. The value of the expression {\tt x@tlsgd}
is the offset from the start of the GOT to the {\tt tls\_index} structure
for the symbol {\tt x}. Calling {\tt \_\_tls\_get\_offset} with the
GOT offset of the {\tt tls\_index} structure of {\tt x} returns the offset
of the thread local variable {\tt x} relative to the TCB pointer. The
{\tt @tlsgd} operator may be used only in the general dynamic access model
as shown above.
\item {\tt @tlsldm} Allocate two contiguous entries in the GOT to hold
a {\tt tls\_index} structure. The {\tt ti\_offset} field of the object
will be set to 0 (zero) and the {\tt ti\_module} field will be filled
in at run-time. The value of the expression {\tt x@tlsldm} is the
offset from the start of the GOT to this special {\tt tls\_index}
structure. Calling {\tt \_\_tls\_get\_offset} with the GOT offset of
this special {\tt tls\_index} structure returns the offset of the
dynamic TLS block relative to the TCB pointer. The {\tt @tlsldm} operator
may be used only in the local dynamic access model as shown above.
\item {\tt @dtpoff} Calculate the offset of the variable relative to the
start of the TLS block it is contained in. The {\tt @dtpoff} operator may
be used only in the local dynamic access model as shown above.
\item {\tt @ntpoff} The value of the expression {\tt x@ntpoff} is the
offset of the thread local variable {\tt x} relative to the TCB pointer. No
GOT entry is created in this case. The {\tt @ntpoff} operator may be used
only in the local exec model as shown above.
\item {\tt @gotntpoff} Allocate a GOT entry to hold the offset of a variable
in the initial TLS block relative to the TCB pointer. The value of the
expression {\tt x@gotntpoff} is the offset in the GOT of the allocated entry.
The {\tt @gotntpoff} operator may be used only in the initial exec model as
shown above.
\item {\tt @indntpoff}
This expression is similar to {\tt @gotntpoff}. The difference is that the
value of {\tt x@indntpoff} is not a GOT offset but the address of the allocated
GOT entry itself. It is used in position-dependent code and in combination
with the {\tt larl} instruction. The {\tt @indntpoff} operator may be used
only in the initial exec model as shown above.
\end{list}
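
\medskip
The relationship between the {\tt @tlsgd} and {\tt @tlsldm} GOT entries
and the value returned by {\tt \_\_tls\_get\_offset} can be sketched in C.
This is a simplified model, not the run-time implementation: the function
names and the {\tt block\_start} array (the address of each module's TLS
block for the current thread) are hypothetical, and the GOT address and
thread pointer are passed explicitly instead of in {\tt \%r12} and
{\tt \%a0}.
\begin{verbatim}
#include <string.h>

/* The two contiguous GOT words reserved by @tlsgd and @tlsldm. */
typedef struct {
    unsigned long ti_module;
    unsigned long ti_offset;
} tls_index;

/* Model of __tls_get_offset: offset is the value of x@tlsgd or
   x@tlsldm, i.e. the GOT offset of a tls_index structure. */
static unsigned long model_tls_get_offset(const unsigned char *got,
                                          unsigned long offset,
                                          const unsigned long *block_start,
                                          unsigned long tp)
{
    tls_index ti;
    memcpy(&ti, got + offset, sizeof ti);
    /* Result is relative to the TCB pointer; unsigned wrap-around
       models a negative offset. */
    return block_start[ti.ti_module] + ti.ti_offset - tp;
}

/* The caller adds the thread pointer to obtain the address. */
static unsigned long model_address(unsigned long tp,
                                   unsigned long tls_offset)
{
    return tp + tls_offset;
}
\end{verbatim}
With {\tt ti\_offset} set to 0, as for {\tt @tlsldm}, the same function
returns the offset of the start of the module's TLS block, to which the
{\tt @dtpoff} value of each variable is then added.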

\end{document}


