Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Hi, Evandro,

Sorry that it took so long for me to get back to you after the GCC
Summit.  I've been quite busy and couldn't focus on this issue for a
while.

Here's an updated patch that should address all of your concerns.  The
proposed ABI changes have been stable for almost a year, and in
the meantime we've ported it to one more platform (ARM), so I believe
this is rock solid now.

Let me know what you think about the proposed changes.  They document
what's implemented in GNU binutils, GCC and the pending patches I have
for glibc, which I'm retesting after updating them to a current tree.
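
For readers who want a quick mental model of the R_X86_64_TLSDESC
description in the patch below, here is a minimal C sketch.  It is only
an illustration under stated assumptions: the names struct tlsdesc,
tlsdesc_return_arg, tls_address and tlsdesc_demo are hypothetical, the
thread pointer is simulated with an ordinary buffer, and plain C cannot
express the real convention (descriptor pointer passed and result
returned in %rax, no other registers modified).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a TLS descriptor: a pair of word64 GOT slots,
 * a function pointer followed by an argument for that function. */
struct tlsdesc {
    ptrdiff_t (*entry)(struct tlsdesc *);
    intptr_t arg;
};

/* Simplest conceivable resolver (an assumption, not glibc's code):
 * the argument already holds the offset from the thread pointer. */
static ptrdiff_t tlsdesc_return_arg(struct tlsdesc *td)
{
    return (ptrdiff_t)td->arg;
}

/* Model of an access: thread pointer plus the offset that the
 * descriptor's function computes from its own descriptor. */
static void *tls_address(char *thread_pointer, struct tlsdesc *td)
{
    return thread_pointer + td->entry(td);
}

/* Self-check with a simulated TLS block that ends at the simulated
 * thread pointer; the variable sits 16 bytes below it. */
static int tlsdesc_demo(void)
{
    static char tls_block[64];
    char *tp = tls_block + sizeof tls_block;
    struct tlsdesc td = { tlsdesc_return_arg, -16 };
    void *addr = tls_address(tp, &td);

    assert(addr == (void *)(tls_block + 48));
    return addr == (void *)(tls_block + 48);
}
```

The sketch only covers the case where the offset is already known.  A
lazily relocated descriptor would instead start out pointing at the
resolver reached through DT_TLSDESC_PLT, as described in the dl.tex
hunk.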


for ChangeLog
from  Alexandre Oliva  <>

	* object-files.tex (Relocation Types): Add
	R_X86_64_TLSDESC.  Add pointer to description.  Add short
	description of all TLS relocations.  Fix typo in DTPMOD64.
	* dl.tex (Procedure Linkage Table): Mention lazy relocation of TLS
	descriptors.  Add short description.

Index: dl.tex
--- dl.tex.orig	2006-10-08 16:53:13.000000000 -0300
+++ dl.tex	2006-10-08 17:39:44.000000000 -0300
@@ -265,6 +265,22 @@ evaluates procedure linkage table entrie
 resolution and relocation until the first execution of a table entry.
 \index{procedure linkage table|)}
+Relocation entries of type \codeindex{R_X86_64_TLSDESC} may also be
+subject to lazy relocation, using a single entry in the procedure
+linkage table and in the global offset table, at locations given by
+\texttt{DT_TLSDESC_PLT} and \texttt{DT_TLSDESC_GOT}, respectively, as
+described in ``Thread-Local Storage Descriptors for IA32 and
+AMD64/EM64T''\footnote{This document is currently available via
+  \url{}}.
+For self-containment, \texttt{DT_TLSDESC_GOT} specifies a GOT entry in
+which the dynamic loader should store the address of its internal TLS
+Descriptor resolver function, whereas \texttt{DT_TLSDESC_PLT}
+specifies the address of a PLT entry to be used as the TLS descriptor
+resolver function for lazy resolution from within this module.  The
+PLT entry must push the link map of the module onto the stack and
+tail-call the internal TLS Descriptor resolver function.
 \subsubsection{Large Models}
 In the small and medium code models the size of both the PLT and the GOT
Index: object-files.tex
--- object-files.tex.orig	2006-10-08 16:53:13.000000000 -0300
+++ object-files.tex	2006-10-08 17:46:49.000000000 -0300
@@ -435,7 +435,7 @@ the relocation addend.
       \texttt{R_X86_64_PC16}  & 13 & \textit{word16} & \texttt{S + A - P} \\
       \texttt{R_X86_64_8}     & 14 & \textit{word8} & \texttt{S + A} \\
       \texttt{R_X86_64_PC8}   & 15 & \textit{word8} & \texttt{S + A - P} \\
-      \texttt{R_X86_64_DPTMOD64}   & 16 & \textit{word64} &  \\
+      \texttt{R_X86_64_DTPMOD64}   & 16 & \textit{word64} &  \\
       \texttt{R_X86_64_DTPOFF64}   & 17 & \textit{word64} &  \\
       \texttt{R_X86_64_TPOFF64}   & 18 & \textit{word64} &  \\
       \texttt{R_X86_64_TLSGD}   & 19 & \textit{word32} &  \\
@@ -448,6 +448,9 @@ the relocation addend.
       \texttt{R_X86_64_GOTPC32} & 26 & \textit{word32} & \texttt{GOT + A - P} \\
       \texttt{R_X86_64_SIZE32} & 32 & \textit{word32} & \texttt{Z + A} \\
       \texttt{R_X86_64_SIZE64} & 33 & \textit{word64} & \texttt{Z + A} \\
+      \texttt{R_X86_64_GOTPC32_TLSDESC} & 34 & \textit{word32} &  \\
+      \texttt{R_X86_64_TLSDESC_CALL} & 35 & none &  \\
+      \texttt{R_X86_64_TLSDESC} & 36 & \textit{word64}$\times 2$ & \\
 %      \texttt{R_X86_64_GOT64} & 16 & \textit{word64} & \texttt{G + A} \\
 %      \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
@@ -469,6 +472,7 @@ to those used for the \intelabi.  \footn
   loading the offset into a displacement register; the base plus
   immediate displacement addressing form can be used.}
 The \texttt{R_X86_64_GOTPCREL} relocation has different semantics from the
 \texttt{R_X86_64_GOT32} or equivalent i386 \texttt{R_I386_GOTPC} relocation.
 In particular, because the \xARCH architecture has an addressing mode relative
@@ -477,6 +481,7 @@ using a single instruction.  The calcula
 \texttt{R_X86_64_GOTPCREL} relocation gives the difference between the location
 in the GOT where the symbol's address is given and the location where the
 relocation is applied.
 The \texttt{R_X86_64_32} and \texttt{R_X86_64_32S} relocations truncate
@@ -492,19 +497,72 @@ relocations is not conformant to this AB
 added for documentation purposes.  The \texttt{R_X86_64_16}, and
 \texttt{R_X86_64_8} relocations truncate the computed value to 16-bits
 resp. 8-bits.
-The relocations \texttt{R_X86_64_DPTMOD64},
-\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64} ,
-\texttt{R_X86_64_TLSGD} , \texttt{R_X86_64_TLSLD} ,
+The relocations \texttt{R_X86_64_DTPMOD64},
+\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64},
+\texttt{R_X86_64_TLSGD}, \texttt{R_X86_64_TLSLD},
 \texttt{R_X86_64_DTPOFF32}, \texttt{R_X86_64_GOTTPOFF} and
 \texttt{R_X86_64_TPOFF32} are listed for completeness.  They are part
 of the Thread-Local Storage ABI extensions and are documented in the
 document called ``ELF Handling for Thread-Local
 Storage''\footnote{This document is currently available via
-  \url{}}\index{Thread-Local Storage}.
+  \url{}}\index{Thread-Local
+  Storage}.  The relocations \texttt{R_X86_64_GOTPC32_TLSDESC},
+\texttt{R_X86_64_TLSDESC_CALL} and \texttt{R_X86_64_TLSDESC} are also
+used for Thread-Local Storage, but are not documented there as of this
+writing.  A description can be found in the document ``Thread-Local
+Storage Descriptors for IA32 and AMD64/EM64T''\footnote{This document
+  is currently available via
+  \url{}}.
+In order to make this document self-contained, a description of the
+TLS relocations follows.
+\texttt{R_X86_64_DTPMOD64} resolves to the index of the dynamic thread
+vector entry that points to the base address of the TLS block
+corresponding to the module that defines the referenced symbol.
+\texttt{R_X86_64_DTPOFF64} and \texttt{R_X86_64_DTPOFF32} compute the
+offset from the pointer in that entry to the referenced symbol.  The
+linker generates such relocations in adjacent entries in the GOT, in
+response to \texttt{R_X86_64_TLSGD} and \texttt{R_X86_64_TLSLD}
+relocations.  If the linker can compute the offset itself, because the
+referenced symbol binds locally, the \texttt{DTPOFF} may be omitted.
+Otherwise, such relocations are always in pairs, such that the
+\texttt{DTPOFF64} relocation applies to the word64 right past the
+corresponding \texttt{DTPMOD} relocation.
+\texttt{R_X86_64_TPOFF64} and \texttt{R_X86_64_TPOFF32} resolve to the
+offset from the thread pointer to a thread-local variable.  The former
+is generated in response to \texttt{R_X86_64_GOTTPOFF}, which resolves
+to a PC-relative address of a GOT entry containing such a 64-bit offset.
+\texttt{R_X86_64_TLSGD} and \texttt{R_X86_64_TLSLD} both resolve to
+PC-relative offsets to a \texttt{DTPMOD} GOT entry.  The difference
+between them is that, for \texttt{TLSGD}, the following GOT entry will
+contain the offset of the referenced symbol into its TLS block,
+whereas, for \texttt{TLSLD}, the following GOT entry will contain the
+offset for the base address of the TLS block.  The idea is that adding
+this offset to the result of \texttt{DTPOFF32} for a symbol ought to
+yield the same as the result of \texttt{DTPOFF64} for the same symbol.
+\texttt{R_X86_64_TLSDESC} resolves to a pair of word64s, called TLS
+Descriptor, the first of which is a pointer to a function, followed by
+an argument.  The function is passed a pointer to this pair of
+entries in \%rax and, using the argument in the second entry, it must
+compute and return in \%rax the offset from the thread pointer to the
+symbol referenced in the relocation, without modifying any registers
+other than processor flags.  \texttt{R_X86_64_GOTPC32_TLSDESC}
+resolves to the PC-relative address of a TLS descriptor corresponding
+to the named symbol.  \texttt{R_X86_64_TLSDESC_CALL} must annotate the
+instruction used to call the TLS Descriptor resolver function, so as
+to enable relaxation of that instruction.
 \subsection{Large Models}
 In order to extend both the PLT and the GOT beyond 2GB, it

On Sep 19, 2005, "Menezes, Evandro" <> wrote:

> Alexandre, 
>> Please read the document referenced in the patch, for 
>> starters.  In it you'll see the exact spelling of the coding 
>> samples is not final yet, and it doesn't make sense to 
>> maintain yet another copy of this until it settles down.  

> When it does, it'll be added to the ABI then.  Not before.  For now, it's OK to reserve the relocation numbers in this mailing list.  

>> Also, you'll find that the calculations are not quite 
>> possible to express in the way other relocations are 
>> expressed; suggestions are welcome.  

> State so, perhaps in a note, expanding what they mean.

>> Finally, what's wrong 
>> with following the existing practice of referring to TLS 
>> specs elsewhere?

> The intent is that the x86-64 ABI remains a stand-alone document as much as possible.  It's not quite there yet, but adding yet another external reference sets it back even further.

> BTW, the TLS reference is slated to be incorporated into the x86-64 ABI.

>> The point of this posting was more to reserve the relocation 
>> numbers for these purposes (the purpose of the relocations is 
>> quite solid already, even though the numbers have changed as 
>> recently as yesterday), but I'm yet to do some more 
>> performance tests with some minor variations of the code 
>> sequences to choose the best one.  I don't want to have to 
>> maintain all this information in sync between multiple specs 
>> documents and the several different packages that implement 
>> them; having a single specs document is much better for now.

> That's fine.  When it reaches a mature state, patches against the ABI will be more than welcome.

Alexandre Oliva
Secretary for FSF Latin America
Red Hat Compiler Engineer   aoliva@{,}
Free Software Evangelist  oliva@{,}

