Bug 27953

Summary: IE->LE is not happening for riscv in linker relaxation.
Product: binutils Reporter: chschandan <chschandan>
Component: ld Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED DUPLICATE    
Severity: normal CC: nelsonc1225
Priority: P2    
Version: 2.36.1   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description chschandan@gmail.com 2021-06-04 05:50:23 UTC
When a __thread variable is defined and accessed within an executable, we should be able to access it using a single TP-relative instruction.

 10158: 00022503 lw a0,0(tp) # 0 <ThreadVar>
However, if a __thread variable is defined in one module but used in another, then even if both modules are linked into the executable (not into a shared library), the code contains an unnecessary extra level of indirection through the global offset table:

 10170: 00002517 auipc a0,0x2
 10174: ea853503 ld a0,-344(a0) # 12018 <_GLOBAL_OFFSET_TABLE_+0x8>
 10178: 9512 add a0,a0,tp
 1017a: 4108 lw a0,0(a0)
Note that the compiler cannot know whether an external __thread variable is defined in the executable or in a shared library. Therefore at compile time, the extra level of indirection has to be included.

However a standard linker "TLS relaxation" (Initial Exec => Local Exec) is supposed to optimize the code in the case where the referenced variable turns out to be defined in the executable.

Unfortunately this has not yet been implemented by the GNU linker for RISC-V (as of GNU Binutils 2.36.1).
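To make the requested transition concrete, here is a minimal sketch in C of a same-size Initial Exec => Local Exec rewrite of the auipc/ld pair shown above. It is not the GNU linker's implementation: ie_to_le and split_hi_lo are hypothetical names, and the sketch assumes the linker already knows the TP-relative offset of the symbol and that the ld reads its base from the register the auipc wrote. Only the instruction encodings are taken from the RISC-V ISA.

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V major opcodes (low 7 bits of a 32-bit instruction). */
enum { OP_LUI = 0x37, OP_AUIPC = 0x17, OP_LOAD = 0x03, OP_IMM = 0x13 };

static uint32_t rd_of(uint32_t insn)     { return (insn >> 7) & 0x1f; }
static uint32_t rs1_of(uint32_t insn)    { return (insn >> 15) & 0x1f; }
static uint32_t funct3_of(uint32_t insn) { return (insn >> 12) & 0x7; }

/* Split a TP-relative offset into the %tprel_hi / %tprel_lo halves.
   lo12 is the sign-extended low 12 bits, and hi20 is adjusted so that
   (hi20 << 12) + lo12 == off, because addi sign-extends its immediate. */
static void split_hi_lo(int32_t off, uint32_t *hi20, int32_t *lo12)
{
    uint32_t low = (uint32_t)off & 0xfffu;
    *lo12 = (int32_t)low - ((low & 0x800u) ? 0x1000 : 0);
    *hi20 = (((uint32_t)off - (uint32_t)*lo12) >> 12) & 0xfffffu;
}

/* Hypothetical helper: rewrite in place
       auipc rA, %tls_ie_pcrel_hi(sym) ; ld/lw rB, %pcrel_lo(...)(rA)
   into the same-sized pair
       lui   rA, %tprel_hi(sym)        ; addi  rB, rA, %tprel_lo(sym)
   Returns 0 on success, -1 if the two words do not match the pattern. */
int ie_to_le(uint32_t insn[2], int32_t tprel)
{
    uint32_t hi20;
    int32_t lo12;

    if ((insn[0] & 0x7f) != OP_AUIPC || (insn[1] & 0x7f) != OP_LOAD)
        return -1;
    /* Accept ld (funct3 = 3, RV64) or lw (funct3 = 2, RV32) for the GOT load. */
    if (funct3_of(insn[1]) != 3 && funct3_of(insn[1]) != 2)
        return -1;
    /* The load must use the register written by the auipc as its base. */
    if (rs1_of(insn[1]) != rd_of(insn[0]))
        return -1;

    split_hi_lo(tprel, &hi20, &lo12);
    assert(((int32_t)(hi20 << 12) + lo12) == tprel);

    insn[0] = (hi20 << 12) | (rd_of(insn[0]) << 7) | OP_LUI;
    insn[1] = (((uint32_t)lo12 & 0xfffu) << 20) | (rs1_of(insn[1]) << 15)
            | (rd_of(insn[1]) << 7) | OP_IMM;   /* funct3 = 0 selects addi */
    return 0;
}
```

Applied to the words 0x00002517 and 0xea853503 from the listing above with a TP offset of 0, this yields lui a0,0x0 followed by addi a0,a0,0, and the following add a0,a0,tp and lw a0,0(a0) are left untouched. The result avoids the GOT load but is still four instructions; collapsing it to the single lw a0,0(tp) would additionally require deleting bytes from the section.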

$ cat thr1.c
extern __thread int ThreadVar;
int _start(void)
{
 return ThreadVar;
}
$ cat thr2.c
__thread int ThreadVar = 123;
 

The optimal code can be seen by compiling with -ftls-model=local-exec (we cannot use that option in general since we do not know at compile time whether we are compiling a static or dynamic executable). 

$ clang -O2 -target riscv64 -march=rv64imafdc -mabi=lp64d -c thr1.c thr2.c -ftls-model=local-exec
$ ldriscv -melf64lriscv -o thr.vxe thr1.o thr2.o
$ objdumpriscv -S thr.vxe
thr.vxe: file format elf64-littleriscv

Disassembly of section .text:
0000000000010158 <_start>:
 10158: 00022503 lw a0,0(tp) # 0 <ThreadVar>
 1015c: 8082 ret
 ...
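As a side note, GCC and Clang also accept a per-variable tls_model attribute, which selects the TLS model for one variable instead of the whole translation unit. The following is a minimal sketch, assuming the declaration and the definition are placed in one file so that it builds on its own; read_thread_var is a hypothetical name. The same caveat applies as for -ftls-model=local-exec: it is only valid when the variable is known to end up in the executable.

```c
/* Declaration as it could appear in the using module (thr1.c in the
   example), with the TLS model forced to local-exec for this variable. */
extern __thread int ThreadVar __attribute__((tls_model("local-exec")));

/* Definition as it would appear in the defining module (thr2.c). */
__thread int ThreadVar = 123;

/* Hypothetical accessor: with local-exec the compiler can address the
   variable directly relative to the thread pointer, without the GOT. */
int read_thread_var(void)
{
    return ThreadVar;
}
```

This does not remove the need for the linker transition; it only moves the local-exec decision from the command line to the source.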
 

When we don't compile for local-exec, we expect the linker to perform the "initial-exec" => "local-exec" optimization - but it doesn't!

$ clang -O2 -target riscv64 -march=rv64imafdc -mabi=lp64d -c thr1.c thr2.c
$ ldriscv -melf64lriscv -o thr.vxe thr1.o thr2.o
$ objdumpriscv -S thr.vxe
thr.vxe: file format elf64-littleriscv

Disassembly of section .text:
0000000000010170 <_start>:
 10170: 00002517 auipc a0,0x2
 10174: ea853503 ld a0,-344(a0) # 12018 <_GLOBAL_OFFSET_TABLE_+0x8>
 10178: 9512 add a0,a0,tp
 1017a: 4108 lw a0,0(a0)
 1017c: 8082 ret
Comment 1 Nelson Chu 2021-06-04 06:34:18 UTC
The request for TLS transitions is a duplicate of
https://sourceware.org/bugzilla/show_bug.cgi?id=24676.

I'm not sure the transitions should be implemented only in linker relaxation, since x86 and other targets don't have relaxation and instead translate TLS models when relocating. It would be great if you could refer to their implementations; I think the x86 TLS transition is the correct way to do it.

*** This bug has been marked as a duplicate of bug 24676 ***