Bug 27953 - IE->LE is not happening for riscv in linker relaxation.
Status: RESOLVED DUPLICATE of bug 24676
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: 2.36
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2021-06-04 05:50 UTC by chschandan@gmail.com
Modified: 2022-06-22 06:31 UTC (History)

Description chschandan@gmail.com 2021-06-04 05:50:23 UTC
When a __thread variable is defined and accessed within an executable, we should be able to access it with a single TP-relative instruction.

 10158: 00022503 lw a0,0(tp) # 0 <ThreadVar>
However, if a __thread variable is defined in one module but used in another, then, even if both modules are part of the executable (not a shared library), the code contains an unnecessary extra level of indirection through the global offset table:

 10170: 00002517 auipc a0,0x2
 10174: ea853503 ld a0,-344(a0) # 12018 <_GLOBAL_OFFSET_TABLE_+0x8>
 10178: 9512 add a0,a0,tp
 1017a: 4108 lw a0,0(a0)
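
To make the indirection concrete, the auipc/ld pair can be decoded by hand. The following is a minimal sketch in C, not taken from binutils; the helper names (sign_extend, auipc_offset, load_offset, got_slot_address) are made up for illustration. It extracts the U-type and I-type immediates and adds them to the address of the auipc.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value`. */
int64_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    return (int64_t)(value ^ sign) - (int64_t)sign;
}

/* auipc is U-type: bits 31:12 hold the upper 20 bits of the offset. */
int64_t auipc_offset(uint32_t insn)
{
    return sign_extend(insn >> 12, 20) * 4096;
}

/* ld is I-type: bits 31:20 hold a signed 12-bit offset. */
int64_t load_offset(uint32_t insn)
{
    return sign_extend(insn >> 20, 12);
}

/* Address read by an auipc/ld pair; pc is the address of the auipc. */
uint64_t got_slot_address(uint64_t pc, uint32_t auipc, uint32_t ld)
{
    return pc + (uint64_t)auipc_offset(auipc) + (uint64_t)load_offset(ld);
}
```

With the values from the listing, got_slot_address(0x10170, 0x00002517, 0xea853503) evaluates to 0x12018, the GOT slot named in the objdump comment. The TP offset of ThreadVar is therefore loaded from memory at run time instead of being encoded in the instructions.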
Note that the compiler cannot know whether an external __thread variable is defined in the executable or in a shared library. Therefore at compile time, the extra level of indirection has to be included.

However a standard linker "TLS relaxation" (Initial Exec => Local Exec) is supposed to optimize the code in the case where the referenced variable turns out to be defined in the executable.

Unfortunately this has not yet been implemented by the GNU linker for RISC-V (as of GNU Binutils 2.36.1).
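
For illustration, here is a rough sketch in C of what a same-size IE => LE rewrite could look like. This is an assumption about one possible approach, not how ld or BFD implements anything: the function relax_ie_to_le and its helpers are hypothetical, and the sketch assumes a non-negative TP offset that is already known at link time. It turns the auipc/ld pair into lui/addi, using the usual hi20/lo12 split that %tprel_hi and %tprel_lo denote, and leaves the following add and lw untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Upper part of the offset, rounded so the signed low part can correct it. */
uint32_t tprel_hi(uint32_t off)
{
    return (off + 0x800) >> 12;
}

/* Signed low 12 bits of the offset. */
int32_t tprel_lo(uint32_t off)
{
    int32_t lo = (int32_t)(off & 0xfff);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* lui rd, hi20 (U-type, opcode 0x37). */
uint32_t encode_lui(unsigned rd, uint32_t hi20)
{
    assert(hi20 < (1u << 20));
    return (hi20 << 12) | (rd << 7) | 0x37;
}

/* addi rd, rs1, imm (I-type, funct3 0, opcode 0x13). */
uint32_t encode_addi(unsigned rd, unsigned rs1, int32_t imm)
{
    assert(imm >= -2048 && imm <= 2047);
    return (((uint32_t)imm & 0xfff) << 20) | (rs1 << 15) | (rd << 7) | 0x13;
}

/* Hypothetical rewrite: insn[0] is the auipc, insn[1] the ld from the GOT.
   Registers are kept, so the later "add a0,a0,tp" still sees the offset. */
void relax_ie_to_le(uint32_t insn[2], uint32_t tp_offset)
{
    unsigned hi_rd  = (insn[0] >> 7) & 0x1f;
    unsigned lo_rd  = (insn[1] >> 7) & 0x1f;
    unsigned lo_rs1 = (insn[1] >> 15) & 0x1f;

    insn[0] = encode_lui(hi_rd, tprel_hi(tp_offset));
    insn[1] = encode_addi(lo_rd, lo_rs1, tprel_lo(tp_offset));
}
```

A rewrite like this removes the GOT load but keeps four instructions. Getting down to the single "lw a0,0(tp)" would additionally require deleting bytes, which is what linker relaxation does.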

$ cat thr1.c
extern __thread int ThreadVar;
int _start(void)
{
 return ThreadVar;
}
$ cat thr2.c
__thread int ThreadVar = 123;
 

The optimal code can be seen by compiling with -ftls-model=local-exec (we cannot use that option in general since we do not know at compile time whether we are compiling a static or dynamic executable). 
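
As a side note, GCC and Clang also accept a per-variable tls_model attribute, which selects the model for one variable instead of the whole translation unit. The sketch below assumes the declaration can be edited and that ThreadVar is always linked into the executable itself, so the same caveat as for the command-line option applies. The function name read_thread_var is only for illustration; the bug's test case uses _start.

```c
/* Assumption: ThreadVar is always defined in the executable itself. */
extern __thread int ThreadVar __attribute__((tls_model("local-exec")));

/* In the bug's test case this definition lives in thr2.c. */
__thread int ThreadVar = 123;

int read_thread_var(void)
{
    return ThreadVar;
}
```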

$ clang -O2 -target riscv64 -march=rv64imafdc -mabi=lp64d -c thr1.c thr2.c -ftls-model=local-exec
$ ldriscv -melf64lriscv -o thr.vxe thr1.o thr2.o
$ objdumpriscv -S thr.vxe
thr.vxe: file format elf64-littleriscv

Disassembly of section .text:
0000000000010158 <_start>:
 10158: 00022503 lw a0,0(tp) # 0 <ThreadVar>
 1015c: 8082 ret
 ...
 

When we don't compile for local-exec, we expect the linker to perform the "initial-exec" => "local-exec" optimization - but it doesn't!

$ clang -O2 -target riscv64 -march=rv64imafdc -mabi=lp64d -c thr1.c thr2.c
$ ldriscv -melf64lriscv -o thr.vxe thr1.o thr2.o
$ objdumpriscv -S thr.vxe
thr.vxe: file format elf64-littleriscv

Disassembly of section .text:
0000000000010170 <_start>:
 10170: 00002517 auipc a0,0x2
 10174: ea853503 ld a0,-344(a0) # 12018 <_GLOBAL_OFFSET_TABLE_+0x8>
 10178: 9512 add a0,a0,tp
 1017a: 4108 lw a0,0(a0)
 1017c: 8082 ret
Comment 1 Nelson Chu 2021-06-04 06:34:18 UTC
This TLS transition request is a duplicate of
https://sourceware.org/bugzilla/show_bug.cgi?id=24676.

I'm not sure the transitions should be implemented only in linker relaxation, since x86 and other targets have no relaxation and instead translate TLS models when relocating. It would be great if you could refer to their implementations; I think the x86 TLS transition is the correct way to do it.

*** This bug has been marked as a duplicate of bug 24676 ***