[hjl@gnu-6 gold]$ cat x.s
        .text
        .globl _start
_start:
        movq %fs:0, %rax
        addl $0x4c000000,%ebx
        addl external_ie@GOTTPOFF(%rip), %eax
        retq
        .globl external_ie
        .section .tdata,"awT",@progbits
        .align 4
        .type external_ie, @object
        .size external_ie, 4
external_ie:
        .long 100
[hjl@gnu-6 gold]$ make
cc -O2 -c x.s
./ld -o x x.o
objdump -dw x

x:     file format elf64-x86-64

Disassembly of section .text:

00000000004000e8 <_start>:
  4000e8:  64 48 8b 04 25 00 00 00 00   mov    %fs:0x0,%rax
  4000f1:  81 c3 00 00 00 4d            add    $0x4d000000,%ebx
  4000f7:  8d 80 fc ff ff ff            lea    -0x4(%rax),%eax
  4000fd:  c3                           retq
[hjl@gnu-6 gold]$

The second instruction is changed from

        addl $0x4c000000,%ebx

to

        add $0x4d000000,%ebx
The same thing happens with ld.bfd:

[hjl@gnu-6 pr17795]$ make LD=ld.bfd
cc -O2 -c x.s
ld.bfd -o x x.o
objdump -dw x

x:     file format elf64-x86-64

Disassembly of section .text:

00000000004000e8 <_start>:
  4000e8:  64 48 8b 04 25 00 00 00 00   mov    %fs:0x0,%rax
  4000f1:  81 c3 00 00 00 4d            add    $0x4d000000,%ebx
  4000f7:  8d 80 fc ff ff ff            lea    -0x4(%rax),%eax
  4000fd:  c3                           retq
[hjl@gnu-6 pr17795]$

Target_x86_64<size>::Relocate::tls_ie_to_le has

  unsigned char op1 = view[-3];
  unsigned char op2 = view[-2];
  unsigned char op3 = view[-1];
  unsigned char reg = op3 >> 3;

It is safe only if view[-3] is a REX prefix of the current instruction. However, I can't find a good way to detect if view[-3] is a REX prefix or the last byte of the previous instruction. Compilers may have to always generate a REX prefix even if it isn't needed to encode the instruction.
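To make the failure concrete, here is a rough sketch in C that replays the bytes from the report. It is not the linker's code: model_add_to_lea and run_report_case are hypothetical names, and the body models only the add-to-lea rewrite as inferred from the disassembly (0x4c becomes 0x4d, opcode 03 becomes 8d, ModRM 05 becomes 80). The displacement fix-up is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of the add -> lea path only.
   `view` points at the 4-byte field covered by the GOTTPOFF relocation. */
void model_add_to_lea(uint8_t *view)
{
    uint8_t op1 = view[-3];   /* assumed to be a REX prefix */
    uint8_t op3 = view[-1];   /* ModRM byte */
    uint8_t reg = op3 >> 3;   /* mod is 00 for RIP-relative, so this is ModRM.reg */

    if (op1 == 0x4c)          /* REX.WR becomes REX.WRB */
        view[-3] = 0x4d;
    view[-2] = 0x8d;          /* lea opcode */
    view[-1] = 0x80 | reg | (reg << 3);
}

/* Copies the unlinked bytes from the report into `text` and applies the model. */
void run_report_case(uint8_t text[12])
{
    static const uint8_t input[12] = {
        0x81, 0xc3, 0x00, 0x00, 0x00, 0x4c,  /* addl $0x4c000000,%ebx */
        0x03, 0x05, 0x00, 0x00, 0x00, 0x00   /* addl external_ie@GOTTPOFF(%rip),%eax */
    };
    assert(text != NULL);
    memcpy(text, input, sizeof input);
    /* The relocation field starts right after opcode + ModRM, so view[-3]
       lands on the last immediate byte of the previous instruction. */
    model_add_to_lea(text + 8);
}
```

After run_report_case, byte 5 of the buffer (the top byte of the immediate of the first addl) has been changed from 0x4c to 0x4d, which is the corruption shown in the objdump output.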
As you pointed out, there is no good way to verify that the instruction we're modifying is actually a movq or addq. Checking for the 0x4c prefix is about as good as we can do, and beyond that we have to count on the compiler generating ABI-conforming code (where by "ABI", I mean Ulrich's TLS paper). The GOTTPOFF relocation must be applied to either a movq or addq instruction, and the destination register must be RAX, so a REX prefix is always required. (It's not clear in Ulrich's document whether it *must* be RAX, but I'd definitely expect it to be a quadword register requiring a REX prefix.) Otherwise, the code is not conforming to the ABI, and the linker optimization may break it.