12608 – TLS relocations issues on alpha

Bug 12608 - TLS relocations issues on alpha

Summary: TLS relocations issues on alpha

Status:	RESOLVED DUPLICATE of bug 12928

Alias:	None

Product:	binutils
Classification:	Unclassified
Component:	ld (show other bugs)
Version:	2.21

Importance:	P2 normal
Target Milestone:	---
Assignee:	unassigned

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-03-27 17:05 UTC by Aurelien Jarno
Modified:	2011-06-24 18:36 UTC (History)
CC List:	4 users (show)

See Also:
Host:	alphaev68-unknown-linux-gnu
Target:	alphaev68-unknown-linux-gnu
Build:	alphaev68-unknown-linux-gnu
Last reconfirmed:	2011-04-01 04:43:41

Attachments
working source code (EV4) (3.47 KB, application/octet-stream) 2011-03-27 17:06 UTC, Aurelien Jarno	Details
broken source code (EV5) (3.47 KB, application/octet-stream) 2011-03-27 17:07 UTC, Aurelien Jarno	Details
Patch to fix relax tls (670 bytes, patch) 2011-04-02 00:24 UTC, Michael Cree	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Aurelien Jarno 2011-03-27 17:05:49 UTC

Some TLS glibc tests fail on alpha when built with gcc -mtune=ev5 or later CPU. Digging into the issues, it seems the problem comes from the linker.

The problem affects tst-tlsmod7.c and tst-tlsmod8.c. When built with -mtune=ev5, some instructions are re-ordered, which seems to confuse ld. Let's give the example of tst-tlsmod7. I have attached the corresponding source code to this bug report, it's possible to build the .so with "gcc -shared tst-tlsmod7.s -o tst-tlsmod7.so". The difference at the source level causing the issue between -mtune=ev4 and -mtune=ev5 is the following:

--- tst-tlsmod7.ev4.s
+++ tst-tlsmod7.ev5.s
@@ -429,8 +429,8 @@
        cmpeq $1,27,$1
        beq $1,$L36
        .loc 1 26 0
-       lda $16,local1($29)             !tlsldm!29
        ldq $27,__tls_get_addr($29)             !literal!29
+       lda $16,local1($29)             !tlsldm!29
        jsr $26,($27),__tls_get_addr            !lituse_tlsldm!29
        ldah $29,0($26)         !gpdisp!31
        ldah $0,local1($0)              !dtprelhi

When looking at the corresponding .o, the difference is still the same:

--- tst-tlsmod7.ev4.o
+++ tst-tlsmod7.ev5.o
@@ -295,10 +295,10 @@
  318:  08 00 21 a4     ldq     t0,8(t0)
  31c:  a1 75 23 40     cmpeq   t0,0x1b,t0
  320:  21 00 20 e4     beq     t0,3a8 <check1+0x178>
- 324:  00 00 1d 22     lda     a0,0(gp)
-                       324: TLSLDM     local1
- 328:  00 00 7d a7     ldq     t12,0(gp)
-                       328: ELF_LITERAL        __tls_get_addr
+ 324:  00 00 7d a7     ldq     t12,0(gp)
+                       324: ELF_LITERAL        __tls_get_addr
+ 328:  00 00 1d 22     lda     a0,0(gp)
+                       328: TLSLDM     local1
  32c:  00 40 5b 6b     jsr     ra,(t12),330 <check1+0x100>
                        32c: LITUSE     .text+0x5
                        32c: HINT       __tls_get_addr

When looking at the corresponding .so, the difference has now changed, probably due to a relocation issue in the linker:

--- tst-tlsmod7.ev4.so
+++ tst-tlsmod7.ev5.so
@@ -744,7 +744,7 @@
  d38:  08 00 21 a4     ldq     t0,8(t0)
  d3c:  a1 75 23 40     cmpeq   t0,0x1b,t0
  d40:  21 00 20 e4     beq     t0,dc8 <check1+0x178>
- d44:  58 80 1d a6     ldq     a0,-32680(gp)
+ d44:  58 80 7d a7     ldq     t12,-32680(gp)
  d48:  00 00 fe 2f     unop
  d4c:  9e 00 00 00     rduniq
  d50:  00 04 00 42     addq    a0,v0,v0

The value is now loaded into t12 instead of a0, which gives a segfault when this address is used a bit later.

Comment 1 Aurelien Jarno 2011-03-27 17:06:32 UTC

Created attachment 5328 [details]
working source code (EV4)

Comment 2 Aurelien Jarno 2011-03-27 17:07:02 UTC

Created attachment 5329 [details]
broken source code (EV5)

Comment 3 Witold Baryluk 2011-03-31 00:04:36 UTC

To test if it is regression,
I tested both asembly files using 2.18.1~cvs20080103-7,
and both produces EXACTLY same .so file. Shouldn't it
produce two different files?
(intermidiete .o files are different, just like in initial report).

Comment 4 Witold Baryluk 2011-03-31 00:32:44 UTC

Additionally, .o files are only differing in this places

$ cmp --verbose source-ev{4,5}.o
  871  35 175
  872  42 247
  875 175  35
  876 247  42
 8409  44  50
 8433  50  44
$

This actually do not exactly much my understaning relating object code and this numers (they do not much when doing, hex<->dec, but I will blame cmp reporting in strange way). Here is the same file piped thru hd, and then compared.


baryluk@a09:~/tls-relocation-bug$ diff -Nu source-ev4.o.hd  source-ev5.o.hd
--- source-ev4.o.hd	2011-03-31 01:37:45.000000000 +0200
+++ source-ev5.o.hd	2011-03-31 01:37:49.000000000 +0200
@@ -52,7 +52,7 @@
 00000330  2d 00 20 e4 00 00 3d a4  0d 00 5f 20 22 37 44 48  |-. ...=..._ "7DH|
 00000340  01 04 20 40 19 00 42 20  00 00 61 a4 23 36 7e 48  |.. @..B ..a.#6~H|
 00000350  a2 05 62 40 24 00 40 e4  08 00 21 a4 a1 75 23 40  |..b@$.@...!..u#@|
-00000360  21 00 20 e4 00 00 1d 22  00 00 7d a7 00 40 5b 6b  |!. ...."..}..@[k|
+00000360  21 00 20 e4 00 00 7d a7  00 00 1d 22 00 40 5b 6b  |!. ...}....".@[k|
 00000370  00 00 ba 27 00 00 00 24  11 00 3f 20 00 00 40 a4  |...'...$..? ..@.|
 00000380  00 00 bd 23 21 17 24 48  22 36 5e 48 10 00 21 20  |...#!.$H"6^H..! |
 00000390  a1 05 41 40 14 00 20 e4  00 00 00 20 08 00 20 a4  |..A@.. .... .. .|
@@ -521,9 +521,9 @@
 000020a0  10 00 00 00 00 00 00 00  bc 02 00 00 00 00 00 00  |................|
 000020b0  25 00 00 00 18 00 00 00  00 00 00 00 00 00 00 00  |%...............|
 000020c0  f4 02 00 00 00 00 00 00  25 00 00 00 1a 00 00 00  |........%.......|
-000020d0  00 00 00 00 00 00 00 00  24 03 00 00 00 00 00 00  |........$.......|
+000020d0  00 00 00 00 00 00 00 00  28 03 00 00 00 00 00 00  |........(.......|
 000020e0  1e 00 00 00 07 00 00 00  00 00 00 00 00 00 00 00  |................|
-000020f0  28 03 00 00 00 00 00 00  04 00 00 00 13 00 00 00  |(...............|
+000020f0  24 03 00 00 00 00 00 00  04 00 00 00 13 00 00 00  |$...............|
 00002100  00 00 00 00 00 00 00 00  2c 03 00 00 00 00 00 00  |........,.......|
 00002110  05 00 00 00 01 00 00 00  05 00 00 00 00 00 00 00  |................|
 00002120  2c 03 00 00 00 00 00 00  08 00 00 00 13 00 00 00  |,...............|

PS. Ah, sorry, cmp --verbose, output in base-8. Now it matches.

This is what I got in resulting .so file in my case (remember binutils 2.18),

$ hd tst-tlsmod7-z-ev4.so | grep '75 23' -A 2
00000ec0  a2 05 62 40 24 00 40 e4  08 00 21 a4 a1 75 23 40  |..b@$.@...!..u#@|
00000ed0  21 00 20 e4 58 80 1d a6  00 00 fe 2f 9e 00 00 00  |!. .X....../....|
00000ee0  00 04 00 42 00 00 00 24  11 00 3f 20 58 00 40 a4  |...B...$..? X.@.|

After locating it in objdump -d, I have this

 ebc:   23 36 7e 48     zapnot  t2,0xf1,t2
 ec0:   a2 05 62 40     cmpeq   t2,t1,t1
 ec4:   24 00 40 e4     beq     t1,f58 <check1+0x178>
 ec8:   08 00 21 a4     ldq     t0,8(t0)
 ecc:   a1 75 23 40     cmpeq   t0,0x1b,t0
 ed0:   21 00 20 e4     beq     t0,f58 <check1+0x178>
 ed4:   58 80 1d a6     ldq     a0,-32680(gp)
 ed8:   00 00 fe 2f     unop    
 edc:   9e 00 00 00     rduniq
 ee0:   00 04 00 42     addq    a0,v0,v0
 ee4:   00 00 00 24     ldah    v0,0(v0)

for both ev4 and ev5 .so. This matches a result of Aurelien's ev4.so

I will try bissecting problem.

Comment 5 Witold Baryluk 2011-03-31 04:42:15 UTC

I cannot reproduce problem in CVS HEAD.

Comment 6 Witold Baryluk 2011-03-31 04:47:51 UTC

This is diff I got when using git cvsimport repo, commit 9462a417 (I do not know how to map it back to CVS, but it is Date:   Wed Mar 30 23:00:05 2011 +0000).


--- source-ev4.so.disa	2011-03-31 05:58:39.000000000 +0200
+++ source-ev5.so.disa	2011-03-31 05:58:44.000000000 +0200
@@ -1,5 +1,5 @@
 
-./source-ev4.so:     file format elf64-alpha
+./source-ev5.so:     file format elf64-alpha
 
 Disassembly of section .text:
 
@@ -229,8 +229,8 @@
  958:	08 00 21 a4 	ldq	t0,8(t0)
  95c:	a1 75 23 40 	cmpeq	t0,0x1b,t0
  960:	21 00 20 e4 	beq	t0,9e8 <check1+0x178>
- 964:	40 80 1d 22 	lda	a0,-32704(gp)
- 968:	20 80 7d a7 	ldq	t12,-32736(gp)
+ 964:	20 80 7d a7 	ldq	t12,-32736(gp)
+ 968:	40 80 1d 22 	lda	a0,-32704(gp)
  96c:	00 40 5b 6b 	jsr	ra,(t12),970 <check1+0x100>
  970:	02 00 ba 27 	ldah	gp,2(ra)
  974:	00 00 00 24 	ldah	v0,0(v0)

Instructions are re-ordered, as in input assembler, but it should work (registers are correctly used).

Comment 7 Witold Baryluk 2011-03-31 06:43:43 UTC

The same behaviour when I compiled binutils_2.21.0.20110327 from Debian unstable.

Everything seems fine.

Aurelien, could you provide exact gcc/ld versions and a exact command line how to produce .so files from given .s files?

Thanks,
Witek

Comment 8 Michael Cree 2011-03-31 23:08:24 UTC

I see the bug compiling just as Aurelien describes.  This is using Debian unstable binutils (2.21.0.20110327).  The bug does not appear if one compiles with the "-Wl,--no-relax" option.  

I think the bug is likely to be in the elf64_alpha_relax_tls_get_addr() function of bfd/elf64-alpha.c which is the function that implements the transformation of code of the specific example.

The particular register that is loaded into the x of the instruction "ldq    x,-32680(gp)" in the examples given by Aurelien is at line 3500 of the bfd/elf64-alpha.c  code and is:

 tlsgd_reg = bfd_get_32 (info->abfd, pos[0]);
 tlsgd_reg = (tlsgd_reg >> 21) & 31;

For some reason it can pick up the wrong register, i.e., it seems to be getting the register of the first instruction in the sequence transformation always, and if the first two instructions are swapped in order it then gets the wrong register.  I am not yet sure why this occurs, as there seems to be code above to deal with this exact situation.  At line 3429 is the following:

  /* Generally, the positions are not allowed to be out of order, lest the
     modified insn sequence have different register lifetimes.  We can make
     an exception when pos 1 is adjacent to pos 0.  */
  if (pos[1] + 4 == pos[0])
    {
      bfd_byte *tmp = pos[0];
      pos[0] = pos[1];
      pos[1] = tmp;
    }

This is far as my analysis has got with the limited time I have had to look at this so far.

Comment 9 Witold Baryluk 2011-04-01 00:21:08 UTC

It looks problem is actually in assembler, not linker. This is why I not reproduced it before (I was still using .o files produced using as 2.18.0). Will investigate more.

Comment 10 Witold Baryluk 2011-04-01 01:51:23 UTC

You are right, it is linker. There is small change in assembled *.o files
beetwen 2.18 and 2.21, but it is .comment section which had
now Merge and Strings flags, in 2.21. reruning gcc -shared on
pre-assembled .o files, still produced wrong result as in Aurelien.
(regardles which assembler I used).

Sorry, for confusing you.


It looks that removing -O1 from collect2 command line, fixes problem.

Comment 11 Witold Baryluk 2011-04-01 04:43:41 UTC

After bisecting, I got this as a first bad commit

456af9cfa4c227bddde3bae28cd0a4327bcb7c1c is first bad commit
commit 456af9cfa4c227bddde3bae28cd0a4327bcb7c1c
Author: Richard Henderson <rth@redhat.com>
Date:   Mon Aug 9 23:58:51 2010 +0000

            PR ld/11891
            * elf64-alpha.c (elf64_alpha_relax_tls_get_addr): Disallow relaxing
            to tlshi/lo until pos0 and pos1 are adjacent.  Use the destination
            register from the tldgd insn.

diff --git a/bfd/ChangeLog b/bfd/ChangeLog
index c1c9054..528fa3c 100644
--- a/bfd/ChangeLog
+++ b/bfd/ChangeLog
@@ -1,3 +1,10 @@
+2010-08-09  Richard Henderson  <rth@redhat.com>
+
+	PR ld/11891
+	* elf64-alpha.c (elf64_alpha_relax_tls_get_addr): Disallow relaxing
+	to tlshi/lo until pos0 and pos1 are adjacent.  Use the destination
+	register from the tldgd insn.
+
 2010-08-09  Catherine Moore  <clm@codesourcery.com>
 
 	* elfxx-mips.c (mips_elf_perform_relocation): Improve
diff --git a/bfd/elf64-alpha.c b/bfd/elf64-alpha.c
index 649d370..083beb1 100644
--- a/bfd/elf64-alpha.c
+++ b/bfd/elf64-alpha.c
@@ -3382,9 +3382,9 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
 				Elf_Internal_Rela *irel, bfd_boolean is_gd)
 {
   bfd_byte *pos[5];
-  unsigned int insn;
+  unsigned int insn, tlsgd_reg;
   Elf_Internal_Rela *gpdisp, *hint;
-  bfd_boolean dynamic, use_gottprel, pos1_unusable;
+  bfd_boolean dynamic, use_gottprel;
   unsigned long new_symndx;
 
   dynamic = alpha_elf_dynamic_symbol_p (&info->h->root, info->link_info);
@@ -3425,7 +3425,6 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
   pos[2] = info->contents + irel[2].r_offset;
   pos[3] = info->contents + gpdisp->r_offset;
   pos[4] = pos[3] + gpdisp->r_addend;
-  pos1_unusable = FALSE;
 
   /* Generally, the positions are not allowed to be out of order, lest the
      modified insn sequence have different register lifetimes.  We can make
@@ -3436,8 +3435,6 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
       pos[0] = pos[1];
       pos[1] = tmp;
     }
-  else if (pos[1] < pos[0])
-    pos1_unusable = TRUE;
   if (pos[1] >= pos[2] || pos[2] >= pos[3])
     return TRUE;
 
@@ -3495,6 +3492,14 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
 
   use_gottprel = FALSE;
   new_symndx = is_gd ? ELF64_R_SYM (irel->r_info) : 0;
+
+  /* Beware of the compiler hoisting part of the sequence out a loop
+     and adjusting the destination register for the TLSGD insn.  If this
+     happens, there will be a move into $16 before the JSR insn, so only
+     transformations of the first insn pair should use this register.  */
+  tlsgd_reg = bfd_get_32 (info->abfd, pos[0]);
+  tlsgd_reg = (tlsgd_reg >> 21) & 31;
+
   switch (!dynamic && !info->link_info->shared)
     {
     case 1:
@@ -3508,7 +3513,7 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
 
 	if (disp >= -0x8000 && disp < 0x8000)
 	  {
-	    insn = (OP_LDA << 26) | (16 << 21) | (31 << 16);
+	    insn = (OP_LDA << 26) | (tlsgd_reg << 21) | (31 << 16);
 	    bfd_put_32 (info->abfd, (bfd_vma) insn, pos[0]);
 	    bfd_put_32 (info->abfd, (bfd_vma) INSN_UNOP, pos[1]);
 
@@ -3519,11 +3524,11 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
 	  }
 	else if (disp >= -(bfd_signed_vma) 0x80000000
 		 && disp < (bfd_signed_vma) 0x7fff8000
-		 && !pos1_unusable)
+		 && pos[0] + 4 == pos[1])
 	  {
-	    insn = (OP_LDAH << 26) | (16 << 21) | (31 << 16);
+	    insn = (OP_LDAH << 26) | (tlsgd_reg << 21) | (31 << 16);
 	    bfd_put_32 (info->abfd, (bfd_vma) insn, pos[0]);
-	    insn = (OP_LDA << 26) | (16 << 21) | (16 << 16);
+	    insn = (OP_LDA << 26) | (tlsgd_reg << 21) | (tlsgd_reg << 16);
 	    bfd_put_32 (info->abfd, (bfd_vma) insn, pos[1]);
 
 	    irel[0].r_offset = pos[0] - info->contents;
@@ -3538,7 +3543,7 @@ elf64_alpha_relax_tls_get_addr (struct alpha_relax_info *info, bfd_vma symval,
     default:
       use_gottprel = TRUE;
 
-      insn = (OP_LDQ << 26) | (16 << 21) | (29 << 16);
+      insn = (OP_LDQ << 26) | (tlsgd_reg << 21) | (29 << 16);
       bfd_put_32 (info->abfd, (bfd_vma) insn, pos[0]);
       bfd_put_32 (info->abfd, (bfd_vma) INSN_UNOP, pos[1]);

Comment 12 Witold Baryluk 2011-04-01 06:36:28 UTC

Minimalized test case.

	.set noreorder
	.set volatile
	.set noat
	.set nomacro
	.arch ev5
	.text
	.align 2
	.align 4
	.globl check1
	.ent check1
check1:
	ldq $27,__tls_get_addr($29)		!literal!29
	lda $16,local2($29)		!tlsldm!29
	jsr $26,($27),__tls_get_addr		!lituse_tlsldm!29
	ldah $29,0($26)		!gpdisp!31
	lda $29,0($29)		!gpdisp!31
	ldq $1,local2($29)		!gottprel
	bis $31,$31,$31
	.end check1
	.section	.tdata,"awT",@progbits
	.align 2
	.type	local2, @object
	.size	local2, 16
local2:
	.byte	19
	.zero	3
	.long	20
	.quad	21

Comment 13 Michael Cree 2011-04-02 00:24:12 UTC

Created attachment 5641 [details]
Patch to fix relax tls

Patch to fix this issue.

On entry to elf64_alpha_relax_tls_get_addr() the instructions are already re-ordered to be in "canonical" order rather than actual execution order so that when pos[0] is assigned it contains the address to tlsldm.  However pos[0] and pos[1] are then swapped if the pos[1] is the instruction before pos[0] to get them in memory address order.  This means that by the time the register to pass to rduniq is calculated pos[0] may not contain the tlsldm instruction.  The patch shifts the calculation of the register before the switch of pos[0] and pos[1].

It would be good if someone more knowledgeable about binutils than I would verify that my analysis is correct and that the fix does not introduce any further problems.

Comment 14 Uros Bizjak 2011-06-24 18:36:27 UTC

Dupe of Bug 12928.

*** This bug has been marked as a duplicate of bug 12928 ***