Bug 21725

Summary: [2.29 Regression] binutils fails to build glibc-2.24 on aarch64-linux-gnu and arm-linux-gnueabihf
Product: binutils
Reporter: Matthias Klose <doko>
Component: binutils
Assignee: Jiong Wang <jiwang>
Status: RESOLVED FIXED
Severity: normal
CC: adconrad, jiwang, pbrobinson
Priority: P2
Version: 2.29
Target Milestone: ---
Host:
Target: aarch64-linux-gnu, arm-linux-gnueabihf
Build:
Last reconfirmed:

Description Matthias Klose 2017-07-06 12:15:23 UTC
Trying to build glibc-2.24 on aarch64-linux-gnu and arm-linux-gnueabihf, the build fails when running localedef in the check target: the just-built localedef segfaults. The same glibc builds fine with binutils from the 2.28 branch.

glibc-2.24 needs upstream commit 388b4f1a02f3a801965028bbfcd48d905638b797 backported to 2.24 to build.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaad00c5d28 in _dl_start_user ()
   from /home/ubuntu/glibc/glibc-2.24/build-tree/arm64-libc/elf/ld-linux-aarch64.so.1
(gdb) bt
#0  0x0000aaaad00c5d28 in _dl_start_user ()
   from /home/ubuntu/glibc/glibc-2.24/build-tree/arm64-libc/elf/ld-linux-aarch64.so.1
#1  0x0000aaaad00c5cc8 in _start ()
   from /home/ubuntu/glibc/glibc-2.24/build-tree/arm64-libc/elf/ld-linux-aarch64.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 1 Jiong Wang 2017-07-06 14:19:04 UTC
(In reply to Matthias Klose from comment #0)

Hi Matthias,

  Thanks for reporting this.

  This looks to me like the same issue reported here:

    https://sourceware.org/ml/binutils/2017-06/msg00226.html
  
  I guess ld.so was not actually built successfully.  You may need to redirect the build log to a text file and search it for that error; for some reason the glibc build does not stop even when that error happens.

  I could reproduce this issue on AArch64, and can confirm it is fixed after backporting the following glibc fix from the master branch to 2.24.

  Could you please confirm the backport works for you as well?

commit e9177fba13549a8e2a6232f46080e5c6d3e467b1
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Jun 21 13:47:07 2017 +0100

    [AArch64] Use hidden __GI__dl_argv in rtld startup code
Comment 2 Matthias Klose 2017-07-06 16:58:12 UTC
That seems to work. However, no link error for ld.so is seen on arm-linux-gnueabihf.
Comment 3 Jiong Wang 2017-07-07 14:12:06 UTC
(In reply to Matthias Klose from comment #2)
> that seems to work. However a link error for ld.so is not seen on
> arm-linux-gnueabihf.

I just reproduced the ARM segmentation fault; it looks like a separate issue that needs investigation.

gdb --args ./elf/ld-linux-armhf.so.3 ./locale/localedef 

(gdb) bt    
#0  elf_dynamic_do_Rel (lazy=0, skip_ifunc=0, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0xaaad2568 <_rtld_local+1304>) at do-rel.h:83
#1  _dl_start (arg=0xfffef590) at rtld.c:504
#2  0xaaaaab8e in _start ()
Comment 4 Adam Conrad 2017-07-11 11:32:12 UTC
Can confirm both comments: the upstream commit pointed to does fix aarch64, and I can reproduce that identical backtrace on armhf.
Comment 5 Adam Conrad 2017-07-11 11:42:31 UTC
Note that you don't really need to do anything fancy like trying to reproduce the testsuite environment (or, indeed, trying to call localedef); just invoking the freshly-built ld-linux-armhf.so.3 by itself is enough to show the damage.
Comment 6 Jiong Wang 2017-07-12 12:50:00 UTC
ARM ld.so has been broken since the following commit; reverting it on the 2.29 branch makes ld.so work again on arm-linux-gnueabihf.

commit 52a86f843b6dee1de9977293da9786649b146b05
Author: Nick Clifton <nickc@redhat.com>
Date:   Mon May 15 15:29:02 2017 +0100

    Fix use of ARM ADR and ADRl pseudo-instructions with thumb function symbols.

@nick, would you mind having a look?  I will do some analysis as well.
Comment 7 Jiong Wang 2017-07-12 13:40:53 UTC
(In reply to Jiong Wang from comment #6)
>  will do some analysis as well.

The difference between the working and the broken ld.so on ARM is:

00000fa0 <_dl_start>:
@@ -1262,7 +1262,7 @@
      fb6:	6338      	str	r0, [r7, #48]	; 0x30
      fb8:	4479      	add	r1, pc
      fba:	589a      	ldr	r2, [r3, r2]
-     fbc:	f2af 0420 	subw	r4, pc, #32
+     fbc:	f2af 041f 	subw	r4, pc, #31
      fc0:	680b      	ldr	r3, [r1, #0]
      fc2:	f022 0201 	bic.w	r2, r2, #1
      fc6:	1aa4      	subs	r4, r4, r2
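
The one-byte change in the immediate can be checked by hand. Below is a minimal sketch in C, assuming the usual Thumb rule that a PC-relative ADR/SUBW reads the PC as the instruction address plus 4, rounded down to a multiple of 4. The helper names are hypothetical and only model the arithmetic; they are not taken from binutils or glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Base value used by a Thumb PC-relative ADR/SUBW located at insn_addr:
   the instruction address plus 4, rounded down to a multiple of 4
   (assumed Thumb behaviour, see the note above).  */
static uint32_t thumb_adr_base(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0);  /* Thumb instructions are halfword aligned */
    return (insn_addr + 4u) & ~UINT32_C(3);
}

/* Link-time result of "subw rd, pc, #imm" located at insn_addr.  */
static uint32_t subw_pc_result(uint32_t insn_addr, uint32_t imm)
{
    return thumb_adr_base(insn_addr) - imm;
}
```

With the instruction at 0xfbc, subw_pc_result(0xfbc, 32) gives 0xfa0, the address of _dl_start, while subw_pc_result(0xfbc, 31) gives 0xfa1, the same address with bit 0 set.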
Comment 8 Jiong Wang 2017-07-12 13:54:43 UTC
sysdeps/arm/dl-machine.h:

elf_machine_load_address (void)
{
  extern Elf32_Addr internal_function __dl_start (void *) asm ("_dl_start");
  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
  Elf32_Addr pcrel_addr;
#ifdef __thumb__
  /* Clear the low bit of the function address.  */
  got_addr &= ~(Elf32_Addr) 1;
#endif
  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
  return pcrel_addr - got_addr;
}

The __thumb__ hunk is suspicious, because with Nick's patch the assembler now also sets the low bit in the address computed by adr.

If I remove the "#ifdef __thumb__" block, the generated ld.so also works.
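
To see why the mask matters, here is a rough model of the computation in C. It is only a sketch under stated assumptions: got_addr is taken to be the link-time address of _dl_start with the Thumb bit set, pcrel_addr is the run-time address with bit 0 set or clear depending on the assembler, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Elf32_Addr;  /* stand-in for the glibc type */

/* Hypothetical model of elf_machine_load_address for a Thumb build.
   base:        address at which ld.so was actually loaded
   sym:         link-time address of _dl_start, bit 0 clear
   adr_sets_b0: true if the assembler sets bit 0 in the adr result
   mask_got:    true if the "#ifdef __thumb__" block is compiled in  */
static Elf32_Addr model_load_address(Elf32_Addr base, Elf32_Addr sym,
                                     bool adr_sets_b0, bool mask_got)
{
    assert((sym & 1u) == 0);

    /* Assumption: the C-level address of a Thumb function has bit 0 set.  */
    Elf32_Addr got_addr = sym | 1u;
    Elf32_Addr pcrel_addr = base + sym + (adr_sets_b0 ? 1u : 0u);

    if (mask_got)
        got_addr &= ~(Elf32_Addr) 1;

    return pcrel_addr - got_addr;
}
```

Under these assumptions, the mask combined with an assembler that leaves bit 0 clear returns base, the mask combined with an assembler that sets bit 0 returns base + 1, and dropping the mask with the newer assembler returns base again, which matches the observation above.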
Comment 9 Jiong Wang 2017-07-12 15:41:01 UTC
(In reply to Jiong Wang from comment #8)

I think glibc ought to be fixed here to be more portable. The __thumb__ block can be kept, but pcrel_addr should always be stripped as well when __thumb__ is defined, because the assumption that a PC-relative relocation resolved by the assembler leaves bit 0 untouched no longer holds after Nick's fix.

I will post a glibc fix.
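
A sketch of the idea in plain C, for illustration only: this is not the patch itself, and the function names and the sample addresses are hypothetical. Both inputs have bit 0 cleared before the subtraction, so the result no longer depends on whether the assembler sets that bit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Elf32_Addr;  /* stand-in for the glibc type */

/* got_addr:   address of _dl_start as seen from C (Thumb bit may be set)
   pcrel_addr: address produced by the adr instruction (bit 0 set or
               clear, depending on the assembler version)  */
static Elf32_Addr load_address_both_stripped(Elf32_Addr got_addr,
                                             Elf32_Addr pcrel_addr)
{
    got_addr &= ~(Elf32_Addr) 1;
    pcrel_addr &= ~(Elf32_Addr) 1;
    return pcrel_addr - got_addr;
}

/* Self-check with made-up addresses: both assembler behaviours
   must yield the same load address.  */
static void check_both_assemblers(void)
{
    const Elf32_Addr base = 0x40000000u;  /* hypothetical load address */
    const Elf32_Addr sym = 0xfa0u;        /* _dl_start in the listing above */

    assert(load_address_both_stripped(sym | 1u, base + sym) == base);
    assert(load_address_both_stripped(sym | 1u, base + sym + 1u) == base);
}
```

Calling check_both_assemblers() from a test program exercises both cases; neither assertion should fire.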
Comment 10 Adam Conrad 2017-07-12 21:28:23 UTC
Can confirm that the patch posted to libc-alpha[1] works for us, and gets us a build[2] that not only runs but also passes the testsuite again.  Thanks.

[1] https://sourceware.org/ml/libc-alpha/2017-07/msg00518.html
[2] https://launchpad.net/ubuntu/+source/glibc/2.24-12ubuntu1