Bug 21458 - ld generates none ARM elf ABI compliant code that causes a hard fault.
Summary: ld generates none ARM elf ABI compliant code that causes a hard fault.
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.29
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-04 00:58 UTC by Andrew Goedhart
Modified: 2017-05-15 16:11 UTC (History)
2 users (show)

See Also:
Host: Linux
Target: Arm cortex M
Build:
Last reconfirmed:


Attachments
test case (21.39 KB, application/x-object)
2017-05-04 00:59 UTC, Andrew Goedhart
Details
Patch to fix R_ARM_THM_ALU_PREL hard fault/abi relocation error (478 bytes, patch)
2017-05-04 01:03 UTC, Andrew Goedhart
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Goedhart 2017-05-04 00:58:31 UTC
According to the ELF for the ARM Architecture ABI r2.10 (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0044f/index.html) the relocation R_ARM_THM_ALU_PREL_10_0 needs to generate code according to:

53 R_ARM_THM_ALU_PREL_11_0 Static Thumb32 ((S + A) | T) – Pa

This means that if the SYMBOL is of type STT_FUNC and the code is thumb, the LSb of the loaded value needs to be set. This is not done in the current code.

GCC seems not to use this relocation when generating function pointers to thumb code but IAR does. when linking GCC code to an IAR library containing this the linked code HARD faults.

Code Analysis
-------------

In elf32-arm.c

On reading in the symbols and mangling them via elf32_arm_swap_symbol_in, the test is done for STT_FUNC and the LSb of the symbol value being set. The LSb of the symbol is then cleared.


  /* New EABI objects mark thumb function symbols by setting the low bit of
     the address.  */
  if (ELF_ST_TYPE (dst->st_info) == STT_FUNC
      || ELF_ST_TYPE (dst->st_info) == STT_GNU_IFUNC)
    {
      if (dst->st_value & 1)
	{
	  dst->st_value &= ~(bfd_vma) 1;
	  ARM_SET_SYM_BRANCH_TYPE (dst->st_target_internal,
				   ST_BRANCH_TO_THUMB);
	}
      else
	ARM_SET_SYM_BRANCH_TYPE (dst->st_target_internal, ST_BRANCH_TO_ARM);
    }

No check is made and the LSb of the value is not set, for a branch types of ST_BRANCH_TO_ARM in the function generating the relocations: elf32_arm_final_link_relocate(), case R_ARM_THM_ALU_PREL_11_0:


Test Case
---------

The object file G3_LIB.o (attched) contains three symbols:

    97: 00000469 26 FUNC    LOCAL  DEFAULT   12 G3_LIB_UpdateTablesEvent_15Sec
    98: 0000044d 26 FUNC    LOCAL  DEFAULT   12 G3_LIB_UpdateTablesEvent_Minute
    99: 00000485 22 FUNC    LOCAL  DEFAULT   12 G3_LIB_UpdateTablesEvent_TwoSec

When linking in function g3_timed_events_init() the above symbols are used to generate function pointers with even addresses that later are called causing a hard fault.

Disassembly on the hardware device showing the invalid thumb addresses being generated at addresses 0xd720, 0xd738, 0xd752

                     ?Subroutine5:
0000d710: 0x00000192   str     r2, [sp, #4]
0000d712: 0x00000023   movs    r3, #0
0000d714: 0x43f69822   movw    r2, #15000      ; 0x3a98
0000d718: 0x00000092   str     r2, [sp, #0]
0000d71a: 0x00008218   adds    r2, r0, r2
0000d71c: 0x00004b41   adcs    r3, r1
0000d71e: 0x00000021   movs    r1, #0
0000d720: 0x0ff27400   addw    r0, pc, #116    ; 0x74
0000d724: 0x0ef023b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine4:
0000d728: 0x00000192   str     r2, [sp, #4]
0000d72a: 0x00000023   movs    r3, #0
0000d72c: 0x4ef66022   movw    r2, #60000      ; 0xea60
0000d730: 0x00000092   str     r2, [sp, #0]
0000d732: 0x00008218   adds    r2, r0, r2
0000d734: 0x00004b41   adcs    r3, r1
0000d736: 0x00000021   movs    r1, #0
0000d738: 0x0ff24000   addw    r0, pc, #64     ; 0x40
0000d73c: 0x0ef017b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine3:
0000d740: 0x00000192   str     r2, [sp, #4]
0000d742: 0x4ff4fa62   mov.w   r2, #2000       ; 0x7d0
0000d746: 0x00000092   str     r2, [sp, #0]
0000d748: 0x10f5fa62   adds.w  r2, r0, #2000   ; 0x7d0
0000d74c: 0x41f10003   adc.w   r3, r1, #0
0000d750: 0x00000021   movs    r1, #0
0000d752: 0x0ff26000   addw    r0, pc, #96     ; 0x60
0000d756: 0x0ef00ab9   b.w     0x1b96e <g3_timed_events_add>


Unfortunately I don't have the code or the IAR compiler to generate a simpler test object file. It was supplied as a compiled library from which I extracted the G3_LIB.o after tracing the cause of the hard fault on a CortexM4 based processor.

PATCH 
-----

The disassembly of the linked code after applying the patch:

                     ?Subroutine5:
0000d710: 0x00000192   str     r2, [sp, #4]
0000d712: 0x00000023   movs    r3, #0
0000d714: 0x43f69822   movw    r2, #15000      ; 0x3a98
0000d718: 0x00000092   str     r2, [sp, #0]
0000d71a: 0x00008218   adds    r2, r0, r2
0000d71c: 0x00004b41   adcs    r3, r1
0000d71e: 0x00000021   movs    r1, #0
0000d720: 0x0ff27500   addw    r0, pc, #117    ; 0x75
0000d724: 0x0ef023b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine4:
0000d728: 0x00000192   str     r2, [sp, #4]
0000d72a: 0x00000023   movs    r3, #0
0000d72c: 0x4ef66022   movw    r2, #60000      ; 0xea60
0000d730: 0x00000092   str     r2, [sp, #0]
0000d732: 0x00008218   adds    r2, r0, r2
0000d734: 0x00004b41   adcs    r3, r1
0000d736: 0x00000021   movs    r1, #0
0000d738: 0x0ff24100   addw    r0, pc, #65     ; 0x41
0000d73c: 0x0ef017b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine3:
0000d740: 0x00000192   str     r2, [sp, #4]
0000d742: 0x4ff4fa62   mov.w   r2, #2000       ; 0x7d0
0000d746: 0x00000092   str     r2, [sp, #0]
0000d748: 0x10f5fa62   adds.w  r2, r0, #2000   ; 0x7d0
0000d74c: 0x41f10003   adc.w   r3, r1, #0
0000d750: 0x00000021   movs    r1, #0
0000d752: 0x0ff26100   addw    r0, pc, #97     ; 0x61
0000d756: 0x0ef00ab9   b.w     0x1b96e <g3_timed_events_add>


The GIT patch off 2.29 to fix the problem is:

From 962abfea5b59dc8ec7f5dc0e1c38704aac0094e3 Mon Sep 17 00:00:00 2001
From: Andrew Goedhart <Andrewgoedhart@simplepowersolutions.co.za>
Date: Thu, 4 May 2017 02:17:54 +0200
Subject: [PATCH] Fix R_ARM_THM_ALU_PREL relocation to conform to ARM elf abi  
  set thumb bit on branch to thumb code

---
 bfd/elf32-arm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/bfd/elf32-arm.c b/bfd/elf32-arm.c
index 434649f..9acd9c7 100644
--- a/bfd/elf32-arm.c
+++ b/bfd/elf32-arm.c
@@ -10508,6 +10508,10 @@ elf32_arm_final_link_relocate (reloc_howto_type *           howto,
 
 	value = relocation;
 
+	if ( branch_type == ST_BRANCH_TO_THUMB){
+	  value |= 1;
+	}
+
 	if (value >= 0x1000)
 	  return bfd_reloc_overflow;
 
-- 
2.9.3
Comment 1 Andrew Goedhart 2017-05-04 00:59:45 UTC
Created attachment 10036 [details]
test case

This is the object file generating the invalid relocations
Comment 2 Andrew Goedhart 2017-05-04 01:03:02 UTC
Created attachment 10037 [details]
Patch to fix R_ARM_THM_ALU_PREL hard fault/abi relocation error

Tested. After applying patch generates correct code with lsb set at addresses: 0xd720, 0xd738, 0x7d52

                     ?Subroutine5:
0000d710: 0x00000192   str     r2, [sp, #4]
0000d712: 0x00000023   movs    r3, #0
0000d714: 0x43f69822   movw    r2, #15000      ; 0x3a98
0000d718: 0x00000092   str     r2, [sp, #0]
0000d71a: 0x00008218   adds    r2, r0, r2
0000d71c: 0x00004b41   adcs    r3, r1
0000d71e: 0x00000021   movs    r1, #0
0000d720: 0x0ff27500   addw    r0, pc, #117    ; 0x75
0000d724: 0x0ef023b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine4:
0000d728: 0x00000192   str     r2, [sp, #4]
0000d72a: 0x00000023   movs    r3, #0
0000d72c: 0x4ef66022   movw    r2, #60000      ; 0xea60
0000d730: 0x00000092   str     r2, [sp, #0]
0000d732: 0x00008218   adds    r2, r0, r2
0000d734: 0x00004b41   adcs    r3, r1
0000d736: 0x00000021   movs    r1, #0
0000d738: 0x0ff24100   addw    r0, pc, #65     ; 0x41
0000d73c: 0x0ef017b9   b.w     0x1b96e <g3_timed_events_add>
                     ?Subroutine3:
0000d740: 0x00000192   str     r2, [sp, #4]
0000d742: 0x4ff4fa62   mov.w   r2, #2000       ; 0x7d0
0000d746: 0x00000092   str     r2, [sp, #0]
0000d748: 0x10f5fa62   adds.w  r2, r0, #2000   ; 0x7d0
0000d74c: 0x41f10003   adc.w   r3, r1, #0
0000d750: 0x00000021   movs    r1, #0
0000d752: 0x0ff26100   addw    r0, pc, #97     ; 0x61
0000d756: 0x0ef00ab9   b.w     0x1b96e <g3_timed_events_add>
Comment 3 Andrew Goedhart 2017-05-07 14:25:49 UTC
This bug has also been reported in launchpad :https://bugs.launchpad.net/gcc-arm-embedded/+bug/1421389 in 2015. Talk was made of forwarding a similar patch to the one in this bug report to bin utils. However I see no evidence of a patch ever being applied to the bin utils source tree. The fault is still present in 2.29 head.
Comment 4 cvs-commit@gcc.gnu.org 2017-05-09 11:16:32 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=e645cf40b111daef4518a58547de577eb9379ccb

commit e645cf40b111daef4518a58547de577eb9379ccb
Author: Andrew Goedhart <Andrewgoedhart@simplepowersolutions.co.za>
Date:   Tue May 9 12:14:48 2017 +0100

    Fix resolution of R_ARM_THM_ALU_PREL_11_0 relocation against Thumb symbols.
    
    	PR ld/21458
    	* elf32-arm.c (elf32_arm_final_link_relocate): Set the bottom bit
    	of the value when resolving a R_ARM_THM_ALU_PREL_11_0 relocation
    	and the destination is a Thumb symbol.
Comment 5 Nick Clifton 2017-05-09 11:21:46 UTC
Hi Andrew,

  I am very sorry for dropping the ball on this PR and not reviewing it until now.

  I have gone ahead and checked in your patch.  One thing extra that I would like to do is to add a new testcase to the linker testsuite, to make sure that this problem stays fixed.  Do you have some (small) assembler sources that can be used to trigger this bug ?

Cheers
  Nick
Comment 6 Andrew Goedhart 2017-05-09 15:47:35 UTC
Hi Nick

Thanks for the merge.

I can't generate a test case because using gas, the ADR instruction
insists that the label be in the same section and file. So no
relocation is generated because the only code the assembler generates
is PC relative and therefore does not require relocation. I am
guessing this is why the problem was hidden for so long.

Unfortunately we may potentially have another problem. When trying to
generate a test case I came across the following.

The test case below generates incorrect code for the ADR instruction
in conjunction with a thumb_func target..
------------------------

.syntax unified
.thumb
.text
.align 2
.global __testFnPtr
.type __testFnPtr, %function
.thumb_func
__testFnPtr:
   bx lr

.global __testArmThmAluPre
.type __testArmThmAluPre, %function
.thumb_func
__testArmThmAluPre:
    Push {LR}
    ADR R0,__testFnPtr
    BLX R0
    Pop {PC}

-------------------------
this results in the following disassembly

                     __testFnPtr:
000098e8: 0x00007047   bx      lr
                     __testArmThmAluPre:
000098ea: 0x000000b5   push    {lr}
000098ec: 0xaff20800   subw    r0, pc, #8
000098f0: 0x00008047   blx     r0
000098f2: 0x000000bd   pop     {pc}

The problem is that the thumb bit is not set ( subw    r0, pc, #8
results in an even address) and that this code causes a hard fault.

Now I don't know my assembly as well as I should so there might be
wrong/missing directives in the above file. Comment welcome

regards

Andrew Goedhart

On Tue, May 9, 2017 at 1:21 PM, nickc at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21458
>
> Nick Clifton <nickc at redhat dot com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |RESOLVED
>                  CC|                            |nickc at redhat dot com
>          Resolution|---                         |FIXED
>
> --- Comment #5 from Nick Clifton <nickc at redhat dot com> ---
> Hi Andrew,
>
>   I am very sorry for dropping the ball on this PR and not reviewing it until
> now.
>
>   I have gone ahead and checked in your patch.  One thing extra that I would
> like to do is to add a new testcase to the linker testsuite, to make sure that
> this problem stays fixed.  Do you have some (small) assembler sources that can
> be used to trigger this bug ?
>
> Cheers
>   Nick
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
> You reported the bug.
Comment 7 Nick Clifton 2017-05-10 12:10:34 UTC
(In reply to Andrew Goedhart from comment #6)

Hi Andrew,

> Unfortunately we may potentially have another problem. When trying to
> generate a test case I came across the following.

>     ADR R0,__testFnPtr
>     BLX R0

> The problem is that the thumb bit is not set ( subw    r0, pc, #8
> results in an even address) and that this code causes a hard fault.
> 
> Now I don't know my assembly as well as I should so there might be
> wrong/missing directives in the above file. Comment welcome

Hmm, it appears that the ADR pseudo-instruction may not actually support
encoding thumb function addresses in the bottom bit.  It is not mentioned
in the description of the instruction in the ARM ARM.  (I am looking at
section A8.8.12 of the ARM DDI 0406C.b document).  

The intent, I think, is that the value loaded by an ADR instruction will
be used by a B or BL instruction, but not a BLX instruction.

Except that there is a special case when the destination register of the 
ADR instruction is the PC and the processor supports the ARMv7 ISA.  
In this cae the instruction acts like a BX instruction.  Prior to ARMv7
however ADR PC, <label> would act as a B instruction.

Changing the behaviour of the ADR instruction now, would I think, be a 
very dangerous thing to do.  Presumably there is already (lots of) code
out there that is using it in its current state, and fixing this bug, if
it really is a bug, would have the potential to break things.

I will however ping the guys at ARM to see if they have an opinion.

Cheers
  Nick
Comment 8 Andrew Goedhart 2017-05-10 14:07:05 UTC
Hi Nic

I had a look at the programmers reference guide for the cortext M3 and M4 .

I noticed that the instructions BL and B require a label not a
register. This is true for ARMV4-ARMV8 architectures. All the
architectures that support thumb and where a .thumb_func symbol would
be relevant.

Only BLX and BX can and must use a register as an operand.  i.e. BLX
R0   Again true from ARMV4-ARMV8.

That means in thumb code you cannot branch on a register unless it is
a MOV PC or BLX and BX instruction

However for BLX and BX you want to set the lsb bit to 1 for a thumb
target (.thumb_func). If you use a MOV, bit[0] is ignored and set to 0
so setting it to 1 does not cause failure

If you are loading the address of a constant it would not be
appropriate to set the lsb bit to 1 but then the target of the load
would not be a STT_FUNC. or .thumb_func

ARMV6T&ARMV7T
--------------------------
Now for Cortext M0, M3, M4, M7 and ARMV6T
Some Interesting excerpts from ARMv6 architecture
[https://developer.arm.com/docs/dui0662/latest/3-the-cortex-m0-instruction-set/35-general-data-processing-instructions/355-mov-and-mvn]

Mov PC, Rx

When Rd is the PC in a MOV instruction:
   * Bit[0] of the result is discarded.
   * A  branch occurs to the address created by forcing bit[0] of the
result to 0. The T-bit remains unmodified.

for BX and BLX
  * The BX and BLX instructions result in a HardFault exception if
bit[0] of Rm is 0.


ARMV4T & ARMV5T
---------------------------
Also  from page A4-43 of the architecture reference manual for ARM5(T)

from MOV
When the PC is the destination of the instruction, a branch occurs.
The instruction: MOV PC, LR can therefore be used to return from a
subroutine (see instructions B, BL on page A4-10). In T variants of
architecture 4 and in architecture 5 and above, the instruction BX LR
must be used in place of MOV PC, LR, as the BX instruction
automatically switches back to Thumb state if appropriate (but see
also The T and J bits on page A2-15 for operation on non-T variants of
ARM architecture version 5)

also from A2-10 of the architecture reference manual:

'The precise rules depend on the current instruction set state and the
architecture version:

 • In T variants of ARMv4 and above, including all variants of ARMv6
and above, bit[0] of a value written to R15 in Thumb state is ignored
unless the instruction description says otherwise. If bit[0] of the PC
is implemented (which depends on whether and how the Jazelle Extension
is implemented), then zero must be written to it regardless of the
value written to bit[0] of R15.

• In ARMv6 and above, bits[1:0] of a value written to R15 in ARM state
are ignored unless the instruction description says otherwise. Bit[1]
of the PC must be written as zero regardless of the value written to
bit[1] of R15. If bit[0] of the PC is implemented (which depends on
how the Jazelle Extension is implemented), then zero must be written
to it. '



So if thumb is supported and a MOV PC, Rx is used the lsb bit is
ignored on all targets upto and including V7.

This means that if the target is a .thumbfunc, when doing an ADR, the
lsb can be set without breaking code unless that code was loading an
address for another reason other then branching to it.

So if we adjusted ADR to include setting the thumb bit for targets of
type /thumb_func,

both

.thumb_func
FunctionPtr:

ADR R0 FunctionPtr
BLX R0

and

MOV PC, R0

would function correctly.



regards

Andrew



On Wed, May 10, 2017 at 2:10 PM, nickc at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21458
>
> --- Comment #7 from Nick Clifton <nickc at redhat dot com> ---
> (In reply to Andrew Goedhart from comment #6)
>
> Hi Andrew,
>
>> Unfortunately we may potentially have another problem. When trying to
>> generate a test case I came across the following.
>
>>     ADR R0,__testFnPtr
>>     BLX R0
>
>> The problem is that the thumb bit is not set ( subw    r0, pc, #8
>> results in an even address) and that this code causes a hard fault.
>>
>> Now I don't know my assembly as well as I should so there might be
>> wrong/missing directives in the above file. Comment welcome
>
> Hmm, it appears that the ADR pseudo-instruction may not actually support
> encoding thumb function addresses in the bottom bit.  It is not mentioned
> in the description of the instruction in the ARM ARM.  (I am looking at
> section A8.8.12 of the ARM DDI 0406C.b document).
>
> The intent, I think, is that the value loaded by an ADR instruction will
> be used by a B or BL instruction, but not a BLX instruction.
>
> Except that there is a special case when the destination register of the
> ADR instruction is the PC and the processor supports the ARMv7 ISA.
> In this cae the instruction acts like a BX instruction.  Prior to ARMv7
> however ADR PC, <label> would act as a B instruction.
>
> Changing the behaviour of the ADR instruction now, would I think, be a
> very dangerous thing to do.  Presumably there is already (lots of) code
> out there that is using it in its current state, and fixing this bug, if
> it really is a bug, would have the potential to break things.
>
> I will however ping the guys at ARM to see if they have an opinion.
>
> Cheers
>   Nick
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
> You reported the bug.
Comment 9 cvs-commit@gcc.gnu.org 2017-05-15 14:30:14 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=52a86f843b6dee1de9977293da9786649b146b05

commit 52a86f843b6dee1de9977293da9786649b146b05
Author: Nick Clifton <nickc@redhat.com>
Date:   Mon May 15 15:29:02 2017 +0100

    Fix use of ARM ADR and ADRl pseudo-instructions with thumb function symbols.
    
    	PR gas/21458
    	* config/tc-arm.c (do_adr): If the ADR involves a thumb function
    	symbol, ensure that the T bit will be set.
    	(do_adrl): Likewise.
    	(do_t_adr): Likewise.
    	* testsuite/gas/arm/pr21458.s: New test.
    	* testsuite/gas/arm/pr21458.d: New test driver.
Comment 10 Nick Clifton 2017-05-15 14:31:55 UTC
Hi Andrew,

  Well, I have not heard back from ARM - which is a bit surprising - but I have to agree with you.  The probable intent of the ADR pseudo-instruction is that is should produce addresses with the T bit set when a thumb function symbol is used.

  So I have checked in a patch to do this.  Please update your sources and give it a go.

Cheers
  Nick
Comment 11 Andrew.goedhart 2017-05-15 16:11:49 UTC
HI Nick

Will do tonight at home

regards

On Mon, May 15, 2017 at 4:31 PM, nickc at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21458
>
> --- Comment #10 from Nick Clifton <nickc at redhat dot com> ---
> Hi Andrew,
>
>   Well, I have not heard back from ARM - which is a bit surprising - but I have
> to agree with you.  The probable intent of the ADR pseudo-instruction is that
> is should produce addresses with the T bit set when a thumb function symbol is
> used.
>
>   So I have checked in a patch to do this.  Please update your sources and give
> it a go.
>
> Cheers
>   Nick
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
> You reported the bug.