Bug 22589 - aarch64: adrp relocation gets filled with non-zero address for undefined weak symbol
Summary: aarch64: adrp relocation gets filled with non-zero address for undefined weak...
Status: REOPENED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.30
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-11 23:47 UTC by Julius Werner
Modified: 2022-11-21 16:02 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2017-12-15 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Julius Werner 2017-12-11 23:47:38 UTC
Consider the following minimal testcase:

weaklink.s:
  .weak myfunction

  _start:
    adrp x0, myfunction
    add  x0, x0, :lo12:myfunction

'binutils-gdb/gas/as-new weaklink.s -o weaklink.o' generates an unbound R_AARCH64_ADR_PREL_PG_HI21 relocation for the first instruction as expected:

0000000000000000 <_start> (File Offset: 0x40):
   0:   90000000        adrp    x0, 0 <myfunction> (File Offset: 0x40)
                        0: R_AARCH64_ADR_PREL_PG_HI21   myfunction
   4:   91000000        add     x0, x0, #0x0
                        4: R_AARCH64_ADD_ABS_LO12_NC    myfunction

But 'binutils-gdb/ld/ld-new weaklink.o -o weaklink' somehow decides that relocation should be bound to 0x40000 even though the myfunction symbol isn't defined anywhere:

0000000000400078 <_start> (File Offset: 0x78):
  400078:       90000000        adrp    x0, 400000 <_start-0x78> (File Offset: 0x0)
  40007c:       91000000        add     x0, x0, #0x0

This breaks any code trying to test x0 for zero to see if the weak symbol is defined. A work-around is to use ldr x0, =myfunction which gives the correct result. Tested with the current binutils-gdb HEAD (c4e648430f3c5c13).
Comment 1 Nick Clifton 2017-12-15 18:33:44 UTC
Hi Julius,

I do not believe that this is a bug:

>   400078:       90000000        adrp    x0, 400000 <_start-0x78> (File

> This breaks any code trying to test x0 for zero to see if the weak symbol is defined.

Actually the offset stored in the instruction *is* zero.  Look at the
binary value of the instruction (0x90000000).  But since adrp loads a (pafe
aligned) pc-relative value into its destination register, x0 will never be 
set to zero unless the adrp instruction is at address in the first 4K of 
memory.

Hence the linker has linked the code correctly and the disassembler is
helpfully showing you that value that *will* be loaded into the x0 register.

Does this make sense to you ?

Cheers
  Nick
Comment 2 Julius Werner 2017-12-15 20:40:49 UTC
> Actually the offset stored in the instruction *is* zero.

Right, but I think that is incorrect. The point of the 'adrp <reg>, <symbol>' mnemonic is that you want to generate an instruction that contains the right PC-relative offset so that when executed at the address it will end up at, the (absolute) high 20 bits of <symbol> will be written into <reg>. Since the offset that needs to be put into the instruction is PC-relative, the assembler generates a relocation and it's the linker's job to fill that in with the right value when it determines the final address for the instruction.

Consider this counterexample which works correctly:

test2.s:
  .global zerofunc
  .section .zero
  zerofunc:
        ret

test1.s:
  .section .text
  _start:
        adrp    x0, zerofunc
        add     x0, x0, :lo12:zerofunc

Which individually assemble into:

Disassembly of section .text:

0000000000000000 <_start> (File Offset: 0x40):
   0:   90000000        adrp    x0, 0 <zerofunc> (File Offset: 0x40)
                        0: R_AARCH64_ADR_PREL_PG_HI21   zerofunc
   4:   91000000        add     x0, x0, #0x0
                        4: R_AARCH64_ADD_ABS_LO12_NC    zerofunc

...and...

Disassembly of section .zero:

0000000000000000 <zerofunc> (File Offset: 0x40):
   0:   d65f03c0        ret

And then when I link the two together, I get:

Disassembly of section .text:

0000000000400078 <_start> (File Offset: 0x78):
  400078:       90ffe000        adrp    x0, 0 <zerofunc> (File Offset: 0xffffffffffc00000)
  40007c:       91000000        add     x0, x0, #0x0

Disassembly of section .zero:

0000000000000000 <zerofunc> (File Offset: 0x80):
   0:   d65f03c0        ret

As you can see, the linker inserted the right offset into the instruction so that when applied relative to the PC that the instruction was linked at, the resulting value in x0 will be 0. Everything in this example is the same as in the failure case, except that the symbol pointed to by the relocation is a defined symbol with value 0 rather than an undefined weak symbol. The linker should treat both of these the same for address calculations, but it doesn't seem to do so.
Comment 3 Richard Earnshaw 2017-12-18 10:39:24 UTC
It's simply not safe to use adrp to reference a weak symbol.  If you're hand-writing assembly you have to be aware of this.

adrp is a position-independent address-forming instruction; as such it cannot ever resolve to zero and still be position independent; that would require dynamic relocation which would in turn imply having a writable text section.  Furthermore, the offset range of adrp is such that it probably could not generate zero.

It might be nice if the assembler/linker could detect this case and emit an out-of-bounds error - it's essentially a relocation overflow.  I'm not sure if that's a generic issue or a back-end one.

Anyway, it's still invalid to use adrp to form a reference to a weak symbol.
Comment 4 Julius Werner 2018-01-11 00:11:15 UTC
Hi Richard,

Sorry, I can't quite follow your argumentation. Are you trying to say that it's not legal to use the ADRP instruction in position-dependent code? I'm not really sure what you're basing that assumption on... I can't find anything to that effect in any of the official ARM documentation and specs. The instruction may be useful for position-independent code and often used by compilers to this effect, yes, but that doesn't mean that this is the only valid use for it and that it isn't allowed for anything else (unless ARM explicitly specified it like that, which to my knowledge they didn't). If I'm hand-writing assembly and I want to use the instruction in a way that is compliant with the spec, I think binutils should accept and link that correctly.

FWIW, this instruction is useful for any load from a compile-time-known address in position-dependent code where the address is guaranteed to be less than 4GB away from the program counter (but may be further away than the 32KB addressable directly by an LDR (immediate) instruction). This is a pretty common use case for firmware and embedded applications which often use a 1:1 virtual to physical mapping and only need to operate on the DRAM and MMIO addresses within 0 and 4GB. It's more compact than the LDR <reg>, =<symbol> mnemonic which translates into 12 bytes of code (compared to 8 bytes for ADRP and a normal LDR).

I understand that there are cases where the instruction could not output 0 (i.e. if the PC is above 4GB), and I agree that linking should fail with a relocation overflow in that case. But for cases where it does fit, I think it's clearly more correct to output an instruction that results in 0 rather than whatever it's doing right now (which is clearly just wrong, one way or another). See also my earlier post about how it works perfectly fine for a defined symbol with value 0. It makes absolutely no sense that this should give a different result than an undefined weak symbol, they both resolve to exactly the same address.
Comment 5 Richard Earnshaw 2018-01-12 14:41:23 UTC
(In reply to Julius Werner from comment #4)
> Hi Richard,
> 
> Sorry, I can't quite follow your argumentation. Are you trying to say that
> it's not legal to use the ADRP instruction in position-dependent code? 

No, I'm saying you can't use ADRP for a *weak* symbol reference in position independent code.

> I'm
> not really sure what you're basing that assumption on... I can't find
> anything to that effect in any of the official ARM documentation and specs.

What documentation?  ARM docs don't say you can use a division instruction for doing general addition, but clearly you can't...

> The instruction may be useful for position-independent code and often used
> by compilers to this effect, yes, but that doesn't mean that this is the
> only valid use for it and that it isn't allowed for anything else (unless
> ARM explicitly specified it like that, which to my knowledge they didn't).
> If I'm hand-writing assembly and I want to use the instruction in a way that
> is compliant with the spec, I think binutils should accept and link that
> correctly.

It's all a consequence of the requirements on the binary and the constraints that imposes.  Building code that is PIC takes some care and consideration of the operating constraints.  Code, even PIC, needs to be read-only (no dynamic relocations) and that means that after static linking there must be no outstanding relocations on the binary.  With weak symbols we don't know until run-time whether or not the symbol will be defined.  The two constraints are incompatible with each other if we allow ADRP to reference a weak symbol.  It's not just weak symbols have this property; symbol pre-emption can have a similar impact.  The rules for building an ELF image that is workable can and do restrict which instructions can be used in which contexts.  You as the programmer have to understand this, just as a compiler has to.

> 
> FWIW, this instruction is useful for any load from a compile-time-known
> address in position-dependent code where the address is guaranteed to be
> less than 4GB away from the program counter (but may be further away than
> the 32KB addressable directly by an LDR (immediate) instruction).

But it's not compile-time known.  That's the point.

>  This is a
> pretty common use case for firmware and embedded applications which often
> use a 1:1 virtual to physical mapping and only need to operate on the DRAM
> and MMIO addresses within 0 and 4GB. It's more compact than the LDR <reg>,
> =<symbol> mnemonic which translates into 12 bytes of code (compared to 8
> bytes for ADRP and a normal LDR).
> 
> I understand that there are cases where the instruction could not output 0
> (i.e. if the PC is above 4GB), and I agree that linking should fail with a
> relocation overflow in that case. But for cases where it does fit, I think
> it's clearly more correct to output an instruction that results in 0 rather
> than whatever it's doing right now (which is clearly just wrong, one way or
> another). See also my earlier post about how it works perfectly fine for a
> defined symbol with value 0. It makes absolutely no sense that this should
> give a different result than an undefined weak symbol, they both resolve to
> exactly the same address.

That's not the model implemented in GNU binutils.  Nor is it the model wanted for normal PIC code.
Comment 6 Julius Werner 2018-01-13 01:27:29 UTC
> > Sorry, I can't quite follow your argumentation. Are you trying to say that
> > it's not legal to use the ADRP instruction in position-dependent code? 
> 
> No, I'm saying you can't use ADRP for a *weak* symbol reference in position
> independent code.

Yes, but I am not building PIC code! That's why I'm asking whether you think the instruction would be illegal to use in position-*dependent* (i.e. non-PIC) code, because that's the code I am having a problem with. I agree that it doesn't make sense for weak symbols in PIC code, but that's not the case I have -- I want to link non-PIC code where using this instruction can be perfectly fine as long as you know the limits of your address space.

> > I'm
> > not really sure what you're basing that assumption on... I can't find
> > anything to that effect in any of the official ARM documentation and specs.
> 
> What documentation?  ARM docs don't say you can use a division instruction
> for doing general addition, but clearly you can't...

I'm just saying that I can't find anything official that says I can't use this instruction for non-PIC code. So I think that I should be allowed to do that, and that the binutils linker should calculate the right offsets for that case.
Comment 7 Szabolcs Nagy 2019-10-21 11:18:26 UTC
in position dependent code the most reasonable behaviour is to try to
set the symbol address to 0 and if 0 is outside of the range of adrp
then report a linker error.

in position independent code adrp for weak undefined symbol should be
a linker error.
Comment 8 Julius Werner 2019-10-21 19:38:02 UTC
> in position dependent code the most reasonable behaviour is to try to
> set the symbol address to 0 and if 0 is outside of the range of adrp
> then report a linker error.

Yes, thanks, that's exactly the behavior I think it should have!
Comment 9 Szabolcs Nagy 2022-11-18 15:25:50 UTC
i ran into this again and i think the linker could relax 'adrp xN, weaksym' into 'mov xN, 0' if weaksym is undefined.

link error is not helpful since such code (accessing weak symbols) may be behind checks and unreachable if the symbol is undefined.

normally weak syms are accessed via GOT which can be 0 for undef, but in PIC this depends on a dynamic relocation even for hidden visibility syms. this does not work in early start code (e.g. *crt1.o) accessing weak, linker generated symbols (such as __ehdr_start) where the code must not generate dynamic relocations (since it may be executed before ld.so or static pie self relocation). if such symbol is potentially undefined then we have a problem: adrp does not work, GOT does not work, movz does not work. so i dont see a way to implement

  if (&weaksym != 0)
    use(&weaksym);

logic in the ld.so or static pie start code. i ran into this with the morello port where there are linker created symbols (__rela_dyn_start) that are only defined under certain conditions (static exe with no dynamic-linker) and must be checked and accessed in gcrt1.o that is used in both pde and pie.
Comment 10 Richard Earnshaw 2022-11-21 13:53:14 UTC
(In reply to Szabolcs Nagy from comment #9)
> i ran into this again and i think the linker could relax 'adrp xN, weaksym'
> into 'mov xN, 0' if weaksym is undefined.

Static linker or dynamic?  The dynamic linker can't change any code segment during loading, since that would break the code-sharing model used by SVr4-like systems.  I don't see how the static linker can know that a dynamically loaded library won't provide a definition at run-time, so that doesn't make a lot of sense to me either.

> 
> link error is not helpful since such code (accessing weak symbols) may be
> behind checks and unreachable if the symbol is undefined.
> 
> normally weak syms are accessed via GOT which can be 0 for undef, but in PIC
> this depends on a dynamic relocation even for hidden visibility syms. this
> does not work in early start code (e.g. *crt1.o) accessing weak, linker
> generated symbols (such as __ehdr_start) where the code must not generate
> dynamic relocations (since it may be executed before ld.so or static pie
> self relocation). if such symbol is potentially undefined then we have a
> problem: adrp does not work, GOT does not work, movz does not work. so i
> dont see a way to implement
> 
>   if (&weaksym != 0)
>     use(&weaksym);
> 
> logic in the ld.so or static pie start code. i ran into this with the
> morello port where there are linker created symbols (__rela_dyn_start) that
> are only defined under certain conditions (static exe with no
> dynamic-linker) and must be checked and accessed in gcrt1.o that is used in
> both pde and pie.
Comment 11 Szabolcs Nagy 2022-11-21 16:02:09 UTC
(In reply to Richard Earnshaw from comment #10)
> (In reply to Szabolcs Nagy from comment #9)
> > i ran into this again and i think the linker could relax 'adrp xN, weaksym'
> > into 'mov xN, 0' if weaksym is undefined.
> 
> Static linker or dynamic?  The dynamic linker can't change any code segment
> during loading, since that would break the code-sharing model used by
> SVr4-like systems.  I don't see how the static linker can know that a
> dynamically loaded library won't provide a definition at run-time, so that
> doesn't make a lot of sense to me either.

for extern weak reference with dynamic linking yes (but such references have to go through indirection anyway that can turn into 0 already), adrp+add like access only makes sense for hidden weak syms or in case of static linking which is known by the static linker.

my use case is hidden weak symbol access.