Bug 25599 - gas generates invalid PCREL60B relocation offset with brl.call
Summary: gas generates invalid PCREL60B relocation offset with brl.call
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gas (show other bugs)
Version: 2.32
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-25 16:22 UTC by Peter Bisroev
Modified: 2021-05-22 13:38 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Code and binaries that display this bug. (11.25 KB, application/x-compressed-tar)
2020-02-25 16:22 UTC, Peter Bisroev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Bisroev 2020-02-25 16:22:52 UTC
Created attachment 12322 [details]
Code and binaries that display this bug.

Hi Everyone,

While attempting to help bootstrap a recent version of gcc on HP-UX IA-64 platform (GCC bug 61577 comments 202, 203 and 204), we have hit a PCREL60B reloc related issue.

The following tools were used:
- HP-UX B.11.31 (IA-64)
- GCC 4.7.4
- Binutils 2.32 (but this happens with 2.34 as well)
- 92453-07 linker ld HP Itanium(R) B.12.65  IPF/IPF

In the attachment binutils-gas-PCREL60B-bug.tgz you will find all the code as well as binaries used to trigger and demonstrate this issue. The source main-brl.s was generated by taking the output of compiling main.c with -save-temps option and changing br.call to brl.call, this source is presented below:
------------------------------
        .file   "main.c"
        .pred.safe_across_calls p1-p5,p16-p63
        .section        .text,  "ax",   "progbits"
        .align 16
        .global main#
        .type   main#, @function
        .proc main#
main:
        .prologue 14, 32
        .save ar.pfs, r33
        alloc r33 = ar.pfs, 0, 4, 0, 0
        .vframe r34
        mov r34 = r12
        .save rp, r32
        mov r32 = b0
        mov r35 = r1
        .body
        ;;
        brl.call.sptk.many b0 = hello#
        mov r1 = r35
        mov r14 = r0
        ;;
        mov r8 = r14
        mov ar.pfs = r33
        mov b0 = r32
        .restore sp
        mov r12 = r34
        br.ret.sptk.many b0
        ;;
        .endp main#
        .global hello#
        .type   hello#, @function
        .ident  "GCC: (GNU) 4.7.4"
------------------------------

When assembling main-brl.s we get the expected PCREL60B relocation but at the wrong offset (0x21 instead of 0x22):
------------------------------
20:   04 00 00 00 01 00       [MLX]       nop.m 0x0
                      21: PCREL60B    hello
26:   00 00 00 00 00 00                   brl.call.sptk.many b0=20 <main+0x20>
------------------------------

If we now link this object into the final executable HP's linker clobbers the nop.m at 0x20 due to the wrong reloc offset as shown below:
------------------------------
4000930:       04 00 00 00 00 00       [MLX]       break.m 0x0
4000936:       00 30 00 00 00 00                   brl.call.sptk.many b0=4000930 <_end+0xc3ff08f0>
400093c:       08 00 00 d0 
------------------------------

This seems to be the only issue here. To verify I checked the 60 bit immediate that gets encoded and it seems to be correct. I have also manually patched main-brl.o to change the reloc offset from 0x21 to 0x22 and relinked as shown below:
------------------------------
# objdump -rd main-brl-patched.o
20:   04 00 00 00 01 00       [MLX]       nop.m 0x0
                      22: PCREL60B    hello
26:   00 00 00 00 00 00                   brl.call.sptk.many b0=20 <main+0x20>


# objdump -rd hello-brl-patched
4000930:       04 00 00 00 01 00       [MLX]       nop.m 0x0
4000936:       00 00 00 00 00 00                   brl.call.sptk.many b0=4000990 <hello>
400093c:       68 00 00 d0 
------------------------------

As can be seen above the relocation is now correct and the executable works as expected.

Please let me know if further info is required.

Thank you!
--peter
Comment 1 Nick Clifton 2020-04-08 12:57:56 UTC
(In reply to Peter Bisroev from comment #0)
Hi Peter,

  I am afraid that I am totally unfamiliar with the IA64 ABI, so if my
  questions below do not make sense, then please forgive me:

  1.  Is it the case that all instances of the PCREL60B reloc are wrong
      or is it only under certain circumstances ?  If it is under
      certain circumstances, then what are they ?

  2.  Are other, similar relocs affected ?

  3.  Does this problem only apply to the HP-UX targeted version of the
      IA64 assembler, or does it apply to other versions too ?  eg
      ia64-linux, ia64-netbsd, etc.

  4.  Is there a specification of the PCREL60B reloc available somewhere ?

Cheers
  Nick
Comment 2 John David Anglin 2020-04-14 14:52:44 UTC
For some reason my responses to Nick's questions bounced.

Documentation on ia64 relocs is here:
https://refspecs.linuxfoundation.org/elf/IA64-SysV-psABI.pdf

On 2020-04-08 8:57 a.m., nickc at redhat dot com wrote:
>   I am afraid that I am totally unfamiliar with the IA64 ABI, so if my
>   questions below do not make sense, then please forgive me:
>
>   1.  Is it the case that all instances of the PCREL60B reloc are wrong
>       or is it only under certain circumstances ?  If it is under
>       certain circumstances, then what are they ?

I believe the issue applies to all instances.  The reloc just isn't tested.

>
>   2.  Are other, similar relocs affected ?

I believe the PCREL21B works properly.

>
>   3.  Does this problem only apply to the HP-UX targeted version of the
>       IA64 assembler, or does it apply to other versions too ?  eg
>       ia64-linux, ia64-netbsd, etc.

HP-UX ld handles calls to "weak" functions differently from GNU ld.  The big
difference is calls to "weak" functions don't go through the PLT.  Currently,
calls to weak functions use the PCREL21B relocation.

When gcc switched to building with g++, this resulted in the gcc build using
weak functions.  There's no garbage collection in the HP linker (i.e., linkonce
support), so some calls ended up exceeding the maximum branch distance of the
PCREL21B relocation.  This is a particular problem when stage1 is built
with -O0.  The cc1 and cc1plus binaries are huge.

Given the above, the simplest solution seemed to be to add a long call feature
using the PCREL60B relocation.  However, Peter found that the GNU tools don't
appear to handle the relocation correctly.
Comment 3 Peter Bisroev 2020-04-23 15:44:44 UTC
Hi Nick,
Hi John,

I apologize for a very delayed response gentlemen. But I completely agree with what John has said in his previous comment. Just to expand on his answer for point (3), from my current understanding, I think in general this issue applies to all IA64 based systems. However because on the other systems as John has mentioned, linkers in use most likely support garbage collection, the PCREL60B issue has not been triggered as it really seems to apply to corner cases when we have fairly large binaries to deal with.

Unfortunately due to severe lack of time right now, I did not have a chance to see if I can provide a proof of concept patch for this issue for now. However if you have one that I can test, I will be able to do so fairly quickly as I have access to IA HPUX box right now.

Thanks!
--peter
Comment 4 John Buddery 2021-05-17 10:07:06 UTC
Here's the solution I used to fix the PCREL60B offset for HP:

--- binutils-2.36/gas/config/tc-ia64.c	2021-01-09 10:47:33.000000000 +0000
+++ binutils-2.36-snake/gas/config/tc-ia64.c	2021-05-17 10:21:40.651307362 +0100
@@ -6892,7 +6892,13 @@
       for (j = 0; j < md.slot[curr].num_fixups; ++j)
 	{
 	  ifix = md.slot[curr].fixup + j;
-	  fix = fix_new_exp (frag_now, frag_now_fix () - 16 + i, 8,
+          
+          unsigned long where = frag_now_fix () - 16 + i;
+#ifdef TE_HPUX
+          if ( ifix->code == BFD_RELOC_IA64_PCREL60B ) where++;
+#endif
+                   
+	  fix = fix_new_exp (frag_now, where, 8,
 			     &ifix->expr, ifix->is_pcrel, ifix->code);
 	  fix->tc_fix_data.opnd = ifix->opnd;
 	  fix->fx_file = md.slot[curr].src_file;

I've made the change HP specific, as I'm not sure the binutils linker treats the offset the same way, and I've no way to test.

I've tested this and it works for brl instructions in all the cases I've seen (including a full bootstrap of gcc and a separate large project build).

I know this platform is obsoleted in 2.36, but as it's the only way to get a working modern gcc version, it would be really helpful if this change or something similar could be accepted.

---
John
Comment 5 dave.anglin 2021-05-17 15:05:03 UTC
Hi John,

Please send change to binutils list with install request.  CC Jim Wilson and myself.  Jim is the
expert on ia64 assembly code.

There are some GNU style issues with the change as written.  The declaration of "where"
should be at the start of the block.  There should be no space after "(" or before ")".  "where++"
should be on following line.  Check white space.

The Debian ia64 Linux port is still active, so I don't think ia64 should be obsoleted at this time.

After the binutils change is accepted, please submit gcc changes to the gcc patches list.  Please
test change on master if possible.

On 2021-05-17 6:07 a.m., jvb at cyberscience dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25599
>
> John Buddery <jvb at cyberscience dot com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jvb at cyberscience dot com
>
> --- Comment #4 from John Buddery <jvb at cyberscience dot com> ---
> Here's the solution I used to fix the PCREL60B offset for HP:
>
> --- binutils-2.36/gas/config/tc-ia64.c  2021-01-09 10:47:33.000000000 +0000
> +++ binutils-2.36-snake/gas/config/tc-ia64.c    2021-05-17 10:21:40.651307362
> +0100
> @@ -6892,7 +6892,13 @@
>        for (j = 0; j < md.slot[curr].num_fixups; ++j)
>         {
>           ifix = md.slot[curr].fixup + j;
> -         fix = fix_new_exp (frag_now, frag_now_fix () - 16 + i, 8,
> +          
> +          unsigned long where = frag_now_fix () - 16 + i;
> +#ifdef TE_HPUX
> +          if ( ifix->code == BFD_RELOC_IA64_PCREL60B ) where++;
> +#endif
> +                   
> +         fix = fix_new_exp (frag_now, where, 8,
>                              &ifix->expr, ifix->is_pcrel, ifix->code);
>           fix->tc_fix_data.opnd = ifix->opnd;
>           fix->fx_file = md.slot[curr].src_file;
>
Comment 6 Sourceware Commits 2021-05-19 15:28:19 UTC
The master branch has been updated by John David Anglin <danglin@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ee22a1a31d3432bb40b67229f75ae7c2a271e38e

commit ee22a1a31d3432bb40b67229f75ae7c2a271e38e
Author: John David Anglin <danglin@gcc.gnu.org>
Date:   Wed May 19 15:27:28 2021 +0000

    Fix offset for ia64 PCREL60B relocation on HP-UX
    
    gas/ChangeLog:
    2021-05-19  John Buddery  <jvb@cyberscience.com>
            PR 25599
            * config/tc-ia64.c (emit_one_bundle): Increment fixup offset
            by one for PCREL60B relocation on HP-UX.
Comment 7 John David Anglin 2021-05-19 15:33:34 UTC
Patch applied to master.
Comment 8 Alan Modra 2021-05-21 01:16:17 UTC
This patch caused a failure in the gas testsuite.
ia64-hpux  +FAIL: ia64 mlx reloc

Investigating how this testsuite failure should be fixed, I took a look at the ia64 abi and found that the bottom two bits of r_offset for relocations applying to instructions encode a slot number.  So the comment in the patch really ought to talk about slot numbers rather than "offset" to better explain what is going on.  Furthermore, PCREL60B applies to slot 1 and slot 2, so it seems wrong to specify slot 2 for PCREL60B.  If that really is needed the comment ought to say something about a workaround for an ia64-hpux linker bug, I think.  Note that there are a number of relocations with form instruction-immediate64.  Like the form instruction-immediate60 for PCREL60B those also apply to slot 1 and slot 2.  It looks like gas emits a slot 1 encoded offset for those.  Is that correct for ia64-hpux?

Also, if I'm correct in my analysis, what happens when/if the ia64-hpux linker is fixed?  I'm inclined to think the gas patch should not have been applied, the bugzilla subject line is wrong, and the reporter should have instead pursued an ia64-hpux linker bug fix.
Comment 9 John Buddery 2021-05-21 09:00:50 UTC
Thanks very much for the analysis - I agree, this is about slot numbers not offsets and the comment is inaccurate.

I too found the HP behaviour odd, when considering the instruction as using slots 1 and 2. Possibly the reasoning is that the actual instruction really occupies slot 2, with slot 1 containing 39 bits of the immediate. So, HP could argue that the instruction slot is 2, even though the relocation starts at slot 1.

In any case, the HP tool chain generates the relocation that way. For this reason, HP would effectively be unable to make a linker change without breaking the existing library codebase. It's also vanishingly unlikely they will make further changes as the OS is end of life in 2025.

The binutils linker does not support HP, otherwise of course the preferred solution would be to use that

So, the only possibility to get a working gas is compatibility with the HP linker, and I would ask you to please consider keeping this change - with an improved comment to say this is a workaround for the HP linker using slot 2 as the target.

As far as I can tell, it is only the immediate-60 relocation that has problems on HP. I have not seen issues with any of the immediate-64 relocations, at least not in any of the instructions gcc generates.
Comment 10 dave.anglin 2021-05-21 15:34:00 UTC
I  think the test is wrong.  The brl instruction is comprised of two instruction slots (L+X).  For brl, the
imm39 field and a 2-bit Ignored field occupy the L instruction slot.  The actual branch instruction is in the
second (X) slot.  The PCREL relocation needs to be computed relative to this slot, so it makes sense
that the relocation would be specified relative to this slot.

The psABI says "P" is the address of the bundle containing the instruction.  However, the brl instruction
was introduced in itanium2 and it uses two instruction slots.

I think the other L+X instructions (movl, break.x and nop.x) need checking with the PCREL60B and other
relocations involving instruction placement.  John, could you look at this?

This is failing test:
        .text
        .proc foo#
foo:
        .mlx
        mov r25 = r0
        brl.call.sptk.many b0 = bar#
        .endp foo#

I agree comment in the change needs an update.

Even if the HP linker is wrong, it's important that we get the brl instruction working on HPUX so that
current versions of gcc will build.
Comment 11 John Buddery 2021-05-21 16:15:30 UTC
I will test movl, break.x and nop.x. They have operands of 64, 62 and 62 bits, with very different layout to brl, so PCREL60 won't apply. One potential candidate is PCREL64 against a movl, I'll see if I can generate and test this and anything else I can find (it will be next week though).

I see your point about the test - we could be checking for PCREL relocations for an L+X instruction class, rather than PCREL60 relocations.

Although the P calculation is bundle based, the r_offset uses 2 bits for the slot, as stated above. It's the r_offset we're adjusting. The info I've been using is on pages 4-6 to 4-8 of the relocation reference you provided a link for.

Specifying a slot value for a 2 slot instruction + immediate is clearly ambiguous without further clarification. HP interprets it one way, other platforms differently. I'd say neither is really right or wrong, it's unfortunate that there are differences but we're stuck with it now.
Comment 12 dave.anglin 2021-05-21 17:18:49 UTC
On 2021-05-21 12:15 p.m., jvb at cyberscience dot com wrote:
> Specifying a slot value for a 2 slot instruction + immediate is clearly
> ambiguous without further clarification. HP interprets it one way, other
> platforms differently. I'd say neither is really right or wrong, it's
> unfortunate that there are differences but we're stuck with it now.
I agree.

On page 3:294 of the architecture, I see that the .mlx template has the X unit instruction in slot 2.
However, the IP value used with the immediate is the address of the bundle which contains the
current executing instruction (page 1:27).  So, the immediate value doesn't depend on slot.  But
I still think it makes sense to apply the relocation to the X unit instruction as it determines the X3, X4
immediate encoding.
Comment 13 dave.anglin 2021-05-21 17:36:28 UTC
On 2021-05-21 12:15 p.m., jvb at cyberscience dot com wrote:
> I will test movl, break.x and nop.x. They have operands of 64, 62 and 62 bits,
> with very different layout to brl, so PCREL60 won't apply. One potential
> candidate is PCREL64 against a movl, I'll see if I can generate and test this
> and anything else I can find (it will be next week though).
Could you test the testcase with HP assembler and compare with GAS using objdump -r?
Comment 14 Sourceware Commits 2021-05-22 07:53:44 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d71893802fe424e3123ced8c6ed158958487f716

commit d71893802fe424e3123ced8c6ed158958487f716
Author: Alan Modra <amodra@gmail.com>
Date:   Sat May 22 12:59:36 2021 +0930

    Re: Fix offset for ia64 PCREL60B relocation on HP-UX
    
            PR 25599
            * config/tc-ia64.c (emit_one_bundle): Expand comment for HP-UX
            adjustment.  Add assertion.
            * testsuite/gas/ia64/reloc-mlx.d: Pass when slot 2 specified
            for PCREL60B.
Comment 15 Alan Modra 2021-05-22 08:13:53 UTC
Thanks for delving into this, as you can see I've updated the comment and made the testcase accept either slot 1 or 2 for PCREL60B.

Interestingly, GNU ld seems to accept either slot 1 or 2 for PCREL60B.  At least, the gas/testsuite/gas/ia64/reloc-mlx testcase object with r_offset edited seems to produce the same final linked output as the original, for various --defsym bar=0x<hex_number>.
Comment 16 dave.anglin 2021-05-22 13:38:03 UTC
Thanks for adjusting the comment.

It seems the person that wrote the GNU ld code was aware of the ambiguity.  I suspect
there will be a need to implement the brl instruction in gcc for GNU Linux at some point,
so its good to know that linking with it works.