Bug 25181 - RISC-V: Linker relaxation may fail if there are R_RISCV_ALIGN type relocations
Summary: RISC-V: Linker relaxation may fail if there are R_RISCV_ALIGN type relocations
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils (show other bugs)
Version: 2.32
: P2 normal
Target Milestone: ---
Assignee: Jim Wilson
Depends on:
Reported: 2019-11-10 03:31 UTC by Yiting Wang
Modified: 2019-11-12 23:57 UTC (History)
1 user (show)

See Also:
Last reconfirmed: 2019-11-12 00:00:00

test case (485 bytes, application/x-zip-compressed)
2019-11-10 03:37 UTC, Yiting Wang
untested patch to fix _bfd_riscv_relax_call (1.19 KB, patch)
2019-11-12 00:05 UTC, Jim Wilson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yiting Wang 2019-11-10 03:31:16 UTC
If there are two callers in two .text sections (for example sections A, B), followed by two sections with R_RISCV_ALIGN type relocations (section C, D), then followed by the called functions in sections behind (section E), there is a chance to reproduce the issue.

I draw a picture to illustrate the problem:

       high        |                                   |
        ^       -> | section E: cc, dd                 | <-
 	|       |  |___________________________________|  |
        |       |  |                                   |  |
        |       |  | section D: align2 (R_RISCV_ALIGN) |  |
        |       |  |___________________________________|  |
        |       |  |                                   |  |
        |       |  | section C: align1 (R_RISCV_ALIGN) |  |
        |       |  |___________________________________|  |
        |       |  |                                   |  |
        |       |  | section B : ff  (R_RISCV_CALL)    |--
        |       |  |___________________________________|
        |       |  |                                   |
  address low   -- | section A : _start (R_RISCV_CALL) |

Consider this situation: Section A, B, C, D, E are linked in sequence. Section C and D have R_RISCV_ALIGN type relocations. Function _start() calls function cc(), function ff() calls dd(). The size of each section is 8, 8, 64, 0xfffbc, 10 bytes before relaxation.

In the first relaxation pass (info->relax_pass ==0), the R_RISCV_CALL relocations in section A and B couldn't be relaxed to R_RISCV_JAL, because the call from _start() to cc() and the call from ff() to dd() don't fit with 21-bit offset. The offset is 0x10001c indeed. In the first relaxation pass, nothing is done.

In the third relaxation pass (info->relax_pass ==2), the R_RISCV_ALIGN relocations in section C and D could be relaxed. The sizes of section C and D were changed to 34 and 0xfffba bytes. Now, the offset of the call from _start() to cc() is 0xffffe, the offset from ff() to dd() equals to this value too.

In the second round of relaxation, in the first pass (info->relax_pass ==0), the R_RISCV_CALL relocation in section A could be relaxed to R_RISCV_JAL. After the relaxation, section A's size is reduced by 4 bytes. So, section B's base address and every symbol go down 4 bytes forward. However section C, D and E will not go down 4 bytes, because of the .balign restriction placed in section C and D.
But, when linker processes the relaxation of section B, it uses the original section B symbol addresses to calculate the offset of R_RISCV_CALL relocation (from ff() to dd()). The offset is within 1M bytes offset (0xffffe).  So, the R_RISCV_CALL relocation is relaxed to R_RISCV_JAL. This is a mistake!
When linker finally performs the relocation (in perform_relocation()), the section B's symbol has been adjusted, then the bug is exposed.

I created a simple test case to trigger this issue:

    riscv64-unknown-linux-gnu-gcc -nostdlib -o a.out a.s align_1.s align_2.s c.s
    /tmp/ccgFgKhw.o: In function `ff':
    (.text+0x0): relocation truncated to fit: R_RISCV_JAL against symbol `dd' defined in .text section in /tmp/ccR0Ha6r.o
Comment 1 Yiting Wang 2019-11-10 03:37:51 UTC
Created attachment 12070 [details]
test case
Comment 2 Yiting Wang 2019-11-10 03:40:31 UTC
I have a fix locally which changes the common ld/ldlang.c. I can post to binutils mailing list for review.

I am not sure whether there is a possibility of a similar problem occurring *between* passes instead of within a single pass. I.e. what if a call in B is within range and so gets relaxed in one pass, then in a later pass B moves down due to relaxation in a previous section - but the destination stays in the same place due to alignment? Is that possible?
Comment 3 Yiting Wang 2019-11-10 05:13:03 UTC
I've not figured out how to send a proper patch to binutils mailing list.

github is much easier. See https://github.com/riscv/riscv-binutils-gdb/pull/185
Comment 4 Jim Wilson 2019-11-12 00:03:59 UTC
The way that this should work is that if the call crosses section boundaries, then we need to use the max alignment of any section between the call instruction and the target, including the call and target sections.  Except we don't have code to compute that yet, so we just use the maximum alignment of all sections in the output.  If the call does not cross section boundaries, then it should only use the alignment of the section it is in.  This alignment value is added in to the offset to make sure that we don't relax something that might not fit after alignment is handled.  This has the downside that sometimes we refuse to relax something that might be relaxable because of a section with large alignment constraint.  But it is safe, in that we never accidentally relax something that should not be relaxed.

This is how _bfd_riscv_relax_lui and _bfd_riscv_relax_pc work.  They use max_alignment unconditionally, and set max_alignment to the section alignment if the lui and gp are in the same section and it isn't the abs section.

This is unfortunately not how _bfd_riscv_relax_call works.  It only adds in the alignment if the call and target are not in the same section.  I'd call this a bug and that it needs to work the same as the other two functions.

I attached a proposed patch that fixes _bfd_riscv_relax_call and adds your example as a testcase.
Comment 5 Jim Wilson 2019-11-12 00:05:14 UTC
Created attachment 12071 [details]
untested patch to fix _bfd_riscv_relax_call
Comment 6 Jim Wilson 2019-11-12 00:10:50 UTC
You can create a patch with git diff and then attach it to the bug report.  It needs to be mailed to the binutils list if it gets checked in, but you can always ask someone else to do that for you.

Contributing a patch to Binutils requires an FSF copyright assignment.  I don't see an assignment in your name, and it isn't clear who you work for.  Though from github I see you are in Beijing, so Alibaba and Huawei would be my first two guesses.  I see GCC assignments for both of them, but not Binutils assignments.  If you are serious about trying to contribute to GNU toolchain, then you should try to get a copyright assignment.  Meanwhile, it might be simpler if someone who already has an assignment writes a patch.

Generally, I agree with Nelson Chu's github comment that we would prefer to avoid a ldlang.c change if we can.  We might be able to get a better result by changing ldlang.c instead of the RISC-V port, but a change to ldlang.c requires a lot more testing because it affects all targets.
Comment 7 Yiting Wang 2019-11-12 10:06:30 UTC
Hi Jim,

I am very grateful to you for fixing the defect very quickly and giving me the instructions on how to contributing a patch to Binutils! My colleague has verified your patch works fine. :)

I am a WindRiver engineer, working on VxWorks. I mainly do system software and know very little toolchains. I just tried to contribute a patch. I see what you and Nelson Chu mean, we'd better not modify the common file, however, I didn't know how to fix elfnn-riscv.c. I sure am glad you resolved the defect, thank you again. :)
Comment 8 cvs-commit@gcc.gnu.org 2019-11-12 23:54:48 UTC
The master branch has been updated by Jim Wilson <wilson@sourceware.org>:


commit c6261a00c3e70dd8e508062ea43a1bcb6d547621
Author: Jim Wilson <jimw@sifive.com>
Date:   Tue Nov 12 15:50:48 2019 -0800

    RISC-V: Fix ld relax failure with calls and align directives.
    Make _bfd_riscv_relax_call handle section alignment padding same as
    the _bfd_riscv_relax_lui and _bfd_riscv_relax_pc functions already
    do.  Use the max section alignment if section boundaries are crossed,
    otherwise the alignment of the containing section.
    	PR 25181
    	* elfnn-riscv.c (_bfd_riscv_relax_call): Always add max_alignment to
    	foff.  If sym_sec->output_section and sec->output_section are the same
    	and not *ABS* then set max_alignment to that section's alignment.
    	PR 25181
    	* testsuite/ld-riscv-elf/call-relax-0.s: New file.
    	* testsuite/ld-riscv-elf/call-relax-1.s: New file.
    	* testsuite/ld-riscv-elf/call-relax-2.s: New file.
    	* testsuite/ld-riscv-elf/call-relax-3.s: New file.
    	* testsuite/ld-riscv-elf/call-relax.d: New test.
    	* testsuite/ld-riscv-elf/ld-riscv-elf.exp: Run call-relax test.
    Change-Id: Iaf65cee52345abf1955f36e8e72c4f6cc0db8d9a
Comment 9 Jim Wilson 2019-11-12 23:57:03 UTC
Fixed on mainline.