Bug 16005 - [avr] Linker crash on a particular instruction sequence with --relax
Summary: [avr] Linker crash on a particular instruction sequence with --relax
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.23
: P2 normal
Target Milestone: 2.40
Assignee: Alan Modra
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-05 12:02 UTC by Jan Waclawek
Modified: 2022-07-29 13:58 UTC (History)
4 users (show)

See Also:
Host:
Target: avr
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Waclawek 2013-10-05 12:02:26 UTC
As reported in http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=136763 , the linker for avr target crashes when trying to relax a rjmp/ret sequence.

The test case (crash.s):
main:
   nop
   rjmp main
   ret
   rjmp main 

>avr-as crash.s -o crash.o
>avr-ld --relax --debug-relax crash.o -o crash.elf

Observations:
1. this occurs on various builds ranging back 5 years, hosted both on Win and Linux, although the visible symptoms vary, on some builds ld ends silently and generates an empty output file

2. the second rjmp is not needed, a nop or any other instruction would do. In fact, the observation is, that the error is not occuring if there is no instruction after the ret

3. the target of rjmp is irrelevant, it can be any symbol

4. adding --debug-relax switch reveals more details: the error occurs as the consequence of rjmp / ret detection and subsequent ret removal

>avr-ld --relax --debug-relax crash.o -o crash.elf
found rjmp / ret sequence at address 0x2
unreachable ret instruction at address 0x4 deleted.

This locates the error into bfd/elf32-avr.c:elf32_avr_relax_delete_bytes(), being called just after the debug printout "unreachable ret instruction... deleted".

5. Replacing the first rjmp by jmp or rcall results in normal linking. This is interesting, because both the jmp/ret and rcall/ret sequences are in previous relaxation iteration replaced by rjmp/ret sequence, thus then undergo the same ret-removal procedure, which in this case is completed correctly:
main:
   nop
   rcall main
   ret
   rjmp main

>avr-as crash.s -o crash.o
>avr-ld --relax --debug-relax crash.o -o crash.elf
converted rcall/ret sequence at address 0x2 into rjmp/ret sequence. Section is .
text

found rjmp / ret sequence at address 0x2
unreachable ret instruction at address 0x4 deleted.
Relocation at address 0x6 needs to be moved.
Old section offset: 0x6, New section offset: 0x4
Checking if the relocation's addend needs corrections.
Address of anchor symbol: 0x0
Address of relocation target: 0x0
Address of relaxed insn: 0x2
Checking if the relocation's addend needs corrections.
Address of anchor symbol: 0x0
Address of relocation target: 0x0
Address of relaxed insn: 0x2

---

Stepping through elf32_avr_relax_delete_bytes() for the original input file results in crash in this line:

    memmove (contents + addr, contents + addr + count,
             (size_t) (toaddr - addr - count));

(which is conditionally executed only if there is anything to move, which btw. matches observation 2). contents was found to be NULL. As previously in this routine
 contents = elf_section_data (sec)->this_hdr.contents;
it's elf_section_data (sec) which is not properly set up.

Indeed, in the only other case when elf32_avr_relax_delete_bytes() is called (in the jmp/call shrink branch of the relaxation loop), there are several lines which might be relevant:

/* Note that we've changed the relocs, section contents,
                   etc.  */
                elf_section_data (sec)->relocs = internal_relocs;
                elf_section_data (sec)->this_hdr.contents = contents;
                symtab_hdr->contents = (unsigned char *) isymbuf;

Copying these three lines just before the other call to elf32_avr_relax_delete_bytes() and recompiling the linker, appears to have fixed the problem.

Unfortunately, I am not versed enough to judge, whether this is a proper fix to the problem.
Comment 1 Nick Clifton 2013-10-14 15:10:25 UTC
Hi Jan,

  I am sorry, I am unable to reproduce this problem.  Possibly it has been fixed by some other change.  I was using the current mainline binutils development sources so maybe that is what is different.  Please could you try using the latest sources and let me know if the problem still exists for you.

Cheers
  Nick
Comment 2 Sourceware Commits 2022-07-29 07:56:54 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b875e9c93df9f30efb34a75b9379490c03ec4d4b

commit b875e9c93df9f30efb34a75b9379490c03ec4d4b
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Jul 29 16:52:52 2022 +0930

    PR16005, avr linker crash on a particular instruction sequence with --relax
    
    It's possible for relax_delete_bytes to be called with section
    contents NULL, as demonstrated by the testcase in this PR.
    
            PR 16005
            * elf32-avr.c (elf32_avr_relax_delete_bytes): Get section contents
            if not already available.
Comment 3 Alan Modra 2022-07-29 07:57:48 UTC
Fixed for 2.40
Comment 4 Sourceware Commits 2022-07-29 13:58:28 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=10948fb9fd66c029d59c97e04556ab827076336c

commit 10948fb9fd66c029d59c97e04556ab827076336c
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Jul 29 22:35:13 2022 +0930

    Re: PR16005, avr linker crash on a particular instruction sequence with --relax
    
    The last patch wasn't so clever.  The contents in fact have already
    been read, just not cached where relax_delete_bytes expects them.
    relax_delete_bytes also modifies relocs and syms, so they should be
    cached too.
    
            PR 16005
            * elf32-avr.c (elf32_avr_relax_delete_bytes): Revert last change.
            (elf32_avr_relax_section): Cache contents, relocs and syms
            before calling relax_delete_bytes.