Bug 33789 - debugedit corrupts RISC-V object files
Status: ASSIGNED
Alias: None
Product: debugedit
Classification: Unclassified
Component: debugedit
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Mark Wielaard
Reported: 2026-01-13 14:52 UTC by Nick Clifton
Modified: 2026-01-24 16:05 UTC

Attachments
riscv64 built (61.05 KB, application/octet-stream)
2026-01-13 18:34 UTC, Yanko Kaneti

Description Nick Clifton 2026-01-13 14:52:13 UTC
Version 5.2 of debugedit (and version 5.1, and possibly others) appears to corrupt some RISC-V object files.

What appears to be happening is that debugedit changes the contents of mergeable DWARF string sections but does not correctly adjust all of the symbols that reference those sections.  In particular, it leaves some symbols placed beyond the end of the section.  This causes problems when the sections are merged, as the linker has no way to deduce the correct new location for these symbols.  This in turn causes relocations which reference these symbols to misbehave, resulting in malformed DWARF debug information.

This issue is probably the cause of the problem reported in PR 33723.

In order to reproduce the issue, you need a riscv64 machine.  Speak to David Abdurachmanov <david.abdurachmanov@gmail.com> or Kashyap Chamarthy <kchamart@redhat.com> for assistance with this.

Once you have a machine you first need to build the tcl package.  For Fedora I used:

  fedpkg clone tcl
  cd tcl
  fedpkg srpm
  rpm -ivh tcl*.src.rpm
  rpmbuild -bb --noclean ~/rpmbuild/SPECS/tcl.spec

I could have used "fedpkg local" I suppose, but I am more familiar with the rpmbuild method.

Next, inspect the tclStubLib.o file in the build:

  cd ~/rpmbuild/BUILD/tcl-9.0.2-build/tcl9.0.2/unix/
  readelf --wide --sections tclStubLib.o | grep debug_line_str

   [19] .debug_line_str   PROGBITS  0000000000000000 01299d 0001dd 01  MS  0   0  1

Note the size of the .debug_line_str section: 0x1dd bytes in this case.
Now check the location of the .LASF1 symbol:

  readelf --wide --symbols tclStubLib.o  | grep ".LASF1$"

    75: 00000000000001a3     0 NOTYPE  LOCAL  DEFAULT   19 .LASF1

Its value is 0x1a3, well inside section 19, so all is well.
Finally check the relocations that reference this symbol:

  readelf --wide --relocs  tclStubLib.o  | grep ".LASF1 "

    000000000000001b  0000004b00000001 R_RISCV_32    00000000000001a3 .LASF1 + 0
    0000000000000022  0000004b00000001 R_RISCV_32    00000000000001a3 .LASF1 + 0

These are OK too.
Now run debugedit on the file.  (I suggest making a backup of it first since debugedit overwrites the input file).

  debugedit -b ~/rpmbuild/BUILD/tcl-9.0.2-build/tcl9.0.2 \
    -d /usr/src/debug/tcl-9.0.2-1.fc43.riscv64 \
    -l /tmp/find-debuginfo.1lDPj7/debugsources.2 \
    `pwd`/tclStubLib.o 

(I am not sure if the parameter values are significant.  They are just the ones that are used when debugedit is invoked from inside find-debuginfo whilst processing the libtclstub.a static archive).

Now check the .debug_line_str section:

  readelf --wide --sections tclStubLib.o | grep debug_line_str

   [19] .debug_line_str   PROGBITS   0000000000000000 023715 0001a2 01  MS  0   0  1

It has changed size to 0x1a2.
But the .LASF1 symbol has not moved:

  readelf --wide --symbols tclStubLib.o  | grep ".LASF1$"

   75: 00000000000001a3     0 NOTYPE  LOCAL  DEFAULT   19 .LASF1

So it now references a location beyond the end of the .debug_line_str section.

And the relocations that use the .LASF1 symbol:

  readelf --wide --relocs  tclStubLib.o  | grep ".LASF1 "

    000000000000001b  0000004b00000001 R_RISCV_32     00000000000001a3 .LASF1 - 2e
    0000000000000022  0000004b00000001 R_RISCV_32     00000000000001a3 .LASF1 - 2e

Now have *negative* offsets from the symbol's location.

When the tclStubLib.o object file is merged with other files the linker will complain about "access beyond the end of a section" and the DWARF information that relies upon those relocations will be corrupt.
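
The out-of-bounds condition shown above can be checked mechanically. The following is a minimal Python sketch that uses only the values from the readelf output in this report; the function name `symbol_in_section` is hypothetical and is not part of debugedit or readelf.

```python
def symbol_in_section(st_value: int, sh_size: int) -> bool:
    """Return True if a symbol's value lies inside a section of the given size.

    In a relocatable (ET_REL) file, st_value is an offset from the start of
    the section the symbol is defined in. A symbol that names a string must
    point strictly before the end of the section.
    """
    return 0 <= st_value < sh_size


LASF1_VALUE = 0x1A3  # st_value of .LASF1, unchanged by debugedit
SIZE_BEFORE = 0x1DD  # size of .debug_line_str before running debugedit
SIZE_AFTER = 0x1A2   # size of .debug_line_str after running debugedit

print(symbol_in_section(LASF1_VALUE, SIZE_BEFORE))  # True
print(symbol_in_section(LASF1_VALUE, SIZE_AFTER))   # False
```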
Comment 1 Kashyap Chamarthy 2026-01-13 15:36:36 UTC
Here are the instructions for getting Fedora running under emulated RISC-V hardware:

https://fedoraproject.org/wiki/Architectures/RISC-V/QEMU
Comment 2 Mark Wielaard 2026-01-13 17:30:13 UTC
I can probably set up a RISC-V environment, but in theory debugedit should work cross-arch. So if you happen to have this tclStubLib.o object file available somewhere, that would be useful; then I can try quickly debugging this on my x86_64 setup.
Comment 3 Mark Wielaard 2026-01-13 17:38:28 UTC
But even without a RISC-V object I do see something that debugedit isn't expecting. On other arches cross-.debug section reference relocations are against the section symbol, aka the start of the .debug section. The start of the debug section symbol wouldn't change even if the section size would change, and so doesn't need to be adjusted. So debugedit doesn't contain code to adjust symbol values. We will have to figure out why on RISC-V this doesn't happen. Why is there this .LASF1 symbol? And why do the relocations use that instead of the start of section symbol?
Comment 4 David Abdurachmanov 2026-01-13 18:33:13 UTC
Jason uploaded the object file here: https://jmontleon.fedorapeople.org/tclStubLib.o
Comment 5 Yanko Kaneti 2026-01-13 18:34:05 UTC
Created attachment 16569 [details]
riscv64 built
Comment 6 David Abdurachmanov 2026-01-13 18:35:31 UTC
I almost forgot about this patch in debugedit fork (not sure if it was posted on mailing-lists, etc) that someone shared on Matrix too: https://src.fedoraproject.org/fork/u2fsdgvkx1/rpms/debugedit/c/120cd89d15831a32a0937562c3cda9dfb94391e5?branch=rawhide

I am not sure it was for exactly this problem, but it looks like it.
Comment 7 Jason Montleon 2026-01-13 22:20:53 UTC
Claude generated this suggestion for me before I saw the patch David shared. While it seems somewhat similar, my confidence in it being fully correct is low, given that it is AI generated and given my lack of familiarity.
https://gist.github.com/jmontleon/f57b7a2d282d64a4479cff094a94306c

That said, I built a debugedit package with it (https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.1.fc43.riscv64.rpm)

And using it I was able to successfully rebuild tcl and environment-modules
mock -r fedora-43-riscv64 --init
mock -r fedora-43-riscv64 --no-clean install ./debugedit-5.2-3.rvre0.1.fc43.riscv64.rpm 
mock -r fedora-43-riscv64 --no-clean rebuild ./tcl-9.0.2-1.rvre1.fc43.src.rpm 
mock -r fedora-43-riscv64 --no-clean install /var/lib/mock/fedora-43-riscv64/result/tcl-9.0.2-1.rvre1.fc43.riscv64.rpm /var/lib/mock/fedora-43-riscv64/result/tcl-devel-9.0.2-1.rvre1.fc43.riscv64.rpm
mock -r fedora-43-riscv64 --no-clean rebuild ./environment-modules-5.6.1-1.fc43.src.rpm 

I took https://src.fedoraproject.org/fork/u2fsdgvkx1/rpms/debugedit/c/120cd89d15831a32a0937562c3cda9dfb94391e5?branch=rawhide instead after learning of it and rebuilt again (https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.2.fc43.riscv64.rpm)

It also appeared to succeed when running through these steps with it:
mock -r fedora-43-riscv64 --init
mock -r fedora-43-riscv64 --no-clean install ./debugedit-5.2-3.rvre0.2.fc43.riscv64.rpm 
mock -r fedora-43-riscv64 --no-clean rebuild ./tcl-9.0.2-1.rvre1.fc43.src.rpm 
mock -r fedora-43-riscv64 --no-clean install /var/lib/mock/fedora-43-riscv64/result/tcl-9.0.2-1.rvre1.fc43.riscv64.rpm /var/lib/mock/fedora-43-riscv64/result/tcl-devel-9.0.2-1.rvre1.fc43.riscv64.rpm
mock -r fedora-43-riscv64 --no-clean rebuild ./environment-modules-5.6.1-1.fc43.src.rpm

SRPMs are also uploaded to https://jmontleon.fedorapeople.org/
https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.1.fc43.src.rpm
https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.2.fc43.src.rpm

If you are able to use mock, it should do the right thing with qemu-user-static installed, without having to resort to VMs, containers, etc. It is maybe not the most fun environment to debug in, but you can install additional packages as in the examples above and get a shell in the environment with `mock -r fedora-43-riscv64 --no-clean --shell` if desired or needed. It is not super fast, but tolerable in the absence of real hardware in a small case like this.
Comment 8 Mark Wielaard 2026-01-14 00:54:07 UTC
Thanks for the binary and patch idea of updating the symtab symbol values after the .debug_str and .debug_line_str are updated for symbols that point into those pools.

But I have to check why that works.

On every other architecture the symbol used for debug section relocation is the zero/section symbol. So they have very few local symbols, basically none for/inside the .debug_str or .debug_line_str section.

On other arches we only have to update the relocation addend.
Here, with the patch, we would update both.

It would be good to figure out why we are getting all these local symbols inside the .debug_str and .debug_line_str. It looks very inefficient. You seem to have a symbol and a relocation for each string. Which seems to explain why all relocations have an addend of zero.

So on x86_64 you would have:

Relocation section [26] '.rela.debug_line' for section [25] '.debug_line' at offset 0x96f18 contains 52 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000022  X86_64_32       000000000000000000     +47 .debug_line_str
  0x0000000000000026  X86_64_32       000000000000000000     +76 .debug_line_str
  0x000000000000002a  X86_64_32       000000000000000000     +82 .debug_line_str
  0x000000000000002e  X86_64_32       000000000000000000     +95 .debug_line_str

Whereas on riscv64 you have:

Relocation section [17] '.rela.debug_line' for section [16] '.debug_line' at offset 0x31d48 contains 375 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000022  RISCV_32        0x0000000000000169      +0 .LASF1
  0x0000000000000026  RISCV_32        0x000000000000000b      +0 .LASF1650
  0x000000000000002a  RISCV_32        0x000000000000019d      +0 .LASF1651
  0x000000000000002e  RISCV_32        0x000000000000007d      +0 .LASF1652

Note how all relocations on x86_64 use the same "symbol" with Value zero, and the relocation addend is the actual offset into the debug string table.

But on riscv each relocation has a different symbol with a value that is the offset into the debug string table, and all the Addends are zero.

The result is a much larger symtab.

And debugedit does the wrong thing, since it assumes that only the relocation addend, and not the symbol value, needs adjusting.

It definitely seems a bug in debugedit and I'll try to fix it. Maybe by adding a symtab update pass, or maybe we can do it while updating the relocations (if we adjust the symbols we shouldn't also update the addends).

But it would also be good if someone looked into why riscv creates these relocation/symbol pairs for debug string tables. It seems it would be more efficient to do as other arches do and just have one symbol at the start of the section against which you relocate.
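
The difference between adjusting the addend and adjusting the symbol value can be illustrated with a toy model. This is a Python sketch, not debugedit's code: the `Reloc` class and both helper functions are hypothetical. It assumes that the `- 2e` addend seen in the original report is the result of an addend-only adjustment, which would put the string's new offset at 0x1a3 - 0x2e = 0x175.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reloc:
    sym_value: int  # st_value of the symbol the relocation refers to
    addend: int     # r_addend of the relocation

    def target(self) -> int:
        # Offset into the string section that the relocation resolves to.
        return self.sym_value + self.addend


def adjust_addend_only(rel: Reloc, remap: Dict[int, int]) -> Reloc:
    # Correct when the symbol is the section symbol (value 0, never moves).
    # With a per-string symbol it leaves the symbol at its stale offset.
    new_target = remap[rel.target()]
    return Reloc(rel.sym_value, new_target - rel.sym_value)


def adjust_symbol_value(rel: Reloc, remap: Dict[int, int]) -> Reloc:
    # For per-string symbols: move the symbol, leave the addend alone.
    return Reloc(remap[rel.sym_value], rel.addend)


# Assumed old -> new string offset, derived from the values in the report.
remap = {0x1A3: 0x175}
lasf1 = Reloc(sym_value=0x1A3, addend=0)

stale = adjust_addend_only(lasf1, remap)
moved = adjust_symbol_value(lasf1, remap)
print(hex(stale.sym_value), hex(stale.addend))  # 0x1a3 -0x2e
print(hex(moved.sym_value), hex(moved.addend))  # 0x175 0x0
```

Both results resolve to the same target, but only the second keeps the symbol inside a section that has shrunk to 0x1a2 bytes.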
Comment 9 Jason Montleon 2026-01-14 02:32:29 UTC
Perhaps this has to do with what is discussed here?
"Why is .symtab so huge on riscv?" https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036

"This is due, at least in part, to the fact that RISC-V doesn’t compute branch targets until link time, to facilitate aggressive linker relaxation. By contrast, x86 and ARMv8 assemblers compute branch targets at assembly time, so don’t need to carry around some of those symbols.

Ordinarily, this only bloats the intermediate build artifacts (and static libraries), not linked executables."
Comment 10 Mark Wielaard 2026-01-14 12:37:33 UTC
(In reply to Jason Montleon from comment #9)
> Perhaps this has to do with what is discussed here?
> "Why is .symtab so huge on riscv?"
> https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036

Oh, yes. That does explain things. Thanks.
And the corresponding GCC bug has been closed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107251
So for now we are stuck with it, it seems :{

There is way too much information about riscv linker relaxation here:
https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation

I still think this doesn't make sense for cross (non-allocated) debug section references. Especially if the target reference is an index in a string table section.
It might make sense to somehow special case those, because the debug string tables can be huge so the overhead of having one extra symbol per string is also huge.

But for now it seems we have to deal with it. And it explains why this is only a bug on riscv when dealing with debuginfo in ET_REL files (objects, static archives and kernel modules).
Comment 11 Nick Clifton 2026-01-15 11:19:43 UTC
FYI: I have posted a potential patch which would cause the RISC-V assembler to adjust relocations against local symbols in mergeable sections so that they are against the section's symbol:

https://sourceware.org/pipermail/binutils/2026-January/147334.html
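
The idea behind that kind of adjustment can be sketched in a few lines of Python. This is only an illustration of the transformation, not the binutils patch itself: the function name and data layout are hypothetical, the symbol values are taken from the riscv64 listing in comment 8, and it is assumed that those symbols are defined in .debug_line_str.

```python
from typing import Dict, List, Tuple

# Symbol name -> (defining section, st_value)
SymTab = Dict[str, Tuple[str, int]]


def to_section_relative(
    relocs: List[Tuple[str, int]], symtab: SymTab
) -> List[Tuple[str, int]]:
    """Rewrite (symbol, addend) pairs as (section symbol, addend) pairs.

    The symbol's offset is folded into the addend, so the relocation
    still resolves to the same place in the section.
    """
    rewritten = []
    for name, addend in relocs:
        section, value = symtab[name]
        rewritten.append((section, value + addend))
    return rewritten


symtab = {
    ".LASF1": (".debug_line_str", 0x169),
    ".LASF1650": (".debug_line_str", 0x0B),
    ".LASF1651": (".debug_line_str", 0x19D),
    ".LASF1652": (".debug_line_str", 0x7D),
}
relocs = [(".LASF1", 0), (".LASF1650", 0), (".LASF1651", 0), (".LASF1652", 0)]

for section, addend in to_section_relative(relocs, symtab):
    print(f"{section} +{addend:#x}")
```

After such a rewrite the relocations have the same shape as the x86_64 ones, and the per-string local symbols are no longer needed by them.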
Comment 12 Mark Wielaard 2026-01-21 12:35:03 UTC
Proposed patch/fix:
https://inbox.sourceware.org/debugedit/20260121123342.137521-1-mark@klomp.org

Currently testing https://builder.sourceware.org/buildbot/#/changes/108020      
But it doesn't have a good testcase yet.
Comment 13 Yanko Kaneti 2026-01-22 09:07:25 UTC
Sharing my observations from yesterday's Matrix session.

The debugedit patch fixes the issue in local testing i.e. the static libs processed by find-debuginfo are no longer corrupted.

Mark suggested using eu-elflint to check the .o/.a before and after processing, which turned out to show signs of something having gone wrong.

e.g. with an instrumented debugedit-5.2-3.rvre0.fc43.riscv64 (before the fix), extracting debuginfo from libtclstub.a:

+ /usr/bin/find-debuginfo -j4 --strict-build-id -m -i --build-id-seed 9.0.2-1.fc43 --unique-debug-suffix -9.0.2-1.fc43.riscv64 --unique-debug-src-base tcl-9.0.2-1.fc43.riscv64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 -S debugsourcefiles.list /builddir/build/BUILD/tcl-9.0.2-build/tcl9.0.2
find-debuginfo: starting
Extracting debug info from 3 files
Running eu-elflint on tclStubLib.o before debugedit
No errors
Running eu-elflint on tclStubLib.o after debugedit
section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds
Running eu-elflint on tclStubCall.o before debugedit
No errors
Running eu-elflint on tclStubCall.o after debugedit
section [26] '.symtab': symbol 204 (.LASF79): st_value out of bounds
section [26] '.symtab': symbol 207 (.LASF82): st_value out of bounds
Running eu-elflint on tclStubLibTbl.o before debugedit
No errors
Running eu-elflint on tclStubLibTbl.o after debugedit
section [20] '.symtab': symbol 1687 (.LASF1646): st_value out of bounds
section [20] '.symtab': symbol 1693 (.LASF1652): st_value out of bounds
Running eu-elflint on tclTomMathStubLib.o before debugedit
No errors
Running eu-elflint on tclTomMathStubLib.o after debugedit
section [23] '.symtab': symbol 1817 (.LASF1733): st_value out of bounds
section [23] '.symtab': symbol 1820 (.LASF1736): st_value out of bounds
section [23] '.symtab': symbol 1826 (.LASF1742): st_value out of bounds
Running eu-elflint on tclOOStubLib.o before debugedit
No errors
Running eu-elflint on tclOOStubLib.o after debugedit
section [23] '.symtab': symbol 1906 (.LASF1820): st_value out of bounds
section [23] '.symtab': symbol 1909 (.LASF1823): st_value out of bounds
section [23] '.symtab': symbol 1913 (.LASF1827): st_value out of bounds
DWARF-compressing 2 files
sepdebugcrcfix: Updated 2 CRC32s, 0 CRC32s did match.
Creating .debug symlinks for symlinks to ELF files
Copying sources found by 'debugedit -l' to /usr/src/debug/tcl-9.0.2-1.fc43.riscv64
cpio: unix/.dtrace-temp.43961fde.c: Cannot stat: No such file or directory
find-debuginfo: done


Running eu-elflint on the whole libtclstub.a after processing
$ eu-elflint --gnu --debug  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a 

/var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubLib.o):
section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds

/var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubCall.o):
section [26] '.symtab': symbol 204 (.LASF79): st_value out of bounds
section [26] '.symtab': symbol 207 (.LASF82): st_value out of bounds

/var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubLibTbl.o):
section [20] '.symtab': symbol 1687 (.LASF1646): st_value out of bounds
section [20] '.symtab': symbol 1693 (.LASF1652): st_value out of bounds

/var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclTomMathStubLib.o):
section [23] '.symtab': symbol 1817 (.LASF1733): st_value out of bounds
section [23] '.symtab': symbol 1820 (.LASF1736): st_value out of bounds
section [23] '.symtab': symbol 1826 (.LASF1742): st_value out of bounds

/var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclOOStubLib.o):
section [23] '.symtab': symbol 1906 (.LASF1820): st_value out of bounds
section [23] '.symtab': symbol 1909 (.LASF1823): st_value out of bounds
section [23] '.symtab': symbol 1913 (.LASF1827): st_value out of bounds


With the debugedit patch all these lints are 'No errors'
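
For anyone wanting to script this kind of before/after comparison, here is a minimal Python sketch that picks the affected symbols out of eu-elflint output. It assumes the message format is exactly as in the logs above; the function name `out_of_bounds_symbols` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
_OOB = re.compile(
    r"section \[\s*(\d+)\] '([^']+)': symbol (\d+) \(([^)]*)\): "
    r"st_value out of bounds"
)


def out_of_bounds_symbols(lint_output: str) -> List[Tuple[int, str]]:
    """Return (symbol index, symbol name) for each out-of-bounds report."""
    found = []
    for line in lint_output.splitlines():
        match = _OOB.search(line)
        if match:
            found.append((int(match.group(3)), match.group(4)))
    return found


sample = """\
section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds
No errors
"""
print(out_of_bounds_symbols(sample))
# [(1842, '.LASF1653'), (1845, '.LASF1656')]
```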
Comment 14 David Abdurachmanov 2026-01-22 09:55:26 UTC
We landed debugedit-5.2-4.0.riscv64.fc43 with RFC patchset + performance improvement for Rawhide (dd changes, not related to this issue) yesterday.

The build: https://riscv-koji.fedoraproject.org/koji/buildinfo?buildID=60319

We have successfully rebuilt fewer than 30 packages so far, most of them affected by tcl and sysprof static libraries.

I suggested potentially using eu-elflint as an extra post-debugedit step (or in general as the last step) in https://sourceware.org/cgit/debugedit/tree/scripts/find-debuginfo.in

This could be controlled by a flag, which we could enable in Fedora/RISCV for some time, especially during the rebuild of all affected libraries. Something like:

--- /root/macros.orig   2026-01-13 08:22:50.104182622 +0000
+++ redhat/macros       2026-01-13 10:22:38.039646271 +0000
@@ -356,7 +356,8 @@
 %_annotation_ldflags   %{?_lto_cflags:%{_annotation_cflags}}
 # Use the remove-section option to force the find-debuginfo script
 # to move the annobin notes into the separate debuginfo file.
-%_find_debuginfo_extra_opts %{?_annotated_build:--remove-section .gnu.build.attributes}
+# Disable processing static libraries on riscv64 to avoid damaged DWARF
+%_find_debuginfo_vendor_opts %{?_annotated_build:--remove-section .gnu.build.attributes} %[ "%{_target_cpu}" == "riscv64" ? "--elflint" : "" ]
 
 # Include frame pointer information by default, except on RHEL 10 and earlier
 # On RHEL 11, we are enabling it for now, with the possibility of revoking it

This should produce a fatal error if something wrong is detected by eu-elflint in any binary processed (executables, shared libraries, static libraries, or even object files; basically ELF in some form and shape).

It would be nice to have an option to provide some timing statistics from find-debuginfo. There was a dd performance issue, and we don't have numbers for how much eu-elflint would cost. That's one of the reasons I suggested enabling it on riscv64 too: an extra check, and a chance to collect some statistics.
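
Until find-debuginfo offers something like that, a rough number can be obtained from outside. The following is a minimal Python sketch that measures the wall-clock time of an arbitrary command; the helper name `time_command` is hypothetical, and the command timed here is only a placeholder.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def time_command(argv: List[str]) -> Tuple[int, float]:
    """Run a command and return (exit status, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start


# Placeholder command; substitute the real invocation to be measured.
rc, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit={rc} elapsed={seconds:.3f}s")
```

To measure the lint step itself, the argument list could be the invocation from comment 13, for example `["eu-elflint", "--gnu", "--debug", path]` with `path` pointing at the processed file.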