Version 5.2 of debugedit (and version 5.1, and maybe others) appears to corrupt some RISC-V object files.

What appears to be happening is that debugedit is changing the contents of mergeable DWARF string sections but not correctly adjusting all of the symbols that reference those sections. In particular it is leaving symbols that are placed beyond the end of the section. This causes issues when the sections are merged, as the linker has no way to deduce the correct new location for these symbols. This in turn causes relocations which reference these symbols to misbehave, resulting in malformed DWARF debug information.

This issue is probably the cause of the problem reported in PR 33723.

In order to reproduce the issue, you need a riscv64 machine. Speak to David Abdurachmanov <david.abdurachmanov@gmail.com> or Kashyap Chamarthy <kchamart@redhat.com> for assistance with this.

Once you have a machine you first need to build the tcl package. For Fedora I used:

  fedpkg clone tcl
  cd tcl
  fedpkg srpm
  rpm -ivh tcl*.src.rpm
  rpmbuild -bb --noclean ~/rpmbuild/SPECS/tcl.spec

I could have used "fedpkg local" I suppose, but I am more familiar with the rpmbuild method.

Next, inspect the tclStubLib.o file in the build:

  cd ~/rpmbuild/BUILD/tcl-9.0.2-build/tcl9.0.2/unix/
  readelf --wide --sections tclStubLib.o | grep debug_line_str
    [19] .debug_line_str   PROGBITS        0000000000000000 01299d 0001dd 01  MS  0   0  1

Note the size of the .debug_line_str section: 0x1dd bytes in this case. Now check the location of the .LASF1 symbol:

  readelf --wide --symbols tclStubLib.o | grep ".LASF1$"
    75: 00000000000001a3     0 NOTYPE  LOCAL  DEFAULT   19 .LASF1

Its value is 0x1a3, well inside section 19, so all is well. Finally check the relocations that reference this symbol:

  readelf --wide --relocs tclStubLib.o | grep ".LASF1 "
    000000000000001b  0000004b00000001 R_RISCV_32  00000000000001a3 .LASF1 + 0
    0000000000000022  0000004b00000001 R_RISCV_32  00000000000001a3 .LASF1 + 0

These are OK too.

Now run debugedit on the file.
(I suggest making a backup of it first, since debugedit overwrites the input file.)

  debugedit -b ~/rpmbuild/BUILD/tcl-9.0.2-build/tcl9.0.2 \
            -d /usr/src/debug/tcl-9.0.2-1.fc43.riscv64 \
            -l /tmp/find-debuginfo.1lDPj7/debugsources.2 \
            `pwd`/tclStubLib.o

(I am not sure if the parameter values are significant. They are just the ones that are used when debugedit is invoked from inside find-debuginfo whilst processing the libtclstub.a static archive.)

Now check the .debug_line_str section:

  readelf --wide --sections tclStubLib.o | grep debug_line_str
    [19] .debug_line_str   PROGBITS        0000000000000000 023715 0001a2 01  MS  0   0  1

It has changed size to 0x1a2. But the .LASF1 symbol has not moved:

  readelf --wide --symbols tclStubLib.o | grep ".LASF1$"
    75: 00000000000001a3     0 NOTYPE  LOCAL  DEFAULT   19 .LASF1

So it now references a location beyond the end of the .debug_line_str section. And the relocations that use the .LASF1 symbol:

  readelf --wide --relocs tclStubLib.o | grep ".LASF1 "
    000000000000001b  0000004b00000001 R_RISCV_32  00000000000001a3 .LASF1 - 2e
    0000000000000022  0000004b00000001 R_RISCV_32  00000000000001a3 .LASF1 - 2e

now have *negative* offsets from the symbol's location.

When the tclStubLib.o object file is merged with other files, the linker will complain about "access beyond the end of a section" and the DWARF information that relies upon those relocations will be corrupt.
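The manual check above boils down to a bounds comparison plus one addition. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted in this report. The helper names are made up for illustration and nothing here parses an ELF file. The sketch assumes a symbol that names a string must start strictly inside the section; tools may be more lenient about a symbol sitting exactly at the end.

```python
# Hypothetical helpers; all values are the ones quoted in this report.

def symbol_in_bounds(st_value: int, section_size: int) -> bool:
    """Assumption: a symbol naming a string must start inside the section."""
    return 0 <= st_value < section_size


def reloc_target(st_value: int, addend: int) -> int:
    """Offset (from the section start) that symbol + addend resolves to."""
    return st_value + addend


# Before debugedit: .debug_line_str is 0x1dd bytes, .LASF1 = 0x1a3, addend 0.
before_ok = symbol_in_bounds(0x1a3, 0x1dd)

# After debugedit: section shrank to 0x1a2, .LASF1 still 0x1a3, addend -0x2e.
after_ok = symbol_in_bounds(0x1a3, 0x1a2)
after_target = reloc_target(0x1a3, -0x2e)

print(before_ok, after_ok, hex(after_target))  # True False 0x175
```

Note that symbol plus addend (0x175) still lands inside the shrunken section. That is consistent with the addend having been adjusted while the symbol value was left stale, although it does not by itself prove it.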
Here are the instructions for getting Fedora running under emulated RISC-V hardware: https://fedoraproject.org/wiki/Architectures/RISC-V/QEMU
I can probably set up some RISC-V environment, but in theory debugedit should work cross-arch. So if you happen to have this tclStubLib.o object file available somewhere, that would be useful; then I can try debugging this quickly on my x86_64 setup.
But even without a RISC-V object I do see something that debugedit isn't expecting. On other arches, cross-.debug section reference relocations are against the section symbol, i.e. the start of the .debug section. The section symbol stays at the start of the section even if the section size changes, so it doesn't need to be adjusted, and debugedit therefore doesn't contain code to adjust symbol values. We will have to figure out why this doesn't happen on RISC-V. Why is there this .LASF1 symbol? And why do the relocations use it instead of the section symbol?
Jason uploaded the object file here: https://jmontleon.fedorapeople.org/tclStubLib.o
Created attachment 16569: riscv64 built
I almost forgot about this patch in a debugedit fork (not sure if it was posted on mailing lists, etc.) that someone shared on Matrix too: https://src.fedoraproject.org/fork/u2fsdgvkx1/rpms/debugedit/c/120cd89d15831a32a0937562c3cda9dfb94391e5?branch=rawhide

I am not sure it was for exactly this problem, but it looks like it.
Claude generated this suggestion for me before I saw the patch David shared. While it seems somewhat similar, my confidence in it being fully correct is low, given that it is AI generated and I lack familiarity with the code.
https://gist.github.com/jmontleon/f57b7a2d282d64a4479cff094a94306c

That said, I built a debugedit package with it (https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.1.fc43.riscv64.rpm), and using it I was able to successfully rebuild tcl and environment-modules:

  mock -r fedora-43-riscv64 --init
  mock -r fedora-43-riscv64 --no-clean install ./debugedit-5.2-3.rvre0.1.fc43.riscv64.rpm
  mock -r fedora-43-riscv64 --no-clean rebuild ./tcl-9.0.2-1.rvre1.fc43.src.rpm
  mock -r fedora-43-riscv64 --no-clean install /var/lib/mock/fedora-43-riscv64/result/tcl-9.0.2-1.rvre1.fc43.riscv64.rpm /var/lib/mock/fedora-43-riscv64/result/tcl-devel-9.0.2-1.rvre1.fc43.riscv64.rpm
  mock -r fedora-43-riscv64 --no-clean rebuild ./environment-modules-5.6.1-1.fc43.src.rpm

After learning of it, I took https://src.fedoraproject.org/fork/u2fsdgvkx1/rpms/debugedit/c/120cd89d15831a32a0937562c3cda9dfb94391e5?branch=rawhide instead and rebuilt again (https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.2.fc43.riscv64.rpm). It also appeared to succeed running through these steps:

  mock -r fedora-43-riscv64 --init
  mock -r fedora-43-riscv64 --no-clean install ./debugedit-5.2-3.rvre0.2.fc43.riscv64.rpm
  mock -r fedora-43-riscv64 --no-clean rebuild ./tcl-9.0.2-1.rvre1.fc43.src.rpm
  mock -r fedora-43-riscv64 --no-clean install /var/lib/mock/fedora-43-riscv64/result/tcl-9.0.2-1.rvre1.fc43.riscv64.rpm /var/lib/mock/fedora-43-riscv64/result/tcl-devel-9.0.2-1.rvre1.fc43.riscv64.rpm
  mock -r fedora-43-riscv64 --no-clean rebuild ./environment-modules-5.6.1-1.fc43.src.rpm

SRPMs are also uploaded to https://jmontleon.fedorapeople.org/
https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.1.fc43.src.rpm
https://jmontleon.fedorapeople.org/debugedit-5.2-3.rvre0.2.fc43.src.rpm

If you are able to use mock
it should do the right thing with qemu-user-static installed, without having to resort to VMs, containers, etc. It is maybe not the most fun environment to debug in, but you can install additional packages as in the examples above and get a shell in the environment with `mock -r fedora-43-riscv64 --no-clean --shell` if desired or needed. It is not super fast, but tolerable in the absence of real hardware in a small case like this.
Thanks for the binary and the patch idea of updating the symtab symbol values, after .debug_str and .debug_line_str are updated, for symbols that point into those pools.

But I have to check why that works. On every other architecture the symbol used for debug section relocations is the zero/section symbol. So they have very few local symbols, basically none for/inside the .debug_str or .debug_line_str section. On other arches we only have to update the relocation addend. Here, with the patch, we would update both.

It would be good to figure out why we are getting all these local symbols inside .debug_str and .debug_line_str. It looks very inefficient. You seem to have a symbol and a relocation for each string, which seems to explain why all relocations have an addend of zero.

So on x86_64 you would have:

Relocation section [26] '.rela.debug_line' for section [25] '.debug_line' at offset 0x96f18 contains 52 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000022  X86_64_32       000000000000000000     +47 .debug_line_str
  0x0000000000000026  X86_64_32       000000000000000000     +76 .debug_line_str
  0x000000000000002a  X86_64_32       000000000000000000     +82 .debug_line_str
  0x000000000000002e  X86_64_32       000000000000000000     +95 .debug_line_str

While on riscv64 you have:

Relocation section [17] '.rela.debug_line' for section [16] '.debug_line' at offset 0x31d48 contains 375 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000022  RISCV_32        0x0000000000000169      +0 .LASF1
  0x0000000000000026  RISCV_32        0x000000000000000b      +0 .LASF1650
  0x000000000000002a  RISCV_32        0x000000000000019d      +0 .LASF1651
  0x000000000000002e  RISCV_32        0x000000000000007d      +0 .LASF1652

Note how all relocations on x86_64 use the same "symbol" with Value zero, and the relocation addend is the actual offset into the debug string table. But on riscv each relocation has a different symbol with a value that is the offset into the debug string table, and all the Addends are zero. The result is a much larger symtab.
And debugedit does the wrong thing, since it assumes that only the relocation addend, and not the symbol value, needs adjusting.

It definitely seems to be a bug in debugedit and I'll try to fix it, maybe by adding a symtab update pass, or maybe we can do it while updating the relocations (if we adjust the symbols we shouldn't also update the addends).

But it would also be good if someone looked into why riscv creates these relocation/symbol pairs for debug string tables. It seems it would be more efficient to do as other arches do and just have one symbol at the start of the section against which you relocate.
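To make the difference between the two relocation styles concrete, here is a small Python model. It is only a sketch of the idea discussed above, not debugedit's code: the `Sym`/`Rela` classes, `rewrite_refs`, and the x86_64 offset map are invented for illustration. The riscv numbers (0x1a3 moving to 0x175) are derived from the readelf output earlier in this report. The sketch moves each per-string symbol once and then puts whatever is left into the addend, so the symbol and the addend are never both shifted by the same amount.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str
    value: int                # st_value: offset inside the string section
    is_section: bool = False  # True for the section symbol (value stays 0)


@dataclass
class Rela:
    sym: Sym
    addend: int


def rewrite_refs(relas, offset_map):
    """Repoint relocations after strings moved (old offset -> new offset).

    Assumes every symbol value and every symbol+addend target is the start
    of a string listed in offset_map. Symbols with no relocation are not
    touched here; a real fix would have to walk the whole symtab.
    """
    # Pass 1: compute the new targets while symbol values are still old.
    targets = [offset_map[r.sym.value + r.addend] for r in relas]
    # Pass 2: move each per-string symbol exactly once.
    moved = set()
    for r in relas:
        if not r.sym.is_section and id(r.sym) not in moved:
            moved.add(id(r.sym))
            r.sym.value = offset_map[r.sym.value]
    # Pass 3: the addend covers whatever the symbol value does not.
    for r, target in zip(relas, targets):
        r.addend = target - r.sym.value


# x86_64 style: one section symbol, the offset lives in the addend.
section = Sym(".debug_line_str", 0, is_section=True)
x86 = [Rela(section, 47), Rela(section, 76)]
rewrite_refs(x86, {47: 30, 76: 59})  # made-up new offsets

# riscv style: one symbol per string, addend zero.
lasf1 = Sym(".LASF1", 0x1a3)
riscv = [Rela(lasf1, 0), Rela(lasf1, 0)]
rewrite_refs(riscv, {0x1a3: 0x175})

print([r.addend for r in x86], hex(lasf1.value), [r.addend for r in riscv])
```

In the x86_64 case only the addends change. In the riscv case the symbol value changes and the addends stay at zero, which is the part debugedit was missing.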
Perhaps this has to do with what is discussed here? "Why is .symtab so huge on riscv?" https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036 "This is due, at least in part, to the fact that RISC-V doesn’t compute branch targets until link time, to facilitate aggressive linker relaxation. By contrast, x86 and ARMv8 assemblers compute branch targets at assembly time, so don’t need to carry around some of those symbols. Ordinarily, this only bloats the intermediate build artifacts (and static libraries), not linked executables."
(In reply to Jason Montleon from comment #9)
> Perhaps this has to do with what is discussed here?
> "Why is .symtab so huge on riscv?"
> https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036

Oh, yes. That does explain things. Thanks.
And the corresponding GCC bug has been closed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107251
So for now we are stuck with it, it seems :{

There is way too much information about riscv linker relaxation here:
https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation

I still think this doesn't make sense for cross (non-allocated) debug section references, especially if the target reference is an index into a string table section. It might make sense to somehow special-case those, because the debug string tables can be huge, so the overhead of having one extra symbol per string is also huge.

But for now it seems we have to deal with it. And it explains why this is only a bug on riscv when dealing with debuginfo in ET_REL files (objects, static archives and kernel modules).
FYI: I have posted a potential patch which would cause the RISC-V assembler to adjust relocations against local symbols in mergeable sections so that they are against the section's symbol: https://sourceware.org/pipermail/binutils/2026-January/147334.html
Proposed patch/fix: https://inbox.sourceware.org/debugedit/20260121123342.137521-1-mark@klomp.org

Currently testing: https://builder.sourceware.org/buildbot/#/changes/108020

But it doesn't have a good testcase yet.
Sharing my observations from yesterday's Matrix session.

The debugedit patch fixes the issue in local testing, i.e. the static libs processed by find-debuginfo are no longer corrupted.

Mark suggested using eu-elflint to check the .o/.a files before and after processing, which did turn out to show signs of something having gone wrong.

E.g. with an instrumented debugedit-5.2-3.rvre0.fc43.riscv64 (before the fix), extracting debuginfo from libtclstub.a:

  + /usr/bin/find-debuginfo -j4 --strict-build-id -m -i --build-id-seed 9.0.2-1.fc43 --unique-debug-suffix -9.0.2-1.fc43.riscv64 --unique-debug-src-base tcl-9.0.2-1.fc43.riscv64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 -S debugsourcefiles.list /builddir/build/BUILD/tcl-9.0.2-build/tcl9.0.2
  find-debuginfo: starting
  Extracting debug info from 3 files
  Running eu-elflint on tclStubLib.o before debugedit
  No errors
  Running eu-elflint on tclStubLib.o after debugedit
  section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
  section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds
  Running eu-elflint on tclStubCall.o before debugedit
  No errors
  Running eu-elflint on tclStubCall.o after debugedit
  section [26] '.symtab': symbol 204 (.LASF79): st_value out of bounds
  section [26] '.symtab': symbol 207 (.LASF82): st_value out of bounds
  Running eu-elflint on tclStubLibTbl.o before debugedit
  No errors
  Running eu-elflint on tclStubLibTbl.o after debugedit
  section [20] '.symtab': symbol 1687 (.LASF1646): st_value out of bounds
  section [20] '.symtab': symbol 1693 (.LASF1652): st_value out of bounds
  Running eu-elflint on tclTomMathStubLib.o before debugedit
  No errors
  Running eu-elflint on tclTomMathStubLib.o after debugedit
  section [23] '.symtab': symbol 1817 (.LASF1733): st_value out of bounds
  section [23] '.symtab': symbol 1820 (.LASF1736): st_value out of bounds
  section [23] '.symtab': symbol 1826 (.LASF1742): st_value out of bounds
  Running eu-elflint on tclOOStubLib.o before debugedit
  No errors
  Running eu-elflint on tclOOStubLib.o after debugedit
  section [23] '.symtab': symbol 1906 (.LASF1820): st_value out of bounds
  section [23] '.symtab': symbol 1909 (.LASF1823): st_value out of bounds
  section [23] '.symtab': symbol 1913 (.LASF1827): st_value out of bounds
  DWARF-compressing 2 files
  sepdebugcrcfix: Updated 2 CRC32s, 0 CRC32s did match.
  Creating .debug symlinks for symlinks to ELF files
  Copying sources found by 'debugedit -l' to /usr/src/debug/tcl-9.0.2-1.fc43.riscv64
  cpio: unix/.dtrace-temp.43961fde.c: Cannot stat: No such file or directory
  find-debuginfo: done

Running eu-elflint on the whole libtclstub.a after processing:

  $ eu-elflint --gnu --debug /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a
  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubLib.o):
  section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
  section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds
  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubCall.o):
  section [26] '.symtab': symbol 204 (.LASF79): st_value out of bounds
  section [26] '.symtab': symbol 207 (.LASF82): st_value out of bounds
  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclStubLibTbl.o):
  section [20] '.symtab': symbol 1687 (.LASF1646): st_value out of bounds
  section [20] '.symtab': symbol 1693 (.LASF1652): st_value out of bounds
  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclTomMathStubLib.o):
  section [23] '.symtab': symbol 1817 (.LASF1733): st_value out of bounds
  section [23] '.symtab': symbol 1820 (.LASF1736): st_value out of bounds
  section [23] '.symtab': symbol 1826 (.LASF1742): st_value out of bounds
  /var/lib/mock/fedora-43-riscv64/root/builddir/build/BUILD/tcl-9.0.2-build/BUILDROOT/usr/lib64/libtclstub.a(tclOOStubLib.o):
  section [23] '.symtab': symbol 1906 (.LASF1820): st_value out of bounds
  section [23] '.symtab': symbol 1909 (.LASF1823): st_value out of bounds
  section [23] '.symtab': symbol 1913 (.LASF1827): st_value out of bounds

With the debugedit patch all these lints report 'No errors'.
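For anyone wanting to summarise logs like the one above, the "st_value out of bounds" lines are regular enough to pick apart with a regular expression. The following Python sketch is an assumption about the message layout based solely on the lines quoted in this comment; `out_of_bounds_symbols` is a made-up helper, not part of elfutils or debugedit.

```python
import re

# Pattern inferred from the eu-elflint lines quoted in this report.
LINT_RE = re.compile(
    r"section \[\s*(\d+)\] '([^']+)': symbol (\d+) \(([^)]*)\): "
    r"st_value out of bounds"
)


def out_of_bounds_symbols(lint_output: str):
    """Return (section index, section name, symbol index, symbol name) tuples."""
    return [
        (int(m[1]), m[2], int(m[3]), m[4])
        for m in LINT_RE.finditer(lint_output)
    ]


sample = """\
Running eu-elflint on tclStubLib.o after debugedit
section [25] '.symtab': symbol 1842 (.LASF1653): st_value out of bounds
section [25] '.symtab': symbol 1845 (.LASF1656): st_value out of bounds
Running eu-elflint on tclStubCall.o before debugedit
No errors
"""

for hit in out_of_bounds_symbols(sample):
    print(hit)
```

On the sample this reports the two .LASF symbols in section 25 and ignores the "No errors" line.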
Yesterday we landed debugedit-5.2-4.0.riscv64.fc43 with the RFC patchset plus a performance improvement for Rawhide (dd changes, not related to this issue).

The build: https://riscv-koji.fedoraproject.org/koji/buildinfo?buildID=60319

We have successfully rebuilt fewer than 30 packages so far, most of them affected by the tcl and sysprof static libraries.

I suggested potentially using eu-elflint as an extra post-debugedit step (or in general as the last step) in https://sourceware.org/cgit/debugedit/tree/scripts/find-debuginfo.in

This could be controlled by a flag, which we could enable in Fedora/RISCV for some time, especially during the rebuild of all affected libraries.

Something like:

  --- /root/macros.orig   2026-01-13 08:22:50.104182622 +0000
  +++ redhat/macros       2026-01-13 10:22:38.039646271 +0000
  @@ -356,7 +356,8 @@
   %_annotation_ldflags %{?_lto_cflags:%{_annotation_cflags}}
   # Use the remove-section option to force the find-debuginfo script
   # to move the annobin notes into the separate debuginfo file.
  -%_find_debuginfo_extra_opts %{?_annotated_build:--remove-section .gnu.build.attributes}
  +# Disable processing static libraries on riscv64 to avoid damaged DWARF
  +%_find_debuginfo_vendor_opts %{?_annotated_build:--remove-section .gnu.build.attributes} %[ "%{_target_cpu}" == "riscv64" ? "--elflint" : "" ]
   
   # Include frame pointer information by default, except on RHEL 10 and earlier
   # On RHEL 11, we are enabling it for now, with the possibility of revoking it

This should produce a fatal error if eu-elflint detects something wrong in any binary processed (executables, shared libraries, static libraries, or even object files; basically ELF in some form or shape).

It would be nice to have an option to provide some timing statistics from find-debuginfo. There was a dd performance issue, and we don't have numbers for how much eu-elflint would cost. That's one of the reasons I suggested enabling it on riscv64: an extra check, and a chance to collect some statistics.
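As a rough illustration of the "lint everything, fail on any error, and measure the cost" idea, here is a Python sketch. It is not how find-debuginfo (a shell script) would implement the proposed option. `lint_all` and `run_eu_elflint` are hypothetical names, the eu-elflint flags are simply the ones used earlier in this report, and the sketch assumes eu-elflint exits with a non-zero status when it reports problems.

```python
import subprocess
import time


def run_eu_elflint(path):
    """Run eu-elflint with the flags used earlier in this report.

    Assumption: a non-zero exit status means lint errors were found.
    """
    proc = subprocess.run(
        ["eu-elflint", "--gnu", "--debug", path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def lint_all(paths, run_lint=run_eu_elflint):
    """Lint every path; return ({path: output} for failures, seconds spent)."""
    failures = {}
    start = time.perf_counter()
    for path in paths:
        status, output = run_lint(path)
        if status != 0:
            failures[path] = output
    elapsed = time.perf_counter() - start
    return failures, elapsed
```

A caller would treat a non-empty `failures` dict as fatal and log `elapsed` to get the timing statistics mentioned above. The `run_lint` parameter exists only so the loop can be exercised without eu-elflint installed.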