Bug 33723 - [RISC-V] Debug info corruption due to relocations past end of .debug_line_str
Status: NEW
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils
Version: 2.45
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2025-12-16 10:18 UTC by Kashyap Chamarthy
Modified: 2026-01-19 11:56 UTC (History)
Attachments
bcc build log (98.04 KB, text/plain)
2026-01-05 12:37 UTC, Yanko Kaneti

Description Kashyap Chamarthy 2025-12-16 10:18:27 UTC
Description
-----------
    
This was noticed during Fedora 43 rebuilds, where builds fail with:
    
    [...]
    debugedit:
    /builddir/build/BUILD/qt5-qttools-5.15.18-build/BUILDROOT/usr/bin/lupdate-qt5:
    Bad string pointer index 4006 for unit name (.debug_line_str)
    
    debugedit:
    /builddir/build/BUILD/qt5-qttools-5.15.18-build/BUILDROOT/usr/bin/qdoc:
    Bad string pointer index 4274 for unit name (.debug_line_str)
    [...]
    
    [...]
    rkward: /usr/bin/ld:
    /usr/lib64/lp64d/libc_nonshared.a(pthread_atfork.oS): access beyond
    end of merged section (341) 
    [...]
    
Details: https://forge.fedoraproject.org/riscv/planning/issues/1
Comment 1 Kashyap Chamarthy 2025-12-16 10:20:46 UTC
Root cause analysis (Nick Clifton)
----------------------------------

This is Nick's analysis; I'm posting it verbatim with minor formatting
edits:
                                                                                                               
    What appears to be happening is that some RISC-V object files 
    contain "bad" relocations that are supposed to put the correct 
    offsets from places in the .debug_info section to strings in the
    .debug_line_str section. (and possibly the .debug_str section too,                                         
    although I have not confirmed this).                                                                       
                                                                                                               
    I put "bad" in quotes because the relocations do actually work, but                                        
    only when the object file is examined in isolation.  When the object                                       
    file is linked into an executable and the contents of the                                                  
    .debug_line_str section are merged with other .debug_line_str                                              
    sections in other object files, the relocations are not updated                                            
    correctly and the result is that the executable ends up with entries                                       
    in the .debug_info section that refer to bogus offsets in the                                              
    .debug_line_str section. The reason that these relocations are "bad"                                       
    is that they reference local symbols that are placed beyond the end                                        
    of the .debug_line_str section. Such symbols are not adjusted when                                         
    the .debug_line_str section is merged, which is why things break.                                          
                                                                                                               
    To proceed we need to find out if it is the assembler that is                                              
    placing these symbols beyond the end of the .debug_line_str section,                                       
    or if it is gcc telling the assembler to place them there. For that                                        
    to happen someone needs to build a problematic object file with the                                         
    `--save-temps` option added to the `gcc` command line, and then the                                         
    assembler output from gcc needs to be examined.
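
To make the failure mode described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the `merge` function is a hypothetical, simplified stand-in for string-section merging (it is not how ld.bfd, lld or mold are implemented), and the two small string tables are made up. It shows that a relocation against a symbol at the start of a string survives merging, while a relocation against a symbol one past the end of the section with a negative addend only resolves correctly in the unmerged object.

```python
def split_strings(data: bytes):
    """Yield (offset, string) for each NUL-terminated string in a table."""
    offset = 0
    for piece in data.split(b"\0")[:-1]:
        yield offset, piece
        offset += len(piece) + 1


def merge(tables):
    """Toy string merge: deduplicate strings across input tables.

    Returns the merged table and, per input table, a map from the input
    offset of each string start to its offset in the merged table.
    """
    merged = bytearray()
    seen = {}   # string -> offset in merged table
    maps = []   # one {input offset: merged offset} dict per input table
    for data in tables:
        mapping = {}
        for off, s in split_strings(data):
            if s not in seen:
                seen[s] = len(merged)
                merged += s + b"\0"
            mapping[off] = seen[s]
        maps.append(mapping)
    return bytes(merged), maps


def read_string(table: bytes, offset: int) -> bytes:
    """Return the NUL-terminated string starting at offset."""
    return table[offset:table.index(b"\0", offset)]


# Two made-up input .debug_line_str-like tables.
obj_a = b"stdio.h\0/usr/include\0"
obj_b = b"/usr/include\0tclStubLib.c\0"
merged, maps = merge([obj_a, obj_b])

# "Good" relocation in obj_b: symbol at the start of the string, addend 0.
sym_good, addend_good = 13, 0
# "Bad" relocation in obj_b: symbol one past the end, negative addend.
sym_bad, addend_bad = len(obj_b), -13

# In the unmerged object both forms reach the same string.
in_isolation = (read_string(obj_b, sym_good + addend_good),
                read_string(obj_b, sym_bad + addend_bad))

# After merging, the good symbol has an image in the merged table...
good_after_merge = read_string(merged, maps[1][sym_good] + addend_good)
# ...but the symbol past the end of the section has none.
bad_symbol_image = maps[1].get(sym_bad)
# If the symbol value is simply left unadjusted (one possible outcome in
# this toy model), the reference lands in the middle of another string.
bad_after_merge = read_string(merged, sym_bad + addend_bad)
```

In this model `good_after_merge` is still `tclStubLib.c`, while `bad_after_merge` points into the middle of `/usr/include`. Real linkers may produce different bogus values; the point is only that the symbol value and the addend cannot be adjusted independently once the section contents have been rearranged.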
Comment 2 David Abdurachmanov 2025-12-17 11:36:41 UTC
My estimate is that ~200 packages in total are affected in Fedora 43 for riscv64, probably half of them OCaml packages.

We have noticed the following messages:
[..]
/usr/bin/ld: /lib64/lp64d/../lib64/lp64d/libiberty.a(cplus-dem.o): access beyond end of merged section (338)
/usr/bin/ld: /lib64/lp64d/../lib64/lp64d/libiberty.a(cplus-dem.o): access beyond end of merged section (347)
/usr/bin/ld: /lib64/lp64d/../lib64/lp64d/libiberty.a(d-demangle.o): access beyond end of merged section (318)
/usr/bin/ld: /lib64/lp64d/../lib64/lp64d/libiberty.a(d-demangle.o): access beyond end of merged section (331)
[..]

In most cases these are just warnings and do not cause issues with building packages, but Nick assumes they are probably related to the next problem.

Note that these messages go away once relaxation is disabled, but that does not solve the next problem AFAIK (I probably need to re-test).

We have tested ld.bfd, lld and mold in Fedora 43 and all of them produced broken DWARF5 information.

One of the smallest test cases is tcl + environment-modules. Note that tcl has "access beyond end of merged section" warnings. environment-modules has a single C file, which links to libtclstub.a. Based on the DWARF size, the object pulled in from it seems to be tclStubLib.o. So we are basically linking two object files here, and the final shared library has 2 CUs in its DWARF5 dump.

It is the DWARF5 information for tclStubLib.o that gets a wrong pointer/offset for the filename. The string itself is available in the .debug_line_str section, but the offsets in the DWARF5 info are wrong.

Note that all the "access beyond end of merged section" messages that I have seen in the logs come only from static libraries.

I have some information saved.

debugedit fails because of these offsets being wrong.

$ readelf -p .debug_line_str libtclenvmodules.so | less

String dump of section '.debug_line_str':
  [     0]  tclPlatDecls.h
  [     f]  time_t.h
  [    18]  time.h
  [    1f]  unistd.h
  [    28]  tclInt.h
  [    31]  /usr/include
  [    3e]  string.h
  [    47]  stddef.h
  [    50]  tclIntPlatDecls.h
  [    62]  tcl.h
  [    68]  pwd.h
  [    6e]  /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/generic/tclStubLib.c
  [    b4]  envmodules.c
  [    c1]  unistd-decl.h
  [    cf]  /usr/lib/gcc/riscv64-redhat-linux/15/include
  [    fc]  /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/generic
  [   135]  /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/unix
  [   16b]  string_fortified.h
  [   17e]  struct_stat.h
  [   18c]  fcntl2.h
  [   195]  types.h
  [   19d]  dirent.h
  [   1a6]  stdlib.h
  [   1af]  <built-in>
  [   1ba]  stdio2.h
  [   1c3]  /usr/include/bits
  [   1d5]  grp.h
  [   1db]  tclDecls.h
  [   1e6]  /usr/include/bits/types
  [   1fe]  tclIntDecls.h
  [   20c]  struct_timespec.h
  [   21e]  errno.h
  [   226]  /home/fedora/rpmbuild/BUILD/environment-modules-5.6.0-build/modules-5.6.0/lib
  [   274]  utime.h
  [   27c]  struct_tm.h
  [   288]  struct_FILE.h

### -fuse-ld=bfd ####

  Compilation Unit @ offset 0x94ce:
   Length:        0xb1c8 (32-bit)
   Version:       5
   Unit Type:     DW_UT_compile (1)
   Abbrev Offset: 0x57d
   Pointer Size:  8
 <0><94da>: Abbrev Number: 43 (DW_TAG_compile_unit)
    <94db>   DW_AT_producer    : (indirect string, offset: 0x67b0): GNU C23 15.2.1 20251111 (Red Hat 15.2.1-4) -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -mno-omit-leaf-frame-pointer -mno-relax -mabi=lp64d -misa-spec=20191213 -mtls-dialect=trad -march=rv64imafdc_zicsr_zifencei_zmmul_zaamo_zalrsc_zca_zcd -g -O2 -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fno-omit-frame-pointer -fno-merge-constants -fextended-identifiers -fPIC -fno-common -fno-lto -fplugin=gcc-annobin
    <94df>   DW_AT_language    : 29     (C11)
    <94e0>   Unknown AT value: 90: 3
    <94e1>   Unknown AT value: 91: 0x31647
    <94e5>   DW_AT_name        : (indirect line string, offset: 0xffffffd4): <offset is too big>
    <94e9>   DW_AT_comp_dir    : (indirect line string, offset: 0x259): /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/unix
    <94ed>   DW_AT_low_pc      : 0x124c
    <94f5>   DW_AT_high_pc     : 0x2e8
    <94fd>   DW_AT_stmt_list   : 0x1945

### -fuse-ld=lld ###
 Compilation Unit @ offset 0x94ce:
  Length:        0xb1c8 (32-bit)
  Version:       5
  Unit Type:     DW_UT_compile (1)
  Abbrev Offset: 0x57d
  Pointer Size:  8
<0><94da>: Abbrev Number: 43 (DW_TAG_compile_unit)
   <94db>   DW_AT_producer    : (indirect string, offset: 0x14de): GNU C23 15.2.1 20251111 (Red Hat 15.2.1-4) -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -mno-omit-leaf-frame-pointer -mno-relax -mabi=lp64d -misa-spec=20191213 -mtls-dialect=trad -march=rv64imafdc_zicsr_zifencei_zmmul_zaamo_zalrsc_zca_zcd -g -O2 -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fno-omit-frame-pointer -fno-merge-constants -fextended-identifiers -fPIC -fno-common -fno-lto -fplugin=gcc-annobin
   <94df>   DW_AT_language    : 29     (C11)
   <94e0>   Unknown AT value: 90: 3
   <94e1>   Unknown AT value: 91: 0x31647
   <94e5>   DW_AT_name        : (indirect line string, offset: 0xb9): g/tcl-9.0.2-1.davidlt0.fc43.riscv64/generic
   <94e9>   DW_AT_comp_dir    : (indirect line string, offset: 0x1e9): /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/unix

### -fuse-ld=mold ###
  Compilation Unit @ offset 0x94ce:
   Length:        0xb1c8 (32-bit)
   Version:       5
   Unit Type:     DW_UT_compile (1)
   Abbrev Offset: 0x57d
   Pointer Size:  8
 <0><94da>: Abbrev Number: 43 (DW_TAG_compile_unit)
    <94db>   DW_AT_producer    : (indirect string, offset: 0x888): GNU C23 15.2.1 20251111 (Red Hat 15.2.1-4) -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -mno-omit-leaf-frame-pointer -mno-relax -mabi=lp64d -misa-spec=20191213 -mtls-dialect=trad -march=rv64imafdc_zicsr_zifencei_zmmul_zaamo_zalrsc_zca_zcd -g -O2 -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fno-omit-frame-pointer -fno-merge-constants -fextended-identifiers -fPIC -fno-common -fno-lto -fplugin=gcc-annobin
    <94df>   DW_AT_language    : 29     (C11)
    <94e0>   Unknown AT value: 90: 3
    <94e1>   Unknown AT value: 91: 0x31647
    <94e5>   DW_AT_name        : (indirect line string, offset: 0xffffff6f): <offset is too big>
    <94e9>   DW_AT_comp_dir    : (indirect line string, offset: 0x135): /usr/src/debug/tcl-9.0.2-1.davidlt0.fc43.riscv64/unix
    <94ed>   DW_AT_low_pc      : 0x318c
    <94f5>   DW_AT_high_pc     : 0x2e8
    <94fd>   DW_AT_stmt_list   : 0x1945
Comment 3 Nick Clifton 2025-12-17 13:24:35 UTC
Hi David,

  OK, so in order to reproduce this problem we need to link envmodules.o with libtclstub.a and then look at the DWARF info in the resulting binary, yes ?

  If that is right, then please could you upload those two files and also detail the linker command line that you use to combine them ?

  You note that the tcl library has those "access beyond" messages when linking.  Given that, it would seem that one or more object files in that library are causing problems, and I think that it is related.  So would it be possible to capture the assembler output from gcc for one of those problematic object files and upload that as well ?

Cheers
  Nick
Comment 4 Yanko Kaneti 2025-12-19 14:53:56 UTC
Sorry if I misunderstand something...
Here is the whole buildroot of libbpf with -save-temps.
The static libbpf seems to have the same problem when used to build bcc.

http://declera.com/~yaneti/libbpf-1.6.2-save-temps-buildroot.tar.gz

the buildroot is built with:
gcc-15.2.1-4.fc43.riscv64
binutils-2.45.1-1.fc43.riscv64
Comment 5 Nick Clifton 2026-01-05 10:48:27 UTC
Hi Yanko,

  Thanks for the upload.  So which of those files trigger an "access beyond end of merged section" error message when they are linked ?

Cheers
  Nick
Comment 6 Yanko Kaneti 2026-01-05 10:54:08 UTC
I'll give Marcin's output from the downstream Fedora issue; this is from a bcc build
bcc:
/usr/bin/ld: /usr/lib64/libbpf.a(btf.o): access beyond end of merged section (603)
/usr/bin/ld: /usr/lib64/libbpf.a(btf.o): access beyond end of merged section (594)
/usr/bin/ld: /usr/lib64/libbpf.a(libbpf.o): access beyond end of merged section (750)
/usr/bin/ld: /usr/lib64/libbpf.a(ringbuf.o): access beyond end of merged section (431)
/usr/bin/ld: /usr/lib64/libbpf.a(usdt.o): access beyond end of merged section (504)
/usr/bin/ld: /usr/lib64/libbpf.a(zip.o): access beyond end of merged section (314)
/usr/bin/ld: /usr/lib64/libbpf.a(elf.o): access beyond end of merged section (329)
/usr/bin/ld: /usr/lib64/libbpf.a(btf_relocate.o): access beyond end of merged section (240)
Comment 7 Nick Clifton 2026-01-05 11:45:50 UTC
Hi Yanko,

  Thanks - one more thing please - the command line that triggers those error messages ?

Cheers
  Nick
Comment 8 Yanko Kaneti 2026-01-05 12:09:30 UTC
This is the first of many similar link commands while building bcc with the same static libbpf:

gcc -O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/bashreadline.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/trace_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/syscall_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/errno_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/map_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/uprobe_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/btf_helpers.o /builddir/build/BUILD/bcc-0.35.0-build/bcc-0.35.0/libbpf-tools/.output/compat.o /usr/lib64/libbpf.a -Wl,-z,relro -Wl,--as-needed   -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-hardened-ld-errors -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes   -lelf -lz -o bashreadline
/usr/bin/ld: /usr/lib64/libbpf.a(btf.o): access beyond end of merged section (603)
/usr/bin/ld: /usr/lib64/libbpf.a(btf.o): access beyond end of merged section (594)
/usr/bin/ld: /usr/lib64/libbpf.a(btf.o): access beyond end of merged section (582)
/usr/bin/ld: /usr/lib64/libbpf.a(libbpf.o): access beyond end of merged section (750)
/usr/bin/ld: /usr/lib64/libbpf.a(libbpf_probes.o): access beyond end of merged section (397)
/usr/bin/ld: /usr/lib64/libbpf.a(ringbuf.o): access beyond end of merged section (431)
/usr/bin/ld: /usr/lib64/libbpf.a(strset.o): access beyond end of merged section (245)
/usr/bin/ld: /usr/lib64/libbpf.a(usdt.o): access beyond end of merged section (490)
/usr/bin/ld: /usr/lib64/libbpf.a(usdt.o): access beyond end of merged section (504)
/usr/bin/ld: /usr/lib64/libbpf.a(zip.o): access beyond end of merged section (314)
/usr/bin/ld: /usr/lib64/libbpf.a(elf.o): access beyond end of merged section (329)
/usr/bin/ld: /usr/lib64/libbpf.a(btf_relocate.o): access beyond end of merged section (240)

I guess if needed I could instrument the bcc build with -save-temps.
Comment 9 Yanko Kaneti 2026-01-05 12:37:26 UTC
Created attachment 16549 [details]
bcc build log

This is the whole build log for bcc-0.35.0-3

It fails in the debug extraction phase with:
Extracting debug info from 60 files
debugedit: /builddir/build/BUILD/bcc-0.35.0-build/BUILDROOT/usr/bin/bpf-bashreadline: Bad string pointer index 4294967237 for comp_dir (.debug_line_str)
debugedit: /builddir/build/BUILD/bcc-0.35.0-build/BUILDROOT/usr/bin/bpf-bindsnoop: Bad string pointer index 4294966857 for unit name (.debug_line_str)
debugedit: /builddir/build/BUILD/bcc-0.35.0-build/BUILDROOT/usr/bin/bpf-biolatency: Bad string pointer index 4294967220 for comp_dir (.debug_line_str)
debugedit: /builddir/build/BUILD/bcc-0.35.0-build/BUILDROOT/usr/bin/bpf-biopattern: Bad string pointer index 4294967205 for comp_dir (.debug_line_str)
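
A note on reading these numbers: the indices reported above are just below 2^32, which is what small negative values look like when stored in an unsigned 32-bit field. The Python sketch below reinterprets them as signed values. The helper name `as_signed32` is made up for illustration and is not part of debugedit.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Indices copied from the debugedit messages in this comment.
reported = [4294967237, 4294966857, 4294967220, 4294967205]
signed = [as_signed32(v) for v in reported]
```

Here `signed` comes out as `[-59, -439, -76, -91]`, i.e. each bad index is a small negative offset rather than a random large one. The same reading applies to the 0xffffffd4 and 0xffffff6f offsets in the readelf output from comment 2 (-44 and -145 respectively).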
Comment 10 Nick Clifton 2026-01-05 16:27:22 UTC
Hi Yanko,

  Thanks again.  Unfortunately there are too many missing files and libraries for me to be able to reproduce this problem locally.  (And I do not have access to a RISC-V build machine, so I cannot reproduce it remotely).  What I really need is a small, self-contained test case that does not depend upon any libraries or system object files, but which just builds and either contains bad DWARF debug information or else triggers those warning messages from the linker about accesses beyond the end of a section.

Cheers
  Nick
Comment 11 Kashyap Chamarthy 2026-01-09 12:35:21 UTC
I was able to reproduce it in a container.

Reproducer: Pull the OCI tarball from Koji, load it, enable the Koji
repo, update the container and build 'tcl' and 'environment-modules'.

Container setup
---------------

Fetch this container image:

    https://riscv-koji.fedoraproject.org/koji/buildinfo?buildID=46488

Load the image: 

    $ podman load -i Fedora-Container-Base-Generic-43-20251027.0.riscv64.oci.tar.xz
    Getting image source signatures
    Copying blob 786d9c4e829a done   | 
    Copying config 810a05d71f done   | 
    Writing manifest to image destination
    Loaded image: localhost/fedora:43

Start the container in the background with "sleep infinity":

    $ podman run -d --name f43-riscv --platform linux/riscv64 localhost/fedora:43 sleep infinity
    e11657381444853fd91181310eb106ee1a79ff67308e7b211e89176d2656d02d

Enter the container:

    $ podman exec -it f43-riscv /bin/bash

Inside the container, enable the Koji repo and update:

    bash-5.3# cat /etc/yum.repos.d/fedora-riscv-koji.repo 
    [fedora-riscv-koji]
    name=Fedora RISC-V $releasever - Koji
    baseurl=https://riscv-koji.fedoraproject.org/repos/f$releasever-build/latest/$basearch/
    enabled=1
    gpgcheck=0
    priority=97

    bash-5.3# dnf update -y

Exit the container and take a snapshot:

    [fedora@fedora ~]$ podman commit f43-riscv snap-with-upgrades-v1
    Getting image source signatures
    Copying blob 0a1a836ec8fb skipped: already exists  
    Copying blob 37e6e2ec845d done   | 
    Copying config 2685571401 done   | 
    Writing manifest to image destination
    26855714013143e5734133186866ea96727d81ab6a20820d5f5c4aa0a53d6a30

Exec into the container again and set up the build env:

    $ podman exec -it f43-riscv /bin/bash

    $ dnf install -y @development-tools fedora-packager \
        fedora-review binutils


Build 'tcl'
-----------

    $ fedpkg clone -a tcl && cd tcl
    $ sudo dnf builddep -y tcl.spec
    $ fedpkg local

[This builds fine]

Build 'environment-modules'
---------------------------

    $ fedpkg clone -a environment-modules && cd environment-modules

    $ dnf builddep -y environment-modules.spec

    $ fedpkg local |& tee env-modules-build.txt
    [...]
    Extracting debug info from 1 files
    debugedit:
    /home/builder/src/environment-modules/environment-modules-5.6.1-build/BUILDROOT/usr/lib64/environment-modules/libtclenvmodules.so:
    Bad string pointer index 4294967065 for unit name (.debug_line_str)
    error: Bad exit status from /var/tmp/rpm-tmp.malnmQ (%install)

This is the failure.
Comment 12 Kashyap Chamarthy 2026-01-09 12:39:16 UTC
(In reply to Kashyap Chamarthy from comment #11)
> I was able to reproduce it in a container.

This container was running on a RISC-V machine (UR-DP1000)

$ head -10 /proc/cpuinfo 
processor       : 0
hart            : 3
isa             : rv64imafdch_zicntr_zicsr_zifencei_zihpm_zaamo_zalrsc_zca_zcd
mmu             : sv48
mvendorid       : 0x70a
marchid         : 0x8000000000413031
mimpid          : 0x5510000
hart isa        : rv64imafdch_zicntr_zicsr_zifencei_zihpm_zaamo_zalrsc_zca_zcd

[...]
Comment 13 Nick Clifton 2026-01-12 13:11:11 UTC
This might not actually be a binutils problem.  Maybe.

Whilst investigating the issue I came across the fact that the Fedora RISC-V rpm build system is post-processing static archives by running them through a tool called add-determinism.

This tool appears to be corrupting the symbols and relocations in the object files inside the libraries.  Where in this case "corrupting" means that it is changing the relocations for entries into mergeable string sections so that they reference symbols placed beyond the end of the section and use negative offsets to reach the desired place inside the section.  This sort of thing might work for normal sections, but it breaks when mergeable sections are combined together.

I am unfamiliar with the add-determinism tool itself, so I cannot be sure that this analysis is correct, but it is certainly worth a deeper investigation.
Comment 14 Yanko Kaneti 2026-01-12 13:30:47 UTC
FWIW testing locally while skipping the add-determinism step did NOT resolve the libbpf->bcc issue.
Comment 15 Nick Clifton 2026-01-12 15:58:40 UTC
(In reply to Yanko Kaneti from comment #14)
> FWIW testing locally while skipping the add-determinism step did NOT resolve
> the libbpf->bcc issue.

Yes, I now believe that I have wrongly accused the add-determinism program.  Sorry about that.

I have however also been able to show that something is corrupting risc-v static archives during the rpm creation process.  In particular we found that the environment-modules package fails to build with the corrupted debug info problem.  But if /usr/lib64/libtclstub.a is replaced with a pristine copy from inside the build tree of the tcl package, the build of environment-modules then works.  

So something is corrupting libtclstub.a as it is copied from the build tree into the install location.  The problem is that I have not been able to identify exactly when the corruption occurs.
Comment 16 Yanko Kaneti 2026-01-13 01:30:59 UTC
Did some more digging and I think the static libs, both in the tcl and libbpf case are broken by the recent addition of static debuginfo extraction in debugedit.

%define _find_debuginfo_opts --no-ar-files

is enough to fix libbpf-static and tcl-devel so that they are used without problems by the bcc and environment-modules builds.
Comment 17 Nick Clifton 2026-01-13 10:28:39 UTC
(In reply to Yanko Kaneti from comment #16)
> Did some more digging and I think the static libs, both in the tcl and
> libbpf case are broken by the recent addition of static debuginfo extraction
> in debugedit.

Interesting.  

I have also tracked the problem a little further.  The corruption happens during the execution of the __brp_strip_lto macro found in the /usr/lib/rpm/redhat/macros file.  I suspect that this means that whichever strip program is being used to strip the info is the culprit, but I have yet to confirm this.
Comment 18 David Abdurachmanov 2026-01-13 10:29:27 UTC
I have done the following change:

```
--- /root/macros.orig   2026-01-13 08:22:50.104182622 +0000
+++ redhat/macros       2026-01-13 10:22:38.039646271 +0000
@@ -356,7 +356,8 @@
 %_annotation_ldflags   %{?_lto_cflags:%{_annotation_cflags}}
 # Use the remove-section option to force the find-debuginfo script
 # to move the annobin notes into the separate debuginfo file.
-%_find_debuginfo_extra_opts %{?_annotated_build:--remove-section .gnu.build.attributes}
+# Disable processing static libraries on riscv64 to avoid damaged DWARF
+%_find_debuginfo_vendor_opts %{?_annotated_build:--remove-section .gnu.build.attributes} %[ "%{_target_cpu}" == "riscv64" ? "--no-ar-files" : "" ]
 
 # Include frame pointer information by default, except on RHEL 10 and earlier
 # On RHEL 11, we are enabling it for now, with the possibility of revoking it
```

I recompiled tcl, checked execution:

```
+ /usr/bin/find-debuginfo -j4 --strict-build-id -m -i --build-id-seed 9.0.2-1.rvre1.fc43 --unique-debug-suffix -9.0.2-1.rvre1.fc43.riscv64 --unique-debug-src-base tcl-9.0.2-1.rvre1.fc43.riscv64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 --remove-section .gnu.build.attributes --no-ar-files -S debugsourcefiles.list /home/fedora/rpmbuild/BUILD/tcl-9.0.2-build/tcl9.0.2
```

Note, --remove-section .gnu.build.attributes --no-ar-files

Installed tcl, compiled environment-modules and it worked.

I also checked environment-modules build log for:

/usr/bin/ld: /usr/lib64/libtclstub.a(tclStubLib.o): access beyond end of merged section (446)
/usr/bin/ld: /usr/lib64/libtclstub.a(tclStubLib.o): access beyond end of merged section (437)

These are gone too.

This basically reverts to the F42 debugedit behavior of not processing static archive (*.a) files. It seems debugedit in the environment-modules build failed on something it had itself done earlier to the tcl static archive.
Comment 19 Yanko Kaneti 2026-01-13 10:40:16 UTC
Well, I also tested the strip hypothesis, but I didn't find it to be the problem.

Tested debugedit last mostly because I was not really expecting that extracting debuginfo would need to reconstruct the whole static lib, which is what find-debuginfo does.

Since this does not appear on other arches, perhaps the debugedit logic is sound but there is some latent riscv64-specific binutils/ar bug that was just exposed by what debugedit does with static libs.
Comment 20 David Abdurachmanov 2026-01-13 10:47:28 UTC
Jason also looked at it.

> Running find-debuginfo on libtclstub.a is breaking everything.
> swapping out the processed/unprocessed file is all it takes to break/unbreak the environment-modules build

Which step exactly breaks things is unknown, I guess.
Comment 21 David Abdurachmanov 2026-01-13 11:39:31 UTC
Finally, I modified find-debuginfo again: in the do_ar_file function I commented out:

              debugedit -b "$debug_base_name" -d "$debug_dest_name" \
		        -l "$SOURCEFILE" "$tmpdir/$member_dn$member_bn"

and replaced it with an echo command.

So basically it recreates the archive file with exactly the same object files, but does not run debugedit on them.

I did dnf distro-sync to revert the previous experimental tcl builds and verified that environment-modules fails again in the same way. I then installed the new tcl and rebuilt environment-modules. It worked.

So debugedit is doing something to the object files in the archive such that, once they are later linked into environment-modules, the result has broken DWARF that debugedit itself fails on.
Comment 22 Zbigniew Jędrzejewski-Szmek 2026-01-13 14:37:11 UTC
(I know that the latest findings suggest that the issue is elsewhere, but I'll paste the comment I started writing anyway. Maybe this will help in future debugging:

The code in add-det that handles .ar files is here:
https://github.com/keszybz/add-determinism/blob/main/src/add_det/handlers/ar.rs#L59

It reads the ar file magic and then proceeds to loop over the payload. The payload header is read and adjusted without changing size, to overwrite the fixed-size mtime, uid, gid fields. The size of the payload specified in the header is then used to copy the payload verbatim. This loops until the end of the file. This processing is done by copying stuff into a temporary file which then replaces the original input if processing is successful. The total size or payload contents do not change.

I was pretty careful to check all the magics in the per-file header, so if this goes wrong, I think it's most likely that it'd just refuse to process the file and report an error.)
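
For readers unfamiliar with the ar container format, the processing described in this comment can be sketched roughly as below. This is an illustrative Python sketch, not the actual Rust code in add-determinism. It assumes the common ar layout of 60-byte member headers (name 16, mtime 12, uid 6, gid 6, mode 8, size 10, terminator 2) with payloads padded to an even length. The function name `normalize_ar` and the choice of writing "0" into the mtime/uid/gid fields are assumptions made for the example. It also works on bytes in memory rather than via a temporary file.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60


def normalize_ar(data: bytes) -> bytes:
    """Rewrite mtime/uid/gid in each ar member header, copying payloads verbatim."""
    src = io.BytesIO(data)
    if src.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    out = bytearray(AR_MAGIC)
    while True:
        header = src.read(HEADER_LEN)
        if not header:
            break  # clean end of archive
        # Refuse to continue if the per-member terminator is wrong.
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("bad member header")
        size = int(header[48:58].decode("ascii").strip())
        # Overwrite the fixed-width fields in place: mtime (12), uid (6), gid (6).
        # The value "0" is an assumption for this sketch.
        fixed = (header[:16]
                 + b"0".ljust(12) + b"0".ljust(6) + b"0".ljust(6)
                 + header[40:])
        # Payloads are padded to an even length; copy payload and padding as-is.
        payload = src.read(size + (size & 1))
        if len(payload) < size:
            raise ValueError("truncated member payload")
        out += fixed + payload
    return bytes(out)
```

Because only fixed-width header fields are rewritten and the member payloads are copied byte for byte, the output has the same length as the input and the object files inside are unchanged, which is consistent with the point made above that this step does not touch symbols or relocations.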
Comment 23 Nick Clifton 2026-01-13 14:53:32 UTC
I am pretty sure now that the problem is with the debugedit program.  So sure in fact that I have filed a PR for the bug along with a step by step reproducer:

https://sourceware.org/bugzilla/show_bug.cgi?id=33789
Comment 24 Nick Clifton 2026-01-19 11:56:16 UTC
FYI - I have posted a potential patch that might work around the debugedit problem (and save some space in the symbol table):

https://sourceware.org/pipermail/binutils/2026-January/147424.html

I am leaving it up to the RISC-V binutils maintainers to decide if it is safe and worthwhile to apply.