I am trying to use elfutils to report on a core file. Unfortunately, elfutils doesn't properly process all segments that appear in the core file. As a result it not only reports them incorrectly through the relevant APIs, but also fails to resolve symbols or unwind when addresses are located in the incorrectly missing modules. After debugging the problem I found that the culprit is this code in dwfl_segment_report_module.c:

  if (r_debug_info != NULL)
    for (const struct r_debug_info_module *module = r_debug_info->module;
         module != NULL; module = module->next)
      if (module_start <= module->l_ld && module->l_ld < module_end)
        {
          /* L_LD read from link map must be right while DYN_VADDR is unsafe.
             Therefore subtract DYN_VADDR and add L_LD to get a possibly
             corrective displacement for all addresses computed so far.  */
          GElf_Addr fixup = module->l_ld - dyn_vaddr;
          if ((fixup & (dwfl->segment_align - 1)) == 0
              && module_start + fixup <= module->l_ld
              && module->l_ld < module_end + fixup)
            {
              module_start += fixup;
              module_end += fixup;
              dyn_vaddr += fixup;
              bias += fixup;
              if (module->name[0] != '\0')
                {
                  name = basename (module->name);
                  name_is_final = true;
                }
              break;
            }
        }

  if (r_debug_info != NULL)
    {
      bool skip_this_module = false;
      for (struct r_debug_info_module *module = r_debug_info->module;
           module != NULL; module = module->next)
        if ((module_end > module->start && module_start < module->end)
            || dyn_vaddr == module->l_ld)
          {
            if (module->elf != NULL
                && invalid_elf (module->elf, module->disk_file_has_build_id,
                                build_id, build_id_len))
              {
                elf_end (module->elf);
                close (module->fd);
                module->elf = NULL;
                module->fd = -1;
              }
            if (module->elf != NULL)
              {
                /* Ignore this found module if it would conflict in address
                   space with any already existing module of DWFL.  */
                skip_this_module = true;
              }
          }
      if (skip_this_module)
        {
          free (build_id);
          return finish ();
        }
    }

This code removes all modules that "collide" with the segments being analyzed in this function.
The problem is that the value of module_end is incorrectly calculated for shared objects that are not contiguous in memory. This causes the code to believe that all modules between the **start of the first segment** and the **end of the last segment** are invalid, so their module->elf handles are closed. For instance, here is an example on Red Hat Enterprise Linux Server release 7.9:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)

# uname -a
Linux dev-10-98-113-219 3.10.0-1160.99.1.el7.x86_64 #1 SMP Thu Aug 10 10:46:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo

# cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fd:01 12760634 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 fd:01 12760634 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 fd:01 12760634 /usr/bin/cat
0176b000-0178c000 rw-p 00000000 00:00 0 [heap]
7f192309b000-7f19295de000 r--p 00000000 fd:01 4300838 /usr/lib/locale/locale-archive
7f19295de000-7f19297a2000 r-xp 00000000 fd:01 8195 /usr/lib64/libc-2.17.so
7f19297a2000-7f19299a1000 ---p 001c4000 fd:01 8195 /usr/lib64/libc-2.17.so
7f19299a1000-7f19299a5000 r--p 001c3000 fd:01 8195 /usr/lib64/libc-2.17.so
7f19299a5000-7f19299a7000 rw-p 001c7000 fd:01 8195 /usr/lib64/libc-2.17.so
7f19299a7000-7f19299ac000 rw-p 00000000 00:00 0
7f19299ac000-7f19299ce000 r-xp 00000000 fd:01 8188 /usr/lib64/ld-2.17.so
7f1929bc2000-7f1929bc5000 rw-p 00000000 00:00 0
7f1929bcc000-7f1929bcd000 rw-p 00000000 00:00 0
7f1929bcd000-7f1929bce000 r--p 00021000 fd:01 8188 /usr/lib64/ld-2.17.so
7f1929bce000-7f1929bcf000 rw-p 00022000 fd:01 8188 /usr/lib64/ld-2.17.so
7f1929bcf000-7f1929bd0000 rw-p 00000000 00:00 0
7ffe8f89c000-7ffe8f8bd000 rw-p 00000000 00:00 0 [stack]
7ffe8f9c5000-7ffe8f9c7000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Notice that the /usr/lib64/ld-2.17.so shared object has split segments. This is just a small example; when using Python, several other shared objects are loaded in the middle:

# cat /proc/20119/maps
00400000-00401000 r--p 00000000 fd:01 33583914 /opt/bb/bin/python3.10
00401000-00402000 r-xp 00001000 fd:01 33583914 /opt/bb/bin/python3.10
00402000-00403000 r--p 00002000 fd:01 33583914 /opt/bb/bin/python3.10
00403000-00404000 r--p 00002000 fd:01 33583914 /opt/bb/bin/python3.10
00404000-00405000 rw-p 00003000 fd:01 33583914 /opt/bb/bin/python3.10
00f1f000-0104c000 rw-p 00000000 00:00 0 [heap]
7f71d8b84000-7f71d8c84000 rw-p 00000000 00:00 0
7f71d8c84000-7f71d8c90000 r-xp 00000000 fd:01 8213 /usr/lib64/libnss_files-2.17.so
7f71d8c90000-7f71d8e8f000 ---p 0000c000 fd:01 8213 /usr/lib64/libnss_files-2.17.so
7f71d8e8f000-7f71d8e90000 r--p 0000b000 fd:01 8213 /usr/lib64/libnss_files-2.17.so
7f71d8e90000-7f71d8e91000 rw-p 0000c000 fd:01 8213 /usr/lib64/libnss_files-2.17.so
7f71d8e91000-7f71d8f97000 rw-p 00000000 00:00 0
7f71d8f97000-7f71df4da000 r--p 00000000 fd:01 4300838 /usr/lib/locale/locale-archive
7f71df4da000-7f71df4dc000 r-xp 00000000 fd:01 8180 /usr/lib64/libfreebl3.so
7f71df4dc000-7f71df6db000 ---p 00002000 fd:01 8180 /usr/lib64/libfreebl3.so
7f71df6db000-7f71df6dc000 r--p 00001000 fd:01 8180 /usr/lib64/libfreebl3.so
7f71df6dc000-7f71df6dd000 rw-p 00002000 fd:01 8180 /usr/lib64/libfreebl3.so
7f71df6dd000-7f71df8a1000 r-xp 00000000 fd:01 8195 /usr/lib64/libc-2.17.so
7f71df8a1000-7f71dfaa0000 ---p 001c4000 fd:01 8195 /usr/lib64/libc-2.17.so
7f71dfaa0000-7f71dfaa4000 r--p 001c3000 fd:01 8195 /usr/lib64/libc-2.17.so
7f71dfaa4000-7f71dfaa6000 rw-p 001c7000 fd:01 8195 /usr/lib64/libc-2.17.so
7f71dfaa6000-7f71dfaab000 rw-p 00000000 00:00 0
7f71dfaab000-7f71dfac0000 r-xp 00000000 fd:01 115 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f71dfac0000-7f71dfcbf000 ---p 00015000 fd:01 115 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f71dfcbf000-7f71dfcc0000 r--p 00014000 fd:01 115 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f71dfcc0000-7f71dfcc1000 rw-p 00015000 fd:01 115 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f71dfcc1000-7f71dfdaa000 r-xp 00000000 fd:01 28512 /usr/lib64/libstdc++.so.6.0.19
7f71dfdaa000-7f71dffaa000 ---p 000e9000 fd:01 28512 /usr/lib64/libstdc++.so.6.0.19
7f71dffaa000-7f71dffb2000 r--p 000e9000 fd:01 28512 /usr/lib64/libstdc++.so.6.0.19
7f71dffb2000-7f71dffb4000 rw-p 000f1000 fd:01 28512 /usr/lib64/libstdc++.so.6.0.19
7f71dffb4000-7f71dffc9000 rw-p 00000000 00:00 0
7f71dffc9000-7f71e00ca000 r-xp 00000000 fd:01 8203 /usr/lib64/libm-2.17.so
7f71e00ca000-7f71e02c9000 ---p 00101000 fd:01 8203 /usr/lib64/libm-2.17.so
7f71e02c9000-7f71e02ca000 r--p 00100000 fd:01 8203 /usr/lib64/libm-2.17.so
7f71e02ca000-7f71e02cb000 rw-p 00101000 fd:01 8203 /usr/lib64/libm-2.17.so
7f71e02cb000-7f71e02d2000 r-xp 00000000 fd:01 8225 /usr/lib64/librt-2.17.so
7f71e02d2000-7f71e04d1000 ---p 00007000 fd:01 8225 /usr/lib64/librt-2.17.so
7f71e04d1000-7f71e04d2000 r--p 00006000 fd:01 8225 /usr/lib64/librt-2.17.so
7f71e04d2000-7f71e04d3000 rw-p 00007000 fd:01 8225 /usr/lib64/librt-2.17.so
7f71e04d3000-7f71e04d5000 r-xp 00000000 fd:01 8229 /usr/lib64/libutil-2.17.so
7f71e04d5000-7f71e06d4000 ---p 00002000 fd:01 8229 /usr/lib64/libutil-2.17.so
7f71e06d4000-7f71e06d5000 r--p 00001000 fd:01 8229 /usr/lib64/libutil-2.17.so
7f71e06d5000-7f71e06d6000 rw-p 00002000 fd:01 8229 /usr/lib64/libutil-2.17.so
7f71e06d6000-7f71e06d8000 r-xp 00000000 fd:01 8201 /usr/lib64/libdl-2.17.so
7f71e06d8000-7f71e08d8000 ---p 00002000 fd:01 8201 /usr/lib64/libdl-2.17.so
7f71e08d8000-7f71e08d9000 r--p 00002000 fd:01 8201 /usr/lib64/libdl-2.17.so
7f71e08d9000-7f71e08da000 rw-p 00003000 fd:01 8201 /usr/lib64/libdl-2.17.so
7f71e08da000-7f71e08f1000 r-xp 00000000 fd:01 8221 /usr/lib64/libpthread-2.17.so
7f71e08f1000-7f71e0af0000 ---p 00017000 fd:01 8221 /usr/lib64/libpthread-2.17.so
7f71e0af0000-7f71e0af1000 r--p 00016000 fd:01 8221 /usr/lib64/libpthread-2.17.so
7f71e0af1000-7f71e0af2000 rw-p 00017000 fd:01 8221 /usr/lib64/libpthread-2.17.so
7f71e0af2000-7f71e0af6000 rw-p 00000000 00:00 0
7f71e0af6000-7f71e0afe000 r-xp 00000000 fd:01 8199 /usr/lib64/libcrypt-2.17.so
7f71e0afe000-7f71e0cfd000 ---p 00008000 fd:01 8199 /usr/lib64/libcrypt-2.17.so
7f71e0cfd000-7f71e0cfe000 r--p 00007000 fd:01 8199 /usr/lib64/libcrypt-2.17.so
7f71e0cfe000-7f71e0cff000 rw-p 00008000 fd:01 8199 /usr/lib64/libcrypt-2.17.so
7f71e0cff000-7f71e0d2d000 rw-p 00000000 00:00 0
7f71e0d2d000-7f71e0d86000 r--p 00000000 fd:01 8783345 /opt/bb/lib64/libpython3.10.so.1.0
7f71e0d86000-7f71e0fa4000 r-xp 00059000 fd:01 8783345 /opt/bb/lib64/libpython3.10.so.1.0
7f71e0fa4000-7f71e109c000 r--p 00277000 fd:01 8783345 /opt/bb/lib64/libpython3.10.so.1.0
7f71e109c000-7f71e10a1000 r--p 0036e000 fd:01 8783345 /opt/bb/lib64/libpython3.10.so.1.0
7f71e10a1000-7f71e10d4000 rw-p 00373000 fd:01 8783345 /opt/bb/lib64/libpython3.10.so.1.0
7f71e10d4000-7f71e10da000 rw-p 00000000 00:00 0
7f71e10da000-7f71e10fc000 r-xp 00000000 fd:01 8188 /usr/lib64/ld-2.17.so
7f71e1143000-7f71e1144000 rw-p 00000000 00:00 0
7f71e1144000-7f71e1145000 r--p 00000000 fd:01 138412351 /opt/bb/lib/python3.10/lib-dynload/_opcode.cpython-310-x86_64-linux-gnu.so
7f71e1145000-7f71e1146000 r-xp 00001000 fd:01 138412351 /opt/bb/lib/python3.10/lib-dynload/_opcode.cpython-310-x86_64-linux-gnu.so
7f71e1146000-7f71e1147000 r--p 00002000 fd:01 138412351 /opt/bb/lib/python3.10/lib-dynload/_opcode.cpython-310-x86_64-linux-gnu.so
7f71e1147000-7f71e1148000 r--p 00002000 fd:01 138412351 /opt/bb/lib/python3.10/lib-dynload/_opcode.cpython-310-x86_64-linux-gnu.so
7f71e1148000-7f71e1149000 rw-p 00003000 fd:01 138412351 /opt/bb/lib/python3.10/lib-dynload/_opcode.cpython-310-x86_64-linux-gnu.so
7f71e1149000-7f71e1156000 r--p 00000000 fd:01 8574044 /opt/bb/lib64/libtinfo.so.5.9
7f71e1156000-7f71e1161000 r-xp 0000d000 fd:01 8574044 /opt/bb/lib64/libtinfo.so.5.9
7f71e1161000-7f71e116d000 r--p 00018000 fd:01 8574044 /opt/bb/lib64/libtinfo.so.5.9
7f71e116d000-7f71e1171000 r--p 00023000 fd:01 8574044 /opt/bb/lib64/libtinfo.so.5.9
7f71e1171000-7f71e1172000 rw-p 00027000 fd:01 8574044 /opt/bb/lib64/libtinfo.so.5.9
7f71e1172000-7f71e1187000 r--p 00000000 fd:01 8574314 /opt/bb/lib64/libreadline.so.6.3
7f71e1187000-7f71e11aa000 r-xp 00015000 fd:01 8574314 /opt/bb/lib64/libreadline.so.6.3
7f71e11aa000-7f71e11b3000 r--p 00038000 fd:01 8574314 /opt/bb/lib64/libreadline.so.6.3
7f71e11b3000-7f71e11b5000 r--p 00040000 fd:01 8574314 /opt/bb/lib64/libreadline.so.6.3
7f71e11b5000-7f71e11bb000 rw-p 00042000 fd:01 8574314 /opt/bb/lib64/libreadline.so.6.3
7f71e11bb000-7f71e11bd000 rw-p 00000000 00:00 0
7f71e11bd000-7f71e11c0000 r--p 00000000 fd:01 138414563 /opt/bb/lib/python3.10/lib-dynload/readline.cpython-310-x86_64-linux-gnu.so
7f71e11c0000-7f71e11c2000 r-xp 00003000 fd:01 138414563 /opt/bb/lib/python3.10/lib-dynload/readline.cpython-310-x86_64-linux-gnu.so
7f71e11c2000-7f71e11c4000 r--p 00005000 fd:01 138414563 /opt/bb/lib/python3.10/lib-dynload/readline.cpython-310-x86_64-linux-gnu.so
7f71e11c4000-7f71e11c5000 r--p 00006000 fd:01 138414563 /opt/bb/lib/python3.10/lib-dynload/readline.cpython-310-x86_64-linux-gnu.so
7f71e11c5000-7f71e11c6000 rw-p 00007000 fd:01 138414563 /opt/bb/lib/python3.10/lib-dynload/readline.cpython-310-x86_64-linux-gnu.so
7f71e11c6000-7f71e12f3000 rw-p 00000000 00:00 0
7f71e12f3000-7f71e12fa000 r--s 00000000 fd:01 8410475 /usr/lib64/gconv/gconv-modules.cache
7f71e12fa000-7f71e12fb000 rw-p 00000000 00:00 0
7f71e12fb000-7f71e12fc000 r--p 00021000 fd:01 8188 /usr/lib64/ld-2.17.so
7f71e12fc000-7f71e12fd000 rw-p 00022000 fd:01 8188 /usr/lib64/ld-2.17.so
7f71e12fd000-7f71e12fe000 rw-p 00000000 00:00 0
7fff8de2e000-7fff8de4f000 rw-p 00000000 00:00 0 [stack]
7fff8df86000-7fff8df88000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Notice that between the first ld-2.17.so mapping and its last two there are several other shared objects:

7f71e10da000-7f71e10fc000 r-xp 00000000 fd:01 8188 /usr/lib64/ld-2.17.so
...
... LOTS OF STUFF HERE
...
7f71e12fb000-7f71e12fc000 r--p 00021000 fd:01 8188 /usr/lib64/ld-2.17.so
7f71e12fc000-7f71e12fd000 rw-p 00022000 fd:01 8188 /usr/lib64/ld-2.17.so
7f71e12fd000-7f71e12fe000 rw-p 00000000 00:00 0

This causes the code quoted earlier to believe that the /usr/lib64/ld-2.17.so segments span from 7f71e10da000 to 7f71e12fd000, and to mark everything in between as invalid. I do not know whether it is libc or the kernel that causes the dynamic linker to be split like this, but **it seems to be the only thing I can find that has this behavior**. All other maps in memory are contiguous.
I believe your analysis of the code is correct, but I don't yet know what the correct fix is. We'll probably need to do multiple passes over the list to combine the segments that are part of the same ELF file. In theory we support split Dwfl_Modules for e.g. kernel modules, which are ET_REL files where the sections are loaded individually. But I am not yet sure how that would translate to general user space executables/shared libraries (which are assumed to have just one base load address). Maybe we should ignore any segment that does not have the executable flag set?
Maybe a first step to improve the situation could be to not eagerly remove "colliding" modules, and instead either warn, take a callback to act on them, or offer some way to deactivate that behavior. Currently there is nothing the user can do to work around this problem using public APIs.
Patch posted to elfutils-devel@ https://sourceware.org/pipermail/elfutils-devel/2023q4/006644.html
(In reply to Aaron Merey from comment #3)
> Patch posted to elfutils-devel@
>
> https://sourceware.org/pipermail/elfutils-devel/2023q4/006644.html

I should also clarify that this patch just prevents eager removal of clashing modules as well as module skipping when segments are non-contiguous. It does not add warnings or callbacks for handling non-contiguous modules.
A fix for this bug has been merged into elfutils upstream in commit 2f38fa57942f9. I'm going to leave this bug open for now while the patch makes its way into Fedora and RHEL. Pablo, if you get a chance, please let us know whether this fixes things on your end.
Hi, I can confirm that this indeed fixes the issue. Thanks to everyone involved. Fantastic work!
(In reply to Pablo Galindo Salgado from comment #6)
> Hi, I can confirm that this indeed fixes the issue. Thanks to everyone
> involved. Fantastic work!

Thanks Pablo, glad to hear!
Fixed by this commit:

commit 2f38fa57942f95a9ada35e6802df864747c81cce
Author: Aaron Merey <amerey@redhat.com>
Date: Tue Nov 21 08:56:44 2023 -0500

    libdwfl: Correctly handle corefile non-contiguous segments

    It is possible for segments of different shared libaries to be
    interleaved in memory such that the segments of one library are
    located in between non-contiguous segments of another library.

    For example, this can be seen with firefox on RHEL 7.9 where multiple
    shared libraries could be mapped in between ld-2.17.so segments:

      [...]
      7f0972082000-7f09720a4000 00000000 139264 /usr/lib64/ld-2.17.so
      7f09720a4000-7f09720a5000 00000000 4096 /memfd:mozilla-ipc (deleted)
      7f09720a5000-7f09720a7000 00000000 8192 /memfd:mozilla-ipc (deleted)
      7f09720a7000-7f09720a9000 00000000 8192 /memfd:mozilla-ipc (deleted)
      7f0972134000-7f0972136000 00000000 8192 /usr/lib64/firefox/libmozwayland.so
      7f0972136000-7f0972137000 00002000 4096 /usr/lib64/firefox/libmozwayland.so
      7f0972137000-7f0972138000 00003000 4096 /usr/lib64/firefox/libmozwayland.so
      7f0972138000-7f0972139000 00003000 4096 /usr/lib64/firefox/libmozwayland.so
      7f097213a000-7f0972147000 00000000 53248 /usr/lib64/firefox/libmozsqlite3.so
      7f0972147000-7f097221e000 0000d000 880640 /usr/lib64/firefox/libmozsqlite3.so
      7f097221e000-7f0972248000 000e4000 172032 /usr/lib64/firefox/libmozsqlite3.so
      7f0972248000-7f0972249000 0010e000 4096 /usr/lib64/firefox/libmozsqlite3.so
      7f0972249000-7f097224c000 0010e000 12288 /usr/lib64/firefox/libmozsqlite3.so
      7f097224c000-7f0972250000 00111000 16384 /usr/lib64/firefox/libmozsqlite3.so
      7f0972250000-7f0972253000 00000000 12288 /usr/lib64/firefox/liblgpllibs.so
      [...]
      7f09722a3000-7f09722a4000 00021000 4096 /usr/lib64/ld-2.17.so
      7f09722a4000-7f09722a5000 00022000 4096 /usr/lib64/ld-2.17.so

    dwfl_segment_report_module did not account for the possibility of
    interleaving non-contiguous segments, resulting in premature closure
    of modules as well as failing to report modules.

    Fix this by removing segment skipping in dwfl_segment_report_module.
    When dwfl_segment_report_module reported a module, it would return
    the index of the segment immediately following the end address of the
    current module. Since there's a chance that other modules might fall
    within this address range, dwfl_segment_report_module instead returns
    the index of the next segment.

    This patch also fixes premature module closure that can occur in
    dwfl_segment_report_module when interleaving non-contiguous segments
    are found. Previously modules with start and end addresses that
    overlap with the current segment would have their build-ids compared
    with the current segment's build-id. If there was a mismatch, that
    module would be closed. Avoid closing modules in this case when
    mismatching build-ids correspond to distinct modules.

    https://sourceware.org/bugzilla/show_bug.cgi?id=30975

    Signed-off-by: Aaron Merey <amerey@redhat.com>