When doing some experiments to see what variables were visible at various statement lines in a program, I found that the following command was MUCH slower on aarch64 and ppc64le than on x86_64. On fc30 x86_64 the command took less than 2 minutes to run:

$ time stap -L 'process("./usr/bin/ld").statement("*@*:*")'|wc
  71548 1016514 27678760

real    1m20.291s
user    1m11.253s
sys     0m8.840s

On fc30 ppc64le, with binaries generated with the same options, it takes almost 20 minutes:

# time stap -L 'process("./usr/bin/ld").statement("*@*:*")'|wc
  77651 1170757 30577233

real    19m13.698s
user    19m9.413s
sys     0m4.103s

I did a "perf record -a" and "perf report" for a portion of the run to see where the time was being spent:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 39K of event 'cycles'
# Event count (approx.): 28867284968
#
# Children      Self  Command          Shared Object             Symbol
# ........  ........  ...............  ........................  ..................................................
#
    98.71%     0.00%  stap             stap                      [.] query_module
            |
            ---query_module
               dwarf_query::handle_query_module
               dwarf_query::query_module_dwarf
               dwflpp::iterate_over_cus<void>
               query_cu
               dwflpp::iterate_over_srcfile_lines<void>
               |
               |--96.60%--query_srcfile_line
               |          |
               |          |--67.84%--query_statement
               |          |          |
               |          |           --67.84%--dwarf_query::add_probe_point
               |          |                     |
               |          |                     |--65.52%--dwarf_query::add_probe_point
               |          |                     |          |
               |          |                     |           --63.80%--dwfl_module_getsym_info
               |          |                     |                     |
               |          |                     |                     |--11.73%--0x7fff91a029ab
               |          |                     |                     |          |
               |          |                     |                     |          |--10.02%--gelf_getsymshndx
               |          |                     |                     |          |
               |          |                     |                     |          |--0.86%--0x7fff919c9d00
               |          |                     |                     |          |
               |          |                     |                     |           --0.84%--0x7fff919c9d0c
               |          |                     |                     |
               |          |                     |                     |--11.67%--0x7fff91a02a0b
               |          |                     |                     |          |
               |          |                     |                     |          |--8.89%--gelf_getshdr
               |          |                     |                     |          |
               |          |                     |                     |          |--2.02%--0x7fff919c9ac0
               |          |                     |                     |          |
               |          |                     |                     |           --0.75%--0x7fff919c9acc
               |          |                     |                     |
               |          |                     |                     |--5.80%--0x7fff91a029ff
               |          |                     |                     |          |
               |          |                     |                     |           --5.79%--elf_getscn
               |          |                     |                     |
               |          |                     |                     |--4.33%--0x7fff91a02f4b
               |          |                     |                     |          |
               |          |                     |                     |          |--3.17%--0x7fff91a175a3
               |          |                     |                     |          |          |
               |          |                     |                     |          |          |--1.08%--0x7fff8fe93ae0
               |          |                     |                     |          |          |
               |          |                     |                     |          |           --0.76%--0x7fff8fe93ae8
               |          |                     |                     |          |
               |          |                     |                     |           --0.59%--0x7fff91a175b0
               |          |                     |                     |
               |          |                     |                     |--2.41%--0x7fff91a02f78
               |          |                     |                     |
               |          |                     |                     |--2.24%--0x7fff91a02b18
               |          |                     |                     |
               |          |                     |                     |--1.58%--0x7fff91a029d4
               |          |                     |                     |
               |          |                     |                     |--1.37%--0x7fff91a02b14
               |          |                     |                     |
               |          |                     |                     |--1.32%--0x7fff91a029b8
               |          |                     |                     |
               |          |                     |                     |--1.28%--0x7fff91a029f4
               |          |                     |                     |
               |          |                     |                     |--1.13%--0x7fff91a02f3c
               |          |                     |                     |
               |          |                     |                     |--1.01%--0x7fff91a02a80
               |          |                     |                     |
               |          |                     |                     |--0.77%--0x7fff91a02f54
               |          |                     |                     |
               |          |                     |                     |--0.76%--0x7fff91a028fc
               |          |                     |                     |
               |          |                     |                      --0.59%--0x7fff91a02f4c
               |          |                     |
               |          |                     |--1.12%--00000021.plt_call.dwfl_module_getsym_info@@ELFUTILS_0.158
               |          |                     |
               |          |                      --1.11%--uprobe_derived_probe::uprobe_derived_probe
               |          |                                dwarf_derived_probe::dwarf_derived_probe
               |          |                                |
               |          |                                 --0.90%--dwarf_derived_probe::saveargs
               |          |
               |          |--25.22%--dwflpp::die_has_pc
               |          |          |
               |          |           --24.94%--dwflpp::die_has_pc
               |          |                     |
               |          |                      --24.94%--dwarf_haspc
               |          |                                |
               |          |                                 --23.93%--dwarf_ranges
               |          |                                           |
               |          |                                           |--13.41%--dwarf_highpc
               |          |                                           |          |
               |          |                                           |          |--7.17%--dwarf_attr
               |          |                                           |          |          |
               |          |                                           |          |          |--2.90%--0x7fff919cf717
               |          |                                           |          |          |          |
               |          |                                           |          |          |          |--0.86%--0x7fff919e2248
               |          |                                           |          |          |          |
               |          |                                           |          |          |           --0.61%--0x7fff919e23a8
               |          |                                           |          |          |
               |          |                                           |          |          |--0.78%--0x7fff919cf734
               |          |                                           |          |          |
               |          |                                           |          |           --0.55%--0x7fff919cf4a4
               |          |                                           |          |
               |          |                                           |          |--3.55%--dwarf_lowpc
               |          |                                           |          |          |
               |          |                                           |          |          |--2.19%--dwarf_attr
               |          |                                           |          |          |          |
               |          |                                           |          |          |           --0.99%--0x7fff919cf717
               |          |                                           |          |          |
               |          |                                           |          |           --0.94%--dwarf_formaddr
               |          |                                           |          |
               |          |                                           |          |--0.84%--dwarf_formaddr
               |          |                                           |          |
               |          |                                           |           --0.59%--dwarf_formudata
               |          |                                           |
               |          |                                           |--3.44%--dwarf_lowpc
               |          |                                           |          |
               |          |                                           |          |--2.33%--dwarf_attr
               |          |                                           |          |          |
               |          |                                           |          |           --1.00%--0x7fff919cf717
               |          |                                           |          |
               |          |                                           |           --0.81%--dwarf_formaddr
               |          |                                           |
               |          |                                            --2.14%--dwarf_attr
               |          |                                                      |
               |          |                                                       --0.78%--0x7fff919cf717
               |          |
               |           --3.23%--dwarf_query::filtered_all
               |                     |
               |                     |--1.96%--std::vector<base_func_info, std::allocator<base_func_info> >::_M_realloc_insert<base_func_info const&>
               |                     |          |
               |                     |           --1.46%--std::vector<base_func_info, std::allocator<base_func_info> >::_M_realloc_insert<base_func_info const&>
               |                     |                     |
               |                     |                      --1.45%--__memcpy_power7
               |                     |
               |                      --1.19%--dwarf_query::filtered_all
               |                                |
               |                                 --0.97%--__memcpy_power7
               |
                --1.59%--dwflpp::collect_all_lines
                          |
                          |--1.03%--dwflpp::get_cu_lines_sorted_by_lineno
                          |          |
                          |           --0.52%--?? (inlined)
                          |
                           --0.56%--add_matching_lines_in_func (inlined)
On a fresh ppc64le Fedora 39 install, with systemtap built from a git checkout (83ea7cbc0fcfd9caf), the reproducer runs reasonably fast:

# rpm -q kernel systemtap elfutils binutils binutils-debuginfo
kernel-6.8.4-200.fc39.ppc64le
systemtap-5.1-1.fc39.ppc64le
elfutils-0.191-2.fc39.ppc64le
binutils-2.40-14.fc39.ppc64le
binutils-debuginfo-2.40-14.fc39.ppc64le
# time stap -L 'process("/usr/bin/ld").statement("*@*:*")'|wc
  11271   75465 2338858

real    0m11.103s
user    0m5.482s
sys     0m0.094s
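The three runs above produce different numbers of output lines (71548, 77651 and 11271), so their raw wall-clock times are not directly comparable. A rough normalization is to divide each "real" time by the line count reported by wc. The sketch below does that with a hypothetical helper, per_line_ms, which is not part of systemtap; it assumes each line of `stap -L` output corresponds to one probe point and that "real" time is the figure worth comparing.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemtap): milliseconds of wall-clock
# time per line of `stap -L` output. Assumes one output line per probe point.
per_line_ms() {
    # $1 = "real" time in seconds, $2 = line count (first field of `wc`)
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.3f\n", s * 1000 / n }'
}

# Numbers copied from the runs reported above.
echo "fc30 x86_64:  $(per_line_ms 80.291 71548) ms/line"    # real 1m20.291s
echo "fc30 ppc64le: $(per_line_ms 1153.698 77651) ms/line"  # real 19m13.698s
echo "fc39 ppc64le: $(per_line_ms 11.103 11271) ms/line"    # real 0m11.103s
```

With these numbers it prints 1.122, 14.857 and 0.985 ms/line, so even after normalizing, the fc30 ppc64le run costs roughly 13 times as much per line as the fc30 x86_64 run, while the Fedora 39 ppc64le figure is comparable to the fc30 x86_64 one.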