Created attachment 13718
perf annotation of riscv_relax_delete_bytes function

After getting perf annotate working for RISC-V (https://www.spinics.net/lists/linux-perf-users/msg14852.html), I collected perf data while systemtap was being built on a prototype BeagleV board, to see where the build was spending its time. It spends a very large share of its time (>90%) in the binutils code that implements linker relaxation (https://www.sifive.com/blog/all-aboard-part-3-linker-relaxation-in-riscv-toolchain). The binutils in use appears to be a Fedora 33 build of stock binutils, binutils-2.35-18.fc33.riscv64 (http://fedora.riscv.rocks/koji/buildinfo?buildID=192459).

Below is the beginning of the "perf report --stdio" output:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 5M of event 'task-clock:u'
# Event count (approx.): 1330053000000
#
# Overhead  Command          Shared Object                              Symbol
# ........  ...............  .........................................  ........................................................
#
    94.46%  ld               libbfd-2.35-18.fc33.so                     [.] riscv_relax_delete_bytes
     1.27%  ld               libc-2.32.so                               [.] _wordcopy_fwd_dest_aligned
     0.81%  ld               libbfd-2.35-18.fc33.so                     [.] _bfd_riscv_relax_section.lto_priv.0
     0.18%  ld               libbfd-2.35-18.fc33.so                     [.] elf_sort_elf_symbol
     0.17%  ld               libbfd-2.35-18.fc33.so                     [.] bfd_elf_final_link
     0.16%  ld               ld.bfd                                     [.] sha1_process_block

For more detail on riscv_relax_delete_bytes, I ran the following and added the output as an attachment:

perf annotate riscv_relax_delete_bytes --stdio > ~/riscv_relax_delete_bytes.log
This is a known problem. Linker relaxation is O(m*n), where m is the number of relocations and n is the number of symbols, so it can be very slow. Some large glibc testcases take an hour to link on an Unmatched because of linker relaxation. It needs to be rewritten to handle this better.

Some other targets handle this much better. The microblaze port, for instance, collects relaxation changes into a table recording how many bytes to delete at which address. Then, when it reaches the end of a section, it performs all of the deletions and symbol value updates for that section at once. The RISC-V port, by contrast, performs the deletions and symbol value updates for each relocation as it processes it, so we move section bytes and scan the symbol table once per relocation, whereas the microblaze port does this only once per section.

Fixing the RISC-V port to work more like the microblaze port is a major project. The problem has been known for the roughly four years since I started doing RISC-V work, and perhaps even longer, but we have yet to find a volunteer to do the work.
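The batched, once-per-section approach described above might be sketched roughly as follows. This is a minimal illustration in C, not code from binutils or from the microblaze port: the names pending_delete, apply_deletions and bytes_deleted_before are hypothetical. It assumes the pending deletions for one section are sorted by offset, do not overlap, and that no symbol points into the middle of a deleted range. It only adjusts plain symbol values and the section contents; the real riscv_relax_delete_bytes also has to update symbol sizes, relocation offsets and global symbol hash entries. With a prefix-sum table over the deletions, each symbol is adjusted once via a binary search, so the per-section cost is roughly O(n log d) for n symbols and d deletions plus one pass over the section bytes, instead of one symbol-table scan and one memmove per deletion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one pending relaxation for a section.  */
typedef struct
{
  uint64_t addr;   /* offset in the original, unrelaxed section */
  uint64_t count;  /* number of bytes to delete at addr */
} pending_delete;

/* Total bytes deleted at offsets strictly below VALUE, so a symbol sitting
   exactly at a deletion address does not move for that deletion.  DELS is
   sorted by addr; PREFIX[i] is the sum of dels[0..i-1].count.  */
static uint64_t
bytes_deleted_before (const pending_delete *dels, const uint64_t *prefix,
                      size_t n, uint64_t value)
{
  /* Binary search for the first deletion with addr >= value.  */
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (dels[mid].addr < value)
        lo = mid + 1;
      else
        hi = mid;
    }
  return prefix[lo];
}

/* Apply all pending deletions for one section at the end of relaxation:
   each symbol value is adjusted once and the contents are compacted in a
   single sweep.  Returns 0 on success, -1 on allocation failure.  */
static int
apply_deletions (unsigned char *contents, uint64_t *size,
                 const pending_delete *dels, size_t n,
                 uint64_t *sym_values, size_t nsyms)
{
  /* Assumed preconditions: in bounds, sorted, non-overlapping.  */
  for (size_t i = 0; i < n; i++)
    {
      assert (dels[i].addr + dels[i].count <= *size);
      assert (i == 0 || dels[i - 1].addr + dels[i - 1].count <= dels[i].addr);
    }

  uint64_t *prefix = malloc ((n + 1) * sizeof *prefix);
  if (prefix == NULL)
    return -1;
  prefix[0] = 0;
  for (size_t i = 0; i < n; i++)
    prefix[i + 1] = prefix[i] + dels[i].count;

  /* One lookup per symbol instead of one symbol-table scan per deletion.  */
  for (size_t s = 0; s < nsyms; s++)
    sym_values[s] -= bytes_deleted_before (dels, prefix, n, sym_values[s]);

  /* Copy each kept span down once, skipping the deleted ranges.  */
  uint64_t src = 0, dst = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t keep = dels[i].addr - src;
      memmove (contents + dst, contents + src, (size_t) keep);
      dst += keep;
      src = dels[i].addr + dels[i].count;
    }
  memmove (contents + dst, contents + src, (size_t) (*size - src));
  dst += *size - src;
  *size = dst;

  free (prefix);
  return 0;
}
```

For example, deleting 2 bytes at offset 2 and 1 byte at offset 7 of a 10-byte section leaves 7 bytes, and a symbol originally at offset 8 moves to offset 5. Relocation offsets could presumably be adjusted with the same bytes_deleted_before lookup.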
This is one of my TODOs. I used to have internal patches that tried to improve this, but they were not well tested. What we need to do is probably similar to what we did for the pcgp relaxations. On the other hand, updating the symbol table and relocations right after each deletion, for each relaxation pattern, can sometimes expose more relaxation opportunities, so I think it is a trade-off. However, I do feel that link times have been getting slower and slower, so it is probably time to raise the priority of this project. Anyway, until I start on it, anyone who is interested in this issue should feel free to take it over.