Hello,

When applied to big packages (e.g. libreoffice), dwz takes a very long time, even though much of the work could be parallelized. Of course, the inter-ELF factorization would be difficult to parallelize, but at least runs without the -m option could be, and even with -m the first step, which deduplicates within each ELF separately, could probably be parallelized quite easily.

Samuel
Created attachment 13297 [details]
Demonstrator patch

This demonstrator patch implements a simple form of multithreading, which only works without:
- multifile (-m)
- hardlink (-h)
- low-mem limit 0 (-l0)

If a file hits the low-mem limit during the parallel phase, it's rerun in low-mem mode after the parallel phase.

It passes the test-suite. There is only one ThreadSanitizer warning left, for multiple assignment of dwz_oom to obstack_alloc_failed_handler.

I did a build of the libreoffice package on openSUSE with dwz disabled, harvested the resulting .debug files (in total 175 files, 685MB), and did a dwz run (without multifile) using those files.

With master:
...
maxmem: 714956
real: 17.77
user: 15.76
system: 0.50
...

With the patch on top of master:
...
maxmem: 1106516
real: 10.37
user: 20.59
system: 1.46
...

So, the trade-off is as expected: faster real time, but higher peak memory.

DWZ, though, contains the low-mem mode to keep memory usage in check, such that dwz can be used on 32-bit systems, even with relatively large files. So the trade-off on those systems may not be advantageous. We could fix this by not enabling parallel processing on such systems.

OTOH, we could also spawn processes instead of threads. That means the per-process peak memory does not increase. It would also mean less messy code changes (not having to use __thread all over the place). An initial version that wouldn't deal with multifile (like this demonstrator patch) wouldn't need many changes. A version that would support multifile would need a switch to indicate the location of the dwz .debug_info etc. files. So, something like:
...
$ dwz -m 3 1 2
create temp dir /tmp/abcdef
spawn dwz 1 --multifile-dir /tmp/abcdef
spawn dwz 2 --multifile-dir /tmp/abcdef
wait for 2 spawned processes to finish
...
spawned dwz 1 - compressing
spawned dwz 2 - compressing
spawned dwz 1 - multifile write (using dir /tmp/abcdef)
spawned dwz 2 - multifile write (using dir /tmp/abcdef)
spawned dwz 1 - done
spawned dwz 2 - done
waiting done
multifile optimize (using files in /tmp/abcdef)
multifile read
multifile finalize 1
multifile finalize 2
...
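To illustrate the process-spawning alternative, here's a minimal sketch, assuming a hypothetical process_file function that compresses one ELF file (this is not the demonstrator patch's code, which uses threads). Because each file is handled in its own process, per-process peak memory stays at serial-run levels:

#include <sys/wait.h>
#include <unistd.h>

extern int process_file (const char *file);  /* Compress one ELF file.  */

/* Process NFILES files in FILES, running at most MAX_JOBS child
   processes at a time.  */
static int
process_files_parallel (char **files, int nfiles, int max_jobs)
{
  int running = 0, ret = 0, status;

  for (int i = 0; i < nfiles; i++)
    {
      if (running == max_jobs)
        {
          /* Wait for a slot to free up.  */
          if (waitpid (-1, &status, 0) < 0)
            return 1;
          running--;
          if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
            ret = 1;
        }
      pid_t pid = fork ();
      if (pid == 0)
        _exit (process_file (files[i]));
      else if (pid < 0)
        return 1;
      running++;
    }

  /* Reap the remaining children.  */
  while (running-- > 0)
    if (waitpid (-1, &status, 0) < 0
        || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
      ret = 1;

  return ret;
}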
Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html
(In reply to Tom de Vries from comment #2)
> Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html

And committed at
https://sourceware.org/git/?p=dwz.git;a=commit;h=7755593c86b701547ec276320533efc3e4c165f3 .

Note that this still does not apply when multifile is used.
For multifile, perhaps each fork could fill in its own set of multifiles and then they'd be merged together before being processed. But we need to ensure reproducibility, so the order in which the multifile chunks from different programs/shared libraries are merged back needs to be independent of the number of forks.
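One way to get that property, sketched below under the assumption that each fork writes its multifile contribution to a per-input chunk file (merge_chunks and the DIR/chunk.I naming are made up for illustration): the parent concatenates the chunks in input order, so the merged multifile is the same regardless of how many forks ran or which finished first.

#include <stdio.h>

/* Append the multifile chunks DIR/chunk.0 .. DIR/chunk.<NFILES-1>,
   each written by the fork that handled the corresponding input,
   to OUT in input order.  */
static int
merge_chunks (FILE *out, const char *dir, int nfiles)
{
  char path[4096];
  char buf[65536];
  size_t n;

  for (int i = 0; i < nfiles; i++)
    {
      snprintf (path, sizeof path, "%s/chunk.%d", dir, i);
      FILE *in = fopen (path, "rb");
      if (in == NULL)
        continue;  /* This input contributed nothing.  */
      while ((n = fread (buf, 1, sizeof buf, in)) > 0)
        if (fwrite (buf, 1, n, out) != n)
          {
            fclose (in);
            return 1;
          }
      fclose (in);
    }
  return 0;
}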
(In reply to Jakub Jelinek from comment #4)
> For multifile, perhaps each fork could fill in its own set of multifiles and
> then they'd be merged together before being processed. But we need to ensure
> reproducibility, so the order in which the multifile chunks from different
> programs/shared libraries are merged back needs to be independent of the
> number of forks.

I've posted a first parallel+multifile implementation, which does not yet have reproducibility (though it does have reproducible compression AFAIU):
https://sourceware.org/pipermail/dwz/2021q1/001197.html .
(In reply to Tom de Vries from comment #3)
> (In reply to Tom de Vries from comment #2)
> > Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html
>
> And committed at
> https://sourceware.org/git/?p=dwz.git;a=commit;h=7755593c86b701547ec276320533efc3e4c165f3 .
>
> Note that this still does not apply when multifile is used.

And committed:
https://sourceware.org/git/?p=dwz.git;a=commit;h=64ea1adcda52d22f00f17e219bc8e023b62b9a03 .

Now -j works for multifile as well, provided -e and -p are used.
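So an invocation would look something like this (file names made up): specifying the multifile endianness and pointer size up front lets the forked children contribute to the multifile without further coordination:
...
$ dwz -j 4 -e l -p 8 -m common.debug 1.debug 2.debug 3.debug
...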
Created attachment 13362 [details]
Demonstrator source file using separate reaper/coordinator

(In reply to Tom de Vries from comment #6)
> Now -j works for multifile as well, provided -e and -p are used.

For the last step, to make multifile work with -j without -e/-p, the communication scheme needs to be more elaborate.

The parent needs to both:
- reap the children
- communicate with the children about the multifile

It cannot do both tasks in blocking fashion. It could do them in a non-blocking fashion, but then you have a busy wait, which is bad.

The solution I came up with is to have the parent spawn a separate process, the coordinator. Then the job of the parent is to reap children. The job of the coordinator is to communicate with the children about the multifile: the children request permission to contribute to the multifile, with a certain endianness/pointer size. The coordinator replies whether and when that's OK. When the parent reaps a child, it notifies the coordinator, to ensure that the coordinator is not stuck waiting for a request from that child.
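A minimal sketch of such a coordinator loop, assuming each child talks to the coordinator over a bidirectional channel (e.g. a socketpair) and the parent announces reaped children over a pipe (all names hypothetical; the attached demonstrator differs in detail):

#include <poll.h>
#include <unistd.h>

/* Request sent by a child asking to contribute to the multifile.  */
struct mf_request { int endian; int ptr_size; };

/* FDS[0..NCHILDREN-1] are the child channels; FDS[NCHILDREN] is the
   read end of the parent's notification pipe.  Blocking in poll means
   neither the coordinator nor the parent has to busy-wait.  */
static void
coordinator_loop (struct pollfd *fds, int nchildren)
{
  int alive = nchildren;

  while (alive > 0)
    {
      if (poll (fds, nchildren + 1, -1) < 0)
        break;

      /* A child requests permission to contribute to the multifile.  */
      for (int i = 0; i < nchildren; i++)
        if (fds[i].revents & POLLIN)
          {
            struct mf_request req;
            if (read (fds[i].fd, &req, sizeof req) == (ssize_t) sizeof req)
              {
                /* Check req against the multifile's endianness and
                   pointer size, then grant or refuse.  */
                char ok = 1;
                write (fds[i].fd, &ok, 1);
              }
          }

      /* The parent reaped child I: stop expecting requests from it,
         so the coordinator cannot block forever on an exited child.  */
      if (fds[nchildren].revents & POLLIN)
        {
          int i;
          if (read (fds[nchildren].fd, &i, sizeof i) == (ssize_t) sizeof i)
            {
              fds[i].fd = -1;  /* poll ignores negative fds.  */
              alive--;
            }
        }
    }
}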