Performance issue with systemd-coredump and container process linking 2000 shared libraries.

Romain GEISSLER romain.geissler@amadeus.com
Thu Jun 22 08:10:55 GMT 2023


> On 21 June 2023 at 21:39, Mark Wielaard <mark@klomp.org> wrote:
> 
> 
> Hi Romain,
> 
> That patch looks good. It should reduce the number of calls to
> elf_getdata_rawchunk a lot. Making it less urgent that function is
> fast. But if you could test it that would be appreciated. I'll also
> test it a bit more and will probably merge it if no issues show up
> since it does seem better than keep using the linear list.
> 
> Thanks,
> 
> Mark


Hi,

So I have done some testing, running the command:

[root@fa28b3d254fd systemd]# rm -rf /var/lib/systemd/coredump/* && journalctl --flush --rotate && journalctl --vacuum-time=1s && time cat the-core | build/systemd-coredump $$ 0 0 11 1687360485 1000000000000000 localhost && journalctl

Here "the-core" is a real core dump from our production environment, with 1700+ shared libraries loaded and an uncompressed size of 2GB+.
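For readers unfamiliar with the invocation, here is a small sketch of what the seven positional arguments in the command above appear to mean. The field order is an assumption on my part, taken from the order used by systemd's default core_pattern (%P %u %g %s %t %c %h); the names FIELD_NAMES and describe are purely illustrative and exist in neither systemd nor elfutils.

```python
from datetime import datetime, timezone

# Assumed field order, borrowed from systemd's default core_pattern
# (%P %u %g %s %t %c %h). This mapping is not stated in the message itself.
FIELD_NAMES = ["pid", "uid", "gid", "signal", "timestamp", "rlimit", "hostname"]

# Arguments exactly as typed in the command ("$$" expands to the shell's own PID).
ARGV = ["$$", "0", "0", "11", "1687360485", "1000000000000000", "localhost"]


def describe(argv):
    """Pair each positional argument with its assumed meaning."""
    fields = dict(zip(FIELD_NAMES, argv))
    # Render the epoch timestamp as a human-readable UTC time.
    stamp = datetime.fromtimestamp(int(fields["timestamp"]), tz=timezone.utc)
    fields["timestamp_utc"] = stamp.isoformat()
    return fields


if __name__ == "__main__":
    for name, value in describe(ARGV).items():
        print(f"{name:14s} {value}")
```

Under that assumption, signal 11 is SIGSEGV on Linux and the timestamp falls on 21 June 2023, so the test simply replays a segfault report for the shell's own PID while feeding the real core on stdin.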

Without the systemd patch, without the elfutils patch:
real    3m42.308s
user    3m39.579s
sys     0m2.294s


Without the systemd patch, with the elfutils patch (3 runs, first one excluded to make sure the kernel caches what it caches):
real    0m15.543s
user    0m13.662s
sys     0m2.405s

real    0m15.976s
user    0m13.832s
sys     0m2.481s

real    0m15.612s
user    0m13.687s
sys     0m2.470s


With the systemd patch, without the elfutils patch (3 runs, first one excluded to make sure the kernel caches what it caches):
real    0m2.994s
user    0m1.104s
sys     0m2.477s

real    0m3.011s
user    0m1.154s
sys     0m2.447s

real    0m3.009s
user    0m1.141s
sys     0m2.457s


With the systemd patch, with the elfutils patch (3 runs, first one excluded to make sure the kernel caches what it caches):
real    0m2.921s
user    0m1.093s
sys     0m2.399s

real    0m2.950s
user    0m1.129s
sys     0m2.401s

real    0m2.933s
user    0m1.136s
sys     0m2.371s


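Overall, we can see that each fix independently resolves the performance issue: the elfutils patch alone brings the run from 3m42s down to about 15-16s, and the systemd patch alone brings it down to about 3s. The systemd patch seems more effective, but I guess both fixes are welcome.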
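To put rough numbers on that comparison, here is a minimal sketch that averages the "real" times reported above and computes the speedup of each configuration over the unpatched run. The labels and the speedups helper are just illustrative names; the only data are the timings already listed, and the unpatched case has a single run.

```python
from statistics import mean

# Wall-clock ("real") times in seconds, copied from the runs listed above.
REAL_SECONDS = {
    "no patch": [3 * 60 + 42.308],
    "elfutils patch only": [15.543, 15.976, 15.612],
    "systemd patch only": [2.994, 3.011, 3.009],
    "both patches": [2.921, 2.950, 2.933],
}


def speedups(timings, baseline="no patch"):
    """Return mean(baseline) / mean(config) for every configuration."""
    base = mean(timings[baseline])
    return {name: base / mean(runs) for name, runs in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(REAL_SECONDS).items():
        avg = mean(REAL_SECONDS[name])
        print(f"{name:20s} mean {avg:8.3f}s  speedup {factor:5.1f}x")
```

By this measure the elfutils patch alone is roughly a 14x improvement, while the systemd patch alone, or both together, is roughly 74-76x. The remaining ~3s is dominated by sys time, which neither patch changes.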

Mark, do you think it's worth backporting this to CentOS Stream 9/RHEL 9 on the elfutils side? If you need a ticket, we have Red Hat case 03536980, which led to RHBZ 2215412.

Thanks!
Romain

