This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Benchmarking __libc_single_threaded
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Florian Weimer <fweimer at redhat dot com>
- Cc: nd <nd at arm dot com>, "jwakely at redhat dot com" <jwakely at redhat dot com>
- Date: Tue, 2 Jul 2019 17:06:33 +0000
- Subject: Re: Benchmarking __libc_single_threaded
Hi Florian,
I benchmarked this on several AArch64 systems. On Cortex-A72 and Cortex-A53
there is an 8-15% gain for the hidden variant. On modern cores, however, there
is practically no difference, despite an extra 6% of instructions in this
benchmark. I got stable and repeatable results in all cases.
> Basically, it demonstrates the performance overhead of passing a
> std::shared_ptr down a somewhat arbitrarily nested call chain. Only
> single-threaded mode is benchmarked, the multi-threaded mode is quite
> slow no matter what.
Indeed, the single-threaded optimization easily makes a difference of a factor
of 3-4. The extra performance gain from the hidden DSO symbol is tiny in
comparison, even on older cores, and it only applies to DSOs. So there is no
justification for the extra complexity of a per-DSO hidden symbol.
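
For illustration only, here is a minimal sketch of the kind of fast path under
discussion: a reference count that skips the atomic read-modify-write when the
process is known to be single-threaded. The names single_threaded_flag and
RefCount are hypothetical stand-ins, not the benchmark's code or the
libstdc++ implementation; a real program would consult the flag supplied by
libc rather than a local variable.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the libc-provided flag discussed in this thread.
// Nonzero means no other thread can be running.
static char single_threaded_flag = 1;

// Hypothetical reference count, loosely modelled on what a shared_ptr
// control block has to do on copy and destruction.
struct RefCount {
    std::atomic<long> count{1};

    void acquire() {
        if (single_threaded_flag) {
            // Single-threaded: a plain load and store is enough.
            count.store(count.load(std::memory_order_relaxed) + 1,
                        std::memory_order_relaxed);
        } else {
            // Other threads may exist: use an atomic read-modify-write.
            count.fetch_add(1, std::memory_order_acq_rel);
        }
    }

    // Returns true when the last reference has been dropped.
    bool release() {
        if (single_threaded_flag) {
            long n = count.load(std::memory_order_relaxed) - 1;
            assert(n >= 0);  // more releases than acquires is a caller bug
            count.store(n, std::memory_order_relaxed);
            return n == 0;
        }
        // fetch_sub returns the previous value, so 1 means we were last.
        return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

The hidden-symbol question is only about how the flag itself is loaded inside
a DSO; the branch and the two code paths stay the same either way.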
Wilco