This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Benchmarking __libc_single_threaded
- From: Florian Weimer <fweimer at redhat dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "libc-alpha@sourceware.org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, "jwakely@redhat.com" <jwakely at redhat dot com>
- Date: Wed, 03 Jul 2019 12:15:51 +0200
- Subject: Re: Benchmarking __libc_single_threaded
- References: <VI1PR0801MB21274EB5E810155BACC3F72683F80@VI1PR0801MB2127.eurprd08.prod.outlook.com>
* Wilco Dijkstra:
> I benchmarked this on several AArch64 systems. On Cortex-A72 and Cortex-A53
> there is an 8-15% gain for the hidden variant, however on modern cores there is
> practically no difference despite 6% more instructions for this benchmark.
> I got stable and repeatable results in all cases.
>
>> Basically, it demonstrates the performance overhead of passing a
>> std::shared_ptr down a somewhat arbitrarily nested call chain. Only
>> single-threaded mode is benchmarked, the multi-threaded mode is quite
>> slow no matter what.
>
> Indeed, the single-threaded optimization easily makes a 3-4x difference.
> The extra performance gain from the hidden DSO symbol is tiny in comparison
> even on older cores and only applies to DSOs. So there isn't a justification
> for the extra complexity of a per-DSO hidden symbol.
Thanks for doing the additional benchmarking.
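For readers following along, the kind of optimization being measured can be sketched roughly as below. This is only an illustration under stated assumptions, not the benchmark code or the libstdc++ implementation: single_threaded_flag is a hypothetical local stand-in for the libc flag discussed in this thread, and struct refcount with its two helpers is invented for the example. The point is that when the flag is set, the atomic read-modify-write on the reference count can be replaced by a plain load and store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag discussed in this thread; a real
   implementation would read the variable exported by libc instead.  */
static char single_threaded_flag = 1;

/* Illustrative reference counter, not taken from any library.  */
struct refcount {
    atomic_long value;
};

/* Take one more reference.  */
static void refcount_acquire(struct refcount *rc)
{
    if (single_threaded_flag) {
        /* No other thread can touch the counter, so a relaxed
           load/store pair replaces the atomic read-modify-write.  */
        long v = atomic_load_explicit(&rc->value, memory_order_relaxed);
        atomic_store_explicit(&rc->value, v + 1, memory_order_relaxed);
    } else {
        atomic_fetch_add_explicit(&rc->value, 1, memory_order_relaxed);
    }
}

/* Drop a reference; returns true when the last one is gone.  */
static bool refcount_release(struct refcount *rc)
{
    long old;

    if (single_threaded_flag) {
        old = atomic_load_explicit(&rc->value, memory_order_relaxed);
        atomic_store_explicit(&rc->value, old - 1, memory_order_relaxed);
    } else {
        old = atomic_fetch_sub_explicit(&rc->value, 1, memory_order_acq_rel);
    }
    assert(old > 0);  /* releasing a reference nobody holds is a bug */
    return old == 1;
}
```

Every acquire and release pays for one load of the flag and one branch. That cost is the same whether the flag is a global symbol or a hidden per-DSO one; only the way the flag's address is computed differs between the two variants.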
The global symbol approach also has the advantage that we can control
the placement of the variable (along with other rarely-written
variables) in libc.so, once we find a way to tell GCC that it shouldn't
generate code that needs a copy relocation. With the hidden symbol, we
would have to pad the flag to the size of a cache line.
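A minimal sketch of what such padding could look like for a per-DSO flag follows. The names padded_flag and dso_single_threaded are hypothetical, and the 64-byte line size is an assumption that does not hold on every target. The presumed motivation is that a per-DSO flag lands wherever the linker places it in that DSO's data, possibly next to frequently written variables, so it needs a cache line to itself to avoid false sharing.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size; 64 bytes is common but not universal.  */
#define CACHE_LINE_SIZE 64

/* Hypothetical per-DSO flag that occupies a whole cache line, so that
   writes to neighbouring data never invalidate the line holding it.  */
struct padded_flag {
    alignas(CACHE_LINE_SIZE) char value;
    char pad[CACHE_LINE_SIZE - 1];
};

static_assert(sizeof(struct padded_flag) == CACHE_LINE_SIZE,
              "flag must fill exactly one cache line");
static_assert(alignof(struct padded_flag) == CACHE_LINE_SIZE,
              "flag must start on a cache line boundary");

/* One such object per DSO; a real build would presumably also give it
   hidden visibility, which is omitted here to stay portable.  */
static struct padded_flag dso_single_threaded = { .value = 1 };
```

With a single global variable in libc.so the padding is unnecessary, because the flag can be grouped with other rarely-written variables that share the line safely.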
Thanks,
Florian