This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Benchmarking __libc_single_threaded

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Florian Weimer <fweimer at redhat dot com>
Cc: nd <nd at arm dot com>, "jwakely at redhat dot com" <jwakely at redhat dot com>
Date: Tue, 2 Jul 2019 17:06:33 +0000
Subject: Re: Benchmarking __libc_single_threaded

Hi Florian,

I benchmarked this on several AArch64 systems. On Cortex-A72 and Cortex-A53
there is an 8-15% gain for the hidden variant; however, on modern cores there is
practically no difference, despite 6% more instructions being executed in this
benchmark. I got stable and repeatable results in all cases.

> Basically, it demonstrates the performance overhead of passing a
> std::shared_ptr down a somewhat arbitrarily nested call chain.  Only
> single-threaded mode is benchmarked, the multi-threaded mode is quite
> slow no matter what.

Indeed, the single-threaded optimization easily makes a 3-4x difference.
The extra gain from the hidden per-DSO symbol is tiny in comparison, even on
older cores, and it only applies to DSOs. So the extra complexity of a per-DSO
hidden symbol isn't justified.
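
To make the comparison concrete, here is a minimal sketch of the kind of
fast path under discussion: a reference count that skips the atomic
read-modify-write while the process is single-threaded. The names
demo_single_threaded and DemoRefCount are hypothetical; the flag is a local
stand-in for __libc_single_threaded, and this is not the libstdc++
implementation of std::shared_ptr.

```cpp
#include <atomic>

// Hypothetical stand-in for a flag like __libc_single_threaded:
// true while the process has only one thread. (Assumption: the real
// flag is maintained by libc; here it is an ordinary global.)
bool demo_single_threaded = true;

// Hypothetical reference count with a single-threaded fast path.
class DemoRefCount {
public:
    // Take one more reference.
    void acquire() {
        if (demo_single_threaded) {
            // Fast path: separate relaxed load and store, so no atomic
            // read-modify-write instruction is needed.
            count_.store(count_.load(std::memory_order_relaxed) + 1,
                         std::memory_order_relaxed);
        } else {
            // Slow path: a real atomic increment.
            count_.fetch_add(1, std::memory_order_acq_rel);
        }
    }

    // Drop one reference; returns true if it was the last one.
    bool release() {
        if (demo_single_threaded) {
            long n = count_.load(std::memory_order_relaxed) - 1;
            count_.store(n, std::memory_order_relaxed);
            return n == 0;
        }
        // fetch_sub returns the value held before the decrement.
        return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    long use_count() const {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> count_{1};  // starts with one owner
};
```

In this sketch the flag is a plain global, so the only cost on the fast path
is one load and a branch. The hidden per-DSO variant differs only in how that
load reaches the flag from inside a shared object, which is why its gain is
small next to avoiding the atomic operation itself.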

Wilco

Follow-Ups:
- Re: Benchmarking __libc_single_threaded
  - From: Florian Weimer