[PATCH 2/2] Remove x86 assembler rwlock code
Andi Kleen
andi@firstfloor.org
Mon Mar 24 22:11:00 GMT 2014
On Mon, Mar 24, 2014 at 09:43:50AM +0100, Ondřej Bílka wrote:
> On Mon, Mar 17, 2014 at 05:01:31PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@odo.jf.intel.com>
> >
> > With the recent tuning the C version of rwlocks is basically the same
> > performance as the x86 assembler version for uncontended locks (within
> > a few cycles, near the run-to-run variability). For others it should not
> > matter anyways.
> >
> > So remove the assembler code and use the C version like other
> > architectures.
> >
> What benchmark did you use? I would be ok with this once I see data.
A simple benchmark that measured the uncontended performance.
Contended performance is not typically dominated by the actual lock
execution time.
As the numbers below show, rdlock is identical, but wrlock is ~6-9 cycles
slower. I originally spent quite some time hunting those 9 cycles, but
then I realized that if I run the benchmark many times, the run-to-run
variability is higher than that difference. So I don't think it's relevant.
With patch:
./obj/testrun.sh ./rwlockbench/micro
rdlock avg 104
wrlock avg 106
rdlock avg 105
wrlock avg 105
rdlock avg 104
wrlock avg 105
rdlock avg 104
...
Without:
./obj-ref/testrun.sh ./rwlockbench/micro
rdlock avg 104
wrlock avg 98
rdlock avg 104
wrlock avg 97
rdlock avg 104
wrlock avg 97
rdlock avg 104
wrlock avg 97
-Andi