[PATCH] hppa: Optimize atomic_compare_and_exchange_val_acq

John David Anglin dave.anglin@bell.net
Thu Sep 22 21:49:00 GMT 2016


On 2016-09-22, at 4:37 PM, Carlos O'Donell wrote:

> On 09/22/2016 10:14 AM, John David Anglin wrote:
>> The attached patch replaces the conditional branch tests in
>> atomic_compare_and_exchange_val_acq with conditional instruction
>> nullification. This avoids the stalls associated with conditional
>> branches and the resulting code is shorter. There are no branches in
>> the fast path when the operation is successful.
> 
> Does this really make a measurable difference? The light-weight-syscall
> is probably the most costly part of this entire operation.

I don't have any numbers but my sense is that it does slightly improve pthread related performance.
I have a kernel patch to slightly improve the LWS code as well.

This is just a guess.  Removing the final hunk of C code may give better register allocation and reduce
stack requirements in the code using atomic_compare_and_exchange_val_acq.

> 
> If you can show there is a measurable difference I would be willing
> to accept the removal of the deadlock looping (it becomes a SIGILL
> and you have to look at the core file).

Yes.  The LWS deadlock code is not enabled in the kernel.  So, we are just wasting a couple of cycles in checking
for the deadlock error code.  This may have been helpful when the code was first written but I believe it should go now.

> 
>> The change was intended as an optimization but tst-stack4 now passes.
> 
> This is a red flag for this patch.

The patch has had a significant amount of testing and I'm sure there are no functional issues.  It is important to simplify
when possible.

Why red flag an improvement in test results?

I think it also helps with glib2.0 async-splice-output-stream test but I don't have enough build data to be
absolutely certain.  Fixing syscall cancellation has changed things a bit.

You can see the Debian glibc test results with the change here:
https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=hppa&ver=2.24-3%2Bb1&stamp=1474530506

> 
> Any idea what changes?

No.  I haven't had a chance to look at the tst-stack4 test.

> 
> The tst-stack4 test creates a bunch of threads, that all create their
> own stacks, release them (placing them on the free stack list) and then
> they get reused by new threads, all during dlopen/dlsym operation which
> is growing the DTV. We try to catch a case where the DTV size is too small
> and we overflow. I don't see how that could be related to what you have here?

I can look but I think it might be better to focus on tests that still fail.

> 
>> 2016-09-22  John David Anglin  
>> 
>> 	* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Don't include
>> 	abort-instr.h.
>> 	(EFAULT): Remove conditional define.
>> 	(ENOSYS): Likewise.
>> 	(atomic_compare_and_exchange_val_acq): Use instruction nullification
>> 	instead of conditional branch instructions.
> 
> 
> -- 
> Cheers,
> Carlos.


Dave
--
John David Anglin	dave.anglin@bell.net





More information about the Libc-alpha mailing list