[PATCH] hppa: Optimize atomic_compare_and_exchange_val_acq
John David Anglin
dave.anglin@bell.net
Thu Sep 22 21:49:00 GMT 2016
On 2016-09-22, at 4:37 PM, Carlos O'Donell wrote:
> On 09/22/2016 10:14 AM, John David Anglin wrote:
>> The attached patch replaces the conditional branch tests in
>> atomic_compare_and_exchange_val_acq with conditional instruction
>> nullification. This avoids the stalls associated with conditional
>> branches and the resulting code is shorter. There are no branches in
>> the fast path when the operation is successful.
>
> Does this really make a measurable difference? The light-weight-syscall
> is probably the most costly part of this entire operation.
I don't have any numbers but my sense is that it does slightly improve pthread related performance.
I have a kernel patch to slightly improve the LWS code as well.
This is just a guess. Removing the final hunk of C code may give better register allocation and reduce
stack requirements in the code using atomic_compare_and_exchange_val_acq.
>
> If you can show there is a measurable difference I would be willing
> to accept the removal of the deadlock looping (it becomes a SIGILL
> and you have to look at the core file).
Yes. The LWS deadlock code is not enabled in the kernel. So, we are just wasting a couple of cycles in checking
for the deadlock error code. This may have been helpful when the code was first written but I believe it should go now.
>
>> The change was intended as an optimization but tst-stack4 now passes.
>
> This is a red flag for this patch.
The patch has had a significant amount of testing and I'm sure there are no functional issues. It is important to simplify
when possible.
Why red flag an improvement in test results?
I think it also helps with glib2.0 async-splice-output-stream test but I don't have enough build data to be
absolutely certain. Fixing syscall cancellation has changed things a bit.
You can see the Debian glibc test results with the change here:
https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=hppa&ver=2.24-3%2Bb1&stamp=1474530506
>
> Any idea what changes?
No. I haven't had a chance to look at the tst-stack4 test.
>
> The tst-stack4 test creates a bunch of threads, that all create their
> own stacks, release them (placing them on the free stack list) and then
> they get reused by new threads, all during dlopen/dlsym operation which
> is growing the DTV. We try to catch a case where the DTV size is too small
> and we overflow. I don't see how that could be related to what you have here?
I can look but I think it might be better to focus on tests that still fail.
>
>> 2016-09-22 John David Anglin
>>
>> * sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Don't include
>> abort-instr.h.
>> (EFAULT): Remove conditional define.
>> (ENOSYS): Likewise.
>> (atomic_compare_and_exchange_val_acq): Use instruction nullification
>> instead of conditional branch instructions.
>
>
> --
> Cheers,
> Carlos.
Dave
--
John David Anglin dave.anglin@bell.net
More information about the Libc-alpha
mailing list