This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH][AArch64] Single thread lowlevellock optimization



On 20/06/2017 06:51, Szabolcs Nagy wrote:
> On 16/06/17 17:26, Szabolcs Nagy wrote:
>> Do single thread lock optimization in aarch64 libc. Atomic operations
>> hurt the performance of some single-threaded programs using stdio
>> (usually getc/putc in a loop).
>>
>> Ideally such optimization should be done at a higher level and in a
>> target independent way as in
>> https://sourceware.org/ml/libc-alpha/2017-05/msg00479.html
>> but that approach will need more discussion so do it in lowlevellocks,
>> similarly to x86, until there is consensus.
>>
>> Differences compared to the current x86_64 behaviour:
>> - The optimization is not silently applied to shared locks, in that
>> case the build fails.
>> - Unlock assumes the futex value is 0 or 1, there are no waiters to
>> wake (that would not work in single thread and libc does not use
>> such locks, to be sure lll_cond* is undefed).
>>
>> This speeds up a getchar loop about 2-4x depending on the cpu,
>> while only causing around 5-10% regression for the multi-threaded case
>> (other libc internal locks are not expected to be performance
>> critical or significantly affected by this change).
>>
>> 2017-06-16  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>>
>> 	* sysdeps/unix/sysv/linux/aarch64/lowlevellock.h: New file.
>>
> 
> i'd like to commit this in this release
> (and work on the more generic solution later)
> i'm waiting for feedback for a while in case somebody
> spots some issues.

This is similar to a powerpc optimization I suggested some time ago [1],
and the general idea back then was that any single-thread optimization
belongs in the specific concurrent algorithm.  Tulio tried this again some
time later [2] and Torvald rejected it again for the same reasons. He also
pointed out in another thread (which I can't find a link to now) that this
change is potentially disruptive in case we aim for async-safe malloc
(although this is not an issue currently, as I also pointed out).

And I tend to agree with Torvald: this change adds arch-specific
complexity and semantics to something that should be platform neutral and
focused on a specific algorithm, like your first proposal (and I doubt
such hackery would be accepted in musl, for instance).
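
For reference, the kind of fast path under discussion can be sketched
roughly as below. This is only an illustration and not the code from the
patch: the names sketch_multiple_threads, sketch_lock_t, sketch_trylock
and sketch_unlock are hypothetical, and the sketch covers only trylock
and unlock with a futex value of 0 or 1, without any waiter handling.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: assumed to be set by the library when a second
   thread is created.  It is not a real glibc symbol.  */
static bool sketch_multiple_threads = false;

/* Hypothetical lock type: 0 = unlocked, 1 = locked (no waiters).  */
typedef struct { atomic_int futex; } sketch_lock_t;

/* Try to take the lock.  Returns 0 on success, 1 if already held.  */
static int
sketch_trylock (sketch_lock_t *l)
{
  if (!sketch_multiple_threads)
    {
      /* Single-threaded: a relaxed load and store are enough, so the
         atomic read-modify-write is skipped entirely.  */
      if (atomic_load_explicit (&l->futex, memory_order_relaxed) != 0)
        return 1;
      atomic_store_explicit (&l->futex, 1, memory_order_relaxed);
      return 0;
    }

  /* Multi-threaded: compare-and-swap 0 -> 1 with acquire ordering.  */
  int expected = 0;
  return !atomic_compare_exchange_strong_explicit (&l->futex, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Release the lock.  Assumes the value is 0 or 1, so nobody is woken.  */
static void
sketch_unlock (sketch_lock_t *l)
{
  if (!sketch_multiple_threads)
    {
      atomic_store_explicit (&l->futex, 0, memory_order_relaxed);
      return;
    }
  atomic_store_explicit (&l->futex, 0, memory_order_release);
}
```

The objection above is about where such a branch lives: in this sketch
it sits inside the generic lock primitive, so every user of the lock
inherits it, rather than only the algorithm (for instance stdio) that
actually benefits from it.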


[1] https://sourceware.org/ml/libc-alpha/2014-04/msg00137.html
[2] https://patchwork.ozlabs.org/patch/596463/

