This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v7 2/2] Mutex: Replace trylock by read only while spinning
- From: kemi <kemi dot wang at intel dot com>
- To: 'Carlos O'Donell' <carlos at redhat dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Florian Weimer <fweimer at redhat dot com>, Rical Jason <rj at 2c3t dot io>, Glibc alpha <libc-alpha at sourceware dot org>
- Cc: Dave Hansen <dave dot hansen at linux dot intel dot com>, "Chen, Tim C" <tim dot c dot chen at intel dot com>, "Kleen, Andi" <andi dot kleen at intel dot com>, "Huang, Ying" <ying dot huang at intel dot com>, "Lu, Aaron" <aaron dot lu at intel dot com>, "Li, Aubrey" <aubrey dot li at intel dot com>
- Date: Wed, 11 Jul 2018 13:36:07 +0800
- Subject: Re: [PATCH v7 2/2] Mutex: Replace trylock by read only while spinning
- References: <1530863409-326-1-git-send-email-kemi.wang@intel.com> <1530863409-326-2-git-send-email-kemi.wang@intel.com> <66548452-766e-7c1e-57ab-6100521056c3@redhat.com> <25017BF213203E48912DB000DE5F5E1E6B8343D2@SHSMSX101.ccr.corp.intel.com> <02492c47-0a47-3d4d-7333-bec1dd2172f2@intel.com>
On 2018-07-11 11:17, kemi wrote:
> Hi, Carlos
>>> Do you need to do a whole system performance measurement?
>>
>> I thought the data posted here was good enough to demonstrate the effectiveness of this patch.
>> But if you insist, I will try to measure it.
>>
>
> To measure whole-system performance:
>
> We run *two* processes that take locks at the same time. Each process has multiple threads, and
> each thread does the following:
> a) lock
> b) delay 1000ns in the critical section
> c) unlock
> d) delay 6000ns in the non-critical section
> in a loop for 5 seconds, and we measure the total iterations for each process.
>
> Each test was run 4 times, with stable results.
>
> Test result:
> threads    base(CAS)    head(test and CAS)
> 28         1681975      1813090 (+7.8%)
> 56         1801954      1890549 (+4.9%)
>
> ========================cut==========================================
> I also ran a *single* process and post the test result below (each test was run 4 times,
> with stable results):
> threads    base(CAS)    head(test and CAS)
> 28         2262848      2274362 (+0.5%)
> 56         1949439      1994526 (+2.3%)
>
> This is what I got in my test. This change is not a big optimization, but it should bring some
> limited performance improvement. Additionally, the workload I ran here is not so Macro
> (Some Micro benchmarks make the size of the critical section and non-critical section extremely
^~~~~~
Sorry, that is a typo; thanks to Ying Huang for the reminder.
s/Micro/Macro
> small), so I believe this change will help some practical workloads.
>
>
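
To make the two table columns concrete, here is a minimal sketch of the difference
between "base(CAS)" and "head(test and CAS)" while spinning. It uses C11 atomics and
made-up names (spin_mutex, try_acquire_cas, spin_acquire) rather than the glibc
internals touched by the patch, and it leaves out the fall-back to a blocking lock
once spinning gives up. The idea is that a failed CAS still asks for the cache line
in exclusive state, whereas a plain load lets all spinners share it until the lock
looks free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. Not the glibc mutex layout. */
typedef struct { atomic_int locked; } spin_mutex;

/* "base": attempt the CAS on every spin iteration. */
bool try_acquire_cas(spin_mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* "head": read the lock word first and only CAS when it looks free. */
bool try_acquire_test_and_cas(spin_mutex *m)
{
    if (atomic_load_explicit(&m->locked, memory_order_relaxed) != 0)
        return false;                     /* still held: skip the CAS */
    return try_acquire_cas(m);
}

void spin_release(spin_mutex *m)
{
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}

/* Spin at most max_spins times; return true if the lock was acquired.
   A real mutex would block (e.g. on a futex) instead of returning false. */
bool spin_acquire(spin_mutex *m, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (try_acquire_test_and_cas(m))
            return true;
    }
    return false;
}
```

The CAS is still needed after the read, because another thread can take the lock
between the load and the CAS; the read only filters out attempts that would
certainly fail.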