This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v7 2/2] Mutex: Replace trylock by read only while spinning
- From: kemi <kemi dot wang at intel dot com>
- To: 'Carlos O'Donell' <carlos at redhat dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Florian Weimer <fweimer at redhat dot com>, Rical Jason <rj at 2c3t dot io>, Glibc alpha <libc-alpha at sourceware dot org>
- Cc: Dave Hansen <dave dot hansen at linux dot intel dot com>, "Chen, Tim C" <tim dot c dot chen at intel dot com>, "Kleen, Andi" <andi dot kleen at intel dot com>, "Huang, Ying" <ying dot huang at intel dot com>, "Lu, Aaron" <aaron dot lu at intel dot com>, "Li, Aubrey" <aubrey dot li at intel dot com>
- Date: Wed, 11 Jul 2018 11:17:42 +0800
- Subject: Re: [PATCH v7 2/2] Mutex: Replace trylock by read only while spinning
- References: <1530863409-326-1-git-send-email-kemi.wang@intel.com> <1530863409-326-2-git-send-email-kemi.wang@intel.com> <66548452-766e-7c1e-57ab-6100521056c3@redhat.com> <25017BF213203E48912DB000DE5F5E1E6B8343D2@SHSMSX101.ccr.corp.intel.com>
Hi, Carlos
>> Do you need to do a whole system performance measurement?
>
> I thought the data posted here is good enough to demonstrate the effectiveness of this patch.
> But if you insist, I will try to do something to figure it out.
>
To measure whole-system performance:
We run *two* processes with locking at the same time. Each process has multiple threads, and
each thread does the following:
a) lock
b) delay 1000ns in the critical section
c) unlock
d) delay 6000ns in the non-critical section
in a loop for 5 seconds, and we measure the total iterations for each process.
Each test was run 4 times, with stable results.
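For readers who want to reproduce something similar, the per-thread loop described above can be sketched roughly as below. This is only an illustration, not the harness behind the numbers in this mail: the names (bench_lock, delay_ns, run_bench), the busy-wait delay based on clock_gettime, and the use of a default-initialized mutex are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* All names here are hypothetical; the mutex type is an assumption. */
static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;

struct bench_arg {
  uint64_t run_ns;      /* how long this thread keeps looping */
  uint64_t iterations;  /* result: completed lock/unlock cycles */
};

static uint64_t now_ns(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-wait for roughly ns nanoseconds (assumed delay method). */
static void delay_ns(uint64_t ns)
{
  uint64_t end = now_ns() + ns;
  while (now_ns() < end)
    ;
}

static void *bench_thread(void *p)
{
  struct bench_arg *arg = p;
  uint64_t end = now_ns() + arg->run_ns;
  uint64_t n = 0;

  while (now_ns() < end) {
    pthread_mutex_lock(&bench_lock);    /* a) lock */
    delay_ns(1000);                     /* b) critical section */
    pthread_mutex_unlock(&bench_lock);  /* c) unlock */
    delay_ns(6000);                     /* d) non-critical section */
    n++;
  }
  arg->iterations = n;
  return NULL;
}

/* Run nthreads threads for run_ns each; return the total iteration count
   for this process (0 if allocation or thread creation fails outright). */
static uint64_t run_bench(int nthreads, uint64_t run_ns)
{
  assert(nthreads > 0);
  pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
  struct bench_arg *args = calloc((size_t)nthreads, sizeof *args);
  uint64_t total = 0;
  int started = 0;

  if (tids != NULL && args != NULL) {
    for (int i = 0; i < nthreads; i++) {
      args[i].run_ns = run_ns;
      if (pthread_create(&tids[i], NULL, bench_thread, &args[i]) != 0)
        break;
      started++;
    }
    for (int i = 0; i < started; i++) {
      pthread_join(tids[i], NULL);
      total += args[i].iterations;
    }
  }
  free(tids);
  free(args);
  return total;
}
```

Under these assumptions, something like run_bench(28, 5000000000ull) in each of two concurrently started processes would correspond to the 28-thread setup; build with -pthread.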
Test result:
threads    base(CAS)    head(test and CAS)
28         1681975      1813090 (+7.8%)
56         1801954      1890549 (+4.9%)
========================cut==========================================
I also ran a *single* process; the results are below (each test was run 4 times, with
stable results):
threads    base(CAS)    head(test and CAS)
28         2262848      2274362 (+0.5%)
56         1949439      1994526 (+2.3%)
This is what I got in my tests. The change is not a big optimization, but it should give a
limited performance improvement. Additionally, the workload I ran here is not an extreme
micro-benchmark (some micro-benchmarks make the critical and non-critical sections extremely
small), so I believe this change will help some practical workloads.
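
For reference, the difference between the "base(CAS)" and "head(test and CAS)" columns can be illustrated with a toy lock. This is a sketch only, not the glibc code touched by the patch: toy_lock, spin_cas, spin_test_and_cas and toy_unlock are hypothetical names, and the lock word (0 = free, 1 = held) is not glibc's mutex layout. The usual reasoning is that a compare-and-swap needs exclusive ownership of the cache line, so many spinners doing CAS keep pulling the line away from each other, while a plain read lets them share it until the lock looks free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock word: 0 = free, 1 = held. */
typedef struct {
  atomic_int word;
} toy_lock;

/* "base(CAS)" style: every spin iteration attempts the atomic
   compare-and-swap, whether or not the lock looks free. */
static bool spin_cas(toy_lock *l, int max_spins)
{
  for (int i = 0; i < max_spins; i++) {
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(&l->word, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
      return true;
  }
  return false;  /* caller would fall back to blocking */
}

/* "head(test and CAS)" style: spin with a read-only load and attempt
   the compare-and-swap only when the lock was observed to be free. */
static bool spin_test_and_cas(toy_lock *l, int max_spins)
{
  for (int i = 0; i < max_spins; i++) {
    if (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
      continue;  /* still held: no write attempt this iteration */
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(&l->word, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
      return true;
  }
  return false;  /* caller would fall back to blocking */
}

static void toy_unlock(toy_lock *l)
{
  atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

Both functions return true once the lock is acquired and false when the spin budget runs out; only the second avoids write attempts while the lock is held.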