This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH][BZ #13065] New pthread_barrier algorithm to fulfill barrier destruction requirements.
- From: "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, "Carlos O'Donell" <carlos at redhat dot com>, David Miller <davem at davemloft dot net>
- Date: Wed, 23 Dec 2015 09:17:43 -0600
On 12/23/2015 06:08 AM, Torvald Riegel wrote:
> On Tue, 2015-12-22 at 17:36 -0600, Paul E. Murphy wrote:
>> On 12/21/2015 02:34 PM, Torvald Riegel wrote:
>>> On Fri, 2015-12-18 at 17:47 -0600, Paul E. Murphy wrote:
>>>> Otherwise, it looks good to me, and seems like a good improvement to
>>>> have. Though, a more experienced reviewer may have more to say. This
>>>> is a bit more complicated than its predecessor. I'll test it on PPC
>>>> next week.
>> Tested out fine on POWER8/PPC64LE.
> Thanks for testing!
>> I was curious what the performance
>> difference might be, so I put together the attached program. It
>> showed about a 25% improvement with 64 threads, a count of 64, and
>> 100000 iterations on a 16-core machine.
> Nice. I'd hope that when adding proper spinning, we should be able to
> improve performance / scalability further. Do you perhaps want to add
> your test to our microbenchmarks?
I can, though it'll be on the back burner for a bit. It needs to be
reworked to test throughput, since a meaningful iteration count likely
depends on the target system.
Do you have any thoughts on reasonable input values for different systems?
E.g., maybe `grep -c proc /proc/cpuinfo` * 2?
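
For illustration only, here is a minimal sketch of what a throughput-oriented
variant might look like. It is not the attached program and not an existing
glibc benchtest. The names default_thread_count, run_barrier_bench, and
struct bench are hypothetical, and using sysconf (_SC_NPROCESSORS_ONLN) in
place of grepping /proc/cpuinfo is an assumption. Instead of a fixed iteration
count, the threads cross the barrier until a deadline passes, and the result
is reported as crossings per second.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical shared state for one benchmark run.  */
struct bench
{
  pthread_barrier_t barrier;
  struct timespec deadline;
  int stop;              /* Written only between the two barrier waits.  */
  unsigned long rounds;  /* Likewise; one round is two barrier crossings.  */
};

/* Assumed default: twice the number of online CPUs.  */
static long
default_thread_count (void)
{
  long ncpus = sysconf (_SC_NPROCESSORS_ONLN);
  return (ncpus > 0 ? ncpus : 1) * 2;
}

static void *
worker (void *arg)
{
  struct bench *b = arg;
  for (;;)
    {
      /* Exactly one thread per crossing checks the clock.  */
      if (pthread_barrier_wait (&b->barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
        {
          struct timespec now;
          clock_gettime (CLOCK_MONOTONIC, &now);
          b->rounds++;
          b->stop = (now.tv_sec > b->deadline.tv_sec
                     || (now.tv_sec == b->deadline.tv_sec
                         && now.tv_nsec >= b->deadline.tv_nsec));
        }
      /* The second wait publishes the stop flag, so all threads agree
         on which round is the last one and nobody is left waiting.  */
      pthread_barrier_wait (&b->barrier);
      if (b->stop)
        return NULL;
    }
}

/* Run NTHREADS threads for roughly MILLIS milliseconds and return the
   number of barrier crossings per second.  The measured interval
   includes thread creation and joining.  */
static double
run_barrier_bench (long nthreads, long millis)
{
  assert (nthreads > 0 && millis > 0);
  struct bench b = { .stop = 0, .rounds = 0 };
  struct timespec start, end;
  pthread_t *tids = malloc (nthreads * sizeof *tids);
  if (tids == NULL
      || pthread_barrier_init (&b.barrier, NULL, (unsigned) nthreads) != 0)
    {
      free (tids);
      return -1.0;
    }

  clock_gettime (CLOCK_MONOTONIC, &start);
  b.deadline = start;
  b.deadline.tv_sec += millis / 1000;
  b.deadline.tv_nsec += (millis % 1000) * 1000000L;
  if (b.deadline.tv_nsec >= 1000000000L)
    {
      b.deadline.tv_sec++;
      b.deadline.tv_nsec -= 1000000000L;
    }

  for (long i = 0; i < nthreads; i++)
    if (pthread_create (&tids[i], NULL, worker, &b) != 0)
      {
        /* Threads already started would block forever; give up.  */
        fprintf (stderr, "pthread_create failed\n");
        exit (EXIT_FAILURE);
      }
  for (long i = 0; i < nthreads; i++)
    pthread_join (tids[i], NULL);
  clock_gettime (CLOCK_MONOTONIC, &end);

  pthread_barrier_destroy (&b.barrier);
  free (tids);
  double elapsed = (double) (end.tv_sec - start.tv_sec)
                   + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
  return 2.0 * (double) b.rounds / elapsed;
}
```

A caller could invoke run_barrier_bench (default_thread_count (), 1000) from
main and print the result. The extra barrier wait per round is a design
choice made here to get a race-free stop condition; it does mean the clock
check by the serial thread is part of what gets measured.
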
Anyhow, I sent an RFC for the locking benchmarks I'd been working with, as
the question is applicable there too: