This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][BZ #13065] New pthread_barrier algorithm to fulfill barrier destruction requirements.
- From: Torvald Riegel <triegel at redhat dot com>
- To: "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, "Carlos O'Donell" <carlos at redhat dot com>, David Miller <davem at davemloft dot net>
- Date: Fri, 15 Jan 2016 22:08:19 +0100
- Subject: Re: [PATCH][BZ #13065] New pthread_barrier algorithm to fulfill barrier destruction requirements.
On Wed, 2015-12-23 at 09:17 -0600, Paul E. Murphy wrote:
>
> On 12/23/2015 06:08 AM, Torvald Riegel wrote:
> > On Tue, 2015-12-22 at 17:36 -0600, Paul E. Murphy wrote:
> >>
> >> On 12/21/2015 02:34 PM, Torvald Riegel wrote:
> >>> On Fri, 2015-12-18 at 17:47 -0600, Paul E. Murphy wrote:
> >>>> Otherwise, it looks good to me, and seems like a good improvement to
> >>>> have. Though, a more experienced reviewer may have more to say. This
> >>>> is a bit more complicated than its predecessor. I'll test it on PPC
> >>>> next week.
> >>>
> >>> Thanks!
> >>
> >> Tested out fine on POWER8/PPC64LE.
> >
> > Thanks for testing!
> >
> >> I was curious what the performance
> >> difference might be, so I slapped together the attached program. It
> >> showed about 25% improvement with 64 thread/64 count/100000 iter input
> >> on a 16 core machine.
> >
> > Nice. I'd hope that when adding proper spinning, we should be able to
> > improve performance / scalability further. Do you perhaps want to add
> > your test to our microbenchmarks?
> >
>
> I can, though it'll be on the backburner for a bit. It needs to be
> reworked to test throughput, as a meaningful iteration count is likely
> dependent on the target system.
>
> Do you have any thoughts on reasonable input values for different systems?
> I.e., maybe `grep -c proc /proc/cpuinfo` * 2?

Using as many threads as cores is probably a good measurement, as is a
one-thread test to get the single-thread overhead. More tests with
thread counts in between those two could be useful as well to catch
scalability issues.
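
A minimal sketch of how a benchmark driver might pick those thread
counts: one thread, doubling steps in between, and one thread per core.
The helper names (online_cores, barrier_bench_thread_counts) are
hypothetical and not part of glibc or its benchtests. Using
sysconf (_SC_NPROCESSORS_ONLN) in place of grepping /proc/cpuinfo, and
doubling for the in-between steps, are assumptions of this sketch.

```c
#include <stddef.h>
#include <unistd.h>

/* Number of online logical CPUs, or 1 if the query fails.
   Hypothetical helper for illustration.  */
static long
online_cores (void)
{
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  return n < 1 ? 1 : n;
}

/* Fill COUNTS with the thread counts to benchmark: 1, then doubling
   steps below NCORES, then NCORES itself.  If OVERSUB is nonzero, one
   extra NCORES * 2 entry is appended.  Returns the number of entries
   written, which is at most MAX.  Hypothetical helper.  */
static size_t
barrier_bench_thread_counts (long ncores, int oversub, long *counts,
                             size_t max)
{
  size_t n = 0;

  if (ncores < 1)
    ncores = 1;
  for (long t = 1; t < ncores && n < max; t *= 2)
    counts[n++] = t;
  if (n < max)
    counts[n++] = ncores;
  if (oversub && n < max)
    counts[n++] = ncores * 2;
  return n;
}
```

A driver would call barrier_bench_thread_counts (online_cores (), 0,
counts, max) and run the barrier test once per entry.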

I'm not sure cores*2 is a good test. We could do that to cover
oversubscribed systems (i.e., more threads than logical cores) a bit and
catch performance pathologies there, but I don't think we should give
workloads with oversubscription a lot of weight; if one oversubscribes
and then uses barriers or locks, there will likely be performance
problems that a good barrier/lock/... can't fix completely.
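
For illustration, a rough sketch of a timed barrier loop that could be
run with any thread count, including an oversubscribed one as a
low-weight extra data point. The names round_arg, round_worker and
time_barrier_rounds are hypothetical. The fixed round count and the
CLOCK_MONOTONIC wall-clock timing are assumptions of this sketch, not
how the attached program or the glibc microbenchmarks work.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Shared, read-only arguments for every worker (hypothetical).  */
struct round_arg
{
  pthread_barrier_t *barrier;
  long rounds;
};

/* Each worker just passes through the barrier ROUNDS times.  */
static void *
round_worker (void *p)
{
  struct round_arg *arg = p;
  for (long i = 0; i < arg->rounds; i++)
    pthread_barrier_wait (arg->barrier);
  return NULL;
}

/* Run NTHREADS threads through ROUNDS barrier rounds and return the
   elapsed wall-clock time in seconds, or -1.0 if setup fails.  The
   time includes thread creation and joining, so ROUNDS should be large
   enough to dominate that cost.  */
static double
time_barrier_rounds (unsigned int nthreads, long rounds)
{
  pthread_barrier_t barrier;
  struct round_arg arg = { &barrier, rounds };
  struct timespec start, end;

  if (nthreads == 0)
    return -1.0;
  pthread_t *threads = malloc (nthreads * sizeof *threads);
  if (threads == NULL)
    return -1.0;
  if (pthread_barrier_init (&barrier, NULL, nthreads) != 0)
    {
      free (threads);
      return -1.0;
    }

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (unsigned int i = 0; i < nthreads; i++)
    if (pthread_create (&threads[i], NULL, round_worker, &arg) != 0)
      abort ();  /* Sketch only: started threads would block forever.  */
  for (unsigned int i = 0; i < nthreads; i++)
    pthread_join (threads[i], NULL);
  clock_gettime (CLOCK_MONOTONIC, &end);

  pthread_barrier_destroy (&barrier);
  free (threads);
  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Comparing time_barrier_rounds (cores, rounds) against
time_barrier_rounds (cores * 2, rounds) would show how much the
oversubscribed case degrades, without making it the headline number.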