This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH v2] Single threaded stdio optimization

From: Carlos O'Donell <carlos at redhat dot com>
To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Siddhesh Poyarekar <siddhesh at gotplt dot org>, Joseph Myers <joseph at codesourcery dot com>
Cc: nd at arm dot com, GNU C Library <libc-alpha at sourceware dot org>, "triegel at redhat dot com" <triegel at redhat dot com>
Date: Fri, 30 Jun 2017 12:07:52 -0400
Subject: Re: [PATCH v2] Single threaded stdio optimization
References: <594AA0A4.7010600@arm.com> <alpine.DEB.2.20.1706211651110.3924@digraph.polyomino.org.uk> <594B92ED.6060809@arm.com> <2dc4477c-c349-86b1-f8a5-95d69c397f24@gotplt.org> <98711966-b1c2-1234-ebee-301e5102aa46@gotplt.org> <8e7e9ce1-4f50-217e-b99f-e640fcca5490@redhat.com> <595640FB.8030904@arm.com> <648bb45e-9573-3328-b575-ff68970ddc5e@redhat.com> <595662AC.7080607@arm.com>

On 06/30/2017 10:39 AM, Szabolcs Nagy wrote:
> On 30/06/17 14:16, Carlos O'Donell wrote:
>> On 06/30/2017 08:15 AM, Szabolcs Nagy wrote:
>>> i didn't dig into the root cause of the regression (or
>>> why static linking is slower), but i would not be too
>>> worried about it, since the common case for hot stdio
>>> loops is single-threaded processes, where even on x86
>>> the patch gives a >2x speedup.
>>
>> Regardless of the cause, the 15% regression on x86 MT performance
>> is serious, and I see no reason to push this into glibc 2.26. 
>> We can add it any time in 2.27, or the distros can pick it up with
>> a backport.
>>
>> I would like to see a better characterization of the regression before
>> accepting this patch.
>>
>> While I agree that the common case for hot stdio loops is non-MT, there
>> are still MT cases, and 15% is a large, double-digit loss.
>>
>> Have you looked at the assembly differences? What is the compiler
>> doing differently?
>>
>> When a user asks "Why is my MT stdio 15% slower?", we owe them an
>> answer that is clear and concise.
>>
> 
> sorry, the x86 measurement was bogus: only the
> high-level stdio code thought the process was
> multithreaded, while the lowlevellock code thought it
> was single threaded, so no atomic ops were executed
> in the stdio_mt case.

OK.
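The bogus measurement comes down to two layers each deciding on their own whether the process is multithreaded. Below is a minimal C11 sketch of a lock that skips its atomic read-modify-write while a process-wide flag says the process is single threaded. `multiple_threads`, `toy_lock` and `atomic_path_acquires` are hypothetical names, not glibc's lowlevellock code. If only the stdio layer is told the process is multithreaded while a flag like this stays false, every lock call takes the cheap path, which is how an "MT" benchmark can end up executing no atomic operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical process-wide flag modelling "the low-level lock layer
   believes the process is multithreaded".  It is only written before
   any second thread exists.  */
static bool multiple_threads = false;

/* Counts acquisitions that went through the atomic compare-and-swap
   path, so the two code paths can be told apart.  */
static atomic_long atomic_path_acquires;

typedef struct
{
  atomic_int state;  /* 0 = unlocked, 1 = locked.  */
} toy_lock;

static void
toy_lock_acquire (toy_lock *l)
{
  if (!multiple_threads)
    {
      /* No other thread can contend: a plain store, no atomic RMW.  */
      atomic_store_explicit (&l->state, 1, memory_order_relaxed);
      return;
    }
  atomic_fetch_add_explicit (&atomic_path_acquires, 1, memory_order_relaxed);
  int expected = 0;
  /* Spin until the lock is free; a real lock would sleep in the kernel.  */
  while (!atomic_compare_exchange_strong_explicit (&l->state, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
    expected = 0;
}

static void
toy_lock_release (toy_lock *l)
{
  atomic_store_explicit (&l->state, 0, memory_order_release);
}
```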

> with atomics, the original performance is significantly
> slower, so the regression relative to it is small in %.
> 
> if i create a dummy thread (to measure true mt
> behaviour, same loop count):
> 
> time $orig/lib64/ld-2.25.90.so --library-path $orig/lib64 ./getchar_mt
> 20.31user 0.11system 0:20.47elapsed 99%CPU (0avgtext+0avgdata 2416maxresident)k
> 0inputs+0outputs (0major+180minor)pagefaults 0swaps
> time $stdio/lib64/ld-2.25.90.so --library-path $stdio/lib64 ./getchar_mt
> 20.72user 0.03system 0:20.79elapsed 99%CPU (0avgtext+0avgdata 2400maxresident)k
> 0inputs+0outputs (0major+179minor)pagefaults 0swaps
> 
> the relative diff is now 2%, but notice that the
> absolute diff went down too (which points to a
> microarchitectural issue in the previous measurement).

OK. This is much better.
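A hypothetical reconstruction of a getchar_mt-style benchmark that keeps a dummy thread alive for the whole loop, so both the stdio layer and the lock layer see a multithreaded process. The original program is not shown in the thread; `run_benchmark`, `dummy_thread`, `keep_alive` and reading from a `tmpfile` instead of standard input are all assumptions. Older glibc versions need `-pthread` at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t keep_alive = PTHREAD_MUTEX_INITIALIZER;

/* Dummy thread: blocks until the benchmark releases KEEP_ALIVE, so the
   process stays multithreaded for the whole loop.  */
static void *
dummy_thread (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&keep_alive);
  pthread_mutex_unlock (&keep_alive);
  return NULL;
}

/* Read every byte of FP with getc and return how many were read.  */
static long
count_bytes (FILE *fp)
{
  long n = 0;
  while (getc (fp) != EOF)
    n++;
  return n;
}

/* Fill a temporary file with LEN bytes, optionally start the dummy
   thread, then read the stream back byte by byte.  Returns the number
   of bytes read, or -1 on setup failure.  */
static long
run_benchmark (long len, int with_dummy_thread)
{
  FILE *fp = tmpfile ();
  if (fp == NULL)
    return -1;
  for (long i = 0; i < len; i++)
    putc ('x', fp);
  rewind (fp);

  pthread_t t;
  if (with_dummy_thread)
    {
      pthread_mutex_lock (&keep_alive);   /* Dummy thread will block.  */
      if (pthread_create (&t, NULL, dummy_thread, NULL) != 0)
        {
          pthread_mutex_unlock (&keep_alive);
          fclose (fp);
          return -1;
        }
    }

  long n = count_bytes (fp);   /* The loop being timed.  */

  if (with_dummy_thread)
    {
      pthread_mutex_unlock (&keep_alive);  /* Let the dummy thread exit.  */
      pthread_join (t, NULL);
    }
  fclose (fp);
  return n;
}
```

Running the `with_dummy_thread = 1` case with a large `len` under each library build's dynamic loader, as in the commands quoted above, mirrors that comparison: 20.72 s against 20.31 s of user time is a 0.41 s difference, about 2%.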

> perf stat indicates that there are 15 vs 16 branches
> in the loop (so my patch indeed adds one branch, but
> there are plenty of branches already), and the
> instruction count goes from 43 to 45 per loop
> iteration (flag check + branch).
> 
> how one extra branch could decrease performance by
> more than 10% in my previous measurements, when there
> are already more than 10 branches (and several other
> insns) in the loop, is something the x86
> microarchitects could explain.
> 
> in summary, the patch trades 2% of mt performance for
> a 2x speedup in non-mt performance on this x86 cpu.
 
Excellent, this is exactly the analysis I was looking for, and this kind
of result is something that can make sense to our users.
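The trade-off in the summary can be sketched as a `getc` wrapper with a single-threaded fast path: one flag check and one branch per call, in exchange for skipping the `flockfile`/`funlockfile` pair when no second thread can exist. This is a sketch under assumptions, not the patch itself; `need_lock`, `enable_stdio_locks` and `fast_getc` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag: false until the program creates its first thread.
   It is written only before any second thread exists.  */
static bool need_lock = false;

/* Hypothetical hook a thread-creation wrapper would call.  Once set,
   the flag stays set for the life of the process.  */
static void
enable_stdio_locks (void)
{
  need_lock = true;
}

/* getc with a single-threaded fast path: the flag check and branch
   replace the lock/unlock pair when no other thread can exist.  */
static int
fast_getc (FILE *fp)
{
  if (!need_lock)
    return getc_unlocked (fp);   /* Single threaded: no locking.  */
  flockfile (fp);
  int c = getc_unlocked (fp);
  funlockfile (fp);
  return c;
}
```

Because the flag only ever goes from false to true, a stream used both before and after the first thread is created stays correct: every call after locking is enabled takes the locked path.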

I'm OK with the patch for 2.26.

-- 
Cheers,
Carlos.

References:
- [PATCH v2] Single threaded stdio optimization
  - From: Szabolcs Nagy
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Joseph Myers
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Szabolcs Nagy
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Siddhesh Poyarekar
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Siddhesh Poyarekar
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Carlos O'Donell
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Szabolcs Nagy
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Carlos O'Donell
- Re: [PATCH v2] Single threaded stdio optimization
  - From: Szabolcs Nagy
