This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] Single threaded stdio optimization


On 29/06/17 13:12, Carlos O'Donell wrote:
> On 06/29/2017 08:01 AM, Siddhesh Poyarekar wrote:
>> On Thursday 29 June 2017 05:11 PM, Siddhesh Poyarekar wrote:
>>> The patch looks OK except for the duplication (and a missing comment
>>> below), which looks a bit clumsy.  How about something like this instead:
>>>
>>>   bool need_lock = _IO_need_lock (fp);
>>>
>>>   if (need_lock)
>>>     _IO_flockfile (fp);
>>>   result = _IO_ferror_unlocked (fp);
>>>   if (need_lock)
>>>     _IO_funlockfile (fp);
>>>
>>>   return result;
>>>
>>> You could probably make some kind of a macro out of this, I haven't
>>> looked that hard.
>>
>> I forgot that Torvald had commented (off-list, the thread broke somehow)
>> that it would be important to try and measure how much worse this makes
>> the multi-threaded case.
> 
> +1
> 
> If we are going to optimize the single threaded case we need to know what
> impact this has on the multi-threaded case.
> 
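(for illustration, the lock/unlock pattern suggested above could be folded
into a macro roughly like this; fake_file, fake_lock and fake_unlock are
stand-ins for the glibc internals (_IO_FILE, _IO_flockfile, _IO_funlockfile),
so this is a sketch of the pattern only, not the real glibc code:)

```c
/* stand-in for the real FILE object: just a 'needs lock' flag
   and an error flag to query.  */
struct fake_file { int need_lock; int error_flag; };

static void fake_lock (struct fake_file *fp) { (void) fp; }
static void fake_unlock (struct fake_file *fp) { (void) fp; }

/* evaluate EXPR into RESULT, taking the lock only when the
   single-thread fast path does not apply.  */
#define WITH_OPTIONAL_LOCK(fp, expr, result)	\
  do						\
    {						\
      int __need = (fp)->need_lock;		\
      if (__need)				\
	fake_lock (fp);				\
      (result) = (expr);			\
      if (__need)				\
	fake_unlock (fp);			\
    }						\
  while (0)

/* an ferror-like query written with the macro */
int
fake_ferror (struct fake_file *fp)
{
  int result;
  WITH_OPTIONAL_LOCK (fp, fp->error_flag, result);
  return result;
}
```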

$orig == current
$stdio == my patch
$stdio_mt == my patch but 'needs lock' flag is set so multithread path is taken

on two particular aarch64 cpus with a particular loop count:

cpu1
time $orig/lib64/ld-2.25.90.so --library-path $orig/lib64 ./getchar
8.08user 0.04system 0:08.12elapsed 100%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps
time $stdio/lib64/ld-2.25.90.so --library-path $stdio/lib64 ./getchar
1.07user 0.04system 0:01.11elapsed 99%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps
time $stdio_mt/lib64/ld-2.25.90.so --library-path $stdio_mt/lib64 ./getchar
7.87user 0.00system 0:07.88elapsed 99%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps

cpu2
time $orig/lib64/ld-2.25.90.so --library-path $orig/lib64 ./getchar
8.11user 0.04system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps
time $stdio/lib64/ld-2.25.90.so --library-path $stdio/lib64 ./getchar
2.29user 0.06system 0:02.35elapsed 99%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps
time $stdio_mt/lib64/ld-2.25.90.so --library-path $stdio_mt/lib64 ./getchar
8.12user 0.03system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+40minor)pagefaults 0swaps

on a particular x86_64 cpu with particular loop count:

time $orig/lib64/ld-2.25.90.so --library-path $orig/lib64 ./getchar
5.89user 0.07system 0:05.98elapsed 99%CPU (0avgtext+0avgdata 2000maxresident)k
0inputs+0outputs (0major+153minor)pagefaults 0swaps
time $stdio/lib64/ld-2.25.90.so --library-path $stdio/lib64 ./getchar
2.66user 0.06system 0:02.73elapsed 99%CPU (0avgtext+0avgdata 2032maxresident)k
0inputs+0outputs (0major+155minor)pagefaults 0swaps
time $stdio_mt/lib64/ld-2.25.90.so --library-path $stdio_mt/lib64 ./getchar
6.76user 0.08system 0:06.87elapsed 99%CPU (0avgtext+0avgdata 2032maxresident)k
0inputs+0outputs (0major+155minor)pagefaults 0swaps

in summary: on aarch64 i see no regression (in some cases stdio_mt
even became faster, which can happen since the code layout changed);
on this particular x86 cpu stdio_mt shows close to a 15% regression.

i don't believe the big regression on x86 is meaningful; it could
be that the benchmark just got past some cpu internal limit
or the code got aligned differently. in fact, if i statically link
the exact same code, then on the same cpu i get

time ./getchar_static-orig
6.60user 0.05system 0:06.66elapsed 99%CPU (0avgtext+0avgdata 912maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
time ./getchar_static-stdio
2.24user 0.08system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 896maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
time ./getchar_static-stdio_mt
6.50user 0.06system 0:06.57elapsed 99%CPU (0avgtext+0avgdata 896maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps

i.e. now the version with the extra branch is faster! (both
measurements are repeatable)

i didn't dig into the root cause of the regression (or into
why static linking is slower), and i would not be too
worried about it, since the common case for hot stdio
loops is single-threaded processes, where even on x86
the patch gives a >2x speedup.

