This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] Single threaded stdio optimization
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Fri, 30 Jun 2017 17:00:20 +0000
- Subject: Re: [PATCH v2] Single threaded stdio optimization
- References: <AM5PR0802MB26101886575C9707508B1CFC83D30@AM5PR0802MB2610.eurprd08.prod.outlook.com>,<1498838603.11227.24.camel@redhat.com>
Torvald Riegel wrote:
> I have always argued that we should do this kind of optimization in the
> clients, so at the higher levels. So we are in agreement here :)
Yes, it is always best to do these optimizations at the highest possible level;
the performance gains for getc/putc show that clearly.
Note that there are also optimizations that can be done on the locking primitives.
For example, before Szabolcs' patch we always executed this before even trying
the lock:
10: b9400000 ldr w0, [x0]
14: 37780260 tbnz w0, #15, 60 <_IO_getc+0x60>
18: f9404660 ldr x0, [x19,#136]
1c: d53bd054 mrs x20, tpidr_el0
20: d11bc294 sub x20, x20, #0x6f0
24: f9400401 ldr x1, [x0,#8]
28: eb14003f cmp x1, x20
2c: 54000140 b.eq 54 <_IO_getc+0x54>
This already has a path that bypasses the locking completely (some files don't need
locks), so this could be merged with the new single-threaded check.
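A minimal sketch of what such a merged check could look like, assuming both
conditions can be kept as bits in one flags word. The struct, the flag names and
their values are hypothetical and are not glibc's definitions:

```c
/* Hypothetical flag bits, for illustration only (not glibc's values). */
#define STREAM_NO_LOCK        0x1u  /* this stream never needs locking */
#define STREAM_SINGLE_THREAD  0x2u  /* process is known to be single-threaded */
#define STREAM_SKIP_LOCK      (STREAM_NO_LOCK | STREAM_SINGLE_THREAD)

/* Hypothetical stand-in for the stream object. */
struct stream {
    unsigned flags;
};

/* One load and one test cover both "file needs no lock" and
   "single-threaded", instead of two separate checks. */
static int stream_needs_lock(const struct stream *s)
{
    return (s->flags & STREAM_SKIP_LOCK) == 0;
}
```

With this layout the existing bypass and the new single-threaded case share one
branch, so the common path stays a load plus a test-and-branch.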
It also checks for a recursive lock first; however, assuming recursion is very rare,
trying the lock first would be faster. Overall, a lot of the code bloat is due to having
to deal with possible recursion (and it seems these paths are not marked as unlikely).
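A rough sketch of the "try the lock first" ordering, using C11 atomics. This is
not glibc's lock implementation: the struct, the function names and the spin loop
are assumptions for illustration (a real lock would block, e.g. on a futex), and
__builtin_expect is a GCC/Clang extension:

```c
#include <stdatomic.h>
#include <stdint.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical recursive lock. */
struct rec_lock {
    atomic_uintptr_t owner;  /* 0 when unlocked, else the owner's thread id */
    unsigned count;          /* recursion depth, only touched by the owner */
};

/* self must be a nonzero id that is unique to the calling thread. */
static void rec_lock_acquire(struct rec_lock *l, uintptr_t self)
{
    uintptr_t expected = 0;

    /* Fast path: try to take the lock before looking at the owner. */
    if (likely(atomic_compare_exchange_strong_explicit(
            &l->owner, &expected, self,
            memory_order_acquire, memory_order_relaxed))) {
        l->count = 1;
        return;
    }

    /* The failed exchange stored the current owner in expected.
       Recursion is assumed to be rare, so it sits on the slow path. */
    if (unlikely(expected == self)) {
        l->count++;
        return;
    }

    /* Contended: spin until the lock is free (sketch only). */
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->owner, &expected, self,
                 memory_order_acquire, memory_order_relaxed));
    l->count = 1;
}

static void rec_lock_release(struct rec_lock *l)
{
    if (--l->count == 0)
        atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

In the uncontended, non-recursive case this does a single compare-and-swap and
never loads or compares the owner separately; the owner comparison only happens
after the exchange has failed.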
All this would be trivial to improve if only the locking code were written in a maintainable
form - the assembly code is much easier to understand than the source code...
Wilco