This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 1/25] Remove nested functions: crypt/md5-crypt.c
- From: Konstantin Serebryany <konstantin dot s dot serebryany at gmail dot com>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: OndÅej BÃlka <neleai at seznam dot cz>, "Joseph S. Myers" <joseph at codesourcery dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 20 May 2014 19:45:39 +0400
- Subject: Re: [PATCH 1/25] Remove nested functions: crypt/md5-crypt.c
- Authentication-results: sourceware.org; auth=none
- References: <CAGQ9bdzqT1EyXYMwACrHpPU=vPjM_b72LJRjb7BW_OzJRXG3bw at mail dot gmail dot com> <20140520131314 dot GB14500 at spoyarek dot pnq dot redhat dot com> <CAGQ9bdw=-pzbi00gn5t_W8pXBjU0edFvLGMNUc+o=g9og=f9+Q at mail dot gmail dot com> <20140520144605 dot GE14500 at spoyarek dot pnq dot redhat dot com> <Pine dot LNX dot 4 dot 64 dot 1405201453390 dot 29858 at digraph dot polyomino dot org dot uk> <20140520152031 dot GA13093 at domone dot podge> <20140520152722 dot GG14500 at spoyarek dot pnq dot redhat dot com>
On Tue, May 20, 2014 at 7:27 PM, Siddhesh Poyarekar <firstname.lastname@example.org> wrote:
> On Tue, May 20, 2014 at 05:20:31PM +0200, OndÅej BÃlka wrote:
>> Yes I wanted to write that benchmarking this change is probably
>> pointless as gcc should optimize them in same way (if there is
>> consistently difference in one way we could report that as gcc regression.)
>> Also perf data show that its a cold function so you should improve
>> readability not performance (measuring difference is possible but you
>> need to run that for few days and wait until standard deviation drops
>> sufficiently to get statistically significant result).
>> About only way this could make difference is if when it gets inlined in
>> one way and does not in other.
>> That is out of scope of microbenchmarks, instruction cache misses easily
>> could make a function slower in wild but you would measure that inlined
>> function is faster when you would call it in tight loop.
> Fair enough. I'll push the patch if Konstantin shows that the
> generated code is not drastically different.
Just a data point; I am not sure if it proves anything.
If I add __attribute__ ((noinline)) to the definition of b64_from_24bit,
the two binaries become more similar.
The non-nested variant is a bit longer as it needs 6 instructions to
pass the parameters:
15a9: 0f b6 8d 87 fe ff ff movzbl -0x179(%rbp),%ecx
15b0: 0f b6 95 81 fe ff ff movzbl -0x17f(%rbp),%edx
15b7: 41 b9 04 00 00 00 mov $0x4,%r9d
15bd: 44 0f b6 85 8d fe ff movzbl -0x173(%rbp),%r8d
15c5: 48 89 de mov %rbx,%rsi
15c8: 4c 89 f7 mov %r14,%rdi
15cb: e8 40 fb ff ff callq 1110 <b64_from_24bit>
The nested variant needs 5:
15bf: 0f b6 95 7e fe ff ff movzbl -0x182(%rbp),%edx
15c6: 0f b6 b5 78 fe ff ff movzbl -0x188(%rbp),%esi
15cd: 49 89 da mov %rbx,%r10
15d0: 0f b6 bd 72 fe ff ff movzbl -0x18e(%rbp),%edi
15d7: b9 04 00 00 00 mov $0x4,%ecx
15dc: e8 2f fb ff ff callq 1110
With inlining (that happens for me only in the non-nested variant)
this is irrelevant.