This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: Optimized strcmp for POWER8/PPC64
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: Ondrej Bilka <ondra at kam dot mff dot cuni dot cz>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Wed, 07 Jan 2015 15:10:25 -0200
- Subject: Re: [PATCH] powerpc: Optimized strcmp for POWER8/PPC64
- Authentication-results: sourceware.org; auth=none
- References: <54AD48A6 dot 9000105 at linux dot vnet dot ibm dot com> <20150107163026 dot GA24290 at kam dot mff dot cuni dot cz>
On 07-01-2015 14:30, Ondrej Bilka wrote:
>> + /* For short string up to 16 bytes, load both s1 and s2 using
>> + unaligned dwords and compare. */
>> + ld r8,0(r3)
>> + ld r10,0(r4)
>> + li r9,0
>> + cmpb r7,r8,r9
>> + cmpdi cr7,r7,0
>> + mr r9,r7
>> + bne cr7,L(null_found)
>> + cmpld cr7,r8,r10
>> + bne cr7,L(different)
>> + ld r8,8(r3)
>> + ld r10,8(r4)
>> + cmpb r9,r8,r7
>> + cmpdi cr7,r9,r0
>> + bne cr7,L(null_found)
>> + cmpld cr7,r8,r10
>> + bne cr7,L(different)
>> + addi r7,r3,16
>> + addi r4,r4,16
> It makes no sense to do two separate checks which create pretty unpredictable branches.
>
> Just or these two check and look at first nonzero byte. Either they differ at that offset or both are zero and easily get result from that.
>
Which two checks are you referring exactly? The first two:
+ ld r8,0(r3)
+ ld r10,0(r4)
+ li r9,0
+ cmpb r7,r8,r9
+ cmpdi cr7,r7,0
+ mr r9,r7
+ bne cr7,L(null_found)
+ cmpld cr7,r8,r10
+ bne cr7,L(different)
First cmpb instruction is not a branch instruction (and thus has no affect on
branch prediction). Also, in this code is it has to check for NULL first
before start to check different bytes at second dword. For instance, for
strings:
source1:
0x3fffb7fbffe8: 0x01 0x18 0x2f 0x46 0x5d 0x74 0x0c 0x23
0x3fffb7fbfff0: 0x00 0x17 0xa5 0xa5 0xa5 0xa5 0xa5 0xa5
source2:
0x3fffb7f7ffe8: 0x01 0x18 0x2f 0x46 0x5d 0x74 0x0c 0x23
0x3fffb7f7fff0: 0x00 0x18 0x5a 0x5a 0x5a 0x5a 0x5a 0x5a
If it omits the first '\0' check the second check (0x3fffb7fbfff0 vs
0x3fffb7f7fff0) will indicate different string and handled by L(different)
label. And the result would be -1 (0x17-0x18), instead of intended 0.
I could add another '\0' on L(different) label, but it will be just move
the branch to elsewhere, not eliminating it.