This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] faster strcmp by avoiding sse42.
- From: Richard Henderson <rth at twiddle dot net>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 08 Aug 2013 09:44:00 -1000
- Subject: Re: [RFC] faster strcmp by avoiding sse42.
- References: <20130806213033 dot GA5290 at domone dot kolej dot mff dot cuni dot cz> <20130807122803 dot GC5794 at domone dot kolej dot mff dot cuni dot cz>
On 08/07/2013 02:28 AM, Ondřej Bílka wrote:
> .L17:
> addq $64, %rdi
> addq $64, %rsi
> .L12:
> movdqu (%rsi), %xmm4
> pcmpeqb (%rdi), %xmm4
> pminub (%rdi), %xmm4
> movdqu 16(%rsi), %xmm3
> pcmpeqb 16(%rdi), %xmm3
> pminub 16(%rdi), %xmm3
> movdqu 32(%rsi), %xmm2
> pcmpeqb 32(%rdi), %xmm2
> pminub 32(%rdi), %xmm2
> movdqu 48(%rsi), %xmm0
> pcmpeqb 48(%rdi), %xmm0
> pminub 48(%rdi), %xmm0
> pminub %xmm4, %xmm0
> pminub %xmm3, %xmm0
> pminub %xmm2, %xmm0
> pcmpeqb %xmm6, %xmm0
> pmovmskb %xmm0, %eax
> testl %eax, %eax
> je .L17
> jmp .L15
Surely you can do better by dropping the movdqu from the loop
entirely, and instead re-read from memory on the cleanup path.
r~
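
For readers following the thread, here is a rough C sketch (SSE2 intrinsics) of what the quoted loop appears to compute for one 64-byte block, together with a cleanup routine that re-reads the block from memory instead of relying on values kept in registers. This is only an illustration under stated assumptions, not the code from the patch: the helper names block64_has_stop and block64_first_stop are hypothetical, %xmm6 is assumed to hold all zeros (the quoted fragment does not show its setup), and unaligned loads are used for both strings, whereas the quoted assembly reads (%rdi) through memory operands, which in non-VEX SSE requires that pointer to be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8, _mm_min_epu8, _mm_movemask_epi8 */
#include <stddef.h>

/* Hypothetical helper: returns nonzero if the 64-byte blocks at a and b
   contain a differing byte, or a NUL byte in a.  Mirrors the
   pcmpeqb/pminub chain of the quoted loop. */
static int block64_has_stop(const unsigned char *a, const unsigned char *b)
{
    const __m128i zero = _mm_setzero_si128();  /* assumed role of %xmm6 */
    __m128i acc = _mm_set1_epi8(-1);           /* all bytes 0xff */

    for (size_t off = 0; off < 64; off += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + off));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + off));
        /* 0xff where the bytes are equal, 0x00 where they differ. */
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        /* Unsigned min(eq, va) is zero exactly where the bytes differ
           or where a holds a NUL; fold it into the accumulator. */
        acc = _mm_min_epu8(acc, _mm_min_epu8(eq, va));
    }
    /* Any zero byte in acc means the loop has to stop in this block. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(acc, zero)) != 0;
}

/* Hypothetical cleanup path: re-read the block from memory and return
   the index of the first differing or NUL byte, or 64 if there is none. */
static size_t block64_first_stop(const unsigned char *a, const unsigned char *b)
{
    for (size_t i = 0; i < 64; i++)
        if (a[i] != b[i] || a[i] == 0)
            return i;
    return 64;
}
```

The point of the split is that the hot loop only needs a yes/no answer per block, so nothing it computes has to stay live afterwards. The exact position is recomputed once, on the rarely taken exit path, at the cost of a second read of a block that is already in cache.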