PATCH: Optimize memcmp for ia32

Jakub Jelinek jakub@redhat.com
Tue Feb 10 16:56:00 GMT 2004


On Wed, Feb 04, 2004 at 04:11:26PM -0800, H. J. Lu wrote:
> This patch optimizes memcmp for ia32. I got an average speedup of around
> 400%.

If nothing else, you should certainly handle PIC and !PIC differently
(for !PIC you don't need to call the thunk, etc.).
Also, why do you need to use the %ebx register, which is callee-saved and
has to be saved and restored, when, for example, %eax is always available?
Why do you need 4 separate L(Nbytes) sequences when the only difference
between them is in the last few instructions? The bigger the routine is, the
more other code gets evicted from the caches (especially for a routine which
is not the topmost in the benchmarks).
I'd say that avoiding the table_32bytes table altogether, by using just one
of the 4 sequences (with an adjusted start) and computing the jump
destination in registers, shouldn't slow things down.
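As a rough C analogue of the "one sequence, adjusted start" idea, here is a
minimal sketch assuming a plain byte-wise comparison loop unrolled 4 times.
The names memcmp_duff and CMP_BYTE are hypothetical and not taken from the
patch. Instead of keeping a separate tail sequence for each remainder, the
first pass enters the single unrolled body part-way through (Duff's device),
so only one copy of the code exists. A C compiler may itself lower the switch
into a jump table, so this only illustrates the code structure, not the
register-computed jump that would be written by hand in the assembly version.

```c
#include <stddef.h>

/* Compare one byte pair and advance; on mismatch return the difference of
   the bytes taken as unsigned char, as memcmp does. */
#define CMP_BYTE()                                  \
    do {                                            \
        if (*p != *q)                               \
            return (int)*p - (int)*q;               \
        p++;                                        \
        q++;                                        \
    } while (0)

/* Hypothetical sketch: one 4x-unrolled comparison body.  The remainder
   n % 4 selects where the first pass enters that body, so no separate
   1-, 2- or 3-byte tail sequences are needed. */
int memcmp_duff(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    if (n == 0)
        return 0;

    size_t rounds = (n + 3) / 4;    /* passes through the unrolled body */

    switch (n % 4) {
    case 0: do { CMP_BYTE();
    case 3:      CMP_BYTE();
    case 2:      CMP_BYTE();
    case 1:      CMP_BYTE();
            } while (--rounds > 0);
    }
    return 0;
}
```

For example, with n = 7 the first pass enters at case 3 and compares 3
bytes, and the second pass compares the remaining 4.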
And if you really need the table, shouldn't it go into .rodata rather than
.text?

	Jakub

