PATCH: Optimize memcmp for ia32

H. J. Lu hjl@lucon.org
Tue Feb 10 17:18:00 GMT 2004


On Tue, Feb 10, 2004 at 03:48:19PM +0100, Jakub Jelinek wrote:
> On Wed, Feb 04, 2004 at 04:11:26PM -0800, H. J. Lu wrote:
> > This patch optimizes memcmp for ia32. I got an average speedup of around
> > 400%.
> 
> If nothing else, you should certainly handle PIC vs. !PIC differently
> (for !PIC you don't need to call thunk etc.).

I can change it.

> Also, why do you need to use %ebx register when for example %eax is always
> available?

I will take a look.

> Why do you need 4 separate L(Nbytes) sequences when the only difference
> between them is in the last few instructions?  The bigger the routine is, the more
> other instructions will be kicked out of the caches (especially for a
> routine which is not the topmost in the benchmarks).
> I'd say avoiding the table_32bytes table altogether, using just one of the
> 4 sequences (with adjusted start) and computing the jump destination in
> registers shouldn't slow things down.

The adjustment may cause a slowdown. With the jump table, we don't
need to adjust anything at all for memory blocks smaller than 32 bytes.
That is where the speedup comes from.
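
To illustrate the idea in C rather than ia32 assembly, here is a rough
sketch, not the code from the patch. It assumes the pointers are biased
to the end of the block so that a count below 32 can select an entry
point directly, with no further adjustment. The names sketch_memcmp,
tail_cmp and STEP are made up for this example, and a compiler is free
to lower the switch as a jump table or in some other way.

```c
#include <assert.h>
#include <stddef.h>

/* One unrolled comparison step. Each case falls through to the next,
   so entering at case n compares exactly n bytes. */
#define STEP(k)                                 \
    case k:                                     \
        if (a[-(k)] != b[-(k)])                 \
            return a[-(k)] - b[-(k)];

/* Hypothetical helper: a and b point one past the end of each block,
   n is the number of bytes left to compare and must be below 32. */
static int tail_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n < 32);
    switch (n) {
    STEP(31) STEP(30) STEP(29) STEP(28) STEP(27) STEP(26) STEP(25) STEP(24)
    STEP(23) STEP(22) STEP(21) STEP(20) STEP(19) STEP(18) STEP(17) STEP(16)
    STEP(15) STEP(14) STEP(13) STEP(12) STEP(11) STEP(10) STEP(9)  STEP(8)
    STEP(7)  STEP(6)  STEP(5)  STEP(4)  STEP(3)  STEP(2)  STEP(1)
    default:
        return 0;   /* n == 0, or every byte matched */
    }
}

/* Hypothetical wrapper with memcmp-like semantics. */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    /* Simplification for the sketch: a plain byte loop runs until
       fewer than 32 bytes remain. */
    while (n >= 32) {
        if (*p != *q)
            return *p - *q;
        p++;
        q++;
        n--;
    }
    /* Bias both pointers to the end; n alone picks the entry point. */
    return tail_cmp(p + n, q + n, n);
}
```

Calling sketch_memcmp(x, y, 5) enters the sequence at case 5 and
compares x[0] through x[4] in order, without touching a loop counter.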

> And if you really need the table, shouldn't it go into .rodata and not
> .text?

I will do that.

BTW, I will be out of office until Feb. 23.


H.J.


