This is the mail archive of the
mailing list for the glibc project.
Re: PATCH: Optimize memcmp for ia32
- From: Jakub Jelinek <jakub at redhat dot com>
- To: "H. J. Lu" <hjl at lucon dot org>
- Cc: GNU C Library <libc-alpha at sources dot redhat dot com>
- Date: Tue, 10 Feb 2004 15:48:19 +0100
- Subject: Re: PATCH: Optimize memcmp for ia32
- References: <20040205001126.GA24827@lucon.org>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Wed, Feb 04, 2004 at 04:11:26PM -0800, H. J. Lu wrote:
> This patch optimizes memcmp for ia32. I got average speedup by around
If nothing else, you should certainly handle PIC vs. !PIC differently
(for !PIC you don't need to call thunk etc.).
Also, why do you need to use the %ebx register when, for example, %eax is always
Why do you need 4 separate L(Nbytes) sequences when the only difference between
them is in the last few instructions? The bigger the routine is, the more
other instructions will be kicked out of the caches (especially for a
routine which is not the topmost in the benchmarks).
I'd say avoiding the table_32bytes table altogether, using just one of the
4 sequences (with adjusted start) and computing the jump destination in
registers shouldn't slow things down.
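
To illustrate the "one sequence with adjusted start" idea, here is a rough C
sketch. It is not taken from the patch: the names compare_upto8 and
sketch_memcmp are made up for this example, and it works a byte at a time
rather than on the 32-byte blocks the patch uses. The pointers are advanced
past the region first, so every step uses a fixed negative offset and a
shorter remainder simply enters the same sequence later. A C compiler may
still emit its own jump table for the switch; in hand-written assembly the
entry address could presumably be computed from the remaining count, assuming
every step encodes to the same length.

```c
#include <stddef.h>

/* One unrolled step: compare the byte k positions before the end.  */
#define STEP(k) \
  case k: if (a[-k] != b[-k]) return a[-k] - b[-k]; /* fall through */

/* Hypothetical helper: compare n (0..8) bytes with a single unrolled
   sequence.  The switch selects the entry point inside the sequence.  */
static int
compare_upto8 (const unsigned char *a, const unsigned char *b, size_t n)
{
  /* Point past the region so each step has a fixed negative offset.  */
  a += n;
  b += n;
  switch (n)
    {
    STEP (8)
    STEP (7)
    STEP (6)
    STEP (5)
    STEP (4)
    STEP (3)
    STEP (2)
    STEP (1)
    default:
      break;
    }
  return 0;
}

/* Hypothetical memcmp-like wrapper built on the single sequence.  */
static int
sketch_memcmp (const void *s1, const void *s2, size_t len)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;

  /* Full blocks enter the sequence at its first step.  */
  for (; len >= 8; len -= 8, a += 8, b += 8)
    {
      int r = compare_upto8 (a, b, 8);
      if (r != 0)
        return r;
    }
  /* The remainder (0..7 bytes) enters the same sequence further down.  */
  return compare_upto8 (a, b, len);
}
```

The point of the sketch is only that the tail handling shares its code with
the full-block case, so the routine stays small; whether this is as fast as
the table-driven version would have to be measured.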
And if you really need the table, shouldn't it go into .rodata and not