faster memset
Eric Blake
ebb9@byu.net
Sun Jun 1 19:58:00 GMT 2008
Jeff Johnston <jjohnstn@redhat.com> writes:
>
> Patch checked in. Thanks.
>
> * libc/machine/i386/memset.S (memset): [!__OPTIMIZE_SIZE__]:
> Pre-align pointer so unaligned stores aren't penalized. Prefer
> 8-byte over 4-byte alignment. Reduce register pressure.
I'm checking in this followup as obvious. Without this patch, x86 memset(p, 0xabcdef80, 16) fills memory with the pattern 0xffff8080 rather than 0x80808080, because cbw sign-extends the low byte rather than zero-extending it. I had this fix in my local tree shortly after my first email, but Jeff beat me to the original commit.
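For illustration only (not part of the patch): ISO C requires memset to convert its value argument to unsigned char before storing, so only the low byte 0x80 may appear in the output no matter what the high bits of the int argument are. A minimal standalone check of that contract, with a made-up 16-byte buffer mirroring the call above, could look like this:

#include <assert.h>
#include <string.h>

int main (void)
{
  unsigned char buf[16];
  int i;

  /* The value argument is converted to unsigned char, so every byte
     written must be 0x80 even though the high bits are nonzero.  */
  memset (buf, 0xabcdef80, sizeof buf);

  for (i = 0; i < 16; i++)
    assert (buf[i] == 0x80);
  return 0;
}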
2008-05-28  Eric Blake  <ebb9@byu.net>

	Fix bug in previous patch.
	* libc/machine/i386/memset.S (memset): Mask second arg correctly.
Index: libc/machine/i386/memset.S
===================================================================
RCS file: /cvs/src/src/newlib/libc/machine/i386/memset.S,v
retrieving revision 1.4
diff -u -p -r1.4 memset.S
--- libc/machine/i386/memset.S 26 May 2008 23:23:15 -0000 1.4
+++ libc/machine/i386/memset.S 28 May 2008 13:58:20 -0000
@@ -19,7 +19,7 @@ SYM (memset):
 	movl esp,ebp
 	pushl edi
 	movl 8(ebp),edi
-	movl 12(ebp),eax
+	movzbl 12(ebp),eax
 	movl 16(ebp),ecx
 	cld
@@ -27,7 +27,6 @@ SYM (memset):
 /* Less than 16 bytes won't benefit from the 'rep stosl' loop. */
 	cmpl $16,ecx
 	jbe .L19
-	cbw
 	testl $7,edi
 	je .L10
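As a rough C model of what the one-line change affects (an illustration under assumed details, not a transcription of memset.S): before the rep stosl loop the routine replicates the byte across a 32-bit fill word, and that only produces the right pattern if the byte was zero-extended (movzbl) rather than sign-extended (cbw) first. The replication sequence below is an assumption for illustration:

#include <stdint.h>
#include <stdio.h>

/* Rough model of building the 32-bit fill word for 'rep stosl'.
   The replication steps are assumed, not copied from memset.S.  */
static uint32_t
fill_word (int c, int zero_extend)
{
  uint32_t v = zero_extend
    ? (uint8_t) c               /* movzbl: keep only the low byte, 0x00000080 */
    : (uint32_t) (int8_t) c;    /* sign-extended low byte, 0xffffff80 */

  v |= v << 8;                  /* replicate into both bytes of the low half */
  v |= v << 16;                 /* replicate into the high half */
  return v;
}

int main (void)
{
  printf ("zero-extended: 0x%08x\n", (unsigned) fill_word (0xabcdef80, 1)); /* 0x80808080 */
  printf ("sign-extended: 0x%08x\n", (unsigned) fill_word (0xabcdef80, 0)); /* high bits leak in */
  return 0;
}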