Re: faster memset

Jeff Johnston <jjohnstn <at>> writes:

> Patch checked in.  Thanks.

> 	* libc/machine/i386/memset.S (memset): [!__OPTIMIZE_SIZE__]:
> 	Pre-align pointer so unaligned stores aren't penalized.  Prefer
> 	8-byte over 4-byte alignment.  Reduce register pressure.

I'm checking in this followup as obvious.  Without this patch, x86
memset(p,0xabcdef80,16) fills memory with 0xffff8080 rather than 0x80808080,
because cbw sign-extends rather than zero-extends.  I had it in my local tree
shortly after my first email, but Jeff beat me to the original commit.
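
For anyone following along, the difference between the two extensions can be
modelled in C.  This is only a rough sketch: cbw_model, movzbl_model and
memset_uses_low_byte are hypothetical helper names, not part of newlib, and
they mimic the instruction semantics only, not the actual memset.S code path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling 'cbw': sign-extend the low byte (AL)
   into 16 bits (AX).  Bit 7 of the byte is copied into bits 8-15.  */
static uint16_t cbw_model(uint32_t eax)
{
    uint8_t al = (uint8_t)(eax & 0xffu);
    return (uint16_t)((al & 0x80u) ? (0xff00u | al) : al);
}

/* Hypothetical helper modelling 'movzbl': zero-extend the low byte
   into 32 bits, discarding everything above bit 7.  */
static uint32_t movzbl_model(uint32_t arg)
{
    return arg & 0xffu;
}

/* Returns 1 if memset fills all n bytes (n <= 64) with the value of c
   converted to unsigned char, which is what the C standard requires.  */
static int memset_uses_low_byte(int c, size_t n)
{
    unsigned char buf[64];
    size_t i;

    assert(n <= sizeof buf);
    memset(buf, c, n);
    for (i = 0; i < n; i++)
        if (buf[i] != (unsigned char)c)
            return 0;
    return 1;
}
```

With a fill value whose low byte has bit 7 set, cbw_model leaves 0xff in the
upper half of the 16-bit result, while movzbl_model yields a clean 0x80 that
is safe to replicate across the word.  memset_uses_low_byte is the sort of
check that would catch the bug when run against the libc under test.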

2008-05-28  Eric Blake  <>

	Fix bug in previous patch.
	* libc/machine/i386/memset.S (memset): Mask second arg correctly.

Index: libc/machine/i386/memset.S
RCS file: /cvs/src/src/newlib/libc/machine/i386/memset.S,v
retrieving revision 1.4
diff -u -p -r1.4 memset.S
--- libc/machine/i386/memset.S	26 May 2008 23:23:15 -0000	1.4
+++ libc/machine/i386/memset.S	28 May 2008 13:58:20 -0000
@@ -19,7 +19,7 @@ SYM (memset):
 	movl esp,ebp
 	pushl edi
 	movl 8(ebp),edi
-	movl 12(ebp),eax
+	movzbl 12(ebp),eax
 	movl 16(ebp),ecx
@@ -27,7 +27,6 @@ SYM (memset):
 /* Less than 16 bytes won't benefit from the 'rep stosl' loop.  */
 	cmpl $16,ecx
 	jbe .L19
-	cbw
 	testl $7,edi
 	je .L10

