memcpy performance (fwd)

Joel Sherrill
Tue Dec 9 10:23:00 GMT 1997

I thought I would pass this on.  Does the new version of memcpy do much
better than this?

---------- Forwarded message ----------
Date: Tue,  9 Dec 97 12:03:28 -0600
From: Eric Norum <eric@skatter.USask.Ca>
Subject: Re: memcpy performance

It's even worse than just a byte-by-byte copy!

On the 971024 snapshot (gen68360 BSP) a call to memcpy produces:
	1) A call to bcopy
	2) The bcopy routine links a stack frame and calls memmove
	3) The memmove routine:
		a) links a stack frame
		b) checks for overlap
		c) does a byte-by-byte copy
		   5 instructions/byte on a CPU32 processor!
There's a heck of a lot of unnecessary code here:
	Two extra function calls
	Two extra stack frames
	Extra code to check for overlap
	A very inefficient loop
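The call chain above can be sketched in C roughly as follows. This is an illustration of the shape Eric describes, not newlib's actual source. The slow_* names are hypothetical, chosen so the sketch does not clash with the C library. Note that bcopy takes (src, dst, n), the reverse of memcpy's argument order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slow_* names; a sketch of the reported call chain. */

/* Step 3: memmove-style copy that checks for overlap, then copies byte by byte. */
static void *slow_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: a forward copy is safe. */
        while (n--)
            *d++ = *s++;
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination above source: copy backward so overlapping
         * bytes are read before they are overwritten. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}

/* Step 2: bcopy(src, dst, n) just forwards to memmove (extra call and frame). */
static void slow_bcopy(const void *src, void *dst, size_t n)
{
    slow_memmove(dst, src, n);
}

/* Step 1: memcpy forwards to bcopy (another extra call and frame). */
static void *slow_memcpy(void *dst, const void *src, size_t n)
{
    slow_bcopy(src, dst, n);
    return dst;
}
```

Every byte copied through slow_memcpy pays for two extra calls, an overlap test that memcpy's contract makes unnecessary, and a one-byte-per-iteration loop.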

Processor-independent improvements required:
	1) There should be an explicit memcpy routine.
	2) The library should be compiled with aggressive optimization.
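One possible shape for an explicit memcpy is sketched below: no call chain, no overlap handling, and 32-bit word copies when both pointers are 4-byte aligned, with a byte loop for the tail. The name explicit_memcpy is hypothetical, and the word_t typedef relies on the GCC/Clang may_alias attribute so the word accesses do not violate C aliasing rules. This is an assumption about how one might write it, not the routine newlib adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: word loads/stores through word_t may alias any type. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Hypothetical explicit memcpy: no overlap handling, word copies when aligned. */
void *explicit_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* memcpy's contract forbids overlap; checked only in debug builds. */
    assert(n == 0 || (uintptr_t)d + n <= (uintptr_t)s ||
           (uintptr_t)s + n <= (uintptr_t)d);

    /* Use word copies only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & (sizeof(word_t) - 1)) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        /* Four words per iteration to reduce loop overhead. */
        while (n >= 4 * sizeof(word_t)) {
            dw[0] = sw[0];
            dw[1] = sw[1];
            dw[2] = sw[2];
            dw[3] = sw[3];
            dw += 4;
            sw += 4;
            n -= 4 * sizeof(word_t);
        }
        while (n >= sizeof(word_t)) {
            *dw++ = *sw++;
            n -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Tail bytes, or the whole copy when the pointers are misaligned. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A fuller version would first copy leading bytes when both pointers share the same misalignment, so that more calls reach the word loop.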
Processor-dependent improvements that would be nice:	
M68k - The loop in memmove should be done in such a way that  
processors like the CPU32 can go into loop mode.
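As we understand the CPU32 reference manual, loop mode is entered when a DBcc loop body is a single one-word instruction, such as move.b (a0)+,(a1)+ followed by dbra d0. The C sketch below is shaped after that idiom: a one-statement body and a down-counting 16-bit counter that stops after reaching zero, as DBRA does. The name loopmode_copy is hypothetical, and whether a particular compiler actually emits the move/dbra pair depends on its version and flags. The assembly is shown only in a comment.

```c
#include <stddef.h>

/* Hypothetical helper shaped after the m68k DBcc idiom:
 *     1:  move.b (a0)+,(a1)+
 *         dbra   d0,1b
 * Whether a compiler emits exactly this pair is not guaranteed. */
static void *loopmode_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        /* DBRA counts only in the low 16 bits of its register,
         * so each inner loop copies at most 65536 bytes. */
        size_t chunk = n > 65536 ? 65536 : n;
        unsigned short k = (unsigned short)(chunk - 1); /* runs k + 1 times */

        n -= chunk;
        do {
            *d++ = *s++;          /* one-instruction loop body */
        } while (k-- != 0);       /* decrement and branch, like dbra */
    }
    return dst;
}
```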

Now all we need is a willing volunteer......

Eric Norum                       
Saskatchewan Accelerator Laboratory        Phone: (306) 966-6308
University of Saskatchewan                 FAX:   (306) 966-6058
Saskatoon, Canada.
