memcpy performance (fwd)
Tue Dec 9 10:23:00 GMT 1997
I thought I would pass this on. Does the new version of memcpy do much
better than this?
---------- Forwarded message ----------
Date: Tue, 9 Dec 97 12:03:28 -0600
From: Eric Norum <eric@skatter.USask.Ca>
Subject: Re: memcpy performance
It's even worse than just a byte-by-byte copy!
On the 971024 snapshot (gen68360 BSP) a call to memcpy produces:
1) A call to bcopy
2) The bcopy routine links a stack frame and calls memmove
3) The memmove routine:
   a) links a stack frame
   b) checks for overlap
   c) does a byte-by-byte copy
5 instructions/byte on a CPU32 processor!
There's a heck of a lot of unnecessary code here:
   Two extra function calls
   Two extra stack frames
   Extra code to check for overlap
   A very inefficient loop
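
For readers who want to see the shape of the problem, the call chain described above can be sketched in C roughly as follows. This is an illustration only, not the actual newlib source: the sketch_ names are hypothetical (chosen so they do not clash with the C library), and the overlap test is just one common way to write it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; NOT the newlib sources, only the shape of the chain. */

/* Step 3: memmove checks for overlap, then copies one byte at a time. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* True when dst lies inside [src, src + n): copy backwards so the
       source bytes are read before they are overwritten. */
    if ((uintptr_t)d - (uintptr_t)s < n) {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    } else {
        while (n--)
            *d++ = *s++;
    }
    return dst;
}

/* Step 2: bcopy takes (src, dst, n) and simply forwards to memmove. */
static void sketch_bcopy(const void *src, void *dst, size_t n)
{
    sketch_memmove(dst, src, n);
}

/* Step 1: memcpy is only a wrapper around bcopy. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    sketch_bcopy(src, dst, n);
    return dst;
}
```

Every copy pays for two forwarding calls and an overlap test before the first byte moves, even though memcpy callers already promise that the buffers do not overlap.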
Processor-independent improvements required:
1) There should be an explicit memcpy routine.
2) The library should be compiled with aggressive optimization.
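
A minimal sketch of what point 1 could look like, assuming a plain C implementation: one function, no forwarding calls, and no overlap check (memcpy is not required to handle overlapping buffers). The name sketch_memcpy_direct is hypothetical, and a production routine would likely also copy in larger units when alignment allows.

```c
#include <stddef.h>

/* Hypothetical stand-alone memcpy: copies n bytes from src to dst.
   The caller must guarantee the buffers do not overlap. */
static void *sketch_memcpy_direct(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

How tight the resulting loop is still depends on the optimization level the library is built with, which is what point 2 is about.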
Processor-dependent improvements that would be nice:
M68k - The loop in memmove should be done in such a way that
processors like the CPU32 can go into loop mode.
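
As a rough illustration of the loop shape this asks for: CPU32 loop mode applies to a single one-word instruction followed by a DBcc branch back to it, for example a post-increment byte move followed by dbra. A C loop written as a count-down do/while gives the compiler a chance to emit that pair, though whether it actually does depends on the compiler and flags, so treat this as a sketch. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical count-down copy loop. The body is one post-increment
   byte move and the loop test is a decrement-and-branch, which is the
   pattern an m68k compiler may turn into a move.b/dbra pair. */
static void *sketch_copy_countdown(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n == 0)
        return dst;
    do {
        *d++ = *s++;
    } while (--n != 0);
    return dst;
}
```

Note that dbra counts in a 16-bit register, so a hand-written assembly version would need an outer loop for large copies.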
Now all we need is a willing volunteer......
Eric Norum firstname.lastname@example.org
Saskatchewan Accelerator Laboratory Phone: (306) 966-6308
University of Saskatchewan FAX: (306) 966-6058