This is the mail archive of the newlib@sourceware.cygnus.com mailing list for the newlib project.



Re: memcpy performance (fwd)



Watch out for those performance-minded RTEMS users. You will hear about a
wasted cycle for sure. :)

Here is Eric's feedback on which toolset and arguments he was using.

FYI, he ported the KA9Q and Linux TCP/IP stacks to RTEMS, contributed the
FP trap code required for the 68040, implemented the termios console
support, and wrote the 68360 BSP. He is pretty swift. :)

--joel


---------- Forwarded message ----------
Date: Tue,  9 Dec 97 16:09:55 -0600
From: Eric Norum <eric@skatter.usask.ca>
To: Joel Sherrill <joel@OARcorp.com>
Subject: Re: memcpy performance

You wrote:
> What args did you give to gcc for the case you reported on the
> list? One of the new Cygnus newlib maintainers wants to know. And
> before they ask, what version of gcc are you using?
>
> I am getting pretty good responses this week from the Cygnus side of
> the world.

m68k-rtems-gcc --version
egcs-2.90.04 970901 (gcc2-970802 experimental)


Here's how memcpy.c gets compiled.

/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/gcc/xgcc  
-B/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/gcc/  
-idirafter  
/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/m68k-rtems/newlib/targ-include  
-idirafter  
/shareNeXT/OS4.2/RTEMS/src/tools-970904/src/newlib/libc/include  
-nostdinc -O2 -g -pipe  -m68332  -O2 -DHAVE_GETTIMEOFDAY  
-DMALLOC_PROVIDED -DEXIT_PROVIDED -DMISSING_SYSCALL_NAMES  
-DSIGNAL_PROVIDED -DREENTRANT_SYSCALLS_PROVIDED -fno-builtin  
-I/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/m68k-rtems/newlib/./targ-include  
-I/shareNeXT/OS4.2/RTEMS/src/tools-970904/src/newlib/./libc/include  
-c ../../../../../../src/newlib/libc/string/memcpy.c

This produces a copy loop that takes 5 instructions per byte:
0xe2ea <memcpy+22>:     moveb %a1@+,%a0@+
0xe2ec <memcpy+24>:     movel %d1,%d0
0xe2ee <memcpy+26>:     subql #1,%d1
0xe2f0 <memcpy+28>:     tstl %d0
0xe2f2 <memcpy+30>:     bnes 0xe2ea <memcpy+22>

Changing the memcpy source to:
        if (len) {
                do {
                        *ap++ = *bp++;
                } while (--len);
        }
improves the loop to:
.L9:
        move.b (%a0)+,(%a1)+
        subq.l #1,%d0
        jbne .L9
No loop mode, but certainly a lot faster!
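
For comparison, here is a minimal sketch of the two loop shapes side by side.
The original newlib source is not quoted in this message, so copy_postdec is
only an assumption: a post-decrement loop that is consistent with the
5-instruction listing above (copy the old count, decrement, test the old
value). copy_predec wraps the restructured loop from this message in a
complete function. Both function names are hypothetical.

```c
#include <stddef.h>

/* Assumed baseline shape (not taken from newlib): the loop test needs the
   value of len from before the decrement, so the compiler may keep a copy. */
void *copy_postdec(void *dst, const void *src, size_t len)
{
    char *ap = dst;
    const char *bp = src;

    while (len--)
        *ap++ = *bp++;
    return dst;
}

/* Restructured shape from the message: the zero-length case is checked
   once up front, and the loop tests the already-decremented count. */
void *copy_predec(void *dst, const void *src, size_t len)
{
    char *ap = dst;
    const char *bp = src;

    if (len) {
        do {
            *ap++ = *bp++;
        } while (--len);
    }
    return dst;
}
```

Both functions copy the same bytes and both leave the destination untouched
when len is zero; only the code the compiler generates for the loop differs.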

The `memcpy turns into bcopy which calls memmove' problem is
caused by the way the compiler was built.  The
-DTARGET_MEM_FUNCTIONS=1 flag should be used when building the
compiler (or set up when the compiler is configured).  Perhaps this
change could make it into the next tools distribution.
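
One way to check which routine a given compiler build calls for block copies
is to compile a small sketch like the one below with -S and read the
generated assembly. struct packet and packet_assign are hypothetical names
used only for illustration. A GCC built with TARGET_MEM_FUNCTIONS defined
should emit a call to memcpy for the assignment, while one built without it
may emit a call to bcopy; in either case the compiler may instead expand the
copy inline.

```c
#include <stddef.h>

/* Hypothetical record, large enough that the compiler is likely to treat
   the assignment below as a block move. */
struct packet {
    unsigned char payload[256];
};

/* Whole-structure assignment: the compiler chooses how to implement the
   copy, possibly by calling a library routine. */
void packet_assign(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}
```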

---
Eric Norum                                 eric@skatter.usask.ca
Saskatchewan Accelerator Laboratory        Phone: (306) 966-6308
University of Saskatchewan                 FAX:   (306) 966-6058
Saskatoon, Canada.