This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/25131] memcpy performance problem with ARM 32 A9be due to high cache misses


https://sourceware.org/bugzilla/show_bug.cgi?id=25131

--- Comment #16 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to helugang from comment #15)
> (In reply to Adhemerval Zanella from comment #13)
> > I forgot to mention that if the compiler also targets armv7 with -mfpu=neon
> > by default, the ifunc selector won't be used and instead
> > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > implementation.
> 
> Hi Adhemerval,
> With --disable-multi-arch configured, the default memcpy() in
> sysdeps/arm/memcpy.S is used, and its performance is worse than that of
> __memcpy_vfp.
> 
> (gdb) bt
> #0  memcpy () at ../sysdeps/arm/memcpy.S:64
> #1  0xb6e44e18 in __GI___mempcpy (dest=dest@entry=0xbefffb39,
> 
> memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
> memcpy_1k_libmicro
>     380.95153     449.6348
>     380.96255     449.6283
>     380.93659     449.6548
>     380.95293     449.5687
>     380.96071     449.6273
> memcpy_800k_libmicro
>     279131.15619  352820.5405
>     286185.56789  353633.2598
>     294103.06891  376391.7133
>     285128.08039  389628.5461
>     287601.79983  350731.9007
> memcpy_1m_libmicro
>     460492.84609  527388.05897
>     450131.39191  550715.5123
>     443782.37169  536471.34533
>     472516.11484  532723.39375
>     494700.10173  537163.65616
> 
> About __memcpy_neon: it doesn't work after -mfpu=neon is set. Could you help
> check whether rebuilding glibc is not enough to use __memcpy_neon or
> __memcpy_arm?

To better understand this microbenchmark:

> memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
> memcpy_1k_libmicro
>     380.95153     449.6348

Is the first column the result for memcpy_vfp and the second the result for
sysdeps/arm/memcpy.S? Also, are lower values better or worse?
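To put a number on the gap between the two columns, the per-benchmark means can be compared. The sketch below only does that arithmetic on the figures quoted above. It assumes the first column is memcpy_vfp and the second is sysdeps/arm/memcpy.S (which is exactly what is being asked), and it says nothing about whether higher or lower is better. The array and helper names are illustrative, not part of libMicro or glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Samples copied from the quoted table; the column order is an assumption. */
static const double k1_first[]    = {380.95153, 380.96255, 380.93659, 380.95293, 380.96071};
static const double k1_second[]   = {449.6348, 449.6283, 449.6548, 449.5687, 449.6273};
static const double k800_first[]  = {279131.15619, 286185.56789, 294103.06891, 285128.08039, 287601.79983};
static const double k800_second[] = {352820.5405, 353633.2598, 376391.7133, 389628.5461, 350731.9007};
static const double m1_first[]    = {460492.84609, 450131.39191, 443782.37169, 472516.11484, 494700.10173};
static const double m1_second[]   = {527388.05897, 550715.5123, 536471.34533, 532723.39375, 537163.65616};

/* Every benchmark in the table has the same number of runs. */
#define NSAMPLES (sizeof k1_first / sizeof k1_first[0])

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* mean(second) / mean(first): a value above 1.0 means the second column
   reports larger numbers; whether that is worse depends on the metric. */
static double column_ratio(const double *first, const double *second, size_t n)
{
    return mean(second, n) / mean(first, n);
}

/* Print one benchmark name with its column ratio. */
static void print_ratio(const char *name, const double *first,
                        const double *second, size_t n)
{
    printf("%-22s %.3f\n", name, column_ratio(first, second, n));
}
```

Calling column_ratio() on each pair gives roughly 1.18 for memcpy_1k, 1.27 for memcpy_800k and 1.16 for memcpy_1m, so the second column is consistently about 16-27% higher; from a main() one could call, e.g., print_ratio("memcpy_1k_libmicro", k1_first, k1_second, NSAMPLES).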

The memcpy implementation from glibc 2.11 is essentially the same as
sysdeps/arm/memcpy.S on master, apart from the following commits:

  55668624cf2 - arm: Use push/pop mnemonics: no code change expected.
  01b32e7361d - Add CFI statements to ARM's assembly code: no code change expected.
  81cb7a0b2b6 - Remove sfi_* annotations from ARM assembly files: no code change expected.
  9e1d4ac924d - ARM: Support avoiding pc as destination register: uses BX if and
                only if the compiler defines __ARM_ARCH_4T__ and __THUMB_INTERWORK__.
  bb48a26acf9 - ARM_BX_ALIGN_LOG2: internal alignment changes.
  298e5d56dca - ARM: Fix memcpy & memmove for [ARM_ALWAYS_BX]: active only if
                ARM_ALWAYS_BX is set.

ARM_ALWAYS_BX is not set internally by glibc (in fact, I am not sure how it gets
activated), so the only change I can think of that might affect performance is
bb48a26acf9.

So are you saying that both memcpy_vfp and sysdeps/arm/memcpy.S perform worse
than the memcpy from glibc 2.11?

Could you also check the results for sysdeps/arm/armv7/multiarch/memcpy_arm.S?
You might need to hack glibc to enable it in your build, by changing
sysdeps/arm/armv7/multiarch/ifunc-memcpy.h so that it always returns
OPTIMIZE (arm).
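The selection being discussed (a runtime ifunc choice among the NEON, VFP and plain ARM variants, bypassed entirely when the compiler already targets NEON) can be modelled roughly as below. This is not glibc's code: the enum, select_memcpy(), the force_arm flag standing in for the suggested ifunc-memcpy.h hack, and the SKETCH_HWCAP_* bit values are all hypothetical, and the NEON > VFP > ARM priority is an assumption to check against the real ifunc-memcpy.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bits for this model only; they are meant to play
   the role of the kernel's VFP and NEON hwcap flags, not to be copied from them. */
#define SKETCH_HWCAP_VFP  (1UL << 6)
#define SKETCH_HWCAP_NEON (1UL << 12)

enum memcpy_variant { VARIANT_ARM, VARIANT_VFP, VARIANT_NEON };

/* Model of the memcpy choice. force_arm stands in for the hack of making
   ifunc-memcpy.h always return OPTIMIZE (arm). */
static enum memcpy_variant select_memcpy(unsigned long hwcap, bool force_arm)
{
#if defined(__arm__) && defined(__ARM_NEON__)
    /* 32-bit build already targeting NEON (e.g. -mfpu=neon): no runtime
       selection, memcpy_neon.S is the default, so the hack has no effect. */
    (void)hwcap;
    (void)force_arm;
    return VARIANT_NEON;
#else
    if (force_arm)
        return VARIANT_ARM;
    if (hwcap & SKETCH_HWCAP_NEON)
        return VARIANT_NEON;
    if (hwcap & SKETCH_HWCAP_VFP)
        return VARIANT_VFP;
    return VARIANT_ARM;
#endif
}

/* Map a variant to the source file discussed in this bug. */
static const char *variant_name(enum memcpy_variant v)
{
    switch (v) {
    case VARIANT_NEON: return "memcpy_neon.S";
    case VARIANT_VFP:  return "memcpy_vfp.S";
    default:
        assert(v == VARIANT_ARM);
        return "memcpy_arm.S";
    }
}
```

One consequence the model makes visible: in a build where the compiler targets NEON by default, forcing OPTIMIZE (arm) in the selector would not change anything, because the selector is never consulted. The real resolver works from the hwcap bits reported by the kernel and returns a function pointer rather than an enum; user code can inspect the same bits with getauxval(AT_HWCAP).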

