This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/25131] memcpy performance problem with ARM 32 A9be due to high cache-misses


https://sourceware.org/bugzilla/show_bug.cgi?id=25131

--- Comment #20 from helugang <helugang at huawei dot com> ---
(In reply to Adhemerval Zanella from comment #16)
> (In reply to helugang from comment #15)
> > (In reply to Adhemerval Zanella from comment #13)
> > > I forgot to mention that if the compiler also targets armv7 with
> > > -mfpu=neon by default, the ifunc selector won't be used and instead
> > > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > > implementation.
> > 
> > Hi, Adhemerval
> > With --disable-multi-arch configured, the default memcpy() in
> > /sysdeps/arm/memcpy.S is used, and its performance is worse than that of
> > __memcpy_vfp.
> > 
> > (gdb) bt
> > #0  memcpy () at ../sysdeps/arm/memcpy.S:64
> > #1  0xb6e44e18 in __GI___mempcpy (dest=dest@entry=0xbefffb39,
> > 	memcpy_vfp  vs memcpy in /sysdeps/arm/memcpy.S 	
> > memcpy_1k_libmicro
> > 	380.95153	449.6348
> > 	380.96255	449.6283
> > 	380.93659	449.6548
> > 	380.95293	449.5687
> > 	380.96071	449.6273
> > memcpy_800k_libmicro
> > 	279131.15619	352820.5405
> > 	286185.56789	353633.2598
> > 	294103.06891	376391.7133
> > 	285128.08039	389628.5461
> > 	287601.79983	350731.9007
> > ./memcpy_1m_libmicro
> > 	460492.84609	527388.05897
> > 	450131.39191	550715.5123
> > 	443782.37169	536471.34533
> > 	472516.11484	532723.39375
> > 	494700.10173	537163.65616
> > 
> > I forgot to mention that if the compiler also targets armv7 with
> > -mfpu=neon by default, the ifunc selector won't be used and instead
> > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > implementation.
> > >> About __memcpy_neon: it doesn't work after -mfpu=neon is set. Could you help check whether rebuilding glibc is not enough to use __memcpy_neon or __memcpy_arm?
> 
> To understand this microbench:
> 
> > 	memcpy_vfp  vs memcpy in /sysdeps/arm/memcpy.S 	
> > memcpy_1k_libmicro
> > 	380.95153	449.6348
> 
> Is the first column the result from memcpy_vfp and the second the result
> from sysdeps/arm/memcpy.S? And are lower values better or worse?
> 
> The memcpy implementation from glibc 2.11 is essentially the
> sysdeps/arm/memcpy.S from master with the fixes:
> 
>   55668624cf2 - arm: Use push/pop mnemonics: no code change expected.
>   01b32e7361d - Add CFI statements to ARM's assembly code: no code change expected.
>   81cb7a0b2b6 - Remove sfi_* annotations from ARM assembly files: no code change expected.
>   55668624cf2 - arm: Use push/pop mnemonics: no code change expected.
>   9e1d4ac924d - ARM: Support avoiding pc as destination register: it will use BX iff compiler set __ARM_ARCH_4T__ and __THUMB_INTERWORK__
>   bb48a26acf9 - ARM_BX_ALIGN_LOG2: internal alignment changes.
>   298e5d56dca - ARM: Fix memcpy & memmove for [ARM_ALWAYS_BX]: active only if ARM_ALWAYS_BX is set
>   
> ARM_ALWAYS_BX is not set internally by glibc (I am not sure how it is
> activated, in fact), so the only change I can think of that might affect
> performance is bb48a26acf9.
> 
> So are you saying that memcpy_vfp is worse than memcpy from glibc 2.11 and
> sysdeps/arm/memcpy.S is also worse than glibc 2.11?
> 
> Could you also check the results for
> sysdeps/arm/armv7/multiarch/memcpy_arm.S? You might need to hack glibc to
> enable it in your build by changing
> sysdeps/arm/armv7/multiarch/ifunc-memcpy.h to always return OPTIMIZE (arm).

Hi Adhemerval,
We have checked the results for memcpy_arm; its performance is worse than
that of memcpy_vfp. Please see attachment 12093.

Regarding bb48a26acf9 (ARM_BX_ALIGN_LOG2: internal alignment changes):
We tried reverting that change in sysdeps/arm/memcpy.S. The result for
memcpy_1m_libmicro is better than before, but still not as good as with
glibc 2.11. Please see the attachment revert-arm-bx-align2.

memcpy_neon does not seem to work on our platform; we will double-check.

