This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug string/25131] memcpy perfomance problem with ARM 32 A9be due to high cache-misses
- From: "helugang at huawei dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Thu, 28 Nov 2019 03:53:54 +0000
- Subject: [Bug string/25131] memcpy perfomance problem with ARM 32 A9be due to high cache-misses
- Auto-submitted: auto-generated
- References: <bug-25131-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=25131
--- Comment #17 from helugang <helugang at huawei dot com> ---
(In reply to Adhemerval Zanella from comment #16)
> (In reply to helugang from comment #15)
> > (In reply to Adhemerval Zanella from comment #13)
> > > I forgot to mention that if the compiler also targets armv7 with
> > > -mfpu=neon by default, the ifunc selector won't be used and instead
> > > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > > implementation.
> >
> > Hi, Adhemerval
> > With --disable-multi-arch configured, the default memcpy() in
> > sysdeps/arm/memcpy.S is used, and its performance is worse than that of
> > __memcpy_vfp.
> >
> > (gdb) bt
> > #0 memcpy () at ../sysdeps/arm/memcpy.S:64
> > #1 0xb6e44e18 in __GI___mempcpy (dest=dest@entry=0xbefffb39,
> >
> > memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
> >
> > memcpy_1k_libmicro
> >   380.95153       449.6348
> >   380.96255       449.6283
> >   380.93659       449.6548
> >   380.95293       449.5687
> >   380.96071       449.6273
> >
> > memcpy_800k_libmicro
> >   279131.15619    352820.5405
> >   286185.56789    353633.2598
> >   294103.06891    376391.7133
> >   285128.08039    389628.5461
> >   287601.79983    350731.9007
> >
> > memcpy_1m_libmicro
> >   460492.84609    527388.05897
> >   450131.39191    550715.5123
> >   443782.37169    536471.34533
> >   472516.11484    532723.39375
> >   494700.10173    537163.65616
> >
> > I forgot to mention that if the compiler also targets armv7 with
> > -mfpu=neon by default, the ifunc selector won't be used and instead
> > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > implementation.
> > >> About __memcpy_neon: it doesn't work after -mfpu=neon is set. Could
> > >> you help check whether rebuilding glibc is not enough to use
> > >> __memcpy_neon or __memcpy_arm?
>
> To understand this microbench:
>
> > memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
> > memcpy_1k_libmicro
> > 380.95153 449.6348
>
> Is the first column the result from memcpy_vfp and the second for the
> sysdeps/arm/memcpy.S?
-Yes.
> About the value, are lower values better or worse?
-Lower is better: the value is the time cost, in nsec.
>
> The memcpy implementation from glibc 2.11 is essentially the
> sysdeps/arm/memcpy.S from master with the fixes:
>
> 55668624cf2 - arm: Use push/pop mnemonics: no code change expected.
> 01b32e7361d - Add CFI statements to ARM's assembly code: no code change
>               expected.
> 81cb7a0b2b6 - Remove sfi_* annotations from ARM assembly files: no code
>               change expected.
> 9e1d4ac924d - ARM: Support avoiding pc as destination register: it will
>               use BX iff the compiler sets __ARM_ARCH_4T__ and
>               __THUMB_INTERWORK__.
> bb48a26acf9 - ARM_BX_ALIGN_LOG2: internal alignment changes.
> 298e5d56dca - ARM: Fix memcpy & memmove for [ARM_ALWAYS_BX]: active only
>               if ARM_ALWAYS_BX is set.
>
> ARM_ALWAYS_BX is not set internally by glibc (in fact, I am not sure how
> it is activated), so the only change I can think of that might affect
> performance is bb48a26acf9.
>
> So are you saying that memcpy_vfp is worse than memcpy from glibc 2.11 and
> sysdeps/arm/memcpy.S is also worse than glibc 2.11?
-Yes.
>
> Could you also check the results for
> sysdeps/arm/armv7/multiarch/memcpy_arm.S? You might need to hack glibc to
> enable it in your build by changing
> sysdeps/arm/armv7/multiarch/ifunc-memcpy.h to always return OPTIMIZE (arm).
-OK, we'll check.
Thanks a lot for your kind support!