This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug string/25131] memcpy perfomance problem with ARM 32 A9be due to high cache-misses
- From: "adhemerval.zanella at linaro dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Tue, 26 Nov 2019 18:13:09 +0000
- Subject: [Bug string/25131] memcpy perfomance problem with ARM 32 A9be due to high cache-misses
- Auto-submitted: auto-generated
- References: <bug-25131-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=25131
--- Comment #16 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to helugang from comment #15)
> (In reply to Adhemerval Zanella from comment #13)
> > I forgot to mention that if the compiler also targets armv7 with -mfpu=neon
> > as default, the ifunc selector won't be used and instead
> > sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> > implementation.
>
> Hi, Adhemerval
> With --disable-multi-arch configured, the default memcpy() in
> /sysdeps/arm/memcpy.S is used, and its performance is worse than that of
> __memcpy_vfp.
>
> (gdb) bt
> #0 memcpy () at ../sysdeps/arm/memcpy.S:64
> #1 0xb6e44e18 in __GI___mempcpy (dest=dest@entry=0xbefffb39,
> memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
>
> memcpy_1k_libmicro
>      380.95153       449.6348
>      380.96255       449.6283
>      380.93659       449.6548
>      380.95293       449.5687
>      380.96071       449.6273
>
> memcpy_800k_libmicro
>   279131.15619    352820.5405
>   286185.56789    353633.2598
>   294103.06891    376391.7133
>   285128.08039    389628.5461
>   287601.79983    350731.9007
>
> ./memcpy_1m_libmicro
>   460492.84609    527388.05897
>   450131.39191    550715.5123
>   443782.37169    536471.34533
>   472516.11484    532723.39375
>   494700.10173    537163.65616
>
> I forgot to mention that if the compiler also targets armv7 with -mfpu=neon
> as default, the ifunc selector won't be used and instead
> sysdeps/arm/armv7/multiarch/memcpy_neon.S will be set as the default
> implementation.
> >> About __memcpy_neon: it doesn't work after -mfpu=neon is set. Could you
> help check whether rebuilding glibc is not enough to use __memcpy_neon or
> __memcpy_arm?
To understand this microbench:
> memcpy_vfp vs memcpy in /sysdeps/arm/memcpy.S
> memcpy_1k_libmicro
> 380.95153 449.6348
Is the first column the result for memcpy_vfp and the second for
sysdeps/arm/memcpy.S? And are lower values better or worse?
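Whatever the columns turn out to mean, the relative gap between them can be
computed directly from the quoted rows. The following is only a minimal Python
sketch, not part of libmicro or glibc: the helper name column_ratio is
hypothetical, and the data are the memcpy_1k_libmicro rows quoted above.

```python
from statistics import mean

# memcpy_1k_libmicro rows as quoted in the report: (first column, second column).
ROWS_1K = [
    (380.95153, 449.6348),
    (380.96255, 449.6283),
    (380.93659, 449.6548),
    (380.95293, 449.5687),
    (380.96071, 449.6273),
]


def column_ratio(rows):
    """Return mean(second column) / mean(first column) (hypothetical helper)."""
    first = mean(r[0] for r in rows)
    second = mean(r[1] for r in rows)
    return second / first


if __name__ == "__main__":
    # A ratio above 1.0 means the second column is larger on average.
    print(f"second/first = {column_ratio(ROWS_1K):.3f}")
```

For the 1k case the second column comes out roughly 18% larger than the first;
whether that is better or worse still depends on what the benchmark reports.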
The memcpy implementation from glibc 2.11 is essentially the same as
sysdeps/arm/memcpy.S from master, apart from the following changes:

  55668624cf2 - arm: Use push/pop mnemonics: no code change expected.
  01b32e7361d - Add CFI statements to ARM's assembly code: no code change
                expected.
  81cb7a0b2b6 - Remove sfi_* annotations from ARM assembly files: no code
                change expected.
  9e1d4ac924d - ARM: Support avoiding pc as destination register: it will
                use BX iff the compiler sets __ARM_ARCH_4T__ and
                __THUMB_INTERWORK__.
  bb48a26acf9 - ARM_BX_ALIGN_LOG2: internal alignment changes.
  298e5d56dca - ARM: Fix memcpy & memmove for [ARM_ALWAYS_BX]: active only
                if ARM_ALWAYS_BX is set.
ARM_ALWAYS_BX is not set internally by glibc (I am not sure how it is
activated, in fact), so the only change I can think of that might affect
performance is bb48a26acf9.
So are you saying that memcpy_vfp is worse than memcpy from glibc 2.11 and
sysdeps/arm/memcpy.S is also worse than glibc 2.11?
Could you also check the results for
sysdeps/arm/armv7/multiarch/memcpy_arm.S? You might need to hack glibc to
enable it in your build by changing
sysdeps/arm/armv7/multiarch/ifunc-memcpy.h to always return OPTIMIZE (arm).
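
To illustrate the idea of forcing the selector, here is a small Python model
of a hwcap-based choice between the three variants named in this thread. It
is only a sketch under assumptions: it does not reproduce the actual contents
of ifunc-memcpy.h, the order of the checks is assumed, select_memcpy and
force_arm are hypothetical names, and the bit values are illustrative.

```python
# Illustrative hwcap bits (assumed to match the Linux ARM hwcap layout).
HWCAP_ARM_VFP = 1 << 6
HWCAP_ARM_NEON = 1 << 12


def select_memcpy(hwcap, force_arm=False):
    """Model of a selector: return the name of the variant it would pick.

    force_arm mimics the suggested hack of returning the plain ARM
    variant unconditionally, regardless of the hwcap bits.
    """
    if force_arm:
        return "__memcpy_arm"
    if hwcap & HWCAP_ARM_NEON:   # assumed to be checked first
        return "__memcpy_neon"
    if hwcap & HWCAP_ARM_VFP:
        return "__memcpy_vfp"
    return "__memcpy_arm"


if __name__ == "__main__":
    caps = HWCAP_ARM_VFP | HWCAP_ARM_NEON
    print(select_memcpy(caps))                  # normal selection
    print(select_memcpy(caps, force_arm=True))  # forced ARM variant
```

With the forced path, the benchmark exercises the plain ARM variant even on
hardware that advertises VFP or NEON, which is the point of the experiment.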
--
You are receiving this mail because:
You are on the CC list for the bug.