This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark
On 17/10/2018 05:45, Feng Xue wrote:
> Although the prefetch load in the previous version can benefit performance, it might cause a segfault. Thus, this patch removes it to ensure correct behaviour.
>
> Feng
> ---
>
> This version uses general-register-based memory instructions to load
> data, because vector-register-based loads are slightly slower on skylark.
>
> Character matching is performed in parallel on a 16-byte (both size
> and alignment) memory block in each iteration.
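The scheme described above (matching a whole 16-byte block per iteration using general-purpose registers) can be sketched in C with the classic SWAR "has zero byte" trick. This is only an illustration under stated assumptions, not the patch's assembly: it assumes a little-endian target (as aarch64 Linux normally is) and GCC/Clang's __builtin_ctzll, and the name swar_memchr is hypothetical. Unlike typical assembly implementations, it handles the unaligned head and the tail byte by byte, so it never reads outside [s, s + n).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: little-endian byte order, so the first byte in memory
   lands in the least significant bits of a loaded 64-bit word. */
#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

/* Sets bit 7 of every zero byte of x. Spurious bits can appear only in
   bytes more significant than a zero byte, so the result is 0 iff x has
   no zero byte, and its lowest set bit always marks the first one. */
static inline uint64_t zero_byte_mask(uint64_t x)
{
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Hypothetical SWAR memchr: one aligned 16-byte block per iteration,
   loaded as two 64-bit general-purpose words. */
void *swar_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;
    const uint64_t rep = 0x0101010101010101ULL * ch; /* ch in every byte */

    /* Head: byte by byte until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Main loop: test both words of the block, branch once per block. */
    while (n >= 16) {
        uint64_t w0, w1;
        memcpy(&w0, p, 8);
        memcpy(&w1, p + 8, 8);
        uint64_t m0 = zero_byte_mask(w0 ^ rep); /* matching bytes become 0 */
        uint64_t m1 = zero_byte_mask(w1 ^ rep);
        if (m0 | m1) {
            if (m0)
                return (void *)(p + (__builtin_ctzll(m0) >> 3));
            return (void *)(p + 8 + (__builtin_ctzll(m1) >> 3));
        }
        p += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes left. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

XORing each word with the broadcast character turns matching bytes into zero bytes, so the zero-byte mask doubles as a match mask; counting trailing zero bits and shifting right by 3 converts the lowest flagged bit into a byte offset within the word.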
Do you have numbers on how much improvement this yields on skylark (using
at least glibc's own benchtests)? Also, why use 16 bytes per loop iteration
instead of the default 32 (which in your case would basically mean
unrolling the loop)?
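To make the 16-versus-32 question concrete, here is a hedged C sketch (not glibc code; memchr_unrolled32 and match_mask are hypothetical names) of what a 32-byte loop amounts to with general-purpose registers: the 16-byte block test unrolled twice, with a single exit branch per iteration. It makes the same assumptions as any SWAR sketch of this kind: a little-endian target and GCC/Clang's __builtin_ctzll.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: little-endian target (first byte in the low bits). */
#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Bit 7 set in each byte of the word at p that equals the broadcast
   character rep; the lowest set bit marks the first match. */
static inline uint64_t match_mask(const unsigned char *p, uint64_t rep)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    w ^= rep;
    return (w - ONES) & ~w & HIGHS;
}

/* Hypothetical variant: 32 bytes (two 16-byte blocks) per iteration. */
void *memchr_unrolled32(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;
    const uint64_t rep = ONES * ch;

    while (n > 0 && ((uintptr_t)p & 15) != 0) { /* align to 16 bytes */
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    while (n >= 32) {
        uint64_t m0 = match_mask(p, rep);
        uint64_t m1 = match_mask(p + 8, rep);
        uint64_t m2 = match_mask(p + 16, rep);
        uint64_t m3 = match_mask(p + 24, rep);
        if (m0 | m1 | m2 | m3) { /* one exit test per 32 bytes */
            const uint64_t m[4] = { m0, m1, m2, m3 };
            for (int i = 0; i < 4; i++)
                if (m[i])
                    return (void *)(p + 8 * i + (__builtin_ctzll(m[i]) >> 3));
        }
        p += 32;
        n -= 32;
    }

    while (n > 0) { /* fewer than 32 bytes left */
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

Compared with a 16-byte loop, this halves the number of loop-exit branches per byte at the cost of a longer tail and more work to locate the match once found; whether that trade-off pays off on a given core is the kind of thing the benchtests numbers would show.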
I am asking because slower NEON units seem to be common in recent chips,
so instead of creating a 'skylark' variant, one option would be to add a
'no-neon' variant.