This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark



On 17/10/2018 05:45, Feng Xue wrote:
> Although prefetch load in previous version can benefit performance, it might cause a segfault. Thus, this patch removed that to ensure correct behaviour.
> 
> Feng
> ---
> 
> This version uses general register based memory instruction to load
> data, because vector register based is slightly slower in skylark.
> 
> Character-matching is performed on 16-byte (both size and alignment)
> memory block in parallel each iteration.

Do you have numbers on how much improvement this yields on skylark (using at
least glibc's own benchtests)? Also, why use 16 bytes per loop iteration
instead of the default 32 (which in your case would basically mean unrolling
the loop)?

I am asking because slower neon units seem to be a common thing in recent
chips, so one option would be, instead of creating a 'skylark' variant, to
add a 'no-neon' one.
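
For reference, the general-register approach described in the quoted patch
notes can be sketched in C roughly as below. This is only an illustration
and not the code from the patch (which is aarch64 assembly): the function
name memchr_swar16 and the helper has_zero_byte are hypothetical, and the
zero-byte test is the usual word-at-a-time bit trick, assumed here rather
than taken from the patch. Each iteration checks one 16-byte aligned block
using two 64-bit loads, and no byte beyond the n requested is ever read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if and only if at least one byte of w is zero. */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Hypothetical name: illustrative sketch, not the patch's implementation. */
void *memchr_swar16(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    uint64_t pattern = ONES * ch;   /* target byte replicated 8 times */

    /* Head: step bytewise until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Main loop: one aligned 16-byte block per iteration, loaded into
       two general-purpose 64-bit values. XOR turns matching bytes into
       zero bytes, which has_zero_byte then detects. */
    while (n >= 16) {
        uint64_t a, b;
        memcpy(&a, p, 8);
        memcpy(&b, p + 8, 8);
        if (has_zero_byte(a ^ pattern) | has_zero_byte(b ^ pattern))
            break;                  /* a match lies within this block */
        p += 16;
        n -= 16;
    }

    /* Tail: locate the match in the flagged block, or scan the
       remaining bytes that did not fill a whole block. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

A 32-byte variant would simply issue four 64-bit loads per iteration in the
main loop; whether that is faster on a given core is exactly what the
benchtests would need to show.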

