This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark



On 17/10/2018 14:20, Wilco Dijkstra wrote:
> Hi,
> 
>> Do you have numbers on how much improvement this yields on skylark (using at
>> least glibc's own benchtests)? Also, why use 16 bytes in the loop instead of the
>> default 32 (in your case basically unrolling the loop)?
> 
> Since it aligns early the overhead of aligning to 32 bytes would be even higher.
> The approach strlen uses (first iteration unaligned) is much faster.
> 
>> I am asking because slower neon units seem to be a common
>> thing in recent chips, so one option would be, instead of creating a 'skylark'
>> variant, to add a 'no-neon' one.
> 
> It's likely more about unaligned access performance - even an old Cortex-A72 does
> much better using the Neon version. Both memchr variants can be optimized further.
> In general it seems better indeed to use "generic" and "simd" in the names rather than
> obscure microarchitecture names.

Do you mean unaligned neon memory operations? And I agree that if the idea is
to provide a generic and a neon version, a better naming scheme should be used.
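
For reference, the "first iteration unaligned" structure mentioned above can
be sketched in portable C as below. This is only an illustration of the control
flow under my own assumptions, not the code from the patch or from glibc's
strlen: the names sketch_memchr and scan_chunk are hypothetical, and the
per-chunk scan is a plain byte loop standing in for whatever vector or
word-at-a-time compare a real implementation would use.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16  /* bytes handled per loop iteration, as discussed above */

/* Hypothetical helper: scan len bytes at p for c.  A real implementation
   would use a vector or word-at-a-time compare here instead. */
static const unsigned char *
scan_chunk (const unsigned char *p, size_t len, unsigned char c)
{
  for (size_t i = 0; i < len; i++)
    if (p[i] == c)
      return p + i;
  return NULL;
}

/* Sketch only: first chunk is read at the original (possibly unaligned)
   address, later chunks start on CHUNK-byte boundaries. */
void *
sketch_memchr (const void *s, int ch, size_t n)
{
  const unsigned char *p = s;
  unsigned char c = (unsigned char) ch;

  /* First iteration: no alignment work before the first compare. */
  size_t first = n < CHUNK ? n : CHUNK;
  const unsigned char *hit = scan_chunk (p, first, c);
  if (hit != NULL || first == n)
    return (void *) hit;

  /* Advance to the next CHUNK boundary (1..CHUNK bytes forward).  Any
     bytes skipped over were already checked above, so rescanning the
     overlap is harmless and still yields the first match. */
  const unsigned char *end = p + n;
  p += CHUNK - ((uintptr_t) p & (CHUNK - 1));

  /* Main loop: aligned CHUNK-byte blocks. */
  while ((size_t) (end - p) >= CHUNK)
    {
      hit = scan_chunk (p, CHUNK, c);
      if (hit != NULL)
        return (void *) hit;
      p += CHUNK;
    }

  /* Tail: fewer than CHUNK bytes left. */
  return (void *) scan_chunk (p, (size_t) (end - p), c);
}
```

The point of this shape is that short inputs return after a single compare,
while the cost of aligning is only paid once the buffer is known to be longer
than one chunk.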

