This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: nd <nd at arm dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
- Date: Wed, 17 Oct 2018 14:43:06 -0300
- Subject: Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark
- References: <DB5PR08MB10302C248EFF8897BDDA490483FF0@DB5PR08MB1030.eurprd08.prod.outlook.com>
On 17/10/2018 14:20, Wilco Dijkstra wrote:
>> Do you have numbers on how much improvement this yields on skylark (using at
>> least glibc's own benchtests)? Also, why use 16 bytes in the loop instead of the
>> default 32 (in your case, basically unrolling the loop)?
> Since it aligns early, the overhead of aligning to 32 bytes would be even higher.
> The approach strlen uses (first iteration unaligned) is much faster.
>> I am asking because slower neon units seem to be a common
>> thing in recent chips, so one option would be, instead of creating a 'skylark'
>> variant, to add a 'no-neon' one.
> It's likely more about unaligned access performance - even an old Cortex-A72 does
> much better using the Neon version. Both memchr variants can be optimized further.
> In general it seems better indeed to use "generic" and "simd" in the names rather than
> obscure microarchitecture names.
Do you mean unaligned neon memory operations? And I agree that if the idea is
to provide generic and neon versions, a better naming scheme should be used.
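
For readers following the alignment discussion, here is a rough portable C
sketch of the "first iteration unaligned" idea mentioned for strlen, applied
to a memchr-like search with 16-byte chunks. It is not the code from the
patch or from glibc: sketch_memchr, scan_chunk and CHUNK are hypothetical
names, and the byte loop in scan_chunk only stands in for what would be a
single vector load and compare in the assembly versions.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16  /* bytes handled per iteration (assumed, mirrors the 16-byte loop) */

/* Hypothetical helper: scans len (<= CHUNK) bytes starting at p.
   Stands in for one vector load plus compare in a real implementation. */
static const unsigned char *scan_chunk(const unsigned char *p, size_t len,
                                       unsigned char c)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] == c)
            return p + i;
    return NULL;
}

/* Illustrative memchr-like search: first chunk unaligned, rest aligned. */
void *sketch_memchr(const void *s, int ch, size_t n)
{
    const unsigned char *p = s;
    const unsigned char *end = p + n;
    unsigned char c = (unsigned char)ch;

    /* First iteration: check one chunk directly at the (possibly
       unaligned) start, with no alignment work up front. */
    size_t first = n < CHUNK ? n : CHUNK;
    const unsigned char *hit = scan_chunk(p, first, c);
    if (hit != NULL || first == n)
        return (void *)hit;

    /* Move to the next 16-byte boundary. The result lies in (p, p + 16],
       so any overlap covers bytes that were already checked. The integer
       round trip assumes a flat address space, as on mainstream targets. */
    p = (const unsigned char *)(((uintptr_t)p + CHUNK) & ~(uintptr_t)(CHUNK - 1));

    /* Main loop: aligned 16-byte chunks, with a shorter final chunk. */
    while (p < end) {
        size_t len = (size_t)(end - p) < CHUNK ? (size_t)(end - p) : CHUNK;
        hit = scan_chunk(p, len, c);
        if (hit != NULL)
            return (void *)hit;
        p += len;
    }
    return NULL;
}
```

With this shape, a short search pays no alignment cost before its first
compare. A 32-byte loop would, by the same reasoning, need more setup before
reaching its aligned steady state, which is the overhead discussed above.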