This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: libc-alpha at sourceware dot org
Date: Wed, 17 Oct 2018 09:17:13 -0300
Subject: Re: [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark
References: <HK0PR02MB2769D21CFB882C906DF6FBB484FF0@HK0PR02MB2769.apcprd02.prod.outlook.com>

On 17/10/2018 05:45, Feng Xue wrote:
> Although the prefetch load in the previous version can improve performance, it might cause a segfault. This version therefore removes it to ensure correct behaviour.
> 
> Feng
> ---
> 
> This version uses general-register-based memory instructions to load
> data, because vector-register-based loads are slightly slower on skylark.
> 
> Character matching is performed in parallel on one 16-byte memory block
> (both in size and alignment) per iteration.

Do you have numbers on how much improvement this yields on skylark (using at
least glibc's own benchtests)? Also, why use 16 bytes per loop iteration
instead of the default 32 (in your case, basically unrolling the loop)?

I am asking because slower NEON units seem to be common in recent chips,
so one option would be to add a 'no-neon' variant instead of creating a
'skylark'-specific one.
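
For readers following the thread, the general-register approach described in
the quoted text can be illustrated in portable C. This is only a rough sketch
under stated assumptions, not the code from the patch (which is AArch64
assembly): the function name memchr_gpr_sketch and the helper has_zero_byte
are hypothetical, and the byte-wise head and tail handling is a simplification
chosen so that no load ever touches memory outside the requested range. Each
iteration of the main loop tests one 16-byte aligned block using two 64-bit
general-register words and no vector instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if and only if at least one byte of w is zero (SWAR test). */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Hypothetical illustration; not the implementation from the patch. */
static void *memchr_gpr_sketch(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;
    const uint64_t pattern = ONES * ch;   /* ch replicated into all 8 bytes */

    /* Head: advance byte by byte until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Main loop: one aligned 16-byte block per iteration, as two
       64-bit words. XOR with the pattern turns matching bytes into
       zero bytes, which the SWAR test then detects. */
    while (n >= 16) {
        uint64_t lo, hi;
        memcpy(&lo, p, 8);        /* compiles to a plain 64-bit load */
        memcpy(&hi, p + 8, 8);
        if (has_zero_byte(lo ^ pattern) | has_zero_byte(hi ^ pattern))
            break;                /* a match lies somewhere in this block */
        p += 16;
        n -= 16;
    }

    /* Tail: scan the remaining bytes, which also pinpoints a match
       inside a block flagged by the main loop. */
    for (; n > 0; n--, p++)
        if (*p == ch)
            return (void *)p;
    return NULL;
}
```

Processing 32 bytes per iteration, as asked about above, would amount to
repeating the two-word test a second time inside the same loop body.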

References:
- [PATCH v2 3/3] aarch64: Optimized memchr specific to AmpereComputing skylark
  - From: Feng Xue
