This is the mail archive of the
mailing list for the glibc project.
Re: [RFC PATCH] aarch64: improve memset
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, Ondřej Bílka <neleai at seznam dot cz>, Marcus Shawcroft <marcus dot shawcroft at arm dot com>
- Date: Tue, 17 Feb 2015 18:41:28 -0800
- Subject: Re: [RFC PATCH] aarch64: improve memset
- Authentication-results: sourceware.org; auth=none
- References: <539BF47F dot 3030907 at twiddle dot net>
On Sat, Jun 14, 2014 at 12:06 AM, Richard Henderson <email@example.com> wrote:
> The major idea here is to use IFUNC to check the zva line size once, and use
> that to select different entry points. This saves 3 branches during startup,
> and allows significantly more flexibility.
> Also, I've cribbed several of the unaligned store ideas that Ondrej has done
> with the x86 versions.
> I've done some performance testing using cachebench, which suggests that the
> unrolled memset_zva_64 path is 1.5x faster than the current memset at 1024
> bytes and above. The non-zva path appears to be largely unchanged.
> I'd like to use some of Ondrej's benchmarks+data, but I couldn't locate them in
> a quick search of the mailing list. Pointers?
Yes, I still see a performance regression on ThunderX with this patch
and with the newer versions. It is caused by the placement of the subs
in the innermost loop of the non-zero case. The regression is around 20%.
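
For readers following the thread, the "check the zva line size once and
select an entry point" idea from the quoted message can be sketched in
portable C. This is only an illustration under stated assumptions, not the
code from the patch: memset_generic, memset_zva_128, select_memset,
read_dczid and memset_dispatch are hypothetical names, the entry points
simply forward to the C library memset, and a lazily initialised function
pointer stands in for a real IFUNC resolver. The decoding of DCZID_EL0
(DZP in bit 4, BS in bits [3:0] as log2 of the block size in 4-byte words)
follows the ARMv8 architecture definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical entry points. A real implementation would be hand-written
   assembly; here each one just forwards to the C library memset. */
static void *memset_generic(void *d, int c, size_t n) { return memset(d, c, n); }
static void *memset_zva_64(void *d, int c, size_t n)  { return memset(d, c, n); }
static void *memset_zva_128(void *d, int c, size_t n) { return memset(d, c, n); }

/* Decode a DCZID_EL0 value. Bit 4 (DZP) set means DC ZVA is prohibited;
   bits [3:0] (BS) hold log2 of the block size in 4-byte words.
   Returns the block size in bytes, or 0 if DC ZVA must not be used. */
static unsigned zva_block_bytes(uint64_t dczid)
{
    if (dczid & (1u << 4))
        return 0;
    return 4u << (dczid & 0xf);
}

/* Choose an entry point from the decoded block size. */
static memset_fn select_memset(uint64_t dczid)
{
    switch (zva_block_bytes(dczid)) {
    case 64:  return memset_zva_64;
    case 128: return memset_zva_128;
    default:  return memset_generic;
    }
}

/* Read DCZID_EL0 on AArch64. On other targets, report DC ZVA as
   prohibited so that the generic path is selected. */
static uint64_t read_dczid(void)
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ ("mrs %0, dczid_el0" : "=r" (v));
    return v;
#else
    return 1u << 4;
#endif
}

/* Resolve on first use, then reuse the cached choice. With IFUNC the
   dynamic linker performs this selection once at relocation time. */
static void *memset_dispatch(void *d, int c, size_t n)
{
    static memset_fn impl;
    if (impl == NULL)
        impl = select_memset(read_dczid());
    assert(impl != NULL);
    return impl(d, c, n);
}
```

The point of resolving up front is that the per-call path no longer has to
test the line size, which is where the quoted message's saved branches come
from. The sketch says nothing about the inner-loop instruction placement
discussed in the reply.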