This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [RFC PATCH] aarch64: improve memset


> Richard Henderson wrote:
> On 11/05/2014 03:35 PM, Will Newton wrote:
> > On 30 September 2014 12:03, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
> >> On 14 June 2014 08:06, Richard Henderson <rth@twiddle.net> wrote:
> >>> The major idea here is to use IFUNC to check the zva line size once, and use
> >>> that to select different entry points.  This saves 3 branches during startup,
> >>> and allows significantly more flexibility.
> >>>
> >>> Also, I've cribbed several of the unaligned store ideas that Ondrej has done
> >>> with the x86 versions.
> >>>
> >>> I've done some performance testing using cachebench, which suggests that the
> >>> unrolled memset_zva_64 path is 1.5x faster than the current memset at 1024
> >>> bytes and above.  The non-zva path appears to be largely unchanged.
> >>
> >>
> >> OK Thanks /Marcus
> >
> > It looks like this patch has slipped through the cracks. Richard, are
> > you happy to apply this or do you think it warrants further
> > discussion?
> 
> Sorry for the radio silence.
> 
> Just before I went to apply it I thought I spotted a bug that would affect
> ld.so.  I haven't had time to make sure one way or another.

I've got a few comments on this patch:

* Do we really need variants for cache line sizes that are never going to be used?
  I'd say just support 64 and 128, and default higher sizes to no_zva.

* Why special-case only line size 64? Unrolling might not help for 128, but it should
  not harm either. The alignment overhead only increases with larger line sizes, so you
  want to bypass the zva code in all cases if N < 3-4x line size.

* Is the no-ifunc variant still required/used? We now have at least 4 different
  variants, all of which need to be tested and maintained...

* Finally, which version is used when linking statically? I presume there is some
  makefile magic that causes the no-zva version to be used, but that might not be
  optimal for all targets.
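
A rough sketch in C of what the first two points could look like. It is
illustrative only and not taken from the patch: the function and enum names are
hypothetical, the raw DCZID_EL0 value is passed in as a parameter so the logic
can be tried on any host, and the 4x cutoff is just one assumed value from the
3-4x range mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant identifiers; the real entry points are assembly. */
enum memset_variant { MEMSET_NO_ZVA, MEMSET_ZVA_64, MEMSET_ZVA_128 };

/* Decode a raw DCZID_EL0 value into the DC ZVA block size in bytes.
   Bits [3:0] hold log2 of the block size in 4-byte words; bit 4 (DZP)
   set means DC ZVA is prohibited, reported here as 0.  */
unsigned zva_block_bytes(uint64_t dczid)
{
    if (dczid & (1u << 4))
        return 0;
    return 4u << (dczid & 0xf);
}

/* Only 64 and 128 get a dedicated variant; every other size,
   including "prohibited", falls back to the no-zva code.  */
enum memset_variant select_memset_variant(uint64_t dczid)
{
    switch (zva_block_bytes(dczid)) {
    case 64:
        return MEMSET_ZVA_64;
    case 128:
        return MEMSET_ZVA_128;
    default:
        return MEMSET_NO_ZVA;
    }
}

/* Bypass the zva path for small sizes regardless of line size.
   The factor 4 is an assumption, not a measured threshold.  */
int use_zva_path(size_t n, unsigned line_bytes)
{
    /* A ZVA block size is always a power of two (or 0 if unavailable). */
    assert(line_bytes == 0 || (line_bytes & (line_bytes - 1)) == 0);
    return line_bytes != 0 && n >= 4 * (size_t)line_bytes;
}
```

With this shape, an IFUNC resolver would call select_memset_variant once, and
each zva variant would apply the same size check instead of only the 64-byte
one.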

Wilco


