This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] PowerPC: stpcpy optimization for PPC64/POWER7
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Mon, 16 Sep 2013 20:33:47 +0200
- Subject: Re: [PATCH] PowerPC: stpcpy optimization for PPC64/POWER7
- Authentication-results: sourceware.org; auth=none
- References: <523715EE dot 9070408 at linux dot vnet dot ibm dot com>
On Mon, Sep 16, 2013 at 11:30:06AM -0300, Adhemerval Zanella wrote:
> Hi all,
>
> Following Alan Modra suggestion, it is a stpcpy optimization patch for PPC64.
> This patch optimizes the default PPC64 by adding doubleword stores/loads
> increasing aligned throughput for large sizes.
>
An obvious question here is why this needs to be kept separate from the
strcpy implementation. The sysdeps/powerpc/powerpc64/st[pr]cpy.S files are
quite similar and I do not see a sysdeps/powerpc/powerpc64/power7/strcpy.S
file. Would the same optimization apply to strcpy?
Also, I noted in the implementation that, provided we handle the case of
writing less than 8 bytes separately, the best way to finish on x64 would
be to compute the end and do an overlapping store of the last 8 bytes.
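To illustrate what I mean by an overlapping store, here is a minimal C sketch. It is not taken from the patch; the function name copy_overlapping_tail is made up for illustration, and it assumes the length is already known and is at least 8. The fixed-size memcpy calls stand in for doubleword loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: copy len bytes (len >= 8)
   in 8-byte chunks and finish with one store that ends exactly at
   dst + len. That last store may rewrite bytes already copied, which
   is harmless because it writes the same values again. */
static void copy_overlapping_tail(char *dst, const char *src, size_t len)
{
    uint64_t w;
    size_t i;

    /* Whole doublewords that end strictly before the final byte. */
    for (i = 0; i + 8 < len; i += 8) {
        memcpy(&w, src + i, 8);
        memcpy(dst + i, &w, 8);
    }

    /* Unconditional overlapping store of the last 8 bytes; no
       byte-by-byte tail loop and no branch on len % 8. */
    memcpy(&w, src + len - 8, 8);
    memcpy(dst + len - 8, &w, 8);
}
```

For len = 13 this does one store to bytes 0..7 and one to bytes 5..12, so bytes 5..7 are written twice.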
There are two things I do not know. The first is how to compute the end; on
Wikipedia I found that this could be handled by cntlz on the mask and
dividing the result by 8. The second is how slow overlapping stores are
compared to a branch misprediction.
You need a benchmark that varies sizes to check this; I could supply
one.
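As a rough sketch of the end computation mentioned above, the C below builds a mask with the high bit set in every zero byte of a doubleword, then divides the count of leading zeros by 8. The function name first_zero_byte_be is hypothetical, and __builtin_clzll is a GCC/Clang builtin that I use here as a stand-in for cntlzd. Bytes are counted from the most significant end, which matches memory order only on big-endian; a little-endian version would presumably count trailing zeros instead.

```c
#include <stdint.h>

/* Hypothetical helper: index (0..7, counted from the most significant
   byte) of the first zero byte in x, or 8 if x has no zero byte. */
static unsigned first_zero_byte_be(uint64_t x)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;

    /* Exact per-byte test: (x & low7) + low7 sets bit 7 of a byte when
       any of its low 7 bits is set, and cannot carry into the next
       byte. OR-ing in x covers bytes whose own bit 7 is set. After the
       complement, bit 7 of a byte is set only if that byte was zero. */
    uint64_t mask = ~(((x & low7) + low7) | x | low7);

    if (mask == 0)
        return 8;   /* the builtin is undefined for 0 */

    /* Leading zero bits divided by 8 gives the count of nonzero bytes
       in front of the first zero byte. */
    return (unsigned)__builtin_clzll(mask) / 8;
}
```

The shorter (x - 0x01...) & ~x & 0x80... test is not used here because a borrow can flag a 0x01 byte that sits above a zero byte, which would matter when counting from the most significant end.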