This is the mail archive of the mailing list for the glibc project.
Re: [RFC] Improve strcat
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Andreas Schwab <schwab at linux-m68k dot org>, libc-alpha at sourceware dot org
- Date: Tue, 8 Oct 2013 17:00:12 +0200
- Subject: Re: [RFC] Improve strcat
- References: <20130909161112 dot GB23047 at domone dot kolej dot mff dot cuni dot cz> <mvmbo42dkiq dot fsf at hawking dot suse dot de> <20130909171703 dot GA32141 at domone dot kolej dot mff dot cuni dot cz> <87ob81c1yk dot fsf at igel dot home> <20130909191829 dot GA997 at domone dot kolej dot mff dot cuni dot cz> <522E28E9 dot 5000709 at redhat dot com> <20130910142117 dot GB6536 at domone dot kolej dot mff dot cuni dot cz> <20130910202844 dot GA11358 at domone dot kolej dot mff dot cuni dot cz> <20130911102311 dot GA22325 at domone dot kolej dot mff dot cuni dot cz> <52472FA0 dot 2070906 at redhat dot com>
On Sat, Sep 28, 2013 at 03:36:00PM -0400, Carlos O'Donell wrote:
> On 09/11/2013 06:23 AM, Ondřej Bílka wrote:
> > On Tue, Sep 10, 2013 at 10:28:44PM +0200, Ondřej Bílka wrote:
> >> Hi Carlos,
> >> Here is strcpy with comments. To show the overall structure, I decided
> >> to include the ssse3 loop in this patch. If you are OK with splitting it
> >> into the loop header and the ssse3 loop, they could be reviewed separately.
> >> I omitted the actual strcat calls, as I have a patch that uses them ready
> >> and it needs a bit of code movement.
> > For strcat there was one optimization opportunity left: find the
> > terminating zero bytes of the source and the destination in parallel.
> > This patch does exactly that.
> > This allows us to jump directly to the code that copies a given number
> > of bytes, so I put the strcat implementation into
> > strcpy-sse2-unaligned-v2.S.
> > I do not handle strncat yet, so I copied the old strcat*.S files to
> > strncat*.S.
> > I have not optimized the instruction scheduling yet, to keep the code
> > easier to read.
> > Benchmark results are here:
> > http://kam.mff.cuni.cz/~ondra/benchmark_string/strcat_profile.html
> > Comments?
> Typo? STRAT? See below.
Yes, it is an RFC, as I mostly wanted to show the direction. In the
benchmarks it was slower for gcc, so the header needs more work.
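To make the idea from the patch description concrete, here is a minimal C sketch, not the patch's SSE2 assembly. The function name strcat_parallel is hypothetical. It scans both strings for their terminating zero byte in the same loop, so the two independent searches can overlap in the CPU, and then copies a length that is already known. The real code compares 16 bytes at a time with SSE2; this sketch checks one byte of each string per iteration so that it stays portable and never reads past either terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not glibc code: find the terminating zero
   bytes of dst and src in one interleaved loop, then do a copy whose
   length is already known.  The SSE2 version works on 16-byte blocks;
   this sketch checks one byte of each string per step.  */
char *
strcat_parallel (char *dst, const char *src)
{
  const size_t not_found = (size_t) -1;
  size_t dlen = not_found, slen = not_found;
  size_t i = 0;

  /* Phase 1: advance both scans together until both terminators are
     found.  Each string is only read while its terminator is unknown,
     so neither scan reads past its own zero byte.  */
  while (dlen == not_found || slen == not_found)
    {
      if (dlen == not_found && dst[i] == '\0')
        dlen = i;
      if (slen == not_found && src[i] == '\0')
        slen = i;
      i++;
    }

  /* Phase 2: both lengths are known, so copy exactly slen + 1 bytes
     (including src's terminating zero) to the end of dst.  */
  memcpy (dst + dlen, src, slen + 1);
  return dst;
}
```

For example, with `char buf[32] = "foo";`, calling `strcat_parallel (buf, "bar")` leaves `"foobar"` in `buf` and returns `buf`. As with strcat, the caller must ensure `dst` has room for the result and that the strings do not overlap.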
> > ---
> > sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S | 280 +------
> > sysdeps/x86_64/multiarch/strcat-ssse3.S | 868 +-------------------
> > .../x86_64/multiarch/strcpy-sse2-unaligned-v2.S | 217 ++++-
> > sysdeps/x86_64/multiarch/strcpy-ssse3-v2.S | 2 +-
> > sysdeps/x86_64/multiarch/strncat-sse2-unaligned.S | 285 ++++++-
> > sysdeps/x86_64/multiarch/strncat-ssse3.S | 869 ++++++++++++++++++++-
> > 6 files changed, 1364 insertions(+), 1157 deletions(-)
> > diff --git a/sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S b/sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S
> > index 028c6d3..03c1f18 100644
> > --- a/sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S
> > +++ b/sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S
> Please remind me why we're keeping this file around if
> the implementation is in strcpy-sse2-unaligned-v2.S?
A build from a clean state failed without them; I do not know yet where exactly the problem is.