This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH V3 3/3] sparc: M7 optimized memcpy/mempcpy/memmove/memset/bzero.

From: David Miller <davem at redhat dot com>
To: jose dot marchesi at oracle dot com
Cc: libc-alpha at sourceware dot org
Date: Mon, 21 Mar 2016 14:45:58 -0400 (EDT)
Subject: Re: [PATCH V3 3/3] sparc: M7 optimized memcpy/mempcpy/memmove/memset/bzero.
References: <1458584233-15225-1-git-send-email-jose dot marchesi at oracle dot com> <1458584233-15225-4-git-send-email-jose dot marchesi at oracle dot com>

From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Date: Mon, 21 Mar 2016 11:17:13 -0700

> +#define ST_CHUNK	24   /* multiple of 4 due to loop unrolling */
> +#define MIN_LOOP	32830
> +#define MIN_ZERO	256

There is no way you should need 32K of length to justify using the
cache line clearing stores for the non-bzero case.

There is also no reason you need to use the %asi register at all.

Please see the Niagara4 memset, where we use stores that encode the
ASI as an immediate value.

Once you get rid of all of the %asi accesses, those metrics above
can be decreased significantly.
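To make the threshold point concrete, here is a minimal portable C sketch (not the patch's code and not the Niagara4 code) of how a memset can split a request into an unaligned head, whole cache lines, and a byte tail. Only the whole-line body is a candidate for cache line clearing stores. Assuming there is no expensive per-call setup (such as saving and restoring %asi), the cut-over length mainly has to amortize the head and tail handling. The 64-byte line size, the MIN_BLOCK_BYTES value, and the names sketch_memset and fill_line are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for illustration only: a 64-byte cache line and a
   hypothetical cut-over point; neither is taken from the patch. */
#define LINE_BYTES      64
#define MIN_BLOCK_BYTES 256

/* Fill one whole, line-aligned cache line.  On SPARC this is where the
   cache-line-initializing stores (with the ASI given as an immediate)
   would go; plain 8-byte stores keep the sketch portable. */
static void fill_line(unsigned char *line, uint64_t pattern)
{
    for (size_t i = 0; i < LINE_BYTES; i += 8)
        memcpy(line + i, &pattern, sizeof pattern);
}

void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;

    if (n >= MIN_BLOCK_BYTES) {
        /* Head: byte stores up to the next line boundary (at most 63). */
        size_t head = (LINE_BYTES - (uintptr_t)p % LINE_BYTES) % LINE_BYTES;
        n -= head;
        while (head--)
            *p++ = b;

        /* Body: whole cache lines, the only part eligible for
           line-clearing stores. */
        uint64_t pattern = UINT64_C(0x0101010101010101) * b;
        for (; n >= LINE_BYTES; n -= LINE_BYTES, p += LINE_BYTES)
            fill_line(p, pattern);
    }

    /* Tail, or the whole request when it is short: byte stores. */
    while (n--)
        *p++ = b;
    return dst;
}
```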

These comments apply equally to your memcpy implementation.

You should avoid %asi register accesses at all costs, and there are
enough scratch registers to hold the pointer offsets in the inner
loop.
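A rough portable C analogue of that inner-loop advice, assuming a non-overlapping copy: one running offset plus eight data temporaries, all of which fit in ordinary scratch registers, so nothing extra has to be set up or restored around the loop. This is a hypothetical sketch; sketch_memcpy, load64 and store64 are illustrative names, and it does not model the SPARC ASI stores themselves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte load/store helpers; memcpy keeps them free of alignment and
   aliasing problems, and compilers typically lower them to single
   load/store instructions. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store64(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Non-overlapping copy.  The inner loop moves 64 bytes per iteration
   using one running offset and eight data temporaries, all of which a
   compiler can keep in ordinary scratch registers. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    for (; n - off >= 64; off += 64) {
        uint64_t a0 = load64(s + off),      a1 = load64(s + off + 8);
        uint64_t a2 = load64(s + off + 16), a3 = load64(s + off + 24);
        uint64_t a4 = load64(s + off + 32), a5 = load64(s + off + 40);
        uint64_t a6 = load64(s + off + 48), a7 = load64(s + off + 56);
        store64(d + off,      a0); store64(d + off + 8,  a1);
        store64(d + off + 16, a2); store64(d + off + 24, a3);
        store64(d + off + 32, a4); store64(d + off + 40, a5);
        store64(d + off + 48, a6); store64(d + off + 56, a7);
    }

    /* Tail: the remaining 0..63 bytes. */
    for (; off < n; off++)
        d[off] = s[off];
    return dst;
}
```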

Again, the Niagara4 memcpy implementation should be your guide for
these sorts of things.

Thanks.

References:
- [PATCH V3 0/3] SPARC M7 optimized memcpy/memmove/memset routines.
  - From: Jose E. Marchesi
- [PATCH V3 3/3] sparc: M7 optimized memcpy/mempcpy/memmove/memset/bzero.
  - From: Jose E. Marchesi
