This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH V3 3/3] sparc: M7 optimized memcpy/mempcpy/memmove/memset/bzero.
- From: David Miller <davem at redhat dot com>
- To: jose dot marchesi at oracle dot com
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 21 Mar 2016 14:45:58 -0400 (EDT)
- References: <1458584233-15225-1-git-send-email-jose dot marchesi at oracle dot com> <1458584233-15225-4-git-send-email-jose dot marchesi at oracle dot com>
From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Date: Mon, 21 Mar 2016 11:17:13 -0700
> +#define ST_CHUNK 24 /* multiple of 4 due to loop unrolling */
> +#define MIN_LOOP 32830
> +#define MIN_ZERO 256
There is no way you should need 32K of length to justify using the
cache line clearing stores for the non-bzero case.
There is also no reason you need to use the %asi register at all.
Please see the Niagara4 memset where we use the immediate ASI value
stores.
Once you get rid of all of the %asi accesses, the thresholds quoted
above can be decreased significantly.
These comments apply equally to your memcpy implementation.
You should avoid %asi register accesses at all costs, and there are
enough scratch registers to hold the pointer offsets in the inner
loop.
Again, the Niagara4 memcpy implementation should be your guide for
these sorts of things.
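To illustrate why dropping %asi goes hand in hand with keeping the
pointer offsets in scratch registers, here is a minimal C sketch of the
two SPARC V9 encodings of stxa. The helper names and the example ASI
value 0xe2 are assumptions made for illustration and are not taken from
the patch. In the immediate-ASI form (i = 0) the 8-bit ASI occupies
bits 12:5, the same bits that hold part of the simm13 displacement in
the %asi-register form (i = 1), so an immediate-ASI store can only
address [reg + reg].

```c
#include <stdint.h>

#define OP3_STXA     0x1eu  /* op3 field for stxa in SPARC V9 */
#define EXAMPLE_ASI  0xe2u  /* illustrative ASI number, an assumption */

/* Fields shared by both forms: op = 3, rd, op3, rs1. */
static uint32_t stxa_common(unsigned rd, unsigned rs1)
{
    return (3u << 30) | ((rd & 31u) << 25) | (OP3_STXA << 19)
           | ((rs1 & 31u) << 14);
}

/* i = 0: "stxa rd, [rs1 + rs2] imm_asi".
   The ASI is an immediate, so the offset must live in a register. */
static uint32_t stxa_imm_asi(unsigned rd, unsigned rs1, unsigned rs2,
                             unsigned asi)
{
    return stxa_common(rd, rs1) | ((asi & 0xffu) << 5) | (rs2 & 31u);
}

/* i = 1: "stxa rd, [rs1 + simm13] %asi".
   The offset is an immediate, so the ASI must come from %asi. */
static uint32_t stxa_asi_reg(unsigned rd, unsigned rs1, int simm13)
{
    return stxa_common(rd, rs1) | (1u << 13)
           | ((uint32_t)simm13 & 0x1fffu);
}
```

With these helpers, stxa_imm_asi(0, 8, 2, EXAMPLE_ASI) corresponds to
"stxa %g0, [%o0 + %g2] 0xe2", where %g2 would be loaded with the
offset once outside the loop, while stxa_asi_reg(0, 8, 0x40)
corresponds to "stxa %g0, [%o0 + 0x40] %asi" and depends on an earlier
write to %asi.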
Thanks.