This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH V3 3/3] sparc: M7 optimized memcpy/mempcpy/memmove/memset/bzero.


From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Date: Mon, 21 Mar 2016 11:17:13 -0700

> +#define ST_CHUNK	24   /* multiple of 4 due to loop unrolling */
> +#define MIN_LOOP	32830
> +#define MIN_ZERO	256

There is no way you should need a length of 32K to justify using the
cache-line-clearing stores in the non-bzero case.

There is also no reason you need to use the %asi register at all.

Please see the Niagara4 memset where we use the immediate ASI value
stores.

Once you get rid of all of the %asi accesses, those metrics above
can be decreased significantly.

These comments apply equally to your memcpy implementation.

You should avoid %asi register accesses at all costs, and there are
enough scratch registers to hold the pointer offsets in the inner
loop.
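
As a rough sketch of the difference between the two store forms
(illustrative only, not copied from either implementation; the
register choices and the two-store shape are assumptions made for the
example, and the ASI value is the block-init store ASI used by the
existing Niagara code):

```
#define ASI_BLK_INIT_QUAD_LDD_P	0xe2

	/* Form to avoid: write the %asi register, then store using
	   register + immediate addressing.  */
	wr	%g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
	stxa	%g0, [%o0 + 0x00] %asi
	stxa	%g0, [%o0 + 0x08] %asi

	/* Immediate-ASI form: the ASI is encoded in the instruction, so
	   addressing must be register + register.  The offset lives in
	   a scratch register (%g1 here is a hypothetical choice).  */
	mov	0x08, %g1
	stxa	%g0, [%o0 + %g0] ASI_BLK_INIT_QUAD_LDD_P
	stxa	%g0, [%o0 + %g1] ASI_BLK_INIT_QUAD_LDD_P
```

The immediate-ASI encoding has no room for an immediate offset, which
is why the offsets need to sit in scratch registers.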

Again, the Niagara4 memcpy implementation should be your guide for
these sorts of things.

Thanks.

