Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6

On Fri, Dec 19, 2014 at 03:26:44PM -0800, Steve Ellcey  wrote:
> Here is the last of my patches for mips32r6/mips64r6 support.  It updates
> memset to use byte copies instead of stl or str to align the destination
> because those instructions are not supported in mips32r6 or mips64r6.
> It also avoids using the 'prepare for store' prefetch hint because that
> is not supported on mips32r6 or mips64r6 either.
> Tested with the mips32r6/mips64r6 GCC, binutils and qemu simulator.
> OK to checkin?
> Steve Ellcey
>  	PTR_ADDU a0,a0,t2
> +#else /* R6_CODE */
> +	andi	t2,a0,7
> +	lapc	t9,L(atable)
> +	PTR_LSA	t9,t2,t9,2
> +	jrc	t9
> +L(atable):
> +	bc	L(aligned)
> +	bc	L(lb7)

That could be a performance regression; test whether it is faster than the
existing loop when the branches are unpredictable [B].

Also try whether plain branches are better, as in the following C code [A].

The table lookup could be even slower in real workloads, because it adds
latency when the table is not in the cache.
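To make the table-versus-branches comparison concrete, here is a minimal C
sketch of both alignment prologues. The helper names align_head_switch and
align_head_branches are hypothetical, neither is the code from the patch, and
whether a compiler actually lowers the switch to a jump table (the C analogue
of the lapc/jrc sequence) depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both hypothetical helpers store the fill byte c until dst is 8-byte
   aligned and return how many bytes they wrote (0..7).  The caller must
   guarantee at least 7 bytes of room at dst. */

/* Table style: a dense switch on the number of bytes needed, which
   compilers often turn into an indirect jump through a table. */
size_t
align_head_switch (unsigned char *dst, unsigned char c)
{
  size_t k = (8 - ((uintptr_t) dst & 7)) & 7;   /* bytes to alignment */

  switch (k)
    {
    case 7: dst[6] = c; /* fall through */
    case 6: dst[5] = c; /* fall through */
    case 5: dst[4] = c; /* fall through */
    case 4: dst[3] = c; /* fall through */
    case 3: dst[2] = c; /* fall through */
    case 2: dst[1] = c; /* fall through */
    case 1: dst[0] = c; /* fall through */
    default: break;
    }
  return k;
}

/* Branch style, as in [A]: at most three conditional branches and no
   table load.  Byte stores stand in for the halfword/word stores. */
size_t
align_head_branches (unsigned char *dst, unsigned char c)
{
  unsigned char *p = dst;

  if ((uintptr_t) p & 1) { p[0] = c; p += 1; }
  if ((uintptr_t) p & 2) { p[0] = c; p[1] = c; p += 2; }
  if ((uintptr_t) p & 4) { p[0] = c; p[1] = c; p[2] = c; p[3] = c; p += 4; }

  assert (((uintptr_t) p & 7) == 0);   /* each step cleared one low bit */
  return (size_t) (p - dst);
}
```

Both write exactly the same bytes; the difference worth measuring is that the
branch version never reads a table, so it cannot take a cache miss on one.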

From a practical standpoint the realignment code looks like dead code: on
x64, 83% of calls are 16-byte aligned, and I cannot find an application that
calls memset with a destination that is not 8-byte aligned.
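One way to gather numbers like the 83% figure is to bucket every memset
destination by its alignment. This is a hypothetical sketch
(alignment_class, record_memset_dest and align_counts are made-up names),
assuming you can hook memset calls, for example with an LD_PRELOAD wrapper
on a glibc system; how the figure above was measured is not stated.

```c
#include <assert.h>
#include <stdint.h>

/* align_counts[k] counts destinations aligned to 2^k bytes but not to
   2^(k+1), for k = 0..3; align_counts[4] counts 16-byte-aligned ones. */
static unsigned long align_counts[5];

unsigned
alignment_class (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  unsigned k = 0;

  /* Count trailing zero bits of the address, capped at 4 (16 bytes). */
  while (k < 4 && ((a >> k) & 1) == 0)
    k++;
  assert (k <= 4);
  return k;
}

/* Hypothetical hook: call with the destination of every memset and
   dump align_counts at exit. */
void
record_memset_dest (const void *p)
{
  align_counts[alignment_class (p)]++;
}
```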

You will get a bigger speedup by adding a check for whether the destination
is already aligned and moving the realignment code to the bottom of the file,
which improves instruction-cache usage.
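A minimal C sketch of that layout, assuming GCC or Clang (__builtin_expect
and the cold/noinline attributes are compiler extensions). sketch_memset and
realign_dst are hypothetical names; this is a portable illustration of the
idea, not a replacement for the MIPS assembly. The aligned case costs one
test and a not-taken branch, while the byte-wise realignment lives in a
separate cold function that the compiler may place away from the hot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rarely executed: store bytes until d is 8-byte aligned or n runs out. */
__attribute__ ((cold, noinline))
static unsigned char *
realign_dst (unsigned char *d, int c, size_t *n)
{
  while (((uintptr_t) d & 7) != 0 && *n > 0)
    {
      *d++ = (unsigned char) c;
      (*n)--;
    }
  return d;
}

void *
sketch_memset (void *dst, int c, size_t n)
{
  unsigned char *d = dst;
  /* Replicate the fill byte into all eight bytes of a word. */
  uint64_t word = 0x0101010101010101ULL * (unsigned char) c;

  /* Fast path: a single check; realignment is out of line. */
  if (__builtin_expect (((uintptr_t) d & 7) != 0, 0))
    d = realign_dst (d, c, &n);

  assert (n < 8 || ((uintptr_t) d & 7) == 0);
  for (; n >= 8; n -= 8, d += 8)
    memcpy (d, &word, 8);        /* aligned 8-byte store */
  while (n-- > 0)                /* tail */
    *d++ = (unsigned char) c;
  return dst;
}
```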


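[A]
/* x is the destination (unsigned char *), n the remaining length
   (assumed >= 7 here), and mask the fill byte replicated into every
   byte.  Each step stores 1, 2 or 4 bytes and advances x, so after
   the three tests x is 8-byte aligned. */
if ((uintptr_t) x & 1) {
  *x = (unsigned char) mask;
  x += 1; n -= 1;
}
if ((uintptr_t) x & 2) {
  *(uint16_t *) x = (uint16_t) mask;
  x += 2; n -= 2;
}
if ((uintptr_t) x & 4) {
  *(uint32_t *) x = (uint32_t) mask;
  x += 4; n -= 4;
}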


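[B]
#include <string.h>

int main (void)
{
  char foo[100];
  int i;
  /* Destinations cycle through 16 alignments and lengths 17..32,
     so the alignment branches are hard to predict. */
  for (i = 0; i < 100000000; i++)
    memset (foo + (i % 16), 1, 32 - (i % 16));
  return foo[17];
}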
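For the measurement itself, a small harness that runs the access pattern of
[B] through a function pointer lets two implementations be timed on identical
input. time_memset_pattern is a hypothetical name, and clock() reports
processor time, which should be adequate for a single-threaded loop like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*memset_fn) (void *, int, size_t);

/* Run the [B] pattern iters times with fn and return CPU seconds. */
double
time_memset_pattern (memset_fn fn, long iters)
{
  static char foo[100];
  volatile char sink;            /* keeps the stores observable */
  clock_t start, end;

  assert (iters >= 0);
  start = clock ();
  for (long i = 0; i < iters; i++)
    fn (foo + (i % 16), 1, 32 - (i % 16));
  sink = foo[17];
  (void) sink;
  end = clock ();

  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

For example, time_memset_pattern (memset, 100000000) reproduces [B] with the
system memset; passing a second implementation gives the comparison.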

