This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6

From: Ondřej Bílka <neleai at seznam dot cz>
To: Steve Ellcey <sellcey at imgtec dot com>
Cc: libc-alpha at sourceware dot org
Date: Sat, 20 Dec 2014 10:09:33 +0100
Subject: Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6
Authentication-results: sourceware.org; auth=none
References: <2923c970-026c-4e00-be7a-0650e82421b5 at BAMAIL02 dot ba dot imgtec dot org>

On Fri, Dec 19, 2014 at 03:26:44PM -0800, Steve Ellcey wrote:
> Here is the last of my patches for mips32r6/mips64r6 support.  It updates
> memset to use byte copies instead of stl or str to align the destination
> because those instructions are not supported in mips32r6 or mips64r6.
> It also avoids using the 'prepare for store' prefetch hint because that
> is not supported on mips32r6 or mips64r6 either.
> 
> Tested with the mips32r6/mips64r6 GCC, binutils and qemu simulator.
> 
> OK to checkin?
> 
> Steve Ellcey
> sellcey@imgtec.com
> 
> 
>  	PTR_ADDU a0,a0,t2
> +#else /* R6_CODE */
> +	andi	t2,a0,7
> +	lapc	t9,L(atable)
> +	PTR_LSA	t9,t2,t9,2
> +	jrc	t9
> +L(atable):
> +	bc	L(aligned)
> +	bc	L(lb7)

That could be a performance regression; test whether it is faster than
the existing loop on unpredictable branches [B].

Also check whether plain branches do better, as in the following C code [A].

A table lookup could be even slower in real workloads, as it adds latency
when the table is not in cache.

From a practical standpoint the realignment code looks like dead code: on
x64, 83% of calls are 16-byte aligned, and I cannot find an application
that makes a call that is not aligned to 8 bytes.

You will get a better speedup by adding a check for whether the destination
is already aligned and moving the realignment code to the bottom of the
file to improve instruction cache usage.
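
A minimal C sketch of that layout, for illustration only. The names
sketch_memset and align_slow are made up here, and the byte loop in the
slow path is a stand-in, not the code from the patch. The point is only
that an aligned destination falls straight through to the word loop while
the rarely used realignment lives in a separate, out-of-line routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slow path: fills byte by byte until p is 8-byte aligned
   or the length runs out.  Kept out of line so the hot path stays
   compact.  */
static unsigned char *
align_slow (unsigned char *p, unsigned char c, size_t *n)
{
  while (*n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = c;
      (*n)--;
    }
  return p;
}

/* Illustrative memset-like routine; not the glibc implementation.  */
void *
sketch_memset (void *dst, int value, size_t n)
{
  unsigned char *p = dst;
  unsigned char c = (unsigned char) value;

  /* Common case first: an already aligned pointer skips realignment.  */
  if (((uintptr_t) p & 7) != 0)
    p = align_slow (p, c, &n);

  /* Replicate the fill byte into all eight bytes of a 64-bit word.  */
  uint64_t word = (uint64_t) c * UINT64_C (0x0101010101010101);

  /* Fixed-size memcpy is normally compiled to a single 8-byte store.  */
  for (; n >= 8; p += 8, n -= 8)
    memcpy (p, &word, 8);

  /* Remaining tail of fewer than eight bytes.  */
  while (n-- > 0)
    *p++ = c;

  return dst;
}
```

In assembly the same effect comes from placing the realignment block after
the main loop and branching to it only when the low address bits are set.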

[A]

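/* x is the destination (unsigned char *); mask holds the fill byte
   replicated into every byte.  */
if (((uintptr_t) x) & 1)
  {
    *x = mask;
    x += 1;
  }

if (((uintptr_t) x) & 2)
  {
    *((uint16_t *) x) = mask;
    x += 2;
  }

if (((uintptr_t) x) & 4)
  {
    *((uint32_t *) x) = mask;
    x += 4;
  }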
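
For anyone who wants to try the branch version in isolation, here is a
self-contained sketch of it as a function. The name align_head and its
interface are hypothetical. It advances the pointer past each store,
tracks the remaining length, and assumes the caller guarantees at least 7
bytes so that no store runs past the buffer. The stores go through
fixed-size memcpy to stay within C aliasing rules; compilers normally
turn each one into a single store instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: aligns p up to 8 bytes with at most three stores.
   word holds the fill byte replicated in every byte, so byte order does
   not matter.  *n is the remaining length and is assumed to be >= 7.  */
unsigned char *
align_head (unsigned char *p, uint64_t word, size_t *n)
{
  if ((uintptr_t) p & 1)
    {
      memcpy (p, &word, 1);	/* one byte, p becomes 2-byte aligned */
      p += 1;
      *n -= 1;
    }

  if ((uintptr_t) p & 2)
    {
      memcpy (p, &word, 2);	/* two bytes, p becomes 4-byte aligned */
      p += 2;
      *n -= 2;
    }

  if ((uintptr_t) p & 4)
    {
      memcpy (p, &word, 4);	/* four bytes, p becomes 8-byte aligned */
      p += 4;
      *n -= 4;
    }

  return p;
}
```

Each branch depends only on one address bit, so a destination that is
already aligned takes three not-taken branches and no stores.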



[B]

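#include <string.h>

int main (void)
{
  char foo[100];
  int i;

  /* Cycle through all 16 destination misalignments.  */
  for (i = 0; i < 100000000; i++)
    memset (foo + (i % 16), 1, 32 - (i % 16));

  /* Read the buffer so the stores are not trivially dead.  */
  return foo[17];
}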
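
If it helps with comparing the old and new memset.S, the loop in [B] can
be wrapped in a small timing routine. This is only a sketch: the name
time_memset_sweep is made up, clock () measures CPU time at coarse
resolution, and the call goes through a volatile function pointer as one
way to keep the compiler from inlining or dropping the memset calls. The
buffer passed in is assumed to hold at least 32 bytes.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: runs the memset pattern from [B] on buf and
   returns the CPU time used, in seconds.  buf must have room for at
   least 32 bytes.  */
double
time_memset_sweep (unsigned char *buf, long iterations)
{
  /* Calling through a volatile pointer forces a real call to memset.  */
  void *(*volatile fill) (void *, int, size_t) = memset;
  clock_t start, end;
  long i;

  start = clock ();
  for (i = 0; i < iterations; i++)
    fill (buf + (i % 16), 1, 32 - (i % 16));
  end = clock ();

  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Running it once per build of the library, with the same iteration count,
gives two numbers to compare; absolute values will depend on the machine
or simulator.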

