This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [Patch][Aarch64] memcpy IFUNC for Cavium ThunderX2

From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
To: sellcey at cavium dot com, libc-alpha <libc-alpha at sourceware dot org>
Cc: nd at arm dot com
Date: Fri, 16 Feb 2018 18:39:24 +0000
Subject: Re: [Patch][Aarch64] memcpy IFUNC for Cavium ThunderX2
References: <1518653077.14236.76.camel@cavium.com>

On 15/02/18 00:04, Steve Ellcey wrote:

> This patch adds a new memcpy ifunc for Cavium ThunderX2.  The difference
> between this and the Thunderx version is in the prefetching.  ThunderX2
> has different cache characteristics and so uses a different prefetching
> strategy.  Note that I prefetch past the end of the buffer being copied
> but my understanding is that that is legal and should never generate any
> errors.  I tried adding code to not prefetch past the end of the source
> but those changes slowed down memcpy so I did not include them.
>
> I did not copy memcpy_thunderx.S to memcpy_thunderx2.S but just use
> memcpy_thunderx2.S to set some macros and then include memcpy_thunderx.S.
> This is to reduce duplicate code.
>
> I have attached the memcpy benchmark output files from a ThunderX2 run,
> the main differences are in bench-memcpy-large.out.
>
> Tested with no regressions, OK to checkin?
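
As an illustration of the prefetch-past-the-end approach described in the quoted text, here is a minimal C sketch. It is not the patch's assembly: `copy_with_prefetch`, `PREFETCH_AHEAD` and `BLOCK` are hypothetical names and values chosen for the example, and `__builtin_prefetch` is the GCC/Clang builtin, which is documented as not faulting on an invalid address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values; the distances used by the real patch are not
   shown in this message.  */
#define PREFETCH_AHEAD 256
#define BLOCK 64

/* Copy N bytes in BLOCK-sized chunks, issuing a prefetch hint a fixed
   distance ahead of the current source position.  There is no check
   against the end of the source buffer.  */
static void *
copy_with_prefetch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= BLOCK)
    {
      /* The hinted address may lie past the end of SRC.  It is computed
         as an integer so that no out-of-bounds pointer arithmetic is
         done, and it is never dereferenced by the C code.  */
      __builtin_prefetch ((const void *) ((uintptr_t) s + PREFETCH_AHEAD),
                          0, 3);
      memcpy (d, s, BLOCK);
      d += BLOCK;
      s += BLOCK;
      n -= BLOCK;
    }

  /* Copy the remaining tail (fewer than BLOCK bytes).  */
  memcpy (d, s, n);
  return dst;
}
```

Dropping the end-of-buffer check keeps the loop body free of an extra compare and branch, which is consistent with the slowdown reported above when such a check was added.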


the code looks ok, and it is ok to commit if you think this gives
a benefit on thunderx2 (it should not affect other targets, apart
from code bloat).

i prefer not to add a new memcpy every time there is a new uarch,
so i think in the long term old ones should be removed or merged
(i'm not yet sure what the right policy is here, e.g. a variant
could be removed if its target is not available to anyone in the
community for benchmarking, or if there is not enough performance
benefit).

i don't see a huge performance difference in the benchmark logs,
and there are a few weird cases, e.g. this entry in bench-memcpy.out:

    {
     "length": 1888,
     "align1": 0,
     "align2": 59,
     "timings": [151.016, 1905.47, 150.547, 257.656, 147.969, 151.172]
    },

here memcpy_thunderx2 is very slow (and memcpy_falkor is the fastest).
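
A check like the following could flag such entries automatically. This is a rough sketch, not part of the glibc benchmark tooling: `fastest_index` and `count_outliers` are hypothetical helpers, and the 1.5x threshold is an arbitrary assumption. Which array position belongs to which implementation comes from the header of the benchmark file, which is not reproduced in this message.

```c
#include <stddef.h>

/* Return the index of the smallest timing, i.e. the fastest
   implementation for this benchmark entry.  */
static size_t
fastest_index (const double *t, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (t[i] < t[best])
      best = i;
  return best;
}

/* Count the timings that exceed FACTOR times the fastest one and
   store the index of the slowest timing in *WORST.  */
static size_t
count_outliers (const double *t, size_t n, double factor, size_t *worst)
{
  double limit = factor * t[fastest_index (t, n)];
  size_t count = 0;

  *worst = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (t[i] > limit)
        count++;
      if (t[i] > t[*worst])
        *worst = i;
    }
  return count;
}
```

Applied to the six timings above with a factor of 1.5, the fastest value is 147.969 and two entries exceed the limit (1905.47 and 257.656), the first being roughly 13 times slower than the fastest.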

> Steve Ellcey
> sellcey@cavium.com
>
>
> 2018-02-14  Steve Ellcey  <sellcey@cavium.com>
>
> 	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
> 	Add memcpy_thunderx2.
> 	* sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC):
> 	Increment to 4.
> 	(__libc_ifunc_impl_list): Add __memcpy_thunderx2.
> 	* sysdeps/aarch64/multiarch/memcpy.c (libc_ifunc): Add IS_THUNDERX2
> 	and IS_THUNDERX2PA checks.
> 	* sysdeps/aarch64/multiarch/memcpy_thunderx.S (USE_THUNDERX2):
> 	Use macro to set name appropriately.
> 	(memcpy): Use USE_THUNDERX2 macro to modify prefetches.
> 	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: New file.
> 	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_THUNDERX2PA):
> 	New macro.
> 	(IS_THUNDERX2): New macro.
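
To illustrate the kind of selection the memcpy.c entry describes, here is a simplified C sketch. It is not the glibc code: the real selector runs as an ifunc resolver at load time and tests the MIDR value, whereas this example picks a function pointer at run time from a hypothetical `enum cpu_kind`. The `copy_*` routines, `last_used` and `select_memcpy` are stand-ins invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical CPU identifiers standing in for the MIDR-based checks
   such as IS_THUNDERX2 and IS_THUNDERX2PA.  */
enum cpu_kind
{
  CPU_GENERIC,
  CPU_THUNDERX,
  CPU_FALKOR,
  CPU_THUNDERX2,
  CPU_THUNDERX2PA
};

/* Records which stand-in routine ran last, for demonstration only.  */
static const char *last_used = "";

/* Stand-ins for the per-CPU routines; each just forwards to memcpy.  */
static void *
copy_generic (void *d, const void *s, size_t n)
{
  last_used = "generic";
  return memcpy (d, s, n);
}

static void *
copy_thunderx (void *d, const void *s, size_t n)
{
  last_used = "thunderx";
  return memcpy (d, s, n);
}

static void *
copy_falkor (void *d, const void *s, size_t n)
{
  last_used = "falkor";
  return memcpy (d, s, n);
}

static void *
copy_thunderx2 (void *d, const void *s, size_t n)
{
  last_used = "thunderx2";
  return memcpy (d, s, n);
}

/* Pick an implementation for the given CPU.  Both ThunderX2 identifiers
   map to the same routine; unknown CPUs fall back to the generic one.  */
static memcpy_fn
select_memcpy (enum cpu_kind cpu)
{
  switch (cpu)
    {
    case CPU_THUNDERX:
      return copy_thunderx;
    case CPU_FALKOR:
      return copy_falkor;
    case CPU_THUNDERX2:
    case CPU_THUNDERX2PA:
      return copy_thunderx2;
    default:
      return copy_generic;
    }
}
```

A caller would use it as `select_memcpy (cpu) (dst, src, n)`. Each new microarchitecture adds another case and another routine, which is the growth the reply above is wary of.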

Follow-Ups:
- Re: [Patch][Aarch64] memcpy IFUNC for Cavium ThunderX2
  - From: Steve Ellcey

References:
- [Patch][Aarch64] memcpy IFUNC for Cavium ThunderX2
  - From: Steve Ellcey
