This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [Patch] aarch64: ThunderX2 specific memcpy and memmove

From: "Saharoy, Saikat" <Saikat dot Saharoy at cavium dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
Cc: nd <nd at arm dot com>
Date: Tue, 30 May 2017 23:19:55 +0000
Subject: Re: [Patch] aarch64: ThunderX2 specific memcpy and memmove
References: <AM5PR0802MB261046BF7DC3C6A0112BAC8183F00@AM5PR0802MB2610.eurprd08.prod.outlook.com>

Hi Wilco,

Thanks for your comments and suggestions.

The memcpy/memmove implementations for ThunderX2 have been fixed to pass the glibc tests. The issue was a missing return of the destination address, which the glibc tests check (sorry about the oversight).
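
For readers following the thread, here is a minimal sketch of the return-value contract in question. It is not the ThunderX2 code: `sketch_memcpy` and `check_copy` are hypothetical names used only for illustration, and the byte loop stands in for whatever optimized copy an implementation uses. The point is only that the function must hand back the original destination pointer, and that a test can verify this alongside the copied bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an optimized memcpy (illustration only). */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Simple forward byte copy; d and s advance, dst stays untouched. */
    while (n--)
        *d++ = *s++;

    /* memcpy must return the original destination pointer. */
    return dst;
}

/* Hypothetical checker: returns 1 only if the copy function returns
   the destination pointer and the bytes were copied correctly. */
int check_copy(void *(*copy)(void *, const void *, size_t))
{
    char src[16] = "thunderx2";
    char dst[16] = {0};

    void *ret = copy(dst, src, sizeof src);
    return ret == dst && memcmp(dst, src, sizeof src) == 0;
}
```

An implementation that copies the bytes correctly but leaves some other value in the return register would fail the `ret == dst` half of this check, which matches the kind of failure described above.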

We are working on further modifications to reduce 'branch' overhead and will resubmit the patch when ready.

Thanks,

Saikat
________________________________________
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Sent: Tuesday, May 30, 2017 5:23:24 AM
To: Saharoy, Saikat; libc-alpha@sourceware.org
Cc: nd
Subject: Re: [Patch] aarch64: ThunderX2 specific memcpy and memmove

Hi Saikat,

Is this the correct patch you intended to post? I'm asking because
neither memcpy nor memmove pass any tests (it's obvious from the
patch that the overlap case is completely missing from memmove),
the code is absolutely humongous (17KB!!!) and very inefficient due to
using inline assembler.
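
To illustrate what "the overlap case" means here, the following is a minimal sketch, assuming a plain byte-wise copy; `sketch_memmove` is a hypothetical name and this is not the generic glibc routine or the submitted patch. When the destination starts inside the source range, a forward copy would overwrite source bytes before they are read, so the copy has to run backward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-wise memmove (illustration only). */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned subtraction wraps when d < s, so this single compare
       is true only when dst lies in [src, src + n). */
    if ((uintptr_t)d - (uintptr_t)s < n) {
        /* Overlap with dst at or above src: copy from the end backward. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    } else {
        /* No harmful overlap: a forward copy is safe. */
        while (n--)
            *d++ = *s++;
    }

    /* Like memcpy, memmove returns the original destination pointer. */
    return dst;
}
```

For example, moving the first six bytes of "abcdefgh" two positions to the right within the same buffer should yield "ababcdef"; a forward-only copy would instead repeat "ab" across the buffer.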

For any resubmission I'd recommend you first run all GLIBC tests and
benchmarks plus do a SPEC comparison run against the generic memcpy.

Due to the huge size and large number of branches I would expect it
to be significantly slower in the real world, even if it seems faster in
some microbenchmarks. glibc/benchtests/bench-memcpy-random is a
quick way to verify this as it is based on the memcpy size distribution
in SPEC.

Cheers,
Wilco

