This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH 1/3] Use generic vector computations in s_sincosf.h

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: 'GNU C Library' <libc-alpha at sourceware dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: nd <nd at arm dot com>
Date: Fri, 14 Dec 2018 19:39:47 +0000
Subject: Re: [PATCH 1/3] Use generic vector computations in s_sincosf.h

Hi,

I finally had a chance to benchmark this on a recent core using the traces posted
here: https://www.sourceware.org/ml/libc-alpha/2018-12/msg00492.html
I used the same trace for sincosf, but that requires a small wrapper to measure
latency and throughput accurately. I'll post the patch for this next week.

On Sandy Bridge I get a 10% gain in throughput with the vector FMA version of sincosf
(at the cost of ~2% extra latency), but unfortunately it's 5% slower on AArch64,
and latency increases by 10% due to GCC using some unnecessary DUPs and
lane inserts. Sinf and cosf are also affected by the data layout change: their
throughput and latency are 1.7% and 1.0% worse respectively.

So it seems the best approach would be to make the vector version conditional on
a macro defined by the target so each target can choose the fastest variant.
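
A minimal sketch of what such a conditional could look like. The macro name
SINCOSF_USE_VECTOR, the function sincos_poly and the truncated Taylor
coefficients are all hypothetical and chosen for illustration only; they are
not taken from the patch or from s_sincosf.h. The vector path assumes the
GCC/Clang generic vector extension.

```c
/* Hypothetical target macro (the name is an assumption): a target header
   would define it to 1 when the vector variant is faster there.  */
#ifndef SINCOSF_USE_VECTOR
# define SINCOSF_USE_VECTOR 0
#endif

/* Truncated Taylor coefficients, for illustration only; these are not
   the polynomials used in glibc.  */
#define S1 (-1.0 / 6)
#define S2 (1.0 / 120)
#define C1 (-1.0 / 2)
#define C2 (1.0 / 24)

#if SINCOSF_USE_VECTOR
/* GCC/Clang generic vector type holding two doubles.  */
typedef double v2df_t __attribute__ ((vector_size (2 * sizeof (double))));
#endif

/* Approximate sin (x) and cos (x) for small |x|.  */
static void
sincos_poly (double x, double *sinp, double *cosp)
{
  double x2 = x * x;
#if SINCOSF_USE_VECTOR
  /* Lane 0 evaluates the sine polynomial, lane 1 the cosine one.  */
  v2df_t vx2 = { x2, x2 };
  v2df_t one = { 1.0, 1.0 };
  v2df_t c1 = { S1, C1 };
  v2df_t c2 = { S2, C2 };
  v2df_t r = one + vx2 * (c1 + vx2 * c2);
  *sinp = x * r[0];
  *cosp = r[1];
#else
  /* Scalar variant: the same two polynomials evaluated separately.  */
  *sinp = x * (1.0 + x2 * (S1 + x2 * S2));
  *cosp = 1.0 + x2 * (C1 + x2 * C2);
#endif
}
```

With this layout a target would opt in by defining the macro to 1 in its own
header (or via -DSINCOSF_USE_VECTOR=1 when experimenting), and all other
targets keep the scalar code unchanged.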

Wilco

Follow-Ups:
- Re: [PATCH 1/3] Use generic vector computations in s_sincosf.h
  - From: Carlos O'Donell
