This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

From: Richard Henderson <rth at twiddle dot net>
To: munroesj at linux dot vnet dot ibm dot com
Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Carlos Eduardo Seo <cseo at linux dot vnet dot ibm dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
Date: Tue, 30 Jun 2015 07:49:59 +0100
Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
Authentication-results: sourceware.org; auth=none
References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <5576FC80 dot 1090806 at arm dot com> <1433862393 dot 21101 dot 9 dot camel at sjmunroe-ThinkPad-W500> <5591239A dot 9030907 at twiddle dot net> <1435603025 dot 5485 dot 23 dot camel at oc7878010663>

On 06/29/2015 07:37 PM, Steven Munroe wrote:

On Mon, 2015-06-29 at 11:53 +0100, Richard Henderson wrote:

On 06/09/2015 04:06 PM, Steven Munroe wrote:

On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:


On 08/06/15 22:03, Carlos Eduardo Seo wrote:

The proposed patch adds a new feature for powerpc. In order to get
faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
This enables users to write versioned code based on the HWCAP bits
without going through the overhead of reading them from the auxiliary
vector.

i assume this is for multi-versioning.


The intent is for the compiler to implement the equivalent of
__builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
is RISC so we use the HWCAP. The trick is to access the HWCAP[2]
efficiently, as getauxval and scanning the auxv are too slow for inline
optimizations.


There is getauxval(), which doesn't scan auxv for HWCAP[2], but rather reads
the variables private to glibc that already contain this information.  That
ought to be fast enough for the builtin, rather than consuming space in the TCB.
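
As a rough illustration of the getauxval() route described above, the sketch below reads AT_HWCAP2 once and keeps the result for later tests. The helper name hwcap2_has() is hypothetical and is not part of the patch or of glibc; on powerpc a caller would presumably pass one of the PPC_FEATURE2_* masks.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 */

/* Hypothetical helper: true if every bit in `mask` is set in AT_HWCAP2.
   getauxval() returns 0 when the entry is absent, so a missing
   AT_HWCAP2 simply reports "not supported".  The cache is not
   synchronised; concurrent first calls would just store the same value. */
static bool
hwcap2_has (unsigned long mask)
{
  static unsigned long cached;
  static bool loaded;

  if (!loaded)
    {
      cached = getauxval (AT_HWCAP2);
      loaded = true;
    }
  return mask != 0 && (cached & mask) == mask;
}
```

This still costs a call on first use and a load plus compare afterwards, so it only shows the interface, not a measurement of either side of the argument.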


Richard, I do not understand how a 38 instruction function accessed via a
PLT call stub (minimum 4 additional instructions) is equivalent or "as
good as" a single in-line load instruction.

Even with best case path for getauxval HWCAP2 we are at 14 instructions
with exposure to 3 different branch mispredicts. And that is before
the application can execute its own __builtin_cpu_supports() test.

Let's look at a real customer example. The customer wants to use the P8
128-bit add/sub but also wants to be able to unit test code on existing
P7 machines. Which results in something like this:

static inline vui32_t
vec_addcuq (vui32_t a, vui32_t b)
{
         vui32_t t;

                 if (__builtin_cpu_supports("PPC_FEATURE2_HAS_VSX"))
                 {

                         __asm__(
                             "vaddcuq %0,%1,%2;"
                             : "=v" (t)
                             : "v" (a),
                               "v" (b)
                             : );

...


So it is clear to me that executing 14+ instructions to decide if I can
optimize to use new single instruction optimization is not a good deal.

This is a horrible way to use this builtin. In the same way that using ifunc at this level would also be horrible.

Even supposing that this builtin uses a single load, you've at least doubled the overhead of using the insn. The user really should be aware of this and manually hoist this check much farther up the call chain. At which point the difference between 2 cycles for a load and 40 cycles for a call is immaterial.
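
A minimal sketch of what hoisting the check could look like, assuming a whole loop is the unit being specialised. Every name here (cpu_has_fast_path, sum_generic, sum_fast, sum_all) is hypothetical, and the "fast" variant is only a placeholder that reuses the generic loop so the example runs on any machine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the run-time feature test; a real build
   might call __builtin_cpu_supports() or getauxval() here. */
static bool
cpu_has_fast_path (void)
{
  return false;
}

typedef unsigned int (*sum_fn) (const unsigned int *, size_t);

/* Portable loop that works on every CPU. */
static unsigned int
sum_generic (const unsigned int *v, size_t n)
{
  unsigned int s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Placeholder for a variant built around the new instruction.  It
   reuses the generic loop so the sketch stays portable. */
static unsigned int
sum_fast (const unsigned int *v, size_t n)
{
  return sum_generic (v, n);
}

/* The feature test runs once per call to sum_all, not once per
   element, so its cost is spread over the whole loop. */
static unsigned int
sum_all (const unsigned int *v, size_t n)
{
  sum_fn impl = cpu_has_fast_path () ? sum_fast : sum_generic;
  return impl (v, n);
}
```

With the test outside the loop, whether it costs a single load or a full call matters less and less as n grows.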

And if the user is really concerned about unit tests, surely ifdefs are more appropriate for this situation. At the moment one can only test the P7 path on P7 and the P8 path on P8; better if one can also test the P7 path on P8.
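
One possible shape for the ifdef approach, as a sketch only. _ARCH_PWR8 is the macro GCC predefines when targeting POWER8; FORCE_GENERIC_PATH, HAVE_P8_PATH and add_carry_out are hypothetical names. The body of the P8 branch is a portable placeholder, not the vaddcuq code from the example earlier in the thread.

```c
#include <stdint.h>

/* Building with -DFORCE_GENERIC_PATH selects the fallback even when the
   compiler targets POWER8, so the P7 path can be unit tested on P8. */
#if defined (_ARCH_PWR8) && !defined (FORCE_GENERIC_PATH)
# define HAVE_P8_PATH 1
#else
# define HAVE_P8_PATH 0
#endif

/* Carry-out (0 or 1) of a 32-bit unsigned addition. */
static uint32_t
add_carry_out (uint32_t a, uint32_t b)
{
#if HAVE_P8_PATH
  /* Placeholder for the POWER8-specific code: widen and take bit 32. */
  return (uint32_t) (((uint64_t) a + b) >> 32);
#else
  /* Fallback: the wrapped sum is smaller than an operand iff it carried. */
  return (uint32_t) ((uint32_t) (a + b) < a);
#endif
}
```

The selection happens at compile time, so neither build pays for a run-time test.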

r~

Follow-Ups:
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Steven Munroe

References:
- [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Carlos Eduardo Seo
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Szabolcs Nagy
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Steven Munroe
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Richard Henderson
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Steven Munroe
