This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is method to determine cpu features, not selection mechanism.


On Wed, 2015-06-10 at 14:50 +0200, Ondřej Bílka wrote:
> On Tue, Jun 09, 2015 at 11:01:24AM -0500, Steven Munroe wrote:
> > On Tue, 2015-06-09 at 17:42 +0200, Ondřej Bílka wrote:
> > > On Tue, Jun 09, 2015 at 10:06:33AM -0500, Steven Munroe wrote:
> > > > On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> > > > > 
> > > > > On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > > > > > The proposed patch adds a new feature for powerpc. In order to get
> > > > > > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > > > > > This enables users to write versioned code based on the HWCAP bits
> > > > > > without going through the overhead of reading them from the auxiliary
> > > > > > vector.
> > > > 
> > > > > i assume this is for multi-versioning.
> > > > 
> > > > The intent is for the compiler to implement the equivalent of
> > > > __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> > > > is RISC so we use the HWCAP. The trick is to access the HWCAP[2]
> > > > efficiently, as getauxval and scanning the auxv are too slow for inline
> > > > optimizations.
> > > > 
>Snip

After all was said and done, much more was said than done ....

Sorry, I have been on vacation and then catching up on the day job after
being on vacation.

But I think we need to reset the discussion and hopefully eliminate some
misconceptions:

1) This is not about the clever things that this community knows how to
do; it is about what the average Linux application developer is willing
to learn and use.

I have tried to get them to use CPU platform libraries (library search
based on AT_PLATFORM), the AuxV and HWCAP directly, and IFUNC. They
will not do this.
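
For illustration, a minimal sketch of the "AuxV and HWCAP directly"
approach on Linux with glibc. getauxval(), AT_HWCAP and AT_PLATFORM are
real; DEMO_FEATURE_BIT and the helper names are hypothetical, and the
real bit assignments come from the platform's kernel headers.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_PLATFORM (glibc 2.16+) */

/* Hypothetical feature mask, for illustration only. Real bit
   assignments are defined by the platform's kernel headers. */
#define DEMO_FEATURE_BIT 0x00000080UL

/* Return 1 if every bit in `mask` is set in `hwcap`, else 0. */
static int hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}

/* Print what the auxiliary vector reports for this process. */
static void report_auxv(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);
    const char *platform = (const char *) getauxval(AT_PLATFORM);

    printf("AT_PLATFORM: %s\n", platform ? platform : "(none)");
    printf("AT_HWCAP:    0x%lx\n", hwcap);
    printf("demo feature present: %d\n",
           hwcap_has(hwcap, DEMO_FEATURE_BIT));
}
```

Each getauxval() call is a function call that walks the auxiliary
vector, which is the overhead the proposal is trying to avoid for
inline checks.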

They think this is all silly and too complicated. But we still want them
to exploit features of the latest processor while continuing to run on
existing processors in the field. Processor architectures evolve and we
have to give them a simple mechanism that they will actually use, to
handle this.  __builtin_cpu_supports() seems to be something they will
use.
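
As a hedged sketch of what that usage could look like from the
application side: one baseline routine, one stand-in for a tuned routine,
and a single feature test choosing between them. The function names are
hypothetical, and the feature strings ("avx2", "vsx") are assumptions
about what a given compiler and target accept; the sketch falls back to
the baseline where the builtin is not available.

```c
#include <stddef.h>

/* Baseline version: runs on every processor. */
static long sum_baseline(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a version tuned for a newer processor. Here it only
   unrolls by two; a real one would use the newer instructions. */
static long sum_tuned(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        total += (long) v[i] + v[i + 1];
    if (i < n)
        total += v[i];
    return total;
}

/* Return 1 if the (assumed) feature is reported, else 0. */
static int have_newer_feature(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#elif defined(__GNUC__) && defined(__powerpc__)
    /* Assumes a compiler and C library that provide this on POWER. */
    return __builtin_cpu_supports("vsx") != 0;
#else
    return 0;   /* no builtin available: always take the baseline */
#endif
}

/* Single entry point the application calls. */
long sum_ints(const int *v, size_t n)
{
    return have_newer_feature() ? sum_tuned(v, n) : sum_baseline(v, n);
}
```

The application developer writes one if-test and never touches the
auxiliary vector, IFUNC resolvers, or library search paths.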

2) This is not about exposing a private GLIBC resource (TCB) to the
compiler. The TCB and TLS are part of the Platform ABI and must be known,
used, and understood by the compiler (GCC, LLVM, ...), binutils,
debuggers, etc. in addition to GLIBC:

Power Architecture 64-Bit ELF V2 ABI Specification, OpenPOWER ABI for
Linux Supplement: Section 3.7.2 TLS Runtime Handling

This and other useful documents are available from the OpenPOWER
Foundation: http://openpowerfoundation.org/

If you look, you will see that TCB slots have already been allocated to
support other PowerISA specific features like Event Based Branching,
Dynamic System Optimization, and Target Address Save. Recently we added
split-stack support for the Go language that required a TCB slot. So
adding a double word slot to cache AT_HWCAP and AT_HWCAP2 is no big
deal.
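
A rough sketch of the idea, not the actual tcbhead_t layout: the struct,
the field order, the placeholder slots, and the choice of which word
holds AT_HWCAP versus AT_HWCAP2 are all assumptions made for
illustration. It also assumes each capability word fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical TCB fragment; NOT the real layout in tls.h. */
typedef struct {
    uint64_t hwcap;           /* one double word caching both words */
    uint64_t other_slots[7];  /* placeholder for unrelated slots */
} demo_tcb_t;

/* The sketch stays within a single 128-byte cache line. */
_Static_assert(sizeof(demo_tcb_t) <= 128, "demo TCB exceeds 128 bytes");

/* Pack both words into one slot. Which half holds which word is an
   assumption of this sketch. */
static uint64_t pack_hwcap(uint32_t hwcap, uint32_t hwcap2)
{
    return ((uint64_t) hwcap2 << 32) | hwcap;
}

/* Test AT_HWCAP bits: one load from the slot plus a mask. */
static int tcb_has_hwcap(const demo_tcb_t *tcb, uint32_t mask)
{
    return ((uint32_t) tcb->hwcap & mask) == mask;
}

/* Test AT_HWCAP2 bits from the upper half of the same slot. */
static int tcb_has_hwcap2(const demo_tcb_t *tcb, uint32_t mask)
{
    return ((uint32_t) (tcb->hwcap >> 32) & mask) == mask;
}
```

In the real proposal the compiler would read the slot at a fixed offset
from the thread pointer, so a feature test reduces to a load and a mask
with no function call.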

So far this all fits nicely in a single 128 byte cache-line. The TLS ABI
(which I defined back in 2004) reserved a full 4KB for the TCB and
extensions.

None of this was done lightly, and it was discussed extensively with the
appropriate developers in the corresponding projects. You all may not
have seen this because GLIBC is not directly involved except as the owner
of ./sysdeps/powerpc/nptl/tls.h.

The only reason we raised this discussion here is that we wanted to
publish a platform specific API
in ./sysdeps/unix/sysv/linux/powerpc/bits/ppc.h to make it easier for
the compilers to access it. And we felt it would be rude not to discuss
this with the community.

3) I would think that the platform maintainers would have the standing
to implement their own platform ABI? Perhaps the project maintainers
would like to weigh in on this?

4) I have asked Carlos Seo to develop some micro benchmarks to illuminate
the performance implications of the various alternatives to the direct
TCB access proposal. If necessary, we can provide detailed cycle-accurate
instruction pipeline timings.








