This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



[RFC] Additional targets for powerpc


Introduction

It has become increasingly clear that the PowerPC processor family needs 
additional (machine) targets for the Linux distributions. At present, 
Linux has only two targets for PowerPC: powerpc32 and powerpc64 (powerpc 
is a synonym for powerpc32). These targets address only the two operating 
modes (32- and 64-bit), not the wide range of processor families and chips 
available. With only one target per mode, we are forced to compile for a 
common subset of powerpc instructions and default instruction scheduling. 

As we have PowerPC processors ranging from embedded systems to large 
servers, this means we are sacrificing performance for commonality even 
when it is not strictly required. The PowerPC architecture has been around 
a long time, and consequently the common subset (for tuning and 
instructions) has become less and less relevant for systems that have 
actually shipped in the last few years.  Moreover, a common subset 
prevents exploitation of microarchitectural differences between POWER4, 
POWER5, and ppc970.  The addition of a new processor called POWER6 
(http://www-128.ibm.com/developerworks/power/newto.html) may engender even 
more microarchitectural differences. 

The goal of this proposal is to:

* Improve application performance on current distributions
* Allow applications running on a machine to exploit the CPU-specific 
tuned libraries available on that machine
* Provide a general framework for CPU-specific tuning for the PowerPC 
architecture 

Approach

The approach we are proposing is to:

* Allow multiple processor-specific (performance-tuned) assembler 
implementations of core memory and string functions (memcpy, memset, 
memcmp, ...). 
* Allow multiple processor-specific (performance-tuned) implementations of 
math library (libm) functions. 
* Allow the compiler (and assembler implementations) to use new 
instructions beyond the common powerpc subset (-mcpu=).
* Allow the compiler to tune (schedule instructions) for specific 
processor families (-mtune=). 
* Allow the tuning of various glibc functions based on the processor 
family. For example, the malloc DEFAULT_MMAP_THRESHOLD should be higher on 
a POWER4/5 server. 
* Allow distros to build multiple (processor-tuned) versions of the glibc 
libraries and install the correct version on the target system.

The intent is to be similar to IA32 with the i386/i486/i586/i686/i786 
machine targets and to sparc with the sparc/sparcv8/sparcv9/sparcv9b 
machine targets. The added twist for powerpc is biarch support for the 
powerpc32 and powerpc64 ABIs, so each 64-bit machine target needs a suffix 
to distinguish the 32- and 64-bit ABIs. Sparc is similar with the 
sparc64/sparc64b machine targets, but the issue is more pervasive in 
PowerPC because all POWER3/4/5 (and 970) machines are 64-bit 
implementations that support both ABIs, but may require different tuning.

So I am proposing to add new "machine" targets to the powerpc family. The 
target names will follow the POWER3, POWER4, POWER5, ... naming of the 
current IBM Server brands and add a _32/_64 suffix to support biarch 
systems.

Retain (compatible with all existing Linux on Power systems)
 powerpc (a synonym for powerpc32)
 powerpc32
 powerpc64
And add
 power4_32
 power4_64
 power5_32
 power5_64
 ppc970_32
 ppc970_64
Or alternatively
 powerpc32_power4
 powerpc64_power4 
 powerpc32_power5
 powerpc64_power5
 powerpc32_970
 powerpc64_970

I see no need to support separate (from the existing powerpc32/64) POWER3 
and RS64IV targets at this time.  The POWER3 systems are quite old and the 
RS64IV systems are "strongly storage consistent" machines.  The POWER4, 
POWER5, and PPC970 processors allow "weak storage consistency" and are 
more aggressively pipelined for out-of-order instruction execution. This 
difference requires very different instruction scheduling for optimal 
performance. 
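
To make the two naming schemes concrete, here is a minimal sketch (in 
Python, purely illustrative) that splits a proposed target name into a 
processor family and an ABI word size. The helper name parse_target and 
the "generic" label for the existing targets are assumptions made for this 
example; they are not part of config.sub or of any existing tool.

```python
import re

# Hypothetical helper, not part of config.sub: recognize only the target
# names listed in this proposal and split them into (family, word size).
_SUFFIX_FORM = re.compile(r"(power4|power5|ppc970)_(32|64)")
_PREFIX_FORM = re.compile(r"powerpc(32|64)_(power4|power5|970)")


def parse_target(name):
    """Return (family, wordsize) for a proposed powerpc target name."""
    # Existing targets: "powerpc" is a synonym for "powerpc32".
    if name in ("powerpc", "powerpc32"):
        return ("generic", 32)
    if name == "powerpc64":
        return ("generic", 64)

    # First scheme: <family>_<wordsize>, e.g. power4_32.
    m = _SUFFIX_FORM.fullmatch(name)
    if m:
        return (m.group(1), int(m.group(2)))

    # Alternative scheme: powerpc<wordsize>_<family>, e.g. powerpc64_970.
    m = _PREFIX_FORM.fullmatch(name)
    if m:
        family = m.group(2)
        if family == "970":
            family = "ppc970"  # normalize to the name used by the first scheme
        return (family, int(m.group(1)))

    raise ValueError("unknown powerpc target: " + name)
```

Under either scheme the same two pieces of information are recovered, so 
the choice between them is mostly a matter of how the names sort and read.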

Glibc and other package changes

The changes needed to enable additional targets for glibc include:

* Add the new machine targets to ./scripts/config.sub (and in autoconf)
* Update the base_machine and machine mapping for the new targets in 
./configure.in
* Add the new target patterns to ./shlib-versions and 
./nptl/shlib-versions
* Provide additional ./scripts/data/c++-types-power*-linux-gnu.data files 
to match the new machine targets.
* Update the ./abilist/* files to cover the new machine targets.

The various targets need to be represented in the CVS directory structure 
of glibc. Each of the new targets we are proposing supports both 32- and 
64-bit modes, compatible with the current powerpc32 or powerpc64 targets. 

So the current directory structure will be extended with subdirectories 
under the current powerpc[32|64] directories. For example, directory 
./sysdeps/powerpc/powerpc32 contains 32-bit implementations common to 
powerpc, while ./sysdeps/powerpc/powerpc32/power4 contains 32-bit 
implementations that can use instructions or optimizations available on 
POWER4 processors. Similarly for 64-bit: 
./sysdeps/powerpc/powerpc64/power4. And finally the directory 
./sysdeps/powerpc/powerpc32/powerpc64 could contain 32-bit code that uses 
instructions only available on 64-bit powerpc implementations.
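
The layout above implies a most-specific-first search order. A rough 
sketch of that order follows (Python, illustrative only). The function 
name sysdeps_dirs is hypothetical, and the position of the 
powerpc32/powerpc64 directory in the list is an assumption; the real 
order would be whatever the glibc configure machinery derives.

```python
def sysdeps_dirs(wordsize, cpu=None):
    """List sysdeps directories for a target, most specific first.

    wordsize is 32 or 64; cpu is a family directory name such as
    "power4", or None for the existing generic targets.
    """
    base = "./sysdeps/powerpc"
    abi = "%s/powerpc%d" % (base, wordsize)

    dirs = []
    if cpu is not None:
        # Processor-specific implementations come first.
        dirs.append("%s/%s" % (abi, cpu))
        if wordsize == 32:
            # Assumption: every proposed CPU target is 64-bit hardware, so
            # 32-bit code may also use the 64-bit-hardware-only directory.
            dirs.append(abi + "/powerpc64")
    dirs.append(abi)   # common to the 32- or 64-bit ABI
    dirs.append(base)  # common to all powerpc
    return dirs
```

For a 32-bit POWER4 build this yields powerpc32/power4, then 
powerpc32/powerpc64, then powerpc32, then powerpc.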

The config.guess script is a bit problematic but not strictly required to 
support this proposal. Config.guess depends on "uname --machine" to guess 
the machine target. However, the powerpc64 kernel currently reports "ppc64" 
for all models. So unless the kernel is changed to report different 
machine strings, or the uname command is enhanced to report useful 
"--processor" data, updating config.guess is moot. This is not critical, as 
a biarch glibc build should not depend on config.guess anyway, and other 
projects will be safe with the default powerpc/powerpc64 targets.

Finally we need to provide more information in the Aux Vector AT_HWCAP. 
The AT_HWCAP is used by rpm to select libraries to match the processor at 
install (at least for i[34567]86 Linux systems). We will need to add 
AT_HWCAP flags to allow rpm to do the same for powerpc.

Detailed discussion

Note: I am ignoring the little-endian variants powerpcle/powerpc64le 
because I don't know of anyone building those for Linux.

Note: I am not ignoring the Apple G5 in this discussion. The IBM970 chip 
core is derived from the POWER4+ core, so any tuning (-mtune=power4) for 
POWER4 benefits the G5 for both 32- and 64-bit applications. But this 
tuning would not benefit 32-bit applications running on a G3 or G4. 
The processors (G3 vs G4 vs G5) are from different manufacturers and have 
very different internal structures (micro-architectures). 

The ppc970 processor raises an interesting question. If the ppc970 
resembles the POWER4+, do we need a separate (from power4) target for 
ppc970? The ppc970 is a 64-bit implementation based on the POWER4+, with 
the addition of the Altivec vector SIMD instructions (two additional 
execution pipelines). Our analysis is that glibc (libc, libm, libpthread, 
...) would not benefit from direct exploitation of the Altivec instruction 
set. So a power4 target would be enough for glibc. 

While our current proposal is focused on glibc, other libraries/projects 
(gd, jpeg, libtiff, mad, ...) might benefit from using Altivec. This will 
become more attractive in the gcc-4.1 timeframe where autovectorization 
will be fully functional. So we should add ppc970 targets for 
completeness. 

In the PowerPC Architecture there are several FPU instructions that are 
listed as "optional" but implemented on all current 64-bit hardware. There 
are also instructions that are defined only for 64-bit hardware and usable 
in 32-bit mode. 

Optional instructions:
Store Floating-Point as Integer Word Indexed (stfiwx)
Floating Square Root (fsqrt)
Floating Square Root Single (fsqrts)
Floating Reciprocal Estimate Single (fres)
Floating Reciprocal Square Root Estimate (frsqrte)
Floating Select (fsel)

64-bit hardware only instructions, usable in 32-bit mode:
Floating Convert To Integer Doubleword (fctid)
Floating Convert To Integer Doubleword with round toward Zero (fctidz)
Floating Convert From Integer Doubleword (fcfid)

Instructions added for POWER5:
Bytewise popcount (popcntb)
Floating Reciprocal Estimate Double (fre)
Data Cache Block Flush Local (dcbfl)

With the current generic powerpc targets these instructions are not 
generated by gcc.

We also need to identify the processor type from the AT_HWCAP aux vector. 
For example we could use the following:

HWCAP bits                                     Processor type
PPC_FEATURE_POWER4                             power4
PPC_FEATURE_POWER5                             power5
PPC_FEATURE_POWER4 + PPC_FEATURE_HAS_ALTIVEC   ppc970

Note: PPC_FEATURE_64 is an existing bit that is set for all 64-bit powerpc 
kernels. PPC_FEATURE_HAS_ALTIVEC is an existing bit that is set for the 
970 processors. 
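
The mapping above could be decoded along the following lines. This is 
only a sketch in Python: the numeric bit values are assumptions chosen for 
illustration (PPC_FEATURE_POWER4 and PPC_FEATURE_POWER5 are the bits this 
proposal would add), and the function name processor_type is hypothetical. 
A real implementation would take the values from the kernel headers and 
read AT_HWCAP from the aux vector.

```python
# Bit values are assumptions for illustration; take the real ones from
# the kernel headers on the target system.
PPC_FEATURE_64 = 0x40000000
PPC_FEATURE_HAS_ALTIVEC = 0x10000000
PPC_FEATURE_POWER4 = 0x00080000
PPC_FEATURE_POWER5 = 0x00040000


def processor_type(hwcap):
    """Map an AT_HWCAP word to a processor type name from the table."""
    if hwcap & PPC_FEATURE_POWER5:
        return "power5"
    if hwcap & PPC_FEATURE_POWER4:
        # POWER4 plus Altivec identifies the 970, so test Altivec before
        # falling back to plain power4.
        if hwcap & PPC_FEATURE_HAS_ALTIVEC:
            return "ppc970"
        return "power4"
    # Neither bit set: stay with the generic powerpc/powerpc64 targets.
    return "powerpc"
```

Note that Altivec alone does not select ppc970 in this sketch: a 32-bit G4 
also sets PPC_FEATURE_HAS_ALTIVEC but should stay on the generic target.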

One problem remains. For glibc at least, the i[34567]86 and 
sparc[v8,v9,v9b] targets allow for customized assembler implementations 
for each variant, but this does not result in adjustments to the gcc 
-mcpu and -mtune options. The directory structure above does allow the 
opportunity to add Makefile fragments in architecture-specific 
directories. For example, add a Makefile fragment to 
./sysdeps/powerpc/powerpc64/power5 with the line:

+cflags += -mcpu=power4 -mtune=power5

These +cflags options are applied to all *.c compiles in the build but 
not to *.S compiles. This would allow gcc to use the instructions 
implemented by POWER4 (-mcpu=power4) with instruction scheduling 
appropriate for the POWER5 processor (-mtune=power5). The resulting code 
would still be portable to POWER4 and PPC970 systems.

Another example, add a Makefile fragment to 
./sysdeps/powerpc/powerpc64/ppc_970 with the line:

+cflags += -mcpu=970 -maltivec -mabi=altivec

This would enable the full PowerPC instruction architecture plus 
VMX/Altivec. GCC would be allowed to use VMX instructions, via 
autovectorization, even in code that did not explicitly use altivec.h 
types. The resulting libraries would not be portable to POWER4/5 systems 
but would be optimized for the 970 (IBM JS20 and Apple G5).
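
The two Makefile fragments above can be summarized as a small table from 
processor directory to compiler flags. The sketch below (Python, 
illustrative only) generates the fragment line for a given family. The 
names CPU_CFLAGS and makefile_fragment are hypothetical, and only the two 
flag sets given in this proposal are included; no flags are assumed for 
the other targets.

```python
# Flag sets taken from the two examples in this proposal. Other families
# are deliberately left out rather than guessed.
CPU_CFLAGS = {
    "power5": ["-mcpu=power4", "-mtune=power5"],
    "ppc970": ["-mcpu=970", "-maltivec", "-mabi=altivec"],
}


def makefile_fragment(cpu):
    """Return the +cflags line for a processor family, or "" if none."""
    flags = CPU_CFLAGS.get(cpu)
    if flags is None:
        # Unknown family: emit nothing, so the generic defaults apply.
        return ""
    return "+cflags += " + " ".join(flags) + "\n"
```

Keeping the flags in one table makes the portability trade-off visible: 
the power5 entry keeps -mcpu=power4 so the code still runs on POWER4 and 
PPC970, while the ppc970 entry gives that up in exchange for Altivec.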


Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

