This is the mail archive of the
mailing list for the binutils project.
QNX PIC for Mips - RFC
- From: "Graeme Peterson" <gp at qnx dot com>
- To: binutils at sources dot redhat dot com
- Date: Wed, 24 Jul 2002 14:14:18 -0400 (EDT)
- Subject: QNX PIC for Mips - RFC
I am working on rolling in the QNX Neutrino support
for arm, mips, ppc, and sh4.
Here is a doc describing the QNX PIC convention for
Mips. Any and all feedback appreciated. I am particularly
interested in how best to implement this for submission.
This is a preliminary document describing the calling convention
used by PIC code on QNX/Neutrino running on the MIPS.
The MIPS ABI describes a calling convention for implementing
Position-Independent Code ('PIC'). While the ABI calling convention is
well established, it has a couple of drawbacks which make it less than ideal
for use in an embedded environment. These are listed below:
1. Because of the way an ABI PIC function determines the address of its
Global Offset Table ('GOT'), it requires its own address to be passed
in register $25 on function entry. The practical consequence of this
requirement is that all functions (in both the PIC libraries
and the executable) have to do indirect calls, i.e. calls through
register $25. This means that all code must be compiled PIC (and pay the size
penalty of PIC code).
2. The GCC compiler/assembler does not do a great job at code generation for
MIPS 'abicalls' code (PIC code). Unnecessary NOPs are inserted, and the
function prologue always contains the code to compute the GOT address, even
if there are no GOT references within that function.
The first point was particularly troublesome, as it meant that all applications
using a shared library would have to be compiled PIC, which results in
a significant code size increase.
For both of these reasons, we decided to modify the calling convention;
simple modifications were enough to solve the first problem. While coding the
new calling convention in GCC, we also implemented various optimizations, which
reduced the code expansion of PIC code. The following sections describe the
new MIPS PIC convention, hereafter called "QNX PIC".
B. QNX PIC calling convention on MIPS
The calling convention for PIC code follows the ABI spec for register
assignment, stack layout and parameter passing. However, it differs from
the ABI in the following respects:
1. PIC code must never modify the gp ($28) register.
2. PIC code reserves register s7 ($23) to store the address of its GOT.
All symbol references within that PIC module ("library") are made
through the GOT, and are thus addressed as offsets from s7.
3. Every PIC function which needs to access a symbol from the GOT should
load register s7 at the end of the function prologue, before any GOT symbols
are accessed. The code used to load s7 with the address of the GOT is as
follows:
0: lui $s7, %gothi
addiu $s7, $s7, %gotlo
add $s7, $s7, $ra
The %gothi / %gotlo pair are special relocations output by the assembler.
Since the above code implicitly destroys $ra and $s7, they must be saved
in the function prologue before the GOT address is loaded.
4. All function calls from a PIC function have to be indirect calls, done
through a register. However, this does not have to be $25 as in ABI PIC:
la $t3, printf
Note that the notation "printf@got" simply means "offset of address
entry for printf in GOT".
5. All global data references also have to be done through the GOT.
With the changes above, QNX PIC code is truly relocatable, and does not
require the calling code to be compiled PIC. Thus, the non-library code
(the "executables") can be normally-compiled MIPS objects.
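To make the arithmetic in item 3 concrete, here is a minimal C model of the
lui/addiu/add sequence. It is only a sketch, and it rests on two assumptions
that the text above does not state: that %gothi/%gotlo split a 32-bit
displacement the same way the standard MIPS %hi/%lo operators do (the high
half is rounded up to compensate for addiu sign-extending its immediate), and
that $ra holds a known address inside the function when the sequence runs.
The names got_split, split_got_offset, load_got_pointer and check_roundtrip
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 16-bit immediates for the lui/addiu instructions. */
typedef struct {
    uint16_t hi;  /* immediate for lui   (ASSUMED to behave like %hi) */
    uint16_t lo;  /* immediate for addiu (ASSUMED to behave like %lo) */
} got_split;

/* Split a 32-bit displacement. The +0x8000 rounds the high half up when
 * the low half will be negative once addiu sign-extends it. */
static got_split split_got_offset(uint32_t offset)
{
    got_split s;
    s.hi = (uint16_t)((offset + 0x8000u) >> 16);
    s.lo = (uint16_t)(offset & 0xFFFFu);
    return s;
}

/* Model of the three instructions; returns the value left in s7. */
static uint32_t load_got_pointer(got_split s, uint32_t ra)
{
    uint32_t s7 = (uint32_t)s.hi << 16;                 /* lui   $s7, hi       */
    s7 += ((uint32_t)s.lo ^ 0x8000u) - 0x8000u;         /* addiu $s7, $s7, lo  */
    s7 += ra;                                           /* add   $s7, $s7, $ra */
    return s7;
}

/* Splitting and recombining must give back ra + offset (mod 2^32). */
static void check_roundtrip(uint32_t offset, uint32_t ra)
{
    assert(load_got_pointer(split_got_offset(offset), ra) == ra + offset);
}
```

For instance, a displacement of 0x18000 splits into hi = 2 and lo = 0x8000;
the sign-extended low half subtracts 0x8000 from 0x20000, which gives back
0x18000 before $ra is added.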
In order for the executable and the library to share global data, we
must define a new copy relocation type. This is similar to what is
already defined in the x86 and PPC ABIs. The new relocation is defined as:
#define R_MIPS_QNX_COPY 126
An R_MIPS_QNX_COPY relocation is emitted by the linker whenever a data symbol
defined in a shared library is used in an executable. It results in space
being allocated for this symbol in the executable's bss. At process startup,
the dynamic linker copies the data from the library to the process, and
ensures that all library code points to the executable's copy of the symbol.
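The startup step described here can be sketched in C as follows. This is
not code from any real dynamic linker: the copy_reloc record, its field
names and apply_copy_reloc are hypothetical, and the sketch assumes the
library reaches the symbol through a single GOT slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one copy relocation. */
typedef struct {
    const void *lib_data;  /* initialised data inside the shared library */
    void       *exe_bss;   /* space reserved in the executable's bss     */
    size_t      size;      /* size of the symbol in bytes                */
    void      **lib_got;   /* the library's GOT slot for this symbol     */
} copy_reloc;

/* Apply one copy relocation at process startup. */
static void apply_copy_reloc(const copy_reloc *r)
{
    assert(r != NULL && r->lib_data != NULL && r->exe_bss != NULL);
    assert(r->lib_got != NULL);

    /* Copy the initial value from the library into the executable. */
    memcpy(r->exe_bss, r->lib_data, r->size);

    /* Redirect the library so it uses the executable's copy from now on. */
    *r->lib_got = r->exe_bss;
}
```

After the call, both the executable (which addresses its bss copy directly)
and the library (which goes through its GOT) see the same storage.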
D. Calling library functions from the main executable
Calling functions in the library from non-PIC code (i.e. from the main
executable) must be done through stubs. These are generated automatically
by the linker for any function that is located in a shared library
and is called by the main executable. The stub's purpose is to load that
function's address from the executable's GOT, and then jump to the
function. For example, if the executable calls printf(), then the following
stub will be generated (and the executable will actually call this stub
instead of directly calling printf):
lw $25, printf@got($gp)
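As a rough C analogy of what such a stub does, the sketch below models the
executable's GOT as an array of function pointers. Every name in it
(exe_got, GOT_SLOT_TWICE, lib_twice, resolve_imports, twice_stub) is
invented for illustration, and a real stub jumps to the target rather than
calling it and returning.

```c
#include <assert.h>

typedef int (*func_ptr)(int);

/* Hypothetical stand-in for the executable's GOT: one slot per import. */
enum { GOT_SLOT_TWICE = 0, GOT_SLOT_COUNT };
static func_ptr exe_got[GOT_SLOT_COUNT];

/* Stands in for a function that lives in the shared library. */
static int lib_twice(int x)
{
    return 2 * x;
}

/* What a dynamic linker would do at load time: fill in the slot. */
static void resolve_imports(void)
{
    exe_got[GOT_SLOT_TWICE] = lib_twice;
}

/* The stub: load the address from the GOT, then transfer control to it. */
static int twice_stub(int x)
{
    func_ptr target = exe_got[GOT_SLOT_TWICE];  /* like: lw $25, sym@got($gp) */
    assert(target != 0);                        /* slot must be resolved      */
    return target(x);                           /* jump through the register  */
}
```

The non-PIC caller only ever references twice_stub, whose address is fixed
at link time; the library function's real address is looked up at run time.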
E. Toolchain modifications
In order to implement QNX PIC code generation, the following modifications
to the toolchain were needed:
Modify cc1 so that, when the -mqnxpic option is passed, it generates
code which follows the above calling convention. Note that the code
to compute the GOT address in the function prologue is generated by
the assembler: the compiler outputs the ".cpload" pseudo-op, which the
assembler expands. The compiler also instructs the assembler to generate
QNX PIC code by emitting ".set qnxpiccalls" at the beginning of every
assembly file. An example of cc1 output for QNX PIC code is shown below:
.file 1 "test.c"
.frame $fp,72,$31 # vars= 32, regs= 4/0, args= 24, extra= 0
.cpload $31 # Pseudo-op to load GOT ptr into s7
Thus, registers which need to be saved are pushed on the stack
in the function prologue, including $ra and $s7 which are destroyed
by the ".cpload" pseudo-op.
The GNU assembler ("GAS") was also modified to generate QNX PIC code.
As mentioned above, the ".set qnxpiccalls" pseudo-op can be used to
indicate to the assembler that QNX PIC code is being generated. The
assembler will also expand the ".cpload" pseudo-op into the right
code sequence (including the appropriate relocations).
The assembler's behavior with respect to global symbols
defined in the current source file was modified. The default
behavior is for the assembler to emit a single "section" GOT symbol
for the file's global data, and compute the addresses of the data symbols
as offsets from that section symbol. This has the advantage of saving
GOT entries for global symbols which are only used in the source file
where they are defined, but has the disadvantage that it is impossible
to override which copy of a given global symbol that source file
points to. Thus, when several libraries define the same data symbol,
it may not be possible to have all functions point to the same copy
of that symbol. In the case of QNX PIC code, all global symbols
get a distinct GOT entry, which solves that problem.
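The difference between the two schemes can be illustrated with a small C
model. It is a simplification under stated assumptions: each library's data
section is reduced to a struct holding one global, and the GOT entries are
plain pointers. All names (liba_data, libb_data, bind_counter and so on) are
hypothetical.

```c
#include <assert.h>

/* Each library's data section, reduced to a single global for brevity. */
typedef struct { int counter; } data_section;

static data_section liba_data = { 1 };  /* liba's definition of "counter" */
static data_section libb_data = { 2 };  /* libb's definition of "counter" */

/* Default scheme: libb's GOT holds one entry for its data section, and
 * "counter" is reached at a fixed offset from that section. */
static data_section *libb_got_section = &libb_data;

static int *libb_counter_via_section(void)
{
    return &libb_got_section->counter;
}

/* QNX PIC scheme: "counter" has its own GOT entry in libb, so it can be
 * pointed at whichever definition is chosen at load time. */
static int *libb_got_counter = &libb_data.counter;

static void bind_counter(int *winner)
{
    assert(winner != 0);
    libb_got_counter = winner;
}

static int *libb_counter_via_symbol(void)
{
    return libb_got_counter;
}
```

With the per-symbol entry, calling bind_counter(&liba_data.counter) makes
libb's code use liba's copy. With the section entry there is nothing to
rebind, so libb keeps using its own copy.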
Modifications were also made so that GAS does not emit unnecessary
nops when generating code for mips2+ CPUs. Other optimizations also
included replacing the "nop" in the ".cpload" pseudo-op by an appropriate
op-code, if one was found in the function prologue. The output from GAS
for the above assembly code is shown below:
sw $s0,56($sp) # Assembler optimization
0: lui $s7,0x0 # GOTHI
addiu $s7,$s7,0 # GOTLO
lw $s0,0($s7) # GOT16: offset of printf in GOT
Modifications were also made to 'ld', the GNU linker. The first was to
generate the R_MIPS_QNX_COPY relocations. The second was to have
the linker generate the proper stubs.
F. Toolchain optimizations
GCC code generation was optimized in several ways:
- Calls to static functions within the same module are done
using a branch ('bal') instead of a jump. This is implicitly
- Do not output the .cpload pseudo-op (to load the GOT address into
s7) for functions that do not require it. This includes leaf
functions that do not reference any global data, non-leaf
functions which only call themselves recursively, and functions
which only call static functions in the same module.
- Allow GCC to optimize the filling of the branch delay slot for
QNX PIC code.
- Have GCC explicitly load function addresses into a register and
do jumps through that register, instead of having the assembler
expand this. This allows GCC to do common subexpression
elimination of function addresses, and also allows the GCC
scheduler to do the address load a few cycles
before the jump.
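The effect of the last optimization can be illustrated at the C level. The
sketch below is only an analogy: load_from_got stands in for the load of a
function address from the GOT and counts how often it runs, and all names
are hypothetical. It contrasts reloading the address before every call with
loading it once and reusing the register.

```c
#include <assert.h>

typedef int (*func_ptr)(int);

static int got_loads;  /* counts simulated GOT loads */

/* Stands in for a library function reached through the GOT. */
static int add_one(int x)
{
    return x + 1;
}

/* Simulated load of the function's address from the GOT. */
static func_ptr load_from_got(void)
{
    got_loads++;
    return add_one;
}

/* Address reloaded before every call, as when each call is expanded
 * separately and the compiler never sees the load. */
static int call_three_times_reloading(int x)
{
    x = load_from_got()(x);
    x = load_from_got()(x);
    x = load_from_got()(x);
    return x;
}

/* Address loaded once and kept in a "register", as after common
 * subexpression elimination of the function address. */
static int call_three_times_hoisted(int x)
{
    func_ptr f = load_from_got();
    assert(f != 0);
    x = f(x);
    x = f(x);
    x = f(x);
    return x;
}
```

Both functions compute the same result, but the second performs one
simulated GOT load instead of three, and that single load can be scheduled
well ahead of the first jump.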