This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.
QNX PIC for Mips - RFC

From: "Graeme Peterson" <gp at qnx dot com>
To: binutils at sources dot redhat dot com
Date: Wed, 24 Jul 2002 14:14:18 -0400 (EDT)
Subject: QNX PIC for Mips - RFC
Hi, all.

I am working on rolling in the QNX Neutrino support 
for arm, mips, ppc, and sh4.  

Here is a doc describing the QNX PIC convention for
Mips.  Any and all feedback appreciated.  I am particularly
interested in how best to implement this for submission.

Thanks.
GP

=============================


This is a preliminary document describing the calling convention 
used by PIC code on QNX/Neutrino running on the MIPS.



A. Introduction

The MIPS ABI describes a calling convention for implementing Position-
Independent Code ('PIC'). While the ABI calling convention is well established, 
it has a couple of drawbacks that make it less than ideal for use in an
embedded environment. These are listed below:

1. Because of the way an ABI PIC function determines the address of its 
Global Offset Table ('GOT'), it requires its own address to be passed
in register $25 on function entry. The practical consequence of this 
requirement is that all functions (in both the PIC libraries 
and the executable) have to do indirect calls, i.e. calls through
register $25. This means that all code must be compiled PIC (and pay the size
penalty of PIC code).

2. The GCC compiler/assembler does not do a great job of code generation for 
MIPS 'abicalls' code (PIC code). Unnecessary NOPs are inserted, and the
function prologue always contains the code to compute the GOT address, even
if there are no GOT references within that function.

The first point was particularly troublesome, as it meant that all applications
using a shared library would have to be compiled PIC, which results in
a significant code size increase.

For both of these reasons, we decided to make some simple modifications to the 
calling convention, which solve the first problem. While coding the
new calling convention in GCC, we also implemented various optimizations that
reduce the code expansion of PIC code. The following sections describe the
new MIPS PIC convention, hereafter called "QNX PIC".


B. QNX PIC calling convention on MIPS

The calling convention for PIC code follows the ABI spec for register 
assignment, stack layout and parameter passing. However, it differs from
the ABI in the following respects:

1. PIC code should never damage the gp ($28) register.

2. PIC code reserves register s7 ($23) to store the address of its GOT.
   All symbol references within that PIC module ("library") are made 
   through the GOT, and are thus addressed as offsets from s7. 

3. Every PIC function which needs to access a symbol from the GOT should
   load register s7 at the end of the function prologue, before any GOT symbols
   are accessed. The code used to load s7 with the address of the GOT is as 
   follows:

   bltzal 	$0,0f
   nop
0: lui		$s7, %gothi
   addiu	$s7, $s7, %gotlo
   addu		$s7, $s7, $ra

   The %gothi / %gotlo pair are special relocations output by the assembler.

   Since the above code implicitly destroys $ra and $s7, they must be saved 
   in the function prologue prior to the loading of the GOT. 

4. All function calls from a PIC function have to be indirect calls, done
   through a register. However, this does not have to be $25 as in ABI PIC
   code:

   la 	$t3, printf
   jalr	$t3 

   which becomes:

   lw	$t3,printf@got($s7)
   jalr	$t3

   Note that the notation "printf@got" simply means "offset of address 
   entry for printf in GOT".

5. All global data references also have to be done through the GOT, i.e.:

   lw	$t1,myglobal@got($s7)
   lw	$t0,0($t1)

With the changes above, QNX PIC code is truly relocatable, and does not
require the calling code to be compiled PIC. Thus, the non-library code 
(the "executables") can be normally-compiled MIPS objects.
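
The GOT-load sequence in item 3 depends on lui/addiu rebuilding a 32-bit
offset from two 16-bit halves and then adding the value that bltzal left in
$ra (the address of label 0). The following is a minimal sketch of that
arithmetic, assuming %gothi/%gotlo use the same carry-adjusted split as the
standard MIPS %hi/%lo operators; the post does not spell this out, and the
addresses and function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Interpret the low 16 bits of value as signed, as addiu does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def split_hi_lo(offset):
    """Split a 32-bit offset into (hi, lo) halves for a lui/addiu pair.

    Assumption: %gothi/%gotlo behave like %hi/%lo, so hi is bumped by one
    whenever lo will be sign-extended as a negative number.
    """
    offset &= MASK32
    lo = offset & 0xFFFF
    hi = ((offset + 0x8000) >> 16) & 0xFFFF
    return hi, lo

def load_got_pointer(ra, got_offset):
    """Model the sequence: s7 = ra + (GOT address - address of label 0)."""
    hi, lo = split_hi_lo(got_offset)
    s7 = (hi << 16) & MASK32                 # lui   $s7, %gothi
    s7 = (s7 + sign_extend16(lo)) & MASK32   # addiu $s7, $s7, %gotlo
    return (s7 + ra) & MASK32                # addu  $s7, $s7, $ra

# Hypothetical addresses, chosen so that the low half is "negative".
label_addr = 0x00401008   # value bltzal leaves in $ra
got_addr = 0x0041A000
assert load_got_pointer(label_addr, got_addr - label_addr) == got_addr
```

Because only the difference between the GOT and label 0 is encoded, the
result stays correct wherever the module is loaded.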


C. Relocations

In order for the executable and the library to share global data, we
must define a new copy relocation type. This is similar to what is
already defined in the X86 and PPC ABIs. The new relocation is defined
as follows:

#define R_MIPS_QNX_COPY 126

An R_MIPS_QNX_COPY relocation is emitted by the linker whenever a data symbol
defined in a shared library is used in an executable. It results in space
being allocated for this symbol in the executable's bss. At process startup,
the dynamic linker copies the data from the library to the process, and 
ensures that all library code points to the executable's copy of the
symbol.
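
The startup behaviour described above can be pictured with a toy model. This
is only a sketch: the dictionaries stand in for real ELF structures, and the
function and variable names are hypothetical, not part of any dynamic linker.

```python
def apply_copy_relocations(exe_bss, lib_data, lib_got, copy_relocs):
    """Toy model of dynamic-linker handling of copy relocations.

    exe_bss:     name -> value, space reserved in the executable's bss
    lib_data:    name -> initial value stored in the library's data
    lib_got:     name -> (owner, name) that the library's GOT entry refers to
    copy_relocs: names that carry an R_MIPS_QNX_COPY relocation
    """
    for name in copy_relocs:
        exe_bss[name] = lib_data[name]   # copy initial value into the executable
        lib_got[name] = ("exe", name)    # library code now uses the exe's copy

exe_bss = {"myglobal": 0}                      # zero-filled bss slot
lib_data = {"myglobal": 42}
lib_got = {"myglobal": ("lib", "myglobal")}
apply_copy_relocations(exe_bss, lib_data, lib_got, ["myglobal"])
assert exe_bss["myglobal"] == 42
assert lib_got["myglobal"] == ("exe", "myglobal")
```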


D. Calling library functions from the main executable

Calling functions in the library from non-PIC code (i.e. from the main 
executable) must be done through stubs. These are generated automatically 
by the linker for any function that is located in a shared library
and is called by the main executable. The stub's purpose is to load that
function's address from the executable's GOT, and then jump to the 
function. For example, if the executable calls printf(), then the following
stub will be generated (and the executable will actually call this stub
instead of directly calling printf):

	printf_stub:
		lw	$25, printf@got($gp)
		jr	$25
		nop
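
For illustration, the three instruction words of such a stub can be computed
from the standard MIPS32 encodings of lw and jr. This is a sketch only: the
helper names and the GOT offset are hypothetical, and it does not describe
how the linker actually emits stubs.

```python
GP, T9 = 28, 25   # register numbers for $gp and $25

def encode_lw(rt, base, offset):
    """I-type encoding of 'lw rt, offset(base)'; opcode 0x23."""
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("offset does not fit in 16 bits")
    return (0x23 << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)

def encode_jr(rs):
    """R-type encoding of 'jr rs'; function code 0x08."""
    return (rs << 21) | 0x08

def make_stub(got_offset):
    """Return the three instruction words of a stub for one function."""
    return [encode_lw(T9, GP, got_offset), encode_jr(T9), 0x00000000]  # nop

stub = make_stub(0x10)   # 0x10 is a made-up GOT offset for printf
assert [hex(word) for word in stub] == ["0x8f990010", "0x3200008", "0x0"]
```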



E. Toolchain modifications

In order to implement QNX PIC code generation, the following modifications 
to the toolchain were needed:

1. CC1:
	Modify cc1 so that, when the -mqnxpic option is passed, it generates
	code which follows the above calling convention. Note that the code
	to compute the GOT address in the function prologue 
	is generated by the assembler. The compiler
	outputs the ".cpload" pseudo-op, which the assembler expands. 
	The compiler also instructs the assembler to generate QNX PIC 
	code by emitting the ".set qnxpiccalls" at the beginning of 
	every assembly file. An example of
	cc1 output for QNX PIC code is shown below:

__________________________________________ 
	.file	1 "test.c"
	.qnxpiccalls
gcc2_compiled.:
__gnu_compiled_c:
	.globl	main
	.ent	main
main:
	.frame	$fp,72,$31		# vars= 32, regs= 4/0, args= 24, extra= 0
	subu	$sp,$sp,72
	sw	$ra,68($sp)
	sw	$fp,64($sp)
	sw	$s7,60($sp)
	sw	$s0,56($sp)
	move	$fp,$sp
	.cpload $31			# Pseudo-op to load GOT ptr into s7
	la	$16,printf
	jal	$31,$16
__________________________________________
	
	Thus, registers which need to be saved are pushed on the stack
	in the function prologue, including $ra and $s7 which are destroyed
	by the ".cpload" pseudo-op.

2. GAS
	The GNU assembler ("GAS") was also modified to generate QNX PIC code.
	As mentioned above, the ".set qnxpiccalls" pseudo-op can be used to
	indicate to the assembler that QNX PIC code is being generated. The
	assembler will also expand the ".cpload" pseudo-op into the right 
	code sequence (including the appropriate relocations). 

	The assembler's behavior with respect to global symbols 
	defined in the current source file was modified. The default
	behavior is for the assembler to emit a single "section" GOT symbol
	for the file's global data, and to compute the addresses of the data
	symbols as offsets from that section symbol. This has the advantage of
	saving GOT entries for global symbols which are only used in the source
	file where they are defined, but has the disadvantage that it is
	impossible to override which copy of a given global symbol that source
	file points to. Thus, when several libraries define the same data symbol, 
	it may not be possible to have all functions point to the same copy 
	of that symbol. In the case of QNX PIC code, all global symbols
	get a distinct GOT entry, which solves that problem.
	
	Modifications were also made so that GAS does not emit unnecessary
	nops when generating code for mips2+ CPUs. Other optimizations
	include replacing the "nop" in the ".cpload" expansion with an appropriate
	instruction, if one is found in the function prologue. The output from GAS
	for the above assembly code is shown below:

--------------------------------------
	addiu	$sp,$sp,-72
	sw		$ra,68($sp)
	sw		$fp,64($sp)
	sw		$s7,60($sp)
	bltzal	$zero,0f
	sw		$s0,56($sp)			# Assembler optimization
0:	lui		$s7,0x0				# GOTHI
	addiu	$s7,$s7,0				# GOTLO
	addu	$s7,$s7,$ra
	lw		$s0,0($s7)			# GOT16: offset of printf in GOT
	jalr	$s0
---------------------------------------


3. LD
	Two modifications were made to 'ld', the GNU linker. The first was
	to generate the R_MIPS_QNX_COPY relocations. The second was to have
	the linker generate the proper stubs.
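
The benefit of giving every global symbol its own GOT entry (see the GAS
item above) can be pictured with a toy resolver. This is a sketch under
assumptions: it supposes the dynamic linker fills each entry with the first
definition found in a search order, and the module names, addresses and
data layout are invented for the example.

```python
def resolve_got(search_order, modules):
    """Fill each module's per-symbol GOT, first definition in order wins.

    search_order: module names in symbol lookup order (assumed policy)
    modules:      name -> {"defines": {symbol: address}, "got": [symbols]}
    Returns name -> {symbol: address} for every GOT entry that resolves.
    """
    resolved = {}
    for name, module in modules.items():
        got = {}
        for symbol in module["got"]:
            for candidate in search_order:
                defines = modules[candidate]["defines"]
                if symbol in defines:
                    got[symbol] = defines[symbol]
                    break
        resolved[name] = got
    return resolved

# Two libraries define the same data symbol and reference it via the GOT.
modules = {
    "liba": {"defines": {"counter": 0x1000}, "got": ["counter"]},
    "libb": {"defines": {"counter": 0x2000}, "got": ["counter"]},
}
got = resolve_got(["liba", "libb"], modules)
assert got["liba"]["counter"] == got["libb"]["counter"] == 0x1000
```

With section-relative addressing, libb would instead be fixed to its own
copy at 0x2000, which is the problem the paragraph on global symbols
describes.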


F. Toolchain optimizations

	GCC code generation was optimized in several ways:

	- Calls to static functions within the same module are done 
	using a branch ('bal') instead of a jump. This is implicitly 
	position-independent.

	- Do not output the .cpload pseudo-op (to load the GOT address into 
	s7) for functions that do not require it. This includes leaf 
	functions that do not reference any global data, non-leaf 
	functions that only call themselves recursively, and functions 
	which only call static functions in the same module.

	- Allow GCC to optimize the filling of the branch delay slot for
	QNX PIC code.

	- Have GCC explicitly load function addresses into a register and 
	do jumps through that register, instead of having the assembler 
	expand this. This allows GCC to do common subexpression 
	elimination of function addresses, and also allows the GCC 
	scheduler to do the address load a few cycles 
	before the jump.
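
The rule for omitting .cpload (second item in the list above) can be read as
a simple predicate over what a function references. The following is a
sketch of that reading only; the Function record and needs_cpload are
hypothetical names, not GCC internals.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    is_static: bool = False
    global_refs: set = field(default_factory=set)   # global data symbols used
    callees: set = field(default_factory=set)       # names of called functions

def needs_cpload(func, module):
    """Return True if func must load the GOT address into s7.

    module maps names to the Function records defined in the same module.
    """
    if func.global_refs:
        return True                        # global data is reached via the GOT
    for callee in func.callees:
        if callee == func.name:
            continue                       # self-recursion
        target = module.get(callee)
        if target is not None and target.is_static:
            continue                       # static, same module: uses 'bal'
        return True                        # callee address comes from the GOT
    return False

helper = Function("helper", is_static=True)
caller = Function("caller", callees={"helper", "caller"})
printer = Function("printer", callees={"printf"})
module = {f.name: f for f in (helper, caller, printer)}
assert not needs_cpload(helper, module)
assert not needs_cpload(caller, module)
assert needs_cpload(printer, module)
```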