====
MIPS non-PIC ABI specification
====

Introduction
----

This document specifies an extension to the MIPS ABI that provides
absolute (non-PIC) addressing, as used for Linux applications on most
other architectures.  MIPS currently uses the existing psABI, which
mandates that applications be compiled as position-independent code.

The intention is that this extension to the ABI will be a strict
superset of the existing MIPS o32 psABI for non-PIC executables, and will
not break compatibility with legacy PIC object files, allowing
interlinking of new-model and legacy object files both statically and
dynamically (apart from ld.so, of course).

This document does not cover n32 and n64 ABIs; they are expected to be
a straightforward extension of the same design.

At this time we do not propose any change to the position-independent
addressing conventions used by shared objects. Similarly,
position-independent executables compiled with '-fpie' -- as required
for address space randomisation in "hardened" Linux distributions --
shall continue to use the existing psABI addressing and calling
mechanisms.

Identification of Object Files
----

Object files which use this new ABI extension will need to be
identifiable. They will have EF_MIPS_CPIC set and EF_MIPS_PIC
clear in the ELF header's e_flags field. The dynamic linker can
identify new-model executables which use the PLT mechanism by the
existence of a DT_JMPREL tag in the dynamic table. It is also suggested
that the EI_ABIVERSION entry in the ELF header's e_ident array be
incremented from 0 to 1 for such executables, so that existing dynamic
linkers will refuse to link them and display a "helpful" error message,
rather than linking them incorrectly and having the application crash.

[Ed. note: this does not actually work with glibc's ld.so for
executables; it does not check the ABI version of the executable, or
checks it too late.]
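
The flag test described above can be sketched in C as follows. This is
only an illustration: the helper name ``is_nonpic_abi_object`` is
hypothetical, and the flag values are the usual ones from ``<elf.h>``,
repeated here so that the fragment stands alone::

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* e_flags bits, with the values normally found in <elf.h>. */
  #define EF_MIPS_PIC   0x00000002u
  #define EF_MIPS_CPIC  0x00000004u

  /* Hypothetical helper: true when e_flags carries the marking
     described above (EF_MIPS_CPIC set, EF_MIPS_PIC clear). */
  static bool is_nonpic_abi_object(uint32_t e_flags)
  {
      return (e_flags & EF_MIPS_CPIC) != 0 && (e_flags & EF_MIPS_PIC) == 0;
  }

A legacy PIC object, which has both bits set, is rejected by this test,
as is an object with neither bit set.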

Procedure Linkage Table
----

The Procedure Linkage Table (PLT) consists of a set of stubs generated
by the static linker to stand in for external functions that are in a
shared object. They can be called using an absolute JAL instruction and
then redirect the call from the executable to the actual function via
a pointer in the PLT GOT (the .got.plt section which holds 32-bit
function pointers only).

The PLT is output to the .plt section, which should be aligned to a
32-byte boundary so that each PLT entry occupies no more than one
cache line.

The PLT GOT holds function addresses used by the PLT stubs, and the
PLT GOT entries shall be initialised by the static linker to point to
the PLT header (i.e. the base of the .plt section).  In this way the
first call to an external function will invoke the dynamic linker to
resolve the symbol and update the corresponding PLT GOT entry; the
next call will then jump from the PLT straight to the function,
avoiding the dynamic linker.
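
The lazy binding behaviour described above can be modelled in portable C.
The sketch below is a toy model, not an implementation: the "PLT GOT" is
an array of function pointers, ``plt_header`` stands in for the PLT
header plus the dynamic linker's resolver, and every name in it is
hypothetical::

  #include <assert.h>
  #include <stddef.h>

  /* Every slot in this toy PLT GOT has the same signature: the PLT
     index plus one integer argument. */
  typedef int (*slot_fn)(unsigned plt_index, int arg);

  #define NUM_FUNCS 2
  static slot_fn pltgot[NUM_FUNCS];  /* function slots only */
  static int resolver_calls;         /* trips through the "dynamic linker" */

  static int real_double(unsigned plt_index, int arg)
  {
      (void)plt_index;
      return 2 * arg;
  }

  static int real_negate(unsigned plt_index, int arg)
  {
      (void)plt_index;
      return -arg;
  }

  /* Stand-in for symbol lookup by the dynamic linker. */
  static const slot_fn definitions[NUM_FUNCS] = { real_double, real_negate };

  /* Models the PLT header and resolver: look up the target, patch the
     slot, then complete the original call. */
  static int plt_header(unsigned plt_index, int arg)
  {
      resolver_calls++;
      pltgot[plt_index] = definitions[plt_index];
      return pltgot[plt_index](plt_index, arg);
  }

  /* What the static linker does: point every slot at the PLT header. */
  static void init_pltgot(void)
  {
      for (size_t i = 0; i < NUM_FUNCS; i++)
          pltgot[i] = plt_header;
  }

  /* Models one PLT stub: load the slot and jump through it. */
  static int call_via_plt(unsigned plt_index, int arg)
  {
      assert(plt_index < NUM_FUNCS);
      return pltgot[plt_index](plt_index, arg);
  }

After ``init_pltgot()``, the first ``call_via_plt(0, 21)`` increments
``resolver_calls``; later calls through the same index go directly to
``real_double`` and leave the counter unchanged.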

In the existing version of the ABI, as implemented by glibc,
the first two GOT entries are reserved:

  GOT[0]
	Pointer to dynamic linker's GOT resolver which takes a dynamic
	symbol index argument.

  GOT[1]
	Pointer to this object's link map

In this ABI, the GOT layout will remain the same.  The first two entries
in the PLT GOT will be reserved as follows:

  PLTGOT[0]
	Pointer to dynamic linker's PLT resolver (which takes a PLT
	index argument instead of the dynamic symbol index used by the
	GOT resolver).

  PLTGOT[1]
	Pointer to this object's link map.

Since PLT entries use absolute addresses to access the PLT GOT, the
PLT GOT does not need to be located within 32K of the _gp symbol.
Indeed it would be better to prevent the PLT GOT from occupying this
scarce resource in the address map.  There is no requirement for the
PLT GOT and GOT to be consecutive.

For each PLT entry an R_MIPS_JUMP_SLOT relocation entry shall be output
to the dynamic .rel.plt section: the relocation entry's dynamic symbol
index specifies the symbol to which the PLT entry refers, and the
offset field holds the address of the PLT entry. An addend is never
required, so REL relocations remain sufficient.  The PLT index passed
by the PLT to the dynamic linker is an index into the array of jump
slot relocations; adding two (to skip the reserved PLT resolver and
link map slots at PLTGOT[0] and PLTGOT[1]) converts it into an index
into the PLT GOT.  Dynamic symbol table entries referenced only by jump
slot or copy relocations shall precede the "GOT mapped" symbols whose
first index is specified by the DT_MIPS_GOTSYM dynamic table entry.
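
The two uses of the PLT index can be written out as address arithmetic.
The following is a minimal sketch assuming 4-byte PLT GOT entries and
8-byte Elf32_Rel records; the helper names and the example base
addresses are hypothetical::

  #include <assert.h>
  #include <stdint.h>

  #define PLTGOT_RESERVED    2u  /* PLTGOT[0] resolver, PLTGOT[1] link map */
  #define PLTGOT_ENTRY_SIZE  4u  /* 32-bit function pointers */
  #define ELF32_REL_SIZE     8u  /* r_offset and r_info, 4 bytes each */

  /* Address of the jump slot relocation selected by a PLT index. */
  static uint32_t jump_slot_rel_addr(uint32_t rel_plt_base, uint32_t plt_index)
  {
      return rel_plt_base + plt_index * ELF32_REL_SIZE;
  }

  /* Address of the PLT GOT slot selected by a PLT index. */
  static uint32_t pltgot_slot_addr(uint32_t pltgot_base, uint32_t plt_index)
  {
      return pltgot_base + (plt_index + PLTGOT_RESERVED) * PLTGOT_ENTRY_SIZE;
  }

With a PLT GOT based at 0x10010000, PLT index 0 selects the slot at
0x10010008, immediately after the two reserved entries.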

PLT Header
----

The first entry in the PLT is the PLT header, which is reached only on
the first call made through each PLT entry. It is 32 bytes in size::

  PLT0:	lui	gp, %hi(.got.plt)		# linker needs address of 
  	addiu	gp, %lo(.got.plt)		#  .got.plt to find link map
  	lw	t9, 0(gp)			# PLTGOT[0] == &_dl_runtime_pltresolve()
  	move	t7,ra				# linker needs caller's address
  	jalr	t9				# call _dl_runtime_pltresolve()
  	nop					# bdslot
  	nop					# spare
  	nop					# spare

PLT Type A
''''

If the maximum PLT index is less than or equal to 65535, then a
minimum-length PLT entry of 16 bytes can be generated::

  PLT1:	lui	t7, %hi(%pltgot(name1))	# high PLT GOT pointer
  	lw	t9, %lo(%pltgot(name1))(t7)	# load func pointer from PLT GOT
  	ori	t8, $0, index1			# load plt index (ldslot)
  	jr	t9				# jump to func
  PLT2:	lui	t7, %hi(%pltgot(name2))		# (bdslot)
  	lw	t9, %lo(%pltgot(name2))(t7)
  	ori	t8, $0, index2
  	jr	t9
  PLT3:	...
  PLTn:	nop; nop; nop; nop

(Note that this is effectively pseudocode; the assembler does not need
modifying to understand "%pltgot(...)" since these instructions will
be directly written out by the linker.)
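
Because the linker emits these instructions itself, it may help to see
the four words of a Type A entry built by hand. The sketch below uses
the standard MIPS32 encodings of LUI, LW, ORI and JR; the helper names
are hypothetical and the slot address in the usage note is an arbitrary
example::

  #include <assert.h>
  #include <stdint.h>

  /* o32 register numbers for the temporaries used by the stubs. */
  enum { REG_T7 = 15, REG_T8 = 24, REG_T9 = 25 };

  /* %hi() and %lo() as the assembler computes them: the low half is
     sign-extended when it is added back, so the high half rounds up
     whenever bit 15 of the address is set. */
  static uint32_t hi16(uint32_t addr)
  {
      return ((addr + 0x8000u) >> 16) & 0xFFFFu;
  }

  static uint32_t lo16(uint32_t addr)
  {
      return addr & 0xFFFFu;
  }

  /* Hypothetical helper: write one Type A entry into insn[0..3].
     slot_addr is the address of the function's PLT GOT slot. */
  static void encode_plt_type_a(uint32_t insn[4], uint32_t slot_addr,
                                uint32_t plt_index)
  {
      assert(plt_index <= 0xFFFFu);  /* Type A needs a 16-bit index */

      /* lui t7, %hi(slot) */
      insn[0] = 0x3C000000u | ((uint32_t)REG_T7 << 16) | hi16(slot_addr);
      /* lw t9, %lo(slot)(t7) */
      insn[1] = 0x8C000000u | ((uint32_t)REG_T7 << 21)
              | ((uint32_t)REG_T9 << 16) | lo16(slot_addr);
      /* ori t8, $0, index */
      insn[2] = 0x34000000u | ((uint32_t)REG_T8 << 16) | plt_index;
      /* jr t9 */
      insn[3] = ((uint32_t)REG_T9 << 21) | 0x08u;
  }

For a slot at 0x10018008 and PLT index 5 this produces the words
0x3C0F1002, 0x8DF98008, 0x34180005 and 0x03200008. Note that the high
half is 0x1002 rather than 0x1001, because the low half 0x8008 is
negative once sign-extended.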

PLT Type B
''''

When the maximum PLT index is greater than 65535, a large PLT entry is
required, padded to 32 bytes in length::

  PLT1:	lui	t7, %hi(%pltgot(name1))		# high PLT GOT pointer
  	lw	t9, %lo(%pltgot(name1))(t7)	# load func pointer from PLT GOT
  	lui	t8, index1>>16			# load hi plt index (ldslot)
  	jr	t9				# jump to func
  	ori	t8, t8, index1&0xffff		# load lo plt index (bdslot)
  	nop
  	nop
  	nop
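
The choice between the two entry types, and the split of a large index
across the LUI/ORI pair, can be sketched as follows. The function and
macro names are hypothetical::

  #include <assert.h>
  #include <stdint.h>

  #define PLT_TYPE_A_SIZE 16u
  #define PLT_TYPE_B_SIZE 32u

  /* Entry size implied by the largest PLT index in the object. */
  static uint32_t plt_entry_size(uint32_t max_plt_index)
  {
      return max_plt_index <= 0xFFFFu ? PLT_TYPE_A_SIZE : PLT_TYPE_B_SIZE;
  }

  /* Immediates for the Type B "lui t8" and "ori t8" instructions. */
  static uint32_t plt_index_hi(uint32_t plt_index)
  {
      return plt_index >> 16;
  }

  static uint32_t plt_index_lo(uint32_t plt_index)
  {
      return plt_index & 0xFFFFu;
  }

Since ORI zero-extends its immediate, no rounding adjustment is needed
here: ``(plt_index_hi(i) << 16) | plt_index_lo(i)`` always equals ``i``.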

Writable PLT Fixup
----

PLT Type C
''''

After resolving the symbol and updating the PLT GOT, the dynamic linker
shall, if the PLT is in a writable section, patch the PLT entry to use
the absolute address of the function, thereby avoiding the PLT GOT
reference, as follows. The dynamic linker can detect a writable PLT by
the existence of a non-null DT_MIPS_RWPLT entry in the dynamic table::

  PLT1:	lui	t9, %hi(name1)
  	addiu	t9, %lo(name1)
  	jr	t9
  	nop

PLT Type D
''''

Furthermore, if the address at which the function is loaded lies within
the same 256MB segment as the PLT entry, then the dynamic linker can
also avoid the indirect jump::

  PLT1:	lui	t9, %hi(name1)
  	j	name1
  	addiu	t9, %lo(name1)
  	nop
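
A dynamic linker choosing between the Type C and Type D forms might
proceed roughly as in the sketch below. It is an illustration only:
``patch_plt_entry`` and its return convention are hypothetical, the
instruction encodings are the standard MIPS32 ones, and a real
implementation would also have to synchronise the instruction cache
after writing the entry::

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  enum { REG_T9 = 25 };

  static uint32_t hi16(uint32_t addr)
  {
      return ((addr + 0x8000u) >> 16) & 0xFFFFu;
  }

  static uint32_t lo16(uint32_t addr)
  {
      return addr & 0xFFFFu;
  }

  /* J takes the top four address bits from its delay slot. */
  static bool same_256mb_segment(uint32_t a, uint32_t b)
  {
      return ((a ^ b) & 0xF0000000u) == 0;
  }

  /* Rewrite the resolved entry at plt_addr so that it reaches func_addr
     without the PLT GOT.  Returns 'D' when the direct jump form was
     used and 'C' otherwise. */
  static char patch_plt_entry(uint32_t insn[4], uint32_t plt_addr,
                              uint32_t func_addr)
  {
      /* lui t9, %hi(func) */
      uint32_t lui = 0x3C000000u | ((uint32_t)REG_T9 << 16) | hi16(func_addr);
      /* addiu t9, t9, %lo(func) */
      uint32_t addiu = 0x24000000u | ((uint32_t)REG_T9 << 21)
                     | ((uint32_t)REG_T9 << 16) | lo16(func_addr);

      assert((plt_addr & 0xFu) == 0);  /* entries are 16-byte aligned */
      insn[0] = lui;
      if (same_256mb_segment(plt_addr + 8u, func_addr)) {
          insn[1] = 0x08000000u | ((func_addr >> 2) & 0x03FFFFFFu); /* j func */
          insn[2] = addiu;        /* executes in the delay slot */
          insn[3] = 0;            /* nop */
          return 'D';
      }
      insn[1] = addiu;
      insn[2] = 0x03200008u;      /* jr t9 */
      insn[3] = 0;                /* nop (delay slot) */
      return 'C';
  }

In both forms t9 still receives the function's address, which a PIC
callee relies on to set up its own gp.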

Note that the base MIPS32 and MIPS64 MMUs do not provide a
"no-execute" bit, and therefore cannot support the "least privilege"
page protection model required by "hardened" Linux features such as
Exec Shield and PaX. [Actually the SmartMIPS ASE specifies the
execute-inhibit (XI) bit, but that's only available in the 4KSd core.]
However, the static linker should be capable of generating a
non-writable (secure) PLT and GOT to conform with SELinux
restrictions, and on a SmartMIPS core this could be used to prevent
writable data areas from becoming executable. This would come at the
cost of some loss of performance for external function calls.

Function addresses
----

To allow comparison of function addresses to work as expected, it is
necessary for the executable and all shared objects to see the same
function address. If the executable takes the address of an external
function it will generate a PLT entry for that function, and that PLT
entry must then be the canonical address for the function throughout
the program.

Taking the address of an external function in a non-PIC executable
will result in a symbol table entry with type STT_FUNC and section
index of SHN_UNDEF, but with a non-zero st_value field that holds the
address of the function's PLT entry; furthermore the new STO_MIPS_PLT
bit shall be set in the symbol's st_other field. If the function's
address is not referenced (i.e. the function is only ever called by
the executable), then the symbol's st_value field will be zero and the
STO_MIPS_PLT bit clear.

The dynamic linker will use an undefined function symbol table entry
with STO_MIPS_PLT set to resolve all references to that symbol in
preference to the actual definition of that symbol, except when
resolving an R_MIPS_JUMP_SLOT relocation.

Note that this is the opposite behaviour to the legacy MIPS psABI
where an undefined function symbol table entry with a zero st_value
field indicates that there is an address reference to the function and
the dynamic linker must resolve the symbol immediately upon loading;
and where undefined function entries are always ignored when searching
for a symbol definition.
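
The new lookup rule can be summarised as a predicate over an undefined
symbol table entry. The sketch below uses a cut-down, hypothetical
``struct sym`` in place of Elf32_Sym, with the standard ELF constant
values repeated so that the fragment stands alone::

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define SHN_UNDEF         0u
  #define STT_FUNC          2u
  #define STO_MIPS_PLT      0x8u
  #define R_MIPS_JUMP_SLOT  127u

  /* Simplified view of an Elf32_Sym; only the fields used here. */
  struct sym {
      uint32_t st_value;
      uint8_t  st_info;   /* low four bits hold the symbol type */
      uint8_t  st_other;
      uint16_t st_shndx;
  };

  /* True when this undefined entry should satisfy a lookup made on
     behalf of a relocation of type reloc_type. */
  static bool undef_sym_supplies_address(const struct sym *s,
                                         uint32_t reloc_type)
  {
      bool is_plt_address = s->st_shndx == SHN_UNDEF
                         && (s->st_info & 0xFu) == STT_FUNC
                         && (s->st_other & STO_MIPS_PLT) != 0
                         && s->st_value != 0;

      /* Jump slots must bind to the real definition, otherwise the
         PLT entry would end up pointing at itself. */
      return is_plt_address && reloc_type != R_MIPS_JUMP_SLOT;
  }

An entry with a zero st_value and STO_MIPS_PLT clear never satisfies a
lookup under this rule, so the search continues to the real definition.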

Dynamic Section
----

Dynamic section entries give information to the dynamic linker. Some
of the information is processor-specific, including the interpretation
of some entries in the dynamic structure. The following new or changed
dynamic table entries are required by the extended ABI:

  DT_JMPREL (23)
	Previously unused for MIPS, now points to the first jump-slot
	relocation in the dynamic relocation table (i.e. the base of
	.rel.plt).

  DT_PLTREL (20)
	Previously unused for MIPS, now with a value of DT_REL indicating
	that DT_JMPREL points to REL relocations.

  DT_PLTRELSZ (2)
	Previously unused for MIPS, now holding the size of .rel.plt in
	bytes.

  DT_MIPS_PLTGOT (0x70000032)
	(New) Points to the base of the PLT GOT (.got.plt section),
	since it may not be contiguous with the traditional GOT (.got
	section). The standard DT_PLTGOT entry points to the base of
	the GOT.

  DT_MIPS_RWPLT (0x70000034)
	(New) Points to the base of the PLT when the PLT is writable;
	for a non-writable PLT it is omitted or has a zero value.

The dynamic symbol table may have undefined function entries with the
following bit set in the st_other field:

  STO_MIPS_PLT (0x8)
	 (New) Symbol value is the address of a PLT entry.

The dynamic relocation table may now contain two new relocation types
generated by the static linker:

  R_MIPS_COPY (126)
	 A data copy relocation.

  R_MIPS_JUMP_SLOT (127)
	 A PLT relocation.

External Data
----

If a non-PIC executable contains a reference to a data symbol in a
shared object, then the static linker shall allocate space for that
symbol in the executable's writable .dynbss (or .dynsbss) section, and
output an R_MIPS_COPY relocation entry to the dynamic relocation
section.  The offset field of the relocation entry gives the address
of the data in the .dynbss section. During execution the dynamic
linker will copy any initial data associated with the shared object's
symbol to the location specified by the offset, and point all GOT
entries that refer to that symbol to the executable's copy.
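
The effect of a copy relocation can be modelled with ordinary memory
operations. In this toy sketch ``struct copy_reloc`` and
``apply_copy_reloc`` are hypothetical; a real dynamic linker would
obtain the same three values from the relocation's offset and the
symbol lookup result::

  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  /* The information needed to process one R_MIPS_COPY relocation. */
  struct copy_reloc {
      void       *exe_copy;  /* r_offset: space reserved in .dynbss */
      const void *lib_def;   /* the shared object's initialised data */
      size_t      size;      /* st_size of the symbol */
  };

  /* Copies the initial data and returns the address that every GOT
     entry for the symbol should hold from now on. */
  static void *apply_copy_reloc(const struct copy_reloc *r)
  {
      assert(r->exe_copy != NULL && r->lib_def != NULL);
      memcpy(r->exe_copy, r->lib_def, r->size);
      return r->exe_copy;
  }

After the copy, the shared object's own definition is no longer
referenced by anyone: the executable uses its absolute address and the
shared objects reach the same storage through their GOT entries.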

Large Code Size
----

The 26-bit target field of the MIPS absolute JAL and J instructions
limits the executable's code (including the PLT) to a single 256MB
address segment. That's sufficient for most embedded applications, but
could be exceeded by some larger "server" applications. This may be
handled by explicitly compiling large applications with '-mlong-calls'.

A more elegant solution would be for the linker to automatically
insert trampolines when a call site and the function (or its PLT) are
not within the same 256MB segment, similar to the mechanism used for
the PPC32 architecture.  This may be implemented at a later date and
has no ABI implications.
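
The test a linker would apply when deciding whether a trampoline is
needed can be sketched as below; the function name is hypothetical. The
upper four bits of a JAL target come from the address of the delay slot
instruction rather than from the JAL itself, which matters for a call
placed in the last word of a segment::

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* True when a JAL at call_site can reach target directly. */
  static bool jal_reaches(uint32_t call_site, uint32_t target)
  {
      uint32_t delay_slot = call_site + 4u;
      return ((delay_slot ^ target) & 0xF0000000u) == 0;
  }

A call at 0x00400000 reaches 0x0FFFFFF0 but not 0x10000000, whereas a
call at 0x0FFFFFFC reaches 0x10000000 and not 0x00400000.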

Small Data
----

An optimisation available to statically-linked "bare iron"
applications is to place data with size no greater than some threshold
(default 8 bytes) in a small data section, where it can be referenced
using short offsets from the $gp register. In Dhrystone the lack of
small data addressing accounts for approximately one eighth of the 30%
performance differential between bare-iron and Linux.

Enabling small data addressing for non-PIC executables will enable
some but not all of this performance to be regained, particularly in
functions which reference many small global variables. Because shared
libraries use the $gp register to hold their GOT pointer, the register
will not be constant throughout the application, so the compiler must
reload the small data pointer whenever required by a function. Note
that "small" external data must be allocated in the executable's
.dynsbss section, instead of the .dynbss section.
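
The two decisions involved can be sketched as follows: whether an
object counts as "small", and whether an address is reachable with a
short offset from a given small data pointer. The names are
hypothetical, the threshold is the default quoted above, and the range
is that of the signed 16-bit displacement in MIPS load and store
instructions::

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define SMALL_DATA_THRESHOLD 8u  /* default from the text */

  /* True when external data of this size belongs in .dynsbss rather
     than .dynbss. */
  static bool uses_dynsbss(uint32_t size)
  {
      return size <= SMALL_DATA_THRESHOLD;
  }

  /* True when addr lies within a signed 16-bit displacement of the
     small data pointer. */
  static bool reachable_from_pointer(uint32_t addr, uint32_t small_data_ptr)
  {
      int64_t delta = (int64_t)addr - (int64_t)small_data_ptr;
      return delta >= -32768 && delta <= 32767;
  }

The range is asymmetric: with the pointer at 0x10008000, address
0x10000000 is reachable but 0x10010000 is one byte too far.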

Since this is a local optimisation the compiler may use an arbitrary
register to hold the small data pointer: it could be any call-clobbered
register, or a call-saved register if its use crosses a function
call. 

The compiler might choose not to use a small data pointer register if
it can determine that there is only one reference to small data in a
function, in which case it will be faster to use an absolute
address. 

For non-PIC executables the compiler may now consider $gp to be a
call-clobbered register that it is free to allocate for any purpose.

Legacy psABI support
----

While new-model code will use the PLT to reference external functions,
any legacy PIC code with which it is statically linked should continue
to use the linker-generated call stubs in the .MIPS.stubs section,
rather than referencing the new-model PLT. This is to avoid the
penalty of a double indirection when calling the function:
i.e. calling indirectly via the GOT to the PLT, and then the PLT
calling the actual function via the PLT GOT.

The exception to this is if the non-PIC code references the same
function, in which case the PIC code must generate a local GOT entry
which points to the associated PLT entry. [A possible optimisation, if
we are willing to have both a PLT GOT and GOT entry referencing the
same function, is to point the GOT to the PLT only if there are
relocations other than R_MIPS_26, R_MIPS_CALL16 or R_MIPS_GOT16
referencing the function, and otherwise use a global GOT entry
pointing directly to the function.]

Similarly for access to external data, if the non-PIC code generates
an R_MIPS_COPY relocation for a symbol, then PIC code referencing the
same symbol must allocate a local GOT entry pointing to the
executable's copy of the data in .dynbss or .dynsbss. Otherwise a
global GOT entry shall be allocated to point to the symbol.

Finally, if the non-PIC executable references a function in the
statically-linked PIC code, then it will be necessary for the linker
to allocate a call stub, for use by the non-PIC caller, which first
loads the $t9 register with the function's address.  The call stub would
look like PLT Type C or D above, and could be allocated in the PLT or
.MIPS.stubs section, or any other part of the text section. If the
function has global binding, and is referenced by a non-PIC, non-call
relocation, then its symbol table entry must point to the call stub,
so that the stub is the canonical address of the function.