==============================
MIPS non-PIC ABI specification
==============================

Introduction
------------

This document specifies a new MIPS ABI extension that provides absolute (non-PIC) addressing, as used for Linux applications on most architectures. MIPS currently uses the existing psABI, which requires applications to be compiled as position-independent code.

The intention is that this extension will be a strict superset of the existing MIPS o32 psABI for non-PIC executables. It will not break compatibility with legacy PIC object files, so new-model and legacy object files can be linked together both statically and dynamically (apart from ld.so, of course). This document does not cover the n32 and n64 ABIs; they are expected to be a straightforward extension of the same design.

At this time we do not propose any change to the position-independent addressing conventions used by shared objects. Similarly, position-independent executables compiled with '-fpie' (as required for address space randomisation in "hardened" Linux distributions) shall continue to use the existing psABI addressing and calling mechanisms.

Identification of Object Files
------------------------------

Object files that use this new ABI extension need to be identifiable. They will have EF_MIPS_CPIC set and EF_MIPS_PIC clear in the ELF header's e_flags field. The dynamic linker can identify new-model executables that use the PLT mechanism by the presence of a DT_JMPREL tag in the dynamic table.

It is also suggested that the EI_ABIVERSION entry in the ELF header's e_ident array be incremented from 0 to 1 for such executables. Existing dynamic linkers would then refuse to link them and display a "helpful" error message, rather than linking them incorrectly and letting the application crash. [Ed. note: this does not actually work with glibc's ld.so for executables; it does not check the ABI version of the executable, or checks it too late.]
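The two identification tests above can be sketched in C as follows. This is a minimal illustration, not code from any real dynamic linker. The bit values match the published MIPS ELF definitions of EF_MIPS_PIC (0x2), EF_MIPS_CPIC (0x4) and DT_JMPREL (23). The `dyn_entry` type is a hypothetical stand-in for `Elf32_Dyn`, used so that the sketch does not depend on `<elf.h>`. The sketch checks only the two bits named above and the presence of DT_JMPREL. It does not model EI_ABIVERSION.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined for MIPS ELF; local names avoid clashing with <elf.h>. */
#define MIPS_EF_PIC      0x00000002u  /* EF_MIPS_PIC  */
#define MIPS_EF_CPIC     0x00000004u  /* EF_MIPS_CPIC */
#define MIPS_DT_NULL     0            /* DT_NULL: terminates the dynamic table */
#define MIPS_DT_JMPREL   23           /* DT_JMPREL */

/* Hypothetical stand-in for Elf32_Dyn (d_tag plus the d_val/d_ptr word). */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;
} dyn_entry;

/* New-model non-PIC object: EF_MIPS_CPIC set and EF_MIPS_PIC clear. */
static int is_non_pic_object(uint32_t e_flags) {
    return (e_flags & MIPS_EF_CPIC) != 0 && (e_flags & MIPS_EF_PIC) == 0;
}

/* New-model executable using the PLT: DT_JMPREL appears before DT_NULL. */
static int uses_plt(const dyn_entry *dyn) {
    assert(dyn != NULL);
    for (; dyn->d_tag != MIPS_DT_NULL; ++dyn)
        if (dyn->d_tag == MIPS_DT_JMPREL)
            return 1;
    return 0;
}
```

A loader would apply `is_non_pic_object` to the ELF header's e_flags field. It would call `uses_plt` on the PT_DYNAMIC contents before deciding whether to set up the PLT GOT.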

Procedure Linkage Table
-----------------------

The Procedure Linkage Table (PLT) is a set of stubs generated by the static linker to stand in for external functions defined in a shared object. A stub can be called using an absolute JAL instruction. It then redirects the call from the executable to the actual function via a pointer in the PLT GOT (the .got.plt section, which holds only 32-bit function pointers).

The PLT is output to the .plt section. This section should be aligned to a 32-byte boundary so that each PLT entry occupies no more than one cache line.

The PLT GOT holds the function addresses used by the PLT stubs. The static linker shall initialise each PLT GOT entry to point to the PLT header (i.e. the base of the .plt section). The first call to an external function therefore invokes the dynamic linker, which resolves the symbol and updates the corresponding PLT GOT entry. Later calls jump from the PLT straight to the function and bypass the dynamic linker.

In the existing version of the ABI, as implemented by glibc, the first two GOT entries are reserved:

GOT[0]
    Pointer to the dynamic linker's GOT resolver, which takes a dynamic symbol index argument.

GOT[1]
    Pointer to this object's link map.

In this ABI the GOT layout remains the same. The first two entries in the PLT GOT are reserved as follows:

PLTGOT[0]
    Pointer to the dynamic linker's PLT resolver. This resolver takes a PLT index argument instead of the dynamic symbol index used by the GOT resolver.

PLTGOT[1]
    Pointer to this object's link map.

PLT entries use absolute addresses to access the PLT GOT, so the PLT GOT does not need to be located within 32K of the _gp symbol. It is better to keep the PLT GOT out of this scarce part of the address map. There is no requirement for the PLT GOT and GOT to be contiguous.
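The lazy binding scheme described above can be modelled in plain C. The scheme has three parts: PLT GOT slots initially point at the PLT header; PLTGOT[0] holds the resolver and PLTGOT[1] the link map; the first call patches the slot. This is a toy model, not the real ld.so path, and every name in it (`pltgot`, `pltresolve`, `plt_header`, `plt_call`, the global standing in for register t8) is hypothetical. Ordinary C calls stand in for the jumps, and a small table of local functions plays the part of the shared object.

```c
#include <assert.h>

/* Toy model of PLT GOT lazy binding; all names here are hypothetical. */
typedef int (*func_t)(int);
typedef func_t (*resolver_t)(void *link_map, unsigned plt_index);

#define N_FUNCS 2

union pltgot_slot {
    resolver_t resolver;  /* PLTGOT[0]: PLT resolver */
    void *link_map;       /* PLTGOT[1]: this object's link map */
    func_t func;          /* PLTGOT[2 + plt_index]: function pointer */
};

static union pltgot_slot pltgot[2 + N_FUNCS];
static unsigned t8;                /* models register t8: PLT index set by the stub */
static int resolver_calls;         /* counts trips through the "dynamic linker" */
static int link_map_placeholder;

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }
static const func_t definitions[N_FUNCS] = { square, negate };  /* "shared object" */

/* Models the PLT resolver: look up the target and patch its PLT GOT slot. */
static func_t pltresolve(void *link_map, unsigned plt_index) {
    (void)link_map;
    assert(plt_index < N_FUNCS);
    resolver_calls++;
    pltgot[plt_index + 2].func = definitions[plt_index];
    return definitions[plt_index];
}

/* Models PLT0: fetch PLTGOT[0] and PLTGOT[1], resolve, then continue the call. */
static int plt_header(int arg) {
    func_t target = pltgot[0].resolver(pltgot[1].link_map, t8);
    return target(arg);
}

/* Models a PLT entry: load the PLT index into t8, jump through the PLT GOT slot. */
static int plt_call(unsigned plt_index, int arg) {
    t8 = plt_index;
    return pltgot[plt_index + 2].func(arg);
}

/* Models the static linker: every function slot initially points at PLT0. */
static void init_pltgot(void) {
    pltgot[0].resolver = pltresolve;
    pltgot[1].link_map = &link_map_placeholder;
    for (unsigned i = 0; i < N_FUNCS; i++)
        pltgot[i + 2].func = plt_header;
    resolver_calls = 0;
}
```

After `init_pltgot()`, the first `plt_call(0, 7)` passes through `pltresolve` and returns 49. A second `plt_call(0, 3)` jumps straight to `square` without touching the resolver. In the real mechanism, PLT0 does not call the target itself: the resolver transfers control to the function with the caller's return address, which PLT0 saved in t7.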

For each PLT entry, an R_MIPS_JUMP_SLOT relocation entry shall be output to the dynamic .rel.plt section. The relocation entry's dynamic symbol index specifies the symbol to which the PLT entry refers, and its offset field holds the address of the PLT entry. An addend is never required, so REL (rather than RELA) relocations continue to be used.

The PLT passes a PLT index to the dynamic linker. This index is an index into the array of jump slot relocations. Adding two converts it into an index into the PLT GOT, skipping the reserved PLT resolver and link map slots at PLTGOT[0] and PLTGOT[1].

Dynamic symbol table entries referenced only by jump slot or copy relocations shall precede the "GOT mapped" symbols. The first of those symbols is given by the DT_MIPS_GOTSYM dynamic table entry.

PLT Header
----------

The first entry in the PLT handles only the first call through each PLT entry (the call that triggers symbol resolution). It is 32 bytes in size::

    PLT0:
        lui     gp, %hi(.got.plt)           # linker needs address of
        addiu   gp, gp, %lo(.got.plt)       # .got.plt to find link map
        lw      t9, 0(gp)                   # PLTGOT[0] == &_dl_runtime_pltresolve()
        move    t7, ra                      # linker needs caller's address
        jalr    t9                          # call _dl_runtime_pltresolve()
        nop                                 # (bdslot)
        nop                                 # spare
        nop                                 # spare

PLT Type A
''''''''''

If the maximum PLT index is less than or equal to 65535, minimum-length PLT entries of 16 bytes can be generated. The limit comes from ORI, which zero-extends its 16-bit immediate::

    PLT1:
        lui     t7, %hi(%pltgot(name1))         # high PLT GOT pointer
        lw      t9, %lo(%pltgot(name1))(t7)     # load func pointer from PLT GOT
        ori     t8, $0, index1                  # load PLT index (ldslot)
        jr      t9                              # jump to func
    PLT2:
        lui     t7, %hi(%pltgot(name2))         # (bdslot)
        lw      t9, %lo(%pltgot(name2))(t7)
        ori     t8, $0, index2
        jr      t9
    PLT3:
        ...
    PLTn:
        nop; nop; nop; nop                      # padding; fills the last entry's bdslot

(This is effectively pseudocode. The linker writes these instructions out directly, so the assembler does not need to understand "%pltgot(...)".)
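The linker writes the Type A entry above out directly, so it must encode the instructions itself. The following is a minimal C sketch assuming MIPS32 I-type and JR encodings. Byte order is left to the caller, and the helper names (`emit_plt_type_a`, `pltgot_slot_addr`, `mips_hi`, `mips_lo`) are hypothetical. The sketch places the PLT GOT slot at the PLT GOT base plus 4 × (PLT index + 2), following the rule above. The lw offset is sign-extended, so %hi is rounded so that adding the sign-extended %lo gives back the full address.

```c
#include <assert.h>
#include <stdint.h>

/* MIPS o32 register numbers used by the Type A stub. */
enum { REG_ZERO = 0, REG_T7 = 15, REG_T8 = 24, REG_T9 = 25 };

/* %hi() is rounded so that %hi << 16 plus the sign-extended %lo gives addr. */
static uint32_t mips_hi(uint32_t addr) { return ((addr + 0x8000u) >> 16) & 0xffffu; }
static uint32_t mips_lo(uint32_t addr) { return addr & 0xffffu; }

/* Encode an I-type instruction: opcode | rs | rt | 16-bit immediate. */
static uint32_t enc_i(uint32_t op, uint32_t rs, uint32_t rt, uint32_t imm) {
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xffffu);
}

/* PLT GOT slot for a PLT index: skip PLTGOT[0] (resolver) and PLTGOT[1] (link map). */
static uint32_t pltgot_slot_addr(uint32_t pltgot_base, uint32_t plt_index) {
    return pltgot_base + 4u * (plt_index + 2u);
}

/* Hypothetical helper: emit the four instruction words of one Type A entry. */
static void emit_plt_type_a(uint32_t pltgot_base, uint32_t plt_index, uint32_t out[4]) {
    assert(plt_index <= 0xffffu);  /* ORI zero-extends a 16-bit immediate */
    uint32_t slot = pltgot_slot_addr(pltgot_base, plt_index);
    out[0] = enc_i(0x0f, REG_ZERO, REG_T7, mips_hi(slot));  /* lui t7, %hi(slot)     */
    out[1] = enc_i(0x23, REG_T7, REG_T9, mips_lo(slot));    /* lw  t9, %lo(slot)(t7) */
    out[2] = enc_i(0x0d, REG_ZERO, REG_T8, plt_index);      /* ori t8, $0, index     */
    out[3] = ((uint32_t)REG_T9 << 21) | 0x08u;              /* jr  t9                */
}
```

Take a PLT GOT base of 0x10019FE4 and PLT index 5. The slot is at 0x1001A000, and the entry encodes as 0x3C0F1002, 0x8DF9A000, 0x34180005, 0x03200008. Note the lw offset: %lo is 0xA000, which sign-extends to -0x6000, so %hi has been rounded up to 0x1002.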

PLT Type B
''''''''''

When the maximum PLT index is greater than 65535, a larger PLT entry is required, rounded up to 32 bytes in length::

    PLT1:
        lui     t7, %hi(%pltgot(name1))         # high PLT GOT pointer
        lw      t9, %lo(%pltgot(name1))(t7)     # load func pointer from PLT GOT
        lui     t8, index1>>16                  # load hi PLT index (ldslot)
        jr      t9                              # jump to func
        ori     t8, t8, index1&0xffff           # load lo PLT index (bdslot)
        nop
        nop
        nop

Writable PLT Fixup
------------------

PLT Type C
''''''''''

After resolving the symbol and updating the PLT GOT, the dynamic linker shall patch the PLT entry if the PLT is in a writable section. The patched entry uses the absolute address of the function and so avoids the PLT GOT reference, as shown below. The dynamic linker can detect a writable PLT by the presence of a non-null DT_MIPS_RWPLT entry in the dynamic table::

    PLT1:
        lui     t9, %hi(name1)
        addiu   t9, t9, %lo(name1)
        jr      t9
        nop

PLT Type D
''''''''''

If the function is loaded within the same 256MB segment as the PLT entry, the indirect jump can also be avoided::

    PLT1:
        lui     t9, %hi(name1)
        j       name1
        addiu   t9, t9, %lo(name1)              # (bdslot)
        nop

In both forms t9 is still loaded with the function's address, because a PIC callee uses it to compute its $gp.

The base MIPS32 and MIPS64 MMU does not provide a "no-execute" bit. It therefore cannot support the "least privilege" page protection model required by "hardened" Linux features such as Exec Shield and PaX. [The SmartMIPS ASE does specify an execute-inhibit (XI) bit, but it is only available in the 4KSd core.] However, the static linker should be able to generate a non-writable (secure) PLT and GOT that conform with SELinux restrictions. On a SmartMIPS core this could be used to prevent writable data areas from becoming executable. The cost is some loss of performance for external function calls.

Function addresses
------------------

Comparisons of function addresses only work as expected if the executable and all shared objects see the same address for each function.

If the executable takes the address of an external function, the static linker generates a PLT entry for that function. That PLT entry then becomes the canonical address of the function throughout the program.

Taking the address of an external function in a non-PIC executable produces a symbol table entry with type STT_FUNC and a section index of SHN_UNDEF. Its st_value field is non-zero and holds the address of the function's PLT entry. The new STO_MIPS_PLT bit shall also be set in the symbol's st_other field. If the function's address is never taken (the executable only ever calls it), the symbol's st_value field is zero and the STO_MIPS_PLT bit is clear.

The dynamic linker shall use an undefined function symbol table entry with STO_MIPS_PLT set to resolve all references to that symbol, in preference to the symbol's actual definition. The one exception is an R_MIPS_JUMP_SLOT relocation.

This is the opposite of the legacy MIPS psABI. There, an undefined function symbol table entry with a zero st_value field means the function's address is referenced, and the dynamic linker must resolve the symbol immediately on loading. Undefined function entries are also always ignored when searching for a symbol definition.

Dynamic Section
---------------

Dynamic section entries give information to the dynamic linker. Some of this information is processor-specific, including the interpretation of some entries in the dynamic structure. The extended ABI requires the following new or changed dynamic table entries:

DT_JMPREL (23)
    Previously unused for MIPS. Now points to the first jump slot relocation in the dynamic relocation table (i.e. the base of .rel.plt).

DT_PLTREL (20)
    Previously unused for MIPS. Now has the value DT_REL, indicating that DT_JMPREL points to REL relocations.

DT_PLTRELSZ (2)
    Previously unused for MIPS. Now holds the size of .rel.plt in bytes.

DT_MIPS_PLTGOT (0x70000032)
    (New) Points to the base of the PLT GOT (.got.plt section), which may not be contiguous with the traditional GOT (.got section). The standard DT_PLTGOT entry points to the base of the GOT.

DT_MIPS_RWPLT (0x70000034)
    (New) Points to the base of the PLT when the PLT is writable. For a non-writable PLT it is omitted or has a zero value.

The dynamic symbol table may have undefined function entries with the following bit set in the st_other field:

STO_MIPS_PLT (0x8)
    (New) The symbol value is the address of a PLT entry.

The dynamic relocation table may now contain two new relocation types generated by the static linker:

R_MIPS_COPY (126)
    A data copy relocation.

R_MIPS_JUMP_SLOT (127)
    A PLT relocation.

External Data
-------------

Suppose a non-PIC executable references a data symbol in a shared object. The static linker shall allocate space for that symbol in the executable's writable .dynbss (or .dynsbss) section, and output an R_MIPS_COPY relocation entry to the dynamic relocation section. The offset field of the relocation entry gives the address of the data in the .dynbss section.

At run time the dynamic linker copies any initial data associated with the shared object's symbol to the location given by the offset. It then points every GOT entry that refers to that symbol at the executable's copy.

Large Code Size
---------------

The JAL and J instructions carry a 26-bit target field and only reach addresses within the current 256MB segment. This limits the executable's code (including the PLT) to a single 256MB address segment. That is sufficient for most embedded applications, but some larger "server" applications could exceed it. Such applications can be compiled explicitly with '-mlong-calls'. A more elegant solution would be for the linker to insert trampolines automatically when a call site and the function (or its PLT entry) are not in the same 256MB segment, similar to the mechanism used for the PPC32 architecture.
This may be implemented at a later date and has no ABI implications.

Small Data
----------

Statically linked "bare iron" applications can place data no larger than some threshold (8 bytes by default) in a small data section. That data can then be referenced using 16-bit offsets from the $gp register. In Dhrystone, the lack of small data addressing accounts for approximately one eighth of the 30% performance differential between bare-iron and Linux. Enabling small data addressing for non-PIC executables will recover some, but not all, of this performance, particularly in functions that reference many small global variables.

Shared libraries use the $gp register to hold their GOT pointer, so the register is not constant throughout the application. The compiler must therefore reload the small data pointer whenever a function needs it. "Small" external data must be allocated in the executable's .dynsbss section instead of the .dynbss section.

Because this is a local optimisation, the compiler may use any register to hold the small data pointer. It could be any call-clobbered register, or a call-saved register if its use crosses a function call. If the compiler can determine that a function makes only one small data reference, it might skip the small data pointer register and use an absolute address instead, which is faster. For non-PIC executables, the compiler may now treat $gp as a call-clobbered register that it is free to allocate for any purpose.

Legacy psABI support
--------------------

New-model code uses the PLT to reference external functions. Any legacy PIC code statically linked with it should continue to use the linker-generated call stubs in the .MIPS.stubs section rather than the new-model PLT. This avoids the penalty of a double indirection when calling the function: i.e.
calling indirectly via the GOT to the PLT, and then from the PLT to the actual function via the PLT GOT. The exception is when non-PIC code references the same function. In that case the PIC code must use a local GOT entry that points to the associated PLT entry.

[A possible optimisation, if we accept having both a PLT GOT entry and a GOT entry for the same function: point the GOT entry to the PLT only if the function is referenced by relocations other than R_MIPS_26, R_MIPS_CALL16 or R_MIPS_GOT16. Otherwise, use a global GOT entry that points directly to the function.]

External data works the same way. If the non-PIC code generates an R_MIPS_COPY relocation for a symbol, PIC code that references the same symbol must allocate a local GOT entry pointing to the executable's copy of the data in .dynbss or .dynsbss. Otherwise a global GOT entry shall be allocated to point to the symbol.

Finally, the non-PIC executable may reference a function in the statically linked PIC code. In that case the linker must allocate a call stub that first loads the $t9 register with the function's address, which the PIC function needs when it is called from non-PIC code. The call stub would look like PLT Type C or D above. It could be allocated in the PLT, in the .MIPS.stubs section, or anywhere else in the text section. If the function has global binding and is referenced by a non-PIC, non-call relocation, its symbol table entry must point to the call stub, so that the stub is the canonical address of the function.
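A linker building the call stub described above has to choose between the Type C and Type D forms. The following C sketch shows one way to make that choice. It is a hypothetical helper, not taken from any real linker, and it assumes MIPS32 encodings and 4-word stubs. It treats the J instruction's reach as the 256MB segment containing the J's delay slot (stub address + 8). For a 16-byte-aligned stub, that is the same segment as the stub itself. Both forms load t9 with the full function address before transferring control.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_T9 = 25 };

/* %hi() is rounded so that adding the sign-extended %lo() gives back addr. */
static uint32_t mips_hi(uint32_t addr) { return ((addr + 0x8000u) >> 16) & 0xffffu; }
static uint32_t mips_lo(uint32_t addr) { return addr & 0xffffu; }

/* J keeps the top 4 bits of its delay slot address, so it only reaches that 256MB segment. */
static int same_256mb_segment(uint32_t a, uint32_t b) {
    return ((a ^ b) & 0xf0000000u) == 0;
}

/* Hypothetical helper: fill a 4-word stub at stub_addr that loads t9 with
   func_addr and transfers control to it. Returns 'C' or 'D' for the form used. */
static char emit_call_stub(uint32_t stub_addr, uint32_t func_addr, uint32_t out[4]) {
    assert((stub_addr & 3u) == 0 && (func_addr & 3u) == 0);
    uint32_t lui_t9 = (0x0fu << 26) | ((uint32_t)REG_T9 << 16) | mips_hi(func_addr);
    uint32_t addiu_t9 = (0x09u << 26) | ((uint32_t)REG_T9 << 21)
                      | ((uint32_t)REG_T9 << 16) | mips_lo(func_addr);
    uint32_t delay_slot = stub_addr + 8u;          /* instruction after the J */

    out[0] = lui_t9;                               /* lui   t9, %hi(func)              */
    if (same_256mb_segment(delay_slot, func_addr)) {
        out[1] = (0x02u << 26) | ((func_addr >> 2) & 0x03ffffffu);  /* j func     */
        out[2] = addiu_t9;                         /* addiu t9, t9, %lo(func) (bdslot) */
        out[3] = 0;                                /* nop                              */
        return 'D';
    }
    out[1] = addiu_t9;                             /* addiu t9, t9, %lo(func)          */
    out[2] = ((uint32_t)REG_T9 << 21) | 0x08u;     /* jr    t9                         */
    out[3] = 0;                                    /* nop (bdslot)                     */
    return 'C';
}
```

A stub at 0x00400100 for a function at 0x00401230 takes the Type D form. A target at 0x2AAB0000, which lies in a different 256MB segment, falls back to Type C.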