6.12 ld and PowerPC64 64-bit ELF Support

--stub-group-size=N

Long branch stubs, PLT call stubs and TOC adjusting stubs are placed by ld in stub sections located between groups of input sections. ‘--stub-group-size’ specifies the maximum size, ‘N’, of a group of input sections handled by one stub section. Since branch offsets are signed, a stub section may serve two groups of input sections, one group before the stub section and one group after it. However, when using conditional branches that require stubs, it may be better (for branch prediction) that stub sections only serve one group of input sections. A negative value for ‘N’ chooses this scheme, ensuring that branches to stubs always use a negative offset. Two special values of ‘N’ are recognized: ‘1’ and ‘-1’. Both instruct ld to size input section groups automatically for the branch types detected, with the same behaviour regarding stub placement as other positive or negative values of ‘N’ respectively.

Note that ‘--stub-group-size’ does not split input sections. A single input section larger than the specified group size creates a larger group (of one section). If input sections are too large, it may not be possible for a branch to reach its stub.
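
The grouping rule can be pictured with a small Python model. This is only an illustrative sketch and not ld's actual algorithm: the function name group_sections, the greedy packing strategy and the example sizes are all assumptions. It shows that sections are never split, so an oversized section ends up alone in a group larger than the limit.

```python
def group_sections(sizes, max_group_size):
    """Greedily pack consecutive section sizes into groups.

    Hypothetical model: a section is never split, so a section larger
    than max_group_size forms an oversized group of its own.
    """
    groups, current, current_size = [], [], 0
    for size in sizes:
        # Start a new group when adding this section would exceed the limit.
        if current and current_size + size > max_group_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Made-up section sizes; the third one is far larger than the limit.
example_groups = group_sections([0x4000, 0x3000, 0x20000, 0x1000], 0x8000)
```

In this model the 0x20000-byte section is placed in a group by itself, which is the case where a branch may fail to reach its stub.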

--emit-stub-syms

This option causes ld to label linker stubs with a local symbol that encodes the stub type and destination.

--dotsyms
--no-dotsyms

These two options control how ld interprets version patterns in a version script. Older PowerPC64 compilers emitted both a function descriptor symbol with the same name as the function, and a code entry symbol with the name prefixed by a dot (‘.’). To properly version a function ‘foo’, the version script thus needs to control both ‘foo’ and ‘.foo’. The option ‘--dotsyms’, on by default, automatically adds the required dot-prefixed patterns. Use ‘--no-dotsyms’ to disable this feature.
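
As a rough illustration of the effect of ‘--dotsyms’, the following Python sketch adds a dot-prefixed copy of each version pattern. The function name add_dot_patterns and the handling of patterns that already begin with a dot are assumptions made for the example; ld's real pattern handling may differ.

```python
def add_dot_patterns(patterns):
    """Return the patterns plus a dot-prefixed copy of each one.

    Hypothetical model of what '--dotsyms' does to version script
    patterns; patterns that already start with '.' are left alone
    (an assumption, not documented behaviour).
    """
    result = list(patterns)
    for pattern in patterns:
        dotted = "." + pattern
        if not pattern.startswith(".") and dotted not in result:
            result.append(dotted)
    return result


# 'foo' and 'bar*' stand in for patterns from a version script.
expanded = add_dot_patterns(["foo", "bar*"])
```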

--save-restore-funcs
--no-save-restore-funcs

These two options control whether PowerPC64 ld automatically provides out-of-line register save and restore functions used by ‘-Os’ code. The default is to provide any such referenced function for a normal final link, and to not do so for a relocatable link.

--no-tls-optimize

PowerPC64 ld normally performs some optimization of code sequences used to access Thread-Local Storage. Use this option to disable the optimization.

--tls-get-addr-optimize
--no-tls-get-addr-optimize
--tls-get-addr-regsave
--no-tls-get-addr-regsave

These options control how PowerPC64 ld uses a special stub to call __tls_get_addr. PowerPC64 glibc 2.22 and later support an optimization that allows the second and subsequent calls to __tls_get_addr for a given symbol to be resolved by the special stub without calling into glibc. By default the linker enables generation of the stub when glibc advertises the availability of __tls_get_addr_opt. Using --tls-get-addr-optimize with an older glibc will do little besides slow down your applications, but may be useful when linking an application against an older glibc with the expectation that it will normally run on systems having a newer glibc. --tls-get-addr-regsave forces generation of a stub that saves and restores volatile registers around the call into glibc. Normally, this is done when the linker detects a call to __tls_get_addr_desc. Such calls then go via the register saving stub to __tls_get_addr_opt. --no-tls-get-addr-regsave disables generation of the register saves.

--no-opd-optimize

PowerPC64 ld normally removes .opd section entries corresponding to deleted link-once functions, or functions removed by the action of ‘--gc-sections’ or linker script /DISCARD/. Use this option to disable .opd optimization.

--non-overlapping-opd

Some PowerPC64 compilers have an option to generate compressed .opd entries spaced 16 bytes apart, overlapping the third word of each entry, the static chain pointer (unused in C), with the first word of the next entry. This option expands such entries to the full 24 bytes.
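
The layout arithmetic can be checked with a short Python sketch. The constant and function names are assumptions introduced for illustration; only the 16-byte and 24-byte figures come from the description above.

```python
FULL_ENTRY = 24       # three 8-byte words; the third is the static chain
COMPRESSED_STEP = 16  # spacing of compressed entries, per the text above


def entry_offsets(count, step):
    """Byte offset of each .opd entry for a given spacing."""
    return [i * step for i in range(count)]


compressed = entry_offsets(3, COMPRESSED_STEP)  # each entry overlaps the next
expanded = entry_offsets(3, FULL_ENTRY)         # entries no longer overlap
overlap = FULL_ENTRY - COMPRESSED_STEP          # bytes shared by neighbours
```

The 8-byte overlap is exactly the size of the third word, which is why only the static chain pointer is shared with the following entry.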

--no-toc-optimize

PowerPC64 ld normally removes unused .toc section entries. Such entries are detected by examining relocations that reference the TOC in code sections. A reloc in a deleted code section marks a TOC word as unneeded, while a reloc in a kept code section marks a TOC word as needed. Since the TOC may reference itself, TOC relocs are also examined. TOC words marked as both needed and unneeded are kept. TOC words without any referencing reloc are assumed to be part of a multi-word entry, and are kept or discarded as per the nearest marked preceding word. This works reliably for compiler-generated code, but may be incorrect if assembly code is used to insert TOC entries. Use this option to disable the optimization.
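
The marking rules above can be sketched in Python as follows. This is a simplified model, not ld's implementation: the function name keep_toc_words and the use of word indices are assumptions, and so is the choice to keep any unmarked words that come before the first marked word.

```python
def keep_toc_words(num_words, needed, unneeded):
    """Decide which TOC words survive, following the rules in the text.

    needed / unneeded are sets of word indices marked by relocs in kept
    and deleted code sections. Assumption: words before the first marked
    word are kept.
    """
    keep = []
    last_decision = True  # assumed default before any marked word
    for index in range(num_words):
        if index in needed:
            last_decision = True   # needed wins, even if also unneeded
        elif index in unneeded:
            last_decision = False
        # Unmarked words inherit the nearest preceding marked word's fate.
        keep.append(last_decision)
    return keep


# Word 3 is marked both ways and is kept; word 5 is unmarked and follows word 4.
kept = keep_toc_words(6, needed={0, 3}, unneeded={2, 3, 4})
```

The inheritance step is where hand-written assembly can go wrong: an unmarked word that is really an independent entry would be discarded along with its marked predecessor.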

--no-inline-optimize

PowerPC64 ld normally replaces inline PLT call sequences marked with R_PPC64_PLTSEQ, R_PPC64_PLTCALL, R_PPC64_PLT16_HA and R_PPC64_PLT16_LO_DS relocations by a number of nops and a direct call when the function is defined locally and can’t be overridden by some other definition. This option disables that optimization.

--no-multi-toc

If given any TOC option besides -mcmodel=medium or -mcmodel=large, PowerPC64 GCC generates code for a TOC model where TOC entries are accessed with a 16-bit offset from r2. This limits the total TOC size to 64K. PowerPC64 ld extends this limit by grouping code sections such that each group uses less than 64K for its TOC entries, then inserts r2 adjusting stubs on calls between groups. ld does not split apart input sections, so cannot help if a single input file has a .toc section that exceeds 64K, most likely from linking multiple files with ld -r. Use this option to turn off this feature.
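
A minimal Python sketch of the grouping idea follows. It is a hypothetical greedy model rather than ld's algorithm: the function name group_by_toc, the per-section TOC sizes and the use of an exception for an oversized section are all assumptions.

```python
TOC_LIMIT = 0x10000  # 64K reachable with a 16-bit offset from r2


def group_by_toc(toc_sizes):
    """Greedily group input sections so each group's TOC stays under 64K.

    toc_sizes holds the TOC bytes used by each input section, in link
    order. Hypothetical model: a single section at or over the limit
    cannot be grouped and raises ValueError.
    """
    groups, current, used = [], [], 0
    for index, size in enumerate(toc_sizes):
        if size >= TOC_LIMIT:
            raise ValueError(f"section {index} needs {size:#x} bytes of TOC")
        # Close the current group before it reaches the 64K limit.
        if used + size >= TOC_LIMIT:
            groups.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        groups.append(current)
    return groups


# Made-up TOC sizes for four input sections.
toc_groups = group_by_toc([0x6000, 0x8000, 0x4000, 0xC000])
```

In this model, calls between sections in different groups are the ones that would need an r2 adjusting stub.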

--no-toc-sort

By default, ld sorts TOC sections so that those whose file happens to have a section called .init or .fini are placed first, followed by TOC sections referenced by code generated with PowerPC64 gcc’s -mcmodel=small, and lastly TOC sections referenced only by code generated with PowerPC64 gcc’s -mcmodel=medium or -mcmodel=large options. Doing this results in better TOC grouping for multi-TOC. Use this option to turn off this feature.
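
The three-tier ordering can be expressed as a sort key. The Python sketch below is illustrative only: the dictionary fields, the file names and the assumption that sections of equal rank keep their input order are not taken from ld.

```python
def toc_sort_key(section):
    """Rank a TOC section: 0 = file has .init/.fini, 1 = small model, 2 = other."""
    if section["has_init_fini"]:
        return 0
    if section["small_model_refs"]:
        return 1
    return 2


# Hypothetical input files and properties.
toc_sections = [
    {"name": "c.o", "has_init_fini": False, "small_model_refs": False},
    {"name": "a.o", "has_init_fini": False, "small_model_refs": True},
    {"name": "crti.o", "has_init_fini": True, "small_model_refs": False},
]
# sorted() is stable, so sections with equal rank keep their input order.
ordered = [s["name"] for s in sorted(toc_sections, key=toc_sort_key)]
```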

--plt-align
--no-plt-align

Use these options to control whether individual PLT call stubs are aligned to a 32-byte boundary, or to the specified power of two boundary when using --plt-align=. A negative value may be specified to pad PLT call stubs so that they do not cross the specified power of two boundary (or the minimum number of boundaries if a PLT stub is so large that it must cross a boundary). By default PLT call stubs are aligned to 32-byte boundaries.
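
The difference between aligning a stub and merely keeping it from crossing a boundary can be shown with a little arithmetic. The Python sketch below is a model under stated assumptions, not ld's code: the function name stub_padding and its parameters are invented, and the boundary is passed directly in bytes rather than in whatever form the option value takes.

```python
def stub_padding(offset, stub_size, boundary, avoid_crossing=False):
    """Padding bytes to insert before a stub placed at offset.

    boundary must be a power of two. With avoid_crossing False the stub
    is aligned to the boundary; with avoid_crossing True it is padded
    only when that reduces the number of boundaries it crosses.
    """
    misalignment = offset % boundary
    if misalignment == 0:
        return 0
    if not avoid_crossing:
        return boundary - misalignment
    crossed = (offset + stub_size - 1) // boundary - offset // boundary
    minimum = (stub_size - 1) // boundary  # unavoidable crossings for this size
    return boundary - misalignment if crossed > minimum else 0


aligned_pad = stub_padding(0x24, 0x18, 32)                        # plain alignment
fits_pad = stub_padding(0x24, 0x18, 32, avoid_crossing=True)      # already fits
crossing_pad = stub_padding(0x30, 0x18, 32, avoid_crossing=True)  # would cross
```

In the second case the 24-byte stub at offset 0x24 already fits inside one 32-byte block, so the boundary-avoiding scheme adds no padding where plain alignment would add 28 bytes.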

--plt-static-chain
--no-plt-static-chain

Use these options to control whether PLT call stubs load the static chain pointer (r11). ld defaults to not loading the static chain since there is never any need to do so on a PLT call.

--plt-thread-safe
--no-plt-thread-safe

With power7’s weakly ordered memory model, it is possible when using lazy binding for ld.so to update a PLT entry in one thread and have another thread see the individual PLT entry words update in the wrong order, despite ld.so carefully writing in the correct order and using memory write barriers. Avoiding this requires either some sort of read barrier in the call stub or running with LD_BIND_NOW=1. By default, ld looks for calls to commonly used functions that create threads, and if any are seen, adds the necessary barriers. Use these options to change the default behaviour.

--plt-localentry
--no-plt-localentry

ELFv2 functions with localentry:0 are those with a single entry point, i.e. global entry == local entry, that have no requirement on r2 (the TOC/GOT pointer) or r12, and that guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. The optimization is attractive, with up to 40% reduction in execution time for a small function, but can result in symbol interposition failures. Also, minor changes in a shared library, including system libraries, can cause a function that was localentry:0 to become localentry:8. This will result in a dynamic loader complaint and failure to run. This option is experimental; use it with care. --no-plt-localentry is the default.

--power10-stubs
--no-power10-stubs

When PowerPC64 ld links input object files containing relocations used on power10 prefixed instructions, it normally creates linkage stubs (PLT call and long branch) using power10 instructions for @notoc PLT calls where r2 is not known. The power10 notoc stubs are smaller and faster, so are preferred for power10. --power10-stubs and --no-power10-stubs allow you to override the linker’s selection of stub instructions. --power10-stubs=auto explicitly selects the default auto mode.