ld and PowerPC64 64-bit ELF Support
Long branch stubs, PLT call stubs and TOC adjusting stubs are placed
by ld in stub sections located between groups of input sections.
‘--stub-group-size’ specifies the maximum size of a group of input
sections handled by one stub section. Since branch offsets are signed,
a stub section may serve two groups of input sections, one group before
the stub section, and one group after it. However, when using
conditional branches that require stubs, it may be better (for branch
prediction) that stub sections only serve one group of input sections.
A negative value for ‘N’ chooses this scheme, ensuring that
branches to stubs always use a negative offset. Two special values of
‘N’ are recognized, ‘1’ and ‘-1’. These both instruct
ld to automatically size input section groups for the branch types
detected, with the same behaviour regarding stub placement as other
positive or negative values of ‘N’ respectively.
Note that ‘--stub-group-size’ does not split input sections. A single input section larger than the group size specified will of course create a larger group (of one section). If input sections are too large, it may not be possible for a branch to reach its stub.
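The grouping rule described above can be illustrated with a minimal Python sketch. This is not ld's actual algorithm; the function name `group_sections` and the example sizes are hypothetical, and the sketch only shows that sections are packed in order and never split, so an oversized section ends up in a group of its own.

```python
def group_sections(sizes, limit):
    """Pack consecutive section sizes into groups of at most `limit` bytes.

    A section is never split, so a section larger than `limit`
    forms an oversized group containing only that section.
    """
    groups, current, total = [], [], 0
    for size in sizes:
        # Close the current group if adding this section would exceed the limit.
        if current and total + size > limit:
            groups.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return groups


# Hypothetical section sizes in bytes; the 900-byte section exceeds the limit.
example = group_sections([100, 200, 900, 50, 60], limit=400)
```

In this example the 900-byte section sits alone in a group larger than the requested limit, which is the case where a branch may fail to reach its stub.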
This option causes
ld to label linker stubs with a local
symbol that encodes the stub type and destination.
These two options control how
ld interprets version patterns
in a version script. Older PowerPC64 compilers emitted both a
function descriptor symbol with the same name as the function, and a
code entry symbol with the name prefixed by a dot (‘.’). To
properly version a function ‘foo’, the version script thus needs
to control both ‘foo’ and ‘.foo’. The option
‘--dotsyms’, on by default, automatically adds the required
dot-prefixed patterns. Use ‘--no-dotsyms’ to disable this feature.
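As a rough model of the effect of ‘--dotsyms’ on a list of version patterns, the following Python sketch adds a dot-prefixed copy of each pattern. The function name `add_dot_patterns` is hypothetical, and skipping patterns that already begin with a dot is an assumption of this sketch rather than documented linker behaviour.

```python
def add_dot_patterns(patterns):
    """Return each pattern followed by its dot-prefixed counterpart."""
    result = []
    for pattern in patterns:
        result.append(pattern)
        # Assumption of this sketch: leave already dot-prefixed patterns alone.
        if not pattern.startswith("."):
            result.append("." + pattern)
    return result


# Versioning 'foo' then also covers the code entry symbol '.foo'.
expanded = add_dot_patterns(["foo", "bar*"])
```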
These two options control whether PowerPC64 ld
provides out-of-line register save and restore functions used by
‘-Os’ code. The default is to provide any such referenced
function for a normal final link, and to not do so for a relocatable
link.
ld normally performs some optimization of code
sequences used to access Thread-Local Storage. Use this option to
disable the optimization.
These options control how PowerPC64
ld uses a special
stub to call __tls_get_addr. PowerPC64 glibc 2.22 and later support
an optimization that allows the second and subsequent calls to
__tls_get_addr for a given symbol to be resolved by the special
stub without calling in to glibc. By default the linker enables
generation of the stub when glibc advertises the availability of
__tls_get_addr_opt.
Using --tls-get-addr-optimize with an older glibc won’t do
much besides slow down your applications, but may be useful if linking
an application against an older glibc with the expectation that it
will normally be used on systems having a newer glibc.
--tls-get-addr-regsave forces generation of a stub that saves
and restores volatile registers around the call into glibc. Normally,
this is done when the linker detects a call to __tls_get_addr_desc.
Such calls then go via the register saving stub to __tls_get_addr_opt.
--no-tls-get-addr-regsave disables generation of the
register saving stub.
ld normally removes
.opd section entries
corresponding to deleted link-once functions, or functions removed by
the action of ‘--gc-sections’ or a linker script.
Use this option to disable .opd optimization.
Some PowerPC64 compilers have an option to generate compressed
.opd entries spaced 16 bytes apart, overlapping the third word,
the static chain pointer (unused in C) with the first word of the next
entry. This option expands such entries to the full 24 bytes.
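The layout change can be sketched in Python as follows. This is only an illustration of the 16-byte to 24-byte expansion, not the linker's code: the function name `expand_opd` is hypothetical, big-endian doublewords are assumed, the input is assumed to hold exactly two doublewords per entry, and writing zero into the static chain word is an assumption of the sketch.

```python
import struct


def expand_opd(compressed: bytes) -> bytes:
    """Expand descriptors spaced 16 bytes apart into full 24-byte entries."""
    if len(compressed) % 16:
        raise ValueError("expected a whole number of 16-byte entries")
    out = bytearray()
    for offset in range(0, len(compressed), 16):
        # First two doublewords: code entry address and TOC base.
        entry, toc = struct.unpack_from(">QQ", compressed, offset)
        # Assumption: the static chain word (unused in C) is emitted as zero.
        out += struct.pack(">QQQ", entry, toc, 0)
    return bytes(out)


# Two hypothetical compressed descriptors (entry address, TOC base).
packed = struct.pack(">QQQQ", 0x1000, 0x8000, 0x1040, 0x8000)
expanded = expand_opd(packed)
```

After expansion each descriptor owns its third word, so no entry overlaps the next one.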
ld normally removes unused TOC
entries. Such entries are detected by examining relocations that
reference the TOC in code sections. A reloc in a deleted code section
marks a TOC word as unneeded, while a reloc in a kept code section
marks a TOC word as needed. Since the TOC may reference itself, TOC
relocs are also examined. TOC words marked as both needed and
unneeded will of course be kept. TOC words without any referencing
reloc are assumed to be part of a multi-word entry, and are kept or
discarded as per the nearest marked preceding word. This works
reliably for compiler generated code, but may be incorrect if assembly
code is used to insert TOC entries. Use this option to disable the
optimization.
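The marking rules described above can be modelled with a short Python sketch. It is a simplification, not ld's implementation: the function name `toc_words_to_keep` is hypothetical, relocations from the TOC to itself are ignored, and keeping unmarked words that precede any marked word is an assumption of the sketch.

```python
def toc_words_to_keep(num_words, relocs):
    """Decide which TOC words survive.

    relocs is an iterable of (toc_word_index, section_is_kept) pairs,
    one per relocation that references the TOC from a code section.
    """
    needed = [False] * num_words
    unneeded = [False] * num_words
    for index, section_is_kept in relocs:
        if section_is_kept:
            needed[index] = True
        else:
            unneeded[index] = True

    keep = []
    previous = True  # assumption: leading unmarked words are kept
    for index in range(num_words):
        if needed[index]:
            previous = True  # needed wins even if also marked unneeded
        elif unneeded[index]:
            previous = False
        # Unmarked words follow the nearest marked preceding word.
        keep.append(previous)
    return keep


# Word 4 is referenced from both a kept and a deleted section, so it is kept.
decisions = toc_words_to_keep(6, [(0, True), (2, False), (4, True), (4, False)])
```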
ld normally replaces inline PLT call sequences marked with
R_PPC64_PLT16_LO_DS and related relocations by a number of
nops and a direct call when the function is defined
locally and can’t be overridden by some other definition. This option
disables that optimization.
If given any toc option besides
-mcmodel=medium or -mcmodel=large, PowerPC64 GCC generates code for a TOC model
where TOC entries are accessed with a 16-bit offset from r2. This limits the
total TOC size to 64K. PowerPC64
ld extends this limit by
grouping code sections such that each group uses less than 64K for its
TOC entries, then inserts r2 adjusting stubs between inter-group
calls.
ld does not split apart input sections, so cannot
help if a single input file has a
.toc section that exceeds
64K, most likely from a relocatable link of multiple files.
Use this option to turn off this feature.
By default, ld sorts TOC sections so that those whose file
happens to have a section called .init or .fini are
placed first, followed by TOC sections referenced by code generated
with PowerPC64 gcc’s
-mcmodel=small, and lastly TOC sections
referenced only by code generated with PowerPC64 gcc’s
-mcmodel=medium or -mcmodel=large options. Doing this
results in better TOC grouping for multi-TOC. Use this option to turn
off this feature.
Use these options to control whether individual PLT call stubs are
aligned to a 32-byte boundary, or to the specified power of two
boundary when using
--plt-align=. A negative value may be
specified to pad PLT call stubs so that they do not cross the
specified power of two boundary (or the minimum number of boundaries
if a PLT stub is so large that it must cross a boundary). By default
PLT call stubs are aligned to 32-byte boundaries.
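The difference between positive and negative alignment values can be shown with a small Python sketch. It models only the rule as described here, under the assumption that offsets and sizes are in bytes; the function name `stub_start` and the example numbers are hypothetical.

```python
def stub_start(offset, stub_size, plt_align):
    """Return the offset at which a stub of stub_size bytes is placed.

    A non-negative plt_align aligns the stub to a 2**plt_align boundary.
    A negative plt_align pads only when the stub would otherwise cross
    more 2**abs(plt_align) boundaries than its size makes unavoidable.
    """
    boundary = 1 << abs(plt_align)
    aligned = (offset + boundary - 1) & -boundary  # round up to the boundary
    if plt_align >= 0:
        return aligned
    crossings = (offset + stub_size - 1) // boundary - offset // boundary
    minimum = (stub_size - 1) // boundary
    return aligned if crossings > minimum else offset


# With 32-byte boundaries (2**5):
always_aligned = stub_start(40, 16, 5)    # moved up to the next boundary
fits_in_place = stub_start(40, 16, -5)    # bytes 40..55 cross no boundary
would_cross = stub_start(56, 16, -5)      # bytes 56..71 would cross 64
```

The negative form generally wastes less space, since stubs that already fit between two boundaries are left where they are.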
Use these options to control whether PLT call stubs load the static
chain pointer (r11).
ld defaults to not loading the static
chain since there is never any need to do so on a PLT call.
With power7’s weakly ordered memory model, it is possible when using
lazy binding for ld.so to update a plt entry in one thread and have
another thread see the individual plt entry words update in the wrong
order, despite ld.so carefully writing in the correct order and using
memory write barriers. To avoid this we need some sort of read
barrier in the call stub, or use LD_BIND_NOW=1. By default, ld
looks for calls to commonly used functions that create threads, and if
seen, adds the necessary barriers. Use these options to change the
default behaviour.
ELFv2 functions with localentry:0 are those with a single entry point, i.e. global entry == local entry, and that have no requirement on r2 (the TOC/GOT pointer) or r12, and guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. The optimization is attractive, with up to 40% reduction in execution time for a small function, but can result in symbol interposition failures. Also, minor changes in a shared library, including system libraries, can cause a function that was localentry:0 to become localentry:8. This will result in a dynamic loader complaint and failure to run. The option is experimental; use with care. --no-plt-localentry is the default.
When PowerPC64 ld links input object files containing
relocations used on power10 prefixed instructions, it normally creates
linkage stubs (PLT call and long branch) using power10 instructions
for @notoc PLT calls where
r2 is not known. The
power10 notoc stubs are smaller and faster, so are preferred for
power10. --power10-stubs and --no-power10-stubs
allow you to override the linker’s selection of stub instructions.
--power10-stubs=auto allows the user to select the default
auto mode.