Program Headers
===============

These are GNU extended program header type values.
They are typically found in ElfW(Phdr).p_type.

PT_GNU_EH_FRAME  0x6474e550
PT_SUNW_EH_FRAME 0x6474e550

  Segment contains the EH_FRAME_HDR section (stack frame unwind information).

  NOTE: The virtual address range referred to by PT_GNU_EH_FRAME must be
  covered by a PT_LOAD entry - PT_GNU_EH_FRAME on its own does not trigger
  the mapping/loading of any data.

  PT_SUNW_EH_FRAME is used by a non-GNU implementation for the same purpose,
  and has the same value (although this does not imply compatible contents).

  The contents of the EH_FRAME_HDR are described in the LSB. As of v5.0:

  Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
  Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html#EHFRAMEHDR
  Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#PROGHEADER

PT_GNU_STACK 0x6474e551

  The p_flags member of this ElfW(Phdr) structure applies to the stack.

  If present AND p_flags DOES NOT contain PF_X (0x1) then the stack should
  _not_ be executable. Otherwise the stack follows the architecture specific
  default for executability: For example on x86 the stack is executable by
  default.

  NOTE: Some implementations may use this header's p_memsz to set the stack
  size. glibc does NOT do this: See GNU_PROPERTY_STACK_SIZE instead.

  Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#PROGHEADER

PT_GNU_RELRO 0x6474e552

  The specified segment should be made read-only once run-time linking of
  this object has completed.

  As with PT_GNU_EH_FRAME this header entry does NOT guarantee that the
  range in question is loaded: That must be ensured via a PT_LOAD entry
  which covers the range.
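The PT_LOAD coverage requirement stated for PT_GNU_EH_FRAME and PT_GNU_RELRO can be checked mechanically. The following is a minimal sketch, not taken from any loader: struct sketch_phdr, SKETCH_PT_LOAD and covered_by_load() are hypothetical names used only for illustration, the struct carries just the three fields the check needs, and the check assumes the range must sit inside a single PT_LOAD entry and that address arithmetic does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PT_LOAD 1u /* PT_LOAD from the generic ELF ABI */

/* Hypothetical cut-down view of ElfW(Phdr): only the fields used here. */
struct sketch_phdr {
    uint32_t p_type;
    uint64_t p_vaddr;
    uint64_t p_memsz;
};

/* Return 1 if the address range of entry idx lies entirely inside some
 * PT_LOAD entry of the same table, 0 otherwise. */
int covered_by_load(const struct sketch_phdr *ph, size_t n, size_t idx)
{
    assert(ph != NULL && idx < n);

    uint64_t start = ph[idx].p_vaddr;
    uint64_t end = start + ph[idx].p_memsz;

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != SKETCH_PT_LOAD)
            continue;
        uint64_t load_start = ph[i].p_vaddr;
        uint64_t load_end = load_start + ph[i].p_memsz;
        if (load_start <= start && end <= load_end)
            return 1;
    }
    return 0;
}
```

A tool validating an object could run this check for each PT_GNU_EH_FRAME or PT_GNU_RELRO entry it finds and report any entry for which it returns 0.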

  Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#PROGHEADER

PT_GNU_PROPERTY 0x6474e553

  The Linux kernel uses this program header to locate the
  ".note.gnu.property" section.

  If there is a program property that requires the kernel to perform some
  action before loading an ELF file (eg AArch64 BTI or Intel CET) then this
  header MUST be present.

  If no such features are to be enabled this header MUST NOT be present.

  The contents are laid out as follows:

  Field    | Length   | Contents
  n_namsz  | 4        | 4
  n_descsz | 4        | Size of n_desc (4 byte int, processor format)
  n_type   | 4        | NT_GNU_PROPERTY_TYPE_0 (0x5)
  n_name   | 4        | GNU\0
  n_desc   | n_descsz | property array

  Each element of n_desc, in turn, is:

  typedef struct {
    Elf_Word pr_type;
    Elf_Word pr_datasz;
    unsigned char pr_data[PR_DATASZ];
    unsigned char pr_padding[PR_PADDING];
  } Elf_Prop;

  pr_data is aligned to 4 bytes in 32-bit objects and 8 bytes in 64-bit ones.

  The segment itself is aligned according to the program header's p_align
  field.

  PR_PADDING bytes are added _after_ PR_DATASZ so that each property is
  aligned to 4 bytes (on 32 bit architectures) and to 8 bytes on 64 bit
  architectures. This is true even if pr_datasz is 0
  (cf GNU_PROPERTY_NO_COPY_ON_PROTECTED).

  Properties are sorted in ascending order of pr_type.

  Defined properties are:

  GNU_PROPERTY_STACK_SIZE 0x1

    pr_data holds a native sized (4 bytes on 32 bit architectures, 8 bytes
    on 64 bit) integer in the target processor's native format.

    The linker should pick the highest instance of this from all relocatable
    objects in the link chain and ensure the stack is at least this big.

    There is no implication or requirement that the linker should or will
    reduce the stack size to match this value.

  GNU_PROPERTY_NO_COPY_ON_PROTECTED 0x2

    The linker should treat protected data symbols as defined locally at
    run-time and copy this property to the output shared object.

    The linker should add this property to the output shared object if any
    protected symbol is expected to be defined locally at run-time.

    The run-time loader should disallow copy relocations against protected
    data symbols defined in such objects.

    This type is expected to have a pr_datasz field of 0, and no pr_data
    contents (only padding).

  GNU_PROPERTY_LOPROC 0xc0000000
  GNU_PROPERTY_HIPROC 0xdfffffff

    Reserved for processor-specific values.

  GNU_PROPERTY_LOUSER 0xe0000000
  GNU_PROPERTY_HIUSER 0xffffffff

    Reserved for application specific values.

  Reference: https://raw.githubusercontent.com/wiki/hjl-tools/linux-abi/linux-abi-draft.pdf

PT_GNU_SFRAME 0x6474e554

  Segment contains the SFrame section (Simple Frame format stack trace
  information).

  NOTE: The virtual address range referred to by PT_GNU_SFRAME must be
  covered by a PT_LOAD entry - PT_GNU_SFRAME on its own does not trigger
  the mapping/loading of any data.

  The contents of the SFrame section are described in the GNU Binutils
  documentation. As of 2.40:

  https://sourceware.org/binutils/docs/sframe-spec.html

There are further extensions to p_type but currently they are all
architecture specific and should be documented in the relevant ABIs.

Dynamic segment extensions (PT_DYNAMIC entries)
===============================================

The following types within PT_DYNAMIC are GNU extensions.
The values are typically found in the ElfW(Dyn).d_tag member.

DT_GNU_FLAGS_1 0x6ffffdf4

  Similar to DT_FLAGS and DT_FLAGS_1, but DT_FLAGS is generic and the
  DT_FLAGS_1 bit mask has been exhausted (last available bit claimed by
  Solaris).

  Currently supports the following flag bit(s) in its d_val value:

  DF_GNU_1_UNIQUE

    This flag bit indicates that the library should be loaded at most once
    across all namespaces unless a standalone copy is explicitly requested.

    Some background:

    By default libraries and all their dependencies are loaded into a single
    namespace or link-map (LM_ID_BASE) - this applies to libraries loaded by
    ld.so when a program starts, and to those loaded later by dlopen(3).

    glibc implements a dynamic loading extension - dlmopen(3) - which is
    similar to dlopen(3) but can load libraries into secondary namespaces,
    each of which has its own private link map.

    Libraries in these namespaces are NOT used by the linker to resolve
    symbols for one another: A library in namespace 2 (for example) will not
    use symbols or libraries from any other namespace, nor will it be used to
    satisfy symbol lookups from libraries in those namespaces.

    This mechanism is the basis for isolation of LD_AUDIT libraries (for
    example).

    While this is generally desirable some libraries do not behave well under
    these conditions - in particular libc (malloc/free get upset when they
    interact with independent copies of themselves since they have no
    knowledge of one another's memory accounting) and libpthread (which tends
    to deadlock if two different namespaces attempt to initialise thread
    metadata).

    DF_GNU_1_UNIQUE is used to mark such libraries so that when they are
    loaded only one copy (which resides in LM_ID_BASE) is mapped, and all
    namespaces use that copy (unless such sharing is explicitly suppressed,
    such as for LD_AUDIT libraries).

    This behaviour can be explicitly overridden by the caller of dlmopen(3).

  Reference: This document is canonical.

Prelinking
==========

DT_GNU_PRELINKED 0x6ffffdf5

  The d_val field contains a time_t value giving the UTC time at which the
  object was (pre)linked.

  Reference: See the accompanying prelink document for details.

DT_GNU_CONFLICTSZ 0x6ffffdf6

  Used in prelinked objects.
  d_val contains the size of the conflict segment.

DT_GNU_LIBLISTSZ 0x6ffffdf7

  Used in prelinked objects.
  d_val contains the size of the library list.

DT_GNU_CONFLICT 0x6ffffef8

  Used in prelinked objects.
  The d_ptr value gives the location of the conflict segment.
  This will contain an array of ElfW(Rela) structs.

  If DT_GNU_LIBLIST matches the library searchlist after loading then these
  relocation records are replayed immediately after run-time loading.

DT_GNU_LIBLIST 0x6ffffef9

  Used in prelinked objects.
  The d_ptr value gives the location of the ElfW(Lib) array giving the
  SONAME, checksum and timestamp of each library encountered at prelink
  time.

  This is used to check that all required prelinked libraries are still
  present, loaded, and have the correct checksums at runtime.

  typedef struct {
    ElfW(Word) l_name;          /* Name (string table index) */
    ElfW(Word) l_time_stamp;    /* Timestamp */
    ElfW(Word) l_checksum;      /* Checksum */
    ElfW(Word) l_version;       /* Interface version */
    ElfW(Word) l_flags;         /* Flags */
  } ElfW(Lib);

Hashes
======

DT_GNU_HASH 0x6ffffef5

  The d_ptr value gives the location of the GNU style symbol hash table.

  The GNU hash of a symbol is computed as follows:
    - take the NAME of the symbol (WITHOUT any @version suffix)
    - unsigned long h ← 5381
    - for each unsigned character C in NAME, starting at position 0:
      - h ← (h << 5) + h + C;
        OR
        h ← (h * 33) + C;
    - uint32_t HASH ← h

  Hash Table contents:

  bitmask-bits is a power of 2.
  It is at least 32 on 32 bit architectures and at least 64 on 64 bit
  architectures. There are other restrictions, see elflink.c in the
  binutils-gdb/bfd source.

  The bucket in which a symbol's hash entry is found is:

    gnu-hash( symbol-name ) % nbuckets

  The table is divided into 4 parts:

  ----------------------------------------------------------------------------
  Part 1 (metadata):

   - nbuckets     : 4 byte native integer. Number of buckets.
                    A bucket occupies 32 bits.

   - symoffset    : 4 byte native integer.
                    Starting index of first "real" symbol in the ".dynsym"
                    section. See below.

   - bitmask-words: 4 byte native integer.
                    The number of ELFCLASS words in part 2 of the table.
                    On 64-bit architectures: bitmask-bits / 64
                    And on 32-bit ones     : bitmask-bits / 32

   - bloom-shift  : 4 byte native integer.
                    The shift-count used in the bloom filter.

  symoffset:

  There are synthetic symbols - one for each section in the linker output.
  symoffset gives the number of such synthetic symbols (which cannot be
  looked up via the GNU hash section described here).

  NB: symbols that _can_ be looked up via the GNU hash must be stored in the
  ".dynsym" section in ascending order of bucket. That is, the ordering is
  determined by:

    gnu-hash( symbol-name ) % nbuckets

  ----------------------------------------------------------------------------
  Part 2 (the bloom filter bitmask):

   - bloom : ElfW(Addr)[ bitmask-words ]

  For each symbol [name] S the following is carried out (by the link editor):

   - C     ← __ELF_NATIVE_CLASS /* ie 32 on ELF32, 64 on ELF64 */
   - H     ← gnu-hash( S )
   - BWORD ← (H / C) & (bitmask-words - 1)
   - in bloom[ BWORD ] set:
     - bit H & (C - 1)
     - bit (H >> bloom-shift) & (C - 1)

  NOTE: The discussions and examples of this that are around may use modulo
  operations instead of the logical-ands you see above: This is not an error
  or divergence since:

    x % 2ⁿ ≡ x & (2ⁿ - 1) /* NB: where x is unsigned */

  NOTE: For those unfamiliar with bloom filters: If either bit described
  above is NOT SET then the hash is DEFINITELY NOT present in the table and
  lookup need proceed no further.

  ----------------------------------------------------------------------------
  Part 3 (the bucket metadata):

   - bucket[nbuckets] : Array of 4 byte native integers, giving:

     For each bucket:
      - The INDEX of the first symbol in that bucket,
        OR 0 if no symbols in that bucket.

  NOTE: these indices give the offset into the ".dynsym" section.
  For the offset into the bucket data in part 4 of the table, see below:

  ----------------------------------------------------------------------------
  Part 4 (the chains, or actual bucket data):

   - chain : Contiguous arrays of pseudo hash values combining the hash
             values and the index of the related symbol.
             Each pseudo hash value is a 4 byte native integer.
             ElfW(Word)[number-of-symbols].

  For each symbol [name] S:

   - CHASH  ← gnu-hash( S )
   - BUCKET ← CHASH % nbuckets
   - CINDEX ← position of the symbol _within_ its bucket:
              0 for the first symbol, 1 for the second and so forth

     The chain data are stored as a single linear chunk with each
     pseudo-hash value immediately following another. CINDEX gives the
     position of a pseudo-hash inside the bucket to which it belongs,
     rather than its position in the chain data area as a whole.

     [ b0h0 | b0h1 | b0h2 | b1h0 | …

   - if a pseudo-hash value is the last one in the bucket:
     - CHASH ← CHASH | 1   /* set the least bit */
   - else
     - CHASH ← CHASH & ~1  /* unset the least bit */
   - BYTE-OFFSET ← (bucket[BUCKET] + CINDEX - symoffset) * 4
   - CHAIN-ADDR  ← ((char *)&bucket[nbuckets]) + BYTE-OFFSET
   - *(ElfW(Word) *)(CHAIN-ADDR) ← CHASH

  The least bit of a pseudo-hash value being set indicates that this entry
  is the last in the chain - this is used during lookup since, unlike the
  stock ELF hash, the GNU hash does not use linked lists to store its
  chains.

  Reference: https://sourceware.org/legacy-ml/binutils/2006-10/msg00377.html
  Reference: https://flapenguin.me/elf-dt-gnu-hash
  Reference: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=1384c1a46b83b55876b6a73dbcba0386a458063b;hb=HEAD#l7263
             bfd_elf_size_dynsym_hash_dynstr
  Reference: https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-lookup.c;h=807f3ea9b67489b3116535b7c433c774a72e4c29;hb=HEAD#l357
             do_lookup_x

Section Headers
===============

The GNU extensions to section header type values.
Typically found in ElfW(Shdr).sh_type.
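For readers working with the contents of a GNU hash section (see DT_GNU_HASH under Hashes above), the hash function and the bloom filter steps from Part 2 can be sketched in C as follows. This is an illustrative sketch rather than code from binutils or glibc: gnu_hash(), bloom_add() and bloom_may_contain() are hypothetical names, the filter is assumed to use 64 bit words (ELF64), and the number of words is assumed to be a power of 2. The hash is accumulated in a uint32_t, which gives the same result as truncating an unsigned long at the end because the arithmetic is modulo 2³² either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GNU symbol hash: h = h * 33 + c, starting from 5381. */
uint32_t gnu_hash(const char *name)
{
    assert(name != NULL);
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        h = (h << 5) + h + *p;
    return h;
}

/* Link-editor side: record hash h in a bloom filter of `words` 64 bit words. */
void bloom_add(uint64_t *bloom, uint32_t words, uint32_t shift, uint32_t h)
{
    const uint32_t c = 64; /* assumption: ELF64 word size */
    uint64_t *word = &bloom[(h / c) & (words - 1)];
    *word |= (uint64_t)1 << (h & (c - 1));
    *word |= (uint64_t)1 << ((h >> shift) & (c - 1));
}

/* Lookup side: 0 means "definitely absent", 1 means "possibly present". */
int bloom_may_contain(const uint64_t *bloom, uint32_t words, uint32_t shift,
                      uint32_t h)
{
    const uint32_t c = 64; /* assumption: ELF64 word size */
    uint64_t word = bloom[(h / c) & (words - 1)];
    uint64_t mask = ((uint64_t)1 << (h & (c - 1)))
                  | ((uint64_t)1 << ((h >> shift) & (c - 1)));
    return (word & mask) == mask;
}
```

A lookup would call bloom_may_contain() first and only walk the bucket chain when it returns 1, since a 0 result rules the symbol out without touching the chain data.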

SHT_GNU_INCREMENTAL_INPUTS 0x6fff4700

  Section name: ".gnu_incremental_inputs"

  Currently used internally during incremental linking by gold.

SHT_GNU_ATTRIBUTES 0x6ffffff5

  Section name: ".gnu.attributes"

  GNU specific program attributes, sh_size bytes at sh_offset into the file.

  The first byte is the version of the attribute spec: Currently only 'A'
  is defined.

  Each attribute is stored as:
    - len: 4 byte integer in native format (total attribute length)
    - data: (len - 4) bytes of attribute data
      - starting with a \0 terminated name
      - at least 6 bytes of tag-byte+value
        - a tag byte
        - a 4 byte native integer size (including the tag byte & the size
          itself)
        - if the tag is 2 or 3: a LEB128 encoded value stored in the
          remaining space
        - DOCUMENTME: some attribute bytes? reverse engineer from readelf?

SHT_GNU_HASH 0x6ffffff6

  Section name: ".gnu.hash" (architecture specific ABI may override this)

  This section contains the GNU style hash table. See DT_GNU_HASH.

  Currently only the MIPS architecture is known to use a different name.

SHT_GNU_LIBLIST 0x6ffffff7

  See DT_GNU_LIBLIST.

  Section name: ".gnu.liblist"

  This section should refer to a SHT_STRTAB type section via its sh_link
  field: That strtab holds the names of the libraries listed in each
  ElfW(Lib) struct contained in this GNU_LIBLIST section.

Symbol Versioning
=================

These sections implement GNU symbol versioning.
These sections all have the SHF_ALLOC attribute.

Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html
Reference: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#SYMVERSION

SHT_GNU_verdef 0x6ffffffd

  Section name: ".gnu.version_d"

  This section contains symbol version definitions.

  The number of entries it contains is given by the DT_VERDEFNUM entry of
  the Dynamic Section SHT_DYNAMIC/".dynamic".

  The sh_link member of this section header (see the System V ABI) points
  to the SHT_STRTAB section that contains the strings referenced by this
  section.

  This section contains an array of ElfW(Verdef) structures optionally
  followed by an array of ElfW(Verdaux) structures.

  ElfW(Verdef):

  typedef struct {
    ElfW(Half) vd_version; /* 0, 1 or 2. See below */
    ElfW(Half) vd_flags;   /* A flag bitfield. See below */
    ElfW(Half) vd_ndx;     /* Referred to by SHT_GNU_versym. See below */
    ElfW(Half) vd_cnt;     /* Number of associated ElfW(Verdaux) entries */
    ElfW(Word) vd_hash;    /* Version name hash (per ELF Hash function) */
    ElfW(Word) vd_aux;     /* Offset in bytes from this ElfW(Verdef)
                              to its ElfW(Verdaux) array */
    ElfW(Word) vd_next;    /* Offset in bytes from this ElfW(Verdef)
                              to the next ElfW(Verdef) entry.
                              0 for last entry. */
  } ElfW(Verdef);

  vd_version:
    VER_DEF_NONE    0 // No version
    VER_DEF_CURRENT 1 // Current version

  vd_flags:
    VER_FLG_BASE 0x1 // [Default] Version of the whole object
    VER_FLG_WEAK 0x2 // Weak version identifier

  vd_ndx:
    VER_NDX_LOCAL     0      // private symbol
    VER_NDX_GLOBAL    1      // global symbol
    VER_NDX_LORESERVE 0xff00 // Beginning of reserved entries
    VER_NDX_ELIMINATE 0xff01 //

  DOCUMENTME: VER_NDX_ELIMINATE does not appear to be implemented in glibc:
  If an implementation exists its semantics should be reverse-engineered
  from there and explained here.

  ElfW(Verdaux):

  typedef struct {
    ElfW(Word) vda_name; // byte offset into the strtab of the version name
    ElfW(Word) vda_next; // byte offset from this ElfW(Verdaux) to the next
  } ElfW(Verdaux);

SHT_GNU_verneed 0x6ffffffe

  Section name: ".gnu.version_r"

  This section contains symbol version requirements.

  The number of entries it contains is given by the DT_VERNEEDNUM entry of
  the Dynamic Section SHT_DYNAMIC/".dynamic".

  The sh_link member of this section header (see the System V ABI) points
  to the SHT_STRTAB section that contains the strings referenced by this
  section.

  This section contains an array of ElfW(Verneed) structures optionally
  followed by an array of ElfW(Vernaux) structures.

  ElfW(Verneed):

  typedef struct {
    ElfW(Half) vn_version; /* See below */
    ElfW(Half) vn_cnt;     /* Number of associated ElfW(Vernaux) entries */
    ElfW(Word) vn_file;    /* Byte offset in strtab of required DSO filename */
    ElfW(Word) vn_aux;     /* Byte offset from this ElfW(Verneed)
                              to its ElfW(Vernaux) array */
    ElfW(Word) vn_next;    /* Byte offset from this ElfW(Verneed)
                              to the next one. 0 in the last one */
  } ElfW(Verneed);

  ElfW(Vernaux):

  typedef struct {
    ElfW(Word) vna_hash;  /* Dependency name hash (per ELF hash function) */
    ElfW(Half) vna_flags; /* Dependency flag bitfield. See below */
    ElfW(Half) vna_other; /* Referred to by SHT_GNU_versym, but see below */
    ElfW(Word) vna_name;  /* Byte offset in strtab of required (symbol) name */
    ElfW(Word) vna_next;  /* Byte offset from this ElfW(Vernaux) to the next.
                             0 for the last entry */
  } ElfW(Vernaux);

  vna_flags:
    VER_FLG_WEAK 0x2 Weak version identifier:
                     Not fatal if this symbol+version is missing.

  vna_other:
    This value is used to look up the symbol version hash: It gives the
    position of the hash in the version lookup table.

    Bit 15 (0x8000) is a flag bit and should be masked out of this value
    before using it as an index (eg by bitwise-and-ing its value with
    0x7fff).

    If bit 15 (0x8000) is set then this symbol is hidden and is never an
    acceptable candidate for matching version criteria.

  Reference: glibc: elf/dl-version.c; elf/dl-lookup.c

SHT_GNU_versym 0x6fffffff

  Section name: ".gnu.version"

  The versioned symbol table. If present, this must have the same number of
  entries as the SHT_DYNSYM/".dynsym" section.

  The entries in this section are in the same order as those in SHT_DYNSYM.
  That is to say: Entry 2 in this table corresponds to entry 2 in
  SHT_DYNSYM, entry 3 here to entry 3 in SHT_DYNSYM, and so on.

  This section contains an array of elements of type ElfW(Half).

  Each entry specifies the version defined for or required by the
  corresponding symbol in the Dynamic Symbol Table.

  Entries do not give the version directly - instead they refer to the
  corresponding ElfW(Vernaux).vna_other or ElfW(Verdef).vd_ndx
  structure+member.

  Two values are reserved:

    VER_NDX_LOCAL  0 - The symbol is private, and is not available outside
                       this object.
    VER_NDX_GLOBAL 1 - The symbol is globally available (ie the base or
                       default version).

Note section descriptors (SHT_NOTE extensions)
==============================================

These SHT_NOTE descriptor types are GNU extensions.
Found in the type field of the ELF note layout.

Section name: ".note" as per standard SHT_NOTE sections.

Each note entry should be aligned to 4 bytes in 32-bit objects or 8 bytes in
64-bit objects (see below for exceptions to this).

Alignment: A note parser should use p_align from the program header for
note alignment rather than assuming alignment based on ELF file class.

NT_GNU_ABI_TAG 1

  Used to indicate kernel type and minimum kernel version.

  Section must be named ".note.ABI-tag"

  Alignment: Always 4 bytes, even on 64 bit architectures.

  The name field (namesz/name) contains the string "GNU".

  The descsz field must be at least 16.
  The first 16 bytes of the desc [aka descdata] field are as follows:

  The first 4 byte word is a native integer indicating the kernel type:

    GNU_ABI_TAG_LINUX    0
    GNU_ABI_TAG_HURD     1
    GNU_ABI_TAG_SOLARIS  2
    GNU_ABI_TAG_FREEBSD  3
    GNU_ABI_TAG_NETBSD   4
    GNU_ABI_TAG_SYLLABLE 5
    GNU_ABI_TAG_NACL     6

  The second, third, and fourth 4-byte words of the desc field contain the
  earliest compatible kernel version. For example, if the 3 words are 2, 2,
  and 5, this signifies a 2.2.5 kernel.

NT_GNU_HWCAP 2

  The first 4 bytes are a native integer giving the number of capabilities.
  The next 4 bytes give the bitmask of enabled capabilities.
  The remainder is a packed array of:

    [ 1 byte  ][ N bytes                ]
    [ TESTBIT ][ \0 terminated cap name ]

NT_GNU_BUILD_ID 3

  descsz bytes of build-id data.

  Alignment: Always 4 bytes, even on 64 bit architectures.

  Typically presented as a hex string by user-facing tools.
  Stored as binary (ie not necessarily printable, not encoded).

  The build-id is described as having the following properties:

  It is unique among the set of meaningful contents for ELF files and
  identical when the output file would otherwise have been identical.

  The computation mechanism for the build-id is not given, and is in any
  case opaque after compile time.

NT_GNU_GOLD_VERSION 4

  Up to descsz of [printable] gold version string bytes.

NT_GNU_PROPERTY_TYPE_0 5

  32 or 64 bit aligned (matching the architecture) bytes of data.

  Each entry within this data blob consists of:

    4 bytes, a native integer giving the subtype.
    4 bytes, a native integer giving the size of the entry.

  See PT_GNU_PROPERTY and/or architecture specific ABIs for details.
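
Walking the entries of an NT_GNU_PROPERTY_TYPE_0 descriptor, with the padding rules given under PT_GNU_PROPERTY, might look like the sketch below. It is a hedged illustration only: find_property() is a hypothetical helper, the descriptor is assumed to be already in memory in the host's native byte order, and the caller is assumed to pass the alignment (4 for 32-bit objects, 8 for 64-bit objects).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Search a property array (the n_desc bytes of the note) for the first
 * entry whose pr_type equals `wanted`.  On success return a pointer to its
 * pr_data and store pr_datasz in *datasz; return NULL if the type is not
 * found or the data is truncated. */
const unsigned char *find_property(const unsigned char *desc, size_t descsz,
                                   size_t align, uint32_t wanted,
                                   uint32_t *datasz)
{
    assert(desc != NULL && datasz != NULL && (align == 4 || align == 8));

    size_t off = 0;
    while (descsz - off >= 8) {
        uint32_t type, size;
        memcpy(&type, desc + off, 4);     /* pr_type   */
        memcpy(&size, desc + off + 4, 4); /* pr_datasz */
        off += 8;

        if (size > descsz - off)
            return NULL; /* pr_data runs past the end of the descriptor */

        if (type == wanted) {
            *datasz = size;
            return desc + off;
        }

        /* Skip pr_data plus the padding that rounds it up to `align`. */
        size_t padded = ((size_t)size + align - 1) & ~(align - 1);
        if (padded > descsz - off)
            return NULL;
        off += padded;
    }
    return NULL;
}
```

Because properties are stored in ascending order of type, a stricter parser could also stop as soon as it sees a type greater than the one it wants; the sketch omits that so it stays tolerant of unsorted input.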