Tim Haines [Tue, 21 Nov 2023 20:59:06 +0000 (14:59 -0600)]
Clean up and improve documentation of x86_32 registers (#1629)
* Improve comments for register lengths
* Separate MMX/3DNow! and x87 register lengths and categories
The MMX* registers are only the lower 64 bits of the st* ones.
* Improve comments for EFLAGS fields
* Add conversion to ROSE register for FLAGC
FLAGC is the lower bit of the I/O Privilege Level (IOPL) field in EFLAGS.
* Add conversion to ROSE register for FLAGD
FLAGD is the upper bit of the I/O Privilege Level (IOPL) field in EFLAGS.
* Add conversion to ROSE register for Nested Task flag
* Add conversion to ROSE register for Resume Flag
* Add missing Virtual-8086 mode (VM) EFLAGS field
* Add missing Alignment Check/Access Control (AC) EFLAGS field
* Add missing Virtual Interrupt Flag (VIF) EFLAGS field
* Add missing Virtual Interrupt Pending (VIP) EFLAGS field
* Add missing ID Flag (ID) EFLAGS field
* Remove registers xmm8-xmm31 and aliases
These registers are only available in 64-bit mode.
From Intel(R) 64 and IA-32 Architectures Software Developer’s Manual
June 2021:
11.2.1 SSE2 in 64-Bit Mode and Compatibility Mode
In compatibility mode, SSE2 extensions function like they do in
protected mode. In 64-bit mode, eight additional XMM registers are
accessible. Registers XMM8-XMM15 are accessed by using REX prefixes.
14.1.1 256-Bit Wide SIMD Register Support
Intel AVX introduces support for 256-bit wide SIMD registers
(YMM0-YMM7 in operating modes that are 32-bit or less, YMM0-YMM15 in
64-bit mode).
15.1.2 32 SIMD Register Support
Intel AVX-512 instructions also support 32 SIMD registers in 64-bit
mode (XMM0-XMM31, YMM0-YMM31 and ZMM0-ZMM31). The number of available
vector registers in 32-bit mode is still 8.
* Fix avx-512 opmask size.
It's 64 bits, not 128.
From Intel(R) 64 and IA-32 Architectures Software Developer’s Manual
June 2021
15.6.1 OPMASK Register to Predicate Vector Data Processing
The opmask is a set of eight architectural registers of size
MAX_KL (64-bit).
* Rename OCT to XMMS
This makes it consistent with the names used for the other vector
extensions.
* Use symbolic names for the segment register base IDs
* Add missing ROSE category conversions
* Add missing ROSE subrange conversions
* Preserve register number in getBaseRegister
* Clear whole subrange byte for GPRs in getBaseRegister
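For reference, a minimal sketch of where the EFLAGS fields named in this commit sit in the 32-bit register, using the bit positions from the Intel SDM. The `eflags` namespace and every identifier in it are illustrative and are not the names Dyninst uses.

```cpp
#include <cstdint>

// Illustrative names only; these are not Dyninst identifiers.
namespace eflags {
  // Mask with only bit n set.
  constexpr std::uint32_t bit(unsigned n) { return std::uint32_t{1} << n; }

  constexpr std::uint32_t IOPL_LO = bit(12);  // lower bit of the I/O Privilege Level
  constexpr std::uint32_t IOPL_HI = bit(13);  // upper bit of the I/O Privilege Level
  constexpr std::uint32_t NT      = bit(14);  // Nested Task
  constexpr std::uint32_t RF      = bit(16);  // Resume Flag
  constexpr std::uint32_t VM      = bit(17);  // Virtual-8086 mode
  constexpr std::uint32_t AC      = bit(18);  // Alignment Check / Access Control
  constexpr std::uint32_t VIF     = bit(19);  // Virtual Interrupt Flag
  constexpr std::uint32_t VIP     = bit(20);  // Virtual Interrupt Pending
  constexpr std::uint32_t ID      = bit(21);  // ID Flag

  // Extract the two-bit IOPL value (0..3) from a raw EFLAGS word.
  constexpr unsigned iopl(std::uint32_t flags) { return (flags >> 12) & 0x3u; }
}
```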
Tim Haines [Tue, 14 Nov 2023 12:46:06 +0000 (06:46 -0600)]
Construct a Module from the CU's offset not its PC (#1626)
* Construct a Module from the CU's offset not its PC
The PC value can be non-unique across CUs. For example, they can all
be 0x0 for a PIE binary. The offset of the CU is unique as it's the
location inside of the .debug_info table.
* Use correct lookup when creating module during Object parse
When creating a module during fix_global_symbol_modules_static_dwarf,
the default module covers all ranges, so we need to look for the
exact offset to prevent using the default module every time.
* Use correct offset in DwarfWalker::parseModule
* Lookup DIE location with dwarf_offdie when parsing ranges
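A small model of why the CU offset is the safer key, assuming a PIE-like case where every CU reports a low PC of 0x0. `CompUnit`, `count_distinct`, and `sample_cus` are hypothetical stand-ins and not Dyninst or libdw types.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a DWARF compilation unit.
struct CompUnit {
  std::uint64_t debug_info_offset;  // position inside .debug_info; unique per CU
  std::uint64_t low_pc;             // can be 0x0 for every CU in a PIE binary
  std::string name;
};

// Index the CUs by the key that `key` extracts and report how many distinct
// entries survive. emplace() keeps the first entry seen for a given key.
template <typename KeyFn>
std::size_t count_distinct(const std::vector<CompUnit>& cus, KeyFn key) {
  std::map<std::uint64_t, const CompUnit*> index;
  for (const auto& cu : cus) index.emplace(key(cu), &cu);
  return index.size();
}

// Three CUs with made-up offsets that all share a low PC of 0x0.
inline std::vector<CompUnit> sample_cus() {
  return {{0x0b, 0x0, "a.c"}, {0x9f, 0x0, "b.c"}, {0x1c4, 0x0, "c.c"}};
}
```

Keyed by `low_pc`, the three sample CUs collapse into a single entry; keyed by `debug_info_offset`, all three remain.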
Tim Haines [Tue, 7 Nov 2023 21:24:11 +0000 (15:24 -0600)]
Deprecate Symtab::getOrCreateModule (#1623)
* Deprecate Symtab::getOrCreateModule
There are several problems here:
1) Users shouldn't be creating modules
2) When created, the returned module must be "fixed up" before it's
useful. There's no need for that when one could be properly constructed
at the callsite inside Dyninst.
3) It violates the Single Responsibility Principle
* Add Symtab::addModule
It's private, so it can only be used by friends, specifically Object.
* Use addModule throughout Symtab
* Don't update mod_lookup_ in fixSymModules
The address ranges have already been inserted there by 'addModule'.
* Use new check/add idiom for Modules in binaryEdit::writeFile
* Use new idiom in Object::fix_global_symbol_modules_static_dwarf
Tim Haines [Tue, 31 Oct 2023 12:44:00 +0000 (07:44 -0500)]
Fix MachRegister bool checks (#1613)
The logic between the getRegisterX and isRegisterX members diverged over time. This implements the isRegisterX in terms of the getRegisterX while preserving extra checks where necessary.
* Write 'isPC' in terms of 'getPC'
* Add missing architectures in getFramePointer
* Don't assert in getFramePointer
* Write isFramePointer in terms of getFramePointer
This also adds correct detection of frame pointers on PPC.
* Add missing arch in getStackPointer
* Don't assert in getStackPointer
* Reorder checks in getStackPointer
For consistency
* Write isStackPointer in terms of getStackPointer
This also now includes StackTop.
* isFramePointer ws
* isPC ws
* isStackPointer typo
* Don't assert in getSyscallNumberReg
* Add missing arch in getSyscallNumberReg
* Write isSyscallNumberReg in terms of getSyscallNumberReg
The original implementation in 7b8d777ce from 2013 used o{r,e}ax for
x86, but was changed to use {r,e}ax by 23a5a76d2 in 2015. Neither
the SystemV ABI nor Intel Dev Guide refer to o*ax, so I think this
check is now correct.
* Don't assert in getSyscallReturnValueReg
* Add missing arches in getSyscallReturnValueReg
* Write isSyscallReturnValueReg in terms of getSyscallReturnValueReg
These two had become completely unsynchronized. There is a reg for
aarch64 and both PPC registers were wrong in the bool check.
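The pattern used throughout this commit can be sketched as follows: the boolean check asks the getter for the answer, so the two cannot drift apart. The `Arch` and `Reg` types and the register numbers are made up for illustration and do not match MachRegister.

```cpp
// Hypothetical architecture and register types; not the MachRegister API.
enum class Arch { none, x86, x86_64, ppc64, aarch64 };

struct Reg {
  Arch arch;
  int id;
};

constexpr int InvalidId = -1;

inline bool operator==(const Reg& a, const Reg& b) {
  return a.arch == b.arch && a.id == b.id;
}

// Single source of truth. Unknown architectures yield an invalid register
// instead of asserting. The ids below are placeholders.
inline Reg getStackPointer(Arch arch) {
  switch (arch) {
    case Arch::x86:     return {arch, 4};
    case Arch::x86_64:  return {arch, 4};
    case Arch::ppc64:   return {arch, 1};
    case Arch::aarch64: return {arch, 31};
    default:            return {arch, InvalidId};
  }
}

// The bool check is written in terms of the getter.
inline bool isStackPointer(const Reg& r) {
  const Reg sp = getStackPointer(r.arch);
  return sp.id != InvalidId && r == sp;
}
```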
Tim Haines [Mon, 30 Oct 2023 11:46:05 +0000 (06:46 -0500)]
Remove FunctionBase::ranges_lock (#1596)
It's only ever used in DwarfWalker::setRanges, and that is only ever
called from DwarfWalker::parseSubprogram which is now correctly
guarded (as of 8b400af59b).
Tim Haines [Mon, 30 Oct 2023 11:44:19 +0000 (06:44 -0500)]
Clean up MachRegister class (#1604)
* Add missing switch breaks in size()
* Add break after assert in case statement
* Move assert into default case in 'size'
This is semantically equivalent, but fixes the 'incomplete without
default' linter warning.
* Add error messages in getROSERegister
I don't know if it's correct to assert in these conditions. This is
semantically equivalent, fixes the 'incomplete without default' linter
warning, and avoids asserting.
* Return InvalidRegister in DwarfEncToReg
This makes them all consistent and fixes the 'incomplete without default'
linter warning.
* Remove dead code
I left the commented-out switch cases. I'm not sure why they aren't
used.
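A sketch of the switch shape these cleanups aim for: every case ends in a break, and a default case returns a sentinel rather than asserting. `Width` and `size_in_bytes` are hypothetical and are not the MachRegister implementation.

```cpp
// Hypothetical register-width categories; not Dyninst's.
enum class Width { Byte, Half, Word, Quad };

// Size in bytes, or 0 for an unrecognized value.
inline unsigned size_in_bytes(Width w) {
  unsigned n = 0;
  switch (w) {
    case Width::Byte: n = 1; break;  // without the break, control falls into the next case
    case Width::Half: n = 2; break;
    case Width::Word: n = 4; break;
    case Width::Quad: n = 8; break;
    default:          n = 0; break;  // a default case keeps the switch complete
  }
  return n;
}
```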
Tim Haines [Mon, 30 Oct 2023 11:42:46 +0000 (06:42 -0500)]
Remove BINEDIT_DEBUG (#1607)
* Remove BINEDIT_DEBUG
The vast majority of usages were removed over the years (e.g., f90cb3d090f9ac8641ee95eb18bcf5cf6e1526947f8dcc9807b2ff). It's also
not ever been turned on in the CMake build system, so it's not been
used in at least a decade.
Tim Haines [Sat, 28 Oct 2023 02:56:57 +0000 (21:56 -0500)]
Refactor common/dyn_regs.h (#1590)
This is a substantial re-architecture of the files used to record the per-architecture machine instructions. Most of the changes here are to facilitate automation of integrating new instructions via Capstone.
* Move Dyninst::Architecture into its own file
* Move MachRegister into separate files
* Remove non-existent classes in MachRegister.h
These should have been removed by d42b65910 in 2021.
* Only define registers in dyn_regs.C
* Move getArchAddressWidth into Architecture.h
* Use new Architecture.h and registers/MachRegister.h headers everywhere
Also fix broken transitive includes.
* Remove unnecessary includes of 'dyn_regs.h'
* Put registers for each arch in separate file
* Move isSegmentRegister into x86_regs.h
* Use new per-architecture register files
This should reduce compile times and file sizes.
* Remove unnecessary comment in stackwalk/procstate.h
* Merge aarch64 sys regs into regs file
* Fix whitespace in aarch64_regs.h
* Get rid of aarch64 subdir
* Merge gfx908 sys_regs into regs
* Whitespace gfx908 regs
* Reorder gfx908 register declarations
This makes it consistent with the other architectures.
* Merge gfx90a sys_regs into regs
* Whitespace gfx90a
* Reorder gfx90a register declarations
This makes it consistent with the other architectures.
* Merge gfx940 sys_regs into regs
* Whitespace gfx940
* Reorder gfx940 register declarations
This makes it consistent with the other architectures.
* Flatten AMDGPU register namespace
This makes it consistent with the mnemonics namespace.
Tim Haines [Mon, 23 Oct 2023 18:33:03 +0000 (13:33 -0500)]
Reorder enumerators in instructionAPI::Result::Result_Type (#1588)
These enumerators were reordered by 85fd6745. It turns out there is
an undocumented requirement that they be in this specific order. Rather
than fixing the incorrect usage of inequality checks on enumerators,
I'm putting them back in order with a note.
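To illustrate how an inequality check creates a hidden ordering requirement, here is a sketch with a made-up enumeration; it is not the real Result_Type list. Reordering the enumerators would silently change what `is_integer` returns, so a static_assert records the assumption.

```cpp
namespace sketch {
  // Hypothetical enumerators. The order is significant: integer types first,
  // floating-point types after.
  enum class ResultType { u8, u16, u32, u64, sp_float, dp_float };

  // This check is only correct while every integer enumerator precedes
  // every floating-point one.
  constexpr bool is_integer(ResultType t) { return t <= ResultType::u64; }

  // Record the ordering assumption so a reorder fails at compile time.
  static_assert(ResultType::u64 < ResultType::sp_float,
                "integer enumerators must come before floating-point ones");
}
```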
Tim Haines [Fri, 13 Oct 2023 23:15:16 +0000 (18:15 -0500)]
Fix line information parsing for CUs with no aranges (#1581)
As of libdw 0.189, `dwarf_addrdie` assumes the presence of
.debug_aranges. When a compiler does not emit one, or emits an invalid
one (e.g., gtpin binaries), manually search through all of the CUs
to find a match.
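The fallback can be modeled without libdw as "try the address index first, then scan every CU". In this sketch `CU`, `Aranges`, and `find_cu` are hypothetical; the real code works on libdw DIEs, and the index here is only a simplified stand-in for .debug_aranges.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical CU record covering the half-open range [low_pc, high_pc).
struct CU {
  std::uint64_t offset, low_pc, high_pc;
};

// Simplified stand-in for .debug_aranges: range start address -> CU.
using Aranges = std::map<std::uint64_t, const CU*>;

// Find the CU containing addr. A null or incomplete index is tolerated by
// falling back to a linear search over all CUs.
inline const CU* find_cu(const std::vector<CU>& cus, const Aranges* aranges,
                         std::uint64_t addr) {
  if (aranges) {
    auto it = aranges->upper_bound(addr);  // first range starting after addr
    if (it != aranges->begin()) {
      const CU* cu = std::prev(it)->second;
      if (addr < cu->high_pc) return cu;
    }
  }
  // Fallback: manual search through every CU.
  for (const auto& cu : cus)
    if (cu.low_pc <= addr && addr < cu.high_pc) return &cu;
  return nullptr;
}
```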
Tim Haines [Thu, 12 Oct 2023 20:59:39 +0000 (15:59 -0500)]
Replace Module::getAllFunctions (#1579)
The documented meaning did not match the implementation. This fixes
that and breaks the interface so that users are forced to see the
change rather than being surprised by it. It also makes it consistent
with the other 'find' members like findSymbol and findLocalVariable.
Tim Haines [Thu, 12 Oct 2023 20:07:16 +0000 (15:07 -0500)]
Remove DWARFisms from Symtab::Module (#1575)
* Remove compilation directory from Module
This is a concept specific to DWARF. These functions are not documented.
* Remove DWARFisms from Symtab::Module
There is no need to store the CU DIE from which a Module instance is
derived. The address of the CU can be used to reconstitute the
entry in the .debug_info section using dwarf_addrdie.
Because Module.h is part of the public API for Dyninst, this also
removes the transitive dependency on libdw.
Tim Haines [Tue, 10 Oct 2023 20:37:49 +0000 (15:37 -0500)]
Add Symtab::getContainingModule(Offset) (#1571)
* Add Symtab::getContainingModule(Offset)
Returns the module with PC ranges that contain a given offset (really
address). In contrast, findModuleByOffset(Offset) finds a module
starting at the given offset.
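The difference between the two lookups, sketched on a hypothetical `Module` record with a single half-open range. The function names mirror the intent of findModuleByOffset and getContainingModule but not their actual signatures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical module record covering the half-open range [start, end).
struct Module {
  std::uint64_t start, end;
  const char* name;
};

// Exact match: only a module that starts at `off` is returned.
inline const Module* find_starting_at(const std::vector<Module>& mods,
                                      std::uint64_t off) {
  for (const auto& m : mods)
    if (m.start == off) return &m;
  return nullptr;
}

// Containment: any module whose range covers `off` is returned.
inline const Module* find_containing(const std::vector<Module>& mods,
                                     std::uint64_t off) {
  for (const auto& m : mods)
    if (m.start <= off && off < m.end) return &m;
  return nullptr;
}
```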
gcc 12 reports a diagnostic for a maybe uninitialized value when
boost::optional::value_or is called on an optional that has no
value, even though this is safe
- add a diagnostic suppression macro for this warning and use it
to suppress the warning
kupsch [Tue, 10 Oct 2023 14:24:43 +0000 (09:24 -0500)]
fix gcc 6's broken __has_x_attribute (#1569)
- gcc 6's __has_c_attribute and __has_cpp_attribute return true if
an attribute is supported as a non-standard extension, but if used
produces a warning if the language standard is earlier than the
attribute's standardization; treat gcc 6 like clang and only allow
if the language standard is after the introduction.
- refactor the conditional compilation tests into common macros
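A sketch of the gating idea, using `[[fallthrough]]` (standardized in C++17) as the example attribute: the feature test is trusted only when the language standard in use already includes the attribute. `SKETCH_FALLTHROUGH` and `count_steps` are hypothetical names, not the macros added by this commit.

```cpp
// Use the standard attribute only when the feature test succeeds AND the
// language standard is new enough; otherwise expand to a no-op.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(fallthrough) && __cplusplus >= 201703L
#    define SKETCH_FALLTHROUGH [[fallthrough]]
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH ((void)0)
#endif

// Counts how many cases execute when entering the switch at `start`.
inline int count_steps(int start) {
  int steps = 0;
  switch (start) {
    case 0: ++steps; SKETCH_FALLTHROUGH;
    case 1: ++steps; SKETCH_FALLTHROUGH;
    case 2: ++steps; break;
    default: break;
  }
  return steps;
}
```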
Tim Haines [Mon, 9 Oct 2023 18:55:57 +0000 (13:55 -0500)]
Refactor Symtab::getOrCreateModule (#1568)
* Merge getOrCreateModule and newModule
The latter was only ever called from the former.
* Remove dead debug code
* Do not adjust Module's address
It's unclear why this was here. This function is currently called from
only two places: BinaryEdit::writeFile and
Object::fix_global_symbol_modules_static_dwarf.
In the first, the module created is called 'dyninstInst'. This is the
only place where that name is used, so only one module would be created.
Moreover, 'writeFile' will only produce a single binary output, so there
wouldn't be multiple modules.
In the second, the module lookup will always fail because we are
creating new ones for each DWARF compilation unit (CU), and those are
guaranteed to be unique as we iterate over the results of
'dwarf_nextcu'.
* Remove 'directory definitions' check
This kind of name is never used manually anywhere in Dyninst and the
names that come from DWARF compilation units (CUs) are never
directories.
* Clean up 'create' tracing message
* Tidy up variable declarations.
* Remove 'assert' after 'new'.
We require exceptions to be enabled when building Dyninst and we aren't
using the 'nothrow' version of 'operator new' here. This check is
useless.
* Remove existence check.
This will never be true because 'findModuleByOffset' would have found
the module.
A Symtab::Module is a one-to-one mapping to a DWARF compilation unit
(CU). In DWARF4, we consider a CU to be an entry in the .debug_info
section with the tag DW_TAG_compile_unit. In DWARF5, we also include
entries with the tag DW_TAG_partial_unit as they can contain symbol
definitions; we assume libdw will merge all other split unit types for
us.
The name of a module is the DW_AT_name of the containing DIE. This is
either the full path name of the source file used to create the CU or
the relative path of the same with respect to the DW_AT_comp_dir. We
ensure that the module's name is always an absolute path.
Modules have never been required to have unique names. That is, many
modules can share the same name. The following demonstrates this case:
Because the two CUs have the same name, Dyninst throws away the contents
of the second one because this function would return the first. It is
also possible (and likely) that the two CUs have different line maps and
location lists. These, too, are discarded. Although unlikely, it is
legal for a compiler to emit CUs with overlapping PC range values. This
means the only way to uniquely identify a module is by its offset in
the .debug_info section.
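The effect of a name-keyed lookup can be modeled as below, assuming two CUs that share a name. `CU` and `dropped_by_name_lookup` are hypothetical, and the offsets and paths are made up.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical CU record: its .debug_info offset and its DW_AT_name.
struct CU {
  std::uint64_t offset;
  std::string name;
};

// Model a name-keyed "get or create": the first CU with a given name creates
// the module, and every later CU with that name resolves to the existing
// module. Returns the offsets of the CUs whose contents would be discarded.
inline std::vector<std::uint64_t> dropped_by_name_lookup(const std::vector<CU>& cus) {
  std::set<std::string> seen;
  std::vector<std::uint64_t> dropped;
  for (const auto& cu : cus)
    if (!seen.insert(cu.name).second) dropped.push_back(cu.offset);
  return dropped;
}
```

Keying on `offset` instead would never drop a CU, because each CU has its own position in .debug_info.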
Tim Haines [Mon, 9 Oct 2023 17:15:22 +0000 (12:15 -0500)]
Remove Symtab::changeSymbolOffset (#1567)
It is never used. Not a breaking change as it's private.
I have left the function of the same name in Aggregate because it's
protected, there is a virtual dtor, and that class is accessible by
users. It's possible that someone is using it.
kupsch [Mon, 2 Oct 2023 18:36:35 +0000 (13:36 -0500)]
fix deprecated annotation warning using clang (#1554)
The clang compiler allows, as a non-standard extension, the use of some
attributes introduced in a later language standard than the one used to
compile the source. Clang's __has_cpp_attribute and
__has_c_attribute return true for these attributes. If one of
these attributes is used, clang warns of a non-standard language
feature usage. So for clang, only use a standard attribute if the
feature test returns true and the language standard version is
valid.
- fix clang's [[deprecated]] (only if C++ >= 14 and C >= 23)
- use now known C-23 __STDC_VERSION__ value 202311L
- Added the macro DYNINST_DEPRECATED(msg). It can be placed before a
function, method, type or variable so that on use a deprecated
diagnostic is produced. The macro works using language standard
annotation or compiler specific annotations or has no effect if
neither is available.
It was never documented and makes no sense as offsets are unique
within a module (i.e., DWARF CU). Further, Dyninst uses a separate
Symtab instance for each object file in an archive.
Tim Haines [Thu, 28 Sep 2023 20:53:04 +0000 (15:53 -0500)]
Replace boost::multi_index_container with tbb::concurrent_unordered_set in symtab_impl (#1544)
* Replace boost::multi_index_container with tbb::concurrent_unordered_set
There are now only two dimensions to each Module, so the multi_index
isn't needed. This also replaces the mutex with the intrinsic reader/
writer locks in TBB.
Tim Haines [Thu, 28 Sep 2023 18:34:27 +0000 (13:34 -0500)]
Make a default module a class invariant in Symtab (#1538)
This simplifies the handling of modules and ensures a default
always exists. Creation must happen after the MappedFile has
been created (Symtab::file() checks that one has been created),
but before the symbols are assigned to a Module.
Tim Haines [Thu, 28 Sep 2023 18:32:56 +0000 (13:32 -0500)]
Fix duplicate symbol entries in Symtab::everyFunction (#1542)
A function should only be added to everyFunction if it was not already
in funcsByOffset or if it lives in a different code region from the
function found with the same offset.
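The rule can be sketched as follows, with a hypothetical `FuncIndex` standing in for the funcsByOffset and everyFunction containers and an integer id standing in for the code region.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical function record; `region` stands in for the code region.
struct Func {
  std::uint64_t offset;
  int region;
};

class FuncIndex {
  std::multimap<std::uint64_t, Func> by_offset_;  // stand-in for funcsByOffset
  std::vector<Func> every_;                       // stand-in for everyFunction

public:
  // Add f unless a function with the same offset AND the same region is
  // already known. Returns true if f was added.
  bool add(const Func& f) {
    auto range = by_offset_.equal_range(f.offset);
    for (auto it = range.first; it != range.second; ++it)
      if (it->second.region == f.region) return false;  // duplicate entry
    by_offset_.emplace(f.offset, f);
    every_.push_back(f);
    return true;
  }

  std::size_t size() const { return every_.size(); }
};
```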
This was introduced by https://github.com/dyninst/dyninst/pull/1534.