Bug 444399 - disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP).
This is unfortunately a big and complex patch, to implement LD{,A}XP and
ST{,L}XP. These were omitted from the original AArch64 v8.0 implementation
for unknown reasons.
(Background) the patch is made significantly more complex because for AArch64
we actually have two implementations of the underlying
Load-Linked/Store-Conditional (LL/SC) machinery: a "primary" implementation,
which translates LL/SC more or less directly into IR and re-emits them at the
back end, and a "fallback" implementation that implements LL/SC "manually", by
taking advantage of the fact that V serialises thread execution, so we can
"implement" LL/SC by simulating a reservation using fields LLSC_* in the guest
state, and invalidating the reservation at every thread switch.
(Background) the fallback scheme is needed because the primary scheme is in
violation of the ARMv8 semantics in that it can (easily) introduce extra
memory references between the LL and SC. On some hardware this causes the
reservation to always fail, so the simulated program winds up looping
forever.
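(Illustration) the fallback scheme described above can be sketched in plain C
roughly as follows. This is a minimal model, not the code in the patch: the
struct `SimReservation` and the three `sim_*` functions are hypothetical names
standing in for the LLSC_* guest state fields and the generated IR. Because V
serialises thread execution, a plain compare-then-store stands in here for the
compare-and-swap a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the LLSC_* guest state fields. */
typedef struct {
    uint64_t size;  /* transaction size in bytes; 0 means "no reservation" */
    uint64_t addr;  /* address the reservation covers */
    uint64_t data;  /* value observed by the load-linked */
} SimReservation;

/* Load-linked: perform the load and record what was seen. */
static uint64_t sim_load_linked(SimReservation *r, const uint64_t *addr)
{
    assert(r != NULL && addr != NULL);
    uint64_t seen = *addr;
    r->size = sizeof(uint64_t);
    r->addr = (uint64_t)(uintptr_t)addr;
    r->data = seen;
    return seen;
}

/* Store-conditional: returns 0 on success and 1 on failure, mirroring the
   status value that STXR writes. The reservation is consumed either way. */
static uint32_t sim_store_conditional(SimReservation *r, uint64_t *addr,
                                      uint64_t new_value)
{
    assert(r != NULL && addr != NULL);
    int matches = r->size == sizeof(uint64_t)
                  && r->addr == (uint64_t)(uintptr_t)addr;
    uint64_t expected = r->data;
    r->size = 0;                     /* reservation is gone from here on */
    if (!matches)
        return 1;                    /* no reservation, or wrong address */
    if (*addr != expected)
        return 1;                    /* memory changed since the LL */
    *addr = new_value;               /* stands in for a CAS */
    return 0;
}

/* Called at every (simulated) thread switch. */
static void sim_thread_switch(SimReservation *r)
{
    r->size = 0;
}
```

In this model a store-conditional fails if a thread switch intervened, if the
address differs from the one reserved, or if the memory no longer holds the
value the load-linked saw.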
For these instructions, big picture:
* for the primary implementation, we take advantage of the fact that
IRStmt_LLSC allows 128-bit (I128-typed) transactions to be represented. Hence we bundle
up the two 64-bit data elements into an I128 (or vice versa) and present a
single I128-typed IRStmt_LLSC in the IR. In the backend, those are
re-emitted as LDXP/STXP respectively. For LL/SC on 32-bit register pairs,
that bundling produces a single 64-bit item, and so the existing LL/SC
backend machinery handles it. The effect is that a doubleword 32-bit LL/SC
in the front end translates into a single 64-bit LL/SC in the back end.
Overall, though, the implementation is straightforward.
* for the fallback implementation, it is necessary to extend the guest state
field `guest_LLSC_DATA` to represent a 128-bit transaction, by splitting it
into _DATA_LO64 and _DATA_HI64. Then, the implementation is an exact
analogue of the fallback implementation for single-word LL/SC. It takes
advantage of the fact that the backend already supports 128-bit CAS, as
fixed in bug 445354. As with the primary implementation, doubleword 32-bit
LL/SC is bundled into a single 64-bit transaction.
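(Illustration) a rough C model of the two ideas above: bundling a 32-bit
register pair into one 64-bit item, and a 128-bit reservation whose recorded
data is split into low and high 64-bit halves. All names here (`Pair64`,
`SimPairReservation`, `pack_w_pair`, and the `sim_*_pair` functions) are
hypothetical and chosen for the sketch; it also assumes a little-endian
layout, so the first register of the pair maps to the lower address and hence
to the low half of the bundled value. The compare-then-store again stands in
for the 128-bit CAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } Pair64;

/* Hypothetical model of a reservation with split LO64/HI64 data fields. */
typedef struct {
    uint64_t size;       /* 16 for a pair transaction; 0 = no reservation */
    uint64_t addr;
    uint64_t data_lo64;
    uint64_t data_hi64;
} SimPairReservation;

/* Bundle a 32-bit register pair into one 64-bit item (little-endian
   assumption: the first register occupies the low 32 bits). */
static uint64_t pack_w_pair(uint32_t first, uint32_t second)
{
    return ((uint64_t)second << 32) | (uint64_t)first;
}

/* Inverse of pack_w_pair. */
static void unpack_w_pair(uint64_t v, uint32_t *first, uint32_t *second)
{
    *first  = (uint32_t)(v & 0xFFFFFFFFu);
    *second = (uint32_t)(v >> 32);
}

/* Pair load-linked: read both halves and record them. */
static Pair64 sim_load_linked_pair(SimPairReservation *r, const uint64_t *mem)
{
    assert(r != NULL && mem != NULL);
    Pair64 seen = { mem[0], mem[1] };
    r->size      = 16;
    r->addr      = (uint64_t)(uintptr_t)mem;
    r->data_lo64 = seen.lo;
    r->data_hi64 = seen.hi;
    return seen;
}

/* Pair store-conditional: 0 on success, 1 on failure. Both halves must still
   hold the recorded values, as a 128-bit CAS would require. */
static uint32_t sim_store_conditional_pair(SimPairReservation *r,
                                           uint64_t *mem, Pair64 new_value)
{
    assert(r != NULL && mem != NULL);
    int matches = r->size == 16 && r->addr == (uint64_t)(uintptr_t)mem;
    r->size = 0;                              /* reservation is consumed */
    if (!matches)
        return 1;
    if (mem[0] != r->data_lo64 || mem[1] != r->data_hi64)
        return 1;                             /* either half changed */
    mem[0] = new_value.lo;
    mem[1] = new_value.hi;
    return 0;
}
```

With the 32-bit pair packed this way, a doubleword 32-bit transaction can be
pushed through the same path as an ordinary single 64-bit one.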
Detailed changes:
* new arm64 guest state fields LLSC_DATA_LO64/LLSC_DATA_HI64 to replace
guest_LLSC_DATA
* (ridealong fix) arm64 front end: a fix to a minor and harmless decoding bug
for the single-word LDX/STX case.
* arm64 front end: IR generation for LD{,A}XP/ST{,L}XP: tedious and
longwinded, but per comments above, an exact(ish) analogue of the single-word
case
* arm64 backend: new insns ARM64Instr_LdrEXP / ARM64Instr_StrEXP to wrap up 2
x 64 exclusive loads/stores. Per comments above, there's no need to handle
the 2 x 32 case.
* arm64 isel: translate I128-typed IRStmt_LLSC into the above two insns
* arm64 isel: some auxiliary bits and pieces needed to handle I128 values;
this is standard doubleword isel stuff
* arm64 isel: (ridealong fix): Ist_CAS: check for endianness of the CAS!
* arm64 isel: (ridealong) a couple of formatting fixes
* IR infrastructure: add support for I128 constants, done the same as V128
constants
* memcheck: handle shadow loads and stores for I128 values
* testcase: memcheck/tests/atomic_incs.c: on arm64, also test 128-bit atomic
addition, to check we really have atomicity right
* testcase: new test none/tests/arm64/ldxp_stxp.c, tests operation but not
atomicity. (Smoke test).