[PATCH v2 4/5] Cygwin: use udis86 to find fast cwd pointer on x64
Jeremy Drake
cygwin@jdrake.com
Fri Mar 28 00:52:21 GMT 2025
On Thu, 27 Mar 2025, Corinna Vinschen wrote:
> On Mar 27 10:26, Jeremy Drake via Cygwin-patches wrote:
> > comment, it seems 8.0 is the odd-version-out here.
>
> Yeah, but we don't support 8.0 anymore, only 8.1.
>
> > BTW, something I would *like* to do but haven't figured out how to
> > accomplish cleanly yet is to follow the registers. What I mean by this is
> > illustrated by what I did in the aarch64 version: I could find the call to
> > RtlEnterCriticalSection, then work backwards, find the add whose Rd was x0
> > (the register for the first (pointer) parameter in the calling
> > convention), then find the adrp whose Rd was the Rn of the add. What I
> > would do on x86_64 is find the call to RtlEnterCriticalSection, find any
> > mov rcx, <reg> before, then find the lea <reg>, [rip+XXX] (where reg would
> > be rcx if there wasn't a mov rcx after the lea). Unfortunately, the
> > variable length-ness doesn't lend itself to iterating backwards, so I am
> > not confirming that the lea actually ends up in rcx for the function call.
> > The only register correlation I do is that the register used in the
> > mov <reg>, QWORD PTR [rip+XXX] is then used in the next instruction that
> > must be test <reg>, <reg>. The old code required that <reg> to be rbx,
> > but I don't see any reason that rbx is required...
>
> Yeah, reading x86_64 backwards will lead to confusion. And no, rbx
> isn't required, any non-volatile register could do it. It seems that
> rbx is used because of the way vc++ allocates registers.

After taking out the Windows 8.0 case, I think this should be doable:
* when finding the lea that we're already looking for, save the
  destination register
* if the destination register is not rcx, look for a 64-bit mov into rcx
  from <reg> (where <reg> is the register from the lea) before the call to
  RtlEnterCriticalSection

This won't catch cases where they shuffle it between multiple registers,
or otherwise obfuscate the load into rcx (push/pop, xchg, using some memory
location, ...) but I think this covers every case I've seen (including
those mentioned in comments about preview builds). It would also allow us
to skip the theoretical-but-legal sequence (Intel syntax):

    lea rXX, [rip+XXXX]  ; FastPebLock
    ...
    call UnrelatedFunction
    mov rcx, rXX
    call RtlEnterCriticalSection
    mov rYY, QWORD PTR [rip+YYYY]  ; RtlpCurDirRef
    test rYY, rYY
    ...

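To make the register-following idea concrete, here is a rough sketch in C. It is not the patch code and does not use udis86: it assumes the instructions have already been decoded going forward into a hypothetical `struct insn` array, and every name in it (`struct insn`, `enum reg`, `lea_reaches_rcx`, `calls_enter_cs`) is made up for illustration. It also assumes that an unrelated call clobbers the volatile registers of the Windows x64 calling convention (rax, rcx, rdx, r8-r11), which is a stricter check than the two steps listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded-instruction model; not the patch's or udis86's types. */
enum insn_kind { INSN_OTHER, INSN_LEA_RIP, INSN_MOV_REG64, INSN_CALL };
enum reg { REG_NONE, REG_RAX, REG_RCX, REG_RDX, REG_RBX, REG_RSI, REG_RDI,
           REG_R8, REG_R9, REG_R10, REG_R11, REG_R12 };

struct insn {
  enum insn_kind kind;
  enum reg dst;         /* register written, or REG_NONE */
  enum reg src;         /* source of a reg-to-reg mov, else REG_NONE */
  bool calls_enter_cs;  /* call resolved to RtlEnterCriticalSection */
};

/* Registers a callee may clobber in the Windows x64 calling convention. */
static bool
is_volatile (enum reg r)
{
  return r == REG_RAX || r == REG_RCX || r == REG_RDX
         || (r >= REG_R8 && r <= REG_R11);
}

/* Returns true if the address loaded by the lea at lea_idx is in rcx at the
   first call to RtlEnterCriticalSection that follows it.  Only a direct
   64-bit "mov rcx, <reg>" is followed; push/pop, xchg, multi-register
   shuffles and memory round trips are deliberately not handled. */
static bool
lea_reaches_rcx (const struct insn *insns, size_t count, size_t lea_idx)
{
  assert (insns != NULL);
  if (lea_idx >= count || insns[lea_idx].kind != INSN_LEA_RIP)
    return false;

  enum reg tracked = insns[lea_idx].dst;  /* register holding the address */
  bool in_rcx = (tracked == REG_RCX);

  for (size_t i = lea_idx + 1; i < count; i++)
    {
      const struct insn *in = &insns[i];

      if (in->kind == INSN_CALL)
        {
          if (in->calls_enter_cs)
            return in_rcx;
          /* Unrelated call: rcx is gone, and so is a volatile <reg>. */
          in_rcx = false;
          if (is_volatile (tracked))
            tracked = REG_NONE;
          continue;
        }

      bool copies_tracked = in->kind == INSN_MOV_REG64
                            && tracked != REG_NONE && in->src == tracked;
      if (in->dst == REG_RCX)
        in_rcx = copies_tracked;   /* any other write to rcx loses it */
      if (in->dst == tracked && !copies_tracked)
        tracked = REG_NONE;        /* <reg> overwritten with something else */
    }
  return false;                    /* never reached the call */
}
```

With rXX being a non-volatile register such as rbx, the sequence above is accepted; a lea straight into rcx followed by an unrelated call is rejected, because rcx no longer holds the address by the time RtlEnterCriticalSection is called.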
I'll try to find some time to test this latest round on as many released
Windows versions >= 8.1 as I can, and then send a v3 series. It works on
22631 at least.