Bug 29189

Summary: dlltool delaylibs corrupt float/double arguments
Product: binutils Reporter: strager <strager.nds>
Component: binutilsAssignee: Alan Modra <amodra>
Status: RESOLVED FIXED    
Severity: normal CC: dushistov, egor.pugin
Priority: P2    
Version: 2.39   
Target Milestone: 2.41   
Host: Target: windows-amd64
Build: Last reconfirmed: 2023-05-15 00:00:00
Attachments: semi-tested bug fix
minimal repro
patch for base x86_64 arch

Description strager 2022-05-27 07:01:59 UTC
Created attachment 14120 [details]
semi-tested bug fix

(This report was originally posted on the mailing list: https://lists.gnu.org/archive/html/bug-binutils/2022-05/msg00099.html)

I am calling a function in another x64 DLL with the
following C signature:

    int napi_create_double(void*, double, void*);

The first time I call this function, the 'double' argument
ends up as 1.20305e-307 inside napi_create_double, no matter
what value the caller gives. The 'double' is corrupted.
Calls after the first don't corrupt the 'double'.

The cause is ntdll.dll, eventually called by MinGW's
__delayLoadHelper2, modifying the xmm1 register:

#0  0x00007ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from C:\WINDOWS\System32\KernelBase.dll
#4  0x00007ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70 <__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>,
        ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>)
        at C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209
#5  0x00007ffcd7b717c9 in __tailMerge_node_napi_lib ()
       from MYDLL.dll
#6  0x000002ad2fe84c50 in ?? ()

   0x00007ffd26ce2ffb <+1051>:  movups (%rdx),%xmm0
   0x00007ffd26ce2ffe <+1054>:  movups %xmm0,(%rsi)
   0x00007ffd26ce3001 <+1057>:  movsd  0x10(%rdx),%xmm1
=> 0x00007ffd26ce3006 <+1062>:  movsd  %xmm1,0x10(%rsi)
   0x00007ffd26ce300b <+1067>:  mov    (%rsi),%rbp
   0x00007ffd26ce300e <+1070>:  mov    %r11,%rax
   0x00007ffd26ce3011 <+1073>:  lock cmpxchg %r12,0x1384d6(%rip)        # 0x7ffd26e1b4f0
   0x00007ffd26ce301a <+1082>:  jne    0x7ffd26ce3102 <ntdll!RtlLookupFunctionEntry+1314>

According to Windows x64 documentation, xmm1 is a volatile
register:
https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?redirectedfrom=MSDN&view=msvc-170

I think the solution is for dll's delaylib trampoline to
save xmm1 on the stack before calling __delayLoadHelper2.
I made a patch which does this, and it fixes the bug for my
code.

See attached patch. I think my patch has two problems:

1. AVX/vmovupd/ymm might not be usable on the target
   machine, but saving just xmm isn't enough. Should we
   perform a CPUID check?
2. We store unaligned with vmovupd. Storing aligned with
   vmovapd would be better. I haven't looked into how to
   align ymm registers when storing on the stack.

I'd love to get this bug fixed so others don't spend two
days debugging assembly code!
Comment 1 strager 2022-05-27 07:03:50 UTC
My patch saves ymm4 and ymm5, but I think that's unnecessary, since they won't be used for parameters.
Comment 2 strager 2022-05-27 07:04:46 UTC
Created attachment 14121 [details]
minimal repro

Attached is a small program (DLL and EXE) which demonstrates the issue.
Comment 3 Alan Modra 2023-05-15 02:29:17 UTC
Created attachment 14880 [details]
patch for base x86_64 arch

This patch just fixes the fp arg problem, leaving vector args for later if it turns out necessary.  I usually don't work on windows bugs, and it's a long time since I maintained x86 binutils so don't be surprised if I got something wrong.  Please test.  I've only tested to the point of seeing that dlltool doesn't crash..
Comment 4 Sourceware Commits 2023-05-25 07:44:47 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7529ff1fcdbe260a0ac84ee8f33f4fa4ee1ac455

commit 7529ff1fcdbe260a0ac84ee8f33f4fa4ee1ac455
Author: Alan Modra <amodra@gmail.com>
Date:   Mon May 15 10:44:29 2023 +0930

    PR29189, dlltool delaylibs corrupt float/double arguments
    
            PR 29189
            * dlltool.c (i386_x64_trampoline): Save and restore xmm0-5.  Make
            use of parameter save area for integer arg regs.  Comment.
Comment 5 Alan Modra 2023-05-25 07:45:52 UTC
Tested and committed for 2.41.