Created attachment 14120 [details] semi-tested bug fix (This report was originally posted on the mailing list: https://lists.gnu.org/archive/html/bug-binutils/2022-05/msg00099.html) I am calling a function in another x64 DLL with the following C signature: int napi_create_double(void*, double, void*); The first time I call this function, the 'double' argument ends up as 1.20305e-307 inside napi_create_double, no matter what value the caller gives. The 'double' is corrupted. Calls after the first don't corrupt the 'double'. The cause is ntdll.dll, eventually called by MinGW's __delayLoadHelper2, modifying the xmm1 register: #0 0x00007ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from C:\WINDOWS\SYSTEM32\ntdll.dll #1 0x00007ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll #2 0x00007ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll #3 0x00007ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from C:\WINDOWS\System32\KernelBase.dll #4 0x00007ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70 <__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>, ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>) at C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209 #5 0x00007ffcd7b717c9 in __tailMerge_node_napi_lib () from MYDLL.dll #6 0x000002ad2fe84c50 in ?? () 0x00007ffd26ce2ffb <+1051>: movups (%rdx),%xmm0 0x00007ffd26ce2ffe <+1054>: movups %xmm0,(%rsi) 0x00007ffd26ce3001 <+1057>: movsd 0x10(%rdx),%xmm1 => 0x00007ffd26ce3006 <+1062>: movsd %xmm1,0x10(%rsi) 0x00007ffd26ce300b <+1067>: mov (%rsi),%rbp 0x00007ffd26ce300e <+1070>: mov %r11,%rax 0x00007ffd26ce3011 <+1073>: lock cmpxchg %r12,0x1384d6(%rip) # 0x7ffd26e1b4f0 0x00007ffd26ce301a <+1082>: jne 0x7ffd26ce3102 <ntdll!RtlLookupFunctionEntry+1314> According to Windows x64 documentation, xmm1 is a volatile register: https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?redirectedfrom=MSDN&view=msvc-170 I think the solution is for dll's delaylib trampoline to save xmm1 on the stack before calling __delayLoadHelper2. I made a patch which does this, and it fixes the bug for my code. See attached patch. I think my patch has two problems: 1. AVX/vmovupd/ymm might not be usable on the target machine, but saving just xmm isn't enough. Should we perform a CPUID check? 2. We store unaligned with vmovupd. Storing aligned with vmovapd would be better. I haven't looked into how to align ymm registers when storing on the stack. I'd love to get this bug fixed so others don't spend two days debugging assembly code!
My patch saves ymm4 and ymm5, but I think that's unnecessary, since they won't be used for parameters.
Created attachment 14121 [details] minimal repro Attached is a small program (DLL and EXE) which demonstrates the issue.
Created attachment 14880 [details] patch for base x86_64 arch This patch just fixes the fp arg problem, leaving vector args for later if it turns out necessary. I usually don't work on windows bugs, and it's a long time since I maintained x86 binutils so don't be surprised if I got something wrong. Please test. I've only tested to the point of seeing that dlltool doesn't crash..
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7529ff1fcdbe260a0ac84ee8f33f4fa4ee1ac455 commit 7529ff1fcdbe260a0ac84ee8f33f4fa4ee1ac455 Author: Alan Modra <amodra@gmail.com> Date: Mon May 15 10:44:29 2023 +0930 PR29189, dlltool delaylibs corrupt float/double arguments PR 29189 * dlltool.c (i386_x64_trampoline): Save and restore xmm0-5. Make use of parameter save area for integer arg regs. Comment.
Tested and committed for 2.41.