Bug 29189 - dlltool delaylibs corrupt float/double arguments
Summary: dlltool delaylibs corrupt float/double arguments
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils
Version: 2.39
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2022-05-27 07:01 UTC by strager
Modified: 2022-05-27 09:32 UTC
CC List: 2 users

See Also:
Target: windows-amd64
Last reconfirmed:

Attachments:
semi-tested bug fix (394 bytes, patch), 2022-05-27 07:01 UTC, strager
minimal repro (673 bytes, application/zip), 2022-05-27 07:04 UTC, strager

Description strager 2022-05-27 07:01:59 UTC
Created attachment 14120
semi-tested bug fix

(This report was originally posted on the mailing list: https://lists.gnu.org/archive/html/bug-binutils/2022-05/msg00099.html)

I am calling a function in another x64 DLL with the
following C signature:

    int napi_create_double(void*, double, void*);

The first time I call this function, the 'double' argument
ends up as 1.20305e-307 inside napi_create_double, no matter
what value the caller gives. The 'double' is corrupted.
Calls after the first don't corrupt the 'double'.
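
The first-call-only symptom matches how delay loading works: the import slot (__imp_napi_create_double in the backtrace below) initially routes calls to a resolver, which looks up the real function, overwrites the slot, and forwards the call; every later call jumps straight to the target. Here is a simplified C model of that flow, assuming a single import. It is not MinGW's code, and create_double_fn, iat_slot, resolver_stub and real_napi_create_double are hypothetical names.

```c
#include <assert.h>

/* Simplified model of one delay-loaded import (hypothetical names).
   In MinGW the real work is done by a __tailMerge_* stub plus
   __delayLoadHelper2. */
typedef int (*create_double_fn)(void *env, double value, void *result);

static int resolve_count = 0; /* how many times the resolver ran */

/* Stands in for napi_create_double inside the other DLL. */
static int real_napi_create_double(void *env, double value, void *result)
{
    (void)env;
    *(double *)result = value;
    return 0;
}

static int resolver_stub(void *env, double value, void *result);

/* The import slot starts out pointing at the resolver. */
static create_double_fn iat_slot = resolver_stub;

static int resolver_stub(void *env, double value, void *result)
{
    /* The resolver should only run while the slot is still unpatched. */
    assert(iat_slot == resolver_stub);
    resolve_count++;
    /* Look up the target and patch the slot (the role of ppfnIATEntry). */
    iat_slot = real_napi_create_double;
    /* Forward the caller's original arguments to the real function. */
    return iat_slot(env, value, result);
}
```

Only the first call goes through resolver_stub. In C the compiler re-passes the arguments correctly, so this model cannot reproduce the corruption itself; in the real stub the forwarding is done in assembly and relies on the argument registers still holding their original values after the lookup returns.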

The cause is that ntdll.dll, called indirectly by MinGW's
__delayLoadHelper2, overwrites the xmm1 register:

#0  0x00007ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from C:\WINDOWS\System32\KernelBase.dll
#4  0x00007ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70 <__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>,
        ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>)
        at C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209
#5  0x00007ffcd7b717c9 in __tailMerge_node_napi_lib ()
       from MYDLL.dll
#6  0x000002ad2fe84c50 in ?? ()

   0x00007ffd26ce2ffb <+1051>:  movups (%rdx),%xmm0
   0x00007ffd26ce2ffe <+1054>:  movups %xmm0,(%rsi)
   0x00007ffd26ce3001 <+1057>:  movsd  0x10(%rdx),%xmm1
=> 0x00007ffd26ce3006 <+1062>:  movsd  %xmm1,0x10(%rsi)
   0x00007ffd26ce300b <+1067>:  mov    (%rsi),%rbp
   0x00007ffd26ce300e <+1070>:  mov    %r11,%rax
   0x00007ffd26ce3011 <+1073>:  lock cmpxchg %r12,0x1384d6(%rip)        # 0x7ffd26e1b4f0
   0x00007ffd26ce301a <+1082>:  jne    0x7ffd26ce3102 <ntdll!RtlLookupFunctionEntry+1314>

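According to the Windows x64 calling convention
documentation, xmm1 is a volatile register, so ntdll.dll is
allowed to overwrite it. However, xmm1 also carries the
second argument when that argument is a float or double, as
it is for napi_create_double, so its value has to survive
until the stub jumps to the real function.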
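
To make the register assignment concrete: under the Windows x64 convention the first four arguments get registers by position, so for napi_create_double(void*, double, void*) the pointers go in rcx and r8 and the double goes in xmm1. A minimal sketch of that rule, assuming non-variadic functions with only integer, pointer, float or double arguments (aggregates, vector types and __vectorcall are deliberately left out); arg_kind, win64_arg_location and arg_in are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Scalar argument classes only; aggregates and vector types are not modelled. */
enum arg_kind { ARG_INTEGER, ARG_FLOATING };

/* Where the Microsoft x64 convention places argument number `index`
   (0-based) of a non-variadic function. */
static const char *win64_arg_location(int index, enum arg_kind kind)
{
    static const char *const int_regs[4] = {"rcx", "rdx", "r8", "r9"};
    static const char *const fp_regs[4] = {"xmm0", "xmm1", "xmm2", "xmm3"};

    assert(index >= 0);
    if (index >= 4)
        return "stack"; /* fifth and later arguments are passed on the stack */
    /* The register is chosen by position: argument N uses slot N of the
       matching class, so integer and FP arguments never shift each other. */
    return kind == ARG_FLOATING ? fp_regs[index] : int_regs[index];
}

/* Convenience predicate: is argument `index` passed in `reg`? */
static int arg_in(int index, enum arg_kind kind, const char *reg)
{
    return strcmp(win64_arg_location(index, kind), reg) == 0;
}
```

For example, arg_in(1, ARG_FLOATING, "xmm1") is true, which is exactly the register the ntdll code above reuses as scratch space.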

I think the solution is for dlltool's delaylib trampoline to
save xmm1 on the stack before calling __delayLoadHelper2
and restore it afterwards. I made a patch which does this,
and it fixes the bug for my application.
See attached patch. I think my patch has two problems:

1. The patch uses AVX (vmovupd on ymm registers), which
   might not be available on the target machine, but saving
   just the xmm registers isn't enough. Should we perform a
   CPUID check?
2. We store unaligned with vmovupd. Storing aligned with
   vmovapd would be better, but I haven't looked into how to
   align the ymm save area on the stack.
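
For the two open questions above, here is a sketch of what the checks could look like, written in C for readability (a trampoline would do the equivalent in assembly). It assumes GCC or Clang on x86 for the runtime query, and the macro and function names are hypothetical. Using ymm registers is safe only if CPUID leaf 1 reports both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and XCR0, read with XGETBV, shows the OS saving SSE and AVX state (bits 1 and 2). For alignment, the Windows x64 ABI only guarantees 16-byte stack alignment, so a 32-byte vmovapd save area has to be made by rounding the stack pointer down, which is what `and $-32, %rsp` does.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)
/* XCR0 bits the OS must enable before ymm registers may be used. */
#define XCR0_SSE_STATE     (1u << 1)
#define XCR0_AVX_STATE     (1u << 2)

/* Pure decision logic: ymm (and so vmovupd/vmovapd on ymm) is usable only
   if the CPU reports AVX and OSXSAVE and the OS saves SSE and AVX state. */
static int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_ecx = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_xcr0 = XCR0_SSE_STATE | XCR0_AVX_STATE;
    return (cpuid1_ecx & need_ecx) == need_ecx &&
           (xcr0 & need_xcr0) == need_xcr0;
}

/* Round a stack pointer down to a power-of-two boundary, e.g. 32 so that
   vmovapd can store a ymm register there (same as `and $-32, %rsp`). */
static uintptr_t align_down(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Query the running CPU and OS (GCC/Clang on x86 only). */
int cpu_has_usable_avx(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t xcr0_lo, xcr0_hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return 0; /* XGETBV would fault without OSXSAVE */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    return avx_usable(ecx, ((uint64_t)xcr0_hi << 32) | xcr0_lo);
}
#endif
```

A trampoline would probably run such a check once and cache the result rather than execute CPUID on every import resolution.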

I'd love to get this bug fixed so others don't spend two
days debugging assembly code!
Comment 1 strager 2022-05-27 07:03:50 UTC
My patch saves ymm4 and ymm5, but I think that's unnecessary, since they won't be used for parameters.
Comment 2 strager 2022-05-27 07:04:46 UTC
Created attachment 14121
minimal repro

Attached is a small program (DLL and EXE) which demonstrates the issue.