| Summary: | NaN generation by optimized math functions | | |
|---|---|---|---|
| Product: | glibc | Reporter: | Charles Schwieters <charles> |
| Component: | dynamic-link | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | RESOLVED DUPLICATE | | |
| Severity: | normal | CC: | carlos, charles, fweimer, hjl.tools |
| Priority: | P2 | Flags: | fweimer: security- |
| Version: | 2.24 | | |
| Target Milestone: | --- | | |
| See Also: | https://bugzilla.redhat.com/show_bug.cgi?id=1421155 https://sourceware.org/bugzilla/show_bug.cgi?id=20495 https://sourceware.org/bugzilla/show_bug.cgi?id=21265 https://sourceware.org/bugzilla/show_bug.cgi?id=22636 | | |
| Host: | | Target: | |
| Build: | | Last reconfirmed: | 2017-03-14 00:00:00 |
**Description** (Charles Schwieters, 2017-03-08 19:26:56 UTC)
**Comment 1**

Can you reproduce this with current upstream master? Instructions for running a program against the newly built glibc are here: https://sourceware.org/glibc/wiki/Testing/Builds#Compile_normally.2C_run_under_new_glibc

I suggest you engage Intel support. Without access to your binary, we cannot tell whether this is an Intel compiler issue or a problem with the dynamic linker trampoline. The trampoline is not obviously wrong.

**Comment 2**

I have reproduced the behavior with current upstream master, and reverting that single commit fixes it. Thanks to the good documentation you pointed me to, I was able to run this on more platforms, including a Xeon Phi 7230. Here is a summary of the results:

| Platform | NaN occurs |
|---|---|
| Xeon Phi 7230 | no |
| Xeon CPU E5-2670 | yes |
| Xeon E5-1650 v2 | yes |
| Pentium 4405 | no |
| AMD Opteron 2218 | no |

My binary is available for this sort of analysis, but Intel may be the correct place to turn.

**Comment 3**

H.J., this looks like a serious issue with the dynamic loader support code that saves/restores the AVX-related registers. Do you have time to look into this?

**Comment 4**

Please verify that

```
+_dl_runtime_resolve_avx_slow:
+	cfi_startproc
+	cfi_adjust_cfa_offset(16) # Incorporate PLT
+	vorpd %ymm0, %ymm1, %ymm8
+	vorpd %ymm2, %ymm3, %ymm9
+	vorpd %ymm4, %ymm5, %ymm10
+	vorpd %ymm6, %ymm7, %ymm11
+	vorpd %ymm8, %ymm9, %ymm9
+	vorpd %ymm10, %ymm11, %ymm10
+	vpcmpeqd %xmm8, %xmm8, %xmm8
+	vorpd %ymm9, %ymm10, %ymm10
+	vptest %ymm10, %ymm8
```

is the cause by changing it to

```
_dl_runtime_resolve_avx_slow:
	jmp _dl_runtime_resolve_avx
```

**Comment 5**

Correct. This patch gets rid of the bad behavior:

```
diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
index b27fa06974..73c9003006 100644
--- a/sysdeps/x86_64/dl-trampoline.h
+++ b/sysdeps/x86_64/dl-trampoline.h
@@ -66,16 +66,7 @@
 	.align 16
 _dl_runtime_resolve_avx_slow:
 	cfi_startproc
-	cfi_adjust_cfa_offset(16) # Incorporate PLT
-	vorpd %ymm0, %ymm1, %ymm8
-	vorpd %ymm2, %ymm3, %ymm9
-	vorpd %ymm4, %ymm5, %ymm10
-	vorpd %ymm6, %ymm7, %ymm11
-	vorpd %ymm8, %ymm9, %ymm9
-	vorpd %ymm10, %ymm11, %ymm10
-	vpcmpeqd %xmm8, %xmm8, %xmm8
-	vorpd %ymm9, %ymm10, %ymm10
-	vptest %ymm10, %ymm8
+	jmp _dl_runtime_resolve_avx
 	# Preserve %ymm0 - %ymm7 registers if the upper 128 bits of any
 	# %ymm0 - %ymm7 registers aren't zero.
 	PRESERVE_BND_REGS_PREFIX
```

**Comment 6** (H.J. Lu)

According to the x86-64 psABI, xmm0-xmm7 can be used to pass function parameters. But ICC also uses xmm8-xmm15 to pass function parameters, which violates the x86-64 psABI. As a workaround, you can set the environment variable LD_BIND_NOW=1:

```
# export LD_BIND_NOW=1
```

**Comment 7**

Yes, LD_BIND_NOW=1 fixes things up. Is this a bug in icc/ifort/icpc which should be fixed? I have a bug report open on the compilers. What are the performance implications of LD_BIND_NOW=1? Thanks.

**Comment 8**

(In reply to H.J. Lu from comment #6)
> According to the x86-64 psABI, xmm0-xmm7 can be used to pass function
> parameters. But ICC also uses xmm8-xmm15 to pass function parameters,
> which violates the x86-64 psABI. As a workaround, you can set the
> environment variable LD_BIND_NOW=1:
>
> # export LD_BIND_NOW=1

Given that this used to work, do we need to carry a fix in glibc for ICC binaries? Or are you going to take this to the ICC team? Is this fixed in a particular version of ICC?

**Comment 9**

There is nothing to fix in glibc since it follows the x86-64 psABI. I am discussing with the ICC team now to see how to address this.

*** This bug has been marked as a duplicate of bug 21265 ***
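The reporter's actual reproducer is not attached to this bug. A hypothetical harness of the kind one might use to check for the symptom (NaNs from optimized libm entry points, as in comment 2's platform survey) could look like the sketch below. Note that merely running it under gcc will not trigger the bug: the failure requires an ICC-built caller that keeps live values in %xmm8-%xmm15 across the first, lazily resolved call.

```c
/* Hypothetical detection harness, NOT the reporter's reproducer.
   It only checks libm results for NaNs; triggering the underlying bug
   additionally requires an ICC-compiled caller holding live data in
   %xmm8-%xmm15 when the lazy-binding trampoline runs.
   Build: gcc harness.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
  int bad = 0;
  for (int i = 0; i < 1000; i++)
    {
      double r = exp(sin((double) i));  /* any optimized libm function */
      if (isnan(r))
        {
          printf("NaN at i=%d\n", i);
          bad = 1;
        }
    }
  puts(bad ? "NaNs detected" : "no NaNs");
  return bad;
}
```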
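To make the psABI point in comment #6 concrete, here is a minimal sketch (the function name and values are illustrative, not from the bug report). Under the x86-64 psABI, the first eight floating-point arguments travel in %xmm0-%xmm7 and any further ones go on the stack; %xmm8-%xmm15 are never parameter registers, which is why the lazy-resolution trampoline only bothers to preserve the first eight vector registers.

```c
/* Illustrative sketch of the x86-64 psABI convention cited in comment #6.
   The function is hypothetical; compile with `gcc -O2 -S` on x86-64 and
   inspect the assembly to confirm the register assignment. */
#include <stdio.h>

/* Arguments a..h arrive in %xmm0-%xmm7; i is passed on the stack.
   %xmm8-%xmm15 are scratch registers, never parameter registers. */
double sum9(double a, double b, double c, double d,
            double e, double f, double g, double h, double i)
{
  return a + b + c + d + e + f + g + h + i;
}

int main(void)
{
  printf("%g\n", sum9(1, 2, 3, 4, 5, 6, 7, 8, 9));  /* prints 45 */
  return 0;
}
```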
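The LD_BIND_NOW=1 workaround helps because eager binding resolves every PLT entry at program startup, so the _dl_runtime_resolve* trampoline never runs while ICC's out-of-ABI registers hold live data. A hedged sketch of the same idea from inside a program, using dlopen with RTLD_NOW (the library path and symbol choice are illustrative):

```c
/* Sketch: eager symbol resolution via RTLD_NOW, analogous in spirit to
   LD_BIND_NOW=1 (which applies the same policy to the whole process).
   Build: gcc eager.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
  /* RTLD_NOW resolves all of libm's undefined symbols immediately,
     instead of lazily through the PLT trampoline on first call. */
  void *handle = dlopen("libm.so.6", RTLD_NOW);
  if (handle == NULL)
    {
      fprintf(stderr, "dlopen: %s\n", dlerror());
      return 1;
    }

  double (*cosine)(double) = (double (*)(double)) dlsym(handle, "cos");
  if (cosine != NULL)
    printf("cos(0) = %g\n", cosine(0.0));

  dlclose(handle);
  return 0;
}
```

Linking with `-Wl,-z,now` achieves the same eager binding at link time. As to the performance question in comment 7: the cost is that all relocations are processed at startup rather than on first call, which mainly lengthens process start-up time.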