My very old code, compiled with the Intel Fortran compiler (ifort-v11 (IFORT) 16.0.3 20160415) with optimization enabled, suddenly started misbehaving and producing unexpected NaNs. I traced this to Debian's upgrade of libc6 from 2.24-8 to 2.24-9. I then did some sleuthing and found that git commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604 was the culprit. Reverting this commit makes things work again (this same binary runs correctly on many, many different versions of glibc).
Rather than make a fix myself (I am not qualified), I can describe the behavior: the problem seems to occur only on machines with an avx cpu flag. On machines that list no AVX cpu flags, the bad behavior does not occur. Unfortunately, I have been unable to run this on a machine with AVX512 instructions.
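For anyone who wants to sort test machines the same way, here is a minimal sketch that lists the AVX-related flags a Linux machine reports. It assumes an x86 Linux system with a `flags` line in /proc/cpuinfo; the helper name `avx_flags` is made up for illustration.

```python
import re


def avx_flags(cpuinfo_text):
    """Return the sorted AVX-related flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux each logical CPU has a "flags : ..." line;
        # the first one is enough for this purpose.
        if re.match(r"flags\s*:", line):
            flags = line.split(":", 1)[1].split()
            return sorted(f for f in flags if f.startswith("avx"))
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(avx_flags(fh.read()) or "no AVX flags listed")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```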
Can you reproduce this with current upstream master? Instructions for running a program against the newly built glibc are here:
I suggest you engage Intel support. Without access to your binary, we cannot tell if this is an Intel compiler issue, or a problem with the dynamic linker trampoline. The trampoline is not obviously wrong.
I have reproduced the behavior with the upstream master, including fixing things by reverting that single commit. Thanks to the good documentation you pointed me to, I was able to run this on more platforms, including a Xeon Phi 7230. Here is a summary of the results:
platform            NaN occurs
Xeon Phi 7230       no
Xeon CPU E5-2670    yes
Xeon E5-1650 v2     yes
Pentium 4405        no
AMD Opteron 2218    no
My binary is available for this sort of analysis, but Intel may be the correct place to turn.
This looks like a serious issue with the dynamic loader support code that saves/restores the AVX-related registers.
Do you have time to look into this?
Please verify that
+ cfi_adjust_cfa_offset(16) # Incorporate PLT
+ vorpd %ymm0, %ymm1, %ymm8
+ vorpd %ymm2, %ymm3, %ymm9
+ vorpd %ymm4, %ymm5, %ymm10
+ vorpd %ymm6, %ymm7, %ymm11
+ vorpd %ymm8, %ymm9, %ymm9
+ vorpd %ymm10, %ymm11, %ymm10
+ vpcmpeqd %xmm8, %xmm8, %xmm8
+ vorpd %ymm9, %ymm10, %ymm10
+ vptest %ymm10, %ymm8
is the cause by changing it to
Correct. This patch gets rid of the bad behavior:
diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
index b27fa06974..73c9003006 100644
@@ -66,16 +66,7 @@
- cfi_adjust_cfa_offset(16) # Incorporate PLT
- vorpd %ymm0, %ymm1, %ymm8
- vorpd %ymm2, %ymm3, %ymm9
- vorpd %ymm4, %ymm5, %ymm10
- vorpd %ymm6, %ymm7, %ymm11
- vorpd %ymm8, %ymm9, %ymm9
- vorpd %ymm10, %ymm11, %ymm10
- vpcmpeqd %xmm8, %xmm8, %xmm8
- vorpd %ymm9, %ymm10, %ymm10
- vptest %ymm10, %ymm8
+ jmp _dl_runtime_resolve_avx
# Preserve %ymm0 - %ymm7 registers if the upper 128 bits of any
# %ymm0 - %ymm7 registers aren't zero.
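
To make the effect of the removed instructions easier to follow, here is a rough Python model of them. It is only a sketch under stated assumptions: the 16 ymm registers are modelled as a list of 256-bit integers, the function name `upper_halves_check` is hypothetical, and the carry-flag rule reflects my reading of the vptest semantics (CF is set when the second operand AND NOT the first operand is all zero). It is not the glibc implementation.

```python
MASK256 = (1 << 256) - 1   # all bits of a 256-bit ymm register
LOW128 = (1 << 128) - 1    # the xmm (lower 128-bit) part


def upper_halves_check(ymm):
    """Model the vorpd/vpcmpeqd/vptest sequence on a list of 16 registers.

    Returns the modelled carry flag: True when the upper 128 bits of
    ymm0-ymm7 are all zero. Overwrites ymm8-ymm11 in place, as the real
    instructions do.
    """
    ymm[8] = ymm[0] | ymm[1]       # vorpd %ymm0, %ymm1, %ymm8
    ymm[9] = ymm[2] | ymm[3]       # vorpd %ymm2, %ymm3, %ymm9
    ymm[10] = ymm[4] | ymm[5]      # vorpd %ymm4, %ymm5, %ymm10
    ymm[11] = ymm[6] | ymm[7]      # vorpd %ymm6, %ymm7, %ymm11
    ymm[9] = ymm[8] | ymm[9]       # vorpd %ymm8, %ymm9, %ymm9
    ymm[10] = ymm[10] | ymm[11]    # vorpd %ymm10, %ymm11, %ymm10
    ymm[8] = LOW128                # vpcmpeqd %xmm8, %xmm8, %xmm8 (upper half zeroed)
    ymm[10] = ymm[9] | ymm[10]     # vorpd %ymm9, %ymm10, %ymm10
    # vptest %ymm10, %ymm8: CF = ((ymm10 AND NOT ymm8) == 0)
    return (ymm[10] & ~ymm[8] & MASK256) == 0


if __name__ == "__main__":
    regs = [0] * 16
    regs[0] = 0x3FF0000000000000   # an argument in the low half of ymm0
    regs[8] = 0x4000000000000000   # a value the caller left in xmm8
    before = regs[8]
    print("upper halves all zero:", upper_halves_check(regs))
    print("ymm8 preserved:", regs[8] == before)
```

In this model the check uses ymm8 through ymm11 as scratch space, so whatever the caller left in those registers before the first call through the PLT is gone afterwards.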
According to x86-64 psABI, xmm0-xmm7 can be used to pass function
parameters. But ICC also uses xmm8-xmm15 to pass function parameters
which violates x86-64 psABI. As a workaround, you can set environment
variable LD_BIND_NOW=1 by
# export LD_BIND_NOW=1
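
If exporting the variable for the whole shell session is too broad, it can also be set for a single child process. A minimal sketch follows; the helper name `run_with_bind_now` and the path `./my_ifort_binary` are placeholders, not anything from this report.

```python
import os
import subprocess
import sys


def run_with_bind_now(argv):
    """Run argv with LD_BIND_NOW=1 added to a copy of the current environment."""
    env = dict(os.environ)      # copy, so the parent environment is untouched
    env["LD_BIND_NOW"] = "1"
    return subprocess.run(argv, env=env)


# Example call (hypothetical binary path):
# result = run_with_bind_now(["./my_ifort_binary"])
# print(result.returncode)
```

With LD_BIND_NOW set to a non-empty string, the dynamic linker resolves all symbols when the program starts instead of lazily on the first call, so the lazy-binding trampoline is not entered. The resolution cost moves to startup.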
Yes. LD_BIND_NOW=1 fixes things up. Is this a bug in icc/ifort/icpc which should be fixed? I have a bug report open on the compilers. What are the performance implications for LD_BIND_NOW=1? Thanks.
(In reply to H.J. Lu from comment #6)
> According to x86-64 psABI, xmm0-xmm7 can be used to pass function
> parameters. But ICC also uses xmm8-xmm15 to pass function parameters
> which violates x86-64 psABI. As a workaround, you can set environment
> variable LD_BIND_NOW=1 by
> # export LD_BIND_NOW=1
Given that this used to work, do we need to carry a fix in glibc for ICC binaries?
Or are you going to take this to the ICC team? Is this fixed in a particular version of ICC?
There is nothing to fix in glibc since it follows the x86-64 psABI.
I am discussing this with the ICC team now to see how to address it.
Thanks. Closing as invalid based on comment 6 and comment 9.
*** This bug has been marked as a duplicate of bug 21265 ***