This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.

_dl_runtime_resolve_avx_slow clobbering xmm8

From: Ivan Tubert-Brohman <ivan dot tubert-brohman at schrodinger dot com>
To: libc-help at sourceware dot org, "Colvin,Tor" <colvin at schrodinger dot com>
Date: Fri, 25 Aug 2017 14:58:11 -0400
Subject: _dl_runtime_resolve_avx_slow clobbering xmm8

TL;DR: we found that _dl_runtime_resolve_avx_slow clobbers xmm8, but
code generated by the Intel Fortran compiler assumes that xmm8 is
preserved across a call to a shared library function. Which one is wrong?

Long version:

We noticed strange behavior in our software after upgrading to RHEL
7.4. We were able to reproduce the bug with the simplified Fortran
subroutine below (compiled with ifort 17.0.1):

      subroutine multbox(actmin, actmax, bsize, nlev)
      ! adjust the size of a box with corners in actmin, actmax
      ! to be a multiple of 2**(nlev-1)*bsize.
      implicit none
      real*8 bsize
      integer k, lbig, ldiv, length, lnew, nlev
      real*8 actmin(3), actmax(3)

      do k = 1, 3
        length = int((actmax(k)-actmin(k))/bsize)+1
        ldiv = 2**(nlev-1)
        lbig = length/ldiv+1
        lnew = lbig*ldiv
        actmax(k) = actmin(k)+lnew*bsize
      enddo

      return
      end subroutine multbox
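To cross-check the expected values independently of ifort and SVML, a straightforward C transliteration of multbox may help. It is a hypothetical reference, not the compiled code: the 0-based indexing is ours, it assumes three-element arrays because the loop runs k = 1 to 3, and it assumes 1 <= nlev <= 31 so that the shift is well defined.

```c
#include <assert.h>

/* Hypothetical C transliteration of the Fortran multbox subroutine, for
 * cross-checking expected values. Arrays are 0-based, so the Fortran
 * loop k = 1..3 becomes k = 0..2. */
static void multbox(const double actmin[3], double actmax[3],
                    double bsize, int nlev)
{
    assert(nlev >= 1 && nlev <= 31);  /* keeps the shift below well defined */
    for (int k = 0; k < 3; k++) {
        /* C's (int) cast truncates toward zero, like Fortran INT() */
        int length = (int)((actmax[k] - actmin[k]) / bsize) + 1;
        int ldiv = 1 << (nlev - 1);            /* 2**(nlev-1) */
        int lbig = length / ldiv + 1;          /* integer division truncates toward zero */
        int lnew = lbig * ldiv;
        actmax[k] = actmin[k] + lnew * bsize;  /* int * double -> double, as in Fortran */
    }
}
```

For example, with actmin = (0.5, 1.0, -2.0), actmax = (3.2, 4.0, 2.0), bsize = 0.5 and nlev = 2, this gives actmax = (4.5, 5.0, 3.0).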

When the subroutine is called twice with the same arguments, the first call gives wrong results.

For example, with one particular set of input values we expect:
 actmax after call 1   22.3282900000000        21.7469530000000
 actmax after call 2   22.3282900000000        21.7469530000000

Observed result:
 actmax after call 1  -536.000000000000       -536.004394531251
 actmax after call 2   22.3282900000000        21.7469530000000

We stepped through the function and found the problem: the first time
__svml_idiv4 is called (an Intel runtime library function that
apparently divides the integers in xmm0 by those in xmm1 and returns
the results in xmm0), the value of the xmm8 register gets clobbered,
but the code in multbox_ assumes that it is preserved. Stepping into
that first call, we found that with glibc-2.17-196.el7.x86_64 (shipped
in RHEL 7.4), lazy symbol resolution goes through the recently
introduced _dl_runtime_resolve_avx_slow, which clobbers xmm8; with an
older glibc (we tried glibc-2.17-78.el7.x86_64), it goes through
_dl_runtime_resolve, which doesn't affect xmm8.
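Going only by the description above, the lane-wise arithmetic that __svml_idiv4 appears to perform can be written as a plain C reference. This is a hypothetical sketch: the type v4i32 and the function idiv4_ref are our names, it models values rather than Intel's register-based calling convention, and it assumes signed division truncating toward zero, which is what both C and Fortran integer division do.

```c
#include <assert.h>
#include <stdint.h>

/* Four packed signed 32-bit lanes, standing in for one xmm register. */
typedef struct { int32_t lane[4]; } v4i32;

/* Hypothetical reference for what __svml_idiv4 appears to compute:
 * lane-wise a / b, truncating toward zero. This models values only,
 * not Intel's implementation or its register calling convention. */
static v4i32 idiv4_ref(v4i32 a, v4i32 b)
{
    v4i32 q;
    for (int i = 0; i < 4; i++) {
        assert(b.lane[i] != 0);                               /* undefined in C */
        assert(!(a.lane[i] == INT32_MIN && b.lane[i] == -1)); /* would overflow */
        q.lane[i] = a.lane[i] / b.lane[i];
    }
    return q;
}
```

In multbox the divisor lanes would all hold ldiv; the listing builds that divisor by broadcasting %ebx into xmm8 and copying it to xmm1 just before the call.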

My question here is, who is at fault? Is ifort making unfounded
assumptions about the persistence of xmm8, or is
_dl_runtime_resolve_avx_slow wrong in not preserving it? I looked at
the latter's code and it looks like it tries to preserve xmm0-xmm7,
but not xmm8.
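The failure mode can be illustrated with a toy model. Everything here is hypothetical (regfile, toy_trampoline and survives are made-up names, not glibc code): it only mimics the behaviour described above, where the resolver saves and restores xmm0 through xmm7, the vector argument registers, and uses the rest as scratch. For context, the x86-64 System V psABI treats all sixteen xmm registers as caller-saved, so code following the standard convention cannot rely on xmm8 surviving a call; whether __svml_idiv4 is meant to use a different convention is the question being asked here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not glibc code: sixteen 128-bit xmm registers as raw bytes. */
typedef struct { uint8_t xmm[16][16]; } regfile;

/* Hypothetical stand-in for a lazy-binding trampoline that preserves only
 * xmm0-xmm7 around its own work and may scribble on everything else. */
static void toy_trampoline(regfile *r)
{
    uint8_t saved[8][16];
    memcpy(saved, r->xmm, sizeof saved);  /* save xmm0-xmm7 */
    memset(r->xmm, 0xAA, sizeof r->xmm);  /* resolver work clobbers all xmm regs */
    memcpy(r->xmm, saved, sizeof saved);  /* restore xmm0-xmm7 only */
}

/* Returns 1 if register n holds the same bytes after the trampoline runs. */
static int survives(int n)
{
    assert(n >= 0 && n < 16);
    regfile r;
    for (int i = 0; i < 16; i++)
        memset(r.xmm[i], i + 1, 16);      /* distinct pattern per register */
    uint8_t before[16];
    memcpy(before, r.xmm[n], 16);
    toy_trampoline(&r);
    return memcmp(before, r.xmm[n], 16) == 0;
}
```

In this model survives(0) through survives(7) return 1 while survives(8) returns 0, which matches the pattern reported here: only the first, lazily bound call through __svml_idiv4@plt loses xmm8.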

The problem goes away when LD_BIND_NOW is set, but that's not an option in production.

We'll file a ticket with Intel, but I'm interested in hearing the
glibc perspective on this question.

Thanks,
Ivan

PS: For reference, here's the disassembled multbox function:

   0x0000000000403080 <+0>:     push   %r13
   0x0000000000403082 <+2>:     push   %r14
   0x0000000000403084 <+4>:     push   %rbx
   0x0000000000403085 <+5>:     mov    %rsi,%r13
   0x0000000000403088 <+8>:     mov    %rdi,%r14
   0x000000000040308b <+11>:    mov    $0x1,%eax
   0x0000000000403090 <+16>:    movsd  (%rdx),%xmm11
   0x0000000000403095 <+21>:    xor    %ebx,%ebx
   0x0000000000403097 <+23>:    movaps %xmm11,%xmm9
   0x000000000040309b <+27>:    movups 0x0(%r13),%xmm2
   0x00000000004030a0 <+32>:    movups (%r14),%xmm10
   0x00000000004030a4 <+36>:    subpd  %xmm10,%xmm2
   0x00000000004030a9 <+41>:    unpcklpd %xmm9,%xmm9
   0x00000000004030ae <+46>:    divpd  %xmm9,%xmm2
   0x00000000004030b3 <+51>:    mov    (%rcx),%ecx
   0x00000000004030b5 <+53>:    dec    %ecx
   0x00000000004030b7 <+55>:    shl    %cl,%eax
   0x00000000004030b9 <+57>:    cmp    $0x1f,%ecx
   0x00000000004030bc <+60>:    cvttpd2dq %xmm2,%xmm0
   0x00000000004030c0 <+64>:    cmovbe %eax,%ebx
   0x00000000004030c3 <+67>:    movdqu 0x81794(%rip),%xmm12        # 0x484860
   0x00000000004030cc <+76>:    paddd  %xmm12,%xmm0
   0x00000000004030d1 <+81>:    movd   %ebx,%xmm8
   0x00000000004030d6 <+86>:    pshufd $0x0,%xmm8,%xmm8
   0x00000000004030dc <+92>:    movlhps %xmm8,%xmm8     # xmm8 now holds %ebx in every 32-bit lane
   0x00000000004030e0 <+96>:    movdqa %xmm8,%xmm1
   0x00000000004030e5 <+101>:   callq  0x402610 <__svml_idiv4@plt>   # the problematic call
   0x00000000004030ea <+106>:   movsd  0x10(%r14),%xmm5
   0x00000000004030f0 <+112>:   paddd  %xmm12,%xmm0
   0x00000000004030f5 <+117>:   movsd  0x10(%r13),%xmm3
   0x00000000004030fb <+123>:   movaps %xmm8,%xmm2     # this expects that xmm8 still has the value set above
   0x00000000004030ff <+127>:   pmuludq %xmm0,%xmm2
   0x0000000000403103 <+131>:   subsd  %xmm5,%xmm3
   0x0000000000403107 <+135>:   divsd  %xmm11,%xmm3
   0x000000000040310c <+140>:   cvttsd2si %xmm3,%eax
   0x0000000000403110 <+144>:   inc    %eax
   0x0000000000403112 <+146>:   psrlq  $0x20,%xmm8
   0x0000000000403118 <+152>:   cltd
   0x0000000000403119 <+153>:   idiv   %ebx
   0x000000000040311b <+155>:   psrlq  $0x20,%xmm0
   0x0000000000403120 <+160>:   pxor   %xmm4,%xmm4
   0x0000000000403124 <+164>:   pmuludq %xmm0,%xmm8
   0x0000000000403129 <+169>:   inc    %eax
   0x000000000040312b <+171>:   imul   %eax,%ebx
   0x000000000040312e <+174>:   pand   0x8173a(%rip),%xmm2        # 0x484870
   0x0000000000403136 <+182>:   psllq  $0x20,%xmm8
   0x000000000040313c <+188>:   por    %xmm8,%xmm2
   0x0000000000403141 <+193>:   cvtdq2pd %xmm2,%xmm1
   0x0000000000403145 <+197>:   cvtsi2sd %ebx,%xmm4
   0x0000000000403149 <+201>:   mulpd  %xmm1,%xmm9
   0x000000000040314e <+206>:   mulsd  %xmm4,%xmm11
   0x0000000000403153 <+211>:   addpd  %xmm9,%xmm10
   0x0000000000403158 <+216>:   addsd  %xmm11,%xmm5
   0x000000000040315d <+221>:   movups %xmm10,0x0(%r13)
   0x0000000000403162 <+226>:   movsd  %xmm5,0x10(%r13)
   0x0000000000403168 <+232>:   pop    %rbx
   0x0000000000403169 <+233>:   pop    %r14
   0x000000000040316b <+235>:   pop    %r13
   0x000000000040316d <+237>:   retq
   0x000000000040316e <+238>:   xchg   %ax,%ax
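The instructions from +123 through +188 appear to compute lnew = lbig*ldiv for k = 1 and 2, and they are where the clobbered xmm8 enters the result. SSE2 has no packed 32-bit low multiply (pmulld arrived with SSE4.1), so the listing seems to use the usual workaround: pmuludq on the even lanes, psrlq $32 to bring the odd lanes down, a second pmuludq, then pand/psllq/por to reassemble. The C emulation below sketches that idiom, assuming the constant at 0x484870 is a mask keeping the low 32 bits of each 64-bit half; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates SSE2 PMULUDQ: unsigned 32x32->64 multiply of lanes 0 and 2. */
static void emu_pmuludq(const uint32_t a[4], const uint32_t b[4], uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Packed 32-bit multiply (low half of each product) built from two
 * PMULUDQs, following the shift/mask/or sequence in the listing. */
static void emu_mullo_epi32(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    uint64_t even[2], odd[2];
    /* psrlq $32 on each 64-bit half moves lanes 1 and 3 down to 0 and 2 */
    const uint32_t a_odd[4] = { a[1], 0, a[3], 0 };
    const uint32_t b_odd[4] = { b[1], 0, b[3], 0 };

    emu_pmuludq(a, b, even);
    emu_pmuludq(a_odd, b_odd, odd);

    /* pand keeps the low 32 bits of the even products;
     * psllq $32 + por drops the odd products into lanes 1 and 3 */
    r[0] = (uint32_t)even[0];
    r[1] = (uint32_t)odd[0];
    r[2] = (uint32_t)even[1];
    r[3] = (uint32_t)odd[1];
}

/* Checks the emulation against plain wrapping 32-bit multiplication. */
static void check_against_scalar(const uint32_t a[4], const uint32_t b[4])
{
    uint32_t r[4];
    emu_mullo_epi32(a, b, r);
    for (int i = 0; i < 4; i++)
        assert(r[i] == (uint32_t)(a[i] * b[i]));
}
```

Because the low 32 bits of a product are the same for signed and unsigned two's-complement operands, the unsigned pmuludq trick also yields the signed integer product lbig*ldiv, provided both inputs are intact. If the copy of ldiv in xmm8 has been overwritten, every lane of lnew is wrong.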

Follow-Ups:
- Re: _dl_runtime_resolve_avx_slow clobbering xmm8
  - From: Florian Weimer
