This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/3] Update s_sincosf.c and x86-64 s_sincosf-fma.c


On Mon, Dec 3, 2018 at 10:44 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
> > H.J. Lu wrote:
> > Only sincosf_poly is vectorized.  Without changing the existing structure,
> > I need to duplicate everything in sysdeps/ieee754/flt-32/s_sincosf.h.
>
> The only shared function which relies on the structure is reduce_fast.
> To avoid any duplication it could just get the 2 values it needs as arguments.
>
> > My x86-64 vector version has x86-64 specific intrinsics:
> >
> >  __v2df vps1c2 = (__v2df) _mm_loadu_pd (&p->s1c2.s1);
> >  __v2df vps2c3 = (__v2df) _mm_loadu_pd (&p->s2c3.s2);
> >  __v2df vps3c4 = (__v2df) _mm_loadu_pd (&p->s3c4.s3);
>
> Simple vector loads are supported, you don't need any intrinsics for these.

I tried to replace

  struct
    {
      double s1;
      double c2;
    } s1c2;

with __v2df.
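
For illustration, here is a minimal sketch of a coefficient pair loaded as a two-lane vector without intrinsics, using only the GCC/Clang vector extension. The names v2df_t, struct pair, load_pair and mul_add are hypothetical and are not taken from the patch. The struct layout is assumed to be two adjacent doubles, as in the snippet above.

```c
#include <assert.h>
#include <string.h>

/* Two-lane double vector from the GCC/Clang vector extension.  */
typedef double v2df_t __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for one coefficient pair such as s1c2.  */
struct pair
{
  double s1;
  double c2;
};

/* Load both doubles as one vector.  memcpy keeps the access free of
   alignment and aliasing assumptions; compilers normally lower it to
   a single unaligned vector load.  */
static inline v2df_t
load_pair (const struct pair *p)
{
  v2df_t v;
  assert (p != NULL);
  memcpy (&v, p, sizeof v);
  return v;
}

/* Lane-wise a * b + c written with ordinary operators.  */
static inline v2df_t
mul_add (v2df_t a, v2df_t b, v2df_t c)
{
  return a * b + c;
}
```

Declaring the field itself as a vector type, as attempted above, removes the copy entirely but changes the layout seen by the scalar code that shares the table.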

> >  __v4sf v4sf = _mm_cvtpd_ps (vsincos);
>
> This works fine too for me using v4sf = {(float)vsincos[0], (float)vsincos[1]};
>

I got

        vcvtsd2ss       %xmm1, %xmm2, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm1
        vmovss  %xmm0, (%rdi)
        vcvtsd2ss       %xmm1, %xmm2, %xmm2
        vmovss  %xmm2, (%rsi)
        ret

instead of

        vcvtpd2psx      %xmm2, %xmm2
        vmovss  %xmm2, (%rdi)
        vextractps      $1, %xmm2, (%rsi)
        ret

I prefer

 __v4sf v4sf = _mm_cvtpd_ps (vsincos);
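
As a sketch of how both forms could coexist, the code below keeps the element-wise conversion as a generic fallback and uses _mm_cvtpd_ps only when SSE2 is available. The names v2df_t, v4sf_t, narrow_generic, narrow_sse2 and store_sincos are hypothetical, and the code each form compiles to depends on the compiler and flags, so the listings above should be taken as what was observed in this thread rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
# include <emmintrin.h>
#endif

typedef double v2df_t __attribute__ ((vector_size (16)));
typedef float v4sf_t __attribute__ ((vector_size (16)));

/* Generic form: convert each lane separately, zero the upper lanes.  */
static inline v4sf_t
narrow_generic (v2df_t v)
{
  v4sf_t r = { (float) v[0], (float) v[1], 0.0f, 0.0f };
  return r;
}

#ifdef __SSE2__
/* x86 form: one packed double-to-float conversion.  The upper two
   lanes of the result are zero.  */
static inline v4sf_t
narrow_sse2 (v2df_t v)
{
  return (v4sf_t) _mm_cvtpd_ps ((__m128d) v);
}
#endif

/* Store lane 0 to *sinp and lane 1 to *cosp.  */
static inline void
store_sincos (v2df_t vsincos, float *sinp, float *cosp)
{
  assert (sinp != NULL && cosp != NULL);
#ifdef __SSE2__
  v4sf_t r = narrow_sse2 (vsincos);
#else
  v4sf_t r = narrow_generic (vsincos);
#endif
  *sinp = r[0];
  *cosp = r[1];
}
```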

-- 
H.J.

