This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 1/3] Update s_sincosf.c and x86-64 s_sincosf-fma.c
On Mon, Dec 3, 2018 at 10:44 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
> > H.J. Lu wrote:
> > Only sincosf_poly is vectorized. Without changing the existing
> > structure, I need to duplicate everything in
> > sysdeps/ieee754/flt-32/s_sincosf.h.
>
> The only shared function which relies on the structure is reduce_fast.
> To avoid any duplication it could just get the 2 values it needs as arguments.
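A minimal sketch of the suggestion above. It assumes reduce_fast needs only two constants: an inverse of pi/2 pre-scaled by 2^24 (called hpi_inv here) and pi/2 itself (hpi). The name reduce_fast_args and both parameter names are hypothetical. This is not the glibc code. It only illustrates how passing the two values explicitly would let a vector variant reuse the reduction without sharing the scalar table layout.

```c
#include <stdint.h>

/* Hypothetical variant of reduce_fast: instead of reading its constants
   from a shared table structure, it receives them as arguments.
   Assumed meaning of the arguments:
     hpi_inv = (2/pi) * 2^24   (inverse of pi/2, pre-scaled for rounding)
     hpi     = pi/2
   Returns x - n*pi/2 and stores the quadrant count n in *np.
   Only intended for moderate |x| so that x * hpi_inv fits in int32_t.  */
static inline double
reduce_fast_args (double x, double hpi_inv, double hpi, int *np)
{
  double r = x * hpi_inv;
  /* Truncate, add 2^23, shift right by 24: rounds r / 2^24 to an integer
     (assumes an arithmetic right shift for negative values).  */
  int n = ((int32_t) r + 0x800000) >> 24;
  *np = n;
  return x - n * hpi;
}
```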
>
> > My x86-64 vector version has x86-64 specific intrinsics:
> >
> > __v2df vps1c2 = (__v2df) _mm_loadu_pd (&p->s1c2.s1);
> > __v2df vps2c3 = (__v2df) _mm_loadu_pd (&p->s2c3.s2);
> > __v2df vps3c4 = (__v2df) _mm_loadu_pd (&p->s3c4.s3);
>
> Simple vector loads are supported, you don't need any intrinsics for these.
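The following rough illustration of the point above assumes GCC-style vector extensions. A pair of adjacent doubles can be loaded into a 16-byte vector with plain C (memcpy or a compound literal), leaving instruction selection to the compiler. The names v2df, coeff_pair, load_pair_memcpy and load_pair_elems are made up for this sketch.

```c
#include <string.h>

/* GCC/Clang vector extension: two doubles in one 16-byte vector.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical pair layout, mirroring a struct such as s1c2.  */
struct coeff_pair
{
  double s1;
  double c2;
};

_Static_assert (sizeof (struct coeff_pair) == sizeof (v2df),
                "pair must be exactly one vector wide");

/* Load both members with an unaligned-safe copy; no intrinsics needed.  */
static inline v2df
load_pair_memcpy (const struct coeff_pair *p)
{
  v2df v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Equivalent element-wise form; the compiler is free to merge the two
   scalar loads into a single vector load.  */
static inline v2df
load_pair_elems (const struct coeff_pair *p)
{
  return (v2df) { p->s1, p->c2 };
}
```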
I am trying to replace

  struct
  {
    double s1;
    double c2;
  } s1c2;

with __v2df.
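Here is a sketch of the layout change described above. It assumes each coefficient pair is stored directly as a two-lane vector, so the load becomes a plain member access and generic vector arithmetic evaluates the sin-side and cos-side terms lane by lane. The table type sincos_vec_table and the helper pair_mul_add are hypothetical. They do not reproduce glibc's actual polynomial.

```c
/* Two doubles per vector (GCC/Clang vector extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical coefficient table: each pair that used to be a two-double
   struct is now one vector, lane 0 for the sin term, lane 1 for cos.  */
typedef struct
{
  v2df s1c2;   /* { s1, c2 } */
  v2df s2c3;   /* { s2, c3 } */
  v2df s3c4;   /* { s3, c4 } */
} sincos_vec_table;

/* One lane-wise multiply-add step, acc * x + coeff, applied to both lanes
   at once.  With FMA enabled the compiler may contract it into vfmadd.  */
static inline v2df
pair_mul_add (v2df acc, v2df x, v2df coeff)
{
  return acc * x + coeff;
}
```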
> > __v4sf v4sf = _mm_cvtpd_ps (vsincos);
>
> This works fine too for me using v4sf = {(float)vsincos[0], (float)vsincos[1]};
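The alternative above can be written out as a self-contained function. This sketch assumes that lane 0 holds the sine and lane 1 the cosine, which matches the assembly in this message storing the low lane through the first pointer. The function name store_sincos_generic is hypothetical. Whether this form yields one packed conversion or two scalar ones is up to the compiler.

```c
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Portable narrowing with vector extensions only: convert each double lane
   to float explicitly; the two unnamed upper lanes are zero-initialized.  */
static inline void
store_sincos_generic (v2df vsincos, float *sinp, float *cosp)
{
  v4sf v4 = { (float) vsincos[0], (float) vsincos[1] };
  *sinp = v4[0];
  *cosp = v4[1];
}
```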
>
With that, I got:

  vcvtsd2ss %xmm1, %xmm2, %xmm0
  vunpckhpd %xmm1, %xmm1, %xmm1
  vmovss %xmm0, (%rdi)
  vcvtsd2ss %xmm1, %xmm2, %xmm2
  vmovss %xmm2, (%rsi)
  ret

instead of:

  vcvtpd2psx %xmm2, %xmm2
  vmovss %xmm2, (%rdi)
  vextractps $1, %xmm2, (%rsi)
  ret
I prefer to keep the intrinsic:

  __v4sf v4sf = _mm_cvtpd_ps (vsincos);
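For comparison, here is a sketch of the intrinsic form using only SSE2 intrinsics from <emmintrin.h>. _mm_cvtpd_ps converts both double lanes with one packed conversion, and _mm_store_ss writes the low float. Extracting the second lane through a shuffle is this sketch's own choice. With SSE4.1 or AVX enabled, a compiler may fold it into a single vextractps store like the one in the listing above. The function name store_sincos_sse2 is hypothetical, and the code is x86-only.

```c
#include <emmintrin.h>

/* x86-only sketch: lane 0 of vsincos is assumed to hold sin, lane 1 cos.  */
static inline void
store_sincos_sse2 (__m128d vsincos, float *sinp, float *cosp)
{
  /* One packed conversion: { (float) lane0, (float) lane1, 0, 0 }.  */
  __m128 v = _mm_cvtpd_ps (vsincos);
  _mm_store_ss (sinp, v);
  /* Move lane 1 into lane 0, then store it as a scalar float.  */
  _mm_store_ss (cosp, _mm_shuffle_ps (v, v, _MM_SHUFFLE (1, 1, 1, 1)));
}
```

The generic form stays portable across targets. The intrinsic pins the packed conversion down on x86-64 instead of relying on the compiler to recognize the pattern.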
--
H.J.