This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] How to add vector math functions to Glibc
- From: Christoph Lauter <christoph dot lauter at lip6 dot fr>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Andrew Senkevich <andrew dot n dot senkevich at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Tue, 30 Sep 2014 20:40:46 +0200
- Subject: Re: [RFC] How to add vector math functions to Glibc
Hi all,
Just my 2 cents, from someone who has already written a couple of libm
functions in his life:
Joseph S. Myers wrote on 30/09/2014 18:35:
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+# undef __DECL_SIMD_AVX2
+# undef __DECL_SIMD_SSE4
+# define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
+# define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
I think there should be a comment pointing to the ABI/API documentation
that says what function versions this pragma defines to be available and
guaranteeing that it will not be redefined to e.g. say that AVX512 is
available so that existing headers will work with future compilers (but
another pragma will be needed if in future AVX512 versions are added).
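For readers who have not used the pragma, here is a minimal sketch of the mechanism under discussion. It mirrors the `_OPENMP >= 201307` guard from the quoted hunk, but the macro names `SKETCH_DECL_SIMD` and `SKETCH_SIMD_LOOP` and the function `toy_kernel` are made up for illustration and are not glibc identifiers. Which vector variants the compiler then assumes to exist is exactly the ABI question raised above; the sketch does not answer it.

```c
#include <stddef.h>

/* Hypothetical macros, modelled on the guard in the quoted patch. */
#if defined _OPENMP && _OPENMP >= 201307
# define SKETCH_DECL_SIMD _Pragma("omp declare simd notinbranch")
# define SKETCH_SIMD_LOOP _Pragma("omp simd")
#else
# define SKETCH_DECL_SIMD
# define SKETCH_SIMD_LOOP
#endif

/* Header-style declaration: tells the compiler that vector variants of
   this function may be called from vectorized loops. */
SKETCH_DECL_SIMD
double toy_kernel(double x);

/* Toy body (not a real cosine), only here so the example links. */
SKETCH_DECL_SIMD
double toy_kernel(double x)
{
    return 1.0 - 0.5 * x * x;
}

/* Caller: with OpenMP enabled, the loop may be vectorized and the call
   replaced by a call to a vector variant of toy_kernel. */
void apply_all(double *out, const double *in, size_t n)
{
    SKETCH_SIMD_LOOP
    for (size_t i = 0; i < n; i++)
        out[i] = toy_kernel(in[i]);
}
```

Without OpenMP the macros expand to nothing and the code is plain scalar C, which is why a header using such macros stays usable with any compiler.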
Yeah, the ABI/API is not quite self-documenting with functions declared
as follows:
Andrew Senkevich wrote on 30/09/2014 17:00:
+#include <sysdep.h>
+
+ .text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ * ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
+ *
+ * Argument representation:
+ * arg + Pi/2 = (N*Pi + R)
+ *
+ * Result calculation:
+ * cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ * sin(R) is approximated by corresponding polynomial
+ */
+ pushq %rbp
+ movq %rsp, %rbp
+ andq $-64, %rsp
+ subq $448, %rsp
+ movq __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+ vmovapd %ymm0, %ymm1
+ vmovupd 192(%rax), %ymm4
+ vmovupd 256(%rax), %ymm5
+
Of course, there are comments in the code about how the algorithm works,
but the code is mainly assembly with lots of magic numbers everywhere.
Frankly speaking, I have trouble seeing the difference between that code
and a binary blob. Yes, this last remark is polemical.
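To make the point concrete, the algorithm described in the quoted comment fits in a few lines of scalar C. This is only a sketch under stated assumptions: the name `cos_sketch` is made up, the polynomial is a plain Taylor expansion of sin(R) rather than whatever coefficients the patch's table holds, and the argument reduction is naive, so it is only meant for moderate arguments and makes no claim about the < 4ulp bound.

```c
/* Assumed constant; the patch presumably stores Pi split into parts. */
static const double SKETCH_PI = 3.14159265358979323846;

/* cos(x) = sin(x + Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R).
   Only valid while (x + Pi/2) / Pi fits comfortably in a long. */
double cos_sketch(double x)
{
    /* N = nearest integer to (x + Pi/2) / Pi */
    double t = (x + SKETCH_PI / 2) / SKETCH_PI;
    long n = (long)(t >= 0 ? t + 0.5 : t - 0.5);

    /* R = x + Pi/2 - N*Pi, so |R| <= Pi/2 (up to rounding) */
    double r = x - ((double)n - 0.5) * SKETCH_PI;
    double r2 = r * r;

    /* Degree-13 Taylor polynomial of sin(R), evaluated by Horner's rule */
    double s = r * (1.0 + r2 * (-1.0 / 6
                 + r2 * (1.0 / 120
                 + r2 * (-1.0 / 5040
                 + r2 * (1.0 / 362880
                 + r2 * (-1.0 / 39916800
                 + r2 * (1.0 / 6227020800.0)))))));

    /* Multiply by (-1)^N */
    return (n % 2 != 0) ? -s : s;
}
```

Every constant here can be read and checked by eye, which is the property the assembly version with its decimal table lacks.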
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
+# undef __DECL_SIMD_AVX2
+# undef __DECL_SIMD_SSE4
+# define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
+# define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
+ nomask)))
To be namespace-clean, you have to use reserved-namespace versions of
attributes. That is, __vector__, __nomask__, __processor__ and
__core_i7_sse4_2__.
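A small sketch of why the reserved spellings matter. It uses GCC's `noinline` attribute as a stand-in, since the Cilk Plus `vector` attribute is not available in most compilers; the macro value and the function `add_one` are made up for illustration. The mechanism is the same: a program may define any non-reserved identifier as a macro before including a system header.

```c
/* Perfectly legal user code: "noinline" is not a reserved identifier. */
#define noinline 42

/* A header that wrote __attribute__((noinline)) would now expand to
   __attribute__((42)) and fail to compile.  The double-underscore
   spelling is in the implementation's namespace, so the user macro
   cannot touch it. */
int add_one(int x) __attribute__((__noinline__));

int add_one(int x)
{
    return x + 1;
}
```

The same reasoning applies to every word inside the attribute, which is why each of `vector`, `nomask`, `processor` and the processor name needs its reserved form.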
+ .align 64
+ .globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+ .long 4294967295
What are the semantics of the values in this table (please add a comment)?
How was this table generated?
Yeah, who writes floating-point values as decimal integers giving their
(little-endian?) in-memory representation? I would understand hexadecimal,
but decimal? As is, the code is unmaintainable.
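To illustrate what a reader has to do to audit such a table, here is a hedged sketch that reassembles a double from two 32-bit words. It assumes IEEE 754 binary64 and that each double is stored as two `.long` entries with the low word first; the patch does not state this, and the function name `double_from_words` is made up.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from two 32-bit table entries.
   Assumption: low word first, IEEE 754 binary64. */
double double_from_words(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);  /* well-defined type punning */
    return d;
}
```

For example, the decimal pair 0 and 1072693248 turns out to be 0x00000000 and 0x3FF00000, that is, 1.0. The quoted entry 4294967295 is 0xFFFFFFFF, which looks more like half of a bit mask than part of a number, but nothing in the decimal notation makes that visible.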
+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,1600
.size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data
seems better than hardcoding another magic number for the size here.
Yeah, so in conclusion: is there any technical reason why a compiler
couldn't produce vectorized libm functions suitable for the purpose of
gcc/cilk integration?
Best Regards,
Christoph Lauter
--
Christoph Lauter
Maître de conférences - Associate Professor
Équipe PEQUAN - LIP6 - UPMC Paris 6
4, place Jussieu, 75252 Paris Cedex 05, 26-00/301
Tel.: +33144278029 / +33182521777
http://www.christoph-lauter.org/