This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] How to add vector math functions to Glibc


Hi all,

just my 2 cents, from someone who has already written a couple of libm functions in his life:

Joseph S. Myers wrote on 30/09/2014 18:35:

+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+#  undef __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_SSE4
+#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
+#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")

I think there should be a comment pointing to the ABI/API documentation
that says what function versions this pragma defines to be available and
guaranteeing that it will not be redefined to e.g. say that AVX512 is
available so that existing headers will work with future compilers (but
another pragma will be needed if in future AVX512 versions are added).
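
As a rough illustration of what such a comment would have to pin down, here is a minimal C sketch that builds the vector symbol names, following my understanding of the x86_64 vector function ABI: the prefix _ZGV, an ISA letter (b = SSE, c = AVX, d = AVX2, e = AVX512), N for an unmasked ("notinbranch") variant, the lane count, one v per vector argument, and the scalar name. The helper sketch_vector_name is hypothetical and not part of the patch; which of these variants a given compiler actually expects for the pragma is exactly what the requested comment should state.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: format the name of an
   unmasked vector variant taking one vector argument.
   isa:    'b' (SSE), 'c' (AVX), 'd' (AVX2), 'e' (AVX512)
   lanes:  number of elements processed per call
   scalar: name of the scalar function, e.g. "cos"
   Returns the value returned by snprintf.  */
int sketch_vector_name(char *buf, size_t len, char isa, int lanes,
                       const char *scalar)
{
    return snprintf(buf, len, "_ZGV%cN%dv_%s", isa, lanes, scalar);
}
```

With isa 'd', 4 lanes and "cos", this yields _ZGVdN4v_cos, the entry point used in the quoted patch.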


Yeah, the ABI/API is not quite self-documenting with functions declared as follows:

Andrew Senkevich wrote on 30/09/2014 17:00:
+#include <sysdep.h>
+
+ .text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ *    ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
+ *
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+

Of course, there are comments in the code about how the algorithm works, but the code is mainly assembly with magic numbers everywhere.

Frankly speaking, I have trouble seeing the difference between that code and a binary blob. Yes, this last remark is polemical.
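
For comparison, the algorithm from the quoted comment fits in a few lines of scalar C. This is only a sketch under my own assumptions: the name sketch_cos is hypothetical, the polynomial is a plain Taylor expansion rather than the coefficients stored in the patch's data table, and the naive range reduction is only reasonable for moderately sized arguments.

```c
/* Sketch of: cos(x) = sin(x + Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R).
   Not the patch's implementation; Taylor coefficients are an assumption. */
double sketch_cos(double x)
{
    const double pi = 3.14159265358979323846;

    /* N = nearest integer to (x + Pi/2) / Pi (ties away from zero). */
    double t = (x + 0.5 * pi) / pi;
    long n = (long)(t + (t >= 0.0 ? 0.5 : -0.5));

    /* R = x + Pi/2 - N*Pi, written as x - (N - 1/2)*Pi; |R| <= Pi/2. */
    double r = x - ((double)n - 0.5) * pi;
    double r2 = r * r;

    /* sin(R) ~= R * (1 - R^2/3! + R^4/5! - ... + R^12/13!), Horner form. */
    double p = 1.0 / 6227020800.0;      /* +1/13! */
    p = p * r2 - 1.0 / 39916800.0;      /* -1/11! */
    p = p * r2 + 1.0 / 362880.0;        /* +1/9!  */
    p = p * r2 - 1.0 / 5040.0;          /* -1/7!  */
    p = p * r2 + 1.0 / 120.0;           /* +1/5!  */
    p = p * r2 - 1.0 / 6.0;             /* -1/3!  */
    p = p * r2 + 1.0;
    double s = r * p;

    /* Apply the sign (-1)^N. */
    return (n % 2 != 0) ? -s : s;
}
```

Every constant here is readable and its origin is obvious, which is the property I am missing in the assembly version.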


+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
+#  undef __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_SSE4
+#  define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
+#  define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
+  nomask)))

To be namespace-clean, you have to use reserved-namespace versions of
attributes.  That is, __vector__, __nomask__, __processor__ and
__core_i7_sse4_2__.
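
A small sketch of why this matters, using the standard GCC attribute pure instead of the Cilk Plus vector attribute (an assumption made only so that the example compiles with current GCC and Clang; sketch_abs is a hypothetical function). A user program is entitled to define any non-reserved identifier as a macro before including a header, and only the double-underscore spelling survives that.

```c
/* A user program may legitimately define this non-reserved name. */
#define pure 1

/* Header-style declaration.  Written as __attribute__((pure)) it would
   expand to __attribute__((1)) and fail to compile; the reserved
   spelling __pure__ is not affected by the user's macro.  */
extern int sketch_abs(int x) __attribute__((__pure__));

int sketch_abs(int x)
{
    return x < 0 ? -x : x;
}
```

The same reasoning applies to vector, nomask and processor in the quoted macros.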

+ .align 64
+ .globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+ .long 4294967295

What are the semantics of the values in this table (please add a comment)?
How was this table generated?


Yeah, who encodes floating-point values as decimal integers in (little-endian?) memory order? I would understand hexadecimal, but decimal?

As is, the code is unmaintainable.
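
To make the point concrete, here is a minimal C sketch of what a reader has to do by hand to recover a double from two such .long entries. It assumes that each double is stored as two 32-bit words with the low word first, which is my guess and not something the patch documents; the helper sketch_words_to_double is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reassemble an IEEE 754 double from two 32-bit
   words, assuming the low word comes first in the table.  */
double sketch_words_to_double(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no conversion */
    return d;
}
```

For example, the pair 0, 1072693248 (0x00000000, 0x3FF00000) decodes to 1.0; that pair is my own illustration and is not taken from the table. The quoted 4294967295 is simply 0xFFFFFFFF, so it looks like one half of a bit mask rather than a number, but nothing in the source says so.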

+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,1600

.size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data

seems better than hardcoding another magic number for the size here.


So, in conclusion: is there any technical reason why a compiler could not produce vectorized libm functions suitable for GCC/Cilk integration?
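
What I have in mind is something along the following lines: a branch-free kernel written in C and annotated with the same OpenMP pragma that the header uses, so that the compiler emits the vector variants itself. This is a sketch only; the names sketch_poly_sin and sketch_apply are hypothetical, the short polynomial is a placeholder, and whether the generated code reaches the performance of the hand-written assembly is precisely the open question. Without OpenMP SIMD support enabled (for GCC, I believe -fopenmp-simd), the pragmas are ignored and the code stays scalar.

```c
#include <stddef.h>

/* Ask the compiler to also generate unmasked vector variants. */
#pragma omp declare simd notinbranch
double sketch_poly_sin(double r)
{
    /* Placeholder polynomial: r - r^3/6 + r^5/120, valid for small |r|. */
    double r2 = r * r;
    return r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0)));
}

/* A caller loop that the compiler may vectorize using those variants. */
void sketch_apply(double *out, const double *in, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = sketch_poly_sin(in[i]);
}
```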

Best Regards,

Christoph Lauter


--
Christoph Lauter
Maître de conférences - Associate Professor
Équipe PEQUAN - LIP6 - UPMC Paris 6
4, place Jussieu, 75252 Paris Cedex 05, 26-00/301
Tel.: +33144278029 / +33182521777
http://www.christoph-lauter.org/

