Attachment 'VectorABI.txt'


Vector Function Application Binary Interface Specification for OpenMP

1. Vector Function ABI Overview

The Vector Function ABI defines the ABI for vector functions generated by a compiler that supports the SIMD constructs of OpenMP 4.0 [1].
Applying a SIMD construct to a function declaration or definition enables the creation of vector versions of the scalar function. A vector version processes multiple instances of the function concurrently in a single invocation in a vector context (e.g. vectorized loops).
The Vector Function ABI defines a set of rules that the caller and the callee functions must obey.
These rules cover:
    * Calling convention
    * Vector length (the number of concurrent scalar invocations to be processed per invocation of the vector function)
    * Mapping from element data types to vector data types
    * Ordering of vector arguments
    * Vector function masking
    * Vector function name mangling
    * Compiler generated variants of vector function

The Vector Function ABI makes it possible to determine the exact list of vector function implementations that a library provides, based on the OpenMP pragma found in the function's prototype in the library headers.

2. Vector Function ABI

2.1. Calling Convention

Vector functions should use the calling convention described in section 3.2, Function Calling Sequence, of the original AMD64 ABI document.

2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN).
If the OpenMP clause "simdlen" is used, the VLEN is the value of the argument of that clause. The VLEN value must be a power of 2.
Otherwise, the function's "characteristic data type" (CDT) is used to compute the vector length.
The CDT is determined by applying the following rules in order:
    a) For a non-void function, the CDT is the return type.
    b) Otherwise, if the function has any non-uniform, non-linear parameters, the CDT is the type of the first such parameter.
    c) If the CDT determined by a) or b) above is a struct, union, or class type that is passed by value (except for a type that maps to the built-in complex data type), the CDT is int.
    d) If none of the above three cases is applicable, the CDT is int.
    e) For Intel(R) Xeon(TM) Phi(TM) native and offload compilation, if the resulting CDT is an 8-bit or 16-bit integer data type, the CDT is int.
The VLEN is then determined from the CDT and the size of the vector register of the ISA for which the current vector version is generated. The VLEN is computed using the formula below:

VLEN = sizeof(vector_register) / sizeof(CDT),

where the vector register size is specified in section 3.2.1, Registers and the Stack Frame, of the original AMD64 ABI document.

For example, if the ISA is SSE and the CDT of the function is "int", the VLEN is 4.

In the rest of this document, "SSE ISA" refers to any of the SSE2/SSE3/SSSE3/SSE4 ISAs.
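The VLEN formula can be sketched in C as follows. This is a minimal illustration and not part of the ABI: the helper name default_vlen and the enum constants are hypothetical, and the register widths (16, 32, and 64 bytes) are the XMM, YMM, and ZMM register sizes of the respective ISAs.

```c
#include <stddef.h>

/* Vector register widths in bytes (illustrative constants, not ABI names). */
enum isa_reg_bytes {
    REG_BYTES_SSE    = 16,  /* 128-bit XMM registers */
    REG_BYTES_AVX    = 32,  /* 256-bit YMM registers */
    REG_BYTES_AVX2   = 32,  /* 256-bit YMM registers */
    REG_BYTES_AVX512 = 64   /* 512-bit ZMM registers */
};

/* Hypothetical helper: VLEN when no simdlen clause is given.
   Implements VLEN = sizeof(vector_register) / sizeof(CDT).
   Returns 0 if cdt_bytes is 0 or does not divide the register width. */
static size_t default_vlen(size_t reg_bytes, size_t cdt_bytes)
{
    if (cdt_bytes == 0 || reg_bytes % cdt_bytes != 0)
        return 0;
    return reg_bytes / cdt_bytes;
}
```

For instance, default_vlen(REG_BYTES_SSE, sizeof(int)) yields 4 on a platform with a 4-byte int, which matches the SSE example above.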

2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on the ISA, the vector length, the data type of the original parameter, and the parameter specification.
For uniform and linear parameters (a detailed description can be found in [1]), the original data type is preserved.
For vector parameters, vector data types are selected by the compiler. The mapping from element data type to vector data type is described below.
* The bit size of the vector data type of a parameter is computed as:
size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8
For instance, for the SSE version of a vector function with parameter data type "int":
If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which means one argument of type __m128i is passed.
* If size_of_vector_data_type is greater than the width of the vector register, the parameter is passed in multiple vector registers.
For instance, for the SSE version of a vector function with parameter data type "int": If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the vector data type is __m256i, which means 2 arguments of type __m128i are passed.
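The two rules of this section can be expressed as a short C sketch. The helper names are hypothetical. Rounding up to a whole register when the vector data type is narrower than one register is an assumption made here; the text above only covers the equal and greater cases.

```c
#include <stddef.h>

/* Hypothetical helper: bit size of the vector data type of one parameter,
   i.e. VLEN * sizeof(original_parameter_data_type) * 8. */
static size_t vector_type_bits(size_t vlen, size_t elem_bytes)
{
    return vlen * elem_bytes * 8;
}

/* Hypothetical helper: number of vector registers (and therefore vector
   arguments) needed to pass one vector parameter.
   Assumption: a partially filled register still counts as one register. */
static size_t regs_per_param(size_t vlen, size_t elem_bytes, size_t reg_bits)
{
    size_t bits = vector_type_bits(vlen, elem_bytes);
    return (bits + reg_bits - 1) / reg_bits;
}
```

With a 4-byte int and 128-bit SSE registers, regs_per_param(4, 4, 128) gives 1 and regs_per_param(8, 4, 128) gives 2, as in the two instances above.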

2.4. Ordering of Vector Arguments

When each parameter of the original function results in one argument in the vector function, the ordering rule is a simple one-to-one match with the original argument order.
For example, when the original argument list is (int a, float b, int c), VLEN is 4, the ISA is SSE, and a, b, and c are all classified as vector parameters, the vector function argument list becomes (__m128i vec_a, __m128 vec_b, __m128i vec_c).
There are cases where a single parameter of the original function results in multiple arguments in the vector function. The second and subsequent arguments are inserted in the argument list right after the corresponding first argument, not appended to the end of the argument list of the vector function. For example, when the original argument list is (int a, float b, int c), VLEN is 8, the ISA is SSE, and a, b, and c are all classified as vector parameters, the vector function argument list becomes (__m128i vec_a1, __m128i vec_a2, __m128 vec_b1, __m128 vec_b2, __m128i vec_c1, __m128i vec_c2).
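The ordering rule can be illustrated with a small C sketch that expands scalar parameter names into the vector argument order. The function expand_args and its interface are hypothetical. The vec_ prefix and the numeric suffixes simply follow the naming used in the examples above and carry no ABI meaning.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the expanded vector argument names into out.
   names[i] is a scalar parameter name; parts[i] is the number of vector
   registers that parameter occupies (1 means the parameter is not split).
   Split parts are emitted right after their first part, never at the end.
   Returns the number of characters written, or -1 if out is too small. */
static int expand_args(const char *const *names, const size_t *parts,
                       size_t n, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        for (size_t p = 1; p <= parts[i]; p++) {
            const char *sep = (used == 0) ? "" : ", ";
            int w;
            if (parts[i] == 1)
                w = snprintf(out + used, out_size - used, "%svec_%s",
                             sep, names[i]);
            else
                w = snprintf(out + used, out_size - used, "%svec_%s%zu",
                             sep, names[i], p);
            if (w < 0 || (size_t)w >= out_size - used)
                return -1;  /* output truncated */
            used += (size_t)w;
        }
    }
    return (int)used;
}
```

Called with the names a, b, c and two parts each, the sketch produces the same order as the VLEN 8 example: both parts of a, then both parts of b, then both parts of c.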

2.5. Masking of Vector Function

A masked vector function variant, used for invocation in a conditional statement (please refer to [1] for detailed information), takes additional implicit "mask" parameters, which disable processing of some of the vector lanes.
Each element of the "mask" parameters has the data type of the CDT (see Section 2.2). The number of mask parameters is the same as the number of parameters required to pass a vector of the CDT for the given vector length. Each element of a mask parameter must be a bit pattern of either all ones or all zeros.
For the MIC target, the mask parameters are collections of 1-bit masks stored in unsigned integers. The total number of mask bits is equal to VLEN. The number of mask parameters is equal to the number of parameters required for a vector of the CDT. The mask bits occupy the least significant bits of each unsigned integer. For example, if the CDT is double and VLEN is 16, there are 16 mask bits stored in two unsigned integers.
For each element of the vector, if the corresponding mask value is zero, the return value associated with that element is zero. Mask parameters are passed after all other parameters, in the same order as the parameters that they apply to.
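The two mask layouts can be sketched in C without any intrinsics. Both helper names are hypothetical. The sketch assumes a 32-bit CDT for the element-wise form, treats all ones as "lane enabled", and maps lane i to bit i in the MIC-style form; the text above only states that the bits occupy the least significant positions, so the exact lane-to-bit order is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Element-wise mask for a 32-bit CDT: each lane is all ones (enabled)
   or all zeros (disabled). active[i] != 0 means lane i is enabled. */
static void build_element_mask32(const int *active, size_t vlen,
                                 uint32_t *mask)
{
    for (size_t i = 0; i < vlen; i++)
        mask[i] = active[i] ? UINT32_MAX : 0u;
}

/* MIC-style bit mask for one mask parameter: bit i, counted from the
   least significant bit, is set when lane i is enabled (assumed order).
   lanes must be smaller than the bit width of unsigned. */
static unsigned build_bit_mask(const int *active, size_t lanes)
{
    unsigned m = 0;
    for (size_t i = 0; i < lanes; i++)
        if (active[i])
            m |= 1u << i;
    return m;
}
```

For the lane pattern {1, 0, 1, 1}, the element-wise mask holds all ones in lanes 0, 2, and 3, and the bit mask has bits 0, 2, and 3 set.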

2.6. Vector Function Name Mangling

The name mangling of the generated vector function based on the vector annotation is an important part of the Vector ABI. It allows the caller and the callee functions to be compiled in separate files or compilation units. Using the function prototype in header files to communicate vector function annotation information, the compiler can perform function matching while vectorizing code at call sites.

The vector function name is mangled as the concatenation of the following items:

<vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>

The items are described below:
* <vector_prefix>
    string "_ZGV"

* <original_name>
    name of scalar function, including C++ and Fortran mangling

* <isa>
    'b'    // SSE
    | 'c'  // AVX
    | 'd'  // AVX2
    | 'e'  // AVX512

* <mask>
    'M'    // masked version
    | 'N'  // unmasked version

* <vlen>
    decimal-number

* <vparameters>
    /* empty */
    | <vparameter> <opt-align> <vparameters>
        o <vparameter>
        (please refer to [1] for information about parameter types used below)
            's' decimal-number // linear parameter, variable stride,
                               // decimal number is the position # of
                               // stride argument, which starts from 0
            | 'l' <number>     // linear parameter, constant stride
            | 'u'              // uniform parameter
            | 'v'              // vector parameter
               o <number>
                   [n] non-negative decimal integer  // n indicates negative
        o <opt-align>
            /* empty */
            | 'a' non-negative-decimal-number

Please refer to section 2.7, Compiler generated variants of vector function, for examples of vector function mangling.
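The concatenation rule can be sketched as a single C function. The name mangle_vector_name and its interface are hypothetical. The sketch expects the <vparameters> string to be encoded by the caller already and only assembles the final symbol name.

```c
#include <stdio.h>

/* Hypothetical helper: assembles a vector function name from its parts.
   isa:     'b' (SSE), 'c' (AVX), 'd' (AVX2) or 'e' (AVX512)
   masked:  nonzero selects 'M', zero selects 'N'
   vlen:    vector length
   vparams: already encoded <vparameters> string, e.g. "ua16vl"
   name:    scalar function name, with C++/Fortran mangling if applicable
   Returns 0 on success, -1 if out is too small. */
static int mangle_vector_name(char isa, int masked, unsigned vlen,
                              const char *vparams, const char *name,
                              char *out, size_t out_size)
{
    int w = snprintf(out, out_size, "_ZGV%c%c%u%s_%s",
                     isa, masked ? 'M' : 'N', vlen, vparams, name);
    return (w < 0 || (size_t)w >= out_size) ? -1 : 0;
}
```

For example, mangle_vector_name('b', 0, 4, "ua16vl", "foo", buf, sizeof buf) fills buf with _ZGVbN4ua16vl_foo.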

2.7. Compiler generated variants of vector function

The compiler's architecture selection flag has no impact on ISA selection for the generated vector variants.
The compiler should generate vector variants for the SSE, AVX, AVX2, and AVX512 ISAs, with both masked and unmasked versions for each ISA (unless only one of them is requested with the corresponding clause).
Compiler implementations must not generate calls to versions for other ISAs unless some non-standard pragma or clause is used to declare that those other versions are available.

Example 1.
#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
    q[k] = q[k] + x;
    return q[k];
}

Below is the list of generated function names, or the list of symbols provided by a library with the same pragma in the "foo" prototype.

1) _ZGVbN4ua16vl_foo (SSE ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (SSE ISA, masked version)
3) _ZGVcN8ua16vl_foo (AVX ISA, unmasked version)
4) _ZGVcM8ua16vl_foo (AVX ISA, masked version)
5) _ZGVdN8ua16vl_foo (AVX2 ISA, unmasked version)
6) _ZGVdM8ua16vl_foo (AVX2 ISA, masked version)
7) _ZGVeN16ua16vl_foo (AVX512 ISA, unmasked version)
8) _ZGVeM16ua16vl_foo (AVX512 ISA, masked version)

Where "foo" is the original mangled function name, "_ZGV" is the prefix of the vector function name, "b" indicates the SSE ISA, "c" indicates the AVX ISA, "d" indicates the AVX2 ISA, "e" indicates the AVX512 ISA, "N" indicates that this is an unmasked version, "M" indicates that this is a masked version, "4" is the vector length for the SSE ISA, "8" is the vector length for the AVX and AVX2 ISAs, "16" is the vector length for the AVX512 ISA, "ua16" indicates uniform(q) and aligned(q:16), "v" indicates that the second argument x is a vector argument, and "l" indicates linear(k:1), i.e. k is a linear variable whose stride is 1.
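What the unmasked SSE variant of Example 1 computes can be modeled in plain scalar C. This is only a semantic sketch: the function foo_vlen4_model and its array-based interface are hypothetical and do not follow the actual calling convention, in which x and the result travel in vector registers.

```c
enum { FOO_VLEN = 4 };  /* vector length of the SSE variant */

/* Scalar function from Example 1. */
static float foo(float *q, float x, int k)
{
    q[k] = q[k] + x;
    return q[k];
}

/* Hypothetical scalar model of the unmasked VLEN 4 variant:
   q is uniform (one pointer shared by all lanes),
   x is a vector parameter (one value per lane),
   k is linear with stride 1 (lane i uses k + i). */
static void foo_vlen4_model(float *q, const float x[FOO_VLEN], int k,
                            float result[FOO_VLEN])
{
    for (int lane = 0; lane < FOO_VLEN; lane++)
        result[lane] = foo(q, x[lane], k + lane);
}
```

A single call to the model with k = 2 updates q[2] through q[5], which is the work of four consecutive scalar calls.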

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
    return x*x;
}

Below is the list of generated function names, or the list of symbols provided by a library with the same pragma in the "foo" prototype.

1) _ZGVbN2v_foo (SSE ISA)
2) _ZGVcN4v_foo (AVX ISA)
3) _ZGVdN4v_foo (AVX2 ISA)
4) _ZGVeN8v_foo (AVX512 ISA)

Where "foo" is the original mangled function name, "_ZGV" is the prefix of the vector function name, "b" indicates the SSE ISA, "c" indicates the AVX ISA, "d" indicates the AVX2 ISA, "e" indicates the AVX512 ISA, "N" indicates that this is an unmasked version, "2" is the vector length for the SSE ISA, "4" is the vector length for the AVX and AVX2 ISAs, "8" is the vector length for the AVX512 ISA, and "v" indicates that the single argument x is a vector argument.
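A call site for Example 2 might look like the sketch below. The caller square_all is hypothetical. Whether a compiler actually replaces the scalar call with one of the vector variants depends on the compiler and its options; a compiler without OpenMP support ignores both pragmas and runs the scalar loop.

```c
#include <stddef.h>

/* Scalar function from Example 2. An OpenMP-aware compiler may emit
   vector variants of it in addition to the scalar version. */
#pragma omp declare simd notinbranch
double foo(double x)
{
    return x * x;
}

/* Hypothetical caller: squares n values. The simd construct provides
   the vector context in which a vector variant of foo may be called. */
void square_all(const double *in, double *out, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = foo(in[i]);
}
```

Because the call is unconditional inside the loop, only unmasked variants are needed, which is consistent with the notinbranch clause.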

3. References

[1] OpenMP 4.0 Specification

[2] Intel Vector ABI
