[PATCH PR96053] Add "#pragma GCC no_reduc_chain"

Wed Jul 22 07:02:05 GMT 2020

On Wed, 22 Jul 2020, zhoukaipeng (A) wrote:

> Hi,
> 
> It is the patch to add "#pragma GCC no_reduc_chain" for pr96053.  It 
> only completes the front end of C language.
> 
> For the testcase, it successfully skipped doing slp by finding sequences 
> from reduction chains.  Without "#pragma GCC no_reduc_chain", it will 
> fail to do vectorization.
> 
> Please help to check if there is any problem.  If there is no problem, I 
> will continue to complete the front end of the remaining languages.

First of all, I think giving users more control over vectorization is
good.  As for "#pragma GCC no_reduc_chain", I'd like to avoid
negatives and terms internal to GCC.  I would also like to see
vectorization pragmas grouped somehow, also to avoid bit
explosion in struct loop.  There's already annot_expr_no_vector_kind
and annot_expr_vector_kind, both only used by the Fortran FE at
the moment.  Note ANNOTATE_EXPR already allows an extra argument,
thus only annot_expr_vector_kind should prevail, with its argument
specifying a bitmask of vectorizer hints.  We'd have an extra
enum for those, like

enum annot_vector_subkind {
  annot_vector_never = 0,
  annot_vector_auto = 1, // this is the default
  annot_vector_always = 3,
  your new flag
};

and the user would specify it via

#pragma GCC vect [(never|always|auto)] [your new flag]
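
To make the bitmask idea concrete, here is a minimal sketch of how the
hint argument could be encoded and decoded.  The flag name
annot_vector_no_reduc_chain, its value, the mask and the two helper
functions are all assumptions made for illustration; none of them
exist in GCC or in the patch.

```c
#include <stdbool.h>

/* Hypothetical encoding: the low two bits keep the never/auto/always
   values from the enum above; the new flag takes the next free bit.  */
enum annot_vector_subkind {
  annot_vector_never = 0,
  annot_vector_auto = 1,             /* the default */
  annot_vector_always = 3,
  annot_vector_no_reduc_chain = 4    /* hypothetical name and value */
};

/* Hypothetical mask covering the never/auto/always bits.  */
#define ANNOT_VECTOR_MODE_MASK 3u

/* Extract the never/auto/always part of the hint argument.  */
static inline unsigned
annot_vector_mode (unsigned hints)
{
  return hints & ANNOT_VECTOR_MODE_MASK;
}

/* Return true if the (hypothetical) no-reduction-chain bit is set.  */
static inline bool
annot_vector_skip_reduc_chain (unsigned hints)
{
  return (hints & annot_vector_no_reduc_chain) != 0;
}
```

With such a layout a single ANNOTATE_EXPR argument could carry both
the mode and any extra flags, so no additional bits would be needed
in struct loop.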

Now, I honestly have difficulty suggesting a better name
than no_reduc_chain.  Quoting the testcase:

+double f(double *a, double *b)
+{
+  double res1 = 0;
+  double res0 = 0;
+#pragma GCC no_reduc_chain
+  for (int i = 0 ; i < 1000; i+=4) {
+    res0 += a[i] * b[i];
+    res1 += a[i+1] * b[i+1];
+    res0 += a[i+2] * b[i+2];
+    res1 += a[i+3] * b[i+3];
+  }
+  return res0 + res1;
+}

For your case, with IIRC V2DF vectors, using reduction chains will
result in a vectorization factor of two, while with an SLP reduction the
vectorization factor is one.
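
As a rough illustration of the SLP-reduction case, the sketch below
emulates a two-lane (V2DF-like) accumulator in plain C, with lane 0
playing res0 and lane 1 playing res1.  One emulated vector iteration
covers exactly one scalar iteration, which is what a vectorization
factor of one means here.  The function names and the parameter n
(assumed to be a multiple of 4) are mine, and this is not what the
vectorizer literally emits.

```c
#include <stddef.h>

/* Scalar reference: the loop from the quoted testcase, with the trip
   count passed in as n (assumed to be a multiple of 4).  */
double
dot_scalar (const double *a, const double *b, size_t n)
{
  double res0 = 0, res1 = 0;
  for (size_t i = 0; i < n; i += 4)
    {
      res0 += a[i] * b[i];
      res1 += a[i + 1] * b[i + 1];
      res0 += a[i + 2] * b[i + 2];
      res1 += a[i + 3] * b[i + 3];
    }
  return res0 + res1;
}

/* Emulated SLP reduction: acc[0] and acc[1] are the two lanes of one
   vector accumulator.  Each scalar iteration becomes two two-lane
   multiply-adds on contiguous elements.  */
double
dot_slp_emulated (const double *a, const double *b, size_t n)
{
  double acc[2] = { 0, 0 };
  for (size_t i = 0; i < n; i += 4)
    for (size_t half = 0; half < 4; half += 2)
      for (size_t lane = 0; lane < 2; lane++)
        acc[lane] += a[i + half + lane] * b[i + half + lane];
  /* Final horizontal add of the two lanes.  */
  return acc[0] + acc[1];
}
```

Each lane accumulates the same products in the same order as the
corresponding scalar variable, so both functions perform the same
additions.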
So maybe it is better to give the user control over the vectorization
factor?  That's desirable in other cases where the user wants to force
a larger VF to get extra unrolling for example.  For the testcase above
you'd use

#pragma GCC vect vf(1)

or so (syntax to be discussed).  The side-effect would be that
with a reduction chain the VF request cannot be fulfilled, but
with an SLP reduction it can.  Of course no_reduc_chain is much
easier to actually implement in a strict way, while specifying
VF will likely need to be documented as a hint (with an eventual
diagnostic if it wasn't fulfilled).
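
To show what "hint plus diagnostic" could mean, here is a small
stand-alone sketch.  The function choose_vf, its parameters and the
idea of a list of feasible VFs are purely hypothetical and do not
correspond to any existing vectorizer interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: honor the requested VF if it is among the
   feasible ones; otherwise fall back to the first feasible VF and
   report through *fulfilled that the hint was not met, so the caller
   can emit a diagnostic.  Returns 0 if nothing is feasible.  */
unsigned
choose_vf (unsigned requested, const unsigned *feasible, size_t n,
           bool *fulfilled)
{
  for (size_t i = 0; i < n; i++)
    if (feasible[i] == requested)
      {
        *fulfilled = true;
        return requested;
      }
  *fulfilled = false;
  return n ? feasible[0] : 0;
}
```

For the testcase, a request of vf(1) would be unfulfilled if only the
reduction-chain variant (VF two) were considered, and fulfilled once
the SLP-reduction variant (VF one) is among the candidates.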

Richard/Jakub, any thoughts?

Thanks,
Richard.