[PATCH v3] vect/rs6000: Support vector with length cost modeling
Kewen.Lin
linkw@linux.ibm.com
Wed Jul 22 15:48:09 GMT 2020
Hi Richard,
Thanks for the review!
on 2020/7/22 5:11 PM, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> - else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>> - {
>> - peel_iters_prologue = 0;
>> - peel_iters_epilogue = 0;
>> + if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> + {
>> + /* Calculate how many masks we need to generate. */
>> + unsigned int num_masks = 0;
>> + rgroup_controls *rgm;
>> + unsigned int num_vectors_m1;
>> + FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
>> + if (rgm->type)
>> + num_masks += num_vectors_m1 + 1;
>> + gcc_assert (num_masks > 0);
>> +
>> + /* In the worst case, we need to generate each mask in the prologue
>> + and in the loop body. One of the loop body mask instructions
>> + replaces the comparison in the scalar loop, and since we don't
>> + count the scalar comparison against the scalar body, we shouldn't
>> + count that vector instruction against the vector body either.
>> +
>> + Sometimes we can use unpacks instead of generating prologue
>> + masks and sometimes the prologue mask will fold to a constant,
>> + so the actual prologue cost might be smaller. However, it's
>> + simpler and safer to use the worst-case cost; if this ends up
>> + being the tie-breaker between vectorizing or not, then it's
>> + probably better not to vectorize. */
>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks,
>> + vector_stmt, NULL, NULL_TREE, 0, vect_prologue);
>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
>> + vector_stmt, NULL, NULL_TREE, 0, vect_body);
>> + }
>> + else
>> + {
>> + gcc_assert (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo));
>> +
>> + /* Consider cost for LOOP_VINFO_PEELING_FOR_ALIGNMENT. */
>> + if (npeel < 0)
>> + {
>> + peel_iters_prologue = assumed_vf / 2;
>> + /* See below, if peeled iterations are unknown, count a taken
>> + branch and a not taken branch per peeled loop. */
>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
>> + cond_branch_taken, NULL, NULL_TREE, 0,
>> + vect_prologue);
>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
>> + cond_branch_not_taken, NULL, NULL_TREE, 0,
>> + vect_prologue);
>> + }
>> + else
>> + {
>> + peel_iters_prologue = npeel;
>> + if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> + /* See vect_get_known_peeling_cost, if peeled iterations are
>> + known but number of scalar loop iterations are unknown, count
>> + a taken branch per peeled loop. */
>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
>> + cond_branch_taken, NULL, NULL_TREE, 0,
>> + vect_prologue);
>> + }
>
> I think it'd be good to avoid duplicating this. How about the
> following structure?
>
> if (vect_use_loop_mask_for_alignment_p (…))
> {
> peel_iters_prologue = 0;
> peel_iters_epilogue = 0;
> }
> else if (npeel < 0)
> {
> … // A
> }
> else
> {
> …vect_get_known_peeling_cost stuff…
> }
>
> but in A and vect_get_known_peeling_cost, set peel_iters_epilogue to:
>
> LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0
>
> for LOOP_VINFO_USING_PARTIAL_VECTORS_P, instead of setting it to
> whatever value we'd normally use. Then wrap:
>
> (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
> NULL, NULL_TREE, 0, vect_epilogue);
> (void) add_stmt_cost (loop_vinfo,
> target_cost_data, 1, cond_branch_not_taken,
> NULL, NULL_TREE, 0, vect_epilogue);
>
> in !LOOP_VINFO_USING_PARTIAL_VECTORS_P and make the other vect_epilogue
> stuff in A conditional on peel_iters_epilogue != 0.
>
> This will also remove the need for the existing LOOP_VINFO_FULLY_MASKED_P
> code:
>
> if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> {
> /* We need to peel exactly one iteration. */
> peel_iters_epilogue += 1;
> stmt_info_for_cost *si;
> int j;
> FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
> j, si)
> (void) add_stmt_cost (loop_vinfo, target_cost_data, si->count,
> si->kind, si->stmt_info, si->vectype,
> si->misalign, vect_epilogue);
> }
>
> Then, after the above, have:
>
> if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> …add costs for mask overhead…
> else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> …add costs for lengths overhead…
>
> So we'd have one block of code for estimating the prologue and epilogue
> peeling cost, and a separate block of code for the loop control overhead.
>
That's a great idea. Following your subsequent suggestion, I restructured the code as:
- calculate peel_iters_prologue
- calculate peel_iters_epilogue
- add costs associated with peel_iters_prologue
- add costs associated with peel_iters_epilogue
- add costs related to branch taken/not_taken.
The updated v3 is attached.
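As a rough illustration of the restructured flow (a standalone, simplified C model with made-up types and helper names, not the actual GCC code), the two peel-iteration calculations behave like:

```c
/* Simplified model of the restructured peeling logic in
   vect_estimate_min_profitable_iters.  The loop_model struct stands in
   for loop_vec_info; all names here are illustrative only.  */
#include <assert.h>
#include <stdbool.h>

typedef struct
{
  bool use_mask_for_alignment; /* vect_use_loop_mask_for_alignment_p () */
  bool using_partial_vectors;  /* LOOP_VINFO_USING_PARTIAL_VECTORS_P */
  bool peeling_for_gaps;       /* LOOP_VINFO_PEELING_FOR_GAPS */
  bool niters_known;           /* LOOP_VINFO_NITERS_KNOWN_P */
  int niters;                  /* LOOP_VINFO_INT_NITERS */
  int npeel;                   /* peeling for alignment; < 0 means unknown */
  int assumed_vf;              /* vect_vf_for_cost () */
} loop_model;

/* Step 1: calculate peel_iters_prologue.  */
static int
model_peel_prologue (const loop_model *m)
{
  if (m->use_mask_for_alignment)
    return 0;
  if (m->npeel < 0)
    return m->assumed_vf / 2; /* Unknown: assume vf/2.  */
  return m->npeel;
}

/* Step 2: calculate peel_iters_epilogue.  */
static int
model_peel_epilogue (const loop_model *m, int peel_iters_prologue)
{
  if (m->using_partial_vectors)
    /* Partial vectors handle the tail; peel one iteration for gaps only.  */
    return m->peeling_for_gaps ? 1 : 0;
  if (m->npeel < 0 || !m->niters_known)
    return m->assumed_vf / 2; /* Unknown: assume vf/2.  */
  int prol = m->niters < peel_iters_prologue ? m->niters : peel_iters_prologue;
  int epil = (m->niters - prol) % m->assumed_vf;
  /* Peeling for gaps with no natural epilogue forces a full VF.  */
  if (m->peeling_for_gaps && epil == 0)
    epil = m->assumed_vf;
  return epil;
}
```

The remaining steps (scalar-iteration costs per peeled iteration, then the branch taken/not-taken guards) only consume these two values, which is what lets the prologue/epilogue cost code be shared across all the cases.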
I've just bootstrapped/regtested it on powerpc64le-linux-gnu (P9) with the
explicit param vect-partial-vector-usage=1; I'll also test it without the
partial-vectors setting, and on aarch64, later.
BR,
Kewen
-----
gcc/ChangeLog:
* config/rs6000/rs6000.c (adjust_vect_cost_per_loop): New function.
(rs6000_finish_cost): Call adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code
to determine peel_iters_epilogue to function ...
(vect_get_peel_iters_epilogue): ... this. New function.
(vect_estimate_min_profitable_iters): Add cost modeling for vector
with length, refactor cost calculation on peel_iters_prologue and
peel_iters_epilogue.
-------------- next part --------------
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 009afc5f894..d71f2bf1c16 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5177,6 +5177,34 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
return retval;
}
+/* For some target-specific vectorization costs which can't be handled per stmt,
+ check the requisite conditions and adjust the vectorization cost accordingly
+ if satisfied. One typical example is to model the shift cost for vectors
+ with length by counting the number of required lengths under condition
+ LOOP_VINFO_FULLY_WITH_LENGTH_P. */
+
+static void
+adjust_vect_cost_per_loop (rs6000_cost_data *data)
+{
+ struct loop *loop = data->loop_info;
+ gcc_assert (loop);
+ loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
+
+ if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+ rgroup_controls *rgc;
+ unsigned int num_vectors_m1;
+ unsigned int shift_cnt = 0;
+ FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+ if (rgc->type)
+ /* Each length needs one shift to fill into bits 0-7. */
+ shift_cnt += (num_vectors_m1 + 1);
+
+ rs6000_add_stmt_cost (loop_vinfo, (void *) data, shift_cnt, scalar_stmt,
+ NULL, NULL_TREE, 0, vect_body);
+ }
+}
+
/* Implement targetm.vectorize.finish_cost. */
static void
@@ -5186,7 +5214,10 @@ rs6000_finish_cost (void *data, unsigned *prologue_cost,
rs6000_cost_data *cost_data = (rs6000_cost_data*) data;
if (cost_data->loop_info)
- rs6000_density_test (cost_data);
+ {
+ adjust_vect_cost_per_loop (cost_data);
+ rs6000_density_test (cost_data);
+ }
/* Don't vectorize minimum-vectorization-factor, simple copy loops
that require versioning for any reason. The vectorization is at
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e933441b922..c85b3ea1393 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3474,42 +3474,56 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
return NULL;
}
-/* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times. */
-int
-vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
- int *peel_iters_epilogue,
- stmt_vector_for_cost *scalar_cost_vec,
- stmt_vector_for_cost *prologue_cost_vec,
- stmt_vector_for_cost *epilogue_cost_vec)
+/* Calculate how many iterations are peeled for the epilogue, given LOOP_VINFO
+ and PEEL_ITERS_PROLOGUE. */
+
+static int
+vect_get_peel_iters_epilogue (loop_vec_info loop_vinfo, int peel_iters_prologue)
{
- int retval = 0;
int assumed_vf = vect_vf_for_cost (loop_vinfo);
-
if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
- *peel_iters_epilogue = assumed_vf / 2;
if (dump_enabled_p ())
- dump_printf_loc (MSG_NOTE, vect_location,
+ dump_printf_loc (MSG_NOTE, vect_location,
"cost model: epilogue peel iters set to vf/2 "
"because loop iterations are unknown .\n");
-
- /* If peeled iterations are known but number of scalar loop
- iterations are unknown, count a taken branch per peeled loop. */
- retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
- NULL, NULL_TREE, 0, vect_prologue);
- retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken,
- NULL, NULL_TREE, 0, vect_epilogue);
+ return assumed_vf / 2;
}
else
{
int niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
- peel_iters_prologue = niters < peel_iters_prologue ?
- niters : peel_iters_prologue;
- *peel_iters_epilogue = (niters - peel_iters_prologue) % assumed_vf;
+ int npeel_prol
+ = niters < peel_iters_prologue ? niters : peel_iters_prologue;
+ int npeel_epil = (niters - npeel_prol) % assumed_vf;
/* If we need to peel for gaps, but no peeling is required, we have to
peel VF iterations. */
- if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !*peel_iters_epilogue)
- *peel_iters_epilogue = assumed_vf;
+ if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !npeel_epil)
+ npeel_epil = assumed_vf;
+ return npeel_epil;
+ }
+}
+
+/* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times. */
+int
+vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
+ int *peel_iters_epilogue,
+ stmt_vector_for_cost *scalar_cost_vec,
+ stmt_vector_for_cost *prologue_cost_vec,
+ stmt_vector_for_cost *epilogue_cost_vec)
+{
+ int retval = 0;
+
+ *peel_iters_epilogue
+ = vect_get_peel_iters_epilogue (loop_vinfo, peel_iters_prologue);
+
+ if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+ {
+ /* If peeled iterations are known but number of scalar loop
+ iterations are unknown, count a taken branch per peeled loop. */
+ retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken, NULL,
+ NULL_TREE, 0, vect_prologue);
+ retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken, NULL,
+ NULL_TREE, 0, vect_epilogue);
}
stmt_info_for_cost *si;
@@ -3652,24 +3666,106 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
TODO: Build an expression that represents peel_iters for prologue and
epilogue to be used in a run-time test. */
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ bool prol_need_br_taken_cost = false;
+ bool prol_need_br_not_taken_cost = false;
+
+ /* Calculate peel_iters_prologue. */
+ if (vect_use_loop_mask_for_alignment_p (loop_vinfo))
+ peel_iters_prologue = 0;
+ else if (npeel < 0)
{
- peel_iters_prologue = 0;
- peel_iters_epilogue = 0;
+ peel_iters_prologue = assumed_vf / 2;
+ if (dump_enabled_p ())
+ dump_printf (MSG_NOTE, "cost model: "
+ "prologue peel iters set to vf/2.\n");
- if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+ /* If peeled iterations are unknown, count a taken branch and a not taken
+ branch per peeled loop. Even if scalar loop iterations are known,
+ vector iterations are not known since peeled prologue iterations are
+ not known. Hence guards remain the same. */
+ prol_need_br_taken_cost = true;
+ prol_need_br_not_taken_cost = true;
+ }
+ else
+ peel_iters_prologue = npeel;
+
+ bool epil_need_br_taken_cost = false;
+ bool epil_need_br_not_taken_cost = false;
+
+ /* Calculate peel_iters_epilogue. */
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+ /* We need to peel exactly one iteration for gaps. */
+ peel_iters_epilogue = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0;
+ else if (npeel < 0)
+ {
+ /* If peeling for alignment is unknown, loop bound of main loop
+ becomes unknown. */
+ peel_iters_epilogue = assumed_vf / 2;
+ if (dump_enabled_p ())
+ dump_printf (MSG_NOTE, "cost model: "
+ "epilogue peel iters set to vf/2 because "
+ "peeling for alignment is unknown.\n");
+
+ /* See the same reason above in peel_iters_prologue calculation. */
+ epil_need_br_taken_cost = true;
+ epil_need_br_not_taken_cost = true;
+ }
+ else
+ {
+ peel_iters_epilogue = vect_get_peel_iters_epilogue (loop_vinfo, npeel);
+ if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
- /* We need to peel exactly one iteration. */
- peel_iters_epilogue += 1;
- stmt_info_for_cost *si;
- int j;
- FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
- j, si)
- (void) add_stmt_cost (loop_vinfo, target_cost_data, si->count,
- si->kind, si->stmt_info, si->vectype,
- si->misalign, vect_epilogue);
+ /* If peeled iterations are known but number of scalar loop
+ iterations are unknown, count a taken branch per peeled loop. */
+ prol_need_br_taken_cost = true;
+ epil_need_br_taken_cost = true;
}
+ }
+
+ stmt_info_for_cost *si;
+ int j;
+ /* Add costs associated with peel_iters_prologue. */
+ if (peel_iters_prologue)
+ FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
+ {
+ (void) add_stmt_cost (loop_vinfo, target_cost_data,
+ si->count * peel_iters_prologue, si->kind,
+ si->stmt_info, si->vectype, si->misalign,
+ vect_prologue);
+ }
+
+ /* Add costs associated with peel_iters_epilogue. */
+ if (peel_iters_epilogue)
+ FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
+ {
+ (void) add_stmt_cost (loop_vinfo, target_cost_data,
+ si->count * peel_iters_epilogue, si->kind,
+ si->stmt_info, si->vectype, si->misalign,
+ vect_epilogue);
+ }
+
+ /* Add possible cond_branch_taken/cond_branch_not_taken cost. */
+ if (prol_need_br_taken_cost)
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
+ NULL, NULL_TREE, 0, vect_prologue);
+ if (prol_need_br_not_taken_cost)
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
+ cond_branch_not_taken, NULL, NULL_TREE, 0,
+ vect_prologue);
+
+ if (epil_need_br_taken_cost)
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
+ NULL, NULL_TREE, 0, vect_epilogue);
+
+ if (epil_need_br_not_taken_cost)
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
+ cond_branch_not_taken, NULL, NULL_TREE, 0,
+ vect_epilogue);
+
+ /* Take care of special costs for rgroup controls of partial vectors. */
+ if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ {
/* Calculate how many masks we need to generate. */
unsigned int num_masks = 0;
rgroup_controls *rgm;
@@ -3691,93 +3787,79 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
simpler and safer to use the worst-case cost; if this ends up
being the tie-breaker between vectorizing or not, then it's
probably better not to vectorize. */
- (void) add_stmt_cost (loop_vinfo,
- target_cost_data, num_masks, vector_stmt,
- NULL, NULL_TREE, 0, vect_prologue);
- (void) add_stmt_cost (loop_vinfo,
- target_cost_data, num_masks - 1, vector_stmt,
- NULL, NULL_TREE, 0, vect_body);
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks,
+ vector_stmt, NULL, NULL_TREE, 0, vect_prologue);
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
+ vector_stmt, NULL, NULL_TREE, 0, vect_body);
}
else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
{
- peel_iters_prologue = 0;
- peel_iters_epilogue = 0;
- }
- else if (npeel < 0)
- {
- peel_iters_prologue = assumed_vf / 2;
- if (dump_enabled_p ())
- dump_printf (MSG_NOTE, "cost model: "
- "prologue peel iters set to vf/2.\n");
-
- /* If peeling for alignment is unknown, loop bound of main loop becomes
- unknown. */
- peel_iters_epilogue = assumed_vf / 2;
- if (dump_enabled_p ())
- dump_printf (MSG_NOTE, "cost model: "
- "epilogue peel iters set to vf/2 because "
- "peeling for alignment is unknown.\n");
+ /* Referring to the functions vect_set_loop_condition_partial_vectors
+ and vect_set_loop_controls_directly, we need to generate each
+ length in the prologue and in the loop body if required. Although
+ there are some possible optimizations, we consider the worst case
+ here. */
+
+ /* For now we only use length-based partial vectors on Power,
+ which always has a constant VF; the code below will need some
+ tweaking if that no longer holds in future. */
+ gcc_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ());
+
+ /* For wrap around checking. */
+ tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
+ unsigned int compare_precision = TYPE_PRECISION (compare_type);
+ widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo);
+
+ bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo);
+ bool need_iterate_p
+ = (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+ && !vect_known_niters_smaller_than_vf (loop_vinfo));
+
+ /* Init min/max, shift and minus costs relative to a single
+ scalar_stmt. For now we only use length-based partial vectors on
+ Power; target-specific cost tweaking may be needed for other
+ ports in future. */
+ unsigned int min_max_cost = 2;
+ unsigned int shift_cost = 1, minus_cost = 1;
+
+ /* Init cost relative to single scalar_stmt. */
+ unsigned int prol_cnt = 0;
+ unsigned int body_cnt = 0;
+
+ rgroup_controls *rgc;
+ unsigned int num_vectors_m1;
+ FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+ if (rgc->type)
+ {
+ unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor;
- /* If peeled iterations are unknown, count a taken branch and a not taken
- branch per peeled loop. Even if scalar loop iterations are known,
- vector iterations are not known since peeled prologue iterations are
- not known. Hence guards remain the same. */
- (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
- NULL, NULL_TREE, 0, vect_prologue);
- (void) add_stmt_cost (loop_vinfo,
- target_cost_data, 1, cond_branch_not_taken,
- NULL, NULL_TREE, 0, vect_prologue);
- (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
- NULL, NULL_TREE, 0, vect_epilogue);
- (void) add_stmt_cost (loop_vinfo,
- target_cost_data, 1, cond_branch_not_taken,
- NULL, NULL_TREE, 0, vect_epilogue);
- stmt_info_for_cost *si;
- int j;
- FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
- {
- (void) add_stmt_cost (loop_vinfo, target_cost_data,
- si->count * peel_iters_prologue,
- si->kind, si->stmt_info, si->vectype,
- si->misalign,
- vect_prologue);
- (void) add_stmt_cost (loop_vinfo, target_cost_data,
- si->count * peel_iters_epilogue,
- si->kind, si->stmt_info, si->vectype,
- si->misalign,
- vect_epilogue);
- }
- }
- else
- {
- stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
- stmt_info_for_cost *si;
- int j;
- void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+ /* Need one shift for niters_total computation. */
+ if (!niters_known_p && nitems != 1)
+ prol_cnt += shift_cost;
- prologue_cost_vec.create (2);
- epilogue_cost_vec.create (2);
- peel_iters_prologue = npeel;
+ /* Need to handle wrap around. */
+ if (iv_limit == -1
+ || (wi::min_precision (iv_limit * nitems, UNSIGNED)
+ > compare_precision))
+ prol_cnt += (min_max_cost + minus_cost);
- (void) vect_get_known_peeling_cost (loop_vinfo, peel_iters_prologue,
- &peel_iters_epilogue,
- &LOOP_VINFO_SCALAR_ITERATION_COST
- (loop_vinfo),
- &prologue_cost_vec,
- &epilogue_cost_vec);
+ /* Need to handle the batch limit, except for the first one. */
+ prol_cnt += (min_max_cost + minus_cost) * num_vectors_m1;
- FOR_EACH_VEC_ELT (prologue_cost_vec, j, si)
- (void) add_stmt_cost (loop_vinfo,
- data, si->count, si->kind, si->stmt_info,
- si->vectype, si->misalign, vect_prologue);
+ unsigned int num_vectors = num_vectors_m1 + 1;
+ /* Need to set up lengths in the prologue; only one MIN is
+ required since the start index is zero. */
+ prol_cnt += min_max_cost * num_vectors;
- FOR_EACH_VEC_ELT (epilogue_cost_vec, j, si)
- (void) add_stmt_cost (loop_vinfo,
- data, si->count, si->kind, si->stmt_info,
- si->vectype, si->misalign, vect_epilogue);
+ /* Need to update lengths in body for next iteration. */
+ if (need_iterate_p)
+ body_cnt += (2 * min_max_cost + minus_cost) * num_vectors;
+ }
- prologue_cost_vec.release ();
- epilogue_cost_vec.release ();
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, prol_cnt, scalar_stmt,
+ NULL, NULL_TREE, 0, vect_prologue);
+ (void) add_stmt_cost (loop_vinfo, target_cost_data, body_cnt, scalar_stmt,
+ NULL, NULL_TREE, 0, vect_body);
}
/* FORNOW: The scalar outside cost is incremented in one of the
@@ -3913,8 +3995,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
}
/* ??? The "if" arm is written to handle all cases; see below for what
- we would do for !LOOP_VINFO_FULLY_MASKED_P. */
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P. */
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* Rewriting the condition above in terms of the number of
vector iterations (vniters) rather than the number of
@@ -3941,7 +4023,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
dump_printf (MSG_NOTE, " Minimum number of vector iterations: %d\n",
min_vec_niters);
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* Now that we know the minimum number of vector iterations,
find the minimum niters for which the scalar cost is larger:
@@ -3996,6 +4078,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
&& min_profitable_iters < (assumed_vf + peel_iters_prologue))
/* We want the vectorized loop to execute at least once. */
min_profitable_iters = assumed_vf + peel_iters_prologue;
+ else if (min_profitable_iters < peel_iters_prologue)
+ /* For LOOP_VINFO_USING_PARTIAL_VECTORS_P, we need to ensure that
+ the vectorized loop executes at least once. */
+ min_profitable_iters = peel_iters_prologue;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -4013,7 +4099,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
if (vec_outside_cost <= 0)
min_profitable_estimate = 0;
- else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
/* This is a repeat of the code above, but with + SOC rather
than - SOC. */
@@ -4025,7 +4111,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
if (outside_overhead > 0)
min_vec_niters = outside_overhead / saving_per_viter + 1;
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
int threshold = (vec_inside_cost * min_vec_niters
+ vec_outside_cost
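
To make the per-rgroup counting in the LOOP_VINFO_FULLY_WITH_LENGTH_P block concrete, here is a standalone sketch of the prologue/body overhead arithmetic. The costs 2/1/1 mirror the patch's min_max/shift/minus constants; the function and its flattened parameters are illustrative only, not GCC API:

```c
/* Model of the overhead counting for one rgroup of length-based
   partial vectors, following the FOR_EACH_VEC_ELT loop in the patch.
   Counts are relative to a single scalar_stmt.  */
#include <assert.h>
#include <stdbool.h>

static void
count_length_overhead (int num_vectors, int nitems, bool niters_known,
		       bool iv_can_wrap, bool need_iterate,
		       unsigned *prol_cnt, unsigned *body_cnt)
{
  const unsigned min_max_cost = 2, shift_cost = 1, minus_cost = 1;

  /* One shift for the niters_total computation when niters is unknown
     and the rgroup covers more than one scalar per iteration.  */
  if (!niters_known && nitems != 1)
    *prol_cnt += shift_cost;

  /* Wrap-around check: a min/max plus a minus.  */
  if (iv_can_wrap)
    *prol_cnt += min_max_cost + minus_cost;

  /* Batch limit for every vector except the first.  */
  *prol_cnt += (min_max_cost + minus_cost) * (num_vectors - 1);

  /* One MIN per vector to set up the lengths in the prologue.  */
  *prol_cnt += min_max_cost * num_vectors;

  /* Updating the lengths in the body for the next iteration.  */
  if (need_iterate)
    *body_cnt += (2 * min_max_cost + minus_cost) * num_vectors;
}
```

For example, a single-vector rgroup with nitems = 4, unknown niters, no wrap-around risk and an iterating loop charges 3 scalar_stmt units to the prologue and 5 to the body, matching the worst-case accounting the patch then feeds into add_stmt_cost.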