Refactor peel_iters_{pro,epi}logue cost model handlings
Kewen.Lin
linkw@linux.ibm.com
Tue Jul 28 08:01:00 GMT 2020
Hi Richard,
on 2020/7/27 下午9:10, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi,
>>
>> As Richard S. suggested in the thread:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550633.html
>>
>> this patch is separated from the one of that thread, mainly to refactor the
>> existing peel_iters_{pro,epi}logue cost model handlings.
>>
>> I've addressed Richard S.'s review comments there, moreover because of one
>> failure of aarch64 testing, I updated it a bit more to keep the logic unchanged
>> as before first (refactor_cost.diff).
>
> Heh, nice when a clean-up exposes an existing bug. ;-) I agree the
> updates look correct. E.g. if vf is 1, we should assume that there
> are no peeled iterations even if the number of iterations is unknown.
>
> Both patches are OK with some very minor nits (sorry):
Thanks for the comments! I've addressed them and commit refactor_cost.diff
via r11-2378.
For adjust.diff, it works well on powerpc64le-linux-gnu (bootstrap/regtest),
but got one failure on aarch64 as below:
PASS->FAIL: gcc.target/aarch64/sve/cost_model_2.c -march=armv8.2-a+sve scan-assembler-times \\tst1w\\tz 1
I spent some time investigating it and thought it's expected. As you mentioned
above, the patch can fix the cases with VF 1, the VF of this case is exactly one. :)
Without the proposed adjustment, the cost for V16QI looks like:
Vector inside of loop cost: 1
Vector prologue cost: 4
Vector epilogue cost: 3
Scalar iteration cost: 4
Scalar outside cost: 6
Vector outside cost: 7
prologue iterations: 0
epilogue iterations: 0
After the change, the cost becomes to:
Vector inside of loop cost: 1
Vector prologue cost: 1
Vector epilogue cost: 0
Scalar iteration cost: 4
Scalar outside cost: 6
Vector outside cost: 1
prologue iterations: 0
epilogue iterations: 0
which outperforms VNx16QI that isn't affected by this patch with partial vectors.
The test case expects VNx16QI to be used, the asm looks like:
cmp w2, 0
ble .L1
mov x1, 0
sbfiz x2, x2, 2, 32
mov z0.s, #1
whilelo p0.s, xzr, x2
.p2align 3,,7
.L3:
st1w z0.s, p0, [x0, x1, lsl 2]
add x1, x1, 4
whilelo p0.s, x1, x2
b.any .L3
But with this patch, instead it ends up with:
.LFB0:
.cfi_startproc
cmp w2, 0
ble .L1
movi v0.4s, 0x1
sub w1, w2, #1
add x2, x0, 16
add x1, x2, x1, uxtw 4
.p2align 3,,7
.L3:
str q0, [x0], 16
cmp x0, x1
bne .L3
I updated the expected asm as the current generated asm, not sure whether it's
a fix good enough.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
BR,
Kewen
-----
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
-------------- next part --------------
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
index d9d707893d8..a7d748397d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
@@ -9,4 +9,4 @@ vset (int *restrict dst, int *restrict src, int count)
*dst++ = 1;
}
-/* { dg-final { scan-assembler-times {\tst1w\tz} 1 } } */
+/* { dg-final { scan-assembler-times {\tstr\tq} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 185019c3305..43ec4fb04d5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3520,10 +3520,12 @@ vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
{
/* If peeled iterations are known but number of scalar loop
iterations are unknown, count a taken branch per peeled loop. */
- retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken, NULL,
- NULL_TREE, 0, vect_prologue);
- retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken, NULL,
- NULL_TREE, 0, vect_epilogue);
+ if (peel_iters_prologue > 0)
+ retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
+ NULL, NULL_TREE, 0, vect_prologue);
+ if (*peel_iters_epilogue > 0)
+ retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken,
+ NULL, NULL_TREE, 0, vect_epilogue);
}
stmt_info_for_cost *si;
@@ -3670,7 +3672,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
bool prologue_need_br_not_taken_cost = false;
/* Calculate peel_iters_prologue. */
- if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+ if (vect_use_loop_mask_for_alignment_p (loop_vinfo))
peel_iters_prologue = 0;
else if (npeel < 0)
{
@@ -3689,7 +3691,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
else
{
peel_iters_prologue = npeel;
- if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+ if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && peel_iters_prologue > 0)
/* If peeled iterations are known but number of scalar loop
iterations are unknown, count a taken branch per peeled loop. */
prologue_need_br_taken_cost = true;
@@ -3719,7 +3721,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
else
{
peel_iters_epilogue = vect_get_peel_iters_epilogue (loop_vinfo, npeel);
- if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+ if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && peel_iters_epilogue > 0)
/* If peeled iterations are known but number of scalar loop
iterations are unknown, count a taken branch per peeled loop. */
epilogue_need_br_taken_cost = true;
More information about the Gcc-patches
mailing list