This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: "Gabriel F. T. Gomes" <gabriel at inconstante dot eti dot br>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 25 Jun 2019 15:34:42 -0300
- Subject: Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
- References: <20190329133529.22523-1-adhemerval.zanella@linaro.org> <20190329133529.22523-22-adhemerval.zanella@linaro.org> <20190624210726.nllhjjcdtruehpn6@tereshkova>
On 24/06/2019 18:07, Gabriel F. T. Gomes wrote:
> On Fri, Mar 29 2019, Adhemerval Zanella wrote:
>>
>> This patches consolidates all the powerpc llround{f} implementations on
>> the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}. The only missing
>> optimization is the power6x one which I could not make GCC generates
>> mftgpr for 32 bits output.
>
> Similar to the llrint case, no harm done as such optimization wasn't
> available, anyway.
>
>> + /* The barrier prevents compiler from optimizing it to llround when
>> + compiled with -fno-math-errno */
>> + math_opt_barrier (x);
>
> I don't actually understand what this accomplishes, and I don't see any
> difference in the code generated with and without this barrier (all my
> builds have -fno-math-errno passed to the compiler). Could you help me
> understand this?
>
> This is the code I get (with or without the barrier):
>
> 00079dc0 <__llround_power6>:
> 79dc0: fc 20 0b 10 frin f1,f1
> 79dc4: 94 21 ff f0 stwu r1,-16(r1)
> 79dc8: fc 00 0e 5e fctidz f0,f1
> 79dcc: d8 01 00 08 stfd f0,8(r1)
> 79dd0: 80 61 00 08 lwz r3,8(r1)
> 79dd4: 80 81 00 0c lwz r4,12(r1)
> 79dd8: 38 21 00 10 addi r1,r1,16
> 79ddc: 4e 80 00 20 blr
>
> 00079de0 <__llround_power5plus>:
> 79de0: 94 21 ff f0 stwu r1,-16(r1)
> 79de4: fc 20 0b 10 frin f1,f1
> 79de8: fc 00 0e 5e fctidz f0,f1
> 79dec: d8 01 00 08 stfd f0,8(r1)
> 79df0: 60 00 00 00 nop
> 79df4: 60 00 00 00 nop
> 79df8: 60 00 00 00 nop
> 79dfc: 80 61 00 08 lwz r3,8(r1)
> 79e00: 80 81 00 0c lwz r4,12(r1)
> 79e04: 38 21 00 10 addi r1,r1,16
> 79e08: 4e 80 00 20 blr
>
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_llround.c
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_llround.c
Without math_opt_barrier I am seeing with gcc 8.2.1 20190214
(--build=powerpc64-unknown-linux-gnu --host=powerpc64-unknown-linux-gnu
--target=powerpc-glibc-linux-gnu no extra option, built with
build-many-glibcs.py) I see using the flags:
powerpc-glibc-linux-gnu-gcc -mcpu=power4 ../sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c -c -std=gnu11 -fgnu89-inline -g -O2 -mcpu=power4 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -mhard-float -Wstrict-prototypes -Wold-style-definition -fno-math-errno -mlong-double-128 -mcpu=power6 -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
---
__llround_power6:
.LVL0:
.LFB46:
.file 1 "../sysdeps/powerpc/powerpc32/fpu/s_llround.c"
.loc 1 33 1 view -0
.cfi_startproc
.loc 1 35 3 view .LVU1
.loc 1 35 10 is_stmt 0 view .LVU2
b llround
.LVL1:
.loc 1 35 10 view .LVU3
.cfi_endproc
---
This is because gcc transforms the "(long long int) round (x)" to llround
and thus creates a cyclic call (this is similar to aarch64 {l}lrint
optimization).
As a side note, the __builtin_llround for powerpc64 generates the expected
code.
>>
>> [...]
>>
>> + {
>> + /* IEEE 1003.1 lround function. IEEE specifies "round to the nearest
>> + integer value, rounding halfway cases away from zero, regardless of
>> + the current rounding mode." However PowerPC Architecture defines
>> + "round to Nearest" as "Choose the best approximation. In case of a
>> + tie, choose the one that is even (least significant bit o).".
>> + So we can't use the PowerPC "round to Nearest" mode. Instead we set
>> + "round toward Zero" mode and round by adding +-0.5 before rounding
>> + to the integer value.
>> +
> ~~
> Two white-spaces here.
Acked.
>
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_lround.S
>> +++ /dev/null
>>
>> [...]
>>
>> - fcmpu cr5, fp1, fp9 /* if x >= 0x7fffffff.8p0 */
>> - fcmpu cr1, fp1, fp8 /* if x <= -0x80000000.8p0 */
>
> OK. These have also been converted to c code in lround.c...
>
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_lround.c
>>
>> [...]
>>
>> + if (x >= 0x7fffffff.8p0 || x <= -0x80000000.8p0)
>> + x = (x < 0.0) ? -0x1p+52 : 0x1p+52;
>
> ... Here.
>