This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: "Gabriel F. T. Gomes" <gabriel at inconstante dot eti dot br>
Cc: libc-alpha at sourceware dot org
Date: Tue, 25 Jun 2019 15:34:42 -0300
Subject: Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
References: <20190329133529.22523-1-adhemerval.zanella@linaro.org> <20190329133529.22523-22-adhemerval.zanella@linaro.org> <20190624210726.nllhjjcdtruehpn6@tereshkova>


On 24/06/2019 18:07, Gabriel F. T. Gomes wrote:
> On Fri, Mar 29 2019, Adhemerval Zanella wrote:
>>
>> This patches consolidates all the powerpc llround{f} implementations on
>> the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}.  The only missing
>> optimization is the power6x one which I could not make GCC generates
>> mftgpr for 32 bits output.
> 
> Similar to the llrint case, no harm done as such optimization wasn't
> available, anyway.
> 
>> +  /* The barrier prevents compiler from optimizing it to llround when
>> +     compiled with -fno-math-errno */
>> +  math_opt_barrier (x);
> 
> I don't actually understand what this accomplishes, and I don't see any
> difference in the code generated with and without this barrier (all my
> builds have -fno-math-errno passed to the compiler).  Could you help me
> understand this?
> 
> This is the code I get (with or without the barrier):
> 
>   00079dc0 <__llround_power6>:
>     79dc0:       fc 20 0b 10     frin    f1,f1
>     79dc4:       94 21 ff f0     stwu    r1,-16(r1)
>     79dc8:       fc 00 0e 5e     fctidz  f0,f1
>     79dcc:       d8 01 00 08     stfd    f0,8(r1)
>     79dd0:       80 61 00 08     lwz     r3,8(r1)
>     79dd4:       80 81 00 0c     lwz     r4,12(r1)
>     79dd8:       38 21 00 10     addi    r1,r1,16
>     79ddc:       4e 80 00 20     blr
> 
>   00079de0 <__llround_power5plus>:
>     79de0:       94 21 ff f0     stwu    r1,-16(r1)
>     79de4:       fc 20 0b 10     frin    f1,f1
>     79de8:       fc 00 0e 5e     fctidz  f0,f1
>     79dec:       d8 01 00 08     stfd    f0,8(r1)
>     79df0:       60 00 00 00     nop
>     79df4:       60 00 00 00     nop
>     79df8:       60 00 00 00     nop
>     79dfc:       80 61 00 08     lwz     r3,8(r1)
>     79e00:       80 81 00 0c     lwz     r4,12(r1)
>     79e04:       38 21 00 10     addi    r1,r1,16
>     79e08:       4e 80 00 20     blr
> 
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_llround.c
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_llround.c

Without math_opt_barrier I am seeing with gcc 8.2.1 20190214 
(--build=powerpc64-unknown-linux-gnu --host=powerpc64-unknown-linux-gnu 
--target=powerpc-glibc-linux-gnu no extra option, built with 
build-many-glibcs.py) I see using the flags:

powerpc-glibc-linux-gnu-gcc -mcpu=power4 ../sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c -c -std=gnu11 -fgnu89-inline  -g -O2 -mcpu=power4 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -mhard-float -Wstrict-prototypes -Wold-style-definition -fno-math-errno -mlong-double-128   -mcpu=power6       -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES 

---
__llround_power6:
.LVL0:
.LFB46: 
        .file 1 "../sysdeps/powerpc/powerpc32/fpu/s_llround.c"
        .loc 1 33 1 view -0
        .cfi_startproc
        .loc 1 35 3 view .LVU1
        .loc 1 35 10 is_stmt 0 view .LVU2
        b llround
.LVL1:  
        .loc 1 35 10 view .LVU3
        .cfi_endproc
---

This is because gcc transforms the "(long long int) round (x)" to llround
and thus creates a cyclic call (this is similar to aarch64 {l}lrint 
optimization).  

As a side note, the __builtin_llround for powerpc64 generates the expected
code.

>>
>> [...]
>>
>> +    {
>> +       /* IEEE 1003.1 lround function.  IEEE specifies "round to the nearest
>> +	  integer value, rounding halfway cases away from zero, regardless of
>> +	  the current rounding mode."  However PowerPC Architecture defines
>> +	  "round to Nearest" as "Choose the best approximation. In case of a
>> +	  tie, choose the one that is even (least significant bit o).".
>> +	  So we can't use the PowerPC "round to Nearest" mode. Instead we set
>> +	  "round toward Zero" mode and round by adding +-0.5 before rounding
>> +	  to the integer value.
>> +  
>    ~~
> Two white-spaces here.

Acked.

> 
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_lround.S
>> +++ /dev/null
>>
>> [...]
>>
>> -	fcmpu	cr5, fp1, fp9	/* if x >= 0x7fffffff.8p0  */
>> -	fcmpu	cr1, fp1, fp8	/* if x <= -0x80000000.8p0  */
> 
> OK.  These have also been converted to c code in lround.c...
> 
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_lround.c
>>
>> [...]
>>
>> +  if (x >= 0x7fffffff.8p0 || x <= -0x80000000.8p0)
>> +    x = (x < 0.0) ? -0x1p+52 : 0x1p+52;
> 
> ... Here.
>

Follow-Ups:
- Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
  - From: Gabriel F. T. Gomes

References:
- Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
  - From: Gabriel F. T. Gomes

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]