This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add a new macro to mask a float
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-alpha at sourceware dot org
- Date: Wed, 29 Jun 2016 14:17:32 -0300
- Subject: Re: [PATCH] Add a new macro to mask a float
- References: <1467142073-13886-1-git-send-email-tuliom at linux dot vnet dot ibm dot com> <alpine dot DEB dot 2 dot 20 dot 1606282056460 dot 12650 at digraph dot polyomino dot org dot uk>
On 28/06/2016 18:00, Joseph Myers wrote:
> On Tue, 28 Jun 2016, Tulio Magno Quites Machado Filho wrote:
>
>> +/* Faster to do an in-place masking of the float number in the VSR
>> + than move to GPR for the masking and back. maskl, maskr, and maski
>> + are used to convert the 32-bit "mask" parameter to a 64-bit mask
>> + suitable for the internal representation of a scalar
>> + single-precision floating point number in the Power8 processor.
>> + Note: before applying the mask, xvmovdp is used to ensure f is
>> + normalized. */
>
> Actually, could you clarify what that internal representation is, and what
> "to ensure f is normalized" is about? Is this macro definition exactly
> equivalent to the integer masking, including for subnormal arguments and
> NaNs?
>
> If it's exactly equivalent in all cases, including subnormals and NaNs,
> then my previous comment applies - it would be better as a compiler
> optimization. If it's only equivalent for normal values but the code in
> question can't get subnormal arguments / NaNs / whatever values it's not
> equivalent for, then doing this in glibc is more plausible, though there
> are coding style issues, the macro comments would need to explain the
> limitation, and it would be necessary to be sure in each case that problem
> arguments can't get there.
>
My understanding of this optimization is that it just turns the FP-to-GPR move,
bitwise operation, and GPR-to-FP move back into a simpler bitwise operation on
the FP register itself. It is indeed equivalent to integer masking, and I
believe 'normalized' here means making the float value be represented in the
internal double format required by VSX operations.
So the code:
float foo (float x)
{
  MASK_FLOAT (x, 0xfffff000);
  return x;
}
is currently compiled by GCC 4.8 as:
foo:
	xscvdpspn 12,1      # convert f1 to a single-precision bit pattern in vs12
	mfvsrd 9,12         # move the VSR doubleword to GPR r9
	srdi 9,9,32         # shift the 32-bit pattern down to the low word
	rlwinm 9,9,0,0,19   # apply the 0xfffff000 mask
	sldi 10,9,32        # shift the masked pattern back to the high word
	mtvsrd 1,10         # move back from GPR to VSR vs1
	xscvspdpn 1,1       # convert the single-precision pattern back to double
	blr
And with this patch as:
foo:
0:	addis 2,12,.TOC.-0b@ha     # set up the TOC pointer (ELFv2 global entry)
	addi 2,2,.TOC.-0b@l
	.localentry foo,.-foo
	addis 9,2,.LC1@toc@ha      # address of the mask constant
	lfs 0,.LC1@toc@l(9)        # load the mask bit pattern into f0/vs0
	xvmovdp 1, 1               # ensure f1 is in normalized internal form
	xxland 1, 1, 0             # mask in place with a vector AND
	blr
.LC1:
	.4byte 4294963200          # 0xfffff000
Taking into consideration that the constant will be in the function's current
TOC, it will require just one float load (lfs) to get the mask.