Re: [PATCH] Add a new macro to mask a float
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Joseph Myers <joseph at codesourcery dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 04 Jul 2016 11:01:24 -0300
- Subject: Re: [PATCH] Add a new macro to mask a float
- References: <1467142073-13886-1-git-send-email-tuliom@linux.vnet.ibm.com> <alpine.DEB.2.20.1606282056460.12650@digraph.polyomino.org.uk> <577402AC.5080208@linaro.org> <alpine.DEB.2.20.1606291725470.22371@digraph.polyomino.org.uk> <577419D9.8070903@linaro.org>
Joseph Myers <joseph@codesourcery.com> writes:
> On Tue, 28 Jun 2016, Tulio Magno Quites Machado Filho wrote:
>
>> +/* Faster to do an in-place masking of the float number in the VSR
>> + than move to GPR for the masking and back. maskl, maskr, and maski
>> + are used to convert the 32-bit "mask" parameter to a 64-bit mask
>> + suitable for the internal representation of a scalar
>> + single-precision floating point number in the Power8 processor.
>> + Note: before applying the mask, xvmovdp is used to ensure f is
>> + normalized. */
>
> Actually, could you clarify what that internal representation is, and what
> "to ensure f is normalized" is about? Is this macro definition exactly
> equivalent to the integer masking, including for subnormal arguments and
> NaNs?
That's just an optimization. An SP denormal here could cause the CPU to waste
some cycles.
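For context, the generic behaviour this asm has to match is plain masking of the
32-bit single-precision bit pattern. A minimal, self-contained sketch of that
integer masking (the name mask_float and the union-based punning are
illustrative; the actual glibc code is not quoted in this thread and presumably
uses its own GET_FLOAT_WORD/SET_FLOAT_WORD helpers):

#include <stdint.h>

/* Illustrative only: mask the raw IEEE-754 binary32 representation of F
   with MASK and reinterpret the result as a float.  This is the integer
   masking the VSX sequence under discussion must be equivalent to,
   including for subnormals, infinities and NaNs.  */
static inline float
mask_float (float f, uint32_t mask)
{
  union { float f; uint32_t i; } u = { .f = f };
  u.i &= mask;
  return u.f;
}

Whether the VSX sequence matches this for every input, subnormals included, is
what the rest of the thread turns on.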
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 29/06/2016 14:34, Joseph Myers wrote:
>> On Wed, 29 Jun 2016, Adhemerval Zanella wrote:
>>
>>> My understanding of this optimization is to just turn the FP to GPR move,
>>> bitwise operation, and GPR to FP move back into a simpler bitwise operation
>>> on the FP register itself. It is indeed equivalent to integer masking, and I
>>> believe 'normalized' here means making the float be represented as the
>>> internal double required by VSX operations.
>>
>> What do you mean by "internal double"? Is this purely some fixed
>> rearrangement of bits, so that e.g. subnormal float values still get
>> represented as subnormals rather than like normal doubles?
>
> In fact the float number is converted to a double value, so 0x1p-149f would
> be represented internally in the VSX register as
> v4_int32 = {0x0, 0x0, 0x0, 0x36a00000}. And in fact this is an issue
> (below).
>
>>
>> Say the number is the least subnormal float - 0x1p-149f, integer
>> representation 1 - and that it's masked with 0xfffff000, as in the various
>> MASK_FLOAT calls. Can you confirm that the instruction sequence in the
>> patch produces 0.0f, as the integer masking does, when executed on a
>> POWER8 processor? And that if instead the value is 0x1p-137f, it's
>> returned unchanged?
>>
>> If equivalent to integer masking for all inputs including subnormals and
>> infinities and NaNs, then my previous point applies that this should be a
>> compiler optimization instead of a glibc patch.
>>
>
> Now that you have raised these questions, I do not think this change is safe
> for float values on POWER. The current patch does:
>
> __asm__ ("xvmovdp %x2, %x2\n\t" \
> "xxland %x0, %x2, %1\n\t" \
>
> And I think 'xvmovdp' is not what was really meant here (it is
> Copy Sign Double-Precision). I think what the algorithm meant was in fact:
>
> __asm__ ("xvcvdpsp %x2, %x2\n\t" \
> "xxland %x0, %x2, %1\n\t" \
> "xvcvspdp %x0, %x0" \
Exactly. After making that change, you can also simplify the mask treatment,
making it trivial for the compiler to do this optimization.
I'll forward this to GCC.
Thank you!
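For reference, the single-precision results Joseph asks about are easy to pin
down: 0x1p-149f has integer representation 0x00000001, so masking it with
0xfffff000 must give +0.0f, while 0x1p-137f (representation 0x00001000) must
come back unchanged. Viewed as a double, though, 0x1p-149f is normalised, with
biased exponent 1023 - 149 = 874 = 0x36a (hence the 0x36a00000 high word
above), so if the expanded 64-bit mask keeps the exponent field, masking in
the DP domain leaves the value intact instead of flushing it to zero. A small,
self-contained check of the required integer-masking semantics (reusing the
illustrative mask_float helper from the earlier sketch, not the patch's actual
macro or asm) could be:

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative integer masking of the binary32 representation,
   as in the earlier sketch.  */
static float
mask_float (float f, uint32_t mask)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  i &= mask;
  memcpy (&f, &i, sizeof f);
  return f;
}

int
main (void)
{
  /* Least subnormal float: representation 0x00000001, so the mask
     0xfffff000 must clear it to +0.0f.  */
  assert (mask_float (0x1p-149f, 0xfffff000) == 0.0f);
  /* 0x1p-137f has representation 0x00001000 and must be unchanged.  */
  assert (mask_float (0x1p-137f, 0xfffff000) == 0x1p-137f);
  return 0;
}

Comparing these results against the VSX sequence on a POWER8 machine would
answer the question about subnormals directly.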
--
Tulio Magno