This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Remove alpha specific fmax, fmin to fix sNaN handling [BZ #20947]



On 02/01/2018 12:04, Joseph Myers wrote:
> On Mon, 1 Jan 2018, Adhemerval Zanella wrote:
> 
>>> In the case of ceil, inexact should never be generated.  Since the alpha 
>>> ceil implementations work entirely with asm which does not use /i to 
>>> enable inexact exceptions, I'm not sure why they should generate such 
>>> exceptions spuriously.  What failures are you seeing exactly - every case 
>>> of noninteger arguments to ceil / ceilf, or only some such cases, or even 
>>> cases of integer arguments?
>>
>> The ceil/ceilf issues are in attachments (ran with s_ceil{f} built with
>> -mieee-with-inexact).
> 
> ceil / ceilf should *not* be built with -mieee-with-inexact (since they 
> should never raise inexact).  But also that option shouldn't make any 
> difference to those functions.
> 
> This is systematically raising spurious inexact for noninteger ceil / 
> ceilf arguments.  I don't see why these arguments would trap to the 
> kernel, but maybe (a) confirm in a debugger exactly which instruction 
> results in inexact being raised; (b) maybe instrument the kernel to report 
> when that instruction is being emulated so you can see if the emulation is 
> involved here at all?  If the emulation is involved, the kernel should be 
> fixed to check TRP to see if inexact should be raised.

It is the 'cvttq/svm' instruction that changes the fpcr and sets the INE bit.

(gdb) i r fpcr
fpcr           0x680e000000000000       7497930429618454528
(gdb) ni
0x000002000009a194      38            __asm (
(gdb) i r fpcr
fpcr           0xe90e000000200000       -1653384013196296192

(0x000002000009a194 is the cvttq/svm from s_ceil.S).
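
To make the two fpcr dumps above easier to read, here is a minimal sketch
that reports which status bits were newly set between two snapshots.  The
bit positions (SUM = 63, INED = 62, INE = 56) are my reading of the Alpha
Architecture Handbook and should be treated as an assumption, and the helper
names are hypothetical, not anything from glibc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FPCR bit positions as I understand them from the Alpha Architecture
   Handbook.  This is an assumption; check it against your own copy.  */
#define FPCR_SUM   (UINT64_C (1) << 63)  /* summary of the status bits */
#define FPCR_INED  (UINT64_C (1) << 62)  /* inexact trap disable */
#define FPCR_INE   (UINT64_C (1) << 56)  /* inexact result status */

/* Nonzero if INE was clear in BEFORE and is set in AFTER.  */
static int
fpcr_ine_newly_set (uint64_t before, uint64_t after)
{
  return (before & FPCR_INE) == 0 && (after & FPCR_INE) != 0;
}

/* Write the names of the known bits that were newly set into BUF,
   separated by spaces, and return how many were written.  Only SUM and
   INE are checked; all other bits are ignored.  */
static int
fpcr_describe_new_bits (uint64_t before, uint64_t after,
                        char *buf, size_t len)
{
  static const struct { uint64_t mask; const char *name; } bits[] = {
    { FPCR_SUM, "SUM" },
    { FPCR_INE, "INE" },
  };
  uint64_t newly = ~before & after;
  size_t used = 0;
  int count = 0;

  assert (buf != NULL && len > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++)
    if (newly & bits[i].mask)
      {
        int n = snprintf (buf + used, len - used, "%s%s",
                          count ? " " : "", bits[i].name);
        if (n < 0 || (size_t) n >= len - used)
          break;  /* Stop if the name did not fit in BUF.  */
        used += (size_t) n;
        count++;
      }
  return count;
}
```

With before = 0x680e000000000000 and after = 0xe90e000000200000 this reports
SUM and INE as newly set, while INED is set in both values, which would be
consistent with the flag being raised without a SIGFPE being delivered.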

A comment from alpha divq.S (present in other assembly implementations
as well) states:

   The FPCR save/restore is due to the fact that the EV6 _will_ set FPCR_INE
   for cvttq/c even without /sui being set.  It will not, however, properly
   raise the exception, so we don't have to worry about FPCR_INED being clear
   and so dying by SIGFPE.  */

This leads me to believe the same holds for /m as well.  Also, the comments
on the qemu patch at [1] indicate that the CVTTQ semantics do set inexact for
1. denorms -> 0 and 2. values outside of that range -> lower 64 bits of value.
So I am not sure whether it is a hardware issue or the expected semantics (the
Alpha Architecture Handbook I have access to does indicate that cvttq sets the
INE bit for some operations).

I haven't tested whether this is a case of an emulated instruction (I
currently do not have the access needed to rebuild/reinstall a new kernel on
the machine), but since I am checking on an EV68CB I guess it is not.

In any case I think we have two options here: either adjust the implementation
to clear the FPCR_INE bit after cvttq/svm (which will incur an mf_fpcr followed
by an mt_fpcr) or just remove the optimized implementation.  I am more inclined
to the latter, since working on the FPCR is usually costly: a very naive attempt
to save/restore the fpcr around cvttq for ceil did solve the issues but also
showed worse performance than the generic implementation (I used a ceil
benchtest based on the trunc{f} inputs).

[1] https://patchwork.ozlabs.org/patch/363303/

> 
>>> That however does not explain issues for fma / fmaf.  What do you see 
>>> there - spurious inexact, missing inexact, wrong results?  The use of 
>>> -mieee-with-inexact ought to ensure instructions are generated that set 
>>> "inexact" appropriately, and unless it's set appropriately, wrong results 
>>> can occur because the round-to-odd implementation relies on correct 
>>> setting of inexact.  fmaf in particular is very simple, so as long as the 
>>> right instructions are used and nothing gets reordered past the libc_fe* 
>>> calls, not much should be able to go wrong.
>>
>> The issues I am seeing on alpha for fma/fmaf are also in attachments.
> 
> For float, these are all missing underflow exceptions.
> 
> Alpha is an architecture with after-rounding tininess detection.  Recall 
> that after-rounding tininess detection is based on what the result would 
> be if rounded to normal precision but with infinite exponent range, so 
> it's possible for a result to be rounded to +/- the least normal but still 
> result in underflow with after-rounding tininess detection, which appears 
> to be the case for the failing tests for float.
> 
> Now, the Linux kernel has an old soft-fp version that only supports 
> before-rounding tininess detection, but the cases with before-rounding 
> underflow are a strict superset of those with after-rounding underflow, so 
> that can't explain missing underflow exceptions.  (I tried in 2015 to get 
> updated soft-fp into the Linux kernel.  A patch series was accepted into a 
> powerpc tree that was supposed to be pull-requested for Linux 4.4 
> <https://lkml.org/lkml/2015/8/26/804> but it never actually got into 
> Linus's tree for some reason.)
> 
> Maybe there is a hardware bug that means certain underflow cases neither 
> raise the underflow flag in hardware nor pass things to software 
> emulation, or something like that?
> 
> (IEEE 754-1985, unlike IEEE 754-2008, allows for underflow to be raised 
> only where there are both tininess and loss of accuracy as detected as a 
> denormalization loss, as opposed to tininess and inexactness.  But the 
> Alpha Architecture Handbook says "In the Alpha architecture, tininess is 
> detected by hardware after rounding, and loss of accuracy is detected by 
> software as an inexact result.", which indicates that option in IEEE 
> 754-1985 isn't relevant here.)
> 
> For double, there are a few cases of missing underflow exceptions, for 
> which the above analysis would apply.  But most of the failures there are 
> spurious underflow exceptions, which are more mysterious, as they include 
> cases where the result is large, nowhere near underflowing.  I'd suggest 
> finding out exactly which instruction, with what operands, is generating 
> the spurious underflow exception (possibly an instruction that generates 
> an exact subnormal result, where the underflow flag should not be set?).  
> And, again, see whether kernel emulation is involved for that instruction.
> 
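
As a reference for the tininess discussion quoted above, here is a minimal
sketch of the two detection rules for float results.  It assumes IEEE
binary32/binary64 types, round-to-nearest, and that the exact result fits in
a double; the function names are hypothetical and this is not how glibc or
the kernel soft-fp code implements the check.

```c
#include <assert.h>
#include <float.h>

/* Before-rounding tininess: the exact nonzero result X lies strictly
   between -FLT_MIN and FLT_MIN.  */
static int
tiny_before_rounding (double x)
{
  double ax = x < 0 ? -x : x;

  assert (x == x);  /* NaN is not a meaningful input here.  */
  return ax != 0 && ax < (double) FLT_MIN;
}

/* After-rounding tininess: round X to float precision as if the exponent
   range were unbounded, then compare against FLT_MIN.  */
static int
tiny_after_rounding (double x)
{
  double ax = x < 0 ? -x : x;

  assert (x == x);
  if (ax == 0 || ax >= (double) FLT_MIN)
    return 0;
  if (ax <= (double) FLT_MIN / 2)
    return 1;  /* Cannot round up as far as FLT_MIN.  */

  /* Scaling by 2^100 is exact and moves the value into float's normal
     range, so the cast rounds to 24 bits without subnormal effects.  */
  volatile float rounded = (float) (ax * 0x1p100);
  return (double) rounded < (double) FLT_MIN * 0x1p100;
}
```

For example, 0x1p-126 - 3 * 0x1p-152 converts to FLT_MIN as a float yet is
tiny under the after-rounding rule, which is the kind of case described for
the failing float tests, whereas 0x1p-126 - 0x1p-152 is tiny only under the
before-rounding rule.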

I will try to investigate fma{f} as well.

