[PATCH] SPU use a non-functional errno
patrick mansfield
patmans@us.ibm.com
Mon Apr 7 18:43:00 GMT 2008
Hi Jeff -
On Tue, Apr 01, 2008 at 04:10:24PM -0400, Jeff Johnston wrote:
> Patrick Mansfield wrote:
>> Hi Jeff -
>>
>> Not sure of the best way to handle this, please comment or apply, thanks!
>>
>
> If you want to override sys/errno.h, that's fine, but if you want impure.c
> changes, you'll have to override that file in libc/machine/spu as well.
OK.
>> Modify SPU to directly use errno, since it does not need reentrant code.
>> With this patch, any code using errno is 8 bytes smaller, plus there is no
>> function call.
>>
>> More importantly, testing of the SPU-optimized math function asin showed a
>> 16% decrease in time: a simple test took 16.6 seconds with the change and
>> 19.7 seconds without it. Similar gains are likely for other math
>> domain checks in the SPU-optimized math code (they are already set up for
>> branchless compare and setting of errno, but the code always has to read
>> and set errno).
>
> You should only be touching errno on an error. Normally, you have a local
> variable to store results and then check for failure. If failure occurs,
> you set errno appropriately. Thus, you only slow down on failure which is
> more than reasonable and if you get your local variable into a register(s),
> you can do very efficient checking/branching for the non-error case.
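For comparison, here is a minimal portable-C sketch of the pattern described above: keep the result in a local, check it, and write errno only on failure. `finish_acosh` is a hypothetical helper (not newlib code), and `result` stands in for whatever the math kernel computed.

```c
#include <errno.h>
#include <math.h>

/* Hypothetical helper: `result` is the value the acosh kernel computed.
 * errno is written only on the error path. */
static double finish_acosh(double x, double result)
{
    if (x < 1.0) {      /* acosh is only defined for x >= 1 */
        errno = EDOM;   /* error path: set errno and return NaN */
        return NAN;
    }
    return result;      /* common path: errno is never read or written */
}
```

On a machine with branch prediction this costs almost nothing for in-domain arguments; the reply below explains why the SPU is different.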
Adding an "if" can actually slow down the code on the SPU, since it has
no dynamic branch prediction (only branch hints) and a fairly long pipeline.
I ran some test cases with acosh, which has a domain check of x < 1.
There are also abs(x) < 1 checks, but I did not run comparison tests
for any of those cases.
Is the following enough data for you to accept an updated patch?
There are four cases: errno as a function or not, each with or without
branchless code for the domain check (branchless SPU code like that in
newlib/libm/machine/spu/headers/dom_chkd_less_than.h vs. plain C
"if (x < 1)" code).
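A rough portable-C sketch of the four cases, using hypothetical names throughout: `errno_location()`/`ERRNO_FN` mimic the function style (newlib's sys/errno.h defines errno as `(*__errno())`), `errno_var` is a plain global, and `select_edom()` is a mask-based select that only approximates the idea behind the SPU headers. It does not reproduce dom_chkd_less_than.h, which works on SPU vector types.

```c
#include <errno.h>   /* only for the EDOM constant */

/* Function style (hypothetical stand-in): every access goes through a call. */
static int errno_storage;
static int *errno_location(void) { return &errno_storage; }
#define ERRNO_FN (*errno_location())

/* Non-function style (hypothetical): a global the compiler addresses directly. */
static int errno_var;

/* "if" domain check: errno is written only when x < 1. */
static void check_if_fn(double x)  { if (x < 1.0) ERRNO_FN = EDOM; }
static void check_if_var(double x) { if (x < 1.0) errno_var = EDOM; }

/* Branch-free select: mask is all ones when x < 1, else zero, so the
 * result is EDOM on a domain error and the old value otherwise. Whether
 * the compiler keeps this branch-free depends on the target. */
static int select_edom(int old, double x)
{
    int mask = -(int)(x < 1.0);
    return (old & ~mask) | (EDOM & mask);
}

/* Branchless domain checks: errno is read and written on every call, so
 * the cost of reaching errno is paid even for in-domain arguments. */
static void check_branchless_fn(double x)
{
    int *p = errno_location();  /* function-style access on every call */
    *p = select_edom(*p, x);
}
static void check_branchless_var(double x)
{
    errno_var = select_edom(errno_var, x);
}
```

Because the branchless variants always touch errno, they are the ones most sensitive to whether errno costs a function call.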
The compiler generates a branch for the "if" code (sometimes it can
generate branchless code instead) and helpfully adds a branch hint assuming
the branch will not be taken.
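To state the expected direction in the source instead of leaving it to the compiler, GCC provides `__builtin_expect`. A minimal sketch, assuming a GCC-compatible compiler; `unlikely` and `finish_acosh_hinted` are hypothetical names, and this does not verify which hint instructions spu-gcc actually emits for it.

```c
#include <errno.h>
#include <math.h>

/* Hypothetical macro: tell GCC the condition is expected to be false. */
#define unlikely(cond) __builtin_expect(!!(cond), 0)

/* Same domain check as a plain "if", with the expectation made explicit
 * so the in-domain path is treated as the likely fall-through path. */
static double finish_acosh_hinted(double x, double result)
{
    if (unlikely(x < 1.0)) {
        errno = EDOM;
        return NAN;
    }
    return result;
}
```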
I'm using spu-gcc from IBM's CELL SDK 3.0.
For the acosh(2) test case (argument in the domain), with times normalized
so that "if" with errno as a function is 1.00, we have:
                 errno function   non-function errno
  with "if"           1.00               .99
  branchless          1.10               .96
So branchless with errno as a function is the worst and branchless with
non-function errno the best, but the latter is only about 3% faster than
the "if" with non-function errno (.96 vs. .99).
If the argument is out of domain, the "if" cases are quite a bit slower.
For acosh(.5) I see:
                 errno function   non-function errno
  with "if"           1.00               .93
  branchless           .99               .87
"if" with errno as a function is now the worst, and branchless with
non-function errno is quite a bit faster (.87 vs. .93 for "if" with
non-function errno).
And code sizes, in bytes, are:
                 errno function   non-function errno
  with "if"          35052              34988
  branchless         35020              34988
So non-function errno saves 64 bytes with the "if" code and 32 bytes with
the branchless code.
The test program also includes fprintf() calls, and errno is referenced in
the assist call; I'm not sure how much smaller the non-function errno
makes fprintf().
-- Patrick Mansfield
More information about the Newlib mailing list