array math.h (SSE2?)
John D Lamb
J.D.Lamb@btinternet.com
Sat Mar 25 14:42:00 GMT 2006
James Bergstra wrote:
> Does anyone know of a library, or source file somewhere in which functions from
> math.h such as exp() and log() are implemented using SSE2? AMD's libacml_mv
> demonstrates that such an implementation can be faster than gcc's, and
> furthermore that multiple (2) functions can be computed in parallel on a single
> processor, making 2-3 fold speed improvements on array calculations.
>
> Or, taking a step back, is there a project analogous to ATLAS or FFTW that
> provides fast array computations of these functions on various architectures?
>
It wouldn't be too hard to write some of these. As an example,

    #define SQRT( x, result ) ({       \
        __asm__( "fsqrt\n\t"           \
                 : "=t"( result )      \
                 : "0"( x ) ); })

is about 2-3 times as fast as std::sqrt and (I think) gives identical
results for doubles other than std::numeric_limits<double>::infinity()
(which is easily fixed).
However, two issues:
1. GSL is pretty well optimised if you use the appropriate flags for
Pentium 4 SSE2: -march=pentium4 -malign-double -mfpmath=sse -msse -msse2.
If you compile with these and inspect the generated assembly you will
find a big speed improvement, with SSE2-specific instructions already
present. I haven't checked the special functions (e.g. cos, exp), but
there's clearly room for gcc to use SSE2 optimisations (though not the
fcos op, because it doesn't give the same answer as gsl_sf_cos).
2. Since GSL allows vector and matrix views, the doubles (assuming
you're using doubles) you want to operate on are not necessarily stored
in contiguous memory. That limits the use of the packed load movapd, and
so limits the value of the packed instructions addpd, subpd and mulpd
for speeding up calculations.
Of course, there's nothing to stop you writing your own functions that
operate on gsl_blocks and give you much faster arithmetic.
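As a sketch of what such a function might look like on a contiguous run of doubles, here is an element-wise multiply using the SSE2 intrinsics from <emmintrin.h> rather than raw assembly. The function name is illustrative, not GSL API; a gsl_block is essentially a size plus a double* data pointer, so the same loop applies to block->data:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64)
#include <cstddef>

// Element-wise multiply of two contiguous double arrays: out[i] = a[i]*b[i].
// The packed mulpd processes two doubles per iteration; this only pays off
// because the data is contiguous.
void block_mul(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   // unaligned packed load (movupd)
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_mul_pd(va, vb));
    }
    for (; i < n; ++i)                      // scalar tail for odd n
        out[i] = a[i] * b[i];
}
```

With 16-byte-aligned storage the unaligned loads could be replaced by _mm_load_pd (movapd); with a strided matrix-view column none of the packed forms apply, which is exactly the limitation described in point 2.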
--
JDL