C J Kenneth Tan -- OptimaNumerics
Tue Mar 28 23:38:00 GMT 2006


On 2006-03-28 08:44 -0500 Robert G. Brown wrote:

> If the calls are the same, though, presumably one could link into an
> alternative library via suitable compile/link flags.

Yes, the API needs to be kept the same.

> So I don't see why there is an issue here.  LAPACK efficiency was, I
> thought, "largely" inherited from BLAS although I'm sure there are
> additional efficiencies one can realize if one works for them.  

Depends on the definition of "largely".  In our LAPACK benchmarks, we
use the same BLAS throughout, so the results show only the differences
in the LAPACK layer.  We found the differences could be very large (as
much as thousands of percent).
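
To make a figure like "thousands of percent" concrete, here is a minimal
sketch of the arithmetic.  It assumes the difference is expressed relative
to the faster run time, which the post does not state, and the timings are
hypothetical numbers chosen for illustration, not benchmark results.

```python
def speedup(baseline_seconds, tuned_seconds):
    """Ratio of the slower (baseline) run time to the faster (tuned) one."""
    if baseline_seconds <= 0 or tuned_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / tuned_seconds


def percent_difference(baseline_seconds, tuned_seconds):
    """Extra time taken by the baseline, as a percentage of the tuned time.

    Assumption: the percentage is taken relative to the tuned (faster) run.
    """
    if baseline_seconds <= 0 or tuned_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds - tuned_seconds) / tuned_seconds * 100.0


# Hypothetical timings for one LAPACK routine, same BLAS underneath both.
baseline = 42.0  # seconds, illustrative only
tuned = 2.0      # seconds, illustrative only

print(speedup(baseline, tuned))             # 21.0
print(percent_difference(baseline, tuned))  # 2000.0
```

Under that convention, a 2000% difference corresponds to the baseline
taking 21 times as long as the tuned routine.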

> However I don't see this is an either/or proposition, and looking at:
> > .
> ... the important question isn't benchmarks against "competitor", it
> is benchmarks against the specific case of GSL+ATLAS tuned BLAS and the
> marginal advantages there, using the same compiler and support library
> otherwise.  I'm more than curious to see what the marginal advantages
> are of tuned LAPACK >>beyond<< those provided by using a good ATLAS
> tuned BLAS and corresponding LAPACK.

As I mentioned above, the BLAS layer is kept the same in each of the
benchmarks, so the benchmarks show the differences in the LAPACK layer.

> Or if you like, a longer term question that is quite worth bringing up
> is to what extent it is desirable to introduce architecture specific
> tuning into actual GSL code, period.

Is this considered scalable from a development perspective?

> On the one hand, it makes the code ugly and extremely difficult to
> maintain and offers small marginal performance gains in most places BUT
> linear algebra.  It tends to depend heavily on compiler as well -- if
> you write code optimized for e.g.  presumed SSE1 or SSE2 instructions
> your compiler has to support them, not just your CPU.  It makes the code
> in a sense less portable.  It is "expensive" in human time and developer
> time in the GSL is probably largely donated and a precious resource.
> I'd rather give up 20-30% in "potential" performance and use the GSL for
> free, personally, blessing the developers and their contribution to
> world civilization as I do so. (Thanks, guys!:-)
> On the other hand, for some applications tuning CAN be a big, big
> performance booster -- 2-3x -- and hence very valuable.  I'm working
> (depending on whether and how much support I get) on a portable/open way
> of writing at least moderately autotuning HPC code.  Something that will
> get "most" of the benefit of a full ATLAS-like autotuning build without
> architecture-specific instrumentation.  Don't hold your breath, but in a
> year or three maybe this problem can be at least attacked without
> playing the library juggling game.

Is this approach scalable when it comes to highly complex scientific
code, as opposed to the comparatively more straightforward BLAS level?

> In the meantime, hey, the GSL is open source, full GPL, viral.  That
> means that as long as the LAPACK API is preserved -- or a wrapper of the
> standard lapack calls provided -- you can ALWAYS hack and link your own
> LAPACK in.  The effort in any event is WAY less than the effort required
> to actually build an architecture-dependent LAPACK in the first place,
> and you can freely market a "tuned" version of the GSL as long as you
> provide its sources, and it seems to me that in the case of lapack this
> is at MOST developing a wrapper.  I've done similar things already for
> adding e.g. rng's to the GSL -- it isn't difficult.  Or am I missing
> something?

Yes, we agree on this.  This is what I think is a good option.  It is
crucial that the LAPACK API is preserved.
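
The point about preserving the API can be illustrated with a small
sketch.  It is written in Python rather than C or Fortran purely for
brevity, and every name in it (solve, set_backend, reference_solve,
closed_form_solve) is hypothetical; none of them is part of GSL or
LAPACK.  The idea is only that callers depend on one fixed signature
while the implementation behind it can be swapped.

```python
def reference_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def closed_form_solve(a, b):
    """Stand-in for a tuned routine: closed form for 2x2, else fall back."""
    if len(b) != 2:
        return reference_solve(a, b)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


_backend = reference_solve


def set_backend(fn):
    """Select the implementation; plays the role of the link-time choice."""
    global _backend
    _backend = fn


def solve(a, b):
    """The single entry point that application code calls."""
    return _backend(a, b)


a = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
set_backend(closed_form_solve)
print(solve(a, b))  # the calling line is the same for either backend
```

In a compiled setting the selection would happen through link flags
rather than a function call, but the constraint is the same: the swap
only works if both implementations honour the same call signature.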

Kenneth Tan
C J Kenneth Tan, PhD
OptimaNumerics Ltd                    Telephone: +44 798 941 7838
                                      Telephone: +44 207 099 4428
                                      Facsimile: +44 207 100 4572

More information about the Gsl-discuss mailing list