gslclapack

Robert G. Brown rgb@phy.duke.edu
Tue Mar 28 13:43:00 GMT 2006


On Tue, 28 Mar 2006, C J Kenneth Tan -- OptimaNumerics wrote:

> Brian,
>
> On 2006-03-28 12:48 +0100 Brian Gough (bjg@network-theory.co.uk) wrote:
>
>> Date: Tue, 28 Mar 2006 12:48:30 +0100
>> From: Brian Gough <bjg@network-theory.co.uk>
>> To: James Bergstra <james.bergstra@umontreal.ca>
>> Cc: gsl-discuss <gsl-discuss@sources.redhat.com>
>> Subject: Re: gslclapack
>>
>> James Bergstra writes:
>> > On one hand, I'm jealous that lapack has a fancy routine, and react "let's
>> > implement that for gsl too" and on the other hand, I think "why is gsl
>> > re-implementing (what I believe is) a very standard library?"
>>
>> LAPACK is huge & tedious to install.  GSL is meant to be easy to use -
>> a dependency on lapack would be a pain.
>
> How do you find LAPACK being a pain?  The public version of LAPACK has
> been well tested and the design has been well thought out.  It has a
> very comprehensive test suite also.

Is this an either/or proposition?  Having its own lapack implementation
just eliminates a dependency, which I agree is desirable, and since the
system-provided LAPACK often sucks as far as efficiency goes it also
means that one CAN gradually start to tune up gsllapack.  If the calls
are the same, though, presumably one could link into an alternative
library via suitable compile/link flags.  I thought this was the way
things were being done already, actually, so one could e.g. use ATLAS
BLAS:

  Linking with an alternative BLAS library

  The following command line shows how you would link the same application
  with an alternative CBLAS library called `libcblas',

  $ gcc example.o -lgsl -lcblas -lm

  For the best performance an optimized platform-specific CBLAS library
  should be used for -lcblas. The library must conform to the CBLAS
  standard. The ATLAS package provides a portable high-performance BLAS
  library with a CBLAS interface. It is free software and should be
  installed for any work requiring fast vector and matrix operations. The
  following command line will link with the ATLAS library and its CBLAS
  interface,

  $ gcc example.o -lgsl -lcblas -latlas -lm

So I don't see why there is an issue here.  LAPACK efficiency was, I
thought, "largely" inherited from BLAS although I'm sure there are
additional efficiencies one can realize if one works for them.  However
as long as the API is the same, one could presumably do -lgsllapack vs
-llapack as a compile/link time choice.  This is moderately more complex
at compile time, but it is easy to document and is a methodology that
any good programmer should be familiar with anyway.
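To make the idea concrete, here is a Makefile sketch of that
compile/link-time choice.  The `gsllapack' library name is purely
hypothetical -- GSL does not actually ship such a library -- so treat
this as a configuration sketch, not working build advice:

```make
# Hypothetical: pick the LAPACK implementation at link time.
# "gsllapack" is an invented name for a GSL-bundled LAPACK;
# override with e.g. `make LAPACK_LIB=lapack' to use the system one.
LAPACK_LIB ?= gsllapack

example: example.o
	gcc example.o -lgsl -lcblas -l$(LAPACK_LIB) -lm -o example
```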

> What's the price of performance?  This question can be phrased as
> what's the price of electricity and what's the price of server room
> space.

This is dead right, of course.  NOT having the ability to use optimized
libraries can cost you as much as a factor of 2-3 in the case of ATLAS
tuned BLAS, which for a linear algebra application can double the
productivity/dollar of hundreds of thousands of dollars of cluster
hardware (as Ken's sales talk so nicely shows:-).  One doesn't want
there to be a disincentive to the use of GSL in HPC so it can continue
to take over the world:-)

However I don't see this is an either/or proposition, and looking at:

> http://www.OptimaNumerics.com/docs/hpc-asia05/hpc-asia05.pdf .

... the important question isn't benchmarks against "competitor", it
is benchmarks against the specific case of GSL+ATLAS tuned BLAS and the
marginal advantages there, using the same compiler and support library
otherwise.  I'm more than curious to see what the marginal advantages
are of tuned LAPACK >>beyond<< those provided by using a good ATLAS
tuned BLAS and corresponding LAPACK.

Or if you like, a longer term question that is quite worth bringing up
is to what extent it is desirable to introduce architecture-specific
tuning into actual GSL code, period.

On the one hand, it makes the code ugly and extremely difficult to
maintain and offers small marginal performance gains in most places BUT
linear algebra.  It tends to depend heavily on compiler as well -- if
you write code optimized for e.g. presumed SSE1 or SSE2 instructions,
your compiler has to support them, not just your CPU.  It makes the code
in a sense less portable.  It is "expensive" in human time, and developer
time in the GSL is probably largely donated and hence a precious resource.
I'd rather give up 20-30% in "potential" performance and use the GSL for
free, personally, blessing the developers and their contribution to
world civilization as I do so. (Thanks, guys!:-)

On the other hand, for some applications tuning CAN be a big, big
performance booster -- 2-3x -- and hence very valuable.  I'm working
(depending on whether and how much support I get) on a portable/open way
of writing at least moderately autotuning HPC code.  Something that will
get "most" of the benefit of a full ATLAS-like autotuning build without
architecture-specific instrumentation.  Don't hold your breath, but in a
year or three maybe this problem can be at least attacked without
playing the library juggling game.

In the meantime, hey, the GSL is open source, full GPL, viral.  That
means that as long as the LAPACK API is preserved -- or a wrapper of the
standard lapack calls provided -- you can ALWAYS hack and link your own
LAPACK in.  The effort in any event is WAY less than the effort required
to actually build an architecture-dependent LAPACK in the first place,
and you can freely market a "tuned" version of the GSL as long as you
provide its sources, and it seems to me that in the case of lapack this
is at MOST developing a wrapper.  I've done similar things already for
adding e.g. rng's to the GSL -- it isn't difficult.  Or am I missing
something?

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu




More information about the Gsl-discuss mailing list