
Re: GSL, VSIPL, and the ultimate numerical library


"Robert W. Brewer" wrote:
> Performance is important for both of those, otherwise
> for offline use I would just use Octave or Matlab.

As a general comment, I would like to say that this statement
already introduces something important to the discussion.
Some people like to say that "GSL is some kind of GPL'ed
Numerical Recipes, but better". For several reasons, some
of them obvious, I felt that was a pathetic design
goal.

If we want progress on our own "numerical science" desktops,
we need to understand what it is that people like us really do.
This means addressing issues like how to obtain coherence
across different systems, at different abstraction levels,
like Octave vs. heavy-duty compiled stuff. I hope that
people start thinking in terms of a "GNU Scientific Environment".
Some already are; the ones that I know about are listening
on this list. Jim Amundson coined "GNU Scientific Environment",
and has put real thought into what it might mean. Maybe there
are more people hiding out there with good ideas. Numerical
computation is itself only a part of the vision; there are also
questions like "what can we do with the GPL'ed macsyma?".

I don't use Octave myself, but as an example, I do like
to wrap library functionality in python, so that is a model
I understand for coherence at different levels of abstraction.
GSL is supposed to be "easy to wrap into high-level languages",
at least that is one of the design goals. Well, of course it
is easy to wrap (roughly speaking, if you can type 'swig' and
hit the return key, then you are in business). But that does
nothing to address coherence, because then you just get the
same basically inflexible GSL semantics, bound to another
language. It doesn't help much.

As we've learned in other areas, the key seems to be
a high level of genericity, separating data from
algorithms. Once you get rid of the baggage which is
not relevant to the problem domain, you have a hope
of being able to mix and match solutions.

This is a hard but fundamentally important problem.
I have hope basically only because of the actual
progress that has been made on genericity in C++
over the last 5 years or so. Not only are all these
template shenanigans nifty within the context of C++,
but they provide a model for what is needed outside
of any specific language; the language simply provides
the tool which makes it possible to explore all these
things that people have been trying to get at for
so long.
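
To make that concrete, here is a minimal sketch of what
"separating data from algorithms" buys you; nothing in it commits
to a storage layout or a scalar type, so the same code serves a C
array, a std::vector, or a strided view:

    #include <iterator>

    /* An inner product written against iterator concepts rather
       than any particular vector type. The scalar type (float,
       double, long double, ...) is deduced from the data. */
    template <typename Iter1, typename Iter2>
    typename std::iterator_traits<Iter1>::value_type
    dot(Iter1 x, Iter1 x_end, Iter2 y)
    {
        typename std::iterator_traits<Iter1>::value_type sum = 0;
        for (; x != x_end; ++x, ++y)
            sum += (*x) * (*y);
        return sum;
    }

The loop is trivial, of course; the point is that the algorithm
and the data representation no longer know about each other.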

 
> It looks like GSL has a
> similar problem, many functions are written 3 and 4 times
> just to support the different float and int types the user might want
> to instantiate. 

Yup. That really stinks.

> And after all that, it still can't support
> neat things like arbitrary precision numbers, while a templating
> scheme might be able to.

Exactly. I am interested in that specific issue.
I think fundamental questions like "ok, but what
happens when this is 'long double' or 'float'"
need to be addressed on day one. Not just from the
standpoint of genericity in containers, but also
in algorithms. In principle, algorithms change too,
and you need to encapsulate the parts which vary.

I would be happy to start with some sort of
traits-class or similar solution, which is
basically a table-driven approach, let's say
assuming the existence of a complete and valid
std::numeric_limits implementation.
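
As a minimal sketch of the table-driven idea (an illustration
only, assuming a positive argument in a range where x*x does not
overflow), let the algorithm read its tolerance off the scalar
type instead of hard-coding something double-specific:

    #include <cmath>
    #include <limits>

    /* Newton's iteration for sqrt(a), a > 0. The stopping
       tolerance comes from the type itself, so float, double,
       long double, or any user type that specializes
       std::numeric_limits instantiates correctly. */
    template <typename T>
    T sqrt_newton(T a)
    {
        const T tol = T(4) * std::numeric_limits<T>::epsilon();
        T x = (a + 1) / 2;              /* >= sqrt(a) by AM-GM */
        while (std::abs(x * x - a) > tol * a)
            x = (x + a / x) / 2;        /* Newton step */
        return x;
    }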

Support for actual (but static, compile-time) arbitrary
precision would be much harder. But we can almost see
how something like that could work.


> To boil it down, the library I would like to use would be
> (most important first):
>    high performance

Performance is a multidimensional space. Memory access
should be tightly controlled and appropriately parametrized
for different platforms. But I'm willing to lose
cycles; cycles are dirt cheap.
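
A hedged illustration of the kind of parametrization I mean: make
the memory-access pattern a compile-time knob, tuned per platform,
while the algorithm itself stays fixed. Here, a cache-blocked
in-place transpose with the tile size as the tuning parameter:

    #include <cstddef>

    /* In-place transpose of an n x n row-major matrix, visiting
       the data in B x B tiles. B is the platform knob; the
       algorithm never changes. */
    template <std::size_t B, typename T>
    void transpose(T* a, std::size_t n)
    {
        for (std::size_t ib = 0; ib < n; ib += B)
            for (std::size_t jb = ib; jb < n; jb += B)
                for (std::size_t i = ib; i < ib + B && i < n; ++i)
                    for (std::size_t j = (jb > i ? jb : i + 1);
                         j < jb + B && j < n; ++j)
                    {
                        T tmp      = a[i*n + j];
                        a[i*n + j] = a[j*n + i];
                        a[j*n + i] = tmp;
                    }
    }

    /* transpose<64>(data, n) on one machine,
       transpose<16>(data, n) on another. */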

>    generic to allow optimizing under the hood
>       and advanced algorithms

Absolutely. Genericity is the key.

A certain group of language gurus (the aspect-oriented
programming community) likes to talk about what they
call "cross-cutting". To the extent that I understand
what this is, it seems to be one of the pervasive problems
in numerical computing. We have problem domains, which give
certain kinds of abstractions, and we have algorithms and
data, each of which provide other kinds of abstractions.
But these abstractions tend to cut across each other,
sometimes interfering destructively. Somehow, we have
to get more generic in order to defeat this problem.

>    templated for data type choices and easier maintenance

Yup. I think I can tell from your examples (emphasis on data flow,
linear operations, FFTs, ...) that your main concern is genericity
of containers. But as mentioned above, in the larger view
we must also understand how this works together with
parametrization of algorithms.
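
A toy illustration of what I mean: the accumulation strategy in a
generic sum is itself a template parameter, so you can swap plain
addition for compensated (Kahan) summation when round-off matters,
without touching the driver. (A sketch, not anybody's actual API.)

    #include <iterator>

    /* Two interchangeable accumulation policies. */
    template <typename T>
    struct PlainSum {
        T s = 0;
        void add(T x) { s += x; }
        T result() const { return s; }
    };

    template <typename T>
    struct KahanSum {
        T s = 0, c = 0;          /* running sum, compensation  */
        void add(T x) {
            T y = x - c;
            T t = s + y;
            c = (t - s) - y;     /* low-order bits lost in t   */
            s = t;
        }
        T result() const { return s; }
    };

    /* The driver is written once; the part that varies is
       factored out into the policy. */
    template <typename Acc, typename Iter>
    typename std::iterator_traits<Iter>::value_type
    sum(Iter first, Iter last)
    {
        Acc acc;
        for (; first != last; ++first)
            acc.add(*first);
        return acc.result();
    }

    /* e.g. sum< KahanSum<double> >(v.begin(), v.end()) */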

>    open source

Absolutely.

>    easily tested for correctness with an automated suite

Ahh, testing. What are we going to do about testing?
I thought it was very interesting when the Software Carpentry
contest had to give up on the testing module the first
time around. That is a very hard problem.

As somebody said in that context (I'm sure it's been said
many times), if your project was not designed from the
start with testing in mind, then you're cooked. The
idea of some kind of magic universal test harness is
pure fantasy.

I don't know what to do about it. I see very little here
or on the horizon which would help. I think we would have
to cobble together bits and pieces of existing tools and
hope for the best.


> I think the VSIPL API has a lot of good ideas for the
> functions that it supports, but the reference implementation
> would be easier to deal with if it were templatized.

I agree. Oh well.


> Maybe it's my personality, but making the same change
> in more than one place just wears me out.

Yup. We only live so long. And besides, when something
wears you out like that, you get angry with it and start
losing faith in the implementation. Just more tedious
foo to keep track of. I don't see people talk about
this much, but I think the project design has to
support positive psychology. This isn't Microsoft;
we should be able to afford doing things the right way.


> The GSL team has implemented a large breadth
> of functions, many of which have no equivalent in the VSIPL
> spec, but which could probably benefit from some of its principles.
> Since both the TASP reference implementation and
> GSL are open source, it might be possible to gradually
> combine them, as Randall Judd seems interested
> in doing.  Many GSL functions not present in VSIPL might
> be wrapped to present an interface similar to the VSIPL API.
> Then gradually the underlying GSL code could be moved into
> the wrapper to increase performance, better conform to
> VSIPL, etc.

I guess I have no real comment on this. There's no reason they
can't get along. But I'm not sure if either one is what I am
really interested in. The level of combining that was discussed
was getting them a little more semantically aligned so that
they could coexist (specifically memory management issues).
I don't think anything else was intended.

I like the opaqueness of the VSIPL data objects, and I don't
much care for the de facto translucency of the GSL data
objects. But in any case, it's hard to see how we can stop
short of a full-blown template-based implementation (something like MTL)
and still get what we want. But maybe this is still the wrong emphasis,
from the standpoint of coherence. I don't know.

Personally, I would rather not have to think about
things like vector/matrix implementations. I am ready, at
least as a first approximation, to declare this problem
as solved (say by MTL, or one of the few other related
projects) and move on. After all, if we have achieved the
correct level of genericity, then it shouldn't matter
which data representation I choose to use today.
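
In other words, once we are at the right level of genericity, code
like the following should not care what sits underneath. The member
names here are just an assumed "matrix concept", not any particular
library's API:

    #include <cstddef>

    /* Written against a minimal assumed interface: rows(),
       cols(), operator()(i,j), and a value_type typedef. An MTL
       matrix, a thin adaptor over a gsl_matrix, or a toy class
       would all do, provided they model the concept. */
    template <typename Matrix>
    typename Matrix::value_type trace(const Matrix& m)
    {
        typename Matrix::value_type t = 0;
        const std::size_t n = m.rows() < m.cols() ? m.rows()
                                                  : m.cols();
        for (std::size_t i = 0; i < n; ++i)
            t += m(i, i);
        return t;
    }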


> It might also be nice to go back to the
> technique in the Hughes code of implementing most of the
> functions in terms of other VSIPL/GSL functions or lower-level
> helper functions whenever possible.

Sure, isolating a kernel of operations is a time-honored approach.
For instance, in GSL we have a BLAS kernel. I made it so that you
could use the native GSL implementation (nothing great), or link
in your own high-performance BLAS kernel.

Now, why the GSL BLAS implementation had to be completely native
is a mystery even to me. That was a dumb idea. I blame myself,
although only in part, since it was based on the general
brain-damaged GSL philosophy. It's still not even finished.
We should have just used the CBLAS model for wrapping an existing
Fortran BLAS and provided a copy of the Fortran reference, with
the same "link your own if you like" methodology.
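
The wrapping itself is nearly trivial. Roughly, in the spirit of
cblas_ddot (the wrapper name here is made up, and the trailing
underscore is the usual, though compiler-dependent, Fortran name
mangling):

    /* Declare the Fortran BLAS symbol and expose a thin
       C-callable wrapper. Whether the symbol resolves to the
       reference BLAS or a tuned vendor BLAS is then purely a
       link-time choice. */
    extern "C" double ddot_(const int* n,
                            const double* x, const int* incx,
                            const double* y, const int* incy);

    extern "C" double my_ddot(int n, const double* x, int incx,
                                     const double* y, int incy)
    {
        /* Fortran passes all arguments by reference. */
        return ddot_(&n, x, &incx, y, &incy);
    }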

BLAS is BLAS. It works. It's the same everywhere. It will
never change. GSL committed a serious sin of wheel-reinventing
there, and it stains the whole project.

I would have also liked to see a LAPACK kernel, with all the
same thoughts. Again, CLAPACK solved all the problems (not
nearly as easy as BLAS), and we should have leveraged that
as far as it could go.

Will these sins be corrected? Before version 1.0?


> But the idea of writing Rob's Gee Whiz Altivec Library
> is not very appealing to me, first because VSIPL and GSL have
> put a lot of thought into how to do it right,

I wouldn't oversell the level of thought in GSL.
I know; I was there.


> I guess in the end I can understand why Tisdale
> is so interested in the vision and goals of GSL, so
> it is easier to decide if it is the same direction
> that he would like to go.  I am in a similar quandary,
> there are several interesting-looking projects with
> various tradeoffs, and looking into the future a little would
> help me bind to one or decide to start my own.

It's a real problem. GUIs were in the same situation
for a decade. Which solution do you use? Motif? (GACKKK!)
Xt? (mega GACKKK!) One of the horde of half-assed libraries?
(GACKKK, GACKKK, GACKKK, ...)

Now we have GNOME and KDE. Both are good. Both are de facto
standards. Everybody benefits.

Standards help everybody. It doesn't
matter so much how they come about.
But if we can all more or less agree that
something is good, then we can feel good
about using it. This has not yet happened
in the world of numerical tools.


Thanks for your thoughts (and another
chance for me to stand on the soapbox).

-- 
G. Jungman
