This is the mail archive of the
gsl-discuss@sources.redhat.com
mailing list for the GSL project.
Re: About coordinated efforts on scientific software.
- From: Christos Siopis <siopis at umich dot edu>
- To: gsl-discuss at sources dot redhat dot com
- Date: Tue, 22 Oct 2002 19:33:33 -0400 (EDT)
- Subject: Re: About coordinated efforts on scientific software.
On Mon, 21 Oct 2002, Manoj Warrier wrote:
> I guess (hope rather) that GSL will eventually cover the numerical library
> part of point (1). For plotting and graphics we again have a similar
> situation as in the "mathematics packages" ... Check out
> http://scilinux.sf.net/graphvis.html for a list of free packages.
I think the problem is not so much with lack of libraries as it is with
lack of an "integrated" environment where one can start with raw data,
pass them through various mathematical transformations, and finally plot
some result, all from inside the same "package"/environment that
encourages trial-and-error, what-if experiments, and rapid prototyping.
The first thought that one might have for achieving this would be to
somehow wrap a number of relevant libraries and use them from inside a
scripting language like Python. I can see at least three kinds of problems
with this:
- First, most of the existing libraries are too low-level for direct use
from an interactive scripting environment. Things like memory allocation
(needed e.g. by GSL) or opening a window for plotting (needed e.g. by
PGPLOT) are *show-stoppers* in an interactive environment. Some heroic
people are going through the pain of actually creating usable interfaces,
such as the PyGSL folks. This is fine, except for two things: how do the
wrapper functions interoperate with functions from other wrapped libraries
(see the next item below), and how do we ensure we do not end up in a
versioning hell, where the wrapper uses some version A of the library but
the library has since moved on to version B? Add some RPM versioning
issues if you use RedHat's packages, multiply all this by the number of
libraries you want to wrap, and enjoy the mess... (The first sketch after
this list shows just how low-level the raw calls are.)
- Second, there is the issue of consistency of the user interface. For
instance, a NumPy (numeric python) user is used to the ufuncs, "universal"
functions whose return type depends on the input type. So if a NumPy user
wanted to compute the mean of an array, he or she would expect a function
call like mean(arrayx) to return a long int or a float, depending on
whether arrayx is an array of longs or of floats. But doing this through
GSL/PyGSL, the user would have to use pygsl.statistics.mean for a float or
pygsl.statistics.long.mean for a long int, i.e. the user is asked to think
in terms of C, a strongly typed language. This is both annoying and prone
to hard-to-find errors (the second sketch after this list shows the kind
of dispatch layer needed to hide it). A related issue is the overlap
between wrappers of different libraries (e.g., NumPy already has a couple
of mean/average functions from other libraries!). And there is also the
issue of performance, as NumPy objects are converted back and forth to
different formats (some wrappers do a better job at this than others).
- Third, there is the question of "putting this all together". Wrappers
are good for wrapping a small number of small libraries. As you add more
and more, all sorts of issues arise related to the distribution of the
"final" package, the quality and homogeneity of the documentation, and so
on. If there were no other solution at hand, maybe this would all be
acceptable. But with commercial packages offering a "one-stop" solution
(despite a number of other disadvantages), I think the open-source science
community has to do better than that.
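
To make the first point concrete, here is a minimal sketch of what calling
a GSL routine "directly" from Python looks like when nothing hides the
C-level details. It uses Python's ctypes module and assumes the GSL shared
library is visible as "libgsl.so" and exposes the documented
gsl_stats_mean(const double*, size_t, size_t) routine; the library
name/path is an assumption and may differ on your system.

    import ctypes

    # Assumption: the GSL shared library can be loaded under this name.
    gsl = ctypes.CDLL("libgsl.so")
    gsl.gsl_stats_mean.restype = ctypes.c_double
    gsl.gsl_stats_mean.argtypes = [ctypes.POINTER(ctypes.c_double),
                                   ctypes.c_size_t, ctypes.c_size_t]

    data = [1.0, 2.0, 3.0, 4.0]
    # The caller has to marshal the Python list into a C array by hand --
    # exactly the kind of bookkeeping that kills interactive use.
    c_data = (ctypes.c_double * len(data))(*data)
    print(gsl.gsl_stats_mean(c_data, 1, len(data)))

Compare this with the one-liner mean(data) that an interactive user would
expect to type.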
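
For the second point, here is a rough sketch of the dispatch layer a
unified interface would have to provide on top of PyGSL so that a single
mean() behaves like a NumPy ufunc. The pygsl.statistics and
pygsl.statistics.long names are the ones mentioned above; that they import
and behave exactly like this, and the Numeric typecode() letters used
below, are assumptions.

    import pygsl.statistics        # float routines, as mentioned above
    import pygsl.statistics.long   # long-int routines, as mentioned above

    def mean(arrayx):
        # Dispatch on the Numeric array's element type, so the caller sees
        # one "universal" mean() instead of one function per C type.
        if arrayx.typecode() in ('l', 'i'):       # arrays of (long) ints
            return pygsl.statistics.long.mean(arrayx)
        return pygsl.statistics.mean(arrayx)      # arrays of floats/doubles

Multiply this little shim by every routine in every wrapped library and the
maintenance problem of the previous item comes right back.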
SciPy ( www.scipy.org ) is a package that tries to solve some of these
problems, but I think it is a little too early to tell how good the
outcome will be, and I cannot help wondering how many more times the
open-source numerical community will have to code and debug, e.g., an FFT
routine or a statistical package, and whether this is the best use of our
resources...
Christos