This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.



BLAS rationale


Please find attached Section 8, Rationale,
from an ancient paper,

    A Set of Level 3 Basic Linear Algebra Subprograms

by Jack Dongarra, Jeremy Du Croz, Iain Duff and Sven Hammarling
available in PostScript from

    http://www.netlib.org/blas/blas3-paper.ps

The GSL documentation would also benefit from a section
which details the rationale behind the library design decisions.

The BLAS library design is a collection of compromises.
The authors maintain that it was necessary
to reduce the number of library subroutines
by adding options to subroutine argument lists
in order to maintain sufficient functionality.
Can anybody tell me why they thought this was necessary?
Was their reasoning sound?
And, if so, is it still valid today?
Library subroutines must include extra code
to process such options at run time,
which adds to both code size and execution time.
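To make the concern concrete, the run-time option processing in question typically amounts to a branch on the option argument before the main loops. The following is a minimal illustrative sketch of how a TRANS-style option might be dispatched; it is hypothetical code, not taken from any actual BLAS implementation:

```c
/* Hypothetical sketch of run-time option dispatch in a BLAS-style
 * routine: compute y = A*x or y = A^T*x depending on a flag.
 * Illustrative only; not code from any BLAS implementation. */
static void matvec(char trans, int m, int n,
                   const double *A, int lda,
                   const double *x, double *y)
{
    if (trans == 'N') {                 /* y = A * x   (A is m x n, row-major) */
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += A[i * lda + j] * x[j];
            y[i] = s;
        }
    } else {                            /* y = A^T * x (result has length n) */
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int i = 0; i < m; i++)
                s += A[i * lda + j] * x[i];
            y[j] = s;
        }
    }
}
```

The branch is taken once per call, outside the loops, so the execution-time cost is small; the code-size cost of carrying both loop nests in one routine is the more substantive objection.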

8. Rationale

In the design of all levels of BLAS, one of the main concerns is
to keep both the calling sequences simple and the range of options limited,
while at the same time maintaining sufficient functionality.
This clearly implies a compromise, and a good decision is vital
if the BLAS are to be accepted as a useful standard.  In this section
we discuss the reasoning behind some of the decisions we have made.

The three basic matrix-matrix operations were chosen
because they occur in a wide range of linear algebra applications:
this is consistent with the criteria used for the Level 1 and Level 2 BLAS.
We have again aimed at a reasonable compromise
between a much larger number of routines
each performing only one type of operation (e.g., B <-- L^TB),
and a smaller number of routines with a more complicated set of options.
There are, in fact, 6 real routines in each precision,
performing altogether 48 different combinations of options,
and 9 complex routines
performing altogether 81 different combinations of options.
The number of routines is much smaller than in the Level 2 BLAS.

The routines that we have specified are not intended
as high-level matrix algebra routines,
but rather as building blocks for the construction of such routines.

In almost every case, where appropriate,
we include operations involving a matrix and its transpose
(the only exceptions are the _SYMM and _HEMM routines).
We could ask the user to transpose the input matrix
but feel that this would be an imposition,
particularly if the BLAS routine is being called
from deep within the user's code.
It would also increase the amount of data movement,
whereas one of the aims of our proposal is
to assist the development of software that minimizes data movement.
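The data-movement point can be seen in a small sketch: C = A^T * B can be computed by absorbing the transpose into the indexing, with no transposed copy of A ever formed. This is illustrative code under assumed row-major storage, not BLAS source:

```c
/* Illustrative: C = A^T * B without explicitly forming A^T.
 * A is k x m, B is k x n, C is m x n, all row-major.
 * The transpose is absorbed into the indexing of A, so no
 * extra copy (and no extra data movement) is needed. */
static void atb(int m, int n, int k,
                const double *A, const double *B, double *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int p = 0; p < k; p++)
                s += A[p * m + i] * B[p * n + j];  /* A^T(i,p) = A(p,i) */
            C[i * n + j] = s;
        }
}
```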

It could also be argued that algorithms can be rewritten
to require only one of the patterns of access
for symmetric, Hermitian or triangular matrices (i.e., upper or lower triangle),
but we do not feel that the BLAS should be dictating this to the user.

We do not provide routines for operations involving trapezoidal matrices;
all our triangular matrices are square.
This is consistent with the Level 2 BLAS.
It would be possible to extend the routines for triangular matrices,
so that they could handle trapezoidal matrices,
at the cost of introducing extra arguments.
On the other hand, a trapezoidal matrix can always be partitioned
into a triangular matrix and a rectangular matrix.

We have not included specialized routines to take advantage
of packed storage schemes for symmetric, Hermitian, or triangular matrices,
nor of compact storage schemes for banded matrices,
because such storage schemes do not seem to lend themselves
to partitioning into blocks and hence are not likely to be so useful
in the type of application we are aiming at.
Also, packed storage is needed much less
on the large-memory machines available today,
and we wish to keep the set of routines as small as possible.

We also have not specified a set of extended-precision routines
analogous to the ES and EC routines in the Level 2 BLAS,
since this would require a two-dimensional array in extended precision.

As with the Level 2 BLAS, no check has been included for singularity,
or near singularity, in the routines for solving triangular equations.
The requirements for such a test depend on the application
and so we felt that this should not be included,
but should instead be performed outside the triangular solver.
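Performing such a test outside the solver is straightforward: scan the diagonal of the triangular matrix before calling the solve. A minimal sketch, with an application-chosen tolerance (the threshold here is a hypothetical choice, not anything the BLAS specify):

```c
#include <math.h>

/* Illustrative: check a triangular matrix for (near-)singularity
 * before calling a triangular solver.  T is n x n, row-major;
 * tol is an application-chosen threshold. */
static int diag_ok(const double *T, int n, double tol)
{
    for (int i = 0; i < n; i++)
        if (fabs(T[i * n + i]) <= tol)
            return 0;   /* (near-)singular: caller should not solve */
    return 1;
}
```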

We have tried to adhere to the conventions of, and maintain consistency with,
the Level 2 BLAS;
however, we have deliberately departed from this approach in a few cases.
The input-output matrix C in the matrix-multiply routines
is the analogue of the vector y in the matrix-vector product routines.
But here C always has the same dimensions,
whereas y was either of length m or n depending on context.
In the rank-k update routines we have included a parameter beta
which was not present in the Level 2 rank update routines.
Here we felt that the parameter beta is useful in applications,
and since the matrix multiply routines
can also be viewed as rank-k update routines,
we have consistency between the MM, RK and R2K routines.
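The point can be made concrete: with the beta parameter, the matrix-multiply routines compute C <- alpha*A*B + beta*C, accumulating into C in a single call, and a rank-k update is the same operation with B taken as A^T. A toy sketch of the accumulation pattern (illustrative only, not BLAS source):

```c
/* Illustrative: C <- alpha*A*B + beta*C, row-major,
 * A m x k, B k x n, C m x n -- the Level 3 accumulation
 * pattern that the beta parameter enables in one call. */
static void gemm_toy(int m, int n, int k, double alpha,
                     const double *A, const double *B,
                     double beta, double *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int p = 0; p < k; p++)
                s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * s + beta * C[i * n + j];
        }
}
```

Without beta, an update such as C <- C + A*B would need a temporary for A*B plus a separate matrix addition, i.e., extra storage and an extra pass over C.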

We have also added a parameter alpha
to the routines involving triangular matrices.
This was not felt to be needed in the corresponding Level 2 BLAS,
because there would be little additional cost in a separate operation
to scale the result vector by alpha.
However in the Level 3 BLAS where there is a whole matrix to be scaled,
it is advantageous to incorporate the scaling within _TRMM or _TRSM.
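Folding the scaling into the solve means one pass over B instead of two. A sketch of B <- alpha * inv(L) * B for a lower triangular L by forward substitution, with alpha absorbed into the first touch of each entry (illustrative only, not BLAS source):

```c
/* Illustrative: B <- alpha * inv(L) * B, where L is n x n lower
 * triangular and B is n x nrhs, both row-major.  The scaling by
 * alpha is folded into the forward substitution, so B is
 * traversed only once instead of being scaled in a second pass. */
static void trsm_toy(int n, int nrhs, double alpha,
                     const double *L, double *B)
{
    for (int j = 0; j < nrhs; j++) {
        for (int i = 0; i < n; i++) {
            double s = alpha * B[i * nrhs + j];
            for (int p = 0; p < i; p++)
                s -= L[i * n + p] * B[p * nrhs + j];
            B[i * nrhs + j] = s / L[i * n + i];
        }
    }
}
```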

Additionally, we have provided for complex symmetric,
as well as complex Hermitian, matrices
since they occur sufficiently often in applications.

In our proposed naming scheme,
the first character (S, D, C or Z) indicates the relevant Fortran data type.
This conforms to the conventions already established
for the Level 1 and Level 2 BLAS, and also other software such as LINPACK.
However the fact that single and double precision versions
of a BLAS routine have different names can be an obstacle to portability
because the actual precision of an S routine, say,
may differ considerably between machines.
For example, SGEMM on a Cray 2 will use arithmetic
with similar precision to DGEMM on an IBM 3090.
The ideal solution would be to use generic names,
not only for single and double precision versions,
but also for real and complex versions.
This option is not available in a standard Fortran 77 environment.
However
for implementations in other environments or programming languages,
which do permit generic names,
we propose that the first character of the Fortran 77 names
should simply be omitted, giving the generic names:
GEMM, SYMM, SYRK, SYR2K, HEMM, HERK, HER2K, TRMM, TRSM.


