This is the mail archive of the
mailing list for the GSL project.
Re: Sparse matrix extension
- From: Patrick Alken <alken at colorado dot edu>
- To: Alexis Tantet <alexis dot tantet at gmail dot com>
- Cc: "gsl-discuss at sourceware dot org" <gsl-discuss at sourceware dot org>
- Date: Wed, 10 Feb 2016 08:55:48 -0700
- Subject: Re: Sparse matrix extension
- Authentication-results: sourceware.org; auth=none
- Authentication-results: sourceware.org; dkim=none (message not signed) header.d=none;sourceware.org; dmarc=none action=none header.from=colorado.edu;
- References: <CAMWWPT3uJj4Vrn7ut6+F18gY===zd6+1r1UJhz0hcCj--zwtdg at mail dot gmail dot com> <CAMWWPT3Y=x-vYphaV+2gHPb9WZqEfYdDxs4T0KqBq987VjvDQA at mail dot gmail dot com> <569E6C33 dot 1090505 at colorado dot edu> <CAMWWPT2kMe=R0qUz4P6n_xzG+oW-8pOFwCtBuaritZ5MF6jzAg at mail dot gmail dot com> <569EA1A9 dot 2080101 at colorado dot edu> <CAMWWPT2d=PnK1cexcZ+OLQAYJudRk9QLGac-Wcn33OaES5UjJA at mail dot gmail dot com> <56B689B1 dot 5090005 at colorado dot edu> <CAMWWPT1mog0HrviL1tMo=f-rrSc2PhamWDGg7ZYLrVnnqkY3ng at mail dot gmail dot com> <56B77E13 dot 1000306 at colorado dot edu> <CAMWWPT2KTLjufNdwW=y-xaJ1dKkxDNQwgat4DB4ZwKDrK_fZSA at mail dot gmail dot com> <56B7A59D dot 5040707 at colorado dot edu> <CAMWWPT0CX+ti2j6QM8YmHqv+n8yg7sK69dtSdzKOQMFx3pXeBA at mail dot gmail dot com> <56B7B85C dot 10508 at colorado dot edu> <CAMWWPT1ELN9_BFLG=KUQ=Z6rC6ytYGTm0_K-=4-urJ6HGxfFJQ at mail dot gmail dot com> <CAMWWPT0qUccM6cNkX6fWezmj09SdcgYqYzGE3eoqetpuyZW6Vg at mail dot gmail dot com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:23
I'm in favor of simplicity and easy-parsing, so matrix market sounds
good to me. I'll take a look at your latest code in the next few days.
On 02/10/2016 06:16 AM, Alexis Tantet wrote:
Regarding the file format for sparse matrices, the one I have coded
actually happen to be the coordinate format implemented by Matrix
Market (the platform to share test data such as sparse matrices), with
the addition of a matrix type header:
It is also written that "Harwell-Boeing" is the most commonly used
sparse matrix format, but that:
"Unfortunately the HB specification is somewhat complex difficult to
parse from languages other than Fortran, biased in favor of compressed
column representation and not easily extensible. Some of these factors
may have impeded the more widespread use of the HB format"
It seems to me that complying to the Matrix Market coordinate format
would be the right choice, in terms of ease of implementation,
compliance to other packages and user-friendliness. I could update the
print/scan functions accordingly (mostly handling the header). What do
On Mon, Feb 8, 2016 at 1:59 AM, Alexis Tantet <firstname.lastname@example.org> wrote:
Ok, my mistake, now I see where I got confused.
I had in mind to add all the elements first to the triplets and only
while converting to compressed sum up the duplicates.
While, indeed, if there's a way you can sum up the duplicates directly
while adding them to the triplet matrix (thanks to _ptr), this is more
handy and efficient.
Thanks for the clarification,
On Sun, Feb 7, 2016 at 10:34 PM, Patrick Alken <email@example.com> wrote:
By design, gsl_spmatrix_set won't allow you to do this.
If you add element (i, j, x) and then later to try add element (i, j,
y), gsl_spmatrix_set will detect that there exists an element in the (i,
j) spot and it will simply change x to y - the value of x will be
overwritten by y. This is the same behavior as gsl_matrix_set.
So no duplicates are allowed by design. If you have such an application
where you want to keep track of duplicates, you could do the following:
double *ptr = gsl_spmatrix_ptr(m, i, j);
*ptr += x; /* sum duplicate values */
gsl_spmatrix_set(m, i, j, x); /* initalize to x */
On 02/07/2016 01:31 PM, Alexis Tantet wrote:
I'm not sure I got your last point. I have the following situation in mind:
Start to construct a transition matrix in triplet format, adding one
element after another.
In this particular example, each element is one count of a transition
from (state, box, etc.) i to j,
so I add elements (i, j, 1) to the triplet object, with possibly duplicates.
What happen to these duplicates in the binary tree?
Eventually, when I compress to CRS or CCS, I would like the duplicates
to be summed up, so that element (i, j) counts transitions from i to j
(and no duplicates exist after compression).
Is this more clear?
On Sun, Feb 7, 2016 at 9:14 PM, Patrick Alken <firstname.lastname@example.org> wrote:
I'm not sure what you mean. I've added a new function gsl_spmatrix_ptr
to the git, which as far as I can tell does exactly what your
sum_duplicate flag does. It searches the matrix for an (i,j) element,
and if found returns a pointer. If not found a null pointer is returned.
This makes it easy for the user to modify A(i,j) after it has been added
to the matrix. Are you thinking of something else? Can you point me to
the Eigen routine?
What I meant is to have the equivalent of gsl_spmatrix_compress,
with the difference that gsl_spmatrix_ptr is used instead of gsl_spmatrix_set,
so has to build the compressed matrix from triplets, summing the
duplicates, instead of replacing them.
This is what is done here :
I'm not sure why a user would ever need to do this. The whole point of
the binary tree structure in the triplet storage is to efficiently find
duplicate entries, so that if a user tries to call gsl_spmatrix_set on
an element which is already been previously set, it can find that
element with a binary search (rather than linearly searching the arrays)
and change the value of that element.
Therefore, the way the triplet storage is designed, there is will never
be a duplicate element in the triplet arrays. All of the (i[n],j[n])
will be unique for each n <= nz.
Am I missing something?