This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Sparse matrix extension

From: Patrick Alken <alken at colorado dot edu>
To: Alexis Tantet <alexis dot tantet at gmail dot com>
Cc: "gsl-discuss at sourceware dot org" <gsl-discuss at sourceware dot org>
Date: Sun, 7 Feb 2016 10:25:39 -0700
Subject: Re: Sparse matrix extension
Authentication-results: sourceware.org; auth=none
Authentication-results: sourceware.org; dkim=none (message not signed) header.d=none;sourceware.org; dmarc=none action=none header.from=colorado.edu;
References: <CAMWWPT3uJj4Vrn7ut6+F18gY===zd6+1r1UJhz0hcCj--zwtdg at mail dot gmail dot com> <CAMWWPT3Y=x-vYphaV+2gHPb9WZqEfYdDxs4T0KqBq987VjvDQA at mail dot gmail dot com> <569E6C33 dot 1090505 at colorado dot edu> <CAMWWPT2kMe=R0qUz4P6n_xzG+oW-8pOFwCtBuaritZ5MF6jzAg at mail dot gmail dot com> <569EA1A9 dot 2080101 at colorado dot edu> <CAMWWPT2d=PnK1cexcZ+OLQAYJudRk9QLGac-Wcn33OaES5UjJA at mail dot gmail dot com> <56B689B1 dot 5090005 at colorado dot edu> <CAMWWPT1mog0HrviL1tMo=f-rrSc2PhamWDGg7ZYLrVnnqkY3ng at mail dot gmail dot com>
Spamdiagnosticmetadata: NSPM
Spamdiagnosticoutput: 1:23

On 02/06/2016 06:03 PM, Alexis Tantet wrote:
> Hi Patrick,
>
> Thanks for working on it. Following, some comments:
>
> On Sun, Feb 7, 2016 at 1:02 AM, Patrick Alken <alken@colorado.edu> wrote:
>> Hello,
>>
>>   I've been working my way through your patch. There is a new branch on
>> the git called 'sparse' with all the current changes. You've done a lot
>> of work on your patch so its taking me a while to get through all of it.
>>
>>   The main functionality of CRS is in there, along with basic stuff like
>> memcpy, transpose_memcpy, transpose. Also dgemv works (and hence also
>> the GMRES linear solver).
> Since the GMRES is already implemented, do you think it would be worth
> using the Arnoldi method to find eigenvalues and eigenvectors of
> sparse non-hermitian matrices?

If you think you can make a nice implementation, go for it. We could add
it to the gsl_eigen interface.

>
>> I did some initial work on the fprintf/fscanf
>> for the triplet case - for the compressed matrices, I am wondering if we
>> should stick to a standard format such as Harwell-Boeing, I'm not sure
>> at the moment. We might be able to avoid this issue by making a binary
>> fwrite/fread interface which just directly writes the 3 arrays to disk.
> I agree that a standard format would be best but I am not aware of any
> dominant format.
> Is Harwell-Boeing one of them?
> It seems important to me to have both binary and formatted input/output
> (for coherence with gsl_matrix and because formatted files are more
> handy to share).
> Also, it is important to have some header indicating how many elements
> should be read
> and what is the dimension of the matrix.

I'm not aware of a standard format either, I'll try and look into this
further to come up with a good solution.

>
>> Things I still need to add for the CRS storage:
>>
>> 1. spmatrix_add
>> 2. spblas_dgemm
>> 3. fprintf/fscanf
>> 4. transpose - for now I'll keep it the way you coded it, but I think it
>> would be better to keep the matrix in the same storage format during a
>> transpose, ie: the user might want to pass A^T to an external solver
>> library which requires CCS and they might not expect the transpose
>> routine to change it to CRS. I did a quick search for in-place transpose
>> algorithms in CCS/CRS and they do appear to exist though I haven't had
>> time to implement it yet.
> Fine, but then I think it is important to add a separate function, say
> "transpose_switch_major",
> for which the transposition is performed by just changing the major
> (i.e. row->col or col->row)
> since this is the cheapest way to transpose while some libraries may
> work equally well with CRS or CCS.

Ok I like this idea too since it is very efficient.

>
>> I removed the inner/outer idx stuff from gsl_spmatrix - its not
>> necessary and also I'd like to avoid modifying the gsl_spmatrix struct
>> since any changes would break binary compatibility for the next release.
> That makes perfect sense, but then you need to test each time for the
> major to know what is the size of the pointer?

I am just making separate routines for CCS and CRS (ie: gsl_spmatrix_ccs
and gsl_spmatrix_crs to do triplet -> compressed). I think its simpler
this way.

>
>> Also your modification to gsl_spmatrix_set - I see what you're trying to
>> do but I think its better to add a new function:
>>
>> double *gsl_spmatrix_ptr(gsl_spmatrix *m, size_t i, size_t j)
>>
>> which returns a pointer to the data array for element (i,j), and the
>> user can then add whatever value they want (this is how its done with
>> gsl_matrix and gsl_vector). I think it should be straightforward to add
>> this function, possibly even for the compressed storage too.
> Then I would also add a compressed matrix constructor (or an option) in which
> gsl_spmatrix_ptr is used to sum duplicate entries instead of gsl_spmatrix_set.
> This possibility is offered by some other libraries such as Eigen
> and is for example useful to build matrices counting (e.g. transitions).

I'm not sure what you mean. I've added a new function gsl_spmatrix_ptr
to the git, which as far as I can tell does exactly what your
sum_duplicate flag does. It searches the matrix for an (i,j) element,
and if found returns a pointer. If not found a null pointer is returned.
This makes it easy for the user to modify A(i,j) after it has been added
to the matrix. Are you thinking of something else? Can you point me to
the Eigen routine?

>
>> Anyway, just wanted to let you know things are progressing (though a
>> little slowly).
> Thanks a lot! Would you like me to work on something in particular?
>
> Best,
> Alexis
>
>> Patrick
>>
>> On 01/20/2016 11:31 AM, Alexis Tantet wrote:
>>> Hi Patrick,
>>>
>>> I've cloned gsl.git, made my modifications to spmatrix/ and spblas/
>>> and pushed them to master at:
>>> https://github.com/atantet/gsl
>>>
>>> This time, I've tried to modify the code as little as possible. I've
>>> usually added comments labelled "AT" with each modification.
>>> I also have a few questions about the code which I've labelled "?AT".
>>> Finally, I've added the corresponding tests to both test.c (successful
>>> on my machine).
>>>
>>> I see that duplicates are handled for the triplet format by replacing.
>>> Thus, I removed the function taking care of sorting and summing
>>> duplicates that I had added to _compress (the trick was to transpose
>>> with copy to reorder first, transpose back and remove the duplicates
>>> then, taken from Eigen C++). Instead, I have added an option to sum
>>> the duplicates (useful e.g. when counting links of a graph or
>>> transitions of a Markov chain). One may need the third option of
>>> repeating the triplets, but then avl_insert should also be modified.
>>>
>>> However, if I've understood well, no sorting of the inner indices is
>>> performed right now. This could be problematic in the future (e.g. in
>>> functions such as _equal).
>>>
>>> For _spdgemm for CRS matrices, I've used the trick (A B)^t = B^t A^t
>>> to convert the problem to CSC. I've tested it successfully but you may
>>> not like the coding style.
>>>
>>> I hope that helps,
>>> Best,
>>> Alexis
>>>
>>> On Tue, Jan 19, 2016 at 9:50 PM, Patrick Alken <alken@colorado.edu> wrote:
>>>> On 01/19/2016 12:55 PM, Alexis Tantet wrote:
>>>>> Hi Patrick,
>>>>>
>>>>> Thanks for the quick reply! I wanted to be sure that this contribution
>>>>> is useful before to spend time on the merging with the latest version.
>>>>> I will create the gsl.git repository and work on it during the week.
>>>>>
>>>>> I had already had a look at the documentation but did not know about
>>>>> the iterative solvers (a link between each modules would be useful).
>>>>> My contribution indeed fits in the sparse matrix module + the update
>>>>> of the dgemm and dgemv functions to support CRS (an update may also be
>>>>> needed for the solvers).
>>>> The solvers only call dgemv (as far as I remember) so they shouldn't need an
>>>> update once dgemv is updated.
>>>>
>>>>> I have also developed a simple C++ object allowing to use gsl_spmatrix
>>>>> as a user-defined matrix in ARPACK++ (a maintained fork of ARPACK++
>>>>> can be found at https://github.com/m-reuter/arpackpp), allowing to
>>>>> avoid having to use other libraries such as superLU. It could be
>>>>> useful to others, maybe as an extension. Now that I think about it,
>>>>> the iterative solvers could also be used to support the shift and
>>>>> invert modes (see ARPACK++ documentation). What do you think (I could
>>>>> work on it)?
>>>> I've never used ARPACK, but if you want to make an extension to interface
>>>> GSL/ARPACK its fine with me - such an extension would never be added to the
>>>> main GSL code since GSL tries to be as standalone as possible.
>>>>
>>>>
>>>>> If you have major comments, the sooner the better, so that I can work
>>>>> on them while merging.
>>>>>
>>>>> Thank you for your interest,
>>>>> Alexis
>>>>>
>>>>> On Tue, Jan 19, 2016 at 6:02 PM, Patrick Alken <alken@colorado.edu> wrote:
>>>>>> Hi Alexis,
>>>>>>
>>>>>>    This looks like very good work! Adding compressed row storage has been
>>>>>> on
>>>>>> my todo list for a while. The 'gslsp' extension is unfortunately very out
>>>>>> of
>>>>>> date, and the current git contains newer code (including a GMRES
>>>>>> iterative
>>>>>> linear solver). I removed the gslsp extension from the web page a while
>>>>>> back
>>>>>> to reflect this. You can browse the latest manual to see the current
>>>>>> sparse
>>>>>> matrix capabilities (http://www.gnu.org/software/gsl/manual/gsl-ref.pdf)
>>>>>> -
>>>>>> there are 3 chapters: sparse matrices, sparse blas and sparse linear
>>>>>> algebra
>>>>>> - it looks like your contributions will fit into the sparse matrices
>>>>>> chapter.
>>>>>>
>>>>>>    Would you be able to verify that your changes are compatible with the
>>>>>> current gsl.git repository? This will make it much easier for me to merge
>>>>>> everything into the git when ready. It would be best if you made a new
>>>>>> branch of gsl.git, and add your changes so I can then pull them from
>>>>>> github
>>>>>> or somewhere. I will try to find some time in the next few days to look
>>>>>> over
>>>>>> your code.
>>>>>>
>>>>>> Thanks again,
>>>>>> Patrick
>>>>>>
>>>>>>
>>>>>> On 01/19/2016 09:43 AM, Alexis Tantet wrote:
>>>>>>> Dear GSLers,
>>>>>>>
>>>>>>> As a scientist rather than a developer, I have developed an extension
>>>>>>> of the sparse matrix module (CRS, I/O, manipulation, see below), which
>>>>>>> I have tested. These modifications conserve the structure of the
>>>>>>> original module and be useful for a large number of sparse matrices
>>>>>>> users.
>>>>>>>
>>>>>>> I'm not familiar with the contributing process here. My repository can
>>>>>>> be found there:
>>>>>>> https://github.com/atantet/gslsp
>>>>>>> Unfortunately, I did not know of the gsl.git repository and I forked it
>>>>>>> froml:
>>>>>>> https://github.com/drjerry/gslsp ,
>>>>>>> which seems to be a bit older than gsl.git.
>>>>>>>
>>>>>>> How can I push/merge to gsl.git ? Should it be as an update or another
>>>>>>> extension? Is it necessary to adapt to the newest version of the code
>>>>>>> ?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Alexis Tantet
>>>>>>>
>>>>>>> CHANGES.md:
>>>>>>>
>>>>>>> Extension of the sparse matrix module of GSL
>>>>>>>
>>>>>>> ===================================
>>>>>>>
>>>>>>> Introduction
>>>>>>> ------------
>>>>>>>
>>>>>>> Usages of sparse matrices are numerous in scientific computing.
>>>>>>> When numerical linear algebra problems become large, sparse
>>>>>>> matrices become necessary to avoid memory overload and unnecessary
>>>>>>> computations, at the cost of element access and matrix construction.
>>>>>>> As a result, most large scale linear solvers or eigen solvers perform
>>>>>>> on sparse matrices.
>>>>>>>
>>>>>>> Fortunately, a very useful sparse matrix module has recently been
>>>>>>> introduced to GSL.
>>>>>>> However, important features are still lacking, such has
>>>>>>> Compressed Row Storage (CRS) matrices, input/output functions and
>>>>>>> other matrix properties and manipulation functions.
>>>>>>> This new version attempts to address this, conserving the original
>>>>>>> structure of the module and conventions.
>>>>>>>
>>>>>>> Major changes
>>>>>>> -------------
>>>>>>>
>>>>>>> * Add CRS format and update functions manipulating compressed matrices :
>>>>>>>        - additional flag GSL_SPMATRIX_CRS and macro GSLSP_ISMATRIX (
>>>>>>> gsl_spmatrix.h )
>>>>>>>        - additional members innerSize and outerSize used to iterate
>>>>>>> matrix elements ( gsl_spmatrix.h )
>>>>>>>        - rename some variables for coherence ( gsl_spmatrix.h , *.c )
>>>>>>>        - update all functions on compressed matrices ( *.c )
>>>>>>> * Allow to sum duplicate elements when compressing ( spcompress.c ) :
>>>>>>>        - modify gsl_spmatrix_compress
>>>>>>>        - add    gsl_spmatrix_sum_duplicate
>>>>>>> * CCS <-> CRS and fast transpose inplace in spswap.c :
>>>>>>>        - add gsl_spmatrix_switch_major
>>>>>>>        - add gsl_spmatrix_transpose
>>>>>>> * Add printing and scanning functions in spio.c :
>>>>>>>        - add gsl_spmatrix_fprintf
>>>>>>>        - add gsl_spmatrix_fscanf
>>>>>>> * Add manipulation functions in spmanip.c (particularly useful for
>>>>>>> Markov chain transition matrices) :
>>>>>>>        - add gsl_spmatrix_get_rowsum : get vector of sum over row
>>>>>>> elements
>>>>>>>        - add gsl_spmatrix_get_colsum : get vector of sum over column
>>>>>>> elements
>>>>>>>        - add gsl_spmatrix_div_rows   : divide all elements of each row
>>>>>>> by a vector element
>>>>>>>        - add gsl_spmatrix_div_cols   : divide all elements of each
>>>>>>> column by a vector element
>>>>>>> * Add test functions in atprop.c :
>>>>>>>        - add gsl_spmatrix_gt_elements : greater than test for each
>>>>>>> matrix
>>>>>>> element
>>>>>>>        - add gsl_spmatrix_ge_elements : greater or equal than test for
>>>>>>> each matrix element
>>>>>>>        - add gsl_spmatrix_lt_elements : lower than test for each matrix
>>>>>>> element
>>>>>>>        - add gsl_spmatrix_le_elements : lower or equal than test for
>>>>>>> each matrix element
>>>>>>>        - add gsl_spmatrix_any         : test if any non-zero element in
>>>>>>> matrix
>>>>>>>
>>>>>>> Other minor changes have been made, such as error tests.
>>>>>>> test.c has also been updated to test new features.
>>>
>
>

Follow-Ups:
- Re: Sparse matrix extension
  - From: Alexis Tantet

References:
- Re: Sparse matrix extension
  - From: Alexis Tantet
- Re: Sparse matrix extension
  - From: Patrick Alken
- Re: Sparse matrix extension
  - From: Alexis Tantet
- Re: Sparse matrix extension
  - From: Patrick Alken
- Re: Sparse matrix extension
  - From: Alexis Tantet
- Re: Sparse matrix extension
  - From: Patrick Alken
- Re: Sparse matrix extension
  - From: Alexis Tantet

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]