This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: switched sparse format to binary trees

From: Patrick Alken <patrick dot alken at Colorado dot EDU>
To: gsl-discuss at sourceware dot org
Date: Tue, 29 Apr 2014 13:52:49 -0600
Subject: Re: switched sparse format to binary trees
Authentication-results: sourceware.org; auth=none
References: <535DAFC0 dot 8090402 at colorado dot edu> <535ED7A5 dot 7020309 at lanl dot gov> <535EFCA5 dot 8050807 at colorado dot edu> <CAFoZWWg9ENCX0cxNBMGJLc9AxT0Tt2a9yaBq7NctcLLK4jLeVA at mail dot gmail dot com> <535FFA33 dot 90604 at lanl dot gov>

Thanks Gerard and Frank,

I've implemented the block allocation strategy so there are no moremalloc/free calls during the matrix assembly process. This was made mucheasier by not deleting individual nodes. In principle, individual nodescould be deleted if the user sets that matrix element to 0, since thepoint of sparse matrix data structures is to keep track of only non-zeroelements.

However, I think this is not common, and the simplicity of the memorymanagement scheme outweighs the benefit of handling this rare case I think.

For now, the spmatrix routines internally handle all of the blockallocation, node array pointers, etc. I don't think its really necessaryto expose this to the general user, since its already about as efficientas it will probably get, and most users (like myself) simply want a niceeasy intuitive interface to build a matrix.

It's a shame they don't explain why this is happening.
Those large factors are kind of nuts.

Yes those times for the hash method do seem unreasonably large. Ihaven't been able to find many other papers on this subject for comparison.


On 04/29/2014 01:14 PM, Gerard Jungman wrote:

On 04/29/2014 02:39 AM, Frank Reininghaus wrote:

  > you can create an array of many tree nodes in a single allocation
  > though. The allocator struct could then store a pointer to this array,
  > the total size of the array, and the index of the next node that is
  > not in use yet.

Right. This is what I meant by a pool allocator. Just to clarify, if
GSL provides an allocator mechanism, it should be more general than
this, but it is a good idea to provide an implementation of something
like this that a client can just invoke if they want. It also gives
clients an example implementation that they can use to create their
own variations when necessary.

This is why the avl library provides such a mechanism themselves.
They know that some clients really need it.

  > And saving memory for each individual node also means
  > that the entire tree will become more cache-friendly
  > and thus possibly more performant.

Even more important than the memory savings is the control that the
client gains over cache coherency and other memory layout issues.
The main lossage with malloc is that there is only one. So allocations
of different kinds of memory can get interleaved, and it may become
impossible to control your layout and caching. If the allocations
are not exposed, then the client is wedged.

  > If individual nodes could be deleted at some point, then the
  > allocator needed some bookkeeping of reusable nodes of its own,
  > but the only node deletion that can take place is that all nodes
  > are deleted together, then this would probably be quite
  > straightforward to implement.

In practice, you are probably right, the nodes would all
be deleted together. So we do not have to worry about it
too much.

For the sake of completeness though, it is worth pointing
out at least one simple way to handle the general case.
An allocator can implement a lazy-deletion strategy,
where the pool is allowed to fragment, until the
client requests a repacking. You can also trigger
a repacking when a certain fragmentation level is
reached, although this kind of approach (garbage collection)
is somewhat dangerous if the client loses control of when
it happens. In any case, it is not hard to implement this
sort of thing.

Anyway, the main point is that once a general mechanism
exists, the client can create any arbitrarily complicated
scheme for themselves. They get to take control of
allocations, if they need to.

On 04/28/2014 07:13 PM, Patrick Alken wrote:

  > Anyway the study found that AVL trees were much superior to hash
  > tables in terms of speed. The study is here:
  >
  > http://link.springer.com/chapter/10.1007%2F978-3-540-25944-2_135

It's a shame they don't explain why this is happening.
Those large factors are kind of nuts.

--
G. Jungman

Follow-Ups:
- Re: switched sparse format to binary trees
  - From: Gerard Jungman

References:
- switched sparse format to binary trees
  - From: Patrick Alken
- Re: switched sparse format to binary trees
  - From: Gerard Jungman
- Re: switched sparse format to binary trees
  - From: Patrick Alken
- Re: switched sparse format to binary trees
  - From: Frank Reininghaus
- Re: switched sparse format to binary trees
  - From: Gerard Jungman

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]