This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: GC and threads (was Re: generational GC)


Jim Blandy writes:

   I think what you're describing here is equivalent to allocating new
   cells with the mark bit set.  The GC won't sweep up such cells until
   end of the next GC cycle.

Yes.  Thanks - I hadn't realised that.

   Here's the way the literature generally discusses this kind of thing.

   [...]

   Every traversal algorithm divides nodes into three classes:
   1) Nodes we've visited and dealt with.  Call these "black" nodes.
   2) Nodes we've not visited, or seen references to.  Call these "white" 
      nodes.  At the end of the traversal, any nodes that are still white
      are garbage.
   3) Nodes we've found pointers to, but haven't yet visited.  Call these
      "grey" nodes.  For example, if you're using a mark stack, the nodes
      on the stack are grey.

I don't understand the distinction between grey and black here.  In
what sense can we have found a pointer but not "visited" the
corresponding node?

Is this perhaps an implementation idea to avoid marking loops?

   [...]

   If you watch the traversal in progress, you'll see a wavefront of grey
   cells spreading out from the roots, leaving black cells behind it, and
   with white cells ahead of it.  By the end of the cycle, any nodes the
   grey wavefront never reached are still white, and will be freed.

Nice picture!

   Some GC's follow the "tricolor invariant", which states that no black
   cell may ever refer directly to a white cell.  Black cells may point
   only to grey cells, and other black cells.  Since the grey cells are
   yet to be visited, if you preserve this invariant, you're guaranteed
   to visit every reachable white cell.

Ah - so you might say that the point of the grey/black distinction is
that it allows us to state (1) this invariant, and (2) that the
endpoint of the GC mark cycle is when there are no grey cells.

   In this terminology, your strategy would be described as "allocate
   black."  That is, new nodes are assumed to be live, and are left alone
   until the next GC cycle begins, at which point everything is colored
   white.

   The problem with allocating black is that the new cells can't be
   collected until the end of the *next* GC cycle, even though they're
   the most likely to die quickly.

Right.  So the basic tradeoff here is

(memory loss by not collecting the set of cells that are most likely
to have died young)

versus

(performance advantages of doing GC asynchronously on a dedicated
thread rather than "in-line" on program threads).

   So it's more efficient to use some kind of heuristic to color them
   grey or white.

   For example, suppose we mark the global roots first, and leave
   scanning the threads' stacks at the very end.  When a thread allocates
   a new cell, we can color the new cell white if the thread's stack
   hasn't been scanned yet, or black if it has.  That way, if we can
   reach a good portion of the heap through the global roots, we can do
   most of our allocation white, and those cells can be reclaimed at the
   end of this GC cycle.

That would be very cool, but, if GC is asynchronous, I don't see how
it can be implemented efficiently, i.e. without requiring a mutex
acquisition inside SCM_NEWCELL.

Regards,

     Neil

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]