This is the mail archive of the
guile@sourceware.cygnus.com
mailing list for the Guile project.
Re: GC and threads (was Re: generational GC)
Jim Blandy writes:
I think what you're describing here is equivalent to allocating new
cells with the mark bit set. The GC won't sweep up such cells until
end of the next GC cycle.
Yes. Thanks - I hadn't realised that.
Here's the way the literature generally discusses this kind of thing.
[...]
Every traversal algorithm divides nodes into three classes:
1) Nodes we've visited and dealt with. Call these "black" nodes.
2) Nodes we've not visited, or seen references to. Call these "white"
nodes. At the end of the traversal, any nodes that are still white
are garbage.
3) Nodes we've found pointers to, but haven't yet visited. Call these
"grey" nodes. For example, if you're using a mark stack, the nodes
on the stack are grey.
I don't understand the distinction between grey and black here. In
what sense can we have found a pointer but not "visited" the
corresponding node?
Is this perhaps an implementation idea to avoid marking loops?
[...]
If you watch the traversal in progress, you'll see a wavefront of grey
cells spreading out from the roots, leaving black cells behind it, and
with white cells ahead of it. By the end of the cycle, any nodes the
grey wavefront never reached are still white, and will be freed.
Nice picture!
Some GC's follow the "tricolor invariant", which states that no black
cell may ever refer directly to a white cell. Black cells may point
only to grey cells, and other black cells. Since the grey cells are
yet to be visited, if you preserve this invariant, you're guaranteed
to visit every reachable white cell.
Ah - so you might say that the point of the grey/black distinction is
that it allows us to state (1) this invariant, and (2) that the
endpoint of the GC mark cycle is when there are no grey cells.
In this terminology, your strategy would be described as "allocate
black." That is, new nodes are assumed to be live, and are left alone
until the next GC cycle begins, at which point everything is colored
white.
The problem with allocating black is that the new cells can't be
collected until the end of the *next* GC cycle, even though they're
the most likely to die quickly.
Right. So the basic tradeoff here is
(memory loss by not collecting the set of cells that are most likely
to have died young)
versus
(performance advantages of doing GC asynchronously on a dedicated
thread rather than "in-line" on program threads).
So it's more efficient to use some kind of heuristic to color them
grey or white.
For example, suppose we mark the global roots first, and leave
scanning the threads' stacks at the very end. When a thread allocates
a new cell, we can color the new cell white if the thread's stack
hasn't been scanned yet, or black if it has. That way, if we can
reach a good portion of the heap through the global roots, we can do
most of our allocation white, and those cells can be reclaimed at the
end of this GC cycle.
That would be very cool, but, if GC is asynchronous, I don't see how
it can be implemented efficiently, i.e. without requiring a mutex
acquisition inside SCM_NEWCELL.
Regards,
Neil