This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
I read your most excellent notes on Guile's generational gc. I started thinking about all the issues you raised and came up an outline for a possible implementation. I am not very familiar with most of Guile so there may be some gaping holes in what follows. Guile's non-copying gc mode prevents the traditional technique of associating a generation with a heap segment. Hence, generational information must be recorded somewhere. Adding it to the cell doesn't seem feasible unless the tag structure is radically changed; even then few bits would be available. An obvious thought is to use a vector of bytes (one for each cell) in parallel with each heap segment; subtracting the base of the segment from a pointer and shifting right by log2 of cell size gives an index into the vector. However, finding the segment base for an arbitrary pointer also means doing a lookup in a segment address table which is somewhat clumsy, given that this must be done for every pointer encountered in the mark phase. I think the following scheme would make everything much less painless. (The discussion assumes a 32-bit implementation; there is an obvious 64-bit analog.) Consider a heap segment to be a collection of super-cells each containing space for 8 contiguous cells. The first cell is not used as a cell but as an 8-byte vector, containing a byte for each of the remaining cells (with the final byte open for other use). The byte corresponding to each cell (call it a Cell_Info) will contain the generation number and some other flags discussed below. The super-cells are aligned so that their addresses are a multiple of 0x40; this can always be done by wasting a few bytes in segments that do not begin on a page boundary. Now, given a pointer p to a cell we can derive a pointer to its Cell_Info by: #define SCM_CELL_INFOP(p) ((p) & 0x3f)/8 + ((p) & -0x40) (some casting also required) This code sequence should only take 4 to 6 instructions depending on the processor architecture. So what's in Cell_Info? My thoughts go something like this: struct Cell_Info { int generation:4; int car_newer:1; int car_older:1; int cdr_newer:1; int cdr_older:1; }; struct Super_Cell { Cell_Info cell_info[7]; char any_out; Cell cells[7]; }; The generation field is, of course, the cell's generation number. Four bits is probably overkill; three might be enough, with the fourth bit used for something else, like a pinned flag (because of reference from the stack), but I'm not sure if this is useful to be kept semi-permanently. The car/cdr-newer/older flags are set by the write barrier. The newer or older flag is set when the pointer is to a member of a younger or older generation respectively. As an optimization, we can set a flag in the any_out byte of the super-cell to indicate (at least) one of the cells has a pointer outside its generation. The initial marking phase from the root set now has lots of opportunities for tree pruning. If a pointer goes outside the generation being collected, it need not be followed for additional marking. Unfortunately, a second marking phase must scan the whole heap for pointers into the generation being collected, but we can make effective use of the Cell_Info bits to lighten the work load. (The normal copying collector wins big by not having to do this; this is the price we have to pay for a non-copying collector.) So the second mark phase looks something like: int n = generation_being_collected; Super_Cell *scp = initial_value; while (/* some condition */) { if (scp->any_out) for (i=0; i<7; ++i) { int sign = scp->cell_info[i].generation - n; if (sign < 0) { if (scp->cell_info[i].car_older) { SCM *the_car = SCM_CAR(scp->cells[i]); /* this is safe! */ if (SCM_CELL_INFOP(the_car)->generation == n) mark(the_car); } if (scp->cell_info[i].cdr_older) { SCM *the_cdr = SCM_CDR(scp->cells[i]); if (SCM_CELL_INFOP(the_cdr)->generation == n) mark(the_cdr); } } else if (sign > 0) { if (scp->cell_info[i].car_newer) { SCM *the_car = SCM_CAR(scp->cells[i]); if (SCM_CELL_INFOP(the_car)->generation == n) mark(the_car); } if (scp->cell_info[i].cdr_newer) { SCM *the_cdr = SCM_CDR(scp->cells[i]); if (SCM_CELL_INFOP(the_cdr)->generation == n) mark(the_cdr); } else ; /* cell is in generation being collected; */ /* it's either already marked or will be marked by */ /* being pointed to by another generation or is garbage */ } } ++scp; } Finally, the sweep phase will do the usual work of scanning for cells of the proper generation and unmarking them and either incrementing their generation number or placing them on the free list. Of course, the any_out flags of any super cells with a newly collected cell must be recomputed. After thinking about the issues you raised in your notes, I think that babies should be treated specially. I would propose that they be allocated in an "incubator" rather than in a normal generation heap segment. This incubator would not consist of super-cells but be treated as a vector of cells. Allocation would not require a free list, but just testing for exhaustion of the area and incrementing a pointer by 8. The write barrier for babies would be handled by keeping a list of pairs: the cell addresses that were stored into an older generation cell and the address of that cell. This way a full secondary scan of the heap to find references to babies is not required. The final difference of the incubator is that live cells that are not pinned are copied into the higher generation segments, using the normal copying gc techniques and allocating from the free list (but also updating older generation pointers from the write barrier list). If there are no pinned cells, the whole incubator is ready to be reused as one contiguous allocation area; if there are n pinned cells, then we have (at most) n+1 contiguous areas that can each be allocated sequentially. If we get too many pinned cells, we can just allocate a new incubator, but I think that long-term pinned cells will be fairly rare. The advantages of the incubator are that we have a lot of locality during allocation and during baby collection, which is the most frequent gc activity. Since the incubator will contain mostly garbage (if the generational assumptions hold), the copying is minimal and we can avoid building a free list for the incubator. We may prematurely copy some babies created at the end of the cycle that will soon become garbage. We also might have to allocate a new heap segment if the normal free list is exhausted by the copying. This affects the croaking geezer problem by filling out free cells, but may end up mixing generations too much. As you point out, we can order the free list as desired. Identifying babies complicates the calculation of the Cell_Info pointer (which doesn't exist for babies). The secondary mark phase described above doesn't need to worry about this as elders never have their newer bits set when pointing to babies. The root set trace and the write barrier must do a range check against the incubator bounds to see if a pointer is to a baby. The write barrier by now is non-trivial, but it only needs to be dealt with in set-car! and set-cdr! (and letrec?) which should be rare in truly schemeful programming. We need to determine the generations of the cell being modified and the cell being referenced (if the pointer is to a pair). If the referenced cell is a baby and the modified cell is not a baby, we add the two cell addresses to the incubator list; if both are non-babies and of different generations, we set the car/cdr-newer/older flags as appropriate and the any-out flag. IMHO the heap segment allocation strategy needs to be modified from the current Guile for a generational gc. It starts out by allocating 32k cells and then triples the heap on each further allocation (as long the OS will agree). While selecting the appropriate parameters will require lots of tuning, I would think the growth should be more linear. We are gc-ing the incubator fairly frequently so the normal heap segments should stay more in line with longer-term storage needs. There is no particular reason that the incubator should be the same size as a heap segment; it should be sized so that it can be gc-ed in a human-imperceptible amount of time. If we provide a gc-blocking mode that continues collecting the incubator, and doesn't try to collect older generations, but just allocates new heap segments as necessary, we would have a servicable semi-incremental collector that might satisfy MDJ's needs for smooth GUI performance. This would only work if there were eventually some pause or restart points in applications when normal gc could be resumed or there were very few non-baby pairs surviving. I don't have a strong feeling about scheduling higher generation collections; since we are not copying them, there is no area to fill up to trigger a collection. You suggested a frequency of 1/(n+1), which seems reasonable. Perhaps we need some sort of adaptive scheduling which pays attention to the rates of change of the sizes of various generations. I see two scenarios that most long-running applications will have some mixture of: 1) a substantial data base is built up during a startup phase and then a steady state phase is entered and various numbers of cells are allocated and most soon become garbage, with some minimal growth of the long term data; and 2) the application operates in a cyclical manner, accumulating data for a medium-long time period, after which a great deal of garbage is suddenly produced and the cycle repeats. In the first scenario spending a lot of time in higher generations is relatively wasteful, while it is essential in the second. Probing for which generations are changing size by running gcs shouldn't affect long-running programs too much. If they aren't long-running it doesn't really matter. Well, it seems no meaningful discussion of gc can be short. I hope some of these ideas fit in with your current thinking. Sorry if my presentation is too pedantic, but I have to work things out well enough that they don't seem completely idiotic or wishful. Dale Jordan.