nscd does not use sufficient locking, allowing gc() to be run between mempool_alloc() of data and its addition by cache_add(). Thus, gc() will free the data again before it could have been added and properly accounted for. The code should take the read database lock already before the mempool_alloc() call.
Created attachment 2103 [details] Proposed patch
Should be fixed in cvs on the trunk. No with the patch here which, if correct at all, is a terrible performance burden.
The current fix is not really working at all, since the mem_in_flight_list checking in gc() is incomplete. You do mark these areas as used during garbage collect, but then if you actually hit them, you do not handle the case at all, since you assume you always hit either hash entry or data. We cannot move a mem_in_flight_list entry around since we use the pointers in the cacheadd routines. Options: (i) Cover the cacheadd routines with wider-span locks (ii) Break the shifts on such an entry: moves->size = off_alloc - off_alloc_start off_free = off_allocend disp = 0 I still think the (i) is simpler, but even (ii) should be ok, since typically, the mem_in_flight entries will be at the top of the garbage-collected area and thus on the next garbage collect we will shake them down. I didn't have time to come up with an actual patch yet, I will do that next week unless you beat me to it (or point out any mistake).
Created attachment 3078 [details] proposed patch #2 This is a proposed patch; currently, it seems to work fine - we have one reported crash in nscd, but I don't think it is related so far. This is a very simple approach - in theory, there could be a "gap area" near the top of the database that could grow indefinitely across gc() calls if _every_ gc() call would happen while there is some memory in flight, but based on my real-world nscd observations, I don't think this is realistic scenario even with very busy nscd; if your experience is different, we can go for a more complex patch that makes mempool_alloc() try to fit data into this gap area.
Sorry, the date in the changelog entry should be 2008-11-26 instead of -13.
You haven't explained in the slightest where you _think_ there is a problem. I'm not even going to look at the patch until you explain the perceived problem in detail.
Memory in flight is tracked but not taken into account when relocating data in gc(). I don't know what do you want me to add to my comment #3.
*** Bug 9746 has been marked as a duplicate of this bug. ***
I made more changes to the code in cvs.