Bug 5381 - nscd: Race condition of mempool_alloc() .. cache_add() and gc()
Summary: nscd: Race condition of mempool_alloc() .. cache_add() and gc()
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: nscd (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
: 9746 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-11-21 04:58 UTC by Petr Baudis
Modified: 2009-02-13 20:36 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Proposed patch (2.03 KB, patch)
2007-11-21 04:58 UTC, Petr Baudis
Details | Diff
proposed patch #2 (1.08 KB, patch)
2008-11-26 13:40 UTC, Petr Baudis
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Petr Baudis 2007-11-21 04:58:36 UTC
nscd does not use sufficient locking, allowing gc() to be run between
mempool_alloc() of data and its addition by cache_add(). Thus, gc() will free
the data again before it could have been added and properly accounted for. The
code should take the read database lock already before the mempool_alloc() call.
Comment 1 Petr Baudis 2007-11-21 04:58:54 UTC
Created attachment 2103 [details]
Proposed patch
Comment 2 Ulrich Drepper 2008-04-19 17:36:07 UTC
Should be fixed in cvs on the trunk.  No with the patch here which, if correct
at all, is a terrible performance burden.
Comment 3 Petr Baudis 2008-11-22 11:08:34 UTC
The current fix is not really working at all, since the mem_in_flight_list
checking in gc() is incomplete. You do mark these areas as used during garbage
collect, but then if you actually hit them, you do not handle the case at all,
since you assume you always hit either hash entry or data.

We cannot move a mem_in_flight_list entry around since we use the pointers in
the cacheadd routines.  Options:

(i) Cover the cacheadd routines with wider-span locks

(ii) Break the shifts on such an entry:

moves->size = off_alloc - off_alloc_start
off_free = off_allocend
disp = 0

I still think the (i) is simpler, but even (ii) should be ok, since typically,
the mem_in_flight entries will be at the top of the garbage-collected area and
thus on the next garbage collect we will shake them down.

I didn't have time to come up with an actual patch yet, I will do that next week
unless you beat me to it (or point out any mistake).
Comment 4 Petr Baudis 2008-11-26 13:40:17 UTC
Created attachment 3078 [details]
proposed patch #2

This is a proposed patch; currently, it seems to work fine - we have one
reported crash in nscd, but I don't think it is related so far.

This is a very simple approach - in theory, there could be a "gap area" near
the top of the database that could grow indefinitely across gc() calls if
_every_ gc() call would happen while there is some memory in flight, but based
on my real-world nscd observations, I don't think this is realistic scenario
even with very busy nscd; if your experience is different, we can go for a more
complex patch that makes mempool_alloc() try to fit data into this gap area.
Comment 5 Petr Baudis 2008-11-26 13:41:15 UTC
Sorry, the date in the changelog entry should be 2008-11-26 instead of -13.
Comment 6 Ulrich Drepper 2008-11-26 16:11:26 UTC
You haven't explained in the slightest where you _think_ there is a problem. 
I'm not even going to look at the patch until you explain the perceived problem
in detail.
Comment 7 Petr Baudis 2008-11-26 21:50:26 UTC
Memory in flight is tracked but not taken into account when relocating data in
gc(). I don't know what do you want me to add to my comment #3.
Comment 8 Petr Baudis 2009-01-16 15:37:27 UTC
*** Bug 9746 has been marked as a duplicate of this bug. ***
Comment 9 Ulrich Drepper 2009-02-13 20:36:52 UTC
I made more changes to the code in cvs.