Bug 30800

Summary: Improper assert in prune_cache triggers if clock jumps backwards
Product: glibc Reporter: Florian Weimer <fweimer>
Component: nscdAssignee: Florian Weimer <fweimer>
Status: RESOLVED FIXED    
Severity: normal CC: drepper.fsp, fweimer
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: 2.39   
See Also: https://issues.redhat.com/browse/RHEL-1192
Host: Target:
Build: Last reconfirmed:

Description Florian Weimer 2023-08-28 06:12:50 UTC
In struct datahead, there are two fields that control validity of the cache entry, timeout and usable.  These are not updated atomically at the same time, yet in prune_cache, we have this code:

	  /* Check whether the entry timed out.  */
	  if (dh->timeout < now)
	    {
…
	    }
	  else
	    {
	      assert (dh->usable);
	      next_timeout = MIN (next_timeout, dh->timeout);
	    }

This assumes that if an entry has not timed out yet, it is always usable. The precise conditions under which entries become marked as not usable is still a bit mysterious to me. One such source of invalidation is the pass through the cache after start, to re-validate entries which have expired. If the date associated with a key has changed, the old entry is marked as not usable.

I terminated nscd immediately after an entry was marked as unusable and set the clock backwards, so that the timeout on the entry had no longer lapsed (but not by an hour, so that nscd would still consider the cache file valid). This was sufficient to trigger the assertion failure during the next time nscd started. That's why I think we have a real bug here.
Comment 1 Florian Weimer 2023-08-28 07:23:31 UTC
Patch posted:

[PATCH] nscd: Skip unusable entries in first pass in prune_cache (bug 30800)
<https://inbox.sourceware.org/libc-alpha/87o7iry6k6.fsf@oldenburg.str.redhat.com/T/#u>
Comment 2 Florian Weimer 2023-08-29 07:38:43 UTC
Fixed for 2.39 via:

commit c00b984fcd53f679ca2dafcd1aee2c89836e6e73
Author: Florian Weimer <fweimer@redhat.com>
Date:   Tue Aug 29 08:28:31 2023 +0200

    nscd: Skip unusable entries in first pass in prune_cache (bug 30800)
    
    Previously, if an entry was marked unusable for any reason, but had
    not timed out yet, the assert would trigger.
    
    One way to get into such state is if a data change is detected during
    re-validation of an entry.  This causes the entry to be marked as not
    usable.  If exits nscd soon after that, then the clock jumps
    backwards, and nscd restarted, the cache re-validation run after
    startup triggers the removed assert.
    
    The change is more complicated than just the removal of the assert
    because entries marked as not usable should be garbage-collected in
    the second pass.  To make this happen, it is necessary to update some
    book-keeping data.
    
    Reviewed-by: DJ Delorie <dj@redhat.com>