Bug 28514 - limit grooming time for many stale files
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: debuginfod
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2021-10-28 19:06 UTC by Frank Ch. Eigler
Modified: 2021-11-05 12:17 UTC



Description Frank Ch. Eigler 2021-10-28 19:06:55 UTC
When a big debuginfod server starts grooming and finds stale data (archives or files that have been removed), its self-cleaning efforts can take a long time.  A single sqlite query has been observed to take O(seconds); in the metrics, see the sqlite3_milliseconds_count...{"nuke..."} ones.  The groom() function checks every file for staleness until interrupted by a SIGUSR1, so O(50000) stale files could take a whole day.

During all this time, the server can still service buildid requests, so it's not that bad, but it cannot scan for new files.

We should investigate whether a more time-bounded groom operation could serve about as well.  We could limit groom to a certain fraction of time, like 1 hr/day, then abort.  (We'd have to traverse the file list in some stateful or random order so as not to just recheck the same files over and over.)  The post-loop cleanup ops ("nuke orphan buildids" ... end of function) are relatively quick and not worth worrying about at this time.
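The time-bounded, random-start traversal suggested above could look roughly like the following sketch.  This is illustrative only: the function and parameter names (groom_pass, budget) are hypothetical, not debuginfod's actual code, and the staleness check / nuke query is elided.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: walk the candidate file list starting at a random
// offset, stopping once a wall-clock budget is exhausted.  The random start
// means successive bounded passes don't recheck the same prefix of the list
// over and over.  Returns the number of files actually examined.
int groom_pass(const std::vector<std::string>& files,
               std::chrono::seconds budget)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + budget;

    std::mt19937 rng(std::random_device{}());
    const std::size_t start = files.empty()
        ? 0
        : std::uniform_int_distribution<std::size_t>(0, files.size() - 1)(rng);

    int checked = 0;
    for (std::size_t i = 0; i < files.size(); ++i) {
        if (clock::now() >= deadline)
            break;                         // time budget exhausted: abort pass
        const std::string& f = files[(start + i) % files.size()];
        // ... staleness check and "nuke" sqlite query would go here ...
        (void) f;
        ++checked;
    }
    return checked;
}
```

A stateful alternative (persisting the last-checked position instead of randomizing) would give the same no-recheck property with deterministic coverage, at the cost of storing a cursor across groom cycles.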

Alternately, there may be a way to accelerate the individual nuke queries, maybe with more indexes, at the cost of more storage.
Comment 1 Frank Ch. Eigler 2021-11-05 12:17:43 UTC
commit c1e8c8c6b25cb2b5c16553609f19a9ed5dd4e146
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Thu Nov 4 13:08:35 2021 -0400

    PR28514: debuginfod: limit groom operation times