When a big debuginfod server starts grooming and finds stale data (archives or files that have been removed), its self-cleaning efforts can take a long time: a single sqlite nuke query has been observed to take O(seconds). In the metrics, see the sqlite3_milliseconds_count...{"nuke..."} ones. The groom() function checks every file for staleness until interrupted by a SIGUSR1, so O(50000) stale files could take a whole day. During all this time the server can still service buildid requests, so it's not that bad, but it cannot scan for new files.

We should investigate whether a more time-bounded groom operation could serve about as well. We could limit grooming to a certain fraction of time, say 1 hr/day, then abort. (We'd have to traverse the file list in some stateful or random way so as not to just recheck the same files over and over.) The post-loop cleanup ops ("nuke orphan buildids" through the end of the function) are relatively quick and not worth worrying about at this time.

Alternately, there may be a way to accelerate the individual nuke queries, perhaps with more indexes, at the cost of more storage.
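A minimal sketch of the time-bounded, stateful-traversal idea. This is not the actual debuginfod implementation; check_stale() and nuke_file() are hypothetical stand-ins for the per-file staleness query and deletion, and the resume offset represents the state carried between groom cycles so successive bounded passes eventually cover the whole file list:

```cpp
#include <chrono>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real per-file groom work.
static bool check_stale (const std::string&) { return false; }
static void nuke_file (const std::string&) {}

// Groom at most until the wall-clock budget runs out, starting from
// resume_offset.  Returns the offset to resume from next cycle, so no
// file is rechecked before the whole list has been visited once.
size_t
groom_bounded (const std::vector<std::string>& files,
               size_t resume_offset,
               std::chrono::seconds budget)
{
  using clock = std::chrono::steady_clock;
  const auto deadline = clock::now () + budget;
  size_t i = 0;
  for (; i < files.size (); ++i)
    {
      if (clock::now () >= deadline)
        break;                      // time budget exhausted; stop early
      const std::string& f = files[(resume_offset + i) % files.size ()];
      if (check_stale (f))
        nuke_file (f);
    }
  return (resume_offset + i) % files.size ();
}
```

With a generous budget the pass covers every file and the returned offset wraps back to where it started; with a tight budget it stops partway and the next cycle picks up from there instead of re-examining the same prefix.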
commit c1e8c8c6b25cb2b5c16553609f19a9ed5dd4e146
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Thu Nov 4 13:08:35 2021 -0400

    PR28514: debuginfod: limit groom operation times