26125 – during debuginfod cache cleanup, try harder to rmdir empty dirs

Bug 26125 - during debuginfod cache cleanup, try harder to rmdir empty dirs

Summary: during debuginfod cache cleanup, try harder to rmdir empty dirs

Status:	RESOLVED FIXED

Alias:	None

Product:	elfutils
Classification:	Unclassified
Component:	debuginfod (show other bugs)
Version:	unspecified

Importance:	P2 critical
Target Milestone:	---
Assignee:	Alice Zhang

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-06-17 01:35 UTC by Frank Ch. Eigler
Modified:	2021-05-01 18:04 UTC (History)
CC List:	3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Frank Ch. Eigler 2020-06-17 01:35:41 UTC

Cache cleanup sometimes leaves behind empty directories - lots of them.  Should make a greater effort to rmdir the bad boys, in order to prune the directory size.

Comment 1 Martin Liska 2020-11-27 11:08:36 UTC

All right Frank, do you speak about debuginfod daemon cache or about debuginfod-find cache (/home/marxin/.cache/debuginfod_client/ in my case)?

For the latter, I really see empty folders left.
The code container removal code:

        case FTS_DP:
          /* Remove if empty. */
          (void) rmdir (f->fts_path);
          break;

but it's unrachable as the following continue:

      if (regexec (&re, f->fts_path, 0, NULL, 0) != 0)
        continue;

for e.g.

(gdb) p f->fts_path
$4 = 0x425350 "/home/marxin/.cache/debuginfod_client/e0d3f78f324d5aeaa35e6e651ac24f9ac52ccdb7"

Comment 2 Mark Wielaard 2020-11-27 11:29:35 UTC

Could we simply always try to remove the parent dir if the file pattern matches and we do an unlink?

  case FTS_F:
    if (time(NULL) - f->fts_statp->st_atime >= max_unused_age)
      {
        unlink (f->fts_path);
        /* Remove if now empty. */
        (void) rmdir (dirname (f->fts_path));
      }
    break;

Comment 3 Frank Ch. Eigler 2020-11-27 17:47:04 UTC

Yeah, a speculative rmdir is probably just fine.

Martin, there is no difference between a pure client cache, and a debuginfod server's own federation cache.  They are both exactly the same thing.

Comment 4 Mark Wielaard 2020-11-28 01:32:08 UTC

Full patch:
https://sourceware.org/pipermail/elfutils-devel/2020q4/003198.html

Question is if this can ever cause a race with the creation of a new cache file. Can there be multiple clients trying to create a hex-build-id dir while putting a file in it, that is simultaneously deleted because of max-age plus then the parent dir getting deleted while some other client is testing if the dir exists and/or putting a tmp file there?

Comment 5 Frank Ch. Eigler 2020-11-28 02:04:47 UTC

Yeah, I suppose there is that race possibility.

One way to fix it is to use something like file locks, such as a flock(2) on a designated file such as $CACHE/cache_clean_interval_s.  During aging/cleanup, hold a LOCK_EX.  During normal operation (creation of build-id subdirectories & files), hold a LOCK_SH to permit multiple clients to work independently of one another.

Comment 6 Mark Wielaard 2020-11-30 11:46:07 UTC

(In reply to Frank Ch. Eigler from comment #5)
> Yeah, I suppose there is that race possibility.
> 
> One way to fix it is to use something like file locks, such as a flock(2) on
> a designated file such as $CACHE/cache_clean_interval_s.  During
> aging/cleanup, hold a LOCK_EX.  During normal operation (creation of
> build-id subdirectories & files), hold a LOCK_SH to permit multiple clients
> to work independently of one another.

That is somewhat unfortunate given that the current scheme is lock-free. I wonder if there is some way around it using temp names for the directories, like we do for the actual file names, so that we can atomically rename them in-place.

Comment 7 Frank Ch. Eigler 2020-11-30 12:47:02 UTC

> I wonder if there is some way around it using temp names for the directories,
> like we do for the actual file names, so that we can atomically rename them
> in-place.

I couldn't think of one without other drawbacks.  There can be races between
two clients downloading info for the same buildid, thus creating two temp dirs.  One of them would not be able to rename to the final name.

Another simplish approach would be to check the directory mtime before rmdir,
to ensure it's old, plus adding a retry loop at cache-insertion, in case a
client loses the maxage_s race with a cleanup.  But it would only lose it
once (due to the mtime check).

Comment 8 Frank Ch. Eigler 2021-03-18 17:14:56 UTC

Suggest implementation during cache cleanup,
- post-order traversal (files aged/deleted first), fstat buildid directories
- if mtime age is greater than maxage_s, attempt one rmdir
  (that rmdir will fail for nonempty directories, which is fine)

During download, always attempt the mkdir(target_cache_dir), independently of checking whether it already exists (i.e., drop stat(target_cache_dir,...), and ignore the errno==EEXIST case as normal.

Comment 9 Frank Ch. Eigler 2021-05-01 18:04:05 UTC

commit 95edde45e53fc84ce30449663d9f2145328bb877
D)
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Apr 26 09:58:45 2021 -0400

    PR26125: debuginfod client cache - rmdir harder