Cache cleanup sometimes leaves behind empty directories - lots of them. Should make a greater effort to rmdir the bad boys, in order to prune the directory size.
All right Frank, do you speak about debuginfod daemon cache or about debuginfod-find cache (/home/marxin/.cache/debuginfod_client/ in my case)?
For the latter, I really see empty folders left.
The code container removal code:
/* Remove if empty. */
(void) rmdir (f->fts_path);
but it's unrachable as the following continue:
if (regexec (&re, f->fts_path, 0, NULL, 0) != 0)
(gdb) p f->fts_path
$4 = 0x425350 "/home/marxin/.cache/debuginfod_client/e0d3f78f324d5aeaa35e6e651ac24f9ac52ccdb7"
Could we simply always try to remove the parent dir if the file pattern matches and we do an unlink?
if (time(NULL) - f->fts_statp->st_atime >= max_unused_age)
/* Remove if now empty. */
(void) rmdir (dirname (f->fts_path));
Yeah, a speculative rmdir is probably just fine.
Martin, there is no difference between a pure client cache, and a debuginfod server's own federation cache. They are both exactly the same thing.
Question is if this can ever cause a race with the creation of a new cache file. Can there be multiple clients trying to create a hex-build-id dir while putting a file in it, that is simultaneously deleted because of max-age plus then the parent dir getting deleted while some other client is testing if the dir exists and/or putting a tmp file there?
Yeah, I suppose there is that race possibility.
One way to fix it is to use something like file locks, such as a flock(2) on a designated file such as $CACHE/cache_clean_interval_s. During aging/cleanup, hold a LOCK_EX. During normal operation (creation of build-id subdirectories & files), hold a LOCK_SH to permit multiple clients to work independently of one another.
(In reply to Frank Ch. Eigler from comment #5)
> Yeah, I suppose there is that race possibility.
> One way to fix it is to use something like file locks, such as a flock(2) on
> a designated file such as $CACHE/cache_clean_interval_s. During
> aging/cleanup, hold a LOCK_EX. During normal operation (creation of
> build-id subdirectories & files), hold a LOCK_SH to permit multiple clients
> to work independently of one another.
That is somewhat unfortunate given that the current scheme is lock-free. I wonder if there is some way around it using temp names for the directories, like we do for the actual file names, so that we can atomically rename them in-place.
> I wonder if there is some way around it using temp names for the directories,
> like we do for the actual file names, so that we can atomically rename them
I couldn't think of one without other drawbacks. There can be races between
two clients downloading info for the same buildid, thus creating two temp dirs. One of them would not be able to rename to the final name.
Another simplish approach would be to check the directory mtime before rmdir,
to ensure it's old, plus adding a retry loop at cache-insertion, in case a
client loses the maxage_s race with a cleanup. But it would only lose it
once (due to the mtime check).
Suggest implementation during cache cleanup,
- post-order traversal (files aged/deleted first), fstat buildid directories
- if mtime age is greater than maxage_s, attempt one rmdir
(that rmdir will fail for nonempty directories, which is fine)
During download, always attempt the mkdir(target_cache_dir), independently of checking whether it already exists (i.e., drop stat(target_cache_dir,...), and ignore the errno==EEXIST case as normal.
Author: Frank Ch. Eigler <email@example.com>
Date: Mon Apr 26 09:58:45 2021 -0400
PR26125: debuginfod client cache - rmdir harder