Bug 28240 - debuginfod client cache falsely sticky for root user
Summary: debuginfod client cache falsely sticky for root user
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: debuginfod (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-17 10:25 UTC by Frank Ch. Eigler
Modified: 2023-10-16 15:11 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2021-08-17 10:25:13 UTC
bug #25628 introduced a negative-hit caching facility in the debuginfod client, which represents upstream misses with short-lived permission-000 files in the cache.  These files are used to shortcut repeated queries that are expected to fail, for a limited time.

The logic works for normal users, but breaks for root users.  The problem is that the way the client recognizes a 000 negative-hit file in the cache, vs. a good file, is by looking for -EACCES upon opening the file.  Unfortunately, root users never get -EACCES, even for perm-000 files.  So a 000 file for root is treated as though it was a successful fetch of a 0-length file, and poisons the cache indefinitely.

    745   struct stat st;
    746   time_t cache_miss;
    747   /* Check if the file exists and it's of permission 000*/
    748   if (errno == EACCES
    749       && stat(target_cache_path, &st) == 0
    750       && (st.st_mode & 0777) == 0)

Probably the simplest fix is to ditch the "errno == EACCESS" part of the test, and perform the stat every time.
Comment 1 Frank Ch. Eigler 2021-10-22 16:59:20 UTC
commit 7d64173fb11c66284a408e52d41d15b7755d65d2
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Wed Aug 18 18:29:34 2021 -0400

    PR28240: debuginfod client root-safe negative caching
    
    Negative cache (000-permission) files were incorrectly treated as
    valid cached files for the root user, because root can open even
    000-perm files without -EACCES.  Corrected this checking sequence.
    
    Fixed the debuginfod testsuite to run to completion as root or
    as an ordinary user, correcting corresponding permission checks:
        stat -c %A $FILE
    is right and
        [ -w $FILE]  [ -r $FILE ]
    were wrong.
    
    Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Comment 2 Mark Wielaard 2021-10-25 12:35:37 UTC
So this makes most uses for the user "root" correct, but still contains a race condition:

+        /* TOCTOU non-problem: if another task races, puts a working
+           download or a 000 file in its place, unlinking here just
+           means WE will try to download again as uncached. */
         unlink(target_cache_path);
     }
+  
+  /* If the target is already in the cache (known not-000 - PR28240), 
+     then we are done. */
+  int fd = open (target_cache_path, O_RDONLY);
+  if (fd >= 0)
+    {
+      /* Success!!!! */
+      if (path != NULL)
+        *path = strdup(target_cache_path);
+      rc = fd;
+      goto out;
+    }

The problem isn't when WE try to download and/or (re)set the 000 file. The problem is if someone other client races past us after the unlink and puts a 000 file back (because the server still doesn't have it). Then the open will again succeed for us, but the target is 000.
Comment 3 Mark Wielaard 2022-04-24 18:48:30 UTC
We got rid of the zero-permission 000 files with:

commit 8b568fdea8e1baea3d68cc38dec587e4c9219276
Author: Aaron Merey <amerey@redhat.com>
Date:   Fri Apr 8 19:37:11 2022 -0400

    PR29022: 000-permissions files cause problems for backups
    
    000-permission files currently used for negative caching can cause
    permission problems for some backup software and disk usage checkers.
    Fix this by using empty files for negative caching instead.
    
    Also use each empty file's mtime to determine the time since
    last download attempt instead of the cache_miss_s file's mtime.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=29022
    
    Tested-by: Milian Wolff <mail@milianw.de>
    Signed-off-by: Aaron Merey <amerey@redhat.com>

And I think that also got rid of this race issue.

Any other client that races past us now will either create a new empty file with   
open (target_cache_path, O_CREAT|O_EXCL, DEFFILEMODE) or puts in a new non-empty file using rename (target_cache_tmppath, target_cache_path). Both of which are atomic.

So I think this is resolved now. But would like someone else to double check. These races are tricky.
Comment 4 Mark Wielaard 2023-10-09 16:14:42 UTC
(In reply to Mark Wielaard from comment #3)
> So I think this is resolved now. But would like someone else to double
> check. These races are tricky.

That was 3 months ago. Can we assume this is fixed?
Comment 5 Mark Wielaard 2023-10-16 15:11:14 UTC
(In reply to Mark Wielaard from comment #4)
> (In reply to Mark Wielaard from comment #3)
> > So I think this is resolved now. But would like someone else to double
> > check. These races are tricky.
> 
> That was 3 months ago. Can we assume this is fixed?

Lets assume so.