caching failed lookups of debuginfo?

Milian Wolff mail@milianw.de
Fri Apr 8 21:56:15 GMT 2022


On Freitag, 8. April 2022 23:34:06 CEST Aaron Merey wrote:
> Hi Milian,
> 
> On Fri, Apr 8, 2022 at 5:08 PM Milian Wolff <mail@milianw.de> wrote:
> > I can reproduce it now suddenly with debuginfod-find too:
> > 
> > ```
> > $
> > debuginfod-find debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
> > debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
> > server urls "https://debuginfod.archlinux.org/"
> > checking build-id
> > checking cache dir /home/milian/.cache/debuginfod_client
> > using timeout 90
> > init server 0 https://debuginfod.archlinux.org/buildid
> > url 0 https://debuginfod.archlinux.org/buildid/
> > 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo
> > query 1 urls in parallel
> > server response HTTP response code said error
> > url 0 The requested URL returned error: 404
> > not found No such file or directory (err=-2)
> > Server query failed: No such file or directory
> > ```
> > 
> > I do see an empty `/home/milian/.cache/debuginfod_client/
> > 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo` file. But the server
> > is
> > still queried (i.e. rerunning the above command always produces the same
> 
> > output for me). The lookup is costly too at ~70ms overall:
> I am not able to reproduce this on Fedora 35 using commit 8db849976f070.
> 
> The first time I run debuginfod-find I get similar output to you:
> ```
> debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
> server urls "https://debuginfod.archlinux.org/"
> checking build-id
> checking cache dir /home/amerey/.cache/debuginfod_client
> using timeout 90
> init server 0 https://debuginfod.archlinux.org/buildid
> url 0
> https://debuginfod.archlinux.org/buildid/85766e9d8458b16e9c7ce6e07c712c02b8
> 471dbc/debuginfo query 1 urls in parallel
> server response HTTP response code said error
> url 0 The requested URL returned error: 404
> not found No such file or directory (err=-2)
> Server query failed: No such file or directory
> ```
> 
> But when I rerun debuginfod-find I get output indicating that the cache was
> checked:
> ```
> debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
> server urls "https://debuginfod.archlinux.org/"
> checking build-id
> checking cache dir /home/amerey/.cache/debuginfod_client
> not found No such file or directory (err=-2)
> Server query failed: No such file or directory
> ```
> 
> No calls to curl_multi_wait occur. Is there a "cache_miss_s" file in the top
> level of your debuginfod cache? It should contain the number of seconds
> that must elapse after a failed attempt before another query would be
> attempted (default is 600 seconds).

Yes that file exists and it contains 600.

Running through strace, I see that it actually removes the empty file first. 
This looks totally bogus:

```
strace -e file -f debuginfod-find debuginfo 
85766e9d8458b16e9c7ce6e07c712c02b8471dbc |& grep "cache/debuginfod_client"
checking cache dir /home/milian/.cache/debuginfod_client
newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client", 
{st_mode=S_IFDIR|0755, st_size=12288, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
cache_clean_interval_s", {st_mode=S_IFREG|0644, st_size=5, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
cache_clean_interval_s", {st_mode=S_IFREG|0644, st_size=5, ...}, 0) = 0
openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
cache_clean_interval_s", O_RDONLY) = 5
newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", {st_mode=S_IFREG|000, 
st_size=0, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/cache_miss_s", 
{st_mode=S_IFREG|0644, st_size=3, ...}, 0) = 0
openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/cache_miss_s", 
O_RDONLY) = 5
unlink("/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo") = 0
openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", O_RDONLY) = -1 ENOENT (No 
such file or directory)
mkdir("/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc", 0700) = -1 EEXIST (File exists)
openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo.KdSnXl", O_RDWR|O_CREAT|
O_EXCL, 0600) = 5
openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", O_RDONLY|O_CREAT|O_EXCL, 
000) = 7
unlink("/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo.KdSnXl") = 0
rmdir("/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc") = -1 ENOTEMPTY (Directory not 
empty)
```

Note how there's an unlink first followed by an openat of the file that just 
got removed? Then a mkdir for a folder that still exists. Then a tmpfile for 
the download. Then follows the creation of the empty file. And then it tries 
to rmdir the whole folder, which fails, but that still sounds wrong too, no?

Backtraces for the unlink:

```
unlink("/home/milian/.cache/debuginfod_client/
85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo") = 0
 > /usr/lib/libc.so.6(unlink+0xb) [0x103aab]
 > /usr/lib/libdebuginfod-0.186.so(debuginfod_add_http_header+0x1821) [0x3de1]
 > /usr/bin/debuginfod-find() [0x122f]
 > /usr/lib/libc.so.6(__libc_start_call_main+0x80) [0x2d310]
 > /usr/lib/libc.so.6(__libc_start_main+0x81) [0x2d3c1]
 > /usr/bin/debuginfod-find() [0x1495]
```

That is:

```
eu-addr2line -fe /usr/lib/libdebuginfod-0.186.so -a 0x3de1
0x0000000000003de1
debuginfod_query_server
/usr/src/debug/elfutils-0.186/debuginfod/debuginfod-client.c:791:9
```

Which in turn points at the code that does cache cleanup in 
`debuginfod_query_server`. I now used `rr` to record such a bogus run and I 
clearly see that `(time(NULL) - st.st_mtime <= cache_miss)` is false and it 
goes into the unlink case.

I'll try to debug this further now - I definitely do not wait 600s inbetween 
these runs here...

-- 
Milian Wolff
mail@milianw.de
http://milianw.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://sourceware.org/pipermail/elfutils-devel/attachments/20220408/8f0ee8af/attachment-0001.sig>


More information about the Elfutils-devel mailing list