In a range of libmicrohttpd versions, up to and including libmicrohttpd-0.9.75-3.fc36.x86_64, debuginfod occasionally crashes with messages like: https://builder.sourceware.org/testrun/920819ee86861130393e12933821c5b544afeee4?filename=tests%2Frun-debuginfod-federation-metrics.sh.log#line1669 Fatal error in GNU libmicrohttpd daemon.c:3831: Failed to remove FD from epoll set. Even without MHD_USE_EPOLL, a nearly identical message can come from a different code path.
Created attachment 14933 [details] Patch for 29696 The debuginfod cache config was using fdopen and then calling both fclose on the file stream & close on the original file descriptor. Since the fd is not dup'ed, this led to a race condition where if that fd was reused (by microhttpd) we'd end up prematurely closing their socket leading to the above issue.
Very nice find. How did you catch this btw? Are there any tools that help find such a "double closes"? If not maybe we can teach valgrind --track-fds=yes about it, which already can track fd leaks, so it shouldn't be too hard to make it also detect double/bad closes. The patch seems obviously correct to me. Nice to now log close () failures, which should help catch similar issues early. Small nitpick. The "}else{" is a bit of a style break with the rest of the code, which would say: ... } else { ...
I noticed that the issue was happening in run-debuginfod-federation-metrics.sh so to reproduce I was playing with sending lots of requests to a federation of servers. Only had the issue occur on the downstream so it was a client issue. This made it pretty quick to replicate, so I could go through debuginfod_query_server and see how far down I can put an early exit before seeing the race condition. That narrowed it down to debuginfod_config_cache. From there noticed the double close and it was smooth sailing. I'm not sure about tooling around the double close but is it possible to know that something is a double close if the fd may just be reused? Since in this case for instance the close won't fail, we're just closing someone else's open, good to go fd. fwiw I looked in elfutils at least and we didn't use fdopen with a double close again.
commit 938a52c22ee915ff2cea813edd5da66bc8184885 Author: Ryan Goldberg <rgoldber@redhat.com> Date: Fri Jun 16 10:20:04 2023 -0400 debuginfod: PR29696: Removed secondary fd close in cache config causing a race condition