If I start debuginfod without any concurrency limits:

  [Mon Jan 9 17:40:14 2023] (2356243/2356243): libmicrohttpd error: Failed to create worker inter-thread communication channel: Too many open files

My machine has 256 cores, and stracing debuginfod shows that it fails to open more files after creating 510 epoll fds (twice):

  epoll_create1(EPOLL_CLOEXEC) = 1021
  epoll_ctl(1021, EPOLL_CTL_ADD, 3, {events=EPOLLIN, data={u32=4027013664, u64=187651148175904}}) = 0
  epoll_ctl(1021, EPOLL_CTL_ADD, 1020, {events=EPOLLIN, data={u32=2965961632, u64=281473647704992}}) = 0
  mmap(NULL, 8454144, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xfff6b97b0000
  mprotect(0xfff6b97c0000, 8388608, PROT_READ|PROT_WRITE) = 0
  rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
  clone(child_stack=0xfff6b9fbea00, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2361982], tls=0xfff6b9fbf880, child_tidptr=0xfff6b9fbf210) = 2361982
  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 1022
  epoll_create1(EPOLL_CLOEXEC) = 1023
  epoll_ctl(1023, EPOLL_CTL_ADD, 3, {events=EPOLLIN, data={u32=4027014456, u64=187651148176696}}) = 0
  epoll_ctl(1023, EPOLL_CTL_ADD, 1022, {events=EPOLLIN, data={u32=2965961632, u64=281473647704992}}) = 0
  mmap(NULL, 8454144, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xfff6b8fa0000
  mprotect(0xfff6b8fb0000, 8388608, PROT_READ|PROT_WRITE) = 0
  rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
  clone(child_stack=0xfff6b97aea00, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2361983], tls=0xfff6b97af880, child_tidptr=0xfff6b97af210) = 2361983
  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = -1 EMFILE (Too many open files)

ulimit -n is 1024; do I really need more than that just to start debuginfod on a 256-core machine?
Since the web connection pool is 2x the thread count and each connection appears to use two fds, maybe I do. Should the connection pool have a hard limit when using the default? I doubt 512 simultaneous incoming connections would be usual, and if that many are really needed the user can specify -C.
What sets "ulimit -n" to 1024 in your case?
Honestly, no idea. It appears to be the default on Ubuntu.
Yes, kernel defaults: 1024 soft, 4096 hard. I *can* change it to 4096, but two points remain: 1) debugging the failure case isn't trivial, and 2) a connection pool of cores*2 threads probably doesn't scale linearly anyway.
I assume "debuginfod -C $num -c $num" still works for you, in this battle of distro/site defaults.
Yes. My use case is a test that uses debuginfod; since it only has to service a few requests, I'm just passing -C2 -c2 so it works everywhere.
Please check out commit 7399e3bd7eb72d045 on elfutils.git for a test patch.
Looks good to me!
Pushed to master as dcb40f9caa7ca30:

  Author: Frank Ch. Eigler <fche@redhat.com>
  Date:   Tue Jan 10 17:59:35 2023 -0500

      debuginfod PR29975 & PR29976: decrease default concurrency

      ... based on rlimit (ulimit -n NUM)
      ... based on cpu-affinity (taskset -c A,B,C,D ...)

      Signed-off-by: Frank Ch. Eigler <fche@redhat.com>