elfutils is built on various architectures with https://packit.dev/ in https://github.com/evverx/elfutils, and since run-debuginfod-webapi-concurrency.sh was added it has been failing more or less consistently on ppc64le and intermittently on the other architectures. The log can be found at https://copr-be.cloud.fedoraproject.org/results/packit/evverx-elfutils-53/fedora-rawhide-ppc64le/03059293-elfutils/builder-live.log.gz. That log will expire eventually, but as far as I can see the failure is reported by buildbot from time to time as well.
Created attachment 13859 [details]
full log

Just in case, I've attached the full log.
Thanks for the report. The logs indicate some unexplained glitch within libmicrohttpd (rejecting connections for no stated reason). Maybe the builder is somehow strangely resource-constrained? We could make the test less assertive about 100% success of all those parallel curl jobs.
I think they are constrained in the sense that those machines are much slower than usual. On top of that, the packages are built in a sandboxed environment, which makes them even slower.
Note that packit doesn't use real hardware for various architectures but "container emulation", which causes various testcases to fail.

Although in this case it seems it is overloading the host. Maybe we can tune down the number of concurrent requests tested, see also: https://sourceware.org/pipermail/elfutils-devel/2021q4/thread.html#4488

Let us know if you have a better lower/upper bound or a way to test the limits of the machine.

We do have somewhat better buildbot workers for various architectures here: https://builder.wildebeest.org/buildbot/#/builders?tags=elfutils
(In reply to Mark Wielaard from comment #4)
> Note that packit doesn't use real hardware for various architectures but
> "container emulation" which causes various testcases to fail.
>
> Although in this case it seems it is overloading the host. [...]

Is there some way of finding out the host's actual limits? Can we detect that we're running in an unusually constricted environment and skip this test? ulimit -u?
> Note that packit doesn't use real hardware for various architectures but
> "container emulation" which causes various testcases to fail.

I think I ran into issues like that in https://github.com/evverx/elfutils/issues/32 and https://github.com/evverx/elfutils/issues/31. I ignore them for the most part, though it would be great if they could be skipped there. Some of them seem easy to skip because they appear to trigger seccomp filters of some kind, but I'm not sure about the rest.

> Although in this case it seems it is overloading the host. Maybe we can tune
> down the number of concurrent requests tested, see also:
> https://sourceware.org/pipermail/elfutils-devel/2021q4/thread.html#4488
> If you have a better lower/upper bound or a way to test the limits of the
> machine.

Thanks for the link. I'll take a look.

> We do have somewhat better buildbot workers for various architectures here:
> https://builder.wildebeest.org/buildbot/#/builders?tags=elfutils

As far as I understand, the tests are run there on commits to the elfutils repository, but I'm not sure how to test "PRs" there. If it were possible to use it before commits are merged into the master branch, I probably wouldn't have started using Packit on GitHub.

> Is there some way of finding out the host's actual limits? Can we detect that
> we're running in an unusually constricted environment and skip this test?
> ulimit -u?

I think I can run almost anything there, but since I'm not familiar with the test I'm not sure what I should look for.
This test creates up to 100 (plus a few) threads in debuginfod, and also 100 concurrent curl processes to talk to debuginfod.
(In reply to Evgeny Vereshchagin from comment #7)
> > Note that packit doesn't use real hardware for various architectures but
> > "container emulation" which causes various testcases to fail.
>
> I think I ran into issues like that in
> https://github.com/evverx/elfutils/issues/32 and
> https://github.com/evverx/elfutils/issues/31. I ignore them for the most
> part. Though it would be great if they could be skipped there. Some of them
> seem to be easy to skip because they seem to trigger seccomp filters of some
> kind but I'm not sure about the rest.

Easiest is to run containers with --security-opt seccomp=unconfined to make sure seccomp doesn't arbitrarily block syscalls (or, worse, return EPERM instead of ENOSYS).

> > We do have somewhat better buildbot workers for various architectures here:
> > https://builder.wildebeest.org/buildbot/#/builders?tags=elfutils
>
> As far as I understand the tests are run there on commits to the elfutils
> repository but I'm not sure how to test "PRs" there. If it was possible to
> use it before commits are merged into the master branch I wouldn't have
> started using Packit on GitHub probably.

There is a vacation and a nationwide lockdown coming up, so I'll see what I can do. I hope to connect the buildbot with Patchwork so that you can easily test any submitted patch before committing.
(In reply to Mark Wielaard from comment #9)
> Easiest is to run containers with --security-opt seccomp=unconfined to make
> sure seccomp doesn't arbitrarily block syscalls (or worse returns EPERM
> instead of ENOSYS).

Those containers are launched by Packit (or, more precisely, by mock), so I can't control how they are run. According to systemd-detect-virt those are nspawn containers, and I'm 50% sure those failures are caused by a bug in either systemd-nspawn or libseccomp.

In the meantime, I added a couple of bash commands that show whether the test hit its "pids" limit set by either systemd on the host or systemd-nspawn (or both). pids.max is unfortunately set to "max" there, so it isn't obvious how many tasks can be run there at the same time.
OK, some findings from when a similar-sounding problem intermittently occurred on an s390x VM. It seems that we were expecting too much of libmicrohttpd. When it offers a thread pool (which we trigger in debuginfod via the -Cnnn option), it splits a hypothetical concurrent-connection limit among all those threads. When a new connection comes in, it seems to be just luck which thread gets woken up. If that thread still has some active connections (such as from previously enqueued transmission operations still in progress), the new connection may push it over its private daemon->connection_limit and fail. (At the same time, other threads may exist with much larger available connection limits, but they are not consulted.)

This is probably why Mark's experimental MHD_OPTION_CONNECTION_LIMIT setting helped (1000ish -> 4000ish): dividing those limits among the 100-ish threads leaves each 40 connections to work from rather than 10.

I'm investigating some libmicrohttpd modes/options that may trigger more favourable behaviour. But if nothing appears, we may just need to turn down the tight expectations of this test case.
FWIW with https://sourceware.org/git/?p=elfutils.git;a=commit;h=e646e363e72e06e0ed5574c929236d815ddcbbaf applied the test appears to be flaky on Packit on s390x: https://copr-be.cloud.fedoraproject.org/results/packit/evverx-elfutils-73/fedora-35-s390x/03942110-elfutils/builder-live.log.gz
(In reply to Evgeny Vereshchagin from comment #12)
> FWIW with
> https://sourceware.org/git/?p=elfutils.git;a=commit;h=e646e363e72e06e0ed5574c929236d815ddcbbaf
> applied the test appears to be flaky on Packit on s390x:
> https://copr-be.cloud.fedoraproject.org/results/packit/evverx-elfutils-73/fedora-35-s390x/03942110-elfutils/builder-live.log.gz

So that log contains the feared:

error_count{libmicrohttpd="Server reached connection limit. Closing inbound connection.\n"} 35

And sadly I have also been able to replicate that on another s390x setup, even with all the latest patches. What the two seem to have in common is that they are both s390x and have only 2 cores.

If I lower the -C100 to -C32 in run-debuginfod-webapi-concurrency.sh it does seem to always pass. But with -C50 or higher it does occasionally fail (the higher, the more frequently it fails).

BTW, run-debuginfod-webapi-concurrency.sh seems stable on any other system I've thrown it at, so it isn't exactly clear what "such a system" is. Is it s390x specific?
commit 3bcf887340fd47d0d8a3671cc45abe2989d1fd6c
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Apr 24 12:16:58 2022 +0200

    debuginfod: Use MHD_USE_ITC in MHD_start_daemon flags
    
    This prevents the "Server reached connection limit. Closing inbound
    connection." issue we have been seeing in the
    run-debuginfod-webapi-concurrency.sh testcase.
    
    From the manual: If the connection limit is reached, MHD's behavior
    depends a bit on other options. If MHD_USE_ITC was given, MHD will
    stop accepting connections on the listen socket. This will cause the
    operating system to queue connections (up to the listen() limit)
    above the connection limit. Those connections will be held until MHD
    is done processing at least one of the active connections. If
    MHD_USE_ITC is not set, then MHD will continue to accept() and
    immediately close() these connections.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=28708
    
    Signed-off-by: Mark Wielaard <mark@klomp.org>