In sysdeps/unix/sysv/linux/check_pf.c, the following call path can act on a deferred cancellation request: __check_pf -> make_request -> __sendto / __recvmsg.

void
attribute_hidden
__check_pf (bool *seen_ipv4, bool *seen_ipv6,
            struct in6addrinfo **in6ai, size_t *in6ailen)
{
  *in6ai = NULL;
  *in6ailen = 0;

  struct cached_data *olddata = NULL;
  struct cached_data *data = NULL;

  __libc_lock_lock (lock);

If a thread is cancelled while it holds this lock, the lock is never released, and every later call to __check_pf deadlocks. We need to push a cancellation cleanup handler that unlocks the lock.
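The hazard and the proposed remedy can be reproduced outside glibc. The following is a minimal sketch using only the public pthread API, not the glibc-internal code: struct demo, the worker functions, and lock_is_free_after_cancel are hypothetical names, and pause () merely stands in for the __sendto / __recvmsg cancellation points.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical shared state; "lock" stands in for the lock in check_pf.c.  */
struct demo
{
  pthread_mutex_t lock;
  sem_t locked;                 /* Posted once the worker holds the lock.  */
};

/* Cleanup handler: runs when the thread is cancelled and releases the lock.  */
static void
unlock_on_cancel (void *arg)
{
  pthread_mutex_unlock (arg);
}

/* Takes the lock, registers the handler, then blocks in a cancellation point.  */
static void *
worker_with_handler (void *arg)
{
  struct demo *d = arg;
  pthread_mutex_lock (&d->lock);
  pthread_cleanup_push (unlock_on_cancel, &d->lock);
  sem_post (&d->locked);
  for (;;)
    pause ();                   /* Stand-in for __sendto / __recvmsg.  */
  pthread_cleanup_pop (1);      /* Not reached here; pairs with the push.  */
  return NULL;
}

/* Same as above but without a handler: cancellation leaves the lock held.  */
static void *
worker_without_handler (void *arg)
{
  struct demo *d = arg;
  pthread_mutex_lock (&d->lock);
  sem_post (&d->locked);
  for (;;)
    pause ();
  return NULL;
}

/* Cancels a worker that holds the lock and reports whether the lock can be
   taken afterwards.  */
static bool
lock_is_free_after_cancel (void *(*worker) (void *))
{
  struct demo d;
  pthread_t thr;
  void *res;

  int rc = pthread_mutex_init (&d.lock, NULL);
  assert (rc == 0);
  rc = sem_init (&d.locked, 0, 0);
  assert (rc == 0);
  rc = pthread_create (&thr, NULL, worker, &d);
  assert (rc == 0);

  /* Wait until the worker holds the lock, then cancel it.  */
  while (sem_wait (&d.locked) != 0)
    ;
  rc = pthread_cancel (thr);
  assert (rc == 0);
  rc = pthread_join (thr, &res);
  assert (rc == 0 && res == PTHREAD_CANCELED);

  bool is_free = pthread_mutex_trylock (&d.lock) == 0;
  if (is_free)
    {
      pthread_mutex_unlock (&d.lock);
      pthread_mutex_destroy (&d.lock);
    }
  sem_destroy (&d.locked);
  return is_free;
}
```

Calling lock_is_free_after_cancel (worker_with_handler) should report that the lock was released, while the variant without a handler should leave the mutex held by the dead thread, which is the state that blocks every later caller.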
Do you have a test case for this? Netlink is de-facto non-blocking, so we should use system calls which are not cancellation points, without a cancellation handler.
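For comparison, the closest application-level analogue of this alternative is to keep cancellation from being acted on while the lock is held. This is only a sketch under that assumption: inside glibc the suggestion would mean calling system call variants that are not cancellation points, whereas the code below uses the public pthread_setcancelstate interface instead. All names (struct shared, worker_cancel_disabled, run_cancel_disabled_demo) are hypothetical, and sem_wait stands in for the blocking netlink calls.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical shared state for the demonstration.  */
struct shared
{
  pthread_mutex_t lock;
  sem_t cancel_sent;            /* Posted after pthread_cancel was called.  */
  bool finished;                /* Set when the critical section completes.  */
};

static void *
worker_cancel_disabled (void *arg)
{
  struct shared *s = arg;
  int oldstate, ignored;

  /* From here on, cancellation requests stay pending instead of acting.  */
  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &oldstate);
  pthread_mutex_lock (&s->lock);

  /* sem_wait is a cancellation point, but the pending request is not acted
     on while cancellation is disabled.  */
  while (sem_wait (&s->cancel_sent) == -1 && errno == EINTR)
    ;
  s->finished = true;

  pthread_mutex_unlock (&s->lock);
  pthread_setcancelstate (oldstate, &ignored);

  /* The pending request is acted on only now, with the lock released.  */
  pthread_testcancel ();
  return NULL;
}

/* Returns true if the worker completed its critical section, released the
   lock, and was still cancelled afterwards.  */
static bool
run_cancel_disabled_demo (void)
{
  struct shared s;
  pthread_t thr;
  void *res;

  int rc = pthread_mutex_init (&s.lock, NULL);
  assert (rc == 0);
  rc = sem_init (&s.cancel_sent, 0, 0);
  assert (rc == 0);
  s.finished = false;

  rc = pthread_create (&thr, NULL, worker_cancel_disabled, &s);
  assert (rc == 0);
  rc = pthread_cancel (thr);
  assert (rc == 0);
  sem_post (&s.cancel_sent);
  rc = pthread_join (thr, &res);
  assert (rc == 0);

  bool lock_free = pthread_mutex_trylock (&s.lock) == 0;
  if (lock_free)
    {
      pthread_mutex_unlock (&s.lock);
      pthread_mutex_destroy (&s.lock);
    }
  sem_destroy (&s.cancel_sent);
  return res == PTHREAD_CANCELED && s.finished && lock_free;
}
```

With this approach no cleanup handler is needed, because the thread cannot be cancelled between taking and releasing the lock; the cost is that cancellation is delayed until the critical section ends.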
(In reply to Florian Weimer from comment #1)
> Do you have a test case for this?
>
> Netlink is de-facto non-blocking, so we should use system calls which are
> not cancellation points, without a cancellation handler.

I do not have a test case for this. I reported this bug based on inspection of the called functions.

There is a related Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1405071
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a443bd3fb233186038b8b483959ecb7978d1abea

commit a443bd3fb233186038b8b483959ecb7978d1abea
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    There are reports for hang in __check_pf:

    https://github.com/JoeDog/siege/issues/4

    It is reproducible only under specific configurations:

    1. Large number of cores (>= 64) and large number of threads (> 3X of
    the number of cores) with long lived socket connection.
    2. Low power (frequency) mode.
    3. Power management is enabled.

    While holding lock, __check_pf calls make_request which calls __sendto
    and __recvmsg. Since __sendto and __recvmsg are cancellation points,
    lock held by __check_pf won't be released and can cause deadlock when
    thread cancellation happens in __sendto or __recvmsg. Add a cancellation
    cleanup handler for __check_pf to unlock the lock when cancelled by
    another thread. This fixes BZ #20975 and the siege hang issue.
The release/2.37/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f5d377c896b95fefc712b0fd5e5804ae3f48d392

commit f5d377c896b95fefc712b0fd5e5804ae3f48d392
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)

The release/2.36/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7df9a276563e201fd5680c46f0d8c6f719ce1fc9

commit 7df9a276563e201fd5680c46f0d8c6f719ce1fc9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)

The release/2.35/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2b9906f9a0f27c1ffa329f23ae1664bc9925df0f

commit 2b9906f9a0f27c1ffa329f23ae1664bc9925df0f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)

The release/2.34/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1cd6626a897304a28dc4e2ca1e303bb5774db6d1

commit 1cd6626a897304a28dc4e2ca1e303bb5774db6d1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)

The release/2.33/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=24302748fcf85023fd64630de241973ec17f2dc1

commit 24302748fcf85023fd64630de241973ec17f2dc1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]

    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
The release/2.37/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0e3e9dbb0ea3e0a4885e3dc075cdfe92fc29da66

commit 0e3e9dbb0ea3e0a4885e3dc075cdfe92fc29da66
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:44:01 2023 -0700

    Document BZ #20975 fix

The release/2.36/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=93bd77104cdd7380823d80c778776d2eafb69a91

commit 93bd77104cdd7380823d80c778776d2eafb69a91
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:45:03 2023 -0700

    Document BZ #20975 fix

The release/2.35/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=739de21d3043f927e93490ee33f9e1b948556f5b

commit 739de21d3043f927e93490ee33f9e1b948556f5b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:46:00 2023 -0700

    Document BZ #20975 fix

The release/2.34/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cad3adf4ddeada37912c1c13b59a2ea5dd5d2832

commit cad3adf4ddeada37912c1c13b59a2ea5dd5d2832
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:46:54 2023 -0700

    Document BZ #20975 fix

The release/2.33/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=eed27b3e46d0c92eb8bff6b2b5d7059a70996a8b

commit eed27b3e46d0c92eb8bff6b2b5d7059a70996a8b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:47:45 2023 -0700

    Document BZ #20975 fix