Bug 20975 - Deferred cancellation triggers in __check_pf and loses lock leading to deadlock.
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.25
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-16 01:48 UTC by Carlos O'Donell
Modified: 2023-05-23 23:47 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Carlos O'Donell 2016-12-16 01:48:54 UTC
In sysdeps/unix/sysv/linux/check_pf.c

The following path can cause deferred cancellation to trigger:

__check_pf -> make_request -> __sendto / __recvmsg.

void
attribute_hidden
__check_pf (bool *seen_ipv4, bool *seen_ipv6,
            struct in6addrinfo **in6ai, size_t *in6ailen)
{
  *in6ai = NULL;
  *in6ailen = 0;

  struct cached_data *olddata = NULL;
  struct cached_data *data = NULL;

  __libc_lock_lock (lock);

If a thread is cancelled in __sendto or __recvmsg while it holds the above lock, the lock is never released, and every later call to __check_pf deadlocks.

We need to push a cancellation cleanup handler to unlock the lock.
Comment 1 Florian Weimer 2017-01-25 15:19:31 UTC
Do you have a test case for this?

Netlink is de-facto non-blocking, so we should use system calls which are not cancellation points, without a cancellation handler.
Comment 2 Carlos O'Donell 2017-01-25 20:31:19 UTC
(In reply to Florian Weimer from comment #1)
> Do you have a test case for this?
> 
> Netlink is de-facto non-blocking, so we should use system calls which are
> not cancellation points, without a cancellation handler.

I do not have a test case for this. I reported this bug based on the inspection of the called functions.

There is a related Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1405071
Comment 3 Sourceware Commits 2023-04-28 21:55:20 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a443bd3fb233186038b8b483959ecb7978d1abea

commit a443bd3fb233186038b8b483959ecb7978d1abea
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    There are reports of hangs in __check_pf:
    
    https://github.com/JoeDog/siege/issues/4
    
    It is reproducible only under specific configurations:
    
    1. Large number of cores (>= 64) and large number of threads (> 3X of
    the number of cores) with long lived socket connection.
    2. Low power (frequency) mode.
    3. Power management is enabled.
    
    While holding the lock, __check_pf calls make_request, which calls
    __sendto and __recvmsg.  Since __sendto and __recvmsg are cancellation
    points, the lock held by __check_pf is not released when thread
    cancellation happens in __sendto or __recvmsg, which can cause a
    deadlock.  Add a cancellation cleanup handler to __check_pf to unlock
    the lock when the thread is cancelled by another thread.  This fixes
    BZ #20975 and the siege hang issue.
Comment 4 Sourceware Commits 2023-05-23 18:43:30 UTC
The release/2.37/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f5d377c896b95fefc712b0fd5e5804ae3f48d392

commit f5d377c896b95fefc712b0fd5e5804ae3f48d392
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
Comment 5 Sourceware Commits 2023-05-23 20:10:24 UTC
The release/2.36/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7df9a276563e201fd5680c46f0d8c6f719ce1fc9

commit 7df9a276563e201fd5680c46f0d8c6f719ce1fc9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
Comment 6 Sourceware Commits 2023-05-23 21:52:52 UTC
The release/2.35/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2b9906f9a0f27c1ffa329f23ae1664bc9925df0f

commit 2b9906f9a0f27c1ffa329f23ae1664bc9925df0f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
Comment 7 Sourceware Commits 2023-05-23 23:06:14 UTC
The release/2.34/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1cd6626a897304a28dc4e2ca1e303bb5774db6d1

commit 1cd6626a897304a28dc4e2ca1e303bb5774db6d1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
Comment 8 Sourceware Commits 2023-05-23 23:41:20 UTC
The release/2.33/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=24302748fcf85023fd64630de241973ec17f2dc1

commit 24302748fcf85023fd64630de241973ec17f2dc1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Apr 27 13:06:15 2023 -0700

    __check_pf: Add a cancellation cleanup handler [BZ #20975]
    
    (cherry picked from commit a443bd3fb233186038b8b483959ecb7978d1abea)
Comment 9 Sourceware Commits 2023-05-23 23:44:28 UTC
The release/2.37/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0e3e9dbb0ea3e0a4885e3dc075cdfe92fc29da66

commit 0e3e9dbb0ea3e0a4885e3dc075cdfe92fc29da66
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:44:01 2023 -0700

    Document BZ #20975 fix
Comment 10 Sourceware Commits 2023-05-23 23:45:20 UTC
The release/2.36/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=93bd77104cdd7380823d80c778776d2eafb69a91

commit 93bd77104cdd7380823d80c778776d2eafb69a91
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:45:03 2023 -0700

    Document BZ #20975 fix
Comment 11 Sourceware Commits 2023-05-23 23:46:16 UTC
The release/2.35/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=739de21d3043f927e93490ee33f9e1b948556f5b

commit 739de21d3043f927e93490ee33f9e1b948556f5b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:46:00 2023 -0700

    Document BZ #20975 fix
Comment 12 Sourceware Commits 2023-05-23 23:47:07 UTC
The release/2.34/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cad3adf4ddeada37912c1c13b59a2ea5dd5d2832

commit cad3adf4ddeada37912c1c13b59a2ea5dd5d2832
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:46:54 2023 -0700

    Document BZ #20975 fix
Comment 13 Sourceware Commits 2023-05-23 23:47:59 UTC
The release/2.33/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=eed27b3e46d0c92eb8bff6b2b5d7059a70996a8b

commit eed27b3e46d0c92eb8bff6b2b5d7059a70996a8b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue May 23 16:47:45 2023 -0700

    Document BZ #20975 fix