Bug 12889

Summary: Race condition in pthread_kill
Product: glibc Reporter: Rich Felker <bugdal>
Component: nptl Assignee: Florian Weimer <fweimer>
Status: RESOLVED FIXED    
Severity: normal CC: fweimer, ppluzhnikov
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: 2.35   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1994068
https://sourceware.org/bugzilla/show_bug.cgi?id=28361
Host: Target:
Build: Last reconfirmed:

Description Rich Felker 2011-06-15 00:39:17 UTC
There is a race condition in pthread_kill: it is possible that, between the time pthread_kill reads the pid/tid from the target thread descriptor and the time it makes the tgkill syscall, the target thread terminates and the same tid gets assigned to a new thread in the same process.

(The tgkill syscall was designed to eliminate a similar race condition in tkill, but it only succeeded in eliminating races where the tid gets reused in a different process, and does not help if the same tid gets assigned to a new thread in the same process.)
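
As a minimal sketch of the racy sequence described above (not glibc's actual code), the following models a thread descriptor with a single tid field. The names struct thread_desc and racy_kill are hypothetical and exist only for illustration; the tgkill call is made through syscall().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread descriptor.  glibc's real
   descriptor has many more fields.  */
struct thread_desc
{
  pid_t tid;   /* Kernel thread ID; 0 or less means "no thread".  */
};

/* Returns 0 on success or an errno value, in the style of pthread_kill.  */
int
racy_kill (struct thread_desc *td, int sig)
{
  /* Step 1: read the TID from the descriptor.  */
  pid_t tid = td->tid;
  if (tid <= 0)
    return ESRCH;

  /* Race window: nothing prevents the target from exiting here and
     the kernel from handing the same TID to a new thread in this
     process.  */

  /* Step 2: signal whatever thread currently owns that TID.  tgkill
     checks only that the TID belongs to this process.  */
  if (syscall (SYS_tgkill, getpid (), tid, sig) != 0)
    return errno;
  return 0;
}
```

Calling racy_kill with signal 0 on a descriptor holding the caller's own TID succeeds; the problem only shows up when the target exits between the two steps.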

The only solution I can see is to introduce a mutex that ensures that a thread cannot exit while pthread_kill is being called on it.

Note that in most real-world situations, like almost all race conditions, this one will be extremely rare. To make it measurable, one could exhaust all but 1-2 available pid values, possibly by lowering the max pid parameter in /proc, forcing the same tid to be reused rapidly.
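
To check how much pid space is available before trying this, one can read the current limit. The sketch below assumes that "the max pid parameter in /proc" means /proc/sys/kernel/pid_max on Linux; read_pid_max is a hypothetical helper name. Lowering the value would typically require root privileges and is not shown.

```c
#include <stdio.h>

/* Returns the current pid_max value, or -1 if it cannot be read
   (for example, when /proc is not mounted).  */
long
read_pid_max (void)
{
  FILE *f = fopen ("/proc/sys/kernel/pid_max", "r");
  if (f == NULL)
    return -1;

  long value = -1;
  if (fscanf (f, "%ld", &value) != 1)
    value = -1;
  fclose (f);
  return value;
}
```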
Comment 1 Florian Weimer 2015-10-31 12:08:51 UTC
POSIX says:

“The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.”

How is this to be interpreted?  This way?

  TERMINATED && (CREATED-AS-DETACHED || DETACH-CALLED || JOIN-CALLED)

Or this way?

  (TERMINATED && CREATED-AS-DETACHED) || DETACH-CALLED || JOIN-CALLED

In the second case, pthread_detach and pthread_join could just clear the TID in the thread descriptor to avoid the race, before reaping the TID from the kernel.
Comment 2 Andreas Schwab 2015-10-31 12:37:26 UTC
If the second interpretation were the intended one, then the following paragraph would not have been necessary, since no function could be called on a detached thread.
Comment 3 Rich Felker 2015-10-31 20:27:15 UTC
The first interpretation is correct but it does not matter because there is no such thing as "reaping the tid". The tid is available for reuse immediately when the SYS_exit syscall is made by pthread_exit or equivalent.
Comment 4 Florian Weimer 2021-08-17 06:25:53 UTC
*** Bug 19193 has been marked as a duplicate of this bug. ***
Comment 5 Florian Weimer 2021-08-17 12:01:18 UTC
I believe we should fix bug 19193 separately.
Comment 6 Florian Weimer 2021-08-17 13:52:09 UTC
Patches posted: https://sourceware.org/pipermail/libc-alpha/2021-August/130207.html
Comment 7 Sourceware Commits 2021-09-13 10:41:32 UTC
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8af8456004edbab71f8903a60a3cae442cf6fe69

commit 8af8456004edbab71f8903a60a3cae442cf6fe69
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Sep 13 11:06:08 2021 +0200

    nptl: pthread_kill, pthread_cancel should not fail after exit (bug 19193)
    
    This closes one remaining race condition related to bug 12889: if
    the thread already exited on the kernel side, returning ESRCH
    is not correct because that error is reserved for the thread IDs
    (pthread_t values) whose lifetime has ended.  In case of a
    kernel-side exit and a valid thread ID, no signal needs to be sent
    and cancellation does not have an effect, so just return 0.
    
    sysdeps/pthread/tst-kill4.c triggers undefined behavior and is
    removed with this commit.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
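
The behavior change in this commit can be observed with a small probe, sketched here under some assumptions: kill_after_exit is a hypothetical name, and the short sleep is only a crude way to let the thread exit on the kernel side, not a guarantee. The thread is neither joined nor detached when pthread_kill is called, so its thread ID is still valid.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

static void *
exit_immediately (void *arg)
{
  return arg;
}

/* Returns the result of pthread_kill on a thread that has most likely
   already exited on the kernel side but has not been joined yet.
   If thread creation fails, that error code is returned instead.  */
int
kill_after_exit (void)
{
  pthread_t thr;
  int rc = pthread_create (&thr, NULL, exit_immediately, NULL);
  if (rc != 0)
    return rc;

  /* Give the thread time to finish.  This is a heuristic only.  */
  struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
  nanosleep (&delay, NULL);

  /* Signal 0 performs error checking without delivering a signal.  */
  rc = pthread_kill (thr, 0);

  /* Only now does the lifetime of the thread ID end.  */
  pthread_join (thr, NULL);
  return rc;
}
```

With a glibc that contains this commit, the expected result is 0; an older glibc may return ESRCH in the same situation.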
Comment 8 Florian Weimer 2021-09-13 10:46:03 UTC
Fixed for 2.35 via:

commit 526c3cf11ee9367344b6b15d669e4c3cb461a2be
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Sep 13 11:06:08 2021 +0200

    nptl: Fix race between pthread_kill and thread exit (bug 12889)
    
    A new thread exit lock and flag are introduced.  They are used to
    detect that the thread is about to exit or has exited in
    __pthread_kill_internal, and the signal is not sent in this case.
    
    The test sysdeps/pthread/tst-pthread_cancel-select-loop.c is derived
    from a downstream test originally written by Marek Polacek.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
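
The exit lock and flag described in this commit message can be modelled roughly as follows. This is an illustrative sketch, not the glibc implementation: struct thread_model, model_init, model_begin_exit, model_kill and the sent counter are all hypothetical names. Returning 0 when the thread is exiting follows the reasoning in the bug 19193 commit message.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model of the per-thread state involved.  */
struct thread_model
{
  pthread_mutex_t exit_lock;  /* Serializes exit against signal sending.  */
  bool exiting;               /* Set once the thread is about to exit.  */
  pid_t tid;                  /* Kernel thread ID.  */
  unsigned int sent;          /* Illustration only: signals actually sent.  */
};

void
model_init (struct thread_model *t, pid_t tid)
{
  pthread_mutex_init (&t->exit_lock, NULL);
  t->exiting = false;
  t->tid = tid;
  t->sent = 0;
}

/* Called on the exit path, before the TID can be reused.  */
void
model_begin_exit (struct thread_model *t)
{
  pthread_mutex_lock (&t->exit_lock);
  t->exiting = true;
  pthread_mutex_unlock (&t->exit_lock);
}

/* Sends SIG only while the thread is known not to be exiting.  */
int
model_kill (struct thread_model *t, int sig)
{
  int ret = 0;
  pthread_mutex_lock (&t->exit_lock);
  if (!t->exiting)
    {
      /* The lock is held, so the target cannot pass the exit point
         and release its TID while the signal is being sent.  */
      if (syscall (SYS_tgkill, getpid (), t->tid, sig) != 0)
        ret = errno;
      else
        t->sent++;
    }
  pthread_mutex_unlock (&t->exit_lock);
  return ret;
}
```

Because the flag is checked and the signal is sent under the same lock that the exit path takes, the TID read by model_kill cannot be recycled in between.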
Comment 9 Sourceware Commits 2021-09-13 12:20:05 UTC
The release/2.34/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3abf3bd4edc86fb28c099cc85203cb46a811e0b8

commit 3abf3bd4edc86fb28c099cc85203cb46a811e0b8
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Sep 13 11:06:08 2021 +0200

    nptl: pthread_kill, pthread_cancel should not fail after exit (bug 19193)
    
    This closes one remaining race condition related to bug 12889: if
    the thread already exited on the kernel side, returning ESRCH
    is not correct because that error is reserved for the thread IDs
    (pthread_t values) whose lifetime has ended.  In case of a
    kernel-side exit and a valid thread ID, no signal needs to be sent
    and cancellation does not have an effect, so just return 0.
    
    sysdeps/pthread/tst-kill4.c triggers undefined behavior and is
    removed with this commit.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
    (cherry picked from commit 8af8456004edbab71f8903a60a3cae442cf6fe69)
Comment 10 Sourceware Commits 2021-09-23 08:54:35 UTC
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2849e2f53311b66853cb5159b64cba2bddbfb854

commit 2849e2f53311b66853cb5159b64cba2bddbfb854
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Sep 23 09:55:54 2021 +0200

    nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]
    
    As part of the fix for bug 12889, signals are blocked during
    thread exit, so that application code cannot run on the thread that
    is about to exit.  This would cause problems if the application
    expected signals to be delivered after the signal handler revealed
    the thread to still exist, despite pthread_kill can no longer be used
    to send signals to it.  However, glibc internally uses the SIGSETXID
    signal in a way that is incompatible with signal blocking, due to the
    way the setxid handshake delays thread exit until the setxid operation
    has completed.  With a blocked SIGSETXID, the handshake can never
    complete, causing a deadlock.
    
    As a band-aid, restore the previous handshake protocol by not blocking
    SIGSETXID during thread exit.
    
    The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
    a downstream test by Martin Osvald.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    Tested-by: Carlos O'Donell <carlos@redhat.com>
Comment 11 Sourceware Commits 2021-09-23 09:04:40 UTC
The release/2.34/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=33adeaa3e2b9143c38884bc5aa65ded222ed274e

commit 33adeaa3e2b9143c38884bc5aa65ded222ed274e
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Sep 23 09:55:54 2021 +0200

    nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]
    
    As part of the fix for bug 12889, signals are blocked during
    thread exit, so that application code cannot run on the thread that
    is about to exit.  This would cause problems if the application
    expected signals to be delivered after the signal handler revealed
    the thread to still exist, despite pthread_kill can no longer be used
    to send signals to it.  However, glibc internally uses the SIGSETXID
    signal in a way that is incompatible with signal blocking, due to the
    way the setxid handshake delays thread exit until the setxid operation
    has completed.  With a blocked SIGSETXID, the handshake can never
    complete, causing a deadlock.
    
    As a band-aid, restore the previous handshake protocol by not blocking
    SIGSETXID during thread exit.
    
    The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
    a downstream test by Martin Osvald.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    Tested-by: Carlos O'Donell <carlos@redhat.com>
    (cherry picked from commit 2849e2f53311b66853cb5159b64cba2bddbfb854)