Bug 12889

Summary: Race condition in pthread_kill
Product: glibc
Reporter: Rich Felker <bugdal>
Component: nptl
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: normal
CC: fweimer, ppluzhnikov
Priority: P2
Flags: fweimer: security-
Version: unspecified
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed:

Description Rich Felker 2011-06-15 00:39:17 UTC
There is a race condition in pthread_kill: it is possible that, between the time pthread_kill reads the pid/tid from the target thread descriptor and the time it makes the tgkill syscall, the target thread terminates and the same tid gets assigned to a new thread in the same process.

(The tgkill syscall was designed to eliminate a similar race condition in tkill, but it only succeeded in eliminating races where the tid gets reused in a different process, and does not help if the same tid gets assigned to a new thread in the same process.)
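
To make the window concrete, here is a minimal sketch of the read-then-signal sequence described above. It is not glibc's actual code: struct target_desc and racy_kill are hypothetical names, and the real thread descriptor holds far more than a tid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread descriptor (assumption, not glibc's struct pthread). */
struct target_desc {
    pid_t tid; /* kernel thread ID of the target */
};

/* Returns 0 on success or an errno value, like pthread_kill. */
static int racy_kill(struct target_desc *td, int sig)
{
    assert(td != NULL);

    /* Step 1: read the tid from the descriptor. */
    pid_t tid = td->tid;

    /* Race window: the target may exit here, and a new thread in this
       process may be given the same tid before step 2 runs. */

    /* Step 2: tgkill checks only that tid belongs to this thread group,
       which is still true for a reused tid in the same process. */
    if (syscall(SYS_tgkill, getpid(), tid, sig) != 0)
        return errno;
    return 0;
}
```

Nothing between step 1 and step 2 ties the tid to the thread the caller meant, so the signal goes to whichever thread currently holds that tid.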

The only solution I can see is to introduce a mutex that ensures that a thread cannot exit while pthread_kill is being called on it.
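
A rough sketch of what such a mutex could look like, under stated assumptions: struct thread_desc, guarded_kill and guarded_exit_prepare are hypothetical names, and returning ESRCH for a thread that has already begun exiting is only one possible choice. A real implementation would also need to keep signals blocked on the exiting thread while it holds the lock; that is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thread descriptor (assumption, not glibc's struct pthread). */
struct thread_desc {
    pthread_mutex_t exit_lock; /* serializes signalling against exit */
    pid_t tid;                 /* 0 once the thread has begun exiting */
};

/* Signal the thread while holding the lock, so the tid cannot be
   given up (and reused) between the read and the tgkill call. */
static int guarded_kill(struct thread_desc *td, int sig)
{
    int ret = 0;

    assert(td != NULL);
    pthread_mutex_lock(&td->exit_lock);
    if (td->tid == 0)
        ret = ESRCH; /* assumption: report "no such thread" */
    else if (syscall(SYS_tgkill, getpid(), td->tid, sig) != 0)
        ret = errno;
    pthread_mutex_unlock(&td->exit_lock);
    return ret;
}

/* To be called by the exiting thread before it makes the exit syscall. */
static void guarded_exit_prepare(struct thread_desc *td)
{
    assert(td != NULL);
    pthread_mutex_lock(&td->exit_lock);
    td->tid = 0; /* from here on, guarded_kill never sees the old tid */
    pthread_mutex_unlock(&td->exit_lock);
}
```

The exit path must take the lock before the tid is released to the kernel. A sender then either signals while the target is still alive or sees tid == 0.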

Note that, as with almost all race conditions, this one will be extremely rare in most real-world situations. To make it measurable, one could exhaust all but one or two of the available pid values, possibly by lowering the max pid parameter in /proc (/proc/sys/kernel/pid_max), so that the same tid is reused rapidly.
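
As a starting point for such a measurement, the following sketch counts how many threads are created before a tid is seen again. The helper names are hypothetical, and it only observes tid reuse; it does not trigger the pthread_kill race itself. With the default pid range the count will normally exceed any small limit, which is why shrinking the range matters.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: store this thread's kernel tid in *out. */
static void *report_tid(void *out)
{
    *(pid_t *)out = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Create a thread, join it, and return the tid it ran with (-1 on failure). */
static pid_t spawn_and_get_tid(void)
{
    pthread_t th;
    pid_t tid = -1;

    if (pthread_create(&th, NULL, report_tid, &tid) != 0)
        return -1;
    pthread_join(th, NULL);
    return tid;
}

/* Return how many further threads were created before the first
   thread's tid showed up again, or -1 if that did not happen
   within limit attempts. */
static long threads_until_reuse(long limit)
{
    assert(limit > 0);

    pid_t first = spawn_and_get_tid();
    if (first < 0)
        return -1;
    for (long i = 1; i <= limit; i++)
        if (spawn_and_get_tid() == first)
            return i;
    return -1;
}
```

On older glibc versions this needs to be built with -pthread.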
Comment 1 Florian Weimer 2015-10-31 12:08:51 UTC
POSIX says:

“The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.”

How is this to be interpreted?  This way?

  TERMINATED && (CREATED-AS-DETACHED || DETACH-CALLED || JOIN-CALLED)

Or this way?

  (TERMINATED && CREATED-AS-DETACHED) || DETACH-CALLED || JOIN-CALLED

In the second case, pthread_detach and pthread_join could just clear the TID in the thread descriptor to avoid the race, before reaping the TID from the kernel.
Comment 2 Andreas Schwab 2015-10-31 12:37:26 UTC
If the second interpretation were the intended one, then the following paragraph would not have been necessary, since no function could be called on a detached thread.
Comment 3 Rich Felker 2015-10-31 20:27:15 UTC
The first interpretation is correct, but it does not matter, because there is no such thing as "reaping the tid". The tid is available for reuse immediately when the SYS_exit syscall is made by pthread_exit or equivalent.