Bug 3270

Summary: Setuid implementation has races and lockups
Product: glibc
Component: nptl
Version: 2.4
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Target Milestone: ---
Reporter: Daniel Jacobowitz <drow>
Assignee: Ulrich Drepper <drepper.fsp>
CC: fweimer, glibc-bugs, null, pasky, samandbernie, wade.colson
Flags: fweimer: security+
Host: x86_64-pc-linux-gnu
Target:
Build:
Last reconfirmed:
Attachments:
  Testcase.
  Patch.
  Updated patch

Description Daniel Jacobowitz 2006-09-27 15:37:10 UTC
I discovered a problem with the existing code for __nptl_setxid.  It can set the
setxid bit in cancelhandling for a thread, and then fail to send it a signal,
leading to a lockup in start_thread during thread exit.  This can happen when
the thread's stack has been allocated (under stack_cache_lock) but the thread
has not yet been created, so TID is not set in the thread descriptor.

Similarly, __nptl_setxid can miss a thread being created just before its parent
is signalled, leaving that thread with the wrong UID.  There were also minor
problems, e.g. setxid_futex was never reset so the exit behavior was different
if the thread had experienced at least one prior setxid event during its lifetime.
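
To make the first failure above concrete, here is a minimal single-threaded model of it. This is only a sketch: struct model_thread, MODEL_SETXID_BIT and the model_* functions are hypothetical stand-ins invented for illustration, not the glibc definitions, and the real code uses a signal handler and a futex wait where this model just clears or tests a bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the setxid bit in cancelhandling. */
#define MODEL_SETXID_BIT 0x40

/* Hypothetical, heavily simplified thread descriptor. */
struct model_thread {
    int tid;            /* 0 while only the stack has been allocated */
    int cancelhandling; /* only the setxid bit is modelled here */
};

/* Step 1 of the broadcast: mark the thread as having a setxid pending. */
static void model_mark(struct model_thread *t)
{
    assert(t != NULL);
    t->cancelhandling |= MODEL_SETXID_BIT;
}

/* Step 2: signal the thread.  Without a TID there is nothing to signal,
   so the bit stays set.  When a signal is delivered, the handler is
   modelled as clearing the bit right away. */
static bool model_signal(struct model_thread *t)
{
    assert(t != NULL);
    if (t->tid == 0)
        return false;
    t->cancelhandling &= ~MODEL_SETXID_BIT;
    return true;
}

/* Exit path: the thread waits for as long as the bit is still set. */
static bool model_exit_would_block(const struct model_thread *t)
{
    assert(t != NULL);
    return (t->cancelhandling & MODEL_SETXID_BIT) != 0;
}
```

In this model a descriptor with tid == 0 that has been marked is never signalled, so model_exit_would_block() stays true for it, which corresponds to the lockup in start_thread described above.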

I'll attach a patch and testcase.

Comment 1 Daniel Jacobowitz 2006-09-27 15:38:34 UTC
Created attachment 1329 [details]
Testcase.

This test illustrates the problem, but not reliably.  I have to run about
twenty copies of it in parallel; some of them will exit after 3000 iterations,
others will remain blocked with one thread in pthread_join.
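
The attachment itself is not reproduced in this report. As a rough sketch of a stress test in the same spirit, the code below overlaps thread creation and exit with repeated setuid calls. Everything in it is an assumption made for illustration: the names short_lived, uid_changer and run_setxid_stress are hypothetical, and setuid(getuid()) is used only because it leaves the identity unchanged while still taking the multi-threaded setxid path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body that exits at once, so creation and exit keep overlapping
   with the setxid broadcast running in the other thread. */
static void *short_lived(void *arg)
{
    (void)arg;
    return NULL;
}

struct changer_args {
    int iterations;
    int failures;
};

/* Repeatedly "changes" the UID to its current value and counts failures. */
static void *uid_changer(void *arg)
{
    struct changer_args *a = arg;
    uid_t uid = getuid();

    for (int i = 0; i < a->iterations; i++)
        if (setuid(uid) != 0)
            a->failures++;
    return NULL;
}

/* Returns 0 if every call succeeded.  On an affected glibc this may
   never return, because a pthread_join can block forever. */
static int run_setxid_stress(int iterations)
{
    assert(iterations > 0);

    struct changer_args args = { iterations, 0 };
    pthread_t changer;
    int errors = 0;

    if (pthread_create(&changer, NULL, uid_changer, &args) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pthread_t t;

        if (pthread_create(&t, NULL, short_lived, NULL) != 0) {
            errors++;
            continue;
        }
        if (pthread_join(t, NULL) != 0)
            errors++;
    }

    if (pthread_join(changer, NULL) != 0)
        errors++;
    return errors + args.failures;
}
```

Calling run_setxid_stress(3000) from main and running several copies in parallel would approximate the procedure described above; a single run finishing says little, since the race is timing dependent.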

Comment 2 Daniel Jacobowitz 2006-09-27 15:41:36 UTC
Created attachment 1330 [details]
Patch.

This patch fixes the problem; testsuite run on x86_64-pc-linux-gnu, no
regressions.  It makes the setuid path slightly slower but has no effect on the
non-setuid path, unlike my earlier attempts.

An earlier version of this patch with more assertions triggered this kernel
bug:
  http://bugzilla.kernel.org/show_bug.cgi?id=7210

A fix to that is not necessary for this version of the patch, but I recommend
it anyway.

Comment 3 Peter Watkins 2008-02-12 22:33:36 UTC
We seem to have hit this problem on our large cluster: when we run 5500 jobs
of "seq 10" without this patch, our slurm process manager hangs. With only
this patch added to glibc and no other changes, 200 runs of the 5500 parallel
"seq 10" jobs work OK.

Any chance this patch could be considered for a glibc release?

Comment 4 Vincent Arrat 2008-12-05 14:20:55 UTC
We are also encountering this problem with the product we provide.
I would like to know whether this issue is now fixed and, if so, in which
glibc version the fix was added.
We have customers using our product on Linux platforms with a glibc version
that contains this issue.
Thank you very much.

Comment 5 Andreas Schwab 2009-10-29 16:53:32 UTC
*** Bug 10184 has been marked as a duplicate of this bug. ***

Comment 6 Andreas Schwab 2009-10-29 16:55:09 UTC
Created attachment 4339 [details]
Updated patch

Comment 7 Ulrich Drepper 2009-10-30 08:01:25 UTC
I've applied the patch.  I don't like it but it can be changed later.

Comment 8 Wade Colson 2014-04-15 23:01:08 UTC
*** Bug 260998 has been marked as a duplicate of this bug. ***