15392 – Linux, setns, PID namespaces, fork: assert() about pid inequality hit sporadically

Bug 15392 - Linux, setns, PID namespaces, fork: assert() about pid inequality hit sporadically

Summary: Linux, setns, PID namespaces, fork: assert() about pid inequality hit sporadi...

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	nptl (show other bugs)
Version:	2.18

Importance:	P2 minor
Target Milestone:	2.25
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-04-23 18:16 UTC by Christian Seiler
Modified:	2017-04-20 11:18 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Attachments
Double unshare CLONE_NEWPID example (311 bytes, text/x-csrc) 2014-11-14 22:58 UTC, Ricky Zhou	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Christian Seiler 2013-04-23 18:16:16 UTC

In nptl/sysdeps/unix/sysv/linux/fork.c, as soon as the kernel syscall works, the following assertion is made: (line 141 in master branch)

      assert (THREAD_GETMEM (self, tid) != ppid);

There are two issues with that assertion:

 1. Since ppid is only defined if NDEBUG is not defined, from reading the
    source it appears to be that compiling with -DNDEBUG will never work.
    I didn't test that, just wanted to note that since I'm at it anyway.
    (This also applies to the other assertion in the sibling conditional
    branch)

 2. More importantly, the assertion is not always true. It is nearly always
    true, but there is one specific exception to this, namely in conjunction
    with PID namespaces:

If one wants to enter a pid namespace (Linux kernel 3.8+ or previous kernels with backported patches), one can use the setns(open("/proc/12345/ns/pid", O_RDONLY), CLONE_NEWPID) syscall to attach the current process to that pid namesapce. But the kernel doesn't really move the current process into that pid namespace, setns() will only cause the child processes to actually be in the selected pid namespace.

The common way of handling that is to immediately fork() after entering a pid
namespace. But that allows for the following situation (number are examples):

  Process    PID outside (i.e. in root pid ns)       PID in attached to pid ns
 ------------------------------------------------------------------------------
  parent       42                                       -
  child        108                                      42

In that case, the comparison pid before fork != pid after fork holds true if one compares pids within the same namespace - but the ppid variable is gathered from the outside namespace while the current pid is gathered from the inside namespace, so in the above example they are equal, even though they refer to different processes.

For the parent process, kernel calls to fork()/clone() will return the pid outside of the namespace (108 in this example), so waitpid() etc. work without a problem.

Since PIDs are assigned semi-randomly, this situation is hard to reproduce, attaching to namespaces will work the vast majority of times, but it might fail if the above coincidence happens.

This affects the nsenter utility from util-linux and also the lxc-attach utility from the lxc package - and possibly more. They will work most of the time but in some rare cases they will fail needlessly, stumbling over the assertion in fork().

The obvious solution is just to use clone() after setns() and never use fork() - and one can certainly patch both programs to do so. Nevertheless it would be nice to see if fork() also worked after setns(), especially since there is no inherent reason for it not to.

I see four possible ways to proceed:

 a) Remove the assertion altogether

 b) Also provide a wrapper for setns() that sets a global flag if a PID
    namespace was entered. If so, skip the assertion, otherwise keep it.

 c) assert that
       EITHER old pid and new pid are unequal (current assertion)
       OR getppid() returns 0 (that is the case when setns was called
                               in the parent process before fork())

 d) Say that this is expected behavior and document that after setns()
    one should only do clone() and never fork().

Comment 1 Andreas Schwab 2013-04-23 18:58:37 UTC

If NDEBUG is defined then assert expands to nothing.

Comment 2 Ricky Zhou 2014-11-14 22:58:03 UTC

Created attachment 7938 [details]
Double unshare CLONE_NEWPID example

I've run into this assertion as well. I've attached a small test program that trips it using unshare(CLONE_NEWPID). In this case, the issue is that the parent process has pid 1 inside its pid namespace, but since the child is created in a new pid namespace, it also has pid 1.

Will try to send a patch to get rid of the assert.

Comment 3 Torvald Riegel 2014-11-17 10:37:59 UTC

Synchronization facilities such as some of the PThreads mutexes rely on thread IDs being unique.  I believe we should deal with this first, and drop the assertion afterwards.

Comment 4 Florian Weimer 2017-04-20 11:17:55 UTC

Fixed in glibc 2.25.

Comment 5 Florian Weimer 2017-04-20 11:18:18 UTC

*** Bug 21386 has been marked as a duplicate of this bug. ***