This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] [BZ #15392] Remove fork child pid assertion

On 11/18/2014 08:51 PM, Ricky Zhou wrote:
> Ah, I see, I didn't know about process-shared mutexes - those
> definitely won't work across PID namespaces (and this should
> definitely be documented). Do you know of any other places in glibc
> where we may be assuming that PIDs are unique between processes that
> may be in different PID namespaces? I think it'd be useful to have a
> list of these.

PID uniqueness is a requirement of POSIX; everything can rely on it.

Documenting the functions that depend on it would limit the
implementation's flexibility.

PID uniqueness holds *before* you enter the namespace and again
*after* you enter it. The only problem is the child's transition
between the two.

We should clearly define *how* you transition and limit that transition
to a simple set of APIs.

> I'm a little bit confused about why we'd want to disallow fork after
> setns/unshare (since this is likely to be a common pattern that's
> already in use). I'm not super familiar with glibc internals/APIs, but
> are there many other instances where we rely on the uniqueness of PIDs
> between two different processes apart from process-shared
> mutexes/condvars/barriers? Would it be possible to document the
> specific APIs that won't work across PID namespaces instead of
> forbidding fork after setns/unshare with CLONE_NEWPID?

You are asking to support a POSIX API with a set of semantics that
violate the contract of that API. I would like to avoid this.

Using clone is fine, since clone has none of the POSIX requirements
that fork and vfork have.

I would like to see one well documented way to transition to the child
safely after which you should be able to continue calling libc functions
including fork and vfork.

Today, the easiest solution is:
"After setns/unshare you must call clone, followed immediately by exec."

The exec starts the process over again in the new namespace and everything
works fine. The problem is that this is resource intensive and not what
users want.
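
The clone-then-exec rule could be sketched roughly as below. This is only
an illustration under assumptions, not anything glibc provides: the helper
names clone_then_exec and run_in_new_pid_namespace are hypothetical, the
raw SYS_clone call assumes x86-64 Linux (the argument order differs on
some architectures), and unshare(CLONE_NEWPID) normally needs
CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a fork-like child with the raw clone
   syscall and exec argv[0] in it immediately.  Assumes the x86-64
   argument order (flags first, then the child stack pointer). */
pid_t clone_then_exec(char *const argv[])
{
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);
    if (pid == 0) {
        /* Child: call nothing but exec and _exit. */
        execv(argv[0], argv);
        _exit(127);            /* exec failed */
    }
    return (pid_t)pid;         /* child PID in the parent, -1 on error */
}

/* Hypothetical helper: put future children into a new PID namespace,
   run argv there, and return its exit status (-1 on any failure). */
int run_in_new_pid_namespace(char *const argv[])
{
    int status;

    if (unshare(CLONE_NEWPID) != 0)
        return -1;             /* typically EPERM without CAP_SYS_ADMIN */

    pid_t child = clone_then_exec(argv);
    if (child < 0 || waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child execs at once, it never runs libc code that might still
hold state from the old namespace. The price is a full exec per child.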

> By the way, I am also interested in improving or extending the clone
> API as well (let me know if you'd rather split this into a separate
> thread). Two ideas that would solve our issues with the current clone
> wrapper:

Then let us go in this direction: make clone better, and recommend
it as the only way to create a new process after setns/unshare.

> 1) Provide an interface to reset the PID cache (this would allow us to
> use the syscall directly).

Why do you need this if you have a version of clone that resets it for you?

> 2) Provide an alternate fork-like version of the clone wrapper. This
> version would not take a child_stack, and would enforce that CLONE_VM
> is not set. Aside from this, it would invalidate the PID cache and
> perform the syscall.

I think a new fork-like clone wrapper should be acceptable. We'll have
to iterate on the design a bit, but I don't object to the idea.

We could document it as being the one safe way to create a new process
in the new namespace.
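
To make the shape of such a wrapper concrete, here is a rough sketch
written outside glibc. The names forklike_clone and forklike_exit_status
are hypothetical, the raw SYS_clone call assumes x86-64 Linux, and the
sketch cannot touch glibc's internal PID cache, so it only shows the
interface: no child_stack argument, and CLONE_VM refused.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fork-like clone: takes CLONE_* flags only.
   Returns 0 in the child, the child's PID in the parent, -1 on error. */
pid_t forklike_clone(unsigned long flags)
{
    /* Refuse shared address spaces and caller-chosen exit signals. */
    if (flags & (CLONE_VM | CSIGNAL)) {
        errno = EINVAL;
        return -1;
    }
    /* A real glibc wrapper would also reset its cached PID for the
       child; this sketch cannot reach that internal state. */
    return (pid_t)syscall(SYS_clone, flags | SIGCHLD, 0UL, 0UL, 0UL, 0UL);
}

/* Demo: the child exits with `code`; the parent returns what it saw. */
int forklike_exit_status(unsigned long flags, int code)
{
    int status;
    pid_t child = forklike_clone(flags);

    if (child == 0)
        _exit(code);
    if (child < 0 || waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With no stack argument the child continues on a copy-on-write copy of
the parent's stack, which is what makes the call fork-like and is also
why CLONE_VM has to be rejected.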

