This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 3/3] posix: New Linux posix_spawn{p} implementation
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, libc-alpha at sourceware dot org
- Cc: nd at arm dot com
- Date: Tue, 12 Apr 2016 17:20:56 -0300
- Subject: Re: [PATCH 3/3] posix: New Linux posix_spawn{p} implementation
- Authentication-results: sourceware.org; auth=none
- References: <1456770820-21341-1-git-send-email-adhemerval dot zanella at linaro dot org> <1456770820-21341-4-git-send-email-adhemerval dot zanella at linaro dot org> <570D390D dot 1050708 at arm dot com> <570D3B2D dot 8050505 at linaro dot org>
On 12-04-2016 15:15, Adhemerval Zanella wrote:
>
>
> On 12-04-2016 15:06, Szabolcs Nagy wrote:
>> On 29/02/16 18:33, Adhemerval Zanella wrote:
>>> This patch implements a new posix_spawn{p} implementation for Linux. The main
>>> difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK
>>> flags and a direct allocated stack. The new stack and start function solves
>>> most the vfork limitation (possible parent clobber due stack spilling). The
>>> remaning issue are related to signal handling:
>>>
>>> 1. That no signal handlers must run in child context, to avoid corrupt
>>> parent's state.
>>> 2. Child must synchronize with parent to enforce stack deallocation and
>>> to possible return execv issues.
>>>
>>> The first one is solved by blocking all signals in child, even NPTL-internal
>>> ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation
>>> in parent and a synchronization with using a pipe or waitpid (in case or error).
>>> The pipe has the advantage of allowing the child signal an exec error (checked
>>> with new tst-spawn2 test).
>>>
>>> There is an inherent race condition in pipe2 usage for architectures that do not
>>> support the syscall directly. In such cases the a pipe plus fctnl is used
>>> instead and it may lead to file descriptor leak in parent (as decribed by fcntl
>>> documentation).
>>>
>>> The child process stack is allocate with a mmap with MAP_STACK flag using
>>> default architecture stack size. Although it is slower than use a stack buffer
>>> from parent, it allows some slack for the compatibility code to run scripts
>>> with no shebang (which may use a buffer with size depending of argument list
>>> count).
>>>
>>> Performance should be similar to the vfork default posix implementation and
>>> way faster than fork path (vfork on mostly linux ports are basically
>>> clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls
>>> required for the stack allocation/deallocation.
>>>
>>> It fixes BZ#10354, BZ#14750, and BZ#18433.
>>>
>>> Tested on i386, x86_64, powerpc64le, and aarch64.
>>>
>>
>> on aarch64 this caused
>>
>> FAIL: nptl/tst-exec1
>>
>> with error msg "join in thread in parent returned!?"
>>
>> i think the cause is that glibc clone sets
>> tcb.tid = tcb.pid = -1 if CLONE_VM|CLONE_VFORK
>> is used, which means the vfork child clobbers
>> the parent tcb.tid. (and pthread_join tests
>> tcb.tid and thus fails in tst-exec1.)
>> i believe all targets are affected.
>
> I think the problem is glibc's vfork only changes the TCB's pid fields
> while fork changes both pid and tid in case on CLONE_VM. Now I am trying
> to found out why exactly this difference handling of internal fields...
In fact I think clone.S should not reset the tid member in TCB for
fork(CLONE_VM|...) and it is what vfork does internally. It is
used on INVALID_TD_P and INVALID_NOT_TERMINATED_TD_P macros used on
pthread_join and pthread_cancel. A value equal to -1 will issue
an error for such case, which mean that if a multithread program
issues a clone(CLONE_VM|...) from a posix_spawn, a subsequent
pthread_join might fail due invalid tid value.
And I think it is what is happening in your system on tst-exec1:
although pthread_create is issue before posix_spawn, I think
posix_spawn clone might be executing before the pthread_join
in some cases leading to this issue.
I will propose a patch to modify clone to not reset the tid value.