This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [PATCH] posix: Optimize Linux posix_spawn



On 07/05/2019 12:38, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 07/05/2019 05:33, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> The current internal posix_spawn symbol for Linux (__spawni) requires
>>>> allocating a dynamic stack based on the input arguments to handle the
>>>> SPAWN_XFLAGS_USE_PATH internal flag, which re-issues the input binary
>>>> as a shell script if the execve call returns ENOEXEC (to execute shell
>>>> scripts with an initial shebang).
>>>>
>>>> This is done only for the compatibility mode, and the generic case does
>>>> not require the extra calculation plus the potentially large mmap/munmap
>>>> call.  For the default case, a pre-defined buffer suffices for the
>>>> clone call instead.
>>>>
>>>> This patch optimizes the Linux spawni by allocating a dynamic stack
>>>> only for the compatibility symbol (SPAWN_XFLAGS_USE_PATH).  For the
>>>> generic case, a stack-allocated buffer is used instead.
>>>
>>> I'm still not convinced that stack switching is required here.
>>
>> - Performance-wise it would be just a stack increment (which tends to
>>   be faster than an mmap/munmap pair),
>>
>> - It also should work with any stack hardening (--enable-stack-protector=all
>>   shows no issue), and the helper uses the calling process's stack as well
>>   (so an overflow should trigger a fault due to the guard pages).
> 
> No, since there is no guard page as part of the on-stack array, we do
> not detect any overflow into other parts of the stack frame if it
> happens to be stored below the frame.  So this still disables some of
> the hardening checks.

Using a stack-allocated array is essentially the same as using the caller's
stack: a stack overflow in the created helper process will trigger a fault
based on the guard page of the calling thread (since the stack object is
allocated on the caller's thread stack).  I grant you this is sketchy
standard-wise.

To fully provide stack hardening we would need to use the current mmap
mechanism with a guard page as well, maybe by reusing the allocatestack
code from pthreads.  However, I am really doubtful that is a net gain in
this case, since the semantics are that the new process still runs in the
caller's memory context, and stack overflows should be checked based on
the caller's context (i.e., a stack overflow should be caught by the
caller's guard page).

> 
>>   It also should trigger the stack overflow earlier, since the stack is
>>   allocated prior to the helper process usage.
> 
> That is true, but we won't know if the region is large enough.

That's why I think we should rely on the caller's stack hardening and
treat the helper process as a function call.  I still think the dynamic
path for compatibility mode would be better served by mmap/munmap (maybe
adding a guard page as well).

> 
>> - We do not enable cancellation, nor does it use PLT resolution, so stack
>>   usage should be limited to roughly what the called functions use (such
>>   as the execvpe provision).
> 
> Hmm.  Right, we should not trigger lazy binding if we applied PLT
> avoidance successfully.  So this is not a problem here.
> 
>> - It does not require an arch-specific way to query the stack pointer
>>   (__builtin_frame_address (0) should work, but it would require some
>>   testing to see that it is safe on all supported architectures, and at
>>   least for ia64 we would need to set up an artificial maximum size,
>>   which might overflow depending on the stack usage at the call site).
> 
> It's also not clear to me how we could compute a safe stack adjustment
> from C code.

The stack slack is indeed based on a heuristic, as we are currently doing
while adjusting the mmap size value.

> 
>> I am more inclined to the last point: CLONE would be a function call, and
>> it will itself require some stack allocation in most ABIs.  To add a
>> really safe clone call, we would need to add a variant of clone which uses
>> the current stack pointer for the clone call itself, rather than taking it
>> as an argument.  And I really think this is overkill for this change,
>> taking into consideration the little gain for this specific usage and the
>> required work.
> 
> I agree that this work should be undertaken as part of a clone_samestack
> or similar implementation.

But do you think it should be a blocker for this change?

