PING: V7 [PATCH] sysconf: Add _SC_MINSIGSTKSZ/_SC_SIGSTKSZ [BZ #20305]

Thu Nov 19 19:39:18 GMT 2020

On Thu, Nov 19, 2020 at 05:33:57PM +0000, Szabolcs Nagy via Libc-alpha wrote:
> The 11/19/2020 16:37, Dave Martin wrote:
> > On Thu, Nov 19, 2020 at 02:59:34PM +0000, Szabolcs Nagy via Libc-alpha wrote:
> > > the point is not to restart the critical section
> > > (which would require no side effect or mechanisms
> > > to roll side effects back and that the section is
> > > entirely written in asm between begin/end labels
> > > so the kernel knows when the section is left),
> > > 
> > > but to let the critical section with all its side
> > > effects complete and delay the signal handler
> > > until then. (the slow and easy way to do this is
> > > masking signals using syscalls around critical
> > > sections, a fast solution needs signal wrapping
> > > and saving the sigcontext.)
> > > 
> > > for example the entire malloc call can be a critical
> > > section and an incoming signal delayed until malloc
> > > completes. such solution allows hiding all libc
> > > internal inconsistent state from user code so async
> > > signal handlers can call all libc apis.
> > 
> > Isn't this a bit backwards?  This "makes" trivial signal handlers easier
> > to write, but this is a bit of a Trojan horse: precisely because signal
> > handlers can interrupt things, subtleties abound.  So, while there are
> > plenty of naive signal handlers out there, there are far fewer that are
> > genuinely trivial -- i.e., free from subtleties.
> > 
> > In any case, the problem of async signal safety remains: even if libc
> > uses internal locks to hide it, library functions in general may not.
> > 
> > A better approach would be to have function attributes that identify
> > code that may run in signal context and async-signal-safe functions, so
> > that the compiler can actually enforce that only reentrant functions are
> > called from signal context.
> 
> this does not solve any of the real problems with
> exposed libc internal state to signal handlers:
> 
> - linux has serious bugs that libc has to work around
> (e.g. missing abort syscall: abort and execve must be
> as-safe but abort must change SIGABRT handling that
> must not be inherited by execve and execve cannot
> block signals because that must not be inherited either.)

I can see the argument here, but this feels more like an implementation
detail of the execve() wrapper and libc's abort() implementation (?)

I guess this is analogous to spin_lock_irq() in the kernel: you take the
lock, but an irq that may be handled in the same (hardware) thread must
not attempt to take the same lock, since the spinlock code is not itself
reentrant for a given lock.  So, in the main thread you must also mask
IRQs before taking the lock.

> - signal handler can arbitrarily delay another thread
> (because it may happen when libc internal lock is
> held and user code in signal handlers may not return
> immediately so a handler in one thread can break
> real-time guarantees of another purely because of
> libc internal details that must not be observable).
> 
> - there can be libc operations that must be as-safe
> but internally need to do non-as-safe operations
> (e.g. tls access must be as-safe but in glibc it
> may allocate or take internal locks.)

Seems reasonable.  I was concerned that these critical sections might be
long, potentially sleeping sequences of code.  It sounds like that's
definitely not the intention, so I guess it's workable.

The need to call sigprocmask() does suck here, and a way to get that
effect purely within userspace would be preferable.
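
A rough sketch of what a purely userspace deferral could look like, to
make the idea concrete.  All names here are hypothetical, and this is a
simplification: it is single-threaded, remembers only one pending signal,
and does not save the siginfo or sigcontext that a real signal wrapper
would need to preserve:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static volatile sig_atomic_t in_critical;    /* nonzero inside a section */
static volatile sig_atomic_t pending_signo;  /* deferred signal, 0 if none */
static volatile sig_atomic_t handled_count;  /* times the user handler ran */

/* Stand-in for the user's handler. */
static void real_handler(int signo)
{
    (void)signo;
    handled_count++;
}

/* Wrapper installed in place of the user's handler: inside a critical
 * section it only records the signal and returns immediately. */
static void wrapper_handler(int signo)
{
    if (in_critical) {
        pending_signo = signo;
        return;
    }
    real_handler(signo);
}

void critical_enter(void)
{
    in_critical = 1;
    atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier only */
}

void critical_leave(void)
{
    atomic_signal_fence(memory_order_seq_cst);
    in_critical = 0;
    if (pending_signo) {
        int signo = pending_signo;
        pending_signo = 0;
        raise(signo);                /* replay the deferred signal */
    }
}

int install_wrapper(int signo)
{
    struct sigaction sa;

    sa.sa_handler = wrapper_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

With this, critical_enter() and critical_leave() cost a couple of stores
instead of two syscalls, at the price of wrapping every handler.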

> - compiler can generate non-as-safe libc calls into
> signal handler code (e.g. memcpy is not required to
> be as-safe but compiler can generate it, but a more

But presumably there is an understanding between the compiler and libcs
that it targets that the function called for out-of-line memcpy()s must
be as-safe.  By definition compiler output is not portable -- it assumes
a particular runtime environment.  So the lack of guaranteed as-safety
for memcpy() in the standards is not necessarily an issue here.

(I wonder how many memcpy() implementations are really non-as-safe
though.  I guess that could happen if some kind of accelerator were used
for large copies.)

> realistic example is the various sanitizer
> instrumentations that break as-safety because they print
> diagnostic messages on failures).

Well, you can do this in an as-safe way.  But it is unfortunate not to
be able to use the usual C functions to do the printing.
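
For illustration, a minimal sketch of the as-safe approach: format the
number by hand and emit it with write(), which is on the POSIX list of
async-signal-safe functions.  fmt_ulong() and diag_signal() are
hypothetical helpers, not existing libc or sanitizer interfaces:

```c
#include <stddef.h>
#include <unistd.h>

/* Format v as decimal into buf (capacity n).  Returns the number of
 * characters written, excluding the terminating NUL, or 0 if buf is
 * too small.  Uses no libc calls and no locale data. */
size_t fmt_ulong(char *buf, size_t n, unsigned long v)
{
    char tmp[3 * sizeof v];          /* more than enough decimal digits */
    size_t len = 0, i;

    do {                             /* digits come out least significant first */
        tmp[len++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (len + 1 > n)
        return 0;
    for (i = 0; i < len; i++)        /* reverse into the caller's buffer */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}

/* Print "fatal signal N" to stderr using only write(). */
void diag_signal(int signo)
{
    static const char prefix[] = "fatal signal ";
    char num[3 * sizeof(unsigned long) + 1];
    size_t len = fmt_ulong(num, sizeof num, (unsigned long)signo);

    if (write(STDERR_FILENO, prefix, sizeof prefix - 1) < 0)
        return;
    if (write(STDERR_FILENO, num, len) < 0)
        return;
    if (write(STDERR_FILENO, "\n", 1) < 0)
        return;
}
```

A handler would call diag_signal(signo) and then _exit() or re-raise the
signal with the default disposition.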

I generally assume that the sprintf() family of formats are at least
safe if you're not using locales or custom format specifiers.  I
probably shouldn't though, strictly speaking.

I suspect that unsafely printing diagnostics from handlers for fatal
signals is rather common in the wild, even if the standards say you
mustn't do it.

> - there are existing slow signal mask operations
> around critical sections in libc which can be
> improved by the proposed design.
> 
> > Finally, if a fault signal is delivered while blocked or ignored it
> > kills the process.  So handlers for fault signals raised by the kernel
> > still wouldn't be able to call random libc functions: to prevent sudden
> > death while in the middle of malloc etc., libc must not mask these
> > signals, and wouldn't be safely reentrant while handling them -- thus we
> > still have the problem we intended to solve.
> 
> synchronous faults cannot happen in libc unless
> the caller invoked ub so anything goes.
> 
> (i.e. not delaying those signals is fine)

Debatable.  The whole point of handling a fatal signal is to clean up
mess and/or capture diagnostic data.  By definition the process may be
in a partially unknown state by this point, but best efforts should
still be made to handle the signal -- otherwise handling the signal
doesn't make sense at all.  It doesn't seem right for this to fail
unpredictably depending on where the signal landed.

Cheers
---Dave