TLS and Signals
In general, access to TLS is not asynchronous-signal safe.
There are two potentially dangerous scenarios:
Scenario 1: Non-atomic initialization:
One specific case that causes problems is access to a TLS variable from a signal handler that is part of a dynamic shared object loaded via dlopen. Such accesses to TLS variables must use the general dynamic (GD) model, and that model supports lazy initialization. If a thread is interrupted in the middle of initializing its TLS by a signal whose handler uses TLS, the handler may fault because TLS is only partially initialized.
Scenario 2: Asynchronous-signal unsafe functions called during lazy initialization:
Another case appears to be especially easy to trigger with SIGPROF profiling:
- There is an existing thread.
- The program does a dlopen of a shared library that uses TLS.
- The shared library installs a signal handler that refers to a TLS variable.
- The signal handler is called on the existing thread.
- Because shared library TLS variables are installed lazily, the existing thread does not yet have a copy of the TLS variable.
- Therefore the signal handler's TLS reference causes a call to malloc.
- If the signal interrupted a call to malloc, which is not asynchronous-signal safe, the nested malloc waits for a lock the interrupted call already holds, and we have a deadlock.
Upstream discussions:
The original discussion of the issues was started by Paul Pluzhnikov from Google:
https://sourceware.org/ml/libc-alpha/2012-06/msg00335.html
The discussion has been resurrected here:
https://sourceware.org/ml/libc-alpha/2013-09/msg00563.html
Design Considerations
- Discuss some alternative solutions to the problem at hand and why they were rejected.
- Are there any ISO C11 implications?
- ISO C11 wording in 7.14.1.1 p5:
{{{If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t, or the signal handler calls any function in the standard library other than the abort function, the _Exit function, the quick_exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler. Furthermore, if such a call to the signal function results in a SIG_ERR return, the value of errno is indeterminate.252) }}}
- Will our TLS variables become lock-free atomic objects?
- Post the patches for an implementation and allow time for upstream review and testing on various architectures.
- Provide detailed performance implications of the patches. Preferably, include data showing that standard uses of TLS are not negatively impacted.
- Provide changes to the manual explaining that accessing thread-local storage is now async-signal safe and will remain so going forward.
- Consider providing some way to ensure that old programs using TLS in a signal handler fail safe. If this cannot be ensured, explain why not.
- Similarly, provide symbol versioning to prevent a new program from running on an old glibc that doesn't provide AS-safe TLS variables.
- Provide proof that another arriving signal that interrupts the dynamic TLS setup won't cause the setup to fail.
- Provide proof that another thread calling fork that interrupts the dynamic TLS setup won't cause the setup to fail (similar to re-entrancy requirement, but should be considered separately).
- Provide proof that an asynchronous cancellation of the thread in the signal handler doing the dynamic TLS setup won't cause the setup to fail (again similar to re-entrancy ...).