This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Async signal safe TLS accesses
- From: Alexandre Oliva <aoliva at redhat dot com>
- To: Paul Pluzhnikov <ppluzhnikov at google dot com>, Andrew Hunter <ahh at google dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Thu, 28 Nov 2013 23:03:36 -0200
- Subject: Re: [PATCH] Async signal safe TLS accesses
- Authentication-results: sourceware.org; auth=none
- References: <1380830518-16721-1-git-send-email-ahh at google dot com> <1382477766-15770-1-git-send-email-ahh at google dot com> <Pine dot LNX dot 4 dot 64 dot 1310222145490 dot 13258 at digraph dot polyomino dot org dot uk> <CALoOobMsO6X86JFD4J7F-EL-J+xOTEOVbzH=6mwrvfCnFvw57Q at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1311052233090 dot 30260 at digraph dot polyomino dot org dot uk> <CALoOobM70-mix+=1zuDnSoK7SRqQChbL=03xBkUcFf1fyS1Mjw at mail dot gmail dot com> <CALoOobP7kdpZCZA0a7MZWCtONu81fW4H_qAWOEfpfvzxJgG_=Q at mail dot gmail dot com> <CALoOobP6rTDosadvLKhHY+deDsU-FtvyO8QX_Y4dZy716e2ATQ at mail dot gmail dot com> <CALoOobOCT-inwMZkzEr+JYT4c8qwpN-EGMPFu_kHQTpc2icj0g at mail dot gmail dot com> <CALoOobPHo7+jG0nfiZp9afC2rArLUMRYZEag21W+78UBTZF=CQ at mail dot gmail dot com>
On Nov 19, 2013, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> On Wed, Nov 13, 2013 at 7:45 AM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
>> Ping?
>> https://sourceware.org/ml/libc-alpha/2013-11/msg00183.html
> Ping? Ping?
Apologies for the delay in this review.
I'm a bit disappointed at how wasteful the allocation strategy is,
wasting a full page per thread even for a one-word TLS dynamic block. I
think using a TLS-only malloc arena, only locked with signals disabled,
would bring us much better memory efficiency, although there might be a
significant penalty for assigning dynamic segments of different threads
to the same page, should they happen to be accessed very often. This
page sharing by different threads is something we don't necessarily
avoid with the current AS-Unsafe allocation scheme, and avoiding that is
an improvement that the current strategy brings, but I wonder if we
could find some trade off that wouldn't waste so much memory but that
would keep segments used by different threads in separate pages... This
first comment is not meant as a blocker against the page, but as a
suggestion for future improvement.
Since __signal_safe_malloc is guaranteed to return just-mmapped memory,
and that is always zeroed out by the kernel, there's no real need for
__signal_safe_calloc to dirty the pages scribbling over them with
userland zeros; they're no different from kernel zeros ;-)
I was under the impression that one of the main problems with making TLS
AS-Safe was the fact that in some cases of TLS dynamic address
resolution we'd take the dl lock, and if we take a signal while we hold
the lock, and the signal handler attempts to resolve the same address
and take the lock, we'd deadlock if the lock was non-recursive, or risk
finding inconsistent state otherwise. The lock is recursive, but I
don't see anything in the patch besides the CAS loops that avoids
inconsistent state, particularly in the case of lazy TLS relocation
(introduced with GNU2 TLS). Was that not taken into account?
It seems to me that, in the absence of locking,
_dl_try_allocate_static_tls is racy, and in the presence of locking, the
CAS loop would be unnecessary. The race is because offset is computed
from GL(dl_tls_static_used) once, and if l->map_tls_offset could change
between that and its update, then so could GL(dl_tls_static_used),
requiring offset to be recomputed based on the modified value. But
GL(dl_tls_static_used) would have to be updated atomically as well,
otherwise concurrent runs for different maps could end up at the same
offset. The code doesn't take such cares, so I wonder if there's
evidence we don't need to worry about that and, if so, why do we still
need the CAS loop. Even if we do hold a lock to prevent other threads
from changing dl_tls_static_used concurrently, couldn't a signal handler
that interrupts _dl_try_allocate_static_tls change it? If so, the same
âraceâ applies, otherwise, I don't see why we need even a single CAS,
let alone a CAS loop.
I'm somewhat concerned about priority inversion in the atomic_delay()
and CAS loops. Considering that the locks the patch drops are all
recursive, they won't pose an AS-Safety problem; if we keep them in
place, but take the measures that have apparently been taken to avoid
the problems of using inconsistent state when a handler interrupts a
lock-holding operation, and of using outdated state after a signal
handler modifies it, it should all work, and then it would be clear that
the CAS loops are only in place for the case of functions being
interrupted by signal handlers, not to deal with inter-thread
concurrency. With the locks in place, we could at least get rid of the
atomic_delay() loop.
This is all I got for now. Thank you both for your work on this issue.
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer