This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Async signal safe TLS accesses


On Thu, Nov 28, 2013 at 11:03:36PM -0200, Alexandre Oliva wrote:
> On Nov 19, 2013, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> 
> > On Wed, Nov 13, 2013 at 7:45 AM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> >> Ping?
> >> https://sourceware.org/ml/libc-alpha/2013-11/msg00183.html
> 
> > Ping? Ping?
> 
> Apologies for the delay in this review.
> 
> I'm a bit disappointed at how wasteful the allocation strategy is,
> wasting a full page per thread even for a one-word TLS dynamic block.  I
> think using a TLS-only malloc arena, only locked with signals disabled,
> would bring us much better memory efficiency, although there might be a
> significant penalty for assigning dynamic segments of different threads
> to the same page, should they happen to be accessed very often.  This
> page sharing by different threads is something we don't necessarily
> avoid with the current AS-Unsafe allocation scheme, and avoiding that is
> an improvement that the current strategy brings, but I wonder if we
> could find some trade off that wouldn't waste so much memory but that
> would keep segments used by different threads in separate pages...  This
> first comment is not meant as a blocker against the page, but as a
> suggestion for future improvement.
>
A proper solution would be make generic malloc async-signal safe. It is 
question of what is acceptable overhead. I can make a wrapper over any
allocator that does that at cost of two tls access. You would pay a mmap
overhead only when you are in signal where we do not care about
effectivity.

As I plan to use a thread-local cache for small allocations which
eliminates most of this cost when one has cheap tls.

I attached a signal-safe part that I have now, it is part of alternative
allocator I now work.

As for freeing a malloc already uses mmap for allocation bigger than
mmap_treshold so it is possible with filling appropriate header use normal free.
> 
> I was under the impression that one of the main problems with making TLS
> AS-Safe was the fact that in some cases of TLS dynamic address
> resolution we'd take the dl lock, and if we take a signal while we hold
> the lock, and the signal handler attempts to resolve the same address
> and take the lock, we'd deadlock if the lock was non-recursive, or risk
> finding inconsistent state otherwise.  The lock is recursive, but I
> don't see anything in the patch besides the CAS loops that avoids
> inconsistent state, particularly in the case of lazy TLS relocation
> (introduced with GNU2 TLS).  Was that not taken into account?
>
In theory, in practice these are much less probable than malloc problem as 
these happen in initialization and it takes time signal handler can
access object that will cause tls access.

A lazy relocation is one of dubious features, did somebody tried to
measure if these actually save anything? I would be interested in ratio
of resolved / symbols left unresolved.

Copying mostly unused tls entries could be a problem but so can be
repeated locks and allocations when these entries are actually used.


> 
> It seems to me that, in the absence of locking,
> _dl_try_allocate_static_tls is racy, and in the presence of locking, the
> CAS loop would be unnecessary.  The race is because offset is computed
> from GL(dl_tls_static_used) once, and if l->map_tls_offset could change
> between that and its update, then so could GL(dl_tls_static_used),
> requiring offset to be recomputed based on the modified value.  But
> GL(dl_tls_static_used) would have to be updated atomically as well,
> otherwise concurrent runs for different maps could end up at the same
> offset.  The code doesn't take such cares, so I wonder if there's
> evidence we don't need to worry about that and, if so, why do we still
> need the CAS loop.  Even if we do hold a lock to prevent other threads
> from changing dl_tls_static_used concurrently, couldn't a signal handler
> that interrupts _dl_try_allocate_static_tls change it? 

See _dl_mask_all_signals (&old);

> If so, the same
> âraceâ applies, otherwise, I don't see why we need even a single CAS,
> let alone a CAS loop.
> 
> 
> I'm somewhat concerned about priority inversion in the atomic_delay()
> and CAS loops.  Considering that the locks the patch drops are all
> recursive, they won't pose an AS-Safety problem; if we keep them in
> place, but take the measures that have apparently been taken to avoid
> the problems of using inconsistent state when a handler interrupts a
> lock-holding operation, and of using outdated state after a signal
> handler modifies it, it should all work, and then it would be clear that
> the CAS loops are only in place for the case of functions being
> interrupted by signal handlers, not to deal with inter-thread
> concurrency.  With the locks in place, we could at least get rid of the
> atomic_delay() loop.
> 
> 
> This is all I got for now.  Thank you both for your work on this issue.
> 
> -- 
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer

-- 

Support staff hung over, send aspirin and come back LATER.

Attachment: signal_safe.c
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]