This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Async signal safe TLS accesses

On Thu, Nov 28, 2013 at 11:03:36PM -0200, Alexandre Oliva wrote:
> On Nov 19, 2013, Paul Pluzhnikov <> wrote:
> > On Wed, Nov 13, 2013 at 7:45 AM, Paul Pluzhnikov <> wrote:
> >> Ping?
> >>
> > Ping? Ping?
> Apologies for the delay in this review.
> I'm a bit disappointed at how wasteful the allocation strategy is,
> wasting a full page per thread even for a one-word TLS dynamic block.  I
> think using a TLS-only malloc arena, only locked with signals disabled,
> would bring us much better memory efficiency, although there might be a
> significant penalty for assigning dynamic segments of different threads
> to the same page, should they happen to be accessed very often.  This
> page sharing by different threads is something we don't necessarily
> avoid with the current AS-Unsafe allocation scheme, and avoiding that is
> an improvement that the current strategy brings, but I wonder if we
> could find some trade off that wouldn't waste so much memory but that
> would keep segments used by different threads in separate pages...  This
> first comment is not meant as a blocker against the page, but as a
> suggestion for future improvement.
A proper solution would be make generic malloc async-signal safe. It is 
question of what is acceptable overhead. I can make a wrapper over any
allocator that does that at cost of two tls access. You would pay a mmap
overhead only when you are in signal where we do not care about

As I plan to use a thread-local cache for small allocations which
eliminates most of this cost when one has cheap tls.

I attached a signal-safe part that I have now, it is part of alternative
allocator I now work.

As for freeing a malloc already uses mmap for allocation bigger than
mmap_treshold so it is possible with filling appropriate header use normal free.
> I was under the impression that one of the main problems with making TLS
> AS-Safe was the fact that in some cases of TLS dynamic address
> resolution we'd take the dl lock, and if we take a signal while we hold
> the lock, and the signal handler attempts to resolve the same address
> and take the lock, we'd deadlock if the lock was non-recursive, or risk
> finding inconsistent state otherwise.  The lock is recursive, but I
> don't see anything in the patch besides the CAS loops that avoids
> inconsistent state, particularly in the case of lazy TLS relocation
> (introduced with GNU2 TLS).  Was that not taken into account?
In theory, in practice these are much less probable than malloc problem as 
these happen in initialization and it takes time signal handler can
access object that will cause tls access.

A lazy relocation is one of dubious features, did somebody tried to
measure if these actually save anything? I would be interested in ratio
of resolved / symbols left unresolved.

Copying mostly unused tls entries could be a problem but so can be
repeated locks and allocations when these entries are actually used.

> It seems to me that, in the absence of locking,
> _dl_try_allocate_static_tls is racy, and in the presence of locking, the
> CAS loop would be unnecessary.  The race is because offset is computed
> from GL(dl_tls_static_used) once, and if l->map_tls_offset could change
> between that and its update, then so could GL(dl_tls_static_used),
> requiring offset to be recomputed based on the modified value.  But
> GL(dl_tls_static_used) would have to be updated atomically as well,
> otherwise concurrent runs for different maps could end up at the same
> offset.  The code doesn't take such cares, so I wonder if there's
> evidence we don't need to worry about that and, if so, why do we still
> need the CAS loop.  Even if we do hold a lock to prevent other threads
> from changing dl_tls_static_used concurrently, couldn't a signal handler
> that interrupts _dl_try_allocate_static_tls change it? 

See _dl_mask_all_signals (&old);

> If so, the same
> âraceâ applies, otherwise, I don't see why we need even a single CAS,
> let alone a CAS loop.
> I'm somewhat concerned about priority inversion in the atomic_delay()
> and CAS loops.  Considering that the locks the patch drops are all
> recursive, they won't pose an AS-Safety problem; if we keep them in
> place, but take the measures that have apparently been taken to avoid
> the problems of using inconsistent state when a handler interrupts a
> lock-holding operation, and of using outdated state after a signal
> handler modifies it, it should all work, and then it would be clear that
> the CAS loops are only in place for the case of functions being
> interrupted by signal handlers, not to deal with inter-thread
> concurrency.  With the locks in place, we could at least get rid of the
> atomic_delay() loop.
> This is all I got for now.  Thank you both for your work on this issue.
> -- 
> Alexandre Oliva, freedom fighter
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! --   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer


Support staff hung over, send aspirin and come back LATER.

Attachment: signal_safe.c
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]