This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [BZ 18034] [AArch64] Lazy TLSDESC relocation data race fix

On Wed, Apr 22, 2015 at 01:27:15PM +0100, Szabolcs Nagy wrote:
> Other thoughts:
> - Lazy binding for static TLS may be unnecessary complexity: it seems
> that gcc generates at most two static TLSDESC relocation entries for a
> translation unit (zero and non-zero initialized TLS), so there has to be
> a lot of object files with static TLS linked into a shared object to
> make the load time relocation overhead significant. Unless there is some
> evidence that lazy static TLS relocation makes sense I would suggest
> removing that logic (from all archs). (The dynamic case is probably a
> similar micro-optimization, but there the lazy vs non-lazy semantics are
> different in case of undefined symbols and some applications may depend
> on that).
I agree with this. As undefined symbols there is bug that we segfault
with weak tls variable so I doubt that anyone depends on that.

You didn't mention one essential argument in your analysis as overhead
is caused by unused tls variables. When variable is used then you need
to do relocation anyway and lazy one could be mangitude more expensive
than eager.

There is project on my backlog to improve tls access
in libraries which is still slow even with gnu2 so it would need
introduce -mtls-dialect=gnu3.

Eager binding of TLS is prerequisite. Now tls models are flawed design
as they try to save space by being lazy without any evidence that there
is application that uses it.

Main idea that total tls usage is for most application less than 65536
bytes. So unless application uses more we could preallocate that when
loading dso. That makes a code for dynamic library require only one
extra read versus tls access in main binary when you use following.

static int foo_offset;
#define foo *(foo_offset > 0 ? tcb + foo_offset: get_tls_ptr (- foo_offset))
Here you could do lazy allocation in get_tls_ptr without taking lock.
There is bug that when you do allocation in signal handler it calls
malloc which could result in deadlock, there was patch to use mmap to
fix it which was reverted.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]