This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] [BZ 18034] [AArch64] Lazy TLSDESC relocation data race fix
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- Cc: libc-alpha at sourceware dot org, Marcus Shawcroft <marcus dot shawcroft at arm dot com>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>
- Date: Sat, 11 Jul 2015 12:16:08 +0200
- Subject: Re: [PATCH] [BZ 18034] [AArch64] Lazy TLSDESC relocation data race fix
- References: <553793A3 dot 7030206 at arm dot com>
On Wed, Apr 22, 2015 at 01:27:15PM +0100, Szabolcs Nagy wrote:
> Other thoughts:
> - Lazy binding for static TLS may be unnecessary complexity: it seems
> that gcc generates at most two static TLSDESC relocation entries for a
> translation unit (zero and non-zero initialized TLS), so there has to be
> a lot of object files with static TLS linked into a shared object to
> make the load time relocation overhead significant. Unless there is some
> evidence that lazy static TLS relocation makes sense I would suggest
> removing that logic (from all archs). (The dynamic case is probably a
> similar micro-optimization, but there the lazy vs non-lazy semantics are
> different in case of undefined symbols and some applications may depend
> on that).
I agree with this. As for undefined symbols, there is a bug where we
segfault on a weak TLS variable, so I doubt that anyone depends on that.
You didn't mention one essential argument in your analysis: the overhead
is caused only by unused TLS variables. When a variable is used you need
to do the relocation anyway, and the lazy one could be an order of
magnitude more expensive.
There is a project on my backlog to improve TLS access in libraries,
which is still slow even with gnu2. Eager binding of TLS is a
prerequisite for it. The current TLS models are a flawed design, as they
try to save space by being lazy without any evidence that there is an
application that makes use of it.
The main idea is that total TLS usage is, for most applications, less
than 65536 bytes. So unless an application uses more, we could
preallocate that much when loading a DSO. That makes code in a dynamic
library require only one extra read compared with TLS access in the main
binary when you use the following:
static int foo_offset; /* > 0: offset from tcb; otherwise negated argument for get_tls_ptr */
#define foo (*(int *) (foo_offset > 0 ? tcb + foo_offset : get_tls_ptr (-foo_offset)))
Here you could do lazy allocation in get_tls_ptr without taking a lock.
There is a bug where doing the allocation in a signal handler calls
malloc, which could result in a deadlock; there was a patch to use mmap
to fix it, but it was reverted.
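
To make the foo_offset idea concrete, here is a minimal user-space sketch
of the fast path and slow path. It is not glibc code. The names
tls_thread_init, slow_vars, MAX_SLOW_VARS, SLOW_VAR_SIZE, bar and
demo_tls_access are hypothetical, tcb is modelled as an ordinary
thread-local pointer to a block allocated once per thread, and calloc is
used only for brevity (a real get_tls_ptr would have to avoid malloc for
the signal handler reason mentioned above).

```c
#include <assert.h>
#include <stdlib.h>

#define TLS_PREALLOC_SIZE 65536  /* preallocated size proposed above */
#define MAX_SLOW_VARS 16         /* hypothetical limit for this sketch */
#define SLOW_VAR_SIZE 64         /* hypothetical size of each slow-path slot */

/* Stand-in for the thread control block pointer; assumption: one
   preallocated block per thread, set up before any access.  */
static _Thread_local char *tcb;

/* Per-thread table of lazily allocated slots for the slow path.  */
static _Thread_local char *slow_vars[MAX_SLOW_VARS];

/* Models the preallocation step for the calling thread.  */
static void tls_thread_init (void)
{
  tcb = calloc (1, TLS_PREALLOC_SIZE);
  if (tcb == NULL)
    abort ();
}

/* Slow path: allocate on first use.  The table is thread-local, so no
   lock is taken.  calloc is not async-signal-safe; it is used here only
   to keep the sketch short.  */
static char *get_tls_ptr (int id)
{
  assert (id >= 0 && id < MAX_SLOW_VARS);
  if (slow_vars[id] == NULL)
    {
      slow_vars[id] = calloc (1, SLOW_VAR_SIZE);
      if (slow_vars[id] == NULL)
        abort ();
    }
  return slow_vars[id];
}

/* In the real scheme these would be filled in at load time.  */
static int foo_offset = 64;  /* positive: lives in the preallocated block */
static int bar_offset = -3;  /* not positive: slow path, slot 3 */

#define foo (*(int *) (foo_offset > 0 ? tcb + foo_offset : get_tls_ptr (-foo_offset)))
#define bar (*(int *) (bar_offset > 0 ? tcb + bar_offset : get_tls_ptr (-bar_offset)))

/* Exercises both paths and returns the sum of the two variables.  */
int demo_tls_access (void)
{
  tls_thread_init ();
  foo = 40;  /* one load of foo_offset, then tcb + offset */
  bar = 2;   /* first access allocates slot 3 */
  return foo + bar;
}
```

The fast path costs one read of foo_offset plus a compare, which is the
"one extra read" compared with an access from the main binary. Only
variables that did not fit into the preallocated block pay for the
get_tls_ptr call.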