For static TLS, glibc and musl allocate a TLS block such that p_vaddr and the run-time address are congruent with respect to p_align:
(tpoffset % p_align) == (p_vaddr % p_align)
For dynamic TLS, musl appears to satisfy the same formula, but glibc doesn't appear to. Is this considered a bug in glibc?
For context, I'm working on ELF TLS in Android's Bionic libc, and a problem recently came up where a TLS overalignment hack in LLVM lld resulted in TLS segments where (p_vaddr % p_align) was non-zero. See https://bugs.llvm.org/show_bug.cgi?id=41527 and https://reviews.llvm.org/D61824. The lld hack has since been reverted, and lld will usually satisfy (p_vaddr % p_align == 0) again, unless 8KiB-or-larger TLS alignment is used.
Test case (I have only tested this on x86-64):
$ gcc -fpic -shared test1_dso.c -o libtest1_dso.so
$ gcc test1_main.c -o test1 -ldl -Wl,-rpath,'$ORIGIN'
$ gcc trim-pt-tls.c -o trim-pt-tls
$ ./trim-pt-tls libtest1_dso.so
tp = 0x7f4bcfc34700
&tlsvar = 0x56124db2f9e0
tlsvar = 7
$ readelf -lW libtest1_dso.so
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  TLS  0x000ded 0x0000000000200ded 0x0000000000200ded 0x000003 0x000003 R   0x4
(p_vaddr % p_align) is 1. Static linkers typically ensure that (p_vaddr % p_align) is 0 instead. In the glibc output above, (&tlsvar % 4) is 0, so the congruence does not hold. musl, by contrast, places the DTLS block at an address that is 1 mod 4:
tp = 0x7fa4faea6b48
&tlsvar = 0x7fa4faea72d1
tlsvar = 7
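For reference, the check being applied to these outputs can be sketched in C as below. This is only an illustration: the helper names are hypothetical, and it assumes (as the test case appears to) that tlsvar sits at the first byte of the PT_TLS segment, so that &tlsvar can stand in for the block's run-time address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Remainder of the segment's first byte modulo its alignment.
   A p_align of 0 or 1 means "no alignment constraint". */
static uint64_t tls_first_byte_offset(uint64_t p_vaddr, uint64_t p_align)
{
    if (p_align <= 1)
        return 0;
    assert((p_align & (p_align - 1)) == 0); /* ELF expects a power of two */
    return p_vaddr & (p_align - 1);
}

/* True if a run-time address is congruent to p_vaddr modulo p_align. */
static bool tls_block_congruent(uint64_t addr, uint64_t p_vaddr,
                                uint64_t p_align)
{
    if (p_align <= 1)
        return true;
    return (addr & (p_align - 1)) == tls_first_byte_offset(p_vaddr, p_align);
}
```

With p_vaddr = 0x200ded and p_align = 4 from the readelf output, tls_first_byte_offset() gives 1. tls_block_congruent() is false for the glibc address 0x56124db2f9e0 and true for the musl address 0x7fa4faea72d1.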
Created attachment 11793
test case DSO
Created attachment 11794
Created attachment 11795
Created attachment 11796
i think the intention in glibc is to satisfy the % formula, based on the big comment in _dl_determine_tlsoffset (and on the presence of map->l_tls_firstbyte_offset which is the % p_align remainder).
it seems to work on aarch64, at least the test gives &tlsvar % 4 == 1.
x86 tls offsets are handled with a different code path because that's "tls variant 2" (TLS_TCB_AT_TP) instead of "tls variant 1" (TLS_DTV_AT_TP), so it's possible that there is a bug in that (but right now i don't have an x86 machine available to test that).
I see the same thing with aarch64 glibc:
tp = 0x77f7144710
&tlsvar = 0x77f71447b1
Based on the value of &tlsvar vs __builtin_thread_pointer(), it appears that a small-enough TLS segment is automatically allocated in reserved static TLS space. If I increase the size of tlsvar from 4 to 0x1000, then the test case fails:
tp = 0x771d65b710
&tlsvar = 0x64be7458e0
On x86-64, if I mark tlsvar with __attribute__((tls_model("initial-exec"))), forcing it into the reserved static TLS area, then the test case passes:
tp = 0x7f6033d7c700
&tlsvar = 0x7f6033d7c685
sorry somehow i missed that we are talking about dynamic tls and i was
looking at the static tls code.
indeed the dynamic tls code does not access the l_tls_firstbyte_offset,
only alignment and size are passed to allocate_dtv_entry.
i guess there were historical linkers that produced p_vaddr % p_align != 0
for static tls, but either not for dynamic tls, or the dynamic tls case was
not visible because affected targets supported unaligned access, so it was
only fixed in the static tls case. (with local exec tls the runtime tp offset
must match the value computed by the static linker; there is no such
requirement in the dynamic case, so if the user code does not care about
alignment then wrong alignment works.)
i assume nowadays linkers ensure the beginning of the tls block is aligned,
so it's not an issue in practice, but still an ELF TLS correctness issue.
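As a rough illustration of what honoring the first-byte offset in the dynamic case could look like, here is a minimal C sketch. It is not glibc's code: struct tls_block and alloc_tls_block are hypothetical names, and the sketch simply assumes the allocator is handed the (p_vaddr % p_align) remainder alongside the size and alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: 'raw' is what must be passed to free(),
   'start' is where the first byte of the TLS image should go. */
struct tls_block {
    void *raw;
    void *start;
};

/* Allocate 'size' bytes whose start address satisfies
   start % align == first_byte_offset (align must be a power of two). */
static struct tls_block alloc_tls_block(size_t size, size_t align,
                                        size_t first_byte_offset)
{
    struct tls_block b = { NULL, NULL };

    assert(align != 0 && (align & (align - 1)) == 0);
    assert(first_byte_offset < align);
    if (size > SIZE_MAX - align)
        return b; /* size + align would overflow */

    /* Over-allocate so some address in the buffer has the wanted remainder. */
    b.raw = malloc(size + align);
    if (b.raw == NULL)
        return b;

    uintptr_t p = (uintptr_t)b.raw;
    /* Distance (0 .. align-1) to the next address congruent to the offset. */
    uintptr_t adjust = ((uintptr_t)first_byte_offset - p) & (align - 1);
    b.start = (void *)(p + adjust);
    return b;
}
```

For the trimmed DSO in this report, alloc_tls_block(3, 4, 1) would return a start address that is 1 mod 4, matching what musl and the static TLS path produce.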