Bug 24606 - Dynamic TLS address is not congruent with p_vaddr when (p_vaddr % p_align != 0)
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.31
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2019-05-23 03:55 UTC by Ryan Prichard
Modified: 2019-08-01 05:10 UTC
CC List: 6 users

See Also:
Last reconfirmed: 2019-05-24 00:00:00
fweimer: security-

test case DSO (293 bytes, text/x-csrc)
2019-05-23 03:56 UTC, Ryan Prichard
test1_main.c (158 bytes, text/x-csrc)
2019-05-23 03:57 UTC, Ryan Prichard
trim-pt-tls.c (539 bytes, text/x-csrc)
2019-05-23 03:57 UTC, Ryan Prichard
test1_dso.c (293 bytes, text/x-csrc)
2019-05-23 03:58 UTC, Ryan Prichard

Description Ryan Prichard 2019-05-23 03:55:55 UTC
For static TLS, glibc and musl allocate a TLS block such that p_vaddr and the run-time address are congruent with respect to p_align:

  (tpoffset % p_align) == (p_vaddr % p_align)

For dynamic TLS, musl appears to satisfy the same formula, but glibc doesn't appear to. Is this considered a bug in glibc?

For context, I'm working on ELF TLS in Android's Bionic libc, and a problem recently came up where a TLS overalignment hack in LLVM lld resulted in TLS segments where (p_vaddr % p_align) was non-zero. See https://bugs.llvm.org/show_bug.cgi?id=41527 and https://reviews.llvm.org/D61824. The lld hack has since been reverted, and lld will usually satisfy (p_vaddr % p_align == 0) again, unless 8KiB-or-larger TLS alignment is used.

Test case (I have only tested this on x86-64):

$ gcc -fpic -shared test1_dso.c -o libtest1_dso.so
$ gcc test1_main.c -o test1 -ldl -Wl,-rpath,'$ORIGIN'
$ gcc trim-pt-tls.c -o trim-pt-tls
$ ./trim-pt-tls libtest1_dso.so
$ ./test1
tp          = 0x7f4bcfc34700
&tlsvar     = 0x56124db2f9e0
tlsvar[1]   = 7

$ readelf -lW libtest1_dso.so
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  TLS            0x000ded 0x0000000000200ded 0x0000000000200ded 0x000003 0x000003 R   0x4

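Here (p_vaddr % p_align) is 1 (0x200ded % 4), whereas static linkers typically ensure that (p_vaddr % p_align) is 0. In the glibc output above, &tlsvar % 4 is 0, so the run-time address is not congruent with p_vaddr. musl, by contrast, appears to place the DTLS block at an address that is 1 mod 4: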

tp          = 0x7fa4faea6b48
&tlsvar     = 0x7fa4faea72d1
tlsvar[1]   = 7
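
The congruence check behind these two results can be written as a small helper. This is only an illustrative sketch: the function name tls_is_congruent is hypothetical, and it assumes p_align is a non-zero power of two, as it is for the PT_TLS segment shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the run-time address and p_vaddr leave
   the same remainder modulo p_align.
   Assumption: p_align is a non-zero power of two. */
static bool tls_is_congruent(uint64_t addr, uint64_t p_vaddr, uint64_t p_align)
{
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    uint64_t mask = p_align - 1;
    return (addr & mask) == (p_vaddr & mask);
}
```

With p_vaddr = 0x200ded and p_align = 4, the glibc address 0x56124db2f9e0 fails this check (remainder 0 versus 1), while the musl address 0x7fa4faea72d1 passes it (remainder 1 in both cases).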
Comment 1 Ryan Prichard 2019-05-23 03:56:29 UTC
Created attachment 11793
test case DSO
Comment 2 Ryan Prichard 2019-05-23 03:57:01 UTC
Created attachment 11794
test1_main.c
Comment 3 Ryan Prichard 2019-05-23 03:57:29 UTC
Created attachment 11795
trim-pt-tls.c
Comment 4 Ryan Prichard 2019-05-23 03:58:01 UTC
Created attachment 11796
test1_dso.c
Comment 5 Szabolcs Nagy 2019-05-23 16:12:13 UTC
i think the intention in glibc is to satisfy the % formula, based on the big comment in _dl_determine_tlsoffset (and on the presence of map->l_tls_firstbyte_offset which is the % p_align remainder).

it seems to work on aarch64, at least the test gives &tlsvar % 4 == 1.

x86 tls offsets are handled with a different code path because that's "tls variant 2" (TLS_TCB_AT_TP) instead of "tls variant 1" (TLS_DTV_AT_TP) so it's possible that there is a bug in that (but right now i don't have an x86 machine available to test that).
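
To illustrate how the two variants differ, here is a rough sketch of choosing a static TLS offset that keeps the block congruent with p_vaddr. It is not glibc's _dl_determine_tlsoffset code: the function names and the used/size parameters are hypothetical, and it assumes the thread pointer itself is aligned to p_align and that p_align is a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant 1 (TLS_DTV_AT_TP): the block lives at tp + offset.
   Returns the smallest offset >= used such that
   (offset % p_align) == (p_vaddr % p_align).
   'used' is the number of bytes above tp that are already taken. */
static size_t tls_offset_variant1(size_t used, uint64_t p_vaddr, size_t p_align)
{
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    size_t mask = p_align - 1;
    /* Unsigned wrap-around is harmless here because of the mask. */
    return used + (((size_t)p_vaddr - used) & mask);
}

/* Variant 2 (TLS_TCB_AT_TP): the block lives at tp - offset.
   Returns the smallest offset >= used + size such that
   ((tp - offset) % p_align) == (p_vaddr % p_align), given an aligned tp.
   'used' is the number of bytes below tp that are already taken. */
static size_t tls_offset_variant2(size_t used, size_t size,
                                  uint64_t p_vaddr, size_t p_align)
{
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    size_t mask = p_align - 1;
    size_t min = used + size;                     /* block must fit below tp */
    size_t target = (0 - (size_t)p_vaddr) & mask; /* required offset % p_align */
    return min + ((target - min) & mask);
}
```

For the test DSO (p_vaddr = 0x200ded, p_align = 4, size 3), the variant 2 sketch places an otherwise empty static area's block at tp - 3, and the variant 1 sketch with 16 bytes already used places it at tp + 17. Both addresses are 1 mod 4 when tp is 4-byte aligned.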
Comment 6 Ryan Prichard 2019-05-23 20:38:49 UTC
I see the same thing with aarch64 glibc:

tp          = 0x77f7144710
&tlsvar     = 0x77f71447b1

Based on the value of &tlsvar vs __builtin_thread_pointer(), it appears that a small-enough TLS segment is automatically allocated in reserved static TLS space. If I increase the size of tlsvar from 4 to 0x1000, then the test case fails:

tp          = 0x771d65b710
&tlsvar     = 0x64be7458e0

On x86-64, if I mark tlsvar with __attribute__((tls_model("initial-exec"))), forcing it into the reserved static TLS area, then the test case passes:

tp          = 0x7f6033d7c700
&tlsvar     = 0x7f6033d7c685
Comment 7 Szabolcs Nagy 2019-05-24 09:26:52 UTC
sorry somehow i missed that we are talking about dynamic tls and i was
looking at the static tls code.

indeed the dynamic tls code does not access the l_tls_firstbyte_offset,
only alignment and size are passed to allocate_dtv_entry.

i guess there were historical linkers that produced p_vaddr % p_align != 0
for static tls, but not for dynamic tls, or the dynamic tls case was not
visible because affected targets supported unaligned access, so it was only
fixed in the static tls case. (with local exec tls the runtime tp offset
must match the value computed by the static linker, there is no such
requirement in the dynamic case, so if the user code does not care about
alignment then wrong alignment works.)
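
As an illustration of what honoring the remainder in the dynamic case could look like, here is a minimal sketch that over-allocates and then picks a congruent start address. It is not glibc's allocate_dtv_entry: the struct and function names are hypothetical, firstbyte stands for (p_vaddr % p_align), and align is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: 'raw' is what malloc returned and must be
   passed to free(); 'data' is the congruent start of the TLS block. */
struct tls_block {
    void *raw;
    void *data;
};

/* Allocate 'size' bytes whose start address satisfies
   (address % align) == firstbyte. Returns {NULL, NULL} on failure. */
static struct tls_block alloc_congruent(size_t size, size_t align,
                                        size_t firstbyte)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(firstbyte < align);

    struct tls_block b = { NULL, NULL };
    /* At most align - 1 bytes of padding are needed; allocate 'align'
       extra bytes to keep the request non-zero. Overflow of
       size + align is not handled in this sketch. */
    b.raw = malloc(size + align);
    if (b.raw == NULL)
        return b;

    uintptr_t base = (uintptr_t)b.raw;
    size_t pad = (size_t)((firstbyte - base) & (align - 1));
    b.data = (char *)b.raw + pad;
    return b;
}
```

Calling alloc_congruent(3, 4, 1) for the test DSO would yield a data pointer that is 1 mod 4, matching p_vaddr % p_align; the caller releases the block with free(b.raw).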

i assume nowadays linkers ensure the beginning of the tls block is aligned,
so it's not an issue in practice, but still an ELF TLS correctness issue.