This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Tests that use clone directly race against SSE register save/restore.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, Arjun Shankar <arjun at redhat dot com>, Roland McGrath <roland at hack dot frob dot com>
- Date: Mon, 20 Jul 2015 14:02:50 -0400
- Subject: Tests that use clone directly race against SSE register save/restore.
H.J.,
On some systems we see random failures in tst-getpid1. Arjun Shankar
reported this; I took a quick look and found some problems with our
tests and the use of TLS with the RTLD_*_FOREIGN_CALL macros.
The test itself is interesting because it uses clone to create
a second thread via CLONE_VM which means that on x86_64 we have
the same $fs for both concurrently running threads.
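The shared-$fs claim can be checked with a small probe. This is only a sketch, not the tst-getpid1 source: tls_probe, child_fn and clone_shares_tls are made-up names, and it assumes Linux with glibc and a clone call that passes CLONE_VM without CLONE_SETTLS. The child does nothing but record the address it sees for a __thread variable, so it never enters the dynamic loader itself.

```c
#define _GNU_SOURCE   /* for clone() and CLONE_VM; must precede all includes */
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical probe variable: one instance per thread pointer.  */
static __thread int tls_probe;

/* Written by the child; visible to the parent because of CLONE_VM.  */
static volatile uintptr_t child_tls_addr;

/* Runs in the cloned child.  It makes no library calls, so it does not
   trigger lazy symbol resolution; it only records where it sees
   tls_probe.  */
static int
child_fn (void *arg)
{
  (void) arg;
  child_tls_addr = (uintptr_t) &tls_probe;
  return 0;
}

/* Returns 1 if the child saw the parent's TLS block, 0 if it saw a
   different one, and -1 on error.  */
static int
clone_shares_tls (void)
{
  const size_t stack_size = 64 * 1024;
  char *stack = malloc (stack_size);
  if (stack == NULL)
    return -1;

  /* The stack grows down on x86_64, so pass the top of the buffer.
     No CLONE_SETTLS: the child inherits the parent's thread pointer.  */
  pid_t pid = clone (child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
  if (pid == -1)
    {
      free (stack);
      return -1;
    }

  /* If the wait fails the child may still be running on the stack,
     so leak the buffer instead of freeing it.  */
  if (waitpid (pid, NULL, 0) != pid)
    return -1;

  free (stack);
  return child_tls_addr == (uintptr_t) &tls_probe;
}
```

Calling clone_shares_tls () from main should return 1 in this setup, since both tasks resolve tls_probe through the same thread pointer.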
Then the dynamic loader attempts to use TLS header.rtld_must_xmm_save
to decide if a save/restore of the SSE/AVX/AVX512 registers is
required. That state is now effectively global, though, and shared by
both racing threads, which write to and read from that location as they
process symbol lookups.
The fact that both threads might write to and read from the same memory
makes this a data race, which is undefined behaviour. Is the test faulty,
or should the loader implementation have used atomic operations to
write to thread data?
An example ordering that causes problems on non-AVX-enabled hardware:
T1:
# define RTLD_ENABLE_FOREIGN_CALL \
  int old_rtld_must_xmm_save = THREAD_GETMEM (THREAD_SELF, \
                                              header.rtld_must_xmm_save); \
  THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 1)
T2:
# define RTLD_ENABLE_FOREIGN_CALL \
  int old_rtld_must_xmm_save = THREAD_GETMEM (THREAD_SELF, \
                                              header.rtld_must_xmm_save); \
  THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 1)
fs:header.rtld_must_xmm_save == 1
T2:
  result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
                                version, ELF_RTYPE_CLASS_PLT, flags, NULL);

# define RTLD_PREPARE_FOREIGN_CALL \
  do if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save)) \
    { \
      _dl_x86_64_save_sse (); \
      THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 0); \
    } \
  while (0)
fs:header.rtld_must_xmm_save == 0
have_avx is initialized on this thread, but not yet visible to T1.
# define RTLD_FINALIZE_FOREIGN_CALL \
  do { \
    if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save) == 0) \
      _dl_x86_64_restore_sse (); \
    THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, \
                   old_rtld_must_xmm_save); \
  } while (0)
# endif
T1:
Despite never having called RTLD_PREPARE_FOREIGN_CALL, we reach here in T1
with header.rtld_must_xmm_save == 0, and with the writes from T2 not yet
visible to T1.
# define RTLD_FINALIZE_FOREIGN_CALL \
  do { \
    if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save) == 0) \
      _dl_x86_64_restore_sse (); \
    THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, \
                   old_rtld_must_xmm_save); \
  } while (0)
# endif
This results in a SIGILL as T1 sees an uninitialized have_avx and attempts to
issue AVX restore instructions that the hardware doesn't support.
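The flag handling in the ordering above can be replayed deterministically in a small single-threaded model. This is only a sketch of the hypothesis, not glibc code: struct tcb_model, struct thread_model and the three helper functions are made-up stand-ins for the TCB field and the RTLD_*_FOREIGN_CALL macros. It assumes T1's finalize step runs after T2's prepare step and before T2's finalize step. It models only the shared flag; have_avx and the actual register save/restore are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the TCB header field; not glibc code.  */
struct tcb_model
{
  int rtld_must_xmm_save;
};

/* Per-thread bookkeeping for the model.  */
struct thread_model
{
  struct tcb_model *tcb;        /* what THREAD_SELF resolves to */
  int old_rtld_must_xmm_save;   /* local set by the enable step */
  bool saved;                   /* true once the save step has run */
  bool restore_without_save;    /* true if restore ran with nothing saved */
};

/* Models RTLD_ENABLE_FOREIGN_CALL.  */
static void
enable_foreign_call (struct thread_model *t)
{
  t->old_rtld_must_xmm_save = t->tcb->rtld_must_xmm_save;
  t->tcb->rtld_must_xmm_save = 1;
}

/* Models RTLD_PREPARE_FOREIGN_CALL.  */
static void
prepare_foreign_call (struct thread_model *t)
{
  if (t->tcb->rtld_must_xmm_save)
    {
      t->saved = true;          /* stands in for _dl_x86_64_save_sse () */
      t->tcb->rtld_must_xmm_save = 0;
    }
}

/* Models RTLD_FINALIZE_FOREIGN_CALL.  */
static void
finalize_foreign_call (struct thread_model *t)
{
  if (t->tcb->rtld_must_xmm_save == 0)
    {
      /* Stands in for _dl_x86_64_restore_sse ().  */
      if (!t->saved)
        t->restore_without_save = true;
    }
  t->tcb->rtld_must_xmm_save = t->old_rtld_must_xmm_save;
}

/* Replays: T1 enable, T2 enable, T2 prepare, T1 finalize, T2 finalize.
   Returns true if T1 ran the restore step without ever saving.  */
static bool
t1_restores_without_save (struct tcb_model *t1_tcb, struct tcb_model *t2_tcb)
{
  struct thread_model t1 = { .tcb = t1_tcb };
  struct thread_model t2 = { .tcb = t2_tcb };

  enable_foreign_call (&t1);
  enable_foreign_call (&t2);
  prepare_foreign_call (&t2);
  finalize_foreign_call (&t1);
  finalize_foreign_call (&t2);

  return t1.restore_without_save;
}
```

Passing the same tcb_model for both threads (the CLONE_VM case) makes the function return true. Passing two separate objects (each thread with its own TCB) makes it return false.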
How do we fix this? Atomic accesses to have_avx and header.rtld_must_xmm_save?
What else isn't safe with two threads using the same memory?
Feels to me like the test case is invalid and should not attempt to clone a thread
that glibc doesn't know about. However, this is apparently common in some low-level
tools, so we may wish to try to continue supporting what is done in this test case?
Now, keep in mind the above is merely a hypothesis, but the SIGILLs are real:
[root@intel-d3c4702-01 ~]# ./a.out
new thread: 10435
new thread: 10435
pid = 10435
Illegal instruction
And reproducible, but only in this CLONE_VM case.
Cheers,
Carlos.