This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Tests that use clone directly race against SSE register save/restore.


H.J.,

On some systems we see random failures in tst-getpid1. Arjun Shankar
reported this, and I took a quick look and found some problems with our
tests and the use of TLS with RTLD_*CALL.

The test itself is interesting because it uses clone to create
a second thread via CLONE_VM which means that on x86_64 we have
the same $fs for both concurrently running threads.
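
The shared $fs can be observed directly. What follows is only a minimal
sketch, not the tst-getpid1 source: it assumes Linux with glibc on x86_64,
and the names tls_flag, child_stack, child_fn and shared_tls_demo are
hypothetical. The child is created with CLONE_VM but without CLONE_SETTLS,
so its store to a __thread variable should land in the parent's TLS block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical thread-local flag; volatile so every access hits memory.  */
static __thread volatile int tls_flag = 1;

/* Stack for the cloned child; the stack grows down on x86_64.  */
static char child_stack[64 * 1024] __attribute__ ((aligned (16)));

/* Runs in the child.  No CLONE_SETTLS was given, so the child inherits the
   parent's thread pointer and this store goes to the parent's TLS.  */
static int
child_fn (void *arg)
{
  (void) arg;
  tls_flag = 0;
  return 0;
}

/* Returns the parent's view of tls_flag after the child has exited,
   or -1 if clone or waitpid failed.  */
int
shared_tls_demo (void)
{
  tls_flag = 1;
  assert (tls_flag == 1);

  /* SIGCHLD as the exit signal lets a plain waitpid reap the child.  */
  pid_t pid = clone (child_fn, child_stack + sizeof child_stack,
                     CLONE_VM | SIGCHLD, NULL);
  if (pid == -1)
    return -1;
  if (waitpid (pid, NULL, 0) != pid)
    return -1;

  return tls_flag;
}
```

Under those assumptions a call to shared_tls_demo () returns 0 rather than 1,
because the "thread-local" store made by the child is visible to the parent.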

Then the dynamic loader uses the TLS field header.rtld_must_xmm_save
to decide if a save/restore of the SSE/AVX/AVX512 registers is
required. That state is now effectively global, though, and shared by
both racing threads, which write to and read from that location as they
process symbol lookups.

The fact that both threads might write to and read from the same memory
makes this a data race, which is undefined behaviour. Is the test faulty,
or should the loader implementation have used atomic operations to
write to thread data?

An example ordering that causes problems on non-AVX-enabled hardware:

T1:
#  define RTLD_ENABLE_FOREIGN_CALL \
  int old_rtld_must_xmm_save = THREAD_GETMEM (THREAD_SELF,                    \
                                              header.rtld_must_xmm_save);     \
  THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 1)

T2:
#  define RTLD_ENABLE_FOREIGN_CALL \
  int old_rtld_must_xmm_save = THREAD_GETMEM (THREAD_SELF,                    \
                                              header.rtld_must_xmm_save);     \
  THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 1)

fs:header.rtld_must_xmm_save == 1

T2:

      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
                                    version, ELF_RTYPE_CLASS_PLT, flags, NULL);

#  define RTLD_PREPARE_FOREIGN_CALL \
  do if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save))              \
    {                                                                         \
      _dl_x86_64_save_sse ();                                                 \
      THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save, 0);              \
    }                                                                         \
  while (0)

fs:header.rtld_must_xmm_save == 0
have_avx is initialized on this thread, but not yet visible to T1.

#  define RTLD_FINALIZE_FOREIGN_CALL \
  do {                                                                        \
    if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save) == 0)          \
      _dl_x86_64_restore_sse ();                                              \
    THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save,                    \
                   old_rtld_must_xmm_save);                                   \
  } while (0)
# endif

T1:

Despite never having called RTLD_PREPARE_FOREIGN_CALL, we reach here in T1
with header.rtld_must_xmm_save == 0, and the writes from T2 are not yet
visible to T1.

#  define RTLD_FINALIZE_FOREIGN_CALL \
  do {                                                                        \
    if (THREAD_GETMEM (THREAD_SELF, header.rtld_must_xmm_save) == 0)          \
      _dl_x86_64_restore_sse ();                                              \
    THREAD_SETMEM (THREAD_SELF, header.rtld_must_xmm_save,                    \
                   old_rtld_must_xmm_save);                                   \
  } while (0)
# endif

This results in a SIGILL as T1 sees an uninitialized have_avx and attempts to
issue AVX restore instructions that the hardware doesn't support.
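
The ordering can be replayed deterministically without real threads. The
sketch below is a model only, not glibc code: sim_thread, sim_enable,
sim_prepare, sim_finalize and replay_ordering are hypothetical names, the
slot stands in for header.rtld_must_xmm_save, and have_avx is not modelled.
It also assumes that T1's check in the finalize step runs before T2's
finalize step writes the flag back.

```c
#include <stdbool.h>

struct sim_thread
{
  int *slot;              /* Where this thread's "must save" flag lives.  */
  int old_value;          /* Models old_rtld_must_xmm_save.  */
  bool saved;             /* True once the save path ran on this thread.  */
  bool restored_unsaved;  /* True if restore ran without a prior save.  */
};

/* Models RTLD_ENABLE_FOREIGN_CALL: remember the old flag, then set it.  */
static void
sim_enable (struct sim_thread *t)
{
  t->old_value = *t->slot;
  *t->slot = 1;
}

/* Models RTLD_PREPARE_FOREIGN_CALL: save once, then clear the flag.  */
static void
sim_prepare (struct sim_thread *t)
{
  if (*t->slot)
    {
      t->saved = true;
      *t->slot = 0;
    }
}

/* Models RTLD_FINALIZE_FOREIGN_CALL: a clear flag triggers a restore.  */
static void
sim_finalize (struct sim_thread *t)
{
  if (*t->slot == 0 && !t->saved)
    t->restored_unsaved = true;
  *t->slot = t->old_value;
}

/* Replays T1 enable, T2 enable, T2 prepare, T1 finalize, T2 finalize.
   Returns true if T1 restored registers it never saved.  */
bool
replay_ordering (bool share_slot)
{
  int slot_a = 0, slot_b = 0;
  struct sim_thread t1 = { .slot = &slot_a };
  struct sim_thread t2 = { .slot = share_slot ? &slot_a : &slot_b };

  sim_enable (&t1);
  sim_enable (&t2);
  sim_prepare (&t2);
  sim_finalize (&t1);
  sim_finalize (&t2);
  return t1.restored_unsaved;
}
```

In this model replay_ordering (true) reports the bad restore on T1, while
replay_ordering (false), where each thread has its own slot as with a
normal per-thread $fs, does not.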

How do we fix this? Atomic accesses to have_avx and the header.rtld_must_xmm_save?
What else isn't safe with two threads using the same memory?

It feels to me like the test case is invalid and should not attempt to clone a thread
that glibc doesn't know about. However, this is apparently common in some low-level
tools, so we may wish to try to continue supporting what is done in this test case?

Now, keep in mind the above is merely a hypothesis, but the SIGILLs are real:

[root@intel-d3c4702-01 ~]# ./a.out
new thread: 10435
new thread: 10435
pid = 10435
Illegal instruction

And reproducible, but only in this CLONE_VM case.

Cheers,
Carlos.

