This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
mixing clone(CLONE_VM|CLONE_FILES) with libc
- From: Lukasz Lempart <llempart at gmail dot com>
- To: libc-help at sourceware dot org
- Date: Fri, 9 Jan 2009 16:07:56 -0800
- Subject: mixing clone(CLONE_VM|CLONE_FILES) with libc
I am currently working on a project where I use thread-processes
created with clone(CLONE_VM|CLONE_FILES). The scenario is as follows
(and I apologize that this is a little long):
I am running Linux 2.6.9 with glibc 2.3.4 installed. I have this
problem on both 32-bit and 64-bit x86 architectures.
The application I am working on is multi-threaded. At some point, one
of the threads creates a new thread using pthread_create. I will call
this new thread M.
M in turn makes the clone system call with the CLONE_VM|CLONE_FILES
flags. I'll call the resulting thread-process C. M then does some other
work and exits. At any point there may be multiple C's running, but
only one M at a time. C's may be terminated at any point, which is done
by sending them SIGQUIT. C's always call _exit() to terminate.
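To make the setup concrete, here is a minimal sketch of the M/C arrangement. It is not the proprietary code: the names (struct scenario, c_body, m_body, run_scenario), the 64 KiB stack size and the exit code 7 are made up for illustration. It also differs from the real application in two ways that I am assuming only to keep the sketch safe to run: SIGCHLD is added to the clone flags so that waitpid() can reap C, and M waits for C instead of exiting while C may still be running.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define C_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

struct scenario {
    int touched;     /* set by C; visible to M because of CLONE_VM */
    int exit_code;   /* C's exit status as seen by M, or -1 */
};

/* Body of C: writes to shared memory, then terminates with _exit(). */
static int c_body(void *arg)
{
    struct scenario *s = arg;
    s->touched = 1;
    _exit(7);
}

/* Body of M: clones C with a shared address space and file table. */
static void *m_body(void *arg)
{
    struct scenario *s = arg;
    char *stack = malloc(C_STACK_SIZE);
    if (stack == NULL)
        return NULL;

    /* The stack grows down on x86, so pass the top of the buffer.
       SIGCHLD is an assumption of this sketch, not of the real code. */
    pid_t pid = clone(c_body, stack + C_STACK_SIZE,
                      CLONE_VM | CLONE_FILES | SIGCHLD, s);
    if (pid > 0) {
        int status = 0;
        /* Unlike the real M, wait here so C never outlives M's TLS. */
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            s->exit_code = WEXITSTATUS(status);
    }
    free(stack);
    return NULL;
}

/* Some thread of the application creates M and waits for it. */
static int run_scenario(struct scenario *s)
{
    pthread_t m;
    assert(s != NULL);
    s->touched = 0;
    s->exit_code = -1;
    if (pthread_create(&m, NULL, m_body, s) != 0)
        return -1;
    return pthread_join(m, NULL) == 0 ? 0 : -1;
}
```

Calling run_scenario() with a struct scenario and then reading its fields shows whether C ran and how it exited. Note that C is created without CLONE_SETTLS here, so it runs on M's TLS area, which is the arrangement that caused the trouble described next.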
After much testing, it became apparent that the TLS area used by M,
which C inherits, could at some point be deallocated, and C would then
crash when making libc calls that use this area. I wrote my own
wrappers for clone so that I could use the CLONE_SETTLS feature. To
create a new TLS area for C, I allocate a zeroed 4k page and copy over
the 16*sizeof(void *)-byte header.
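The allocation step can be sketched roughly as follows. Again this is an illustration and not the actual wrapper code: make_tls_page() and the two macros are hypothetical names, the source header is passed in by the caller, and obtaining the current thread pointer and handing the new page to clone() with CLONE_SETTLS are deliberately left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define TLS_PAGE_SIZE   4096
#define TLS_HEADER_SIZE (16 * sizeof(void *))

/* Allocate one zeroed 4k page and copy the TLS header into its start.
   src_header must point to at least TLS_HEADER_SIZE readable bytes.
   Returns NULL if the page cannot be mapped. */
static void *make_tls_page(const void *src_header)
{
    assert(src_header != NULL);

    /* Anonymous private mappings are zero-filled by the kernel. */
    void *page = mmap(NULL, TLS_PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    /* Only the header is copied; the rest of the page stays zero. */
    memcpy(page, src_header, TLS_HEADER_SIZE);
    return page;
}
```

Everything past the copied header is left as zeroes, which is why I am asking below whether more of the pthread structure needs to be initialized.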
This works fine as long as C does not call printf().
If C calls printf(), I seem to run into some sort of deadlock: M gets
stuck in __lll_mutex_lock_wait(), as shown in the backtrace below:
#0 0x0000003e56bd2d2b in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
#1 0x0000003e56d315f0 in _IO_stdfile_2_lock () from /lib64/tls/libc.so.6
#2 0x00000000412089f0 in ?? ()
#3 0x0000003e56b5c9b4 in ?? () from /lib64/tls/libc.so.6
#4 0x0000000000000003 in ?? ()
#5 0x0000000001305f67 in do_free (ptr=0x3e56d2f8c0) at src/tcmalloc.cc:2342
#6 0x0000003e56b5c830 in puts () from /lib64/tls/libc.so.6
#7 0x00000000006fc643 in CloneManager::do_clone (arg=0x1baae40)
    at somedirectory/clone.cc:694
#8 0x0000003e57606137 in start_thread () from /lib64/tls/libpthread.so.0
#9 0x0000003e56bc7543 in clone () from /lib64/tls/libc.so.6
#10 0x0000000000000000 in ?? ()
The code is compiled with everything inlined, so the stack trace is not
very informative. do_clone() is the body of M, and the printf() call
being executed (it shows up as puts() in frame #6) is at line 694 of
clone.cc. Unfortunately, I cannot provide any of my code since it is
proprietary.
It makes no difference whether any C's are running at this point.
Any ideas on what might be causing this? Do I perhaps need to
initialize any of the remainder of the pthread structure? I have been
stuck on this for a few days and any help whatsoever would be much
appreciated. Please let me know if you want/need more information.
Thank you,
Lukasz