mixing clone(CLONE_VM|CLONE_FILES) with libc

Lukasz Lempart llempart@gmail.com
Sat Jan 10 01:23:00 GMT 2009


On Fri, Jan 9, 2009 at 4:07 PM, Lukasz Lempart <llempart@gmail.com> wrote:
> I am currently working on a project where I use thread-processes
> created with clone(CLONE_VM|CLONE_FILES). The scenario is as follows
> (and I apologize that this is a little long):
>
> I am running Linux 2.6.9 with glibc 2.3.4 installed. I have this
> problem on both 32-bit and 64-bit x86 architectures.
>
> The application I am working on is a multi-threaded one. At some
> point, one of the threads creates a new thread using pthread_create.
> I will call this thread M.
>
> M in turn does the clone system call with CLONE_VM|CLONE_FILES flags.
> I'll call the new one C. M then does some other work and exits. At any
> point there may be multiple C's running, but only one M at a time. C's
> may be terminated at any point and this is done by sending them
> SIGQUIT. C's always call _exit() to terminate.
>
> After much testing, it became apparent that the TLS area used by M
> could at some point be deallocated and C would crash when making libc
> calls that use this area. I wrote my own wrappers for clone to allow
> for the CLONE_SETTLS feature. To create a new TLS area, I allocate a
> zeroed 4k page, and copy over the 16*sizeof(void *) header.
>
> This works fine as long as C does not call printf().
> If C calls printf() I seem to run into some sort of deadlock. M will
> get stuck in __lll_mutex_lock_wait() as per the backtrace below:
>
> #0  0x0000003e56bd2d2b in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
> #1  0x0000003e56d315f0 in _IO_stdfile_2_lock () from /lib64/tls/libc.so.6
> #2  0x00000000412089f0 in ?? ()
> #3  0x0000003e56b5c9b4 in ?? () from /lib64/tls/libc.so.6
> #4  0x0000000000000003 in ?? ()
> #5  0x0000000001305f67 in do_free (ptr=0x3e56d2f8c0) at src/tcmalloc.cc:2342
> #6  0x0000003e56b5c830 in puts () from /lib64/tls/libc.so.6
> #7  0x00000000006fc643 in CloneManager::do_clone (arg=0x1baae40)
>    at somedirectory/clone.cc:694
> #8  0x0000003e57606137 in start_thread () from /lib64/tls/libpthread.so.0
> #9  0x0000003e56bc7543 in clone () from /lib64/tls/libc.so.6
> #10 0x0000000000000000 in ?? ()
>
> The code is compiled to inline everything so the stack trace is not
> very informative. do_clone() is the body of M and printf() is being
> executed (clone.cc +694). Unfortunately I cannot provide any of my
> code since it is proprietary.
>
> Whether there are any C's running at this point or not does not make a
> difference.
>
> Any ideas on what might be causing this? Do I perhaps need to
> initialize any of the remainder of the pthread structure? I have been
> stuck on this for a few days and any help whatsoever would be much
> appreciated. Please let me know if you want/need more information.
>
> Thank you,
>
> Lukasz
>
I think I figured it out. The problem occurs when SIGQUIT arrives while
the thread is inside printf(). The handler runs on return from write(),
and the mutex is never unlocked.
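
One possible way to avoid this, sketched here under assumptions rather than
taken from the actual project, is to keep SIGQUIT blocked for the duration of
the stdio call, so the handler cannot run (and call _exit()) while the stream
lock is held. The helper name printf_sigquit_blocked() is hypothetical. The
sketch assumes Linux, where sigprocmask() changes only the calling thread's
signal mask; in an ordinary pthread, pthread_sigmask() would be the portable
choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the original post): behaves like printf(),
 * but keeps SIGQUIT blocked while stdio holds its internal stream lock.
 * A SIGQUIT sent during the call stays pending and is delivered only after
 * the previous mask is restored, i.e. after the lock has been released.
 *
 * Assumption: on Linux, sigprocmask() affects only the calling thread.
 */
int printf_sigquit_blocked(const char *fmt, ...)
{
    sigset_t block, saved;
    va_list ap;
    int n;

    sigemptyset(&block);
    sigaddset(&block, SIGQUIT);

    /* Block SIGQUIT and remember the mask that was in effect before. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    va_start(ap, fmt);
    n = vprintf(fmt, ap);   /* stream lock is taken and released in here */
    va_end(ap);

    /* Restore the previous mask; a pending SIGQUIT is delivered now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return n;
}
```

C would then call printf_sigquit_blocked("...") instead of printf(). The
same wrapping would be needed around any other libc call that takes an
internal lock, so an alternative is to have the handler only set a flag and
let C call _exit() from its normal code path.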

Thanks,

Lukasz



More information about the Libc-help mailing list