Bug 21793 - glibc does not update the pid/tid cache after clone()
Summary: glibc does not update the pid/tid cache after clone()
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.25
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-19 23:05 UTC by Thomas Anderson
Modified: 2024-05-27 10:02 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
Flags: fweimer: security-


Description Thomas Anderson 2017-07-19 23:05:35 UTC
After commit [1], glibc no longer updates the TID cache after clone().  However, pthread still uses it, e.g. in [2].  This is causing bug [3] in Chromium, where after a clone(), pthread_mutex_lock() erroneously reports that the lock is already owned by the calling thread when it is not.  As a workaround, doing an otherwise pointless fork() after the clone() forces glibc to update the TID cache, which fixes the issue.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=c579f48edba88380635ab98cb612030e3ed8691e
[2] https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_lock.c;h=dc9ca4c4764be2654141493330bd8a91a989f601;hb=HEAD#l148
[3] https://bugs.chromium.org/p/chromium/issues/detail?id=735048
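
Below is a minimal sketch of the failure mode (a reconstruction for illustration, not Chromium's actual code): the parent locks an error-checking mutex and then clones a new process; the child's copied mutex still records the parent's TID as owner, and because the child's cached TID is stale (still equal to the parent's TID), the ownership check misfires.  The mutex type, stack size, and exit-status plumbing are assumptions.

```c
/* Sketch only: shows the stale-TID symptom on affected glibc versions.  */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t mtx;

static int
child_fn (void *arg)
{
  /* The child's memory copy records the mutex as owned by the parent's
     TID.  With a stale TID cache (cached TID == parent's TID), the
     error-checking lock wrongly returns EDEADLK ("already owned by the
     calling thread") even though this process never locked it.  */
  _exit (pthread_mutex_lock (&mtx));
}

int
main (void)
{
  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init (&mtx, &attr);

  pthread_mutex_lock (&mtx);           /* Parent owns the mutex.  */

  static char stack[64 * 1024];        /* Stack grows down; pass the top.  */
  pid_t pid = clone (child_fn, stack + sizeof stack, SIGCHLD, NULL);
  if (pid < 0)
    {
      perror ("clone");
      return 1;
    }

  int status;
  waitpid (pid, &status, 0);
  printf ("child lock result: %s\n", strerror (WEXITSTATUS (status)));

  pthread_mutex_unlock (&mtx);
  return 0;
}
```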
Comment 1 Adhemerval Zanella 2017-07-20 00:48:04 UTC
There has been some discussion about this change on the systemd-devel mailing list, regarding another interface assumption that glibc does not really guarantee [1].  The problem, as Florian described there, is that glibc can no longer intercept all the ways the PID/TID of a process can change.  This caching has also caused some issues in glibc in the past (some information at [2] [3] [4] [5]).

The problem, IMHO, is that we cannot guarantee that all pthread interfaces work correctly after a clone call, especially now with the proliferation of different clone flags.  If you check pthread_create, not only is the TID cache updated, but also the join ID (which at a glance looks like it might be problematic after clone), the cancellation handling, the stack guard, etc.  I am not sure which pthread_t internal fields would need to be synchronized correctly, and with which combination of clone flags, to make the pthread interfaces work; but this is outside the scope of the POSIX pthread definition and a can of worms, IMHO.

I think for this specific issue it would be possible to issue a direct syscall to actually get the TID of the thread and avoid using the cached value, but this would pessimize performance for a *very* specific case.  I would like to hear more community input on whether it would be a reasonable approach.

[1] https://lists.freedesktop.org/archives/systemd-devel/2017-July/039195.html
[2] https://sourceware.org/ml/libc-alpha/2016-10/msg00136.html
[3] https://sourceware.org/ml/libc-alpha/2016-10/msg00233.html
[4] https://sourceware.org/ml/libc-alpha/2016-11/msg00247.html
[5] https://sourceware.org/ml/libc-alpha/2016-11/msg00303.html
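
A minimal sketch of that approach, assuming a direct SYS_gettid syscall (glibc itself did not export a gettid() wrapper until 2.30):

```c
/* Sketch: fetch the TID with a direct syscall instead of the cache.
   SYS_gettid always returns the kernel's view, at the cost of an
   extra syscall on every operation that needs the TID.  */
#include <sys/syscall.h>
#include <unistd.h>

static pid_t
current_tid (void)
{
  return (pid_t) syscall (SYS_gettid);
}
```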
Comment 2 Carlos O'Donell 2017-07-20 03:05:41 UTC
(In reply to Adhemerval Zanella from comment #1)
> The problem, IMHO, is that we cannot guarantee that all pthread
> interfaces work correctly after a clone call, especially now with the
> proliferation of different clone flags.  If you check pthread_create,
> not only is the TID cache updated, but also the join ID (which at a
> glance looks like it might be problematic after clone), the
> cancellation handling, the stack guard, etc.  I am not sure which
> pthread_t internal fields would need to be synchronized correctly, and
> with which combination of clone flags, to make the pthread interfaces
> work; but this is outside the scope of the POSIX pthread definition
> and a can of worms, IMHO.

Once you call clone() you own the created thread, and you cannot use any POSIX Thread functions provided by the libc you left behind when you cloned a new thread manually. If you are using clone() as a convenience function for unshare() then you should use unshare().

Take care that calling clone() yourself has implications for all the thread APIs, from thread-local storage, to cancellation, to joining, to exiting threads, etc.  You really, really have to be ready to take over all aspects of the new thread using your own threading runtime.  The point of clone() is to implement threading libraries, like those provided by glibc.

I'm all ears when it comes to hearing about what problems you use clone() to solve and finding a better solution for you.
Comment 3 Florian Weimer 2017-07-26 10:29:19 UTC
(In reply to Carlos O'Donell from comment #2)
> Once you call clone() you own the created thread, and you cannot use any
> POSIX Thread functions provided by the libc you left behind when you cloned
> a new thread manually. If you are using clone() as a convenience function
> for unshare() then you should use unshare().

That is very misleading.  If you call clone to create a new *thread*, you cannot use any glibc function at all, whether POSIX threads or not.  *Nothing* will work on the new thread, not even async-signal-safe functions.  A lot of things worked by accident, most of the time, including malloc, but with the thread cache in 2.26, malloc will no longer be reliable, for instance.

We still need a correct TID if the clone call created a new process, though.  There is really no way around that.  This will be tricky due to vfork.
Comment 4 Adhemerval Zanella 2017-07-26 13:12:42 UTC
(In reply to Florian Weimer from comment #3)
> (In reply to Carlos O'Donell from comment #2)
> > Once you call clone() you own the created thread, and you cannot use any
> > POSIX Thread functions provided by the libc you left behind when you cloned
> > a new thread manually. If you are using clone() as a convenience function
> > for unshare() then you should use unshare().
> 
> That is very misleading.  If you call clone to create a new *thread*, you
> cannot use any glibc function at all, whether POSIX threads or not. 
> *Nothing* will work on the new thread, not even async-signal-safe functions.
> A lot of things worked by accident, most of the time, including malloc, but
> with the thread cache in 2.26, malloc will no longer be reliable, for
> instance.
> 
> We still need a correct TID if the clone call created a new process, though.
> There is really no way around that.  This will be tricky due to vfork.

Should we consider this NOTABUG/WONTFIX, or add back the TID update on clone?  Chrome developers do not seem to agree with us that glibc functions are unreliable after a clone call, since they are now updating the TID cache themselves [1] (still a fragile fix, IMHO).

I would still prefer to document the limitations of the clone and glibc interaction, rather than provide incomplete and possibly problematic solutions.

[1] https://chromium.googlesource.com/chromium/src.git/+/24df419517bb8ca8ac2b75eb752bd9a133267108%5E%21/#F1
Comment 5 Florian Weimer 2017-07-26 13:21:58 UTC
(In reply to Adhemerval Zanella from comment #4)

> Should we consider this NOTABUG/WONTFIX, or add back the TID update on
> clone?  Chrome developers do not seem to agree with us that glibc
> functions are unreliable after a clone call

They *should* be reliable after a clone call if the call creates a new process.

> since they are now updating the TID cache themselves [1] (still a
> fragile fix, IMHO).

> [1]
> https://chromium.googlesource.com/chromium/src.git/+/
> 24df419517bb8ca8ac2b75eb752bd9a133267108%5E%21/#F1

That's clearly a bogus change and will break once we change the TCB layout.
 
> I would still prefer to document the limitations of the clone and glibc
> interaction, rather than provide incomplete and possibly problematic
> solutions.

Right, that seems worthwhile.  We could perhaps push the responsibility to request a TID cache update to the caller, rather than trying to parse the flags argument.  That is, provide a different interface for a vfork-style clone which would not update the cache (because it is shared with the parent).
Comment 6 Adhemerval Zanella 2017-07-26 13:53:03 UTC
(In reply to Florian Weimer from comment #5)
> (In reply to Adhemerval Zanella from comment #4)
> 
> > Should we consider this NOTABUG/WONTFIX, or add back the TID update on
> > clone?  Chrome developers do not seem to agree with us that glibc
> > functions are unreliable after a clone call
> 
> They *should* be reliable after a clone call if the call creates a new
> process.

Should they?  We have fork exactly for this case, and it sets up all the internal fields required for correct glibc usage.  I grant you that, with the current clone flags, this is a limited API.

> 
> > since they are now updating the TID cache themselves [1] (still a
> > fragile fix, IMHO).
> 
> > [1]
> > https://chromium.googlesource.com/chromium/src.git/+/
> > 24df419517bb8ca8ac2b75eb752bd9a133267108%5E%21/#F1
> 
> That's clearly a bogus change and will break once we change the TCB layout.
>  
> > I would still prefer to document the limitations of the clone and glibc
> > interaction, rather than provide incomplete and possibly problematic
> > solutions.
> 
> Right, that seems worthwhile.  We could perhaps push the responsibility to
> request a TID cache update to the caller, rather than trying to parse the
> flags argument.  That is, provide a different interface for a vfork-style
> clone which would not update the cache (because it is shared with the
> parent).

I am not very fond of providing a limited clone interface, mainly because the interaction with glibc internals could have subtle bugs.  Some I can think of:

1. It would need to pass CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID along with the TCB tid field.  What happens if the caller uses any of these flags with a different TID field?  Would it just copy the TID value to the pointer passed as an argument?  Could we have synchronization issues with it?

2. What about the fork handlers; should it ignore them?

3. What about malloc's internal locks, so that malloc is consistent in the new process?

4. What about the pthread robust list?

5. What if CLONE_THREAD is specified?  Should we clear or ignore it?  CLONE_VM also poses some issues.

I still think we already have a good interface for interacting with clone, and it is the execv function family.
Comment 7 Florian Weimer 2017-07-27 07:24:53 UTC
At the very least, I think we should try to support a clone of a non-multi-threaded process, so that the subprocess can create threads using pthread_create.  This still requires updating the TID cache.
Comment 8 Carlos O'Donell 2017-07-27 12:15:23 UTC
(In reply to Florian Weimer from comment #3)
> (In reply to Carlos O'Donell from comment #2)
> > Once you call clone() you own the created thread, and you cannot use any
> > POSIX Thread functions provided by the libc you left behind when you cloned
> > a new thread manually. If you are using clone() as a convenience function
> > for unshare() then you should use unshare().
> 
> That is very misleading.  If you call clone to create a new *thread*, you
> cannot use any glibc function at all, whether POSIX threads or not. 
> *Nothing* will work on the new thread, not even async-signal-safe functions.
> A lot of things worked by accident, most of the time, including malloc, but
> with the thread cache in 2.26, malloc will no longer be reliable, for
> instance.

You are absolutely right; I didn't mean to limit the scope to just the POSIX thread functions, but was answering in the context of this issue, and that made my response misleading.
 
> We still need a correct TID if the clone call created a new process, though.
> There is really no way around that.  This will be tricky due to vfork.

When you say 'we', do you mean just glibc internally?  If so, then I agree.  The library needs a correct TID in order for its own functions to work reliably; the user doesn't need a TID, and we neither provide gettid() nor document what a TID is with respect to a PID or pthread_t.

I think what's under contention here is: should clone(), called with the appropriate flags to behave like fork() or vfork(), actually be a supported API that still allows you to call back into the runtime afterwards?

Are you arguing that when clone() is used in a fork-like manner, it should work just like fork()?  Why not just use fork() or vfork()?

I agree with Adhemerval that the use of clone() is bound to cause us no end of problems, and that we should document that after a clone() call you can't call any libc functions. You have effectively ended the ability to use any library functions until you execve, much like the official story for vfork().

If anything, we should be adding documentation stating which functions are, as a GNU extension, supported and safe to call between vfork() and exec()/exit().  Those semantics would be easier to explain and document.
Comment 9 Carlos O'Donell 2017-07-27 12:16:52 UTC
(In reply to Florian Weimer from comment #7)
> At the very least, I think we should try to support a clone of a
> non-multi-threaded process, so that the subprocess can create threads
> using pthread_create.  This still requires updating the TID cache.

Exactly which flags do you propose we support in this case?
Comment 10 Florian Weimer 2017-07-27 12:21:36 UTC
(In reply to Carlos O'Donell from comment #9)
> (In reply to Florian Weimer from comment #7)
> > At the very least, I think we should try to support a clone of a
> > non-multi-threaded process, so that the subprocess can create threads
> > using pthread_create.  This still requires updating the TID cache.
> 
> Exactly which flags do you propose we support in this case?

I suggested providing two wrappers, perhaps called clone_fork and clone_vfork.  clone_fork would update the TID cache in the child; clone_vfork would not, but would perhaps set a flag in the child indicating that the TID cache is invalid, to be reset later in the parent.

We should move this discussion to libc-alpha.
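
Roughly, the proposal might look like the following (a hypothetical sketch only; neither wrapper was ever added to glibc, and the names and signatures are assumptions based on this comment):

```c
/* Hypothetical declarations; not a real glibc API.  */

/* Fork-style clone: would update the child's TID cache, as fork does.  */
pid_t clone_fork (int (*fn) (void *), void *stack, int flags, void *arg);

/* Vfork-style clone: would leave the cache untouched (the address space
   is shared with the parent), but mark it invalid in the child; the
   parent would clear that mark once the child exits or execs.  */
pid_t clone_vfork (int (*fn) (void *), void *stack, int flags, void *arg);
```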
Comment 11 李福帮 2024-05-27 02:15:03 UTC
After golang 1.22 switched to `pthread_getattr_np` to get the low address and size of the stack, runc (a widely used container runtime) has now hit this issue.
Please see:
https://github.com/opencontainers/runc/issues/4233
https://github.com/golang/go/issues/65625#issuecomment-1939390070

Many people have been working on this issue; I think the core problem is the stale TID value in the `pthread` struct after the `clone(2)` syscall.

If there is a chance, I suggest we restart work on this issue.  For example:
1) add a public API to get the TID address;
2) update the TID value after clone(2).

Thanks.
Comment 12 Florian Weimer 2024-05-27 07:32:03 UTC
The runc usage is quite different: it uses longjmp to jump out of the clone callback. The original process exits, but the new process continues to run. Effectively, the process has been switched underneath glibc. This means we'd have to redo all initialization steps that the kernel did not copy over to the new process. It's not merely about changing glibc's notion of the TID.
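
Schematically, the pattern being described looks something like this (a reconstruction for illustration, not runc's actual code; error handling omitted):

```c
/* The clone child longjmps, through its copied jmp_buf, back to the
   setjmp point on the copied stack; the original process then exits,
   and the new process keeps running as if it were the original, with
   glibc's per-process state (TID cache included) never reinitialized.  */
#define _GNU_SOURCE
#include <sched.h>
#include <setjmp.h>
#include <unistd.h>

static jmp_buf env;
static char stack[64 * 1024];

static int
child_fn (void *arg)
{
  longjmp (env, 1);          /* Jump out of the clone callback.  */
}

int
main (void)
{
  if (setjmp (env) == 0)
    {
      clone (child_fn, stack + sizeof stack, SIGCHLD, NULL);
      _exit (0);             /* The original process exits...  */
    }
  /* ...and the new process continues here, switched underneath glibc.  */
  return 0;
}
```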
Comment 13 李福帮 2024-05-27 10:02:42 UTC
(In reply to Florian Weimer from comment #12)
> The runc usage is quite different: it uses longjmp to jump out of the clone
> callback. The original process exits, but the new process continues to run.
> Effectively, the process has been switched underneath glibc. This means we'd
> have to redo all initialization steps that the kernel did not copy over to
> the new process. It's not merely about changing glibc's notion of the TID.

Yes, it is.  Thanks for your reply.
Could you tell us how to redo this initialization?  Should it be worked around in glibc, or can it be done in runc?  Thanks.

There is an interesting thing: when we update the TID value, the bug disappears.
The runc maintainer @cyphar has a patch that overwrites the TID value in the clone call: he uses `prctl(PR_GET_TID_ADDRESS, &tid_addr)` to get the TID address and passes it to the clone syscall with the `CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID` flags.  It looks like `clone(child_func, ca.stack_ptr, CLONE_PARENT | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, &ca, NULL, NULL, tid_addr)`.  With this method, runc with Go 1.22 works fine.  But this method has a precondition: the kernel must be built with `CONFIG_CHECKPOINT_RESTORE=y`.

Please see: https://github.com/opencontainers/runc/pull/4247