Bug 6399

Summary: gettid() should have a wrapper
Product: glibc
Component: libc
Status: REOPENED ---
Severity: normal
Priority: P2
Version: unspecified
Target Milestone: ---
Reporter: Michael Kerrisk <mtk.manpages>
Assignee: Not yet assigned to anyone <unassigned>
CC: bugdal, desrt, gabriele.svelto, glibc-bugs, michael.kerrisk, wbrana
Host:
Target:
Build:
Last reconfirmed:

Description Michael Kerrisk 2008-04-14 13:04:01 UTC
Currently, glibc does not provide a wrapper for gettid().  Nevertheless,
there are a number of corners of the Linux syscall API where the use of Linux
thread IDs, as returned by gettid(), is essential:

a) the SIGEV_THREAD_ID notification mechanism (a Linux extension) of POSIX
timers (timer_create()). 

b) sched_setaffinity()/sched_getaffinity() can be used to set the CPU affinity
of specific threads.

c) The F_SETOWN and F_GETOWN commands of fcntl() can specify a thread ID.

Given this, it seems time that gettid() should be promoted to full member status
in glibc, and have a wrapper provided.
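
For reference, this is roughly what callers have to write while no wrapper
exists. It is only a minimal sketch: raw_gettid() and is_initial_thread() are
hypothetical helper names, not anything glibc provides, and the code assumes a
Linux system where SYS_gettid is defined in <sys/syscall.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes syscall() on strict-standard builds */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke the gettid system call directly. */
pid_t raw_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* In a process's initial thread the kernel thread ID equals the PID. */
int is_initial_thread(void)
{
    return raw_gettid() == getpid();
}
```

The value returned by raw_gettid() is what would be passed to the interfaces
listed in (a) to (c) above.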
Comment 1 Ulrich Drepper 2008-04-14 14:03:36 UTC
Never.  No program must ever assume that a thread runs on the same kernel thread
all the time.
Comment 2 Michael Kerrisk 2008-04-14 14:45:30 UTC
Point taken, I suppose.  However, programs that make use of the features I 
mention currently have no real choice[*] other than to make this assumption 
(which of course has held true with glibc since gettid() first appeared). 

[*] Well, I suppose they do have a choice: bypass the use of Pthreads 
altogether and use direct calls to clone(), but that's not a very palatable 
choice.
Comment 3 Michael Kerrisk 2008-04-16 10:36:38 UTC
> Never.  No program must ever assume that a thread runs on the same
> kernel thread all the time.

Looking at nptl/sysdeps/unix/sysv/linux/raise.c it certainly appears that any 
program that is statically linked against glibc embeds this assumption (or it 
contains a race).
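
To illustrate the point (this is a sketch of the general idea, not a copy of
raise.c): sending a signal to the calling thread comes down to a tgkill() call
that names a kernel thread ID. raise_sketch() is a hypothetical name, and the
raw syscall is used on the assumption that no library wrapper is available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only, not glibc's raise(): deliver sig to the calling thread. */
int raise_sketch(int sig)
{
    pid_t pid = getpid();
    pid_t tid = (pid_t)syscall(SYS_gettid);  /* must name the current kernel thread */

    /* tgkill(tgid, tid, sig) targets one specific kernel thread. */
    return (int)syscall(SYS_tgkill, pid, tid, sig);
}
```

If the tid were read from a value stored earlier rather than fetched at call
time, the code would only be correct as long as the thread stays on the same
kernel thread.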
Comment 4 Michael Kerrisk 2012-03-28 23:20:22 UTC
Reopening this for reconsideration. Various parts of the kernel/glibc API need
kernel thread IDs. gettid() should be promoted to full member status
in glibc.
Comment 5 Rich Felker 2012-11-05 20:56:55 UTC
Michael, your argument about static linking is invalid; it is not the
application embedding the assumption, it's part of the implementation embedding
the assumption. Being static-linked, both lie in the same file.

With that said, I think there is at least _some_ merit to the idea of exposing
kernel TIDs to applications, since the time when thread implementations without
a one-to-one correspondence between kernel threads and userspace threads seemed
viable is long-past. (It is virtually impossible for such implementations to
meet all the requirements of POSIX regarding scheduling, signals, cancellation,
blocking syscalls, etc., and they have no benefits with regard to performance.)

Such an interface (gettid) should of course not be part of the general glibc
API, but if exposed, would be a linux-specific function, like numerous other
linux extensions, for use only with interfaces like SIGEV_THREAD_ID, fcntl,
etc., and should be clearly documented as non-portable.
Comment 6 Michael Kerrisk 2013-02-03 22:21:05 UTC
So, just as further background, the syscalls that are already exposed by glibc
and that (can) make use of kernel thread IDs are at least the following:

capget(2), fcntl(2), get_robust_list(2), ioprio_set(2), sched_setaffinity(2), 
sched_setparam(2), sched_setscheduler(2), timer_create(2)
Comment 7 Rich Felker 2013-02-03 22:55:35 UTC
That list is incomplete. ALL of the sched_* functions take tids, not pids. See
bug #15088. I suspect there are more functions affected too.
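
A small sketch of what that means in practice, assuming Linux behaviour and
using a hypothetical helper name (calling_thread_policy): the "pid" argument of
sched_getscheduler() accepts the value returned by the gettid syscall and
reports on that one thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the scheduling policy of the calling thread, or -1 on error. */
int calling_thread_policy(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);

    /* On Linux the "pid" argument is interpreted as a kernel thread ID. */
    return sched_getscheduler(tid);
}
```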
Comment 8 Michael Kerrisk 2013-02-03 23:11:05 UTC
> That list is incomplete.

Agreed. (That's why I said "at least"). 

Obviously also all of the sched_get* analogues of the sched_set*() I listed.
AFAICT, though not 100% sure, the following belong in the list also:

kcmp(2),
move_pages(2),
migrate_pages(2),
process_vm_readv(2),
process_vm_writev(2),
ptrace(2)
Comment 9 desrt 2013-02-08 21:19:52 UTC
I'd argue that this should not be a syscall wrapper.  The libc can get this
information out of the TLS segment _very_ quickly.
Comment 10 Michael Kerrisk 2013-02-08 23:25:45 UTC
> I'd argue that this should not be a syscall wrapper.  The libc 
> can get this information out of the TLS segment _very_ quickly.

Given the unpleasant consequences that have resulted from PID caching for
getpid(), I'm sceptical that this is a good idea. At the very least, I wonder
if there is some subtle consequence that will bite us later.
Comment 11 Rich Felker 2013-02-09 00:01:04 UTC
If an application is calling gettid frequently enough for performance to
matter, it's probably doing something very wrong anyway.

With that said, most of the problems with caching pid/tid come from use of
clone() (or worse, vfork) directly by applications, which should probably not
be a supported use. With TLS being a mandatory feature in modern glibc and the
thread-pointer being always-initialized for purposes like ssp, I don't think
there's any way applications can safely clone, whereby "safely" I mean "without
a risk that internal libc state is inconsistent afterwards".

Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary,
but I don't think it would be a problem, either.
Comment 12 desrt 2013-02-09 00:19:16 UTC
I'm trying to roll my own recursive mutex.
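
The design was not posted here, so the following is only a guess at the kind of
lock meant: a recursive mutex that records its owner as a kernel TID, which
needs the caller's TID on every lock operation. struct rmutex and its functions
are hypothetical names, and the wait loop is deliberately naive.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical type: owner holds the kernel TID of the holder, 0 when free. */
struct rmutex {
    atomic_int owner;
    int depth;          /* only touched by the current owner */
};

void rmutex_lock(struct rmutex *m)
{
    int self = (int)syscall(SYS_gettid);   /* one syscall per lock call */

    if (atomic_load(&m->owner) == self) {  /* already ours: just nest */
        m->depth++;
        return;
    }
    int expected = 0;
    while (!atomic_compare_exchange_weak(&m->owner, &expected, self)) {
        expected = 0;                      /* a failed CAS stored the current owner here */
        sched_yield();                     /* naive wait; a real lock would use a futex */
    }
    m->depth = 1;
}

void rmutex_unlock(struct rmutex *m)
{
    if (--m->depth == 0)
        atomic_store(&m->owner, 0);        /* release to other threads */
}
```

The syscall at the top of rmutex_lock() is the cost that makes the speed of
gettid() matter for this use.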
Comment 13 Rich Felker 2013-02-09 00:40:09 UTC
In that case, you can always do the caching yourself:

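#include <sys/syscall.h>
#include <unistd.h>

int my_caching_gettid(void)
{
    static _Thread_local int tid;       /* per-thread cache; 0 means not fetched yet */
    if (!tid)
        tid = syscall(SYS_gettid);      /* no glibc wrapper, so call the syscall directly */
    return tid;
}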

This might be mildly slower than having glibc do it just because of which TLS
model gets used; whether that matters would require some measurement, I think.
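
One caveat with an application-level cache, sketched here under the assumption
of plain fork() on Linux: the child inherits the parent's thread-local value
but runs with a new TID, so the cache is stale until it is reset. cached_tid()
and child_sees_stale_tid() are hypothetical names used only for this
demonstration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Application-level per-thread cache of the kernel TID. */
pid_t cached_tid(void)
{
    static _Thread_local pid_t tid;
    if (!tid)
        tid = (pid_t)syscall(SYS_gettid);
    return tid;
}

/* Returns 1 if a fork()ed child sees a stale cached TID, 0 if not, -1 on error. */
int child_sees_stale_tid(void)
{
    (void)cached_tid();                 /* prime the cache in the parent */

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* The child inherits the parent's cached value but has a new TID. */
        int stale = cached_tid() != (pid_t)syscall(SYS_gettid);
        _exit(stale ? 1 : 0);
    }
    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A child handler registered with pthread_atfork() that clears the cached value
would be one way to deal with this.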
Comment 14 desrt 2013-02-09 00:41:43 UTC
I did the benchmarking on various tricks to get around this... unfortunately,
TLS from shared libraries is quite slow.
Comment 15 Michael Kerrisk 2013-02-09 02:06:11 UTC
(In reply to comment #11)

> With that said, most of the problems with caching pid/tid come from use of
> clone() (or worse, vfork) directly by applications, 

Not just that. Also, the caching of PIDs forced some limitations on how PID
namespaces could be implemented, AFAIK.

> Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary,
> but I don't think it would be a problem, either.

Given the history, I'd say caution is the best approach--i.e., don't cache.
Comment 16 Rich Felker 2013-02-09 02:38:55 UTC
I don't see how caching could have any effect on namespaces. Any application is
able (and entitled) to store its own pid and assume that remains constant for
the lifetime of the process. Whether this happens in application-level code or
libc-level code is rather irrelevant.
Comment 17 Michael Kerrisk 2013-02-09 02:53:06 UTC
(In reply to comment #16)
> I don't see how caching could have any effect on namespaces. Any application is
> able (and entitled) to store its own pid and assume that remains constant for
> the lifetime of the process. Whether this happens in application-level code or
> libc-level code is rather irrelevant.

So, a possible implementation of PID namespaces would have allowed setns() to
change the caller's PID namespace, which in effect would change the caller's
PID. Of course, this is not done. Instead, setns() into a PID namespace only
changes the PID namespace of children subsequently created by the caller. 

One of the cited reasons that setns() doesn't change the PID namespace of the
caller is that glibc caches PIDs, and the result of getpid() would thus no
longer be correct. 

Now, you could say that the issue equally affects the application itself, but
there is a difference: if an application calls setns(), then it would know (in
that alternative implementation model) that its PID was about to change and
that any PID that *it* had cached was now invalid.
Comment 18 Rich Felker 2013-02-09 03:24:31 UTC
I still maintain that it's a broken design for setns to change the caller's
pid. A pid is a fundamentally invariant property of a process. Even if the
_application_ knew its pid would change as a result of calling setns, it could
be linked to any number of non-libc libraries which are entitled to make the
assumption that pids are an invariant property of the process. If the pid were
to change when setns is called, then the only valid action after setns should
be calling an exec-family function or _exit.
Comment 19 Michael Kerrisk 2013-02-09 23:04:20 UTC
(In reply to comment #18)
> I still maintain that it's a broken design for setns to change the caller's
> pid. A pid is a fundamentally invariant property of a process.

Rich, 

I think what you mean is: "this is the way it's always been done". But this was
not handed to us on stone tablets. Linux has already changed a lot of old
assumptions in favor of useful innovations. We can argue endlessly about whether
or not the alternative that I talked about is a broken design. I'm actually
fairly agnostic on that point, but my bottom line point is that glibc
effectively imposed policy on kernel user space (i.e., "PIDs are invariant"),
and I think that was A Bad Idea (TM) for a library that provides the
fundamental plumbing from user space to the kernel.