Bug 6399

Summary: gettid() should have a wrapper
Product: glibc
Reporter: Michael Kerrisk <mtk.manpages>
Component: libc
Assignee: Not yet assigned to anyone <unassigned>
Status: REOPENED
Severity: normal
CC: bugdal, carlos, desrt, fweimer, gabriele.svelto, glibc-bugs, justin.lebar, michael.kerrisk, nmiell, spatz, tim, wbrana
Priority: P2
Flags: fweimer: security-
Version: unspecified
Target Milestone: ---

Description Michael Kerrisk 2008-04-14 13:04:01 IST
Currently, glibc does not provide a wrapper for gettid().  Nevertheless, 
there are a number of corners of the Linux syscall API where the use of Linux
thread IDs, as returned by gettid(), is essential:

a) the SIGEV_THREAD_ID notification mechanism (a Linux extension) of POSIX
timers (timer_create()). 

b) sched_setaffinity()/sched_getaffinity() can be used to set the CPU affinity
of specific threads.

c) The F_SETOWN and F_GETOWN commands of fcntl() can specify a thread ID.

Given this, it seems time that gettid() should be promoted to full member status
in glibc, and have a wrapper provided.
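
For illustration, here is a minimal sketch of what applications have to do today for case (b), assuming no libc wrapper is available. The names my_gettid() and pin_self_to_first_allowed_cpu() are hypothetical and exist only for this example; the thread ID is obtained through syscall(SYS_gettid).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the missing wrapper: ask the kernel directly. */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Pin the calling thread to the first CPU it is currently allowed to run on.
   Returns the CPU number on success, -1 on failure. */
int pin_self_to_first_allowed_cpu(void)
{
    pid_t tid = my_gettid();
    cpu_set_t allowed, wanted;

    if (sched_getaffinity(tid, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&wanted);
            CPU_SET(cpu, &wanted);
            /* The first argument is a kernel thread ID, not a pthread_t. */
            return sched_setaffinity(tid, sizeof wanted, &wanted) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Passing 0 instead of the thread ID also targets the calling thread; the explicit ID only becomes essential when one thread wants to act on another.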
Comment 1 Ulrich Drepper 2008-04-14 14:03:36 IST
Never.  No program must ever assume that a thread runs on the same kernel thread
all the time.
Comment 2 Michael Kerrisk 2008-04-14 14:45:30 IST
Point taken, I suppose.  However, programs that make use of the features I 
mention currently have no real choice[*] other than to make this assumption 
(which of course has held true with glibc since gettid() first appeared). 

[*] Well, I suppose they do have a choice: bypass the use of Pthreads 
altogether and use direct calls to clone(), but that's not a very palatable 
choice.
Comment 3 Michael Kerrisk 2008-04-16 10:36:38 IST
> Never.  No program must ever assume that a thread runs on the same
> kernel thread all the time.

Looking at nptl/sysdeps/unix/sysv/linux/raise.c it certainly appears that any 
program that is statically linked against glibc embeds this assumption (or it 
contains a race).
Comment 4 Michael Kerrisk 2012-03-28 23:20:22 IST
Reopening this for reconsideration. Various parts of the kernel/glibc API need kernel thread IDs. gettid() should be promoted to full member status
in glibc.
Comment 5 Rich Felker 2012-11-05 20:56:55 IST
Michael, your argument about static linking is invalid; it is not the application embedding the assumption, it's part of the implementation embedding the assumption. Being static-linked, both lie in the same file.

With that said, I think there is at least _some_ merit to the idea of exposing kernel TIDs to applications, since the time when thread implementations without a one-to-one correspondence between kernel threads and userspace threads seemed viable is long-past. (It is virtually impossible for such implementations to meet all the requirements of POSIX regarding scheduling, signals, cancellation, blocking syscalls, etc., and they have no benefits with regard to performance.)

Such an interface (gettid) should of course not be part of the general glibc API, but if exposed, would be a linux-specific function, like numerous other linux extensions, for use only with interfaces like SIGEV_THREAD_ID, fcntl, etc., and should be clearly documented as non-portable.
Comment 6 Michael Kerrisk 2013-02-03 22:21:05 IST
So, just as further background, the syscalls that are already exposed by glibc and that (can) make use of kernel thread IDs are at least the following:

capget(2), fcntl(2), get_robust_list(2), ioprio_set(2), sched_setaffinity(2),  sched_setparam(2), sched_setscheduler(2), timer_create(2)
Comment 7 Rich Felker 2013-02-03 22:55:35 IST
That list is incomplete. ALL of the sched_* functions take tids, not pids. See bug #15088. I suspect there are more functions affected too.
Comment 8 Michael Kerrisk 2013-02-03 23:11:05 IST
> That list is incomplete.

Agreed. (That's why I said "at least"). 

Obviously also all of the sched_get* analogues of the sched_set*() I listed. AFAICT, though not 100% sure, the following belong in the list also:

kcmp(2),
move_pages(2),
migrate_pages(2),
process_vm_readv(2),
process_vm_writev(2),
ptrace (2)
Comment 9 desrt 2013-02-08 21:19:52 IST
I'd argue that this should not be a syscall wrapper.  The libc can get this information out of the TLS segment _very_ quickly.
Comment 10 Michael Kerrisk 2013-02-08 23:25:45 IST
> I'd argue that this should not be a syscall wrapper.  The libc 
> can get this information out of the TLS segment _very_ quickly.

Given the unpleasant consequences that have resulted from PID caching for getpid(), I'm sceptical that this is a good idea. At the very least, I wonder if there is some subtle consequence that will bite us later.
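
As a rough application-level analogue of the hazard (this is not glibc's code, and the names naive_getpid() and child_sees_stale_pid() are made up for the example), a naively cached PID goes stale as soon as the process forks:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t cached_pid;        /* 0 means "not cached yet" */

/* Naive cache: remembers the first answer forever. */
static pid_t naive_getpid(void)
{
    if (cached_pid == 0)
        cached_pid = getpid();
    return cached_pid;
}

/* Returns 1 if a forked child observes a stale cached PID, 0 if it does
   not, and -1 on error. */
int child_sees_stale_pid(void)
{
    int status;
    pid_t child;

    naive_getpid();             /* prime the cache in the parent */

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)             /* child: the cache still holds the parent's PID */
        _exit(naive_getpid() != getpid() ? 1 : 0);

    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A cache inside libc can be refreshed by libc's own fork(), but not when the new task is created behind its back, which is where a cached TID could plausibly run into the same kind of trouble.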
Comment 11 Rich Felker 2013-02-09 00:01:04 IST
If an application is calling gettid frequently enough for performance to matter, it's probably doing something very wrong anyway.

With that said, most of the problems with caching pid/tid come from use of clone() (or worse, vfork) directly by applications, which should probably not be a supported use. With TLS being a mandatory feature in modern glibc and the thread-pointer being always-initialized for purposes like ssp, I don't think there's any way applications can safely clone, whereby "safely" I mean "without a risk that internal libc state is inconsistent afterwards".

Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary, but I don't think it would be a problem, either.
Comment 12 desrt 2013-02-09 00:19:16 IST
I'm trying to roll my own recursive mutex.
Comment 13 Rich Felker 2013-02-09 00:40:09 IST
In that case, you can always do the caching yourself:

int my_caching_gettid(void)
{
    /* block-scope thread-local storage must be static; zero until first use */
    static _Thread_local int tid;
    return tid ? tid : (tid = gettid());
}

This might be mildly slower than having glibc do it just because of which TLS model gets used; whether that matters would require some measurement, I think.
Comment 14 desrt 2013-02-09 00:41:43 IST
I did the benchmarking on various tricks to get around this... unfortunately, TLS from shared libraries is quite slow.
Comment 15 Michael Kerrisk 2013-02-09 02:06:11 IST
(In reply to comment #11)

> With that said, most of the problems with caching pid/tid come from use of
> clone() (or worse, vfork) directly by applications, 

Not just that. Also, the caching of PIDs forced some limitations on how PID namespaces could be implemented, AFAIK.

> Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary,
> but I don't think it would be a problem, either.

Given the history, I'd say caution is the best approach--i.e., don't cache.
Comment 16 Rich Felker 2013-02-09 02:38:55 IST
I don't see how caching could have any effect on namespaces. Any application is able (and entitled) to store its own pid and assume that remains constant for the lifetime of the process. Whether this happens in application-level code or libc-level code is rather irrelevant.
Comment 17 Michael Kerrisk 2013-02-09 02:53:06 IST
(In reply to comment #16)
> I don't see how caching could have any effect on namespaces. Any application is
> able (and entitled) to store its own pid and assume that remains constant for
> the lifetime of the process. Whether this happens in application-level code or
> libc-level code is rather irrelevant.

So, a possible implementation of PID namespaces would have allowed setns() to change the caller's PID namespace, which in effect would change the caller's PID. Of course, this is not done. Instead, setns() into a PID namespace only changes the PID namespace of children subsequently created by the caller. 

One of the cited reasons that setns() didn't change the PID namespace of the caller is because glibc caches PIDs, and the result of getpid() would thus no longer be correct. 

Now, you could say that the issue equally affects the application itself, but there is a difference: if an application calls setns(), then it would know (in that alternative implementation model) that its PID was about to change and that any PID that *it* had cached was now invalid.
Comment 18 Rich Felker 2013-02-09 03:24:31 IST
I still maintain that it's a broken design for setns to change the caller's pid. A pid is a fundamentally invariant property of a process. Even if the _application_ knew its pid would change as a result of calling setns, it could be linked to any number of non-libc libraries which are entitled to make the assumption that pids are an invariant property of the process. If the pid were to change when setns is called, then the only valid action after setns should be calling an exec-family function or _exit.
Comment 19 Michael Kerrisk 2013-02-09 23:04:20 IST
(In reply to comment #18)
> I still maintain that it's a broken design for setns to change the caller's
> pid. A pid is a fundamentally invariant property of a process.

Rich, 

I think what you mean is: "this is the way it's always been done". But this was not handed to us on stone tablets. Linux has already changed a lot of old assumptions in favor of useful innovations. We can argue endlessly about whether or not the alternative that I talked about is a broken design. I'm actually fairly agnostic on that point, but my bottom line point is that glibc effectively imposed policy on kernel user space (i.e., "PIDs are invariant"), and I think that was A Bad Idea (TM) for a library that provides the fundamental plumbing from user space to the kernel.
Comment 20 Justin Lebar 2013-09-21 00:14:59 IST
> the syscalls that are already exposed by glibc and that (can) make use of 
> kernel thread IDs are at least the following:

FWIW {get,set}priority() also take tid's on Linux.  It's very hard to use these correctly in a multithread environment, but it's basically impossible to do so without gettid().  :)
Comment 21 Ondrej Bilka 2013-10-09 19:31:59 IST
*** Bug 14300 has been marked as a duplicate of this bug. ***
Comment 22 Carlos O'Donell 2014-01-10 20:44:58 IST
This issue was raised again recently and I'm not opposed to it, but we would need to very carefully describe what "tid" means e.g. task id (not thread) and enhance the documentation to describe which API functions accept tid's.

To be clear the next step is two things:
* Add documentation for gettid, describe what a tid *is*, its properties, and the fact that it is specific to Linux. Find all other functions that also accept tid's and enhance their descriptions (or provide stubs) that say they accept tids on Linux.
* Add the gettid wrapper.
Comment 23 Rich Felker 2014-01-10 20:56:24 IST
I'm unclear on what you mean by clarifying that it's not a thread id. Is this just a matter of distinguishing it from pthread_t? Or are you hesitant to establish a permanent one-to-one correlation between threads from the application's perspective and threads from the kernel's perspective? The idea of implementing POSIX threads with an M:N threading approach is dead and buried as far as I can tell; it's largely incompatible with POSIX semantics for scheduling and blocking syscalls, unless you add a huge userspace emulation layer even uglier than what LinuxThreads was. This is not an implementation flaw in glibc or Linux but a fundamental limitation.

I think it's perfectly reasonable to add gettid, documenting it as Linux-specific, and documenting that each thread (in the POSIX or C11 sense) has a corresponding kernelspace identifier, in the form of a 32-bit positive signed integer, used for certain Linux-specific features. In addition, since this identifier is guaranteed to be unique per thread, it may be used by applications implementing their own synchronization mechanisms via atomics and futex. (Note that pthread_t is not useful for this purpose, since it may be 64-bit, whereas futex only supports 32-bit values.)
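
To make the synchronization use case concrete, here is a minimal sketch of a recursive lock whose owner word is the kernel tid. It is only an illustration: struct rec_lock, my_tid() and the two lock functions are hypothetical names, and waiters spin with sched_yield() instead of sleeping on a futex to keep the example short.

```c
#include <sched.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

struct rec_lock {
    atomic_int owner;   /* 0 when free, otherwise the owner's kernel tid */
    int depth;          /* recursion depth, touched only by the owner */
};

/* Per-thread cached tid; assumes threads are created through pthreads only. */
static int my_tid(void)
{
    static _Thread_local int tid;
    if (tid == 0)
        tid = (int)syscall(SYS_gettid);
    return tid;
}

void rec_lock_acquire(struct rec_lock *l)
{
    int self = my_tid();
    int expected = 0;

    /* Only this thread can ever store its own tid, so a relaxed load that
       matches is enough to detect re-entry. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self) {
        l->depth++;
        return;
    }
    while (!atomic_compare_exchange_weak_explicit(&l->owner, &expected, self,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;   /* the failed exchange overwrote it */
        sched_yield();  /* a real lock would block in futex() here */
    }
    l->depth = 1;
}

void rec_lock_release(struct rec_lock *l)
{
    if (--l->depth == 0)
        atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

The owner word fits in the 32 bits that futex operates on, which is the property pthread_t cannot guarantee.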
Comment 24 Michael Kerrisk 2014-01-10 23:52:51 IST
(In reply to Carlos O'Donell from comment #22)
> This issue was raised again recently and I'm not opposed to it, but we would
> need to very carefully describe what "tid" means e.g. task id (not thread)
> and enhance the documentation to describe which API functions accept tid's.
> 
> To be clear the next step is two things:
> * Add documentation for gettid, describe what a tid *is*, its
> properties, and the fact that it is specific to Linux. Find all other
> functions that also accept tid's and enhance their descriptions (or provide
> stubs) that say they accept tids on Linux.
> * Add the gettid wrapper.

These are the APIs that I know of that expose or use Linux kernel thread IDs (what you call "tid"):

clone()
gettid()
fcntl() F_SETOWN, F_GETOWN, F_GETOWN_EX, F_SETOWN_EX
get_robust_list()
set_robust_list()
perf_event_open()
sched_getaffinity()
sched_setaffinity()
timer_create()
sched_getparam() 
sched_setparam()
sched_getscheduler()
sched_setscheduler()
ioprio_set()
ioprio_get()
tgkill()
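
As one example from this list, here is a sketch of the F_SETOWN_EX / F_GETOWN_EX pair being used with a kernel thread ID. The two function names are hypothetical, and the ID is again fetched with syscall(SYS_gettid) on the assumption that no wrapper exists.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for F_SETOWN_EX, F_GETOWN_EX, struct f_owner_ex */
#endif
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask that I/O signals for fd be sent to the calling thread only.
   Returns 0 on success, -1 on failure. */
int direct_io_signals_to_this_thread(int fd)
{
    struct f_owner_ex owner = {
        .type = F_OWNER_TID,                    /* pid field holds a thread ID */
        .pid  = (pid_t)syscall(SYS_gettid),
    };
    return fcntl(fd, F_SETOWN_EX, &owner);
}

/* Returns the thread ID that owns fd's I/O signals, or -1 if the owner is
   not a thread or the query fails. */
pid_t io_signal_owner_tid(int fd)
{
    struct f_owner_ex owner;

    if (fcntl(fd, F_GETOWN_EX, &owner) != 0)
        return -1;
    return owner.type == F_OWNER_TID ? owner.pid : -1;
}
```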
Comment 25 Nicholas Miell 2014-01-11 00:17:43 IST
Should gettid() actually be exposed to userspace? My suggestion in bug 14300 was the introduction of a pid_t "pthread_gettid_np(pthread_t *thr)" (which, naturally, could take pthread_self() as an argument).

Leaving gettid() hidden and only exposing the pthread function would act as an implicit hint that you really should be using the pthread functions where possible.

As such, acknowledging that the following functions take tids would be a mistake:

sched_getaffinity()
sched_setaffinity()
sched_getparam() 
sched_setparam()
sched_getscheduler()
sched_setscheduler()

because the following functions already exist:

pthread_getaffinity_np() pthread_attr_getaffinity_np()
pthread_setaffinity_np() pthread_attr_setaffinity_np()
pthread_getschedparam() pthread_attr_getschedparam()
pthread_setschedparam() pthread_attr_setschedparam()
pthread_attr_getschedpolicy()
pthread_attr_setschedpolicy()

And ideally the other functions like fcntl() or timer_create() would get small wrappers that transparently convert a pthread_t to a tid before invoking the system call.
Comment 26 Carlos O'Donell 2014-01-11 03:03:18 IST
(In reply to Rich Felker from comment #23)
> I'm unclear on what you mean by clarifying that it's not a thread id. Is
> this just a matter of distinguishing it from pthread_t? Or are you hesitant
> to establish a permanent one-to-one correlation between threads from the
> application's perspective and threads from the kernel's perspective? The
> idea of implementing POSIX threads with an M:N threading approach is dead
> and buried as far as I can tell; it's largely incompatible with POSIX
> semantics for scheduling and blocking syscalls, unless you add a huge
> userspace emulation layer even uglier than what LinuxThreads was. This is
> not an implementation flaw in glibc or Linux but a fundamental limitation.

The tid as we call it in userspace is actually a kernel pid. I don't know if we want to talk about it at this level, but it is what it is, and many interfaces that take a pid will also accept a tid because they are the same thing. In practice the pid or tid are just identifiers of the kernel task that you are trying to manipulate. All that I am looking for is a consistent use of terminology at a particular level of abstraction. If we decide that "gettid" returns a "Thread id" then that IMO can only be used with interfaces that are documented as accepting a "thread id." I wish there to be no ambiguity between pid and tid and pthread_t, and lines should be drawn between them.

Does that make sense?
Comment 27 Michael Kerrisk 2014-01-11 03:41:13 IST
(In reply to Nicholas Miell from comment #25)
> Should gettid() actually be exposed to userspace? My suggestion in bug 14300
> was the introduction of a pid_t "pthread_gettid_np(pthread_t *thr)" (which,
> naturally, could take pthread_self() as an argument).
> 
> Leaving gettid() hidden and only exposing the pthread function would act as
> an implicit hint that you really should be using the pthread functions where
> possible.
> 
> As such, acknowledging that the following functions take tids would be a
> mistake:
> 
> sched_getaffinity()
> sched_setaffinity()
> sched_getparam() 
> sched_setparam()
> sched_getscheduler()
> sched_setscheduler()
> 
> because the following functions already exist:
> 
> pthread_getaffinity_np() pthread_attr_getaffinity_np()
> pthread_setaffinity_np() pthread_attr_setaffinity_np()
> pthread_getschedparam() pthread_attr_getschedparam()
> pthread_setschedparam() pthread_attr_setschedparam()
> pthread_attr_getschedpolicy()
> pthread_attr_setschedpolicy()
> 
> And ideally the other functions like fcntl() or timer_create() would get
> small wrappers that transparently convert a pthread_t to a tid before
> invoking the system call.

This makes no sense. System calls and the pthreads API are not the same, just as kernel TIDs and Pthreads IDs are not the same. There are legitimate uses of the system calls in applications that want nothing to do with Pthreads.  It happens that there is a one-to-one correspondence between kernel TIDs and Pthreads IDs, but that is a side effect of the NPTL implementation. Any attempt to force the two IDs to be conflated in the API would be a mistake.
Comment 28 Nicholas Miell 2014-01-11 04:09:11 IST
If the application wants to implement their own threading library, they can implement their own gettid() wrapper.

If the application isn't implementing their own threading library, then the application is using pthreads, which operates on pthread_t, not TIDs.

If the application isn't implementing their own threading library and isn't using pthreads, then it has no reason to ever use TIDs because it only has the one TID, which is also the PID.
Comment 29 Justin Lebar 2014-01-11 07:30:12 IST
> If the application wants to 
> implement their own threading 
> library, they can implement 
> their own gettid() wrapper.

They /can/, but are we confusing "can" and "should"?

Maybe my opinion isn't helpful here, but every big system I've worked on (gecko, google), and they all have not one but
Comment 30 Justin Lebar 2014-01-11 07:38:14 IST
(I should have known better than to try to edit Bugzilla on my phone.)

Anyway, I contend that things aren't as black and white as they seem, and even if you don't think they should, the fact remains that systems do make this syscall.  I think it's a library's job to enable that.
Comment 31 Nicholas Miell 2014-01-11 07:41:08 IST
(In reply to Justin Lebar from comment #29)
> They /can/, but are we confusing "can" and "should"?

That's my point. Userspace should be using pthreads. Pthreads operates on pthread_t, not TIDs. TIDs are an internal kernel implementation detail that have unfortunately leaked to userspace in a couple of instances (SIGEV_THREAD_ID, F_SETOWN_EX), and this leakage should be reversed, not expanded.

The userspace interface exposed by glibc doesn't necessarily match the syscall interface (cf. pipe, cpu_set_t, etc.), and there's no reason to increase the mess here.
Comment 32 Gabriele Svelto 2014-01-11 09:16:18 IST
(In reply to Nicholas Miell from comment #28)
> If the application isn't implementing their own threading library, then the
> application is using pthreads, which operates on pthread_t, not TIDs.

The issue here is that the pthread API doesn't support some IMHO basic functionality that is available when manipulating TIDs directly. To give a concrete example, on Linux you can't adjust the priority of a POSIX thread with pthread_setschedparam()/pthread_setschedprio() unless it's running on a non-standard scheduler (i.e. anything but SCHED_OTHER). In practice this means that on a regular system without special permissions you simply can't adjust thread priorities relative to each other via the pthread interfaces.

However every process can increase its nice value without requiring special permissions; so since a PID and a TID are the same thing we work around this problem in Gecko by retrieving each thread TID with gettid() and then adjusting their nice values via setpriority() (which accepts a PID).

While this is an ugly hack it works fine and offers a bit of functionality that is currently not available via the pthread interfaces while being well supported by the Linux kernel. This is unfortunate as we're basically keeping around a non-POSIX code path in addition to a regular POSIX path using pthread_setschedparam() which works fine on BSDs for example.
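
A minimal sketch of that workaround, for the calling thread only, could look like the following. It is not the Gecko code; lower_current_thread_priority() is a hypothetical name, and it relies on the Linux-specific behaviour that setpriority() with PRIO_PROCESS acts on a single thread when given a TID.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raise the calling thread's nice value by `increment` (clamped to 19),
   leaving the other threads of the process alone.
   Returns 0 on success, -1 on failure. */
int lower_current_thread_priority(int increment)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);
    int current, target;

    /* getpriority() can legitimately return -1, so errno must be checked. */
    errno = 0;
    current = getpriority(PRIO_PROCESS, (id_t)tid);
    if (current == -1 && errno != 0)
        return -1;

    target = current + increment;
    if (target > 19)
        target = 19;

    return setpriority(PRIO_PROCESS, (id_t)tid, target);
}
```

Raising the nice value needs no privileges; going back down generally does, so a sketch like this is effectively one-way for an unprivileged process.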
Comment 33 Nicholas Miell 2014-01-11 09:43:13 IST
Your complaint as I understand it is "pthreads on Linux doesn't let me renice individual threads, so instead of fixing that, gettid() should be exposed to userspace", which doesn't make much sense. :)
Comment 34 Gabriele Svelto 2014-01-11 10:07:40 IST
(In reply to Nicholas Miell from comment #33)
> Your complaint as I understand it is "pthreads on Linux doesn't let me
> renice individual threads, so instead of fixing that, gettid() should be
> exposed to userspace", which doesn't make much sense. :)

I was just giving a practical example of where we're using gettid() and why. Of course in our case the best solution would be to have pthreads scheduling primitives work properly with SCHED_OTHER but since that doesn't sound like a realistic option in the short run not having to wrap the SYS_gettid syscall ourselves would be already an improvement.

A brief search of SYS_gettid shows quite a few pieces of code wrapping it already in their own gettid() so it might be time to provide this function directly in libc:

http://code.google.com/query/#q=SYS_gettid

I found other, different uses in Gecko too: once the resulting value is used with tgkill() and once as the 'nl_pid' field of a netlink socket address.
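
For reference, the tgkill() kind of use might look roughly like this sketch. thread_is_alive() is a hypothetical name, the call goes through syscall() on the assumption that libc offers no tgkill() wrapper, and signal number 0 asks the kernel to perform only its existence and permission checks without delivering anything.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if `tid` currently names a thread of this process, 0 otherwise
   (including for invalid IDs). */
int thread_is_alive(pid_t tid)
{
    /* tgkill(tgid, tid, sig): the thread-group ID guards against the TID
       having been recycled by a thread of some other process. */
    return syscall(SYS_tgkill, getpid(), tid, 0) == 0;
}
```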
Comment 35 Rich Felker 2014-01-11 15:17:36 IST
I have a proposal for making pthread_setschedparam work with SCHED_OTHER: on error, determine if the failure is lack of kernel support for priorities with SCHED_OTHER, and use the legacy setpriority syscall instead.
Comment 36 Michael Kerrisk 2014-01-11 19:48:35 IST
(In reply to Nicholas Miell from comment #31)
> (In reply to Justin Lebar from comment #29)
> Userspace should be using pthreads. 

Why? Why should glibc be imposing policy on how user space makes use of Linux kernel features? 

Glibc (rightly) exposes many non-POSIX features of the Linux kernel. gettid() is just one more such case. By all means add pthreads wrappers for these various other pieces, if someone wants them. But don't require everyone to use the pthreads API in order to access TIDs.
Comment 37 Rich Felker 2014-01-11 20:01:01 IST
glibc already requires applications to use the pthreads API to use threads. Attempting to "roll your own" with clone will result in random but serious failures due to glibc's assumption that the thread pointer is valid and that it can find the values it expects in the TCB. I don't think this was ever intended as "imposing policy" but rather just a consequence of the fact that it's HARD to support applications doing things behind the implementation's back.

I'm still a bit undecided as to whether exposing gettid is a good idea, but I don't think avoiding imposing policy about bypassing pthreads is a good argument either way.
Comment 38 Carlos O'Donell 2014-01-12 17:05:45 IST
(In reply to Rich Felker from comment #37)
> glibc already requires applications to use the pthreads API to use threads.
> Attempting to "roll your own" with clone will result in random but serious
> failures due to glibc's assumption that the thread pointer is valid and that
> it can find the values it expects in the TCB. I don't think this was ever
> intended as "imposing policy" but rather just a consequence of the fact that
> it's HARD to support applications doing things behind the implementation's
> back.
> 
> I'm still a bit undecided as to whether exposing gettid is a good idea, but
> I don't think avoiding imposing policy about bypassing pthreads is a good
> argument either way.

Agreed. My guiding principle here is that we need to think the change through and document the changes. I'd like to see interested parties create a glibc wiki page with the design. That way others can add use cases and wrinkles that should get ironed out like the scheduler functions which are POSIX and don't work like POSIX intended.
Comment 40 Florian Weimer 2015-10-07 08:02:30 IST
We are committed to a 1:1 threading model because userspace code manipulates task attributes such as CPU affinity or capabilities, and all kinds of things will break if we start switching userspace threads to different (still userspace, obviously) kernel tasks.  (Restoring all those attributes on context switch is not possible for performance reasons.)

This means that Ulrich's objection to adding gettid is no longer valid.
Comment 41 Carlos O'Donell 2015-10-07 14:05:58 IST
(In reply to Florian Weimer from comment #40)
> We are committed to a 1:1 threading model because userspace code manipulates
> task attributes such as CPU affinity or capabilities, and all kinds of
> things will break if we start switching userspace threads to different
> (still userspace, obviously) kernel tasks.  (Restoring all those attributes
> on context switch is not possible for performance reasons.)
> 
> This means that Ulrich's objection to adding gettid is no longer valid.

I've talked about this a bit before, and I'll mention it again for good measure.

The biggest problem is that the design in Linux for this stinks. The reuse of pid_t as a type for tid's is confusing IMO and was a design flaw.

Therefore, if we are to carry this out, we need to think thoroughly about what it means to expose a tid. Should we use a distinct type, and what is that type going to look like? Should users be able to convert from pthread_t to tid_t and vice versa? Do we want new interfaces that take tid_t types for all those things you'd do with native threads, or should the interfaces that take pid_t document their support for being called with tid_t values?

Either way, someone needs to start with a coherent design for what this is going to look like.
Comment 42 Michael Kerrisk 2015-10-09 12:38:50 IST
(In reply to Carlos O'Donell from comment #41)
> (In reply to Florian Weimer from comment #40)
> > We are committed to a 1:1 threading model because userspace code manipulates
> > task attributes such as CPU affinity or capabilities, and all kinds of
> > things will break if we start switching userspace threads to different
> > (still userspace, obviously) kernel tasks.  (Restoring all those attributes
> > on context switch is not possible for performance reasons.)
> > 
> > This means that Ulrich's objection to adding gettid is no longer valid.
> 
> I've talked about this a bit before, and I'll mention it again for good
> measure.
> 
> The biggest problem is that the design in Linux for this stinks. The reuse
> of pid_t as a type for tid's is confusing IMO and was a design flaw.
>
> Therefore if we are to carry this out we need to thoroughly think about what
> it means to expose a tid, should we use a distinct type, what is that type
> going to look like? Should users be able to convert from pthread_t to
> tid_t and vice versa? Do we want new interfaces that take tid_t types for
> all those things you'd do with native threads or should the interfaces that
> take pid_t document their support for being called with tid_t values?

I'm not so convinced about this. 

1. At the kernel level, we have APIs that operate on PIDs (e.g., kill()) and APIs that operate on threads (e.g., rt_tgsigqueueinfo()).

2. In kernel thread groups, the so-called thread group leader is the one whose PID is the same as its thread ID. That is, these two values are exactly the same. In APIs that operate on threads, specifying the "PID" means operate on the thread-group leader.

3. There is plenty of precedent for the preceding model. We have the session-process group-process hierarchy. Session IDs, process group IDs, and process IDs are all represented using the same type: pid_t. And we have the corresponding notion that a session leader has SID == PID and that a process group leader has PGID == PID.

At the kernel level, there is really only one kind of kernel scheduling entity (KSE) -- commonly called a "task" in Linux parlance. And that one kind of KSE is identified by one kind of data type. Creating an artificial distinction at the glibc level seems illogical and confusing. Furthermore, the clone(2) system call, which creates kernel "threads", returns a thread ID. But really, this is the same for processes: clone() is equally the creator of "processes". And of course, glibc itself already assumes that TIDs and PIDs are the same thing, since nowadays glibc's fork() is a wrapper around clone(), and that wrapper assumes that clone() returns a PID.

In short, I'd say that everything should be a pid_t.
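
To illustrate point 2, here is a small sketch (the helper names are hypothetical, and it needs to be built with pthread support) that reports the kernel ID of a newly created thread so that it can be compared with getpid(). Called from the initial thread, syscall(SYS_gettid) is expected to equal getpid(), whereas the new thread gets a different value of the same pid_t type.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: store this thread's kernel ID where the creator can see it. */
static void *report_tid(void *out)
{
    *(pid_t *)out = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Returns the kernel thread ID of a freshly created (and already joined)
   thread, or -1 on failure. */
pid_t tid_of_new_thread(void)
{
    pthread_t th;
    pid_t tid = -1;

    if (pthread_create(&th, NULL, report_tid, &tid) != 0)
        return -1;
    if (pthread_join(th, NULL) != 0)
        return -1;
    return tid;
}
```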

> Either way, someone needs to start with a coherent design for what this is
> going to look like.

The only sane thing to do, AFAICS, is to use pid_t. And on that assumption, much of the design work is greatly simplified.

And yes, for Pthreads applications that want to exploit nonportable Linux pieces, it would be nice to have functions that translate pthread_t <==> kernel-TID.
Comment 43 Carlos O'Donell 2015-10-09 18:52:14 IST
(In reply to Michael Kerrisk from comment #42)
> The only sane thing to do, AFAICS, is to use pid_t. And on that assumption,
> much of the design work is greatly simplified.

It sounds like you have a design in mind, please by all means start a wiki page to document the design. You need not implement it yourself, but if we can get consensus across the board and document which functions need implementing or documenting, then that's a win, and myself or others can implement it. Right now we have no wiki page and not even a strawman design solution.
Comment 44 Nicholas Miell 2015-10-09 19:01:51 IST
Well, if we're going to be making design proposals:
 
PIDs (and TIDs) are inherently racy and every API that uses them is broken by design. No new APIs that use pid_t should be created, and all existing ones should be deprecated.

They should be completely replaced by file descriptors obtained either from clone()'s return value (since fork() can't take flags) or by opening /proc/$PID (at which point you can safely inspect the process's attributes to verify you have a handle to the thing you wanted).
Comment 45 Michael Kerrisk 2015-10-09 19:16:23 IST
(In reply to Nicholas Miell from comment #44)
> Well, if we're going to be making design proposals:
>  
> PIDs (and TIDs) are inherently racy and every API that uses them is broken
> by design. No new APIs that use pid_t should be created, all existing should
> be deprecated.
> 
> They should be completely replaced by file descriptors obtained either from
> clone()'s return value (since fork() can't take flags) or by opening
> /proc/$PID (at which point you can safely inspect the process's attributes
> to verify you have a handle to the thing you wanted).

Nicholas,

I don't disagree with you, but you speak of an ideal world that does not (yet) exist, and proposing to deprecate all of the existing APIs won't fly. Until that ideal world exists, we need a solution to the current problems that applications face. One day, assuming that Josh Triplett's clonefd() work hits Linux mainline, we'll be closer to the ideal.

Thanks,

Michael
Comment 46 Rich Felker 2015-10-09 19:41:53 IST
In response to comment 44, there is nothing racy about tids as long as they are only used from within the process they belong to. A tid's lifetime cannot asynchronously end; only pthread_exit (or SYS_exit at the syscall level) can end it, and this is fully under application control. Process ids, on the other hand, are racy.

In response to comment 42, from a glibc standpoint that's pretty much all Linux implementation details. There is no reason for portable applications that want to use thread-directed sigevent delivery, etc. to care that tids happen to be allocated in a common namespace with pids, that the initial thread's tid is equal to its pid, etc. Adopting these conventions as part of a public interface is an option, but one which potentially constrains future directions, and the pros and cons should be weighed.
Comment 47 Carlos O'Donell 2016-04-08 14:31:26 IST
Note that C++ is considering coroutines, and if implemented in C they would represent a case where we could conceptually have multiple execution contexts within the same OS thread. Exposing the OS tid would further complicate programming using these execution contexts since the tid's would be the same. If we stay with pthread_t, we can at least under the hood have a pthread_t per execution context that might be used to allow some threading operations to be shared between the various execution abstractions. I understand this objection is not a fully formed idea, but I wanted to put my recent thoughts down on this issue.

If anything needs fixing it's the interfaces that take a tid. We need SIGEV_PTHREAD_ID, we need new functions to set CPU affinity for threads (Rich mentions he is looking at a fix), and fcntl options for threads.
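
To make the kind of interface at issue concrete, here is a minimal sketch of tid-keyed CPU affinity calls, assuming Linux and glibc's sched_getaffinity()/sched_setaffinity() with _GNU_SOURCE. The helper names current_tid(), allowed_cpu_count() and pin_thread_to_cpu() are hypothetical. The first argument of both calls is a kernel thread ID, which is why callers currently need some way to obtain one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for sched_getaffinity(), sched_setaffinity(), CPU_* */
#endif
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: kernel thread ID of the calling thread. */
pid_t current_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

/* Hypothetical helper: number of CPUs the given kernel thread is allowed
   to run on, or -1 on error. */
int allowed_cpu_count(pid_t tid)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(tid, sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Hypothetical helper: restrict the given kernel thread to a single CPU.
   Returns 0 on success, -1 with errno set on failure. */
int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof set, &set);
}
```

A thread would call pin_thread_to_cpu(current_tid(), cpu) to pin itself. An interface keyed by pthread_t instead, as suggested above, would remove the need for the application to see the tid at all.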
Comment 48 Florian Weimer 2016-04-08 14:39:46 IST
(In reply to Carlos O'Donell from comment #47)
> Note that C++ is considering coroutines, and if implemented in C they would
> represent a case where we could conceptually have multiple execution
> contexts within the same OS thread. Exposing the OS tid would further
> complicate programming using these execution contexts since the tid's would
> be the same. If we stay with pthread_t, we can at least under the hood have
> a pthread_t per execution context that might be used to allow some threading
> operations to be shared between the various execution abstractions.

I'm afraid this argument isn't valid because we use %fs-relative (or %gs-relative) addressing, instead of loading the TCB address from %fs:0 and addressing relative to that.  As a result, a co-routine switch would have to copy the *entire* TCB as part of the context switch, which would not offer the performance characteristics people expect from coroutines.

We are committed to the kernel thread identity, so we might as well expose it to applications explicitly.

There is also a possibility that coroutines will be implemented as Python-style generators, where only select functions using a new syntactic element can perform context switches, and there aren't any stack switches involved at all or anything the lower-level run-time environment would notice.
Comment 49 Carlos O'Donell 2016-04-08 15:47:05 IST
(In reply to Florian Weimer from comment #48)
> (In reply to Carlos O'Donell from comment #47)
> > Note that C++ is considering coroutines, and if implemented in C they would
> > represent a case where we could conceptually have multiple execution
> > contexts within the same OS thread. Exposing the OS tid would further
> > complicate programming using these execution contexts since the tid's would
> > be the same. If we stay with pthread_t, we can at least under the hood have
> > a pthread_t per execution context that might be used to allow some threading
> > operations to be shared between the various execution abstractions.
> 
> I'm afraid this argument isn't valid because we use %fs-relative (or
> %gs-relative) addressing, instead of loading the TCB from the %fs:0 and
> addressing relative to that.  As a result, a co-routine switch would have to
> copy the *entire* TCB as part of the context switch, which would not offer
> the performance characteristics people expect from coroutines.

This depends on the semantics of TLS access from coroutines. What is relevant today is that the decisions we make here may constrain the implementation of the features defined in the standard, or even the standard itself.

> We are committed to the kernel thread identity, so we might as well expose
> it to applications explicitly.

We should solve problems in ways that leave maximum degrees of flexibility for future implementations.

Just because it is easier to expose gettid() than to fix the routines that currently expect a tid doesn't mean it's a good idea in the long term.
 
> There is also a possibility that coroutines will be implemented as
> Python-style generators, where only select functions using a new syntactic
> element can perform context switches, and there aren't any stack switches
> involved at all or anything the lower-level run-time environment would
> notice.

Right, generators would not need anything from the runtime, but they are only one possibility, and it may be that all such options are pursued by the standard.

Again, I think we should solve this problem while still retaining the maximum amount of flexibility for the future.
Comment 50 Florian Weimer 2016-04-12 13:34:44 IST
OS/400 has something similar to gettid:

http://publib.boulder.ibm.com/iseries/v5r1/ic2924/index.htm?info/apis/users_22.htm

However, its IDs are actual IDs, unique across thread death, so arguably more useful than the temporary thread IDs we get from the Linux kernel.