Bug 6399 - gettid() should have a wrapper
Summary: gettid() should have a wrapper
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: unspecified
Importance: P2 normal
Target Milestone: 2.30
Assignee: Florian Weimer
Duplicates: 14300
Reported: 2008-04-14 13:04 UTC by Michael Kerrisk
Modified: 2021-05-19 20:14 UTC
CC List: 13 users

fweimer: security-


Description Michael Kerrisk 2008-04-14 13:04:01 UTC
Currently, glibc does not provide a wrapper for gettid(). Nevertheless,
there are a number of corners of the Linux syscall API where the use of Linux
thread IDs, as returned by gettid(), is essential:

a) the SIGEV_THREAD_ID notification mechanism (a Linux extension) of POSIX
timers (timer_create()). 

b) sched_setaffinity()/sched_getaffinity() can be used to set the CPU affinity
of specific threads.

c) The F_SETOWN and F_GETOWN commands of fcntl() can specify a thread ID.

Given this, it seems time that gettid() was promoted to full member status
in glibc and given a wrapper.
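As a point of reference, the kind of wrapper requested here can be sketched with glibc's generic syscall() interface. This is a minimal sketch assuming Linux; the name sys_gettid is hypothetical, chosen so that it does not clash with the gettid() declaration that glibc 2.30 and later provide in <unistd.h> when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the calling thread's kernel thread ID (TID).
   The gettid system call always succeeds, so there is no error path. */
static pid_t sys_gettid(void)
{
    pid_t tid = (pid_t) syscall(SYS_gettid);
    assert(tid > 0);   /* kernel TIDs are always positive */
    return tid;
}
```

In the initial thread of a process the result equals getpid(); in every other thread it differs.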
Comment 1 Ulrich Drepper 2008-04-14 14:03:36 UTC
Never.  No program must ever assume that a thread runs on the same kernel thread
all the time.
Comment 2 Michael Kerrisk 2008-04-14 14:45:30 UTC
Point taken, I suppose.  However, programs that make use of the features I 
mention currently have no real choice[*] other than to make this assumption 
(which of course has held true with glibc since gettid() first appeared). 

[*] Well, I suppose they do have a choice: bypass the use of Pthreads 
altogether and use direct calls to clone(), but that's not a very palatable 
choice.
Comment 3 Michael Kerrisk 2008-04-16 10:36:38 UTC
> Never.  No program must ever assume that a thread runs on the same
> kernel thread all the time.

Looking at nptl/sysdeps/unix/sysv/linux/raise.c it certainly appears that any 
program that is statically linked against glibc embeds this assumption (or it 
contains a race).
Comment 4 Michael Kerrisk 2012-03-28 23:20:22 UTC
Reopening this for reconsideration. Various parts of the kernel/glibc API need kernel thread IDs, so gettid() should be promoted to full member status in glibc.
Comment 5 Rich Felker 2012-11-05 20:56:55 UTC
Michael, your argument about static linking is invalid; it is not the application embedding the assumption, it's part of the implementation embedding the assumption. Being static-linked, both lie in the same file.

With that said, I think there is at least _some_ merit to the idea of exposing kernel TIDs to applications, since the time when thread implementations without a one-to-one correspondence between kernel threads and userspace threads seemed viable is long past. (It is virtually impossible for such implementations to meet all the requirements of POSIX regarding scheduling, signals, cancellation, blocking syscalls, etc., and they have no benefits with regard to performance.)

Such an interface (gettid) should of course not be part of the general glibc API, but, if exposed, it would be a Linux-specific function, like numerous other Linux extensions, for use only with interfaces like SIGEV_THREAD_ID, fcntl, etc., and should be clearly documented as non-portable.
Comment 6 Michael Kerrisk 2013-02-03 22:21:05 UTC
So, just as further background, the syscalls that are already exposed by glibc and that (can) make use of kernel thread IDs are at least the following:

capget(2), fcntl(2), get_robust_list(2), ioprio_set(2), sched_setaffinity(2), sched_setparam(2), sched_setscheduler(2), timer_create(2)
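To illustrate why sched_setaffinity(2) needs a TID, the sketch below pins only the calling thread, by passing its kernel TID where the API nominally takes a PID (passing getpid() would instead affect only the initial thread). It is a minimal sketch assuming Linux, glibc and _GNU_SOURCE; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pin only the calling thread (not the whole process) to the lowest-numbered
   CPU in its current affinity mask. Returns that CPU, or -1 on error. */
static int pin_calling_thread_to_first_cpu(void)
{
    pid_t tid = (pid_t) syscall(SYS_gettid);   /* kernel TID, not the PID */
    cpu_set_t set;

    if (sched_getaffinity(tid, sizeof set, &set) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &set))
            continue;
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        /* Passing the TID limits the change to this single thread. */
        return sched_setaffinity(tid, sizeof one, &one) == 0 ? cpu : -1;
    }
    return -1;   /* empty mask; should not happen */
}
```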
Comment 7 Rich Felker 2013-02-03 22:55:35 UTC
That list is incomplete. ALL of the sched_* functions take tids, not pids. See bug #15088. I suspect there are more functions affected too.
Comment 8 Michael Kerrisk 2013-02-03 23:11:05 UTC
> That list is incomplete.

Agreed. (That's why I said "at least"). 

Obviously also all of the sched_get*() analogues of the sched_set*() calls I listed. AFAICT, though I'm not 100% sure, the following belong in the list also:

kcmp(2),
move_pages(2),
migrate_pages(2),
process_vm_readv(2),
process_vm_writev(2),
ptrace(2)
Comment 9 desrt 2013-02-08 21:19:52 UTC
I'd argue that this should not be a syscall wrapper.  The libc can get this information out of the TLS segment _very_ quickly.
Comment 10 Michael Kerrisk 2013-02-08 23:25:45 UTC
> I'd argue that this should not be a syscall wrapper.  The libc 
> can get this information out of the TLS segment _very_ quickly.

Given the unpleasant consequences that have resulted from PID caching for getpid(), I'm sceptical that this is a good idea. At the very least, I wonder if there is some subtle consequence that will bite us later.
Comment 11 Rich Felker 2013-02-09 00:01:04 UTC
If an application is calling gettid frequently enough for performance to matter, it's probably doing something very wrong anyway.

With that said, most of the problems with caching pid/tid come from use of clone() (or worse, vfork) directly by applications, which should probably not be a supported use. With TLS being a mandatory feature in modern glibc and the thread-pointer being always-initialized for purposes like ssp, I don't think there's any way applications can safely clone, whereby "safely" I mean "without a risk that internal libc state is inconsistent afterwards".

Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary, but I don't think it would be a problem, either.
Comment 12 desrt 2013-02-09 00:19:16 UTC
I'm trying to roll my own recursive mutex.
Comment 13 Rich Felker 2013-02-09 00:40:09 UTC
In that case, you can always do the caching yourself:

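#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

int my_caching_gettid(void)
{
    static _Thread_local int tid;   /* zero in each new thread */
    return tid ? tid : (tid = (int) syscall(SYS_gettid));
}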

This might be mildly slower than having glibc do it just because of which TLS model gets used; whether that matters would require some measurement, I think.
Comment 14 desrt 2013-02-09 00:41:43 UTC
I did the benchmarking on various tricks to get around this... unfortunately, TLS from shared libraries is quite slow.
Comment 15 Michael Kerrisk 2013-02-09 02:06:11 UTC
(In reply to comment #11)

> With that said, most of the problems with caching pid/tid come from use of
> clone() (or worse, vfork) directly by applications, 

Not just that. Also, the caching of PIDs forced some limitations on how PID namespaces could be implemented, AFAIK.

> Anyway, I'm pretty indifferent on tid caching -- I don't see it as necessary,
> but I don't think it would be a problem, either.

Given the history, I'd say caution is the best approach--i.e., don't cache.
Comment 16 Rich Felker 2013-02-09 02:38:55 UTC
I don't see how caching could have any effect on namespaces. Any application is able (and entitled) to store its own pid and assume that it remains constant for the lifetime of the process. Whether this happens in application-level code or libc-level code is rather irrelevant.
Comment 17 Michael Kerrisk 2013-02-09 02:53:06 UTC
(In reply to comment #16)
> I don't see how caching could have any effect on namespaces. Any application is
> able (and entitled) to store its own pid and assume that it remains constant for
> the lifetime of the process. Whether this happens in application-level code or
> libc-level code is rather irrelevant.

So, a possible implementation of PID namespaces would have allowed setns() to change the caller's PID namespace, which in effect would change the caller's PID. Of course, this is not done. Instead, setns() into a PID namespace only changes the PID namespace of children subsequently created by the caller. 

One of the cited reasons that setns() didn't change the PID namespace of the caller is because glibc caches PIDs, and the result of getpid() would thus no longer be correct. 

Now, you could say that the issue equally affects the application itself, but there is a difference: if an application calls setns(), then it would know (in that alternative implementation model) that its PID was about to change and that any PID that *it* had cached was now invalid.
Comment 18 Rich Felker 2013-02-09 03:24:31 UTC
I still maintain that it's a broken design for setns to change the caller's pid. A pid is a fundamentally invariant property of a process. Even if the _application_ knew its pid would change as a result of calling setns, it could be linked to any number of non-libc libraries which are entitled to make the assumption that pids are an invariant property of the process. If the pid were to change when setns is called, then the only valid action after setns should be calling an exec-family function or _exit.
Comment 19 Michael Kerrisk 2013-02-09 23:04:20 UTC
(In reply to comment #18)
> I still maintain that it's a broken design for setns to change the caller's
> pid. A pid is a fundamentally invariant property of a process.

Rich, 

I think what you mean is: "this is the way it's always been done". But this was not handed to us on stone tablets. Linux has already changed a lot of old assumptions in favor of useful innovations. We can argue endlessly about whether or not the alternative that I talked about is a broken design. I'm actually fairly agnostic on that point, but my bottom line is that glibc effectively imposed policy on user space (i.e., "PIDs are invariant"), and I think that was A Bad Idea (TM) for a library that provides the fundamental plumbing from user space to the kernel.
Comment 20 Justin Lebar 2013-09-21 00:14:59 UTC
> the syscalls that are already exposed by glibc and that (can) make use of 
> kernel thread IDs are at least the following:

FWIW {get,set}priority() also take tids on Linux. It's very hard to use these correctly in a multithreaded environment, but it's basically impossible to do so without gettid(). :)
Comment 21 Ondrej Bilka 2013-10-09 19:31:59 UTC
*** Bug 14300 has been marked as a duplicate of this bug. ***
Comment 22 Carlos O'Donell 2014-01-10 20:44:58 UTC
This issue was raised again recently and I'm not opposed to it, but we would need to very carefully describe what "tid" means, e.g. task id (not thread id), and enhance the documentation to describe which API functions accept tids.

To be clear, the next step consists of two things:
* Add documentation for gettid: describe what a tid *is*, its properties, and the fact that it is specific to Linux. Find all other functions that also accept tids and enhance their descriptions (or provide stubs) to say that they accept tids on Linux.
* Add the gettid wrapper.
Comment 23 Rich Felker 2014-01-10 20:56:24 UTC
I'm unclear on what you mean by clarifying that it's not a thread id. Is this just a matter of distinguishing it from pthread_t? Or are you hesitant to establish a permanent one-to-one correlation between threads from the application's perspective and threads from the kernel's perspective? The idea of implementing POSIX threads with an M:N threading approach is dead and buried as far as I can tell; it's largely incompatible with POSIX semantics for scheduling and blocking syscalls, unless you add a huge userspace emulation layer even uglier than what LinuxThreads was. This is not an implementation flaw in glibc or Linux but a fundamental limitation.

I think it's perfectly reasonable to add gettid, documenting it as Linux-specific, and documenting that each thread (in the POSIX or C11 sense) has a corresponding kernelspace identifier, in the form of a 32-bit positive signed integer, used for certain Linux-specific features. In addition, since this identifier is guaranteed to be unique per thread, it may be used by applications implementing their own synchronization mechanisms via atomics and futex. (Note that pthread_t is not useful for this purpose, since it may be 64-bit, whereas futex only supports 32-bit values.)
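To make the atomics point concrete (and the recursive-mutex use case from comment 12), here is a minimal sketch of a recursive lock whose owner field holds the owner's kernel TID, assuming C11 atomics on Linux. It spins with sched_yield() rather than sleeping on a futex, so it is illustrative only; rec_lock, rec_unlock and the demo helpers are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical recursive lock whose owner is identified by kernel TID.
   Waiters just yield; a real implementation would sleep on a futex. */
struct rec_lock {
    atomic_int owner;   /* 0 = unlocked, otherwise the owner's TID */
    int depth;          /* recursion depth; only the owner touches it */
};

static int current_tid(void)
{
    return (int) syscall(SYS_gettid);
}

static void rec_lock(struct rec_lock *l)
{
    int self = current_tid();

    /* Only this thread ever stores its own TID, so seeing it here means
       we already hold the lock and can simply nest. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self) {
        l->depth++;
        return;
    }

    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&l->owner, &expected, self,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;   /* a failed CAS stored the current owner here */
        sched_yield();
    }
    l->depth = 1;
}

static void rec_unlock(struct rec_lock *l)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == current_tid());
    if (--l->depth == 0)
        atomic_store_explicit(&l->owner, 0, memory_order_release);
}

/* Usage sketch: threads bump a shared counter while holding the lock twice. */
static struct rec_lock demo_lock;   /* static storage: starts unlocked */
static long demo_counter;

static void *demo_worker(void *arg)
{
    int iters = *(const int *) arg;
    for (int i = 0; i < iters; i++) {
        rec_lock(&demo_lock);
        rec_lock(&demo_lock);       /* re-entry by the owning thread */
        demo_counter++;
        rec_unlock(&demo_lock);
        rec_unlock(&demo_lock);
    }
    return NULL;
}

static long run_demo(int nthreads, int iters)
{
    pthread_t threads[8];
    assert(nthreads > 0 && nthreads <= 8);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

The TID works as an owner tag here because it fits in the 32-bit word a futex would operate on and is unique among the live threads of the process.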
Comment 24 Michael Kerrisk 2014-01-10 23:52:51 UTC
(In reply to Carlos O'Donell from comment #22)
> This issue was raised again recently and I'm not opposed to it, but we would
> need to very carefully describe what "tid" means e.g. task id (not thread)
> and enhance the documentation to describe which API functions accept tids.
> 
> To be clear, the next step consists of two things:
> * Add documentation for gettid: describe what a tid *is*, its
> properties, and the fact that it is specific to Linux. Find all other
> functions that also accept tids and enhance their descriptions (or provide
> stubs) to say that they accept tids on Linux.
> * Add the gettid wrapper.

These are the APIs that I know of that expose or use Linux kernel thread IDs (what you call "tid"):

clone()
gettid()
fcntl() F_SETOWN, F_GETOWN, F_GETOWN_EX, F_SETOWN_EX
get_robust_list()
set_robust_list()
perf_event_open()
sched_getaffinity()
sched_setaffinity()
timer_create()
sched_getparam() 
sched_setparam()
sched_getscheduler()
sched_setscheduler()
ioprio_set()
ioprio_get()
tgkill()
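As an illustration of the F_SETOWN_EX entry in the list above, the sketch below directs I/O signals for a file descriptor to the calling thread by its TID. It is a minimal sketch assuming Linux, glibc and _GNU_SOURCE; the helper name is hypothetical, while struct f_owner_ex and F_OWNER_TID are the existing Linux definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Direct I/O-readiness signals for fd (SIGIO, or whatever F_SETSIG selects)
   to the calling thread only, identified by its kernel TID.
   Returns 0 on success, -1 on error. */
static int own_fd_by_calling_thread(int fd)
{
    struct f_owner_ex owner = {
        .type = F_OWNER_TID,                   /* one thread, not a process */
        .pid  = (pid_t) syscall(SYS_gettid),   /* a TID, despite the field name */
    };
    return fcntl(fd, F_SETOWN_EX, &owner) == 0 ? 0 : -1;
}
```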
Comment 25 Nicholas Miell 2014-01-11 00:17:43 UTC
Should gettid() actually be exposed to userspace? My suggestion in bug 14300 was the introduction of a pid_t "pthread_gettid_np(pthread_t *thr)" (which, naturally, could take pthread_self() as an argument).

Leaving gettid() hidden and only exposing the pthread function would act as an implicit hint that you really should be using the pthread functions where possible.

As such, acknowledging that the following functions take tids would be a mistake:

sched_getaffinity()
sched_setaffinity()
sched_getparam() 
sched_setparam()
sched_getscheduler()
sched_setscheduler()

because the following functions already exist:

pthread_getaffinity_np() pthread_attr_getaffinity_np()
pthread_setaffinity_np() pthread_attr_setaffinity_np()
pthread_getschedparam() pthread_attr_getschedparam()
pthread_setschedparam() pthread_attr_setschedparam()
pthread_attr_getschedpolicy()
pthread_attr_setschedpolicy()

And ideally the other functions like fcntl() or timer_create() would get small wrappers that transparently convert a pthread_t to a tid before invoking the system call.
Comment 26 Carlos O'Donell 2014-01-11 03:03:18 UTC
(In reply to Rich Felker from comment #23)
> I'm unclear on what you mean by clarifying that it's not a thread id. Is
> this just a matter of distinguishing it from pthread_t? Or are you hesitant
> to establish a permanent one-to-one correlation between threads from the
> application's perspective and threads from the kernel's perspective? The
> idea of implementing POSIX threads with an M:N threading approach is dead
> and buried as far as I can tell; it's largely incompatible with POSIX
> semantics for scheduling and blocking syscalls, unless you add a huge
> userspace emulation layer even uglier than what LinuxThreads was. This is
> not an implementation flaw in glibc or Linux but a fundamental limitation.

The tid as we call it in userspace is actually a kernel pid. I don't know if we want to talk about it at this level, but it is what it is, and many interfaces that take a pid will also accept a tid because they are the same thing. In practice the pid or tid are just identifiers of the kernel task that you are trying to manipulate. All that I am looking for is a consistent use of terminology at a particular level of abstraction. If we decide that "gettid" returns a "Thread id" then that IMO can only be used with interfaces that are documented as accepting a "thread id." I want there to be no ambiguity between pid and tid and pthread_t, and lines should be drawn between them.

Does that make sense?
Comment 27 Michael Kerrisk 2014-01-11 03:41:13 UTC
(In reply to Nicholas Miell from comment #25)
> Should gettid() actually be exposed to userspace? My suggestion in bug 14300
> was the introduction of a pid_t "pthread_gettid_np(pthread_t *thr)" (which,
> naturally, could take pthread_self() as an argument).
> 
> Leaving gettid() hidden and only exposing the pthread function would act as
> an implicit hint that you really should be using the pthread functions where
> possible.
> 
> As such, acknowledging that the following functions take tids would be a
> mistake:
> 
> sched_getaffinity()
> sched_setaffinity()
> sched_getparam() 
> sched_setparam()
> sched_getscheduler()
> sched_setscheduler()
> 
> because the following functions already exist:
> 
> pthread_getaffinity_np() pthread_attr_getaffinity_np()
> pthread_setaffinity_np() pthread_attr_setaffinity_np()
> pthread_getschedparam() pthread_attr_getschedparam()
> pthread_setschedparam() pthread_attr_setschedparam()
> pthread_attr_getschedpolicy()
> pthread_attr_setschedpolicy()
> 
> And ideally the other functions like fcntl() or timer_create() would get
> small wrappers that transparently convert a pthread_t to a tid before
> invoking the system call.

This makes no sense. System calls and the pthreads API are not the same, just as kernel TIDs and Pthreads IDs are not the same. There are legitimate uses of the system calls in applications that want nothing to do with Pthreads. It happens that there is a one-to-one correspondence between kernel TIDs and Pthreads IDs, but that is a side effect of the NPTL implementation. Any attempt to force the two IDs to be conflated in the API would be a mistake.
Comment 28 Nicholas Miell 2014-01-11 04:09:11 UTC
If the application wants to implement their own threading library, they can implement their own gettid() wrapper.

If the application isn't implementing their own threading library, then the application is using pthreads, which operates on pthread_t, not TIDs.

If the application isn't implementing their own threading library and isn't using pthreads, then it has no reason to ever use TIDs because it only has the one TID, which is also the PID.
Comment 29 Justin Lebar 2014-01-11 07:30:12 UTC
> If the application wants to implement their own threading library, they can implement their own gettid() wrapper.

They /can/, but are we confusing "can" and "should"?

Maybe my opinion isn't helpful here, but every big system I've worked on (gecko, google), and they all have not one but
Comment 30 Justin Lebar 2014-01-11 07:38:14 UTC
(I should have known better than to try to edit Bugzilla on my phone.)

Anyway, I contend that things aren't as black and white as they seem, and even if you don't think they should, the fact remains that systems do make this syscall.  I think it's a library's job to enable that.
Comment 31 Nicholas Miell 2014-01-11 07:41:08 UTC
(In reply to Justin Lebar from comment #29)
> They /can/, but are we confusing "can" and "should"?

That's my point. Userspace should be using pthreads. Pthreads operates on pthread_t, not TIDs. TIDs are an internal kernel implementation detail that have unfortunately leaked to userspace in a couple of instances (SIGEV_THREAD_ID, F_SETOWN_EX), and this leakage should be reversed, not expanded.

The userspace interface exposed by glibc doesn't necessarily match the syscall interface (cf. pipe, cpu_set_t, etc.), and there's no reason to increase the mess here.
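For reference, this is roughly what using SIGEV_THREAD_ID looks like: the caller has to supply a raw kernel TID. A minimal sketch assuming Linux and glibc; older glibc headers may not define the sigev_notify_thread_id name, hence the fallback to the _sigev_un._tid member, and older glibc versions may also need -lrt for timer_create(). The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Older glibc headers only expose the target TID through this union member. */
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

/* Arm a one-shot timer whose signal goes to the calling thread (by kernel
   TID), then wait for it. delay_ns must be between 1 and 999999999.
   Returns the si_code of the received signal, or -1 on error. */
static int wait_for_thread_timer(int sig, long delay_ns)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, sig);
    /* Keep the signal blocked so it stays pending until sigwaitinfo(). */
    if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD_ID;   /* Linux extension */
    sev.sigev_signo = sig;
    sev.sigev_notify_thread_id = (pid_t) syscall(SYS_gettid);

    timer_t timer;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = delay_ns;      /* it_interval stays 0: one-shot */

    siginfo_t info;
    int result = -1;
    if (timer_settime(timer, 0, &its, NULL) == 0 && sigwaitinfo(&mask, &info) == sig)
        result = info.si_code;
    timer_delete(timer);
    return result;
}
```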
Comment 32 Gabriele Svelto 2014-01-11 09:16:18 UTC
(In reply to Nicholas Miell from comment #28)
> If the application isn't implementing their own threading library, then the
> application is using pthreads, which operates on pthread_t, not TIDs.

The issue here is that the pthread API doesn't support some IMHO basic functionality that is available when manipulating TIDs directly. To give a concrete example, on Linux you can't adjust the priority of a POSIX thread with pthread_setschedparam()/pthread_setschedprio() unless it's running on a non-standard scheduler (i.e. anything but SCHED_OTHER). In practice this means that on a regular system without special permissions you simply can't adjust thread priorities relative to each other via the pthread interfaces.

However, every process can increase its nice value without special permissions, so, since a PID and a TID are the same thing, we work around this problem in Gecko by retrieving each thread's TID with gettid() and then adjusting its nice value via setpriority() (which accepts a PID).

While this is an ugly hack it works fine and offers a bit of functionality that is currently not available via the pthread interfaces while being well supported by the Linux kernel. This is unfortunate as we're basically keeping around a non-POSIX code path in addition to a regular POSIX path using pthread_setschedparam() which works fine on BSDs for example.
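The workaround described above can be sketched as follows, assuming Linux, where the nice value is a per-thread attribute and raising it needs no privilege. This is a minimal sketch; the function and struct names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raise the nice value of the calling thread only, by passing its kernel TID
   where setpriority() nominally expects a PID. On success stores the new
   value in *new_nice and returns 0; returns -1 on error. */
static int renice_calling_thread(int increment, int *new_nice)
{
    pid_t tid = (pid_t) syscall(SYS_gettid);

    errno = 0;
    int old = getpriority(PRIO_PROCESS, tid);   /* -1 can be a valid value */
    if (old == -1 && errno != 0)
        return -1;

    int target = old + increment > 19 ? 19 : old + increment;
    if (setpriority(PRIO_PROCESS, tid, target) != 0)
        return -1;

    errno = 0;
    *new_nice = getpriority(PRIO_PROCESS, tid);
    return (*new_nice == -1 && errno != 0) ? -1 : 0;
}

struct renice_result { int status; int new_nice; };

/* Thread body for the usage sketch: renice this thread by +1. */
static void *renice_worker(void *arg)
{
    struct renice_result *r = arg;
    r->status = renice_calling_thread(1, &r->new_nice);
    return NULL;
}
```

A new thread inherits its creator's nice value, so after a worker runs renice_worker, only that worker's value has changed; the initial thread (whose TID equals the PID) keeps its own.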
Comment 33 Nicholas Miell 2014-01-11 09:43:13 UTC
Your complaint as I understand it is "pthreads on Linux doesn't let me renice individual threads, so instead of fixing that, gettid() should be exposed to userspace", which doesn't make much sense. :)
Comment 34 Gabriele Svelto 2014-01-11 10:07:40 UTC
(In reply to Nicholas Miell from comment #33)
> Your complaint as I understand it is "pthreads on Linux doesn't let me
> renice individual threads, so instead of fixing that, gettid() should be
> exposed to userspace", which doesn't make much sense. :)

I was just giving a practical example of where we're using gettid() and why. Of course in our case the best solution would be to have pthreads scheduling primitives work properly with SCHED_OTHER but since that doesn't sound like a realistic option in the short run not having to wrap the SYS_gettid syscall ourselves would be already an improvement.

A brief search of SYS_gettid shows quite a few pieces of code wrapping it already in their own gettid() so it might be time to provide this function directly in libc:

http://code.google.com/query/#q=SYS_gettid

I found other, different uses in Gecko too: once the resulting value is used with tgkill() and once as the 'nl_pid' field of a netlink socket address.
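For the tgkill() use mentioned above, here is a minimal sketch of directing a signal at one specific thread, assuming Linux and glibc; all names are hypothetical. The worker publishes its TID, and the creator passes (PID, TID) to the raw tgkill system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Shared state for the sketch: the worker publishes its TID, then waits. */
static pid_t worker_tid;
static sem_t tid_ready;
static int worker_got;              /* signal number the worker received */

static void *signal_worker(void *unused)
{
    (void) unused;
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);      /* already blocked: inherited from creator */

    worker_tid = (pid_t) syscall(SYS_gettid);
    sem_post(&tid_ready);

    int sig = 0;
    if (sigwait(&mask, &sig) == 0)
        worker_got = sig;
    return NULL;
}

/* Send SIGUSR1 to exactly one thread, identified by (PID, TID), via the raw
   tgkill system call. Returns the signal the worker saw, or -1 on error. */
static int signal_one_thread(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    /* Block before creating the worker so it inherits the block and the
       default action (terminate) can never run. */
    if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    pthread_t t;
    worker_got = -1;
    sem_init(&tid_ready, 0, 0);
    if (pthread_create(&t, NULL, signal_worker, NULL) != 0)
        return -1;
    while (sem_wait(&tid_ready) != 0)
        ;                           /* retry if interrupted */

    if (syscall(SYS_tgkill, getpid(), worker_tid, SIGUSR1) != 0)
        pthread_cancel(t);          /* sigwait() is a cancellation point */
    pthread_join(t, NULL);
    sem_destroy(&tid_ready);
    return worker_got;
}
```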
Comment 35 Rich Felker 2014-01-11 15:17:36 UTC
I have a proposal for making pthread_setschedparam work with SCHED_OTHER: on error, determine if the failure is lack of kernel support for priorities with SCHED_OTHER, and use the legacy setpriority syscall instead.
Comment 36 Michael Kerrisk 2014-01-11 19:48:35 UTC
(In reply to Nicholas Miell from comment #31)
> (In reply to Justin Lebar from comment #29)
> Userspace should be using pthreads. 

Why? Why should glibc be imposing policy on how user space makes use of Linux kernel features? 

Glibc (rightly) exposes many non-POSIX features of the Linux kernel. gettid() is just one more such case. By all means add pthreads wrappers for these various other pieces, if someone wants them. But don't require everyone to use the pthreads API in order to access TIDs.
Comment 37 Rich Felker 2014-01-11 20:01:01 UTC
glibc already requires applications to use the pthreads API to use threads. Attempting to "roll your own" with clone will result in random but serious failures due to glibc's assumption that the thread pointer is valid and that it can find the values it expects in the TCB. I don't think this was ever intended as "imposing policy" but rather just a consequence of the fact that it's HARD to support applications doing things behind the implementation's back.

I'm still a bit undecided as to whether exposing gettid is a good idea, but I don't think avoiding imposing policy about bypassing pthreads is a good argument either way.
Comment 38 Carlos O'Donell 2014-01-12 17:05:45 UTC
(In reply to Rich Felker from comment #37)
> glibc already requires applications to use the pthreads API to use threads.
> Attempting to "roll your own" with clone will result in random but serious
> failures due to glibc's assumption that the thread pointer is valid and that
> it can find the values it expects in the TCB. I don't think this was ever
> intended as "imposing policy" but rather just a consequence of the fact that
> it's HARD to support applications doing things behind the implementation's
> back.
> 
> I'm still a bit undecided as to whether exposing gettid is a good idea, but
> I don't think avoiding imposing policy about bypassing pthreads is a good
> argument either way.

Agreed. My guiding principle here is that we need to think the change through and document the changes. I'd like to see interested parties create a glibc wiki page with the design. That way others can add use cases and wrinkles that should get ironed out like the scheduler functions which are POSIX and don't work like POSIX intended.
Comment 40 Florian Weimer 2015-10-07 08:02:30 UTC
We are committed to a 1:1 threading model because userspace code manipulates task attributes such as CPU affinity or capabilities, and all kinds of things will break if we start switching userspace threads to different (still userspace, obviously) kernel tasks.  (Restoring all those attributes on context switch is not possible for performance reasons.)

This means that Ulrich's objection to adding gettid is no longer valid.
Comment 41 Carlos O'Donell 2015-10-07 14:05:58 UTC
(In reply to Florian Weimer from comment #40)
> We are committed to a 1:1 threading model because userspace code manipulates
> task attributes such as CPU affinity or capabilities, and all kinds of
> things will break if we start switching userspace threads to different
> (still userspace, obviously) kernel tasks.  (Restoring all those attributes
> on context switch is not possible for performance reasons.)
> 
> This means that Ulrich's objection to adding gettid is no longer valid.

I've talked about this a bit before, and I'll mention it again for good measure.

The biggest problem is that the design in Linux for this stinks. The reuse of pid_t as a type for tids is confusing IMO and was a design flaw.

Therefore, if we are to carry this out, we need to think thoroughly about what it means to expose a tid: should we use a distinct type, and what is that type going to look like? Should users be able to convert from pthread_t to tid_t and vice versa? Do we want new interfaces that take tid_t types for all those things you'd do with native threads, or should the interfaces that take pid_t document their support for being called with tid_t values?

Either way, someone needs to start with a coherent design for what this is going to look like.
Comment 42 Michael Kerrisk 2015-10-09 12:38:50 UTC
(In reply to Carlos O'Donell from comment #41)
> (In reply to Florian Weimer from comment #40)
> > We are committed to a 1:1 threading model because userspace code manipulates
> > task attributes such as CPU affinity or capabilities, and all kinds of
> > things will break if we start switching userspace threads to different
> > (still userspace, obviously) kernel tasks.  (Restoring all those attributes
> > on context switch is not possible for performance reasons.)
> > 
> > This means that Ulrich's objection to adding gettid is no longer valid.
> 
> I've talked about this a bit before, and I'll mention it again for good
> measure.
> 
> The biggest problem is that the design in Linux for this stinks. The reuse
> of pid_t as a type for tids is confusing IMO and was a design flaw.
>
> Therefore if we are to carry this out we need to thoroughly think about what
> it means to expose a tid, should we use a distinct type, what is that type
> going to look like? Should users be able to convert from pthread_t to
> tid_t and vice versa? Do we want new interfaces that take tid_t types for
> all those things you'd do with native threads or should the interfaces that
> take pid_t document their support for being called with tid_t values?

I'm not so convinced about this. 

1. At the kernel level, we have APIs that operate on PIDs (e.g., kill()) and APIs that operate on threads (e.g., rt_tgsigqueueinfo()).

2. In kernel thread groups, the so-called thread-group leader is the one whose PID is the same as its thread ID. That is, these two values are exactly the same. In APIs that operate on threads, specifying the "PID" means operating on the thread-group leader.

3. There is plenty of precedent for the preceding model. We have the session-process group-process hierarchy. Session IDs, process group IDs, and process IDs are all represented using the same type: pid_t. And we have the corresponding notion that a session leader has SID == PID and that a process group leader has PGID == PID.

At the kernel level, there is really only one kind of kernel scheduling entity (KSE), commonly called a "task" in Linux parlance. And that one kind of KSE is identified by one kind of data type. Creating an artificial distinction at the glibc level seems illogical and confusing. Furthermore, the clone(2) system call, which creates kernel "threads", returns a thread ID. But really, this is the same for processes: clone() is equally the creator of "processes". And of course, glibc itself already assumes that TIDs and PIDs are the same thing, since nowadays glibc's fork() is a wrapper around clone(), and that wrapper assumes that clone() returns a PID.

In short, I'd say that everything should be a pid_t.

> Either way, someone needs to start with a coherent design for what this is
> going to look like.

The only sane thing to do, AFAICS, is to use pid_t. And on that assumption, much of the design work is greatly simplified.

And yes, for Pthreads applications that want to exploit nonportable Linux pieces, it would be nice to have functions that translate pthread_t <==> kernel-TID.
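The thread-group-leader relationship from point 2 above can be checked directly. A minimal sketch assuming Linux and glibc (names are hypothetical): in the initial thread the TID equals the PID, while a second thread reports the same getpid() but a different TID.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;   /* getpid(): the thread-group (process) ID */
    pid_t tid;   /* kernel thread ID of the calling thread */
};

/* Record both IDs as seen from the calling thread. */
static void record_ids(struct ids *out)
{
    out->pid = getpid();
    out->tid = (pid_t) syscall(SYS_gettid);
}

static void *record_ids_thread(void *arg)
{
    record_ids(arg);
    return NULL;
}
```

Calling record_ids() from main() and from a thread started with record_ids_thread shows the leader's TID equal to the shared PID and the second thread's TID distinct from it.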
Comment 43 Carlos O'Donell 2015-10-09 18:52:14 UTC
(In reply to Michael Kerrisk from comment #42)
> The only sane thing to do, AFAICS, is to use pid_t. And on that assumption,
> much of the design work is greatly simplified.

It sounds like you have a design in mind, please by all means start a wiki page to document the design. You need not implement it yourself, but if we can get consensus across the board and document which functions need implementing or documenting, then that's a win, and myself or others can implement it. Right now we have no wiki page and not even a strawman design solution.
Comment 44 Nicholas Miell 2015-10-09 19:01:51 UTC
Well, if we're going to be making design proposals:
 
PIDs (and TIDs) are inherently racy and every API that uses them is broken by design. No new APIs that use pid_t should be created, and all existing ones should be deprecated.

They should be completely replaced by file descriptors obtained either from clone()'s return value (since fork() can't take flags) or by opening /proc/$PID (at which point you can safely inspect the process's attributes to verify you have a handle to the thing you wanted).
Comment 45 Michael Kerrisk 2015-10-09 19:16:23 UTC
(In reply to Nicholas Miell from comment #44)
> Well, if we're going to be making design proposals:
>  
> PIDs (and TIDs) are inherently racy and every API that uses them is broken
> by design. No new APIs that use pid_t should be created, and all existing ones should
> be deprecated.
> 
> They should be completely replaced by file descriptors obtained either from
> clone()'s return value (since fork() can't take flags) or by opening
> /proc/$PID (at which point you can safely inspect the process's attributes
> to verify you have a handle to the thing you wanted).

Nicholas,

I don't disagree with you, but you speak of an ideal world that does not (yet) exist, and proposing to deprecate all of the existing APIs won't fly. Until that ideal world exists, we need a solution to the current problems that applications face. One day, assuming that Josh Triplett's clonefd() work hits Linux mainline, we'll be closer to the ideal.

Thanks,

Michael
Comment 46 Rich Felker 2015-10-09 19:41:53 UTC
In response to comment 44, there is nothing racy about tids as long as they are only used from within the process they belong to. A tid's lifetime cannot asynchronously end; only pthread_exit (or SYS_exit at the syscall level) can end it, and this is fully under application control. Process ids, on the other hand, are racy.

In response to comment 42, from a glibc standpoint that's pretty much all Linux implementation details. There is no reason for portable applications that want to use thread-directed sigevent delivery, etc. to care that tids happen to be allocated in a common namespace with pids, that the initial thread's tid is equal to its pid, etc. Adopting these conventions as part of a public interface is an option, but one which potentially constrains future directions, and the pros and cons should be weighed.
Comment 47 Carlos O'Donell 2016-04-08 14:31:26 UTC
Note that C++ is considering coroutines, and if implemented in C they would represent a case where we could conceptually have multiple execution contexts within the same OS thread. Exposing the OS tid would further complicate programming using these execution contexts, since the tids would be the same. If we stay with pthread_t, we can at least have a pthread_t per execution context under the hood, which might allow some threading operations to be shared between the various execution abstractions. I understand this objection is not a fully formed idea, but I wanted to put my recent thoughts down on this issue.

If anything needs fixing it's the interfaces that take a tid. We need SIGEV_PTHREAD_ID, we need new functions to set CPU affinity for threads (Rich mentions he is looking at a fix), and fcntl options for threads.
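For comparison, glibc already offers a pthread_t-based affinity interface as a non-portable GNU extension, pthread_getaffinity_np() / pthread_setaffinity_np(). The sketch below reads the calling thread's mask through both the tid-based sched_getaffinity() and pthread_getaffinity_np(). It assumes glibc 2.30 or later for gettid() and a machine with no more than CPU_SETSIZE CPUs. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Returns 1 if the tid-based and pthread_t-based views of the calling
   thread's CPU affinity both succeed and report the same mask. */
int affinity_views_agree(void)
{
    cpu_set_t by_tid, by_pthread;

    CPU_ZERO(&by_tid);
    CPU_ZERO(&by_pthread);

    /* Kernel interface: identifies the thread by its kernel tid. */
    if (sched_getaffinity(gettid(), sizeof by_tid, &by_tid) != 0)
        return 0;

    /* GNU pthreads extension: identifies the thread by pthread_t. */
    if (pthread_getaffinity_np(pthread_self(), sizeof by_pthread, &by_pthread) != 0)
        return 0;

    return CPU_EQUAL(&by_tid, &by_pthread) ? 1 : 0;
}
```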
Comment 48 Florian Weimer 2016-04-08 14:39:46 UTC
(In reply to Carlos O'Donell from comment #47)
> Note that C++ is considering coroutines, and if implemented in C they would
> represent a case where we could conceptually have multiple execution
> contexts within the same OS thread. Exposing the OS tid would further
> complicate programming using these execution contexts since the tid's would
> be the same. If we stay with pthread_t, we can at least under the hood have
> a pthread_t per execution context that might be used to allow some threading
> operations to be shared between the various execution abstractions.

I'm afraid this argument isn't valid because we use %fs-relative (or %gs-relative) addressing, instead of loading the TCB address from %fs:0 and addressing relative to that.  As a result, a coroutine switch would have to copy the *entire* TCB as part of the context switch, which would not offer the performance characteristics people expect from coroutines.

We are committed to the kernel thread identity, so we might as well expose it to applications explicitly.

There is also a possibility that coroutines will be implemented as Python-style generators, where only select functions using a new syntactic element can perform context switches, and there aren't any stack switches involved at all or anything the lower-level run-time environment would notice.
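To make both positions concrete, here is a small C sketch (not from the thread) that runs one ucontext-based coroutine on the calling thread. It assumes glibc 2.30 or later for gettid() and a platform that still provides getcontext/makecontext/swapcontext; the names are illustrative. The coroutine sees the same kernel tid and the same address for a thread-local variable as its caller. That is Carlos's concern (contexts share one tid) and also Florian's point (TLS follows the kernel thread, not the execution context).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <ucontext.h>
#include <unistd.h>

/* Every context running on this kernel thread resolves this variable
   through the same thread pointer. */
static _Thread_local int tls_marker;

static ucontext_t caller_ctx, coro_ctx;
static char coro_stack[64 * 1024];

/* What the coroutine observed while it ran. */
static pid_t coro_tid;
static int *coro_tls_addr;

static void coro_body(void)
{
    coro_tid = gettid();
    coro_tls_addr = &tls_marker;
    /* Returning resumes caller_ctx through uc_link. */
}

/* Returns 1 if the coroutine saw the caller's tid and TLS block. */
int coroutine_shares_thread_identity(void)
{
    if (getcontext(&coro_ctx) == -1)
        return 0;
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &caller_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    if (swapcontext(&caller_ctx, &coro_ctx) == -1)
        return 0;

    return coro_tid == gettid() && coro_tls_addr == &tls_marker;
}
```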
Comment 49 Carlos O'Donell 2016-04-08 15:47:05 UTC
(In reply to Florian Weimer from comment #48)
> (In reply to Carlos O'Donell from comment #47)
> > Note that C++ is considering coroutines, and if implemented in C they would
> > represent a case where we could conceptually have multiple execution
> > contexts within the same OS thread. Exposing the OS tid would further
> > complicate programming using these execution contexts since the tid's would
> > be the same. If we stay with pthread_t, we can at least under the hood have
> > a pthread_t per execution context that might be used to allow some threading
> > operations to be shared between the various execution abstractions.
> 
> I'm afraid this argument isn't valid because we use %fs-relative (or
> %gs-relative) addressing, instead of loading the TCB from the %fs:0 and
> addressing relative to that.  As a result, a co-routine switch would have to
> copy the *entire* TCB as part of the context switch, which would not offer
> the performance characteristics people expect from coroutines.

This depends on the semantics of TLS access from coroutines. What is relevant today is that the decisions we make here may constrain the implementation of the features defined in the standard, or even the standard itself.

> We are committed to the kernel thread identity, so we might as well expose
> it to applications explicitly.

We should solve problems in ways that leave maximum degrees of flexibility for future implementations.

Just because it is easier to expose gettid() than to fix the routines that now expect tid, doesn't mean it's a good idea in the long term.
 
> There is also a possibility that coroutines will be implemented as
> Python-style generators, where only select functions using a new syntactic
> element can perform context switches, and there aren't any stack switches
> involved at all or anything the lower-level run-time environment would
> notice.

Right, generators would not need anything from the runtime, but they are only one possibility, and it may be that all such options are pursued by the standard.

Again, I think we should solve this problem while still retaining the maximum amount of flexibility for the future.
Comment 50 Florian Weimer 2016-04-12 13:34:44 UTC
OS/400 has something similar to gettid:

http://publib.boulder.ibm.com/iseries/v5r1/ic2924/index.htm?info/apis/users_22.htm

However, its IDs are actual IDs, unique across thread death, so arguably more useful than the temporary thread IDs we get from the Linux kernel.
Comment 51 Stas Sergeev 2017-02-03 09:43:17 UTC
(In reply to Carlos O'Donell from comment #43)
> It sounds like you have a design in mind, please by all means start a wiki

How about the following design proposal:
just provide the raw gettid wrapper via some
system-specific header (like sys/* or asm/* I guess)
and keep thinking of further proposals about
making things generic.
By not exporting the raw wrapper you only force
people to use syscall(SYS_gettid), which is not
good at all. In the meantime, a wrapper exported
via system-specific headers does not obligate you
to any design restrictions or even doc writing.
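For reference, the syscall(SYS_gettid) workaround mentioned here is usually wrapped in a tiny helper like the one below (the name raw_gettid is hypothetical). On Linux the main thread's tid equals the process ID, which gives a quick sanity check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invokes the Linux gettid system call directly; this was the only
   option before glibc provided a gettid() wrapper. */
pid_t raw_gettid(void)
{
    return (pid_t) syscall(SYS_gettid);
}
```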
Comment 52 Kevin Flynn 2019-01-26 16:00:00 UTC
After having stumbled upon this bug report while working on some code, I decided perhaps I should add my 2 cents to the discussion. It is a technical 2 cents, not a social science 2 cents.

The question I was seeking an answer to is simply:

"How can a thread determine if it holds a mutex lock?" The solution I settled upon is an "error checking mutex". However, I was also looking at gettid() and pthread_self(), and there seems to be missing functionality that the gettid() syscall provides.
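A sketch of the error-checking mutex approach using only POSIX calls; the names are illustrative. Such a mutex does not answer "do I hold it?" directly. Instead it reports misuse: a relock by the owner fails with EDEADLK, and an unlock by a non-owner fails with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t m;

static void *unlock_from_other_thread(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_unlock(&m);   /* not the owner: expect EPERM */
    return NULL;
}

/* Returns 1 if both ownership errors are reported as POSIX specifies. */
int errorcheck_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int relock_rc, foreign_unlock_rc = 0;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    relock_rc = pthread_mutex_lock(&m);    /* owner relocks: expect EDEADLK */

    if (pthread_create(&t, NULL, unlock_from_other_thread, &foreign_unlock_rc) != 0)
        return 0;
    pthread_join(t, NULL);

    pthread_mutex_unlock(&m);              /* the owner releases its single lock */
    pthread_mutex_destroy(&m);
    return relock_rc == EDEADLK && foreign_unlock_rc == EPERM;
}
```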

From what I could find in the documentation, pthread_self() does not define an invalid value constant, nor does it define the result if the main process or a single threaded process calls it. It appears to only be valid if called within the context of a thread created by pthread_create().

One can reasonably infer from the documentation of other functions that return a pid_t value, such as fork(), that getpid() and gettid() will never return a negative process ID or thread ID as a valid value, so any negative value could serve as an invalid value constant. ( #define INVALID_TID -1 )

Use of the gettid() system call seems to provide the following useful functions that posix threads functions do not:

1. a reliable invalid thread id constant ( any negative value )
2. a value that can distinguish threads from the main process

Cheers,
Kevin Flynn.
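The tid-based approach described in this comment could look roughly like the sketch below. It pairs a mutex with the holder's kernel tid and uses -1 as INVALID_TID. It assumes glibc 2.30 or later for gettid(); the struct and function names are hypothetical. A thread can only find its own tid in owner if it stored it there, because tids are unique among the live threads of a process. This holds as long as no thread exits while holding the lock, since its tid could later be reused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Sentinel from the comment above; valid tids are positive. */
#define INVALID_TID ((pid_t) -1)

/* Hypothetical wrapper pairing a mutex with the tid of its holder. */
struct owned_mutex {
    pthread_mutex_t lock;
    _Atomic pid_t owner;
};

void owned_mutex_init(struct owned_mutex *om)
{
    pthread_mutex_init(&om->lock, NULL);
    atomic_init(&om->owner, INVALID_TID);
}

void owned_mutex_lock(struct owned_mutex *om)
{
    pthread_mutex_lock(&om->lock);
    atomic_store(&om->owner, gettid());
}

void owned_mutex_unlock(struct owned_mutex *om)
{
    atomic_store(&om->owner, INVALID_TID);   /* clear before releasing */
    pthread_mutex_unlock(&om->lock);
}

/* Safe to call from any thread; only the holder gets 1. */
int owned_mutex_held_by_me(struct owned_mutex *om)
{
    return atomic_load(&om->owner) == gettid();
}
```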
Comment 53 Florian Weimer 2019-01-28 09:27:55 UTC
(In reply to Kevin Flynn from comment #52)
> From what I could find in the documentation, pthread_self() does not define
> an invalid value constant, nor does it define the result if the main process
> or a single threaded process calls it. It appears to only be valid if called
> within the context of a thread created by pthread_create().

That's not a problem: glibc only supports threads created by pthread_create.  Other threads will not have a valid TCB and depending on how glibc is compiled, literally no function will work properly in this case.
Comment 54 Kevin Flynn 2019-01-28 14:26:31 UTC
(In reply to Florian Weimer from comment #53)
> (In reply to Kevin Flynn from comment #52)
> > From what I could find in the documentation, pthread_self() does not define
> > an invalid value constant, nor does it define the result if the main process
> > or a single threaded process calls it. It appears to only be valid if called
> > within the context of a thread created by pthread_create().
> 
> That's not a problem: glibc only supports threads created by pthread_create.
> Other threads will not have a valid TCB and depending on how glibc is
> compiled, literally no function will work properly in this case.

You seem to have responded to supporting commentary, as opposed to the main point of my post. Also, you say it's not a problem because of some arbitrary human decision, as opposed to an attribute of the computing devices we write code for. I'm going to assume you didn't spend much time contemplating my post. Thanks.
Comment 55 Carlos O'Donell 2019-01-28 15:37:14 UTC
(In reply to Kevin Flynn from comment #54)
> (In reply to Florian Weimer from comment #53)
> > (In reply to Kevin Flynn from comment #52)
> > > From what I could find in the documentation, pthread_self() does not define
> > > an invalid value constant, nor does it define the result if the main process
> > > or a single threaded process calls it. It appears to only be valid if called
> > > within the context of a thread created by pthread_create().
> > 
> > That's not a problem: glibc only supports threads created by pthread_create.
> > Other threads will not have a valid TCB and depending on how glibc is
> > compiled, literally no function will work properly in this case.
> 
> You seem to have responded to supporting commentary, as opposed to the main
> point of my post. Also, you say, its not a problem, because of some
> arbitrary human decision, as opposed to, an attribute of the computing
> devices we write code for. I'm going to assume you didn't spend much time
> contemplating my post. Thanks.

I'm sorry you feel that way. We really do try to respond to the questions asked by those commenting on these issues.

You ask about two specific issues which, as Florian points out, are not problems if you are only using POSIX Threads or C threads from glibc to create and control threads.

1. a reliable invalid thread id constant ( any negative value )

- There is never an invalid thread id (pthread_t or otherwise). The caller of gettid / pthread_self is always validly executing when it calls these functions and will never get an invalid value. If you want to check for an invalid value you can start a detached non-joinable thread and use that thread's value to indicate "invalid" if you need that kind of constraint in your design.

2. a value that can distinguish threads from the main process

- All of the threads are part of the same "process"; the main (or first) thread is pretty close to a normal thread. We want you to treat it just like any other thread. For example, it can call pthread_exit() just like any other thread, leaving all the other threads running.

You also mention pthread_self() in the context of main or a single-threaded process:
~~~
It appears to only be valid if called within the context of a thread created by pthread_create().
~~~
Please see:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_self.html

POSIX places no restrictions on the API. You can use pthread_self() from main() in glibc, and it works and is designed to work that way. If you read something contradictory from 'man pthread_self', please feel free to send a patch to the linux man pages project to add a clarification.

The main() function is always executed within a thread of execution; that is, it has its own control flow, resources, etc., and so constitutes a thread.

Does that answer your questions?
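A short sketch of the point about main(): pthread_self() called from main() returns an ordinary thread handle, which other threads can compare against with pthread_equal(). The function and variable names are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

static pthread_t main_thread;

static void *is_main_thread(void *arg)
{
    int *result = arg;
    *result = pthread_equal(pthread_self(), main_thread) != 0;
    return NULL;
}

/* Intended to be called from main(). Returns 1 if main() has a usable
   pthread_t and a second thread compares unequal to it. */
int main_has_pthread_identity(void)
{
    pthread_t t;
    int worker_is_main = -1;

    main_thread = pthread_self();   /* valid in main(): main is a thread too */
    if (pthread_create(&t, NULL, is_main_thread, &worker_is_main) != 0)
        return 0;
    pthread_join(t, NULL);

    return pthread_equal(pthread_self(), main_thread) && worker_is_main == 0;
}
```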
Comment 56 Kevin Flynn 2019-01-28 17:42:14 UTC
(In reply to Carlos O'Donell from comment #55)
> (In reply to Kevin Flynn from comment #54)
> > (In reply to Florian Weimer from comment #53)
> > > (In reply to Kevin Flynn from comment #52)
> > > > From what I could find in the documentation, pthread_self() does not define
> > > > an invalid value constant, nor does it define the result if the main process
> > > > or a single threaded process calls it. It appears to only be valid if called
> > > > within the context of a thread created by pthread_create().
> > > 
> > > That's not a problem: glibc only supports threads created by pthread_create.
> > > Other threads will not have a valid TCB and depending on how glibc is
> > > compiled, literally no function will work properly in this case.
> > 
> > You seem to have responded to supporting commentary, as opposed to the main
> > point of my post. Also, you say, its not a problem, because of some
> > arbitrary human decision, as opposed to, an attribute of the computing
> > devices we write code for. I'm going to assume you didn't spend much time
> > contemplating my post. Thanks.
> 
> I'm sorry, you feel that way. We really do try to respond to the questions
> asked by those commenting on these issues.
> 
> You ask about two specific issues which, as Florian points out, are not
> problems  if you are only using POSIX Threads or C threads from glibc to
> create and control threads.
> 
> 1. a reliable invalid thread id constant ( any negative value )
> 
> - There is never an invalid thread id (pthread_t or otherwise). The caller
> of gettid / pthread_self is always validly executing when it calls these
> functions and will never get an invalid value. If you want to check for an
> invalid value you can start a detached non-joinable thread and use that
> thread's value to indicate "invalid" if you need that kind of constraint in
> your design.
> 
> 2. a value that can distinguish threads from the main process
> 
> - All of the threads are part of the main "process", the main thread, or the
> first thread is pretty close to a normal thread. We want you to treat it
> just like any other thread. For example it can call pthread_exit() just like
> a thread to leave all the other threads running.
> 
> You also mention pthread_self() in the context of main or a single-threaded
> process:
> ~~~
> It appears to only be valid if called within the context of a thread created
> by pthread_create().
> ~~~
> Please see:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_self.html
> 
> POSIX places no restrictions on the API. You can use pthread_self() from
> main() in glibc, and it works and is designed to work that way. If you read
> something contradictory from 'man pthread_self', please feel free to send a
> patch to the linux man pages project to add a clarification.
> 
> The main() function is always executed within a thread of execution, that is
> it has it's own control flow and resources etc. and so constitutes a thread.
> 
> Does that answer your questions?

No, it doesn't. I didn't ask any questions. You seem to suffer from the same reading comprehension problems as the other commenter.

Just delete my account. It's that pointless.
Comment 57 Adhemerval Zanella 2019-01-29 20:27:19 UTC
I also fail to see how your points were not addressed by Florian and Carlos. In any case, I would ask that we keep the tone professional and try to move forward (your last message goes in the opposite direction).
Comment 58 Stas Sergeev 2019-01-29 23:35:28 UTC
(In reply to Florian Weimer from comment #53)
> That's not a problem: glibc only supports threads created by pthread_create.
> Other threads will not have a valid TCB and depending on how glibc is
> compiled, literally no function will work properly in this case.

Right, and that's where gettid() is needed.
Consider alien code (wine, dosemu) that uses
%fs on its own. Then you get a signal (SIGSEGV) to go out,
but in a signal handler, as you say, "no function will work"
because on x86_64 Linux doesn't restore TLS before
calling into a sighandler. This is where gettid() comes in:
it allows you to restore TLS properly, or at least find
out where you came from.
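A rough sketch of the lookup half of this idea, assuming Linux. Each thread registers a record keyed by its kernel tid while its TLS is still valid. A signal handler then finds its own record using a raw gettid syscall and atomic loads. The table layout and names are hypothetical. The sketch also assumes that syscall() performs no TLS access on its success path, and that the file is built without a stack protector, which reads its canary through the thread pointer on x86_64. Actually reinstalling the thread pointer is architecture-specific and is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS  64
#define SLOT_FREE    0
#define SLOT_CLAIMED (-1)

/* Hypothetical per-thread record, e.g. the thread pointer to restore. */
struct thread_rec {
    _Atomic pid_t tid;   /* SLOT_FREE, SLOT_CLAIMED, or the owner's tid */
    void *saved_tls;
};

static struct thread_rec table[MAX_THREADS];
static volatile sig_atomic_t found_in_handler;

/* gettid cannot fail, so this raw syscall never writes errno. */
static pid_t raw_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

/* Call once per thread while its TLS is still the one glibc set up. */
int register_thread(void *saved_tls)
{
    pid_t self = raw_tid();
    for (size_t i = 0; i < MAX_THREADS; i++) {
        pid_t expected = SLOT_FREE;
        if (atomic_compare_exchange_strong(&table[i].tid, &expected, SLOT_CLAIMED)) {
            table[i].saved_tls = saved_tls;
            /* Publish the tid last, so a matching lookup also sees saved_tls. */
            atomic_store(&table[i].tid, self);
            return 0;
        }
    }
    return -1;
}

/* Assumed TLS-free: one syscall plus atomic loads on a static table. */
static struct thread_rec *lookup_self(void)
{
    pid_t self = raw_tid();
    for (size_t i = 0; i < MAX_THREADS; i++)
        if (atomic_load(&table[i].tid) == self)
            return &table[i];
    return NULL;
}

static void handler(int sig)
{
    (void) sig;
    struct thread_rec *rec = lookup_self();
    /* A real handler would reinstall rec->saved_tls here before using libc. */
    found_in_handler = (rec != NULL && rec->saved_tls != NULL);
}

int install_handler(int sig)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```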

(In reply to Carlos O'Donell from comment #55)
> I'm sorry, you feel that way. We really do try to respond to the questions
> asked by those commenting on these issues.

And yet there is zero reply to comment #51.
Not trying to complain, just pointing out that there
is a disconnect between the reality and what is being said.
Have you tried to make Linux restore TLS when calling
a signal handler? No, because it's difficult. It would
require a new API that would allow glibc to register in
the kernel the mapping between tids and TLS pointers.
We do not ask you to do that; we resort to work-arounds,
which is gettid(). What's the point of not exporting it in
any unportable way?
Well, add a new kernel API, make it restore TLS properly,
and you will reduce the need for gettid() by exactly 1
use-case (and 1000 to go).
Comment 59 Stas Sergeev 2019-01-30 00:04:38 UTC
Since when did glibc start deciding what
to export via include/linux/* and what to hide?
It should export everything, or what's the point of
include/linux/*?
Comment 60 jsm-csl@polyomino.org.uk 2019-01-30 02:47:56 UTC
On Tue, 29 Jan 2019, stsp at users dot sourceforge.net wrote:

> And yet there is zero reply to comment #51.

Bugzilla is not really the appropriate place for discussions of possible 
new features; it's best for things that are uncontroversially bugs with 
unambiguous criteria for telling whether they are fixed.  Feature 
discussions and consensus building take place primarily on libc-alpha.  
In this case, see the patches and discussion of gettid last month.
Comment 61 Sourceware Commits 2019-02-08 11:03:00 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92 (commit)
      from  f289e656ec8221756519a601042bc9fbe1b310fb (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92

commit 1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92
Author: Florian Weimer <fweimer@redhat.com>
Date:   Sat Feb 2 15:17:02 2019 +0100

    Linux: Add gettid system call wrapper [BZ #6399]
    
    This commit adds gettid to <unistd.h> on Linux, and not to the
    kernel-independent GNU API.
    
    gettid is now supportable on Linux because too many things assume a
    1:1 mapping between libpthread threads and kernel threads.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   70 +++++++++++
 NEWS                                               |    2 +
 manual/process.texi                                |   11 ++
 posix/Makefile                                     |    3 +-
 posix/bits/unistd_ext.h                            |   21 +++
 posix/unistd.h                                     |    3 +
 sysdeps/unix/sysv/linux/Makefile                   |    6 +-
 sysdeps/unix/sysv/linux/Versions                   |    3 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist       |    1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist         |    1 +
 sysdeps/unix/sysv/linux/arm/libc.abilist           |    1 +
 sysdeps/unix/sysv/linux/bits/unistd_ext.h          |   36 ++++++
 sysdeps/unix/sysv/linux/csky/libc.abilist          |    1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist          |    1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist          |    1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist          |    1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist |    1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist   |    1 +
 sysdeps/unix/sysv/linux/microblaze/libc.abilist    |    1 +
 .../unix/sysv/linux/mips/mips32/fpu/libc.abilist   |    1 +
 .../unix/sysv/linux/mips/mips32/nofpu/libc.abilist |    1 +
 .../unix/sysv/linux/mips/mips64/n32/libc.abilist   |    1 +
 .../unix/sysv/linux/mips/mips64/n64/libc.abilist   |    1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist         |    1 +
 .../sysv/linux/powerpc/powerpc32/fpu/libc.abilist  |    1 +
 .../linux/powerpc/powerpc32/nofpu/libc.abilist     |    1 +
 .../sysv/linux/powerpc/powerpc64/be/libc.abilist   |    1 +
 .../sysv/linux/powerpc/powerpc64/le/libc.abilist   |    1 +
 sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist    |    1 +
 sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist  |    1 +
 sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist  |    1 +
 sysdeps/unix/sysv/linux/sh/libc.abilist            |    1 +
 sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist |    1 +
 sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist |    1 +
 sysdeps/unix/sysv/linux/syscalls.list              |    1 +
 sysdeps/unix/sysv/linux/tst-gettid-kill.c          |  129 ++++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-gettid.c               |   79 ++++++++++++
 sysdeps/unix/sysv/linux/tst-setgetname.c           |    6 -
 sysdeps/unix/sysv/linux/x86_64/64/libc.abilist     |    1 +
 sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist    |    1 +
 40 files changed, 388 insertions(+), 9 deletions(-)
 create mode 100644 posix/bits/unistd_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/unistd_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-gettid-kill.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-gettid.c
Comment 62 Florian Weimer 2019-02-08 11:03:47 UTC
Fixed in glibc 2.30.
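With glibc 2.30 or later, gettid() is declared in <unistd.h> when _GNU_SOURCE is defined. The sketch below (helper names hypothetical) has a second thread record getpid() and gettid(), then compares those values with the main thread's.

```c
#define _GNU_SOURCE            /* gettid() is only declared for _GNU_SOURCE */
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;
    pid_t tid;
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = gettid();       /* glibc 2.30+ wrapper */
    return NULL;
}

/* Intended to be called from the main thread. Returns 1 if the worker
   shares the process ID but has its own tid, and the main thread's tid
   equals the process ID. */
int tid_layout_demo(void)
{
    struct ids worker = { 0, 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, record_ids, &worker) != 0)
        return 0;
    pthread_join(t, NULL);

    return gettid() == getpid()
        && worker.pid == getpid()
        && worker.tid > 0
        && worker.tid != getpid();
}
```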
Comment 63 Stas Sergeev 2019-02-08 13:08:53 UTC
(In reply to cvs-commit@gcc.gnu.org from comment #61)
>     gettid is now supportable on Linux because too many things assume a
>     1:1 mapping between libpthread threads and kernel threads.

People also have their own threading libraries,
that use clone(), gettid() and the rest of the
linux-specific API. And there are TLS problems
with sighandlers. And there is a need to export
any linux API to include/linux/* anyway.
So I don't think the stated reason is indeed
even remotely a primary reason, but thanks for
your work on this!
Comment 64 Sourceware Commits 2019-02-08 15:34:29 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  8f89ab216f205c2ffd90d1fc8454efdfc0b01dee (commit)
      from  144a794e0a1670dfc7a178637c7f35b5910c42ec (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8f89ab216f205c2ffd90d1fc8454efdfc0b01dee

commit 8f89ab216f205c2ffd90d1fc8454efdfc0b01dee
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Feb 8 16:33:00 2019 +0100

    posix: Fix missing wrapper header for <bits/unistd_ext.h>
    
    Fixes commit 1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92
    ("Linux: Add gettid system call wrapper [BZ #6399]").
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                 |    4 ++++
 include/bits/unistd_ext.h |    1 +
 2 files changed, 5 insertions(+), 0 deletions(-)
 create mode 100644 include/bits/unistd_ext.h
Comment 65 Rich Felker 2019-02-08 16:16:52 UTC
Regarding #63, "their own threading libraries" simply don't work with glibc. You can't safely call any glibc function, in a manner that's independent of the internals of a specific glibc version, from a thread created in some way that bypasses glibc. At the very least, the TCB and TLS will not be set up right for it to be safe to call these functions. So that is not a use case related to the addition of gettid to glibc.
Comment 66 Stas Sergeev 2019-02-08 16:25:29 UTC
(In reply to Rich Felker from comment #65)
> Regarding #63, "their own threading libraries" simply don't work with glibc.
> You can't safely call any glibc function, in a manner that's indepdendent of
> the internals of a specific glibc version, from a thread created in some way
> that bypasses glibc. At the very least, the TCB and TLS will not be setup
> right for it to be safe to call these functions.

"right" is a key word here. :)
I do not need to set them up "right", I just set TLS to the one
of the main thread always, and use gettid() to see where am I.
Of course pthread_self() is not supposed to be used under such
threads - I provide a replacement anyway.
Comment 67 Rich Felker 2019-02-08 16:36:32 UTC
That does not work. It will either cause deadlock or lack of locking (depending on whether a lock is non-recursive or recursive) in libc functions you call from the thread, since they will wrongly use the tid of the thread whose TLS you misappropriated (from its TCB).
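The recursive-mutex case can be observed with plain POSIX calls: the owning thread may lock again, while another thread is refused. Per the explanation above, glibc decides "same owner" using the tid cached in the calling thread's TCB, so a thread running on another thread's TCB would be mistaken for that owner. The sketch is illustrative and the names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t rm;

static void *trylock_from_other(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_trylock(&rm);    /* different thread: expect EBUSY */
    if (*rc == 0)
        pthread_mutex_unlock(&rm);
    return NULL;
}

/* Returns 1 if the owner can relock and a different thread cannot. */
int recursive_owner_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int second_lock_rc, other_rc = -1;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&rm, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&rm);
    second_lock_rc = pthread_mutex_lock(&rm);   /* same owner: count becomes 2 */

    if (pthread_create(&t, NULL, trylock_from_other, &other_rc) != 0)
        return 0;
    pthread_join(t, NULL);

    pthread_mutex_unlock(&rm);                  /* one unlock per successful lock */
    pthread_mutex_unlock(&rm);
    pthread_mutex_destroy(&rm);
    return second_lock_rc == 0 && other_rc == EBUSY;
}
```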
Comment 68 Stas Sergeev 2019-02-08 17:01:45 UTC
(In reply to Rich Felker from comment #67)
> That does not work. It will either cause deadlock or lack of locking
> (depending on whether a lock is non-recursive or recursive) in libc
> functions you call from the thread, since they will wrongly use the tid of
> the thread whose TLS you misappropriated (from its TCB).

This assumes preemptive multitasking, which mine does not
have yet. :) So far I use explicit calls to switch out, to avoid
such problems.
But there are various tricks I tried in the past in other
projects, which I suppose can work here too. For example, you
can load another instance of libc with dlmopen(), with lmid=LM_ID_NEWLM
and flags=RTLD_LOCAL|RTLD_DEEPBIND, and call that instance.
Such code currently exists in the user-union project. I haven't
tried it with threads yet, but I think it can work.
Comment 69 Sourceware Commits 2019-02-08 17:39:48 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  e47d82c99a6db060419b421768aced76bea92997 (commit)
      from  8f89ab216f205c2ffd90d1fc8454efdfc0b01dee (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e47d82c99a6db060419b421768aced76bea92997

commit e47d82c99a6db060419b421768aced76bea92997
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Feb 8 16:51:17 2019 +0100

    Provide <bits/unistd_ext.h> as a sysdeps header exclusively
    
    Non-sysdeps headers cannot be overriden by sysdeps headers across the
    entire build, so it is necessary to turn such extension headers into
    sysdeps headers themselves.  The approach here follows the existing
    <bits/shm.h> header (although it uses sysdeps/gnu instead of
    sysdeps/generic).
    
    Fixes commit 1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92 ("Linux: Add
    gettid system call wrapper [BZ #6399]") and commit
    8f89ab216f205c2ffd90d1fc8454efdfc0b01dee ("posix: Fix missing wrapper
    header for <bits/unistd_ext.h>").

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                    |    6 ++++++
 include/bits/unistd_ext.h                    |    1 -
 {posix => sysdeps/generic}/bits/unistd_ext.h |    0
 3 files changed, 6 insertions(+), 1 deletions(-)
 delete mode 100644 include/bits/unistd_ext.h
 rename {posix => sysdeps/generic}/bits/unistd_ext.h (100%)
Comment 70 Michael Kerrisk 2019-02-21 12:19:52 UTC
Thank you, Florian.