Bug 27880 - Please provide a pthread pid accessor
Summary: Please provide a pthread pid accessor
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: unspecified
Importance: P2 enhancement
Target Milestone: ---
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-18 01:19 UTC by H. Peter Anvin
Modified: 2024-02-06 22:30 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description H. Peter Anvin 2021-05-18 01:19:51 UTC
There is currently no way to access the kernel tid for another pthread, except via the very cumbersome thread_db interface. The intent, to provide interface isolation, is good, but sometimes such cross-layer interfaces (e.g. fileno()) at the very least serve a "harm reduction" function.

As it is, there are enough reasons access to a tid is necessary that programmers are employing multiple unsafe techniques:

1. Calling gettid() early in the thread main function and caching that value indefinitely;
2. Mixing pthreads and raw calls to clone();
3. Extracting or even hard-coding the offset for tid in struct pthread (almost never using any kind of hard version tying.)
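For illustration, technique 1 tends to look roughly like the sketch below. The names struct worker, worker_main and run_worker_and_get_cached_tid are hypothetical and exist only for this example; syscall(SYS_gettid) is used instead of gettid() so that it also builds against older glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread record kept by the application, not by libpthread. */
struct worker {
    pthread_t thread;
    _Atomic pid_t cached_tid;   /* 0 until the thread has published its tid */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    /* Technique 1: ask the kernel once, early, and cache the answer. */
    atomic_store(&w->cached_tid, (pid_t)syscall(SYS_gettid));
    return NULL;
}

/* Starts a worker, joins it, and returns the tid the worker cached.
   Returns -1 if the thread could not be created. */
pid_t run_worker_and_get_cached_tid(void)
{
    struct worker w;
    atomic_init(&w.cached_tid, 0);
    if (pthread_create(&w.thread, NULL, worker_main, &w) != 0)
        return -1;
    pthread_join(w.thread, NULL);
    /* The kernel thread no longer exists here, yet the cached value still
       looks like a perfectly valid tid. Nothing invalidates it. */
    return atomic_load(&w.cached_tid);
}
```

The value returned after the join is exactly the kind of stale tid that makes this technique unsafe: the kernel is free to hand the same number to an unrelated thread later.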

Having an interface like:

pid_t pthread_gettid_np(pthread_t thread)

... would at least allow the interface to be kept in sync with the current libpthread, and would provide a path to returning an error if no meaningful tid is available (e.g. in the case of M:N threads, or the underlying kernel thread no longer existing.)
Comment 1 H. Peter Anvin 2021-05-18 05:05:54 UTC
From what I can read, MacOS X seems to have a pthread_threadid_np(thread, &tid) interface for this purpose; on AIX one can get it via the pthread_getthrds_np() interface.

FreeBSD has pthread_getthreadid_np(), but that only applies to the self thread, i.e. is equivalent to Linux gettid().

Reusing the MacOS X interface seems most sensible; the pthread_getthrds_np() interface seems very complex.
Comment 2 jsm-csl@polyomino.org.uk 2021-05-18 18:36:24 UTC
This was previously requested in bug 14300 (but then that was marked as a 
duplicate of bug 6399, then 6399 was marked FIXED with gettid but not 
pthread_gettid_np having been added).
Comment 3 H. Peter Anvin 2021-05-18 19:31:39 UTC
Right. I'm asking for the ability to get the tid of a *different* thread, whereas gettid() gets the tid of the *current* thread.
Comment 4 Adhemerval Zanella 2021-05-18 20:58:15 UTC
I think it is a fair addition, I will work on adding this.
Comment 5 Florian Weimer 2021-05-19 20:14:59 UTC
I already have a patch somewhere. I'll see if I can post it.

Some operating systems have a thread ID functionality that provides TIDs that aren't reused. I think that's probably not what you have in mind. You want the kernel TID, right?

What about threads that have exited and have not yet been joined? Should we report the TID that was used when the thread was running?
Comment 6 H. Peter Anvin 2021-05-19 20:21:59 UTC
Yes, the kernel tid (which is valid as long as there is a zombie process.)
Comment 7 Adhemerval Zanella 2021-05-19 20:27:45 UTC
I think the TID value is what he has in mind, assuming that prototype in comment #1 returns 'pid_t'.

For an exited/cancelled thread I think the sensible way would be to return an error to indicate it.  Another issue, for which I think there is also no guarantee, is calling the newer interface with pthread_self() in a vfork process (since the tid won't be updated and I am assuming the straightforward solution would be to just atomically read the TID field in struct pthread).

Also I am not sure about the name, pthread_gettid_np seems sensible but I am not sure if it makes sense for non-Linux systems.
Comment 8 H. Peter Anvin 2021-05-19 20:31:28 UTC
Since MacOS X already has pthread_threadid_np(), I think we should mimic that.

And yes ->tid seems like the sane thing.
Comment 9 Florian Weimer 2021-05-19 20:49:35 UTC
(In reply to hpa@zytor.com from comment #8)
> Since MacOS X already has pthread_threadid_np(), I think we should mimic
> that.
> 
> And yes ->tid seems like the sane thing.

pthread_threadid_np seems to return a 64-bit unique ID. Our interface won't do that, so I think we should use a different function name.
Comment 10 Adhemerval Zanella 2021-05-27 19:32:19 UTC
It seems that this is trickier than it looks. Some issues we might consider first:

  1. What should we do with detached threads? As with pthread_kill, issuing a pthread_gettid_np might use an invalid handle (since the pthread_t identifier might be reused).  The only solution I have is to define it as undefined behavior. This is not great, but properly supporting it would require keeping track of all possible pthread_t identifiers (we already keep the glibc-provided stacks in dl_stack_cache, so it would be a matter of including the user-provided ones in the list as special entries).

  2. I think that once we provide this API, developers will start to use it to query whether a thread is alive, and I am not sure this is really the proper API for that. This is the same issue as 1.

  3. How do we handle concurrent access between pthread_join and pthread_gettid_np? Once a pthread_join is issued, the pthread_t identifier might be reused and accessing it should be invalid. pthread_join first synchronizes using 'joinid' to avoid concurrent pthread_join calls and then waits for the kernel to signal on 'tid' that the thread has finished.  A naive 'pthread_gettid_np' would just do an atomic load on tid; however, it might read a transient value between the pthread_join 'joinid' setup and the futex wait.  I am not sure how to handle it correctly.

Also, the MacOSX signature is:

  int pthread_threadid_np (pthread_t thread, uint64_t *thread_id)

It returns the current thread identification if THREAD is NULL, returns ESRCH for an invalid handle (issues 1. and 2. above), and also consults the kernel if the identifier cannot be obtained.

I think for a possible glibc symbol we should use a pid_t instead. The NULL special argument is also tricky because under POSIX a valid pthread_t might be NULL (this is an implementation detail). For ESRCH we can use our INVALID_NOT_TERMINATED_TD_P (which is not the best solution, but it is the current practice).
Comment 11 H. Peter Anvin 2021-05-27 23:48:07 UTC
As far as a pthread_id being reused - it doesn't seem to be fundamentally different from what can happen with any other use of pthread_id? It is actually a pointer, so it isn't any different from any other stale pointer?

It really seems to be absolutely no difference to me: if an old pthread_id can be reused and this causes failures, then *any* of the pthread functions are affected.

In the case of (2), this actually makes it a bit safer than what the pthreads interface currently provides: at least there is a second thing to test (the tid is at least likely to change.)

When a pthread is detached from its underlying kernel thread, this should return zero or -1 with ESRCH; and as you correctly point out this is pretty much exactly what your INVALID_NOT_TERMINATED_TD_P() test does...
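The ESRCH behaviour described above can be approximated at user level, as a rough sketch only. Everything named tracked_* below, and wait_for_release, is hypothetical and not a glibc interface; the tid field merely stands in for the one in struct pthread, and the sketch assumes the thread returns normally (pthread_exit() or cancellation would skip the clearing step).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-level stand-in for the tid field in struct pthread. */
struct tracked_thread {
    pthread_t thread;
    _Atomic pid_t tid;          /* 0 means "no kernel thread attached" */
    sem_t started;              /* posted once tid has been published */
    void *(*fn)(void *);
    void *arg;
};

static void *tracked_trampoline(void *p)
{
    struct tracked_thread *t = p;
    atomic_store(&t->tid, (pid_t)syscall(SYS_gettid));
    sem_post(&t->started);
    void *ret = t->fn(t->arg);
    atomic_store(&t->tid, 0);   /* about to exit: the tid is no longer meaningful */
    return ret;
}

/* Creates the thread and waits until it has published its tid. */
int tracked_create(struct tracked_thread *t, void *(*fn)(void *), void *arg)
{
    t->fn = fn;
    t->arg = arg;
    atomic_init(&t->tid, 0);
    sem_init(&t->started, 0, 0);
    int err = pthread_create(&t->thread, NULL, tracked_trampoline, t);
    if (err == 0)
        while (sem_wait(&t->started) == -1 && errno == EINTR)
            ;                   /* retry if interrupted by a signal */
    return err;
}

/* Returns 0 and stores the tid, or ESRCH if the thread has already finished. */
int tracked_gettid(struct tracked_thread *t, pid_t *out)
{
    pid_t tid = atomic_load(&t->tid);
    if (tid == 0)
        return ESRCH;
    *out = tid;
    return 0;
}

/* Example thread body: blocks until the semaphore passed as arg is posted. */
static void *wait_for_release(void *arg)
{
    while (sem_wait(arg) == -1 && errno == EINTR)
        ;
    return NULL;
}
```

While the thread is parked in wait_for_release, tracked_gettid() reports its tid; after it has been released and joined, the same call returns ESRCH. This does nothing about a reused or dangling handle, which is the separate problem discussed in this comment.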

If this kind of robustness is an issue, you really can't just use a plain pointer, but this is really unrelated to this specific use case; you would probably need to do something like a {table index, generation number} token.
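A minimal sketch of such a {table index, generation number} token follows. All names here (thread_token, slots, token_alloc, token_release, token_lookup) are made up for illustration, the table is fixed-size, and there is no locking, so it is not usable concurrently as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 64

/* Hypothetical handle: names a slot plus the generation it was issued for. */
struct thread_token {
    uint32_t index;
    uint32_t generation;
};

struct slot {
    uint32_t generation;    /* bumped every time the slot is released */
    bool in_use;
    int tid;                /* payload; a kernel tid in the real use case */
};

static struct slot slots[MAX_SLOTS];

/* A token is valid only if its slot is live and the generations match. */
static bool token_valid(struct thread_token tok)
{
    return tok.index < MAX_SLOTS
        && slots[tok.index].in_use
        && slots[tok.index].generation == tok.generation;
}

/* Claims the first free slot and issues a token for it. */
bool token_alloc(int tid, struct thread_token *out)
{
    for (uint32_t i = 0; i < MAX_SLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = true;
            slots[i].tid = tid;
            out->index = i;
            out->generation = slots[i].generation;
            return true;
        }
    }
    return false;           /* table full */
}

/* Frees the slot and invalidates every token issued for it so far. */
void token_release(struct thread_token tok)
{
    if (!token_valid(tok))
        return;
    slots[tok.index].in_use = false;
    slots[tok.index].generation++;
}

/* Fails for stale tokens instead of returning another thread's data. */
bool token_lookup(struct thread_token tok, int *tid_out)
{
    if (!token_valid(tok))
        return false;
    *tid_out = slots[tok.index].tid;
    return true;
}
```

Unlike a bare pointer, an old token is rejected even after its slot has been handed to a new thread, because the generation stored in the slot no longer matches the one in the token.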

The MacOS interface does seem a bit needlessly complicated;
pid_t pthread_gettid_np(pthread_t thread) is definitely simpler.
Comment 12 Josh Kriegshauser 2024-02-06 22:30:39 UTC
I would really like to see this addition make it in, even if it is only exposed for the calling thread. Sadly, it seems as though work on it has stalled.

In my use case it's a performance issue. The gettid() syscall takes ~700 ns on my machine, but reading pthread_self() is ~9 ns.

It would be great to have a performant way of reading the TID out of the struct pthread.
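Until such an accessor exists, one common workaround for the calling-thread case is a thread-local cache, sketched below. The name fast_gettid is hypothetical, and the timings quoted in this comment were not re-measured for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* One cached value per thread; 0 means "not fetched yet". */
static __thread pid_t cached_tid;

/* Returns the calling thread's kernel tid, making the syscall only on the
   first call in each thread. */
pid_t fast_gettid(void)
{
    if (cached_tid == 0)
        cached_tid = (pid_t)syscall(SYS_gettid);
    return cached_tid;
}
```

The cache is only correct while the thread keeps its tid: in the child after fork() the calling thread has a new tid but inherits the parent's cached value, so a real implementation would need to reset it there (for example from a pthread_atfork() child handler).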