This is the mail archive of the mailing list for the GDB project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: wrong assumptions about pthread_t being numeric

On 09/17/2011 05:29 PM, Pedro Alves wrote:
On Saturday 17 September 2011 15:58:33, Pedro Alves wrote:
On Saturday 17 September 2011 02:02:28, John Spencer wrote:
On 09/17/2011 02:30 AM, Pedro Alves wrote:
On Saturday 17 September 2011 00:13:10, John Spencer wrote:
On 09/17/2011 01:00 AM, Pedro Alves wrote:
These are only built natively on solaris and aix respectively, so
let's just leave them alone.

I expected it to be desirable for a product in industrial use to be
standard-compliant and not invoking undefined behavior.
Those files are tied to those platforms' thread_db/libc implementations.
There's absolutely no need to handle some other hipotetical libc that
defines pthread_t diferently there.  If it appears, we'll handle it.
Chances are, some other changes would be necessary to make it really
work, not just build.

No, you're completely missing the point.  The code in question is
not _using_ posix threads, as a normal application would.
It is interfacing with the system's libc to get at the thread support
library's internals, and letting the debugger control and instead the state
of threads.  There's no portability requirement here.  The major requirement
is that the code works with the libc it is intended to be interfacing with.

pthread_t on a given system is not _required_ to be an arithmetic type, but
it certainly can.  Sorry, I strongly object to changing the solaris or AIX
thread support just because you've learned that on some other systems
pthread_t may not be numeric.

for example in musl's case it is wrong to compare the
underlying type (which is a struct pointer) with 0.
musl does not support solaris nor aix...

in the implementation specific part it would practically be valid to
compare it with
NULL, but that's still invalid regarding the definition in POSIX.
The gdb code in question _is_ peeking at implementation specific
parts!  That's the point that's apparently gone missing.

E.g., with such a program being debugged:

   pthread_t thread;
   res = pthread_create (&thread, NULL, entry_func, NULL);

You can do:

   (gdb) p /x thread
   $2 = 0x7ffff7829700

And correlate that with the thread in gdb's thread list:

  (gdb) info threads
    3 Thread 0x7ffff7028700 (LWP 29643)  0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
    2 Thread 0x7ffff7829700 (LWP 29640)  0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  * 1 Thread 0x7ffff7fcd720 (LWP 29618)  main () at threads.c:50

See, it's thread 2. If on some other platform pthread_t is

struct pthread_t { int foo; int bar; };

Then we'll need to have specific code in gdb's handling
to print that.  But on some other platform, it could be,

struct pthread_t { long tid; int some_other_id_who_knows; };

and again, we'd need some code to be able to print this.

So, you see, as we find platforms that need adapting to, we'll
do it, but it doesn't make much sense to do it before stumbling
on them.

so the few places that use a thread id had to be changed everywhere, to
not have dozens of special case
sections for any libc/OS combination...
Again, let's not invent work until we find such a combination...
There's no way you can know what such a libc/thread_db's pthread_t
would look like until it exists.

also there are a few spots where the thread_t is numerically compared to
some arbitrary int value
(i have not yet evaluated where those originate from).
Let us know when you find them...

Maybe you're thinking of this in gdbserver/thread-db.c, where we

   /* If the new thread ID is zero, a final thread ID will be available
      later.  Do not enable thread debugging yet.  */
   if (ti.ti_tid == 0)
     return 0;

but again, this is an implementation detail of glibc that gdb needs
to know about!  There's no portable null pthread_t, and even if
there was, there's no telling what other libc's do in this scenario.
Again, there's _no_ way to make this portable, because we _are_ part
of the implementation.

Code doing the same in gdb, where the comments are a bit more clear:

   if (ti.ti_tid == 0)
       /* A thread ID of zero means that this is the main thread, but
	 glibc has not yet initialized thread-local storage and the
	 pthread library.  We do not know what the thread's TID will
	 be yet.  Just enable event reporting and otherwise ignore
	 it.  */

this also can't
work when thread_t is no number.
this happens for example in the if statement leading to the posted
printf format string errors.

i expect that the number of actual fixes would be small, but it might
require another way to get to a thread id.

if the id is only used for textual representation, it would be
sufficient if gdb would assign a new unique number
on each successful pthread_create.
if it is also used to find a thread internally, it had to have a list of
tid numbers ->  pthread_t.
Again, when we have a libc/thread_db that requires that, we'll do it.
Until then, we'll keep the code simple, thank you.

thread-db.c: In function 'find_one_thread':
thread-db.c:295:7: error: format '%ld' expects type 'long int', but
argument 3 has type 'thread_t'
thread-db.c: In function 'attach_thread':
thread-db.c:335:7: error: format '%ld' expects type 'long int', but
argument 3 has type 'thread_t'
thread-db.c:341:9: error: format '%ld' expects type 'long int', but
argument 2 has type 'thread_t'
So just cast it to long, and you're done.

pthread_t could legally be a struct, which you can't just cast to a long.
No need to complicate things for an hipotetical scenario.  The set of
libc's in existence is finite.  If we were to handle a struct pthread_t,
we'd need to be able to print it, and so we'd need some libc specific
way to do it, something autoconf'ed.  There's no need to invent work.

i disagree. adding a proper solution once is superior to creating dozens
of special case hacks.
What's the "proper" solution then?  The only possible way I can think
of to handle some libc what uses a struct for pthread_t, is with some way at
runtime to know which libc/thread_db gdb is talking to, failing that,
#ifdefs and/or autoconfig'ury.  Since a simple cast will
work for glibc, android, musl, and whatever other libc's happen be
be building gdbserver today, what's the point?
I'll even go a step further, and suggest you do expose pthread_t
as a typedef to long in public headers, and cast it to pointer internally
in musl, just like glibc and most other libc's do.  Hiding the fact
that you implement things with a pointer could even be a good thing.

It's just easier for everybody else that way.  There's really no point in
being different for the sake of it.

that's definitely going to far, you can't dictate how libcs should handle an opaque type internally just to keep broken code working.

since it seems to much effort to change it to a proper solution, there should be at least a explicit function/macro which takes a thread_t and converts it to long, since it is assumed in a couple of spots that it is of this type.
that is exactly what my patch does.

and as you wished, it fixes the current issue with minimal effort.

also it saves a lot of time in the long term.
No, it wastes time right now coming up with a fix for an made up
problem that does not exist in practice today.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]