This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: wrong assumptions about pthread_t being numeric


On Saturday 17 September 2011 02:02:28, John Spencer wrote:
> On 09/17/2011 02:30 AM, Pedro Alves wrote:
> > On Saturday 17 September 2011 00:13:10, John Spencer wrote:
> >> On 09/17/2011 01:00 AM, Pedro Alves wrote:
> >>> These are only built natively on solaris and aix respectively, so
> >>> let's just leave them alone.
> >>>
> >> I expected it to be desirable for a product in industrial use to be
> >> standard-compliant and not invoking undefined behavior.
> > Those files are tied to those platforms' thread_db/libc implementations.
> > There's absolutely no need to handle some other hipotetical libc that
> > defines pthread_t diferently there.  If it appears, we'll handle it.
> 
> > Chances are, some other changes would be necessary to make it really
> > work, not just build.
> >
> 
> exactly. 

No, you're completely missing the point.  The code in question is
not _using_ posix threads, as a normal application would.
It is interfacing with the system's libc to get at the thread support
library's internals, and letting the debugger control and instead the state
of threads.  There's no portability requirement here.  The major requirement
is that the code works with the libc it is intended to be interfacing with.

pthread_t on a given system is not _required_ to be an arithmetic type, but
it certainly can.  Sorry, I strongly object to changing the solaris or AIX
thread support just because you've learned that on some other systems
pthread_t may not be numeric.

> for example in musl's case it is wrong to compare the 
> underlying type (which is a struct pointer) with 0.

musl does not support solaris nor aix...

> in the implementation specific part it would practically be valid to 
> compare it with
> NULL, but that's still invalid regarding the definition in POSIX.

The gdb code in question _is_ peeking at implementation specific
parts!  That's the point that's apparently gone missing.

E.g., with such a program being debugged:

  pthread_t thread;
  res = pthread_create (&thread, NULL, entry_func, NULL);  
  ...

You can do:

  (gdb) p /x thread
  $2 = 0x7ffff7829700

And correlate that with the thread in gdb's thread list:

 (gdb) info threads 
   3 Thread 0x7ffff7028700 (LWP 29643)  0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
   2 Thread 0x7ffff7829700 (LWP 29640)  0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
 * 1 Thread 0x7ffff7fcd720 (LWP 29618)  main () at threads.c:50

See, it's thread 2.  If on some other platform pthread_t is

struct pthread_t { int foo; int bar; };

Then we'll need to have specific code in gdb's libthread_db.so handling
to print that.  But on some other platform, it could be, 

struct pthread_t { long tid; int some_other_id_who_knows; };

and again, we'd need some code to be able to print this.

So, you see, as we find platforms that need adapting to, we'll
do it, but it doesn't make much sense to do it before stumbling
on them.

> so the few places that use a thread id had to be changed everywhere, to 
> not have dozens of special case
> sections for any libc/OS combination...

Again, let's not invent work until we find such a combination...
There's no way you can know what such a libc/thread_db's pthread_t
would look like until it exists.

> also there are a few spots where the thread_t is numerically compared to 
> some arbitrary int value
> (i have not yet evaluated where those originate from). 

Let us know when you find them...

Maybe you're thinking of this in gdbserver/thread-db.c, where we
have:

  /* If the new thread ID is zero, a final thread ID will be available
     later.  Do not enable thread debugging yet.  */
  if (ti.ti_tid == 0)
    return 0;

but again, this is an implementation detail of glibc that gdb needs
to know about!  There's no portable null pthread_t, and even if
there was, there's no telling what other libc's do in this scenario.
Again, there's _no_ way to make this portable, because we _are_ part
of the implementation.

Code doing the same in gdb, where the comments are a bit more clear:

  if (ti.ti_tid == 0)
    {
      /* A thread ID of zero means that this is the main thread, but
	 glibc has not yet initialized thread-local storage and the
	 pthread library.  We do not know what the thread's TID will
	 be yet.  Just enable event reporting and otherwise ignore
	 it.  */

> this also can't 
> work when thread_t is no number.
> this happens for example in the if statement leading to the posted 
> printf format string errors.


> i expect that the number of actual fixes would be small, but it might 
> require another way to get to a thread id.
> 
> if the id is only used for textual representation, it would be 
> sufficient if gdb would assign a new unique number
> on each successful pthread_create.
> if it is also used to find a thread internally, it had to have a list of 
> tid numbers -> pthread_t.

Again, when we have a libc/thread_db that requires that, we'll do it.
Until then, we'll keep the code simple, thank you.

> 
> 
> >>>> thread-db.c: In function 'find_one_thread':
> >>>> thread-db.c:295:7: error: format '%ld' expects type 'long int', but
> >>>> argument 3 has type 'thread_t'
> >>>> thread-db.c: In function 'attach_thread':
> >>>> thread-db.c:335:7: error: format '%ld' expects type 'long int', but
> >>>> argument 3 has type 'thread_t'
> >>>> thread-db.c:341:9: error: format '%ld' expects type 'long int', but
> >>>> argument 2 has type 'thread_t'
> >>> So just cast it to long, and you're done.
> >>>
> >> pthread_t could legally be a struct, which you can't just cast to a long.
> > No need to complicate things for an hipotetical scenario.  The set of
> > libc's in existence is finite.  If we were to handle a struct pthread_t,
> > we'd need to be able to print it, and so we'd need some libc specific
> > way to do it, something autoconf'ed.  There's no need to invent work.
> >
> i disagree. adding a proper solution once is superior to creating dozens 
> of special case hacks.

What's the "proper" solution then?  The only possible way I can think
of to handle some libc what uses a struct for pthread_t, is with some way at
runtime to know which libc/thread_db gdb is talking to, failing that,
#ifdefs and/or autoconfig'ury.  Since a simple cast will
work for glibc, android, musl, and whatever other libc's happen be
be building gdbserver today, what's the point?

> also it saves a lot of time in the long term.

No, it wastes time right now coming up with a fix for an made up
problem that does not exist in practice today.

-- 
Pedro Alves


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]