This is the mail archive of the pthreads-win32@sourceware.cygnus.com mailing list for the pthreads-win32 project.



Bug update


After several days of trial and error, I believe I've made some progress on this.

It seems that DllMain can get called with fdwReason == DLL_THREAD_DETACH
when a thread has threadH == 0. This seems to happen very, very rarely, which is
one reason this bug might not have revealed itself until now. Looking at the code,
I see no way that threadH could be zero for a thread that has been successfully
created, unless the call to _beginthread[ex] got far enough to create the thread,
but then failed for some other reason (something bad happening in pthread_threadStart,
for example). In any case, this code in dll.c causes some memory corruption:

  pthread_setspecific (_pthread_selfThreadKey, NULL);
  _pthread_threadDestroy (self);

I don't know yet which of these two calls actually causes the problem. I guarded
both like so:

    if (self->threadH) {
        pthread_setspecific (_pthread_selfThreadKey, NULL);
        _pthread_threadDestroy (self);
    }

and this seems to prevent the crashes. I've run my test program for 15 hours and it
hasn't crashed. However, it still (very suspiciously) leaks handles slowly. After 15
hours, it has lost 10847 of them. Of course, this program is creating millions of
threads, accelerating the handle leakage as much as possible. It may be very hard
to detect this handle leakage in any real-world program. Also, since I'm not calling
_pthread_threadDestroy(self) when threadH == 0, one would expect some
memory leakage there. However, this seems to be a very rare occurrence, so in
practice it makes no difference. (The program's memory usage is still under 2MB.)
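
For anyone who wants to experiment with the guard outside the DLL, here is a minimal, portable sketch in C. It is not the dll.c code: thread_rec, on_thread_detach, destroy_thread_rec and the two counters are hypothetical names made up for illustration, and the extra NULL check on self is my assumption rather than part of the patch above. It only shows the trade-off described here: a record with threadH == 0 is skipped, so it is never destroyed and is effectively leaked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's per-thread record. */
typedef struct {
    void *threadH;  /* thread handle; NULL (0) if creation never completed */
} thread_rec;

/* Counters so the effect of the guard can be observed. */
int destroyed = 0;  /* records cleaned up normally */
int skipped = 0;    /* records left alone, i.e. leaked */

/* Stand-in for the pthread_setspecific + _pthread_threadDestroy pair. */
void destroy_thread_rec(thread_rec *self)
{
    /* The guard in on_thread_detach should make this always hold. */
    assert(self != NULL && self->threadH != NULL);
    destroyed++;
}

/* Stand-in for the DLL_THREAD_DETACH branch. Returns 1 if cleanup ran. */
int on_thread_detach(thread_rec *self)
{
    /* Checking self for NULL is an extra assumption, not in the patch. */
    if (self == NULL || self->threadH == NULL) {
        skipped++;   /* no cleanup: avoids the crash, leaks the record */
        return 0;
    }
    destroy_thread_rec(self);
    return 1;
}
```

Calling on_thread_detach with a record whose threadH is set runs the cleanup. Calling it with threadH == 0 only bumps the skipped counter, which corresponds to the small leak mentioned above.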

I have also run my (much more complicated) real application with the patched DLL,
and it too seems to run happily for many hours.

I don't yet fully understand what's going on in either pthread_setspecific or
pthread_callUserDestroyRoutines, so I don't know exactly which is to blame. It's also
quite possible that my 15 crash-free hours are simple luck. Until I understand the
bug fully, I won't rest easy. But this seems like something some of you who are
more familiar with the code might be able to work from.

BTW, I did verify that the program also crashes on another NT machine, so it's a real bug
and not a quirk of my machine.

Dave

