linuxthreads bug in 2.2.4 under ppc linux
Kevin B. Hendricks
kevin.hendricks@sympatico.ca
Fri Dec 7 21:51:00 GMT 2001
Hi,
I recently upgraded to glibc-2.2.4 and seem to have run into a
linuxthreads problem under PPC Linux.
This problem is very timing dependent. It does not happen every time, but
after 10 or 20 attempts with the code I can usually get it to
segfault, and the segfault always happens in the exact same place.
Here is a quick analysis of the problem:
#0 0xfdcce70 in __pthread_alt_unlock () at eval.c:88
#1 0xfdc895c in pthread_mutex_unlock () at eval.c:88
#2 0xfe680d4 in ChildStatusProc ()
from /src2/openoffice-641c/solver/641/unxlngppc.pro/lib/libsal.so.3
#3 0xfe667e4 in oslWorkerWrapperFunction ()
from /src2/openoffice-641c/solver/641/unxlngppc.pro/lib/libsal.so.3
#4 0xfdc7448 in pthread_start_thread () at eval.c:88
#5 0xfafe5a8 in clone () at eval.c:88
We are using normal mutexes here.
Based on disassembling the code, the problem is here in
void __pthread_alt_unlock(struct _pthread_fastlock *lock);
After the test to see if the node was abandoned:
if (p_node->abandoned) {
/* Remove abandoned node. */
It turns out it was not, so the else clause is invoked and the following
code is run:
} else if ((prio = p_node->thr->p_priority) >= maxprio) {
/* Otherwise remember it if its thread has a higher or equal priority
compared to that of any node seen thus far. */
maxprio = prio;
pp_max_prio = pp_node;
p_max_prio = p_node;
}
But the wait_node structure being looked at had all 0 values.
In the code r4 is the address of the fastlock and its status value is
0x0fb57250 which is the pointer to the wait_node.
(gdb) x/10 $r4
0x7fffd3a4: 0x0fb57250 0x0fde66cc 0x0fb56e40 0x7fffd3c0
0x7fffd3b4: 0x0fdc8588 0x0fde66cc 0x0fb57250 0x7fffd3d0
0x7fffd3c4: 0x0fdc895c 0x0fb5f974
Unfortunately the wait node itself is all zeros (p_node->abandoned was 0,
but the thr and next pointers were also 0).
(gdb) x/10 $r11
0xfb57250 <main_arena+1040>: 0x00000000 0x00000000 0x00000000 0x00000000
This results in a segfault trying to access the p_priority of a 0 thr
pointer at 0xfdcce70 since r9 is 0 (the thr value).
0xfdcce6c <__pthread_alt_unlock+240>: lwz r9,4(r11)
0xfdcce70 <__pthread_alt_unlock+244>: lwz r0,88(r9)
0xfdcce74 <__pthread_alt_unlock+248>: cmpw r0,r6
0xfdcce78 <__pthread_alt_unlock+252>: blt 0xfdcce88 <__pthread_alt_unlock+268>
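The two loads above can be read against the structure layout. The sketch below is a hypothetical 32-bit model, not the real glibc definitions: it assumes the wait node's fields are ordered next, thr, abandoned (which would be consistent with the thr load from offset 4), and it uses fixed 32-bit integers in place of pointers so that the offsets match 32-bit PPC on any host. All names ending in _model are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the wait node as seen on 32-bit PPC.
   The field order is an assumption; pointers are modelled as
   uint32_t so the offsets are the same on a 64-bit host. */
struct wait_node_model {
    uint32_t next;       /* pointer to the next wait node */
    uint32_t thr;        /* pointer to the waiting thread's descriptor */
    int32_t  abandoned;  /* flag tested before the priority comparison */
};

/* Offset used by "lwz r9,4(r11)": r11 holds p_node, r9 receives thr. */
static size_t thr_offset_model(void)
{
    return offsetof(struct wait_node_model, thr);
}

/* Address read by "lwz r0,88(r9)" for a given thr value; 88 is the
   p_priority displacement taken from the disassembly. */
static uint32_t priority_load_address_model(uint32_t thr)
{
    return thr + 88u;
}

/* Sanity check of the model against the disassembly. */
static void check_layout_model(void)
{
    assert(thr_offset_model() == 4);
    /* With thr == 0 the load touches address 88, which is unmapped. */
    assert(priority_load_address_model(0) == 88u);
}
```

Under these assumptions, a zeroed wait node gives r9 = 0, and the second load faults at address 88 rather than reading a priority.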
So the question is: is this a legal state?
Is it possible to have a nonzero status in a fastlock while the wait_node it
points at is all zeros?
If so, we should check that the thr pointer is not zero
before trying to access its fields.
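A minimal sketch of the kind of check meant here. The types and function names are hypothetical stand-ins for the ones in spinlock.c, and the scan is simplified (no removal of abandoned nodes, no atomic operations). Whether skipping such a node is actually correct depends on the answer to the question above; a zeroed node may point to a bug elsewhere that a check like this would only hide.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the linuxthreads types. */
struct thread_model { int p_priority; };
struct wait_node_model {
    struct wait_node_model *next;
    struct thread_model *thr;
    int abandoned;
};

/* Return the waiting node with the highest priority, or NULL.
   Nodes whose thr pointer is NULL are skipped, not dereferenced. */
static struct wait_node_model *
pick_max_prio_model(struct wait_node_model *head)
{
    struct wait_node_model *p_max_prio = NULL;
    struct wait_node_model *p_node;
    int maxprio = INT_MIN;

    for (p_node = head; p_node != NULL; p_node = p_node->next) {
        int prio;
        if (p_node->abandoned)
            continue;              /* the real code unlinks these */
        if (p_node->thr == NULL)
            continue;              /* the defensive check in question */
        prio = p_node->thr->p_priority;
        if (prio >= maxprio) {     /* same ">=" as in the quoted code */
            maxprio = prio;
            p_max_prio = p_node;
        }
    }
    return p_max_prio;
}

/* Usage: a zeroed node in the list is passed over without a fault. */
static int pick_max_prio_example(void)
{
    struct thread_model t = { 5 };
    struct wait_node_model zeroed = { NULL, NULL, 0 };
    struct wait_node_model ok = { &zeroed, &t, 0 };
    struct wait_node_model *best = pick_max_prio_model(&ok);

    assert(best == &ok);
    return best->thr->p_priority;
}
```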
I am sorry I can't be more help here, but the code in spinlock.c seems to
be much more complicated than the way mutexes were done under earlier
glibc-2.2 releases.
I see lots of reservation lock pairs (lwarx, stwcx.) used throughout the code
I disassembled. I am very unsure whether the proper syncs and isyncs
(BARRIERS) are being used here.
Here are 3 examples taken from this routine that are all different in
their use of sync and isync.
(this one does no sync to start)
bde0: 7d 20 20 28 lwarx r9,r0,r4
bde4: 7d 69 4a 79 xor. r9,r11,r9
bde8: 40 82 00 0c bne- bdf4 <__pthread_alt_unlock+0x78>
bdec: 7c 00 21 2d stwcx. r0,r0,r4
bdf0: 40 a2 ff f0 bne- bde0 <__pthread_alt_unlock+0x64>
bdf4: 4c 00 01 2c isync
(this one does a sync to start and an isync after)
be38: 7c 00 04 ac sync
be3c: 7d 20 50 28 lwarx r9,r0,r10
be40: 7c 09 4a 79 xor. r9,r0,r9
be44: 40 82 00 0c bne- be50 <__pthread_alt_unlock+0xd4>
be48: 7d 60 51 2d stwcx. r11,r0,r10
be4c: 40 a2 ff f0 bne- be3c <__pthread_alt_unlock+0xc0>
be50: 4c 00 01 2c isync
(this one does a sync to start but no isync after)
bf38: 7c 00 04 ac sync
bf3c: 7d 20 18 28 lwarx r9,r0,r3
bf40: 7c 09 4a 79 xor. r9,r0,r9
bf44: 40 82 00 0c bne- bf50 <__pthread_alt_unlock+0x1d4>
bf48: 7d 80 19 2d stwcx. r12,r0,r3
bf4c: 40 a2 ff f0 bne- bf3c <__pthread_alt_unlock+0x1c0>
bf50: 7d 20 4b 78 mr r0,r9
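All three sequences share the same core: load the word with a reservation, compare it with an expected value, and store a new value only if the reservation still holds, retrying otherwise. As a rough C model of that loop alone (using C11 <stdatomic.h>, which postdates this glibc and is used here only for illustration; the function names are made up), with relaxed ordering so that it corresponds to the loop without any sync or isync:

```c
#include <assert.h>
#include <stdatomic.h>

/* Returns 1 if *p held `expected` and was replaced by `desired`,
   0 if *p held some other value (the first "bne-" exit). */
static int cas_model(atomic_long *p, long expected, long desired)
{
    long seen = expected;

    while (!atomic_compare_exchange_weak_explicit(
               p, &seen, desired,
               memory_order_relaxed, memory_order_relaxed)) {
        if (seen != expected)
            return 0;   /* value differs: give up, nothing stored */
        /* Spurious failure, comparable to stwcx. failing because the
           reservation was lost: loop back, as the second "bne-" does. */
    }
    return 1;
}

/* Usage: the swap succeeds only when the expected value matches. */
static long cas_model_example(void)
{
    atomic_long word = 7;

    assert(cas_model(&word, 7, 9) == 1);   /* 7 -> 9 */
    assert(cas_model(&word, 7, 1) == 0);   /* no longer 7, unchanged */
    return atomic_load(&word);
}
```

The three disassembled sequences differ only in what surrounds this loop, which is where the ordering guarantees come from.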
My (limited) understanding of this is that when you grab a lock you
use the lwarx/stwcx. pair and follow it with an isync. When you write 0 to a
lock to free it, you do a sync first and then simply write it. Therefore I
think the last one, which does a sync before the reservation but no isync
after, is wrong.
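The convention described above can be written down in portable C as a sketch. This is not the linuxthreads code: it uses C11 <stdatomic.h> and a hypothetical spin lock type. On PowerPC, compilers commonly implement the acquire compare-and-swap as a lwarx/stwcx. loop followed by isync or lwsync, and the release store as a barrier followed by a plain store, but the exact instructions chosen are up to the compiler.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int locked; } spin_model_t;  /* hypothetical type */

/* Try to take the lock: compare-and-swap 0 -> 1 with acquire ordering
   (the role isync plays after the stwcx. loop). Returns 1 on success. */
static int spin_model_trylock(spin_model_t *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release: ordering barrier first (the role of sync), then a plain
   store of 0. No reservation is needed to free the lock. */
static void spin_model_unlock(spin_model_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Usage: second attempt fails while held, succeeds after unlock. */
static int spin_model_example(void)
{
    spin_model_t l = { 0 };
    int first = spin_model_trylock(&l);    /* lock was free */
    int second = spin_model_trylock(&l);   /* already held */

    spin_model_unlock(&l);
    assert(first && !second);
    return spin_model_trylock(&l);         /* free again */
}
```

In this model the acquire side is the only place where a barrier has to follow the reservation loop, which matches the reasoning that a loop with a leading sync but no trailing isync looks suspicious if it is being used to take a lock.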
Maybe Geoff or Franz or David knows for sure.
Any guidance on how to address this would be greatly appreciated.
Thanks,
Kevin