[libc-alpha] linuxthreads bug in 2.2.4 under ppc linux
Kevin B. Hendricks
kevin.hendricks@sympatico.ca
Sun Dec 9 06:28:00 GMT 2001
Hi,
I gave it one last look and I seemed to have stumbled upon what I think
might be the problem.
Under -O3 ppc seems to be using oldstatus in place of pnode
Here is the code:
1. First it comes in and sets up the frame and then branches to the
compare and swap routine:
0000c1b8 <__pthread_alt_unlock>:
c1b8: 94 21 ff f0 stwu r1,-16(r1)
c1bc: 7c 08 02 a6 mflr r0
c1c0: 93 c1 00 08 stw r30,8(r1)
c1c4: 93 e1 00 0c stw r31,12(r1)
c1c8: 90 01 00 14 stw r0,20(r1)
c1cc: 48 01 9c 19 bl 25de4 <__DTOR_END__+0x4>
c1d0: 7f c8 02 a6 mflr r30
c1d4: 7c 64 1b 78 mr r4,r3
c1d8: 7c 00 04 ac sync
c1dc: 39 80 00 00 li r12,0
c1e0: 3b e0 00 01 li r31,1
c1e4: 48 00 01 a0 b c384 <__pthread_alt_unlock+0x1cc>
Then here is the compare and swap routine:
c384: 80 03 00 00 lwz r0,0(r3)
c388: 28 00 00 01 cmplwi r0,1
c38c: 41 81 fe 5c bgt+ c1e8 <__pthread_alt_unlock+0x30>
c390: 7c 00 04 ac sync
c394: 7d 20 18 28 lwarx r9,r0,r3
c398: 7c 09 4a 79 xor. r9,r0,r9
c39c: 40 82 00 0c bne- c3a8 <__pthread_alt_unlock+0x1f0>
c3a0: 7d 80 19 2d stwcx. r12,r0,r3
c3a4: 40 a2 ff f0 bne- c394 <__pthread_alt_unlock+0x1dc>
c3a8: 7d 20 4b 78 mr r0,r9
c3ac: 2c 00 00 00 cmpwi r0,0
c3b0: 40 82 ff d4 bne+ c384 <__pthread_alt_unlock+0x1cc>
c3b4: 80 01 00 14 lwz r0,20(r1)
c3b8: 7c 08 03 a6 mtlr r0
c3bc: 83 c1 00 08 lwz r30,8(r1)
c3c0: 83 e1 00 0c lwz r31,12(r1)
c3c4: 38 21 00 10 addi r1,r1,16
c3c8: 4e 80 00 20 blr
It line: c384 it loads the lock status and compares it to 1 and if greater
it jumps back up otherwise it enters the lwarx stwcx. pairing and after
that subloop returns, r0 has the oldvalue from the lock status and if it
is greater than 1 it branches back to the top otherwise the whole thing
just returns (it has freed the lock).
So if greater than 1 (there were waiters) it goes to:
c1e8: 7c 0b 03 78 mr r11,r0
c1ec: 7c 68 1b 78 mr r8,r3
c1f0: 7c 65 1b 78 mr r5,r3
c1f4: 7d 67 5b 78 mr r7,r11
c1f8: 3c c0 80 00 lis r6,-32768
c1fc: 7c 00 04 ac sync
c200: 80 0b 00 08 lwz r0,8(r11)
Here it stores the oldstatus in r11 (it never reloads pnode from memory it
instead just uses oldstatus in its place and r11 is used to immediately
access the abandoned field since r11 is assumed to be a pointer to a
wait_node.
But in the *working* version (registers are used differently here but I
will point out the differences. If there are waiters (the cas has
returned a value of greater than 1 for the lock status) r27 is the
pointer to the lock and the lock status is *reread* from memory instead of
using the results of the oldvalue from the compare and swap routine.
c214: 83 bb 00 00 lwz r29,0(r27)
c218: 7f 7c db 78 mr r28,r27
c21c: 7f 79 db 78 mr r25,r27
c220: 7f bf eb 78 mr r31,r29
c224: 3f 40 80 00 lis r26,-32768
c228: 7c 00 04 ac sync
So effectively, under -O3 the compiler has merged oldstatus and the pnode
value loaded from the *pp_head into the same variable.
#if defined HAS_COMPARE_AND_SWAP
{
long oldstatus = lock->__status;
if (oldstatus == 0 || oldstatus == 1) {
if (__compare_and_swap_with_release_semantics (&lock->__status,
oldstatus, 0))
break;
else
continue;
}
}
#endif
/* Process the entire queue of wait nodes. Remove all abandoned
wait nodes and put them into the global free queue, and
remember the one unabandoned node which refers to the thread
having the highest priority. */
pp_max_prio = pp_node = pp_head;
p_max_prio = p_node = *pp_head;
maxprio = INT_MIN;
READ_MEMORY_BARRIER(); /* Prevent access to stale data through p_node
*/
while (p_node != (struct wait_node *) 1) {
Effectively the line:
p_max_prio = p_node = *pp_head;
above have been optimized away and the value of oldstatus from the
previous compare and swap is being used for pnode directly.
If I print or otherwise access pnode in any way, inside the loop (I added
a simple MSG to print pnode if it is null for example), then this
optimization does not happen and all functions well.
So this change seems to be part of the problem.
I am not sure if this difference is important but it is a real difference
in how the code works.
Why are we not using the oldstatus from the compare and swap in place of
pnode? Was there a reason? If so, then we should somehow tell the
compiler not to optimize away pnode.
That's about the only differnce I can find.
I hope this helps,
Kevin
More information about the Libc-alpha
mailing list