This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: PATCH: Fix ll/sc for mips (take 3)
On Tue, Feb 05, 2002 at 01:54:07PM -0800, H . J . Lu wrote:
> __asm__ __volatile__
> ("/* Inline compare & swap */\n"
> "1:\n\t"
> "ll %1,%5\n\t"
> "move %0,$0\n\t"
> "bne %1,%3,2f\n\t"
> "move %0,%4\n\t"
> "sc %0,%2\n\t"
> "beqz %0,1b\n\t"
> "2:\n\t"
> "/* End compare & swap */"
> : "=&r" (ret), "=&r" (temp), "=m" (*p)
> : "r" (oldval), "r" (newval), "m" (*p)
> : "memory");
>
> The assembler will do
>
> 0xd724 <__pthread_alt_lock+212>: ll v1,0(s1)
> 0xd728 <__pthread_alt_lock+216>: move a1,zero
> 0xd72c <__pthread_alt_lock+220>: bne v1,s0,0xd744 <__pthread_alt_lock+244>
> 0xd730 <__pthread_alt_lock+224>: nop
> 0xd734 <__pthread_alt_lock+228>: move a1,v0
> 0xd738 <__pthread_alt_lock+232>: sc a1,0(s1)
> 0xd73c <__pthread_alt_lock+236>: beqz a1,0xd724 <__pthread_alt_lock+212>
> 0xd740 <__pthread_alt_lock+240>: nop
>
> There is an extra "nop" in the delay slot. I don't think gas is smart
> enough to fill the delay slot. I will put back those ".set noreorder".
The solution is to put the "move %0,%4" instruction in front of the branch
instruction. The assembler will then move it into the delay slot:
__asm__ __volatile__
("/* Inline compare & swap */\n"
"1:\n\t"
"ll %1,%5\n\t"
"move %0,$0\n\t"
"move %0,%4\n\t"
"bne %1,%3,2f\n\t"
"sc %0,%2\n\t"
"beqz %0,1b\n\t"
"2:\n\t"
"/* End compare & swap */"
: "=&r" (ret), "=&r" (temp), "=m" (*p)
: "r" (oldval), "r" (newval), "m" (*p)
: "memory");
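As a point of reference, the contract that the quoted sequence appears to
implement can be modelled in portable C: return 0 when *p differs from
oldval, otherwise store newval and return nonzero. The sketch below is an
illustration only, not the glibc code. The function name
compare_and_swap_model and the long operand type are assumptions, and it
relies on the GCC/Clang __atomic_compare_exchange_n builtin rather than
hand-written ll/sc.

```c
#include <stdbool.h>

/* Hypothetical reference model of the compare & swap contract.
 * Returns 1 if *p equalled oldval and was replaced by newval,
 * 0 if *p held some other value (in which case *p is left unchanged). */
int compare_and_swap_model(long *p, long oldval, long newval)
{
    long expected = oldval;  /* the builtin overwrites this on failure */

    /* Strong variant (weak = false): no spurious failures, which
     * corresponds to the asm retrying at label 1 when sc fails.
     * SEQ_CST is chosen conservatively here; the quoted asm itself
     * contains no sync instruction. */
    bool swapped = __atomic_compare_exchange_n(p, &expected, newval,
                                               false,
                                               __ATOMIC_SEQ_CST,
                                               __ATOMIC_SEQ_CST);
    return swapped ? 1 : 0;
}
```

A model like this can serve as a check when reordering the assembly: on the
mismatch path the result must still be 0, and an instruction placed in the
delay slot of the bne executes whether or not the branch is taken, so it
should not leave a nonzero value in %0 on that path.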
Also, this function looks like a good candidate for inlining (is it actually
inlined? I haven't checked ...). Depending on its use, the address of
*p is then loaded twice from the GOT, so changing the code to:
__asm__ __volatile__
("/* Inline compare & swap */\n"
"1:\n\t"
"ll %1,(%5)\n\t"
"move %0,$0\n\t"
"move %0,%4\n\t"
"bne %1,%3,2f\n\t"
"sc %0,(%2)\n\t"
"beqz %0,1b\n\t"
"2:\n\t"
"/* End compare & swap */"
: "=&r" (ret), "=&r" (temp), "=r" (p)
: "r" (oldval), "r" (newval), "r" (p)
: "memory");
will avoid paying that PIC overhead twice and get you around the gas
inefficiency of putting too many nops into PIC code.
Ralf