This is the mail archive of the mailing list for the glibc project.

glibc-2.3 alpha stxncpy.S

FYI, the version of stxncpy.S that got accepted into glibc-2.3 to fix
the stratcliff failure stalls in $u_loop.  According to the 21164 hardware
reference manual, load instructions have a 2-cycle latency.  The code
uses t2 one cycle after the load that writes it, which causes a 1-cycle stall:

	or	t0, t1, t0	# e0    : current dst word now complete
	subq	a2, 1, a2	# .. e1 : decrement word count
	stq_u	t0, 0(a0)	# e0    : save the current word
	addq	a0, 8, a0	# .. e1 :
	extql	t2, a1, t1	# e0    : extract high bits for next time
	beq	a2, $u_eoc	# .. e1 :
	ldq_u	t2, 8(a1)	# e0    : load high word for next time
	addq	a1, 8, a1	# .. e1 :
	nop			# e0    :
	# >>> STALLS for 1 cycle to load t2 <<<
	cmpbge	zero, t2, t7	# .. e1 : test new word for eos
	extqh	t2, a1, t0	# e0    : extract low bits for current word
	beq	t7, $u_loop	# .. e1 :
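To make the latency arithmetic concrete, here is a toy scoreboard model in Python. It is not part of the original message. It treats each `# e0 : .. e1 :` pair in the listing as issuing together and assumes a pair waits until every source register it reads is ready. Loads use the 2-cycle latency quoted above, and every other instruction is assumed to take 1 cycle. That simplification and the names `schedule` and `U_LOOP` are assumptions of this sketch, not a description of the real 21164 issue logic.

```python
# Toy in-order dual-issue model: one (e0, e1) pair per cycle, as annotated
# in the $u_loop listing. ASSUMPTIONS: a pair issues only when all of its
# source registers are ready; loads take 2 cycles (the figure quoted from
# the 21164 manual), every other instruction takes 1 cycle.
LOAD_LATENCY = 2
ALU_LATENCY = 1

# Each entry: (mnemonic, destination register or None, source registers).
U_LOOP = [
    [("or", "t0", ["t0", "t1"]), ("subq", "a2", ["a2"])],
    [("stq_u", None, ["t0", "a0"]), ("addq", "a0", ["a0"])],
    [("extql", "t1", ["t2", "a1"]), ("beq", None, ["a2"])],
    [("ldq_u", "t2", ["a1"]), ("addq", "a1", ["a1"])],
    [("nop", None, []), ("cmpbge", "t7", ["zero", "t2"])],
    [("extqh", "t0", ["t2", "a1"]), ("beq", None, ["t7"])],
]


def schedule(pairs):
    """Return (issue cycle of each pair, total stall cycles)."""
    ready = {}  # register -> first cycle a consumer may read it
    next_cycle = 0
    issued = []
    stalls = 0
    for pair in pairs:
        # A pair cannot issue before its slowest source operand is ready.
        srcs = [r for _, _, regs in pair for r in regs]
        cycle = max([next_cycle] + [ready.get(r, 0) for r in srcs])
        stalls += cycle - next_cycle
        issued.append(cycle)
        # Record when each result becomes readable.
        for op, dst, _ in pair:
            if dst is not None:
                latency = LOAD_LATENCY if op.startswith("ld") else ALU_LATENCY
                ready[dst] = cycle + latency
        next_cycle = cycle + 1
    return issued, stalls


issued, stalls = schedule(U_LOOP)
print(issued, stalls)  # [0, 1, 2, 3, 5, 6] 1
```

In this model the `nop`/`cmpbge` pair issues at cycle 5 instead of 4, which is the single stall described above. Putting independent work between `ldq_u` and `cmpbge` is what would absorb that cycle, which is the idea behind moving the address increments.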

The version of the fix I sent earlier avoided the stall by scheduling
the address increment instructions into the otherwise unused cycle.

Anyway, 1 cycle isn't a big deal, but perhaps a comment should be added to
mark the stall, as is done in $a_loop?

- glen
