This is the mail archive of the mailing list for the glibc project.


Re: [PATCH] aarch64: Optimized memset for Kunpeng processor.

Hi Xuelei,

> Due to a branch prediction issue on the Kunpeng processor, we found
> memset_generic has poor performance for mid-size settings, so we
> restructured the logic and unrolled the loop 3x in set_long to solve
> the problem; even sizes below 1K benefit.

Would it not make more sense to use traditional unrolling, e.g. process
128 or 256 bytes per iteration instead of 3x 64?
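Something like this (an untested sketch, reusing the existing dst/count
register names and assuming q0 holds the replicated fill value) would
set 128 bytes per iteration:

	.p2align 4
L(loop128):
	stp	q0, q0, [dst, 16]
	stp	q0, q0, [dst, 48]
	stp	q0, q0, [dst, 80]
	stp	q0, q0, [dst, 112]
	add	dst, dst, 128
	subs	count, count, 128
	b.hi	L(loop128)

(the tail would still need handling, same as now).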

> Another change is that DZ_ZVA seems no work when setting zero, so we
> discarded it and used set_long to set zero instead. Fewer branches and
> predictions also make the zero case have slightly improvement.

You mean DC_ZVA does not work (i.e. is disabled by the OS), or that it
doesn't give a speedup? That sounds odd...

+	cmp	count, 128
+	b.hs	L(set_long)
+	cmp	count, 16
+	b.lo	L(less16)

Wouldn't it make more sense to test for the common small case first?
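I.e. something along these lines, so small sizes pass only one compare
before branching (sketch only, using the labels from the quoted patch):

	cmp	count, 16
	b.lo	L(less16)
	cmp	count, 128
	b.hs	L(set_long)
	/* medium sizes fall through */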

+	ands	tmp1, dstin, 15
+	bne	2f

Is there really a gain in splitting out the aligned from unaligned case here?
You could either always align (which means 1 extra store) or just keep the
unaligned case (which uses fewer instructions, and will be best if the
buffer is already aligned).
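For reference, the "always align" variant could look like this (a sketch,
assuming the usual dstin/dst/q0 names; since memset may freely write the
same bytes twice, the overlapping head store is harmless):

	str	q0, [dstin]		/* unaligned 16-byte head store */
	bic	dst, dstin, 15		/* align down; loop stores from [dst, 16] */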

+	tbz	count, 5, 1f
+	stp	q0, q0, [dstin, 64]
+1:	stp	q0, q0, [dstend, -32]

+	tbz	count, 5, 3f
+	stp	q0, q0, [dst, 64]
+3:	stp	q0, q0, [dstend, -48]

There is little point in branching over just one instruction - it's cheaper
to execute it unconditionally than to risk a misprediction (the store would
need to be addressed from dstend rather than dstin so that it is always
in range).
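So the first quoted sequence could simply become (sketch; assuming the
preceding stores already cover the head of the buffer, overlapping stores
addressed from dstend are safe whatever bit 5 of count is):

	stp	q0, q0, [dstend, -64]
	stp	q0, q0, [dstend, -32]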

+1:	tbz	count, 5, 2f
+	str	q0, [dst, 32]
+	str	q0, [dst, 48]

Use stp? And the comment about branching over one instruction applies
here too.

+2:	stp	q0, q0, [dstend, -32]
+	ret

+	and	valw, valw, 255

valw is unused...

