This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: V3 [PATCH] aarch64: optimized memcpy implementation for thunderx2


On 10/11/18 6:32 AM, Anton Youdkevitch wrote:
>> be unmeasurable compared to the load.  I do suggest you use
>> properly pc-relative addresses in that case though.
>> I.e. "L(foo) - .".
> Now I do not follow. Why is the existing addressing not a
> proper pc-relative one, except for the part that it employs
> the fact that the distance is small and adrp is not needed?
> Or is this what you actually meant?

I suppose it doesn't matter, now that I write it out and count instructions,
but it would be the difference between

	adrp	tmp2, L(ext_table)
	add	tmp2, tmp2, :lo12:L(ext_table)
	ldr	tmp2, [tmp2, tmp1, LSL #3]
	adr	tmp3, L(load_and_merge)
	add	tmp2, tmp2, tmp3
	br	tmp2

and

	adrp	tmp2, L(ext_table)
	add	tmp2, tmp2, :lo12:L(ext_table)
	add	tmp2, tmp2, tmp1, LSL #3
	ldr	tmp3, [tmp2]
	add	tmp2, tmp2, tmp3
	br	tmp2

If you're going to subtract L(load_and_merge), you might even save memory by
noting that the displacements fit in bytes instead of quads.
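
To make the address arithmetic concrete, here is a minimal C sketch that models both table layouts with plain integers. It assumes the first sequence stores "target - L(load_and_merge)" in each entry and the second stores "target - ." (the entry's own address); that is one reading of the two sequences, not something taken from the patch. The addresses, the targets array, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addresses and branch targets, chosen only for illustration. */
#define BASE_ADDR  UINT64_C(0x1000)   /* stands in for L(load_and_merge) */
#define TABLE_ADDR UINT64_C(0x2000)   /* stands in for L(ext_table) */
#define NENTRIES   4

static const uint64_t targets[NENTRIES] = { 0x1010, 0x1040, 0x1080, 0x10c0 };

/* First sequence: the entry holds target - base, and the code adds
   the base back (adr + add) before branching. */
uint64_t dispatch_base_relative(unsigned i)
{
    assert(i < NENTRIES);
    uint64_t entry = targets[i] - BASE_ADDR;        /* table contents */
    return BASE_ADDR + entry;
}

/* Second sequence: the entry holds target - (address of the entry),
   so adding the entry's own address recovers the target.  The
   arithmetic wraps modulo 2^64, as the hardware add does. */
uint64_t dispatch_self_relative(unsigned i)
{
    assert(i < NENTRIES);
    uint64_t entry_addr = TABLE_ADDR + ((uint64_t)i << 3);  /* 8-byte entries */
    uint64_t entry = targets[i] - entry_addr;               /* table contents */
    return entry_addr + entry;
}

/* Base-relative table with one byte per entry: valid only while every
   displacement from the base fits in 0..255. */
uint64_t dispatch_base_relative_u8(unsigned i)
{
    assert(i < NENTRIES);
    uint64_t disp = targets[i] - BASE_ADDR;
    assert(disp <= UINT8_MAX);
    uint8_t entry = (uint8_t)disp;                  /* table contents */
    return BASE_ADDR + entry;
}
```

All three functions return the same target for a given index. In this model the byte-sized table occupies NENTRIES bytes rather than 8 * NENTRIES, which is the memory saving mentioned above.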

> Also, the "dot" cannot be used for cross-section address
> generation.

Absolutely it can.  It is in fact exactly R_AARCH64_PREL64.
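
For reference, R_AARCH64_PREL64 evaluates S + A - P, where S is the symbol address, A the addend, and P the address of the place being relocated. The small C sketch below models that computation with integers to show that nothing in it depends on S and P being in the same section. The function names and the section addresses in the usage note are hypothetical.

```c
#include <stdint.h>

/* Value the linker stores at place P for a PREL64 relocation:
   S + A - P, computed modulo 2^64. */
uint64_t prel64_value(uint64_t S, int64_t A, uint64_t P)
{
    return S + (uint64_t)A - P;
}

/* What the running code does: add the stored value to the address
   it was loaded from, which recovers S + A. */
uint64_t prel64_resolve(uint64_t P, uint64_t stored)
{
    return P + stored;
}
```

For example, with a table slot at 0x600010 (say, in .rodata) and a label at 0x400120 (in .text), prel64_resolve(0x600010, prel64_value(0x400120, 0, 0x600010)) gives back 0x400120.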


r~

