This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: Add AVX support to x86-64 _dl_runtime_profile


On Mon, Jul 6, 2009 at 5:16 AM, Ulrich Drepper<drepper@redhat.com> wrote:
> H.J. Lu wrote:
>> This is the updated patch against today's git. I fixed some white spaces
>> and added vector register value check for audit_test.
>
> I look at this and there are several problems. ?Fixing it is not that
> trivial time-wise so I leave it up to you. ?The main problem is that
> you're breaking the ABI. ?The data structures in <bits/link.h> are used
> by existing audit modules. ?Existing code must continue to work.
>
> Therefore the additional of the ymm registers must not disturb them.
> The additional space must be allocated at the end (and no strong
> alignment really, if it causes problems in the data structures).

Done.

> This means that the low part of xmm0-7 will be represented twice. ?This
> introduces another problem but it is unavoidable. ?We therefore always
> have to store xmm0-7 and then the ymm register content conditionally
> when the feature is available. ?And the availability shouldn't be tested
> at every use but instead only once. ?Ideally at program startup time
> when auditing is used.

I used a scheme similar to IFUNC. check_avx is called only once.
After that, _dl_runtime_profile will branch to save_and_restore_vector_sse
or save_and_restore_vector_avx via an indirect jump.

> The big problem introduced this way is how to handle the restoring of
> the register contents. ?Here it is clear that if the audit function
> modifies the xmm[x] and ymm[x] value the code is wrong. ?So we can
> ignore this case. ?Therefore I suggest we restore the ymm[x] content and
> then check whether the xmm[x] content changed by comparing the exposed
> value from the La_x86_64_regs and La_x86_64_retval data structures with
> a shadow copy we made before the call to the audit function. ?If it
> changed we replace to the low half of the ymm register.

Since SSE registers are aliases of the lower 128bits of AVX register,
we only need to restore AVX register if it is available.

> One final problem: your patch only handles 8 ymm registers? ?Why? ?Is
> the calling convention so screwed up that it doesn't allow using all 16
> ymm registers?

Since AVX register is just SSE register extended to 256bit, we only
use the first 8 AVX/SSE registers for passing parameters.

> Please fix these issues and run the resulting code through the simulator
> again.
>

Here is the updated patch tested under AVX emulator.  It defines
La_x86_64_vector as 64byte since we will extend 256bit AVX register
to 512bit in the future. We can support 512bit AVX register with
backward binary compatibility without extending data structure,
which will require save 128bit xmm register 3 times.

Thanks.


-- 
H.J.
2009-07-06  H.J. Lu  <hongjiu.lu@intel.com>

	* config.h.in (HAVE_AVX_SUPPORT): New.

	* config.make.in (config-cflags-avx): New.

	* configure.in: Substitute libc_cv_cc_avx.
	* configure: Regenerated.

	* elf/Makefile (distribute): Add tst-audit4.c tst-auditmod4a.c
	tst-auditmod4b.c.
	(tests): Add tst-audit4 for x86_64.
	(modules-names): Add tst-auditmod4a tst-auditmod4b.
	($(objpfx)tst-audit4): New.
	($(objpfx)tst-audit4.out): Likewise.
	(tst-audit4-ENV): Likewise.
	(CFLAGS-tst-audit4.c): Likewise.
	(CFLAGS-tst-auditmod4a.c): Likewise.
	(CFLAGS-tst-auditmod4b.c): Likewise.

	* elf/tst-auditmod3b.c (pltenter): Check vector register values
	for audit_test.
	(pltexit): Likewise.

	* elf/tst-audit4.c: New.
	* elf/tst-auditmod4a.c: Likewise.
	* elf/tst-auditmod4b.c: Likewise.

	* sysdeps/x86_64/Makefile (gen-as-const-headers): Add
	link-defines.sym.

	* sysdeps/x86_64/bits/link.h (La_x86_64_ymm): New.
	(La_x86_64_vector): Likewise.
	(La_x86_64_regs): Append lr_vector.
	(La_x86_64_retval): Append lr_vector0/lrv_vector1.

	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Move
	saving and restoring SSE registers to ...
	* sysdeps/x86_64/dl-trampoline.h: This.  New.

	* sysdeps/x86_64/dl-trampoline.S: Inclide <config.h> and
	<link-defines.h>.
	(_dl_runtime_profile): Use LR_SIZE to allocate space for
	La_x86_64_regs.  Jump to memory at save_and_restore_vector
	if HAVE_AVX_SUPPORT is defined.
	(save_and_restore_vector_sse): New.
	(save_and_restore_vector_avx): Likewise.
	(check_avx): Likewise.
	(save_and_restore_vector): Likewise.

	* sysdeps/x86_64/elf/configure.in: Set libc_cv_cc_avx and
	HAVE_AVX_SUPPORT.
	* sysdeps/x86_64/elf/configure: Regenerated.

	* sysdeps/x86_64/link-defines.sym: New.

Attachment: libc-avx-3.patch
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]