This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Tue, 28 Jul 2015 13:55:54 -0700
- Subject: Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- References: <CAMe9rOoXLPUr_LUexoRKjrCdNhP0J8EMY+1XNAaLnpW1qknb7w at mail dot gmail dot com> <20150709142827 dot GA18030 at domone> <CAMe9rOoXCwiPdQVP7_tV7599f6y9w_n1P+SXsE7urb69f3v7gA at mail dot gmail dot com> <20150711104654 dot GA26570 at domone> <20150711202742 dot GA9074 at gmail dot com> <20150711235002 dot GA7543 at gmail dot com> <20150726131622 dot GA10623 at domone> <CAMe9rOre_GQimKou2PXjp95xcfN1jYO5-tkEAB7eMbP1HMO+FQ at mail dot gmail dot com> <20150727101015 dot GA489 at domone> <CAMe9rOrpDk6ixUJ+9RU5L0aV=uLUJzzNtJc-XuPURTFHhXzGRw at mail dot gmail dot com> <20150727132623 dot GA13448 at domone> <CAMe9rOpwaByc1ogE2Y4fJzc_hwNo5+B23F7T1Qdsv2qKJy8DcQ at mail dot gmail dot com>
On Mon, Jul 27, 2015 at 6:37 AM, H.J. Lu <email@example.com> wrote:
> On Mon, Jul 27, 2015 at 6:26 AM, Ondřej Bílka <firstname.lastname@example.org> wrote:
>> On Mon, Jul 27, 2015 at 06:14:07AM -0700, H.J. Lu wrote:
>>> >> There is a potential performance issue. This won't change parameters
>>> >> passed in 256-bit/512-bit vector registers because SSE load will only
>>> >> update the lower 128 bits of 256-bit/512-bit vector registers while
>>> >> preserving the upper bits. But these SSE load operations may not be
>>> >> fast on all current and future processors. To load the entire
>>> >> 256-bit/512-bit vector registers, we need to check CPU features in
>>> >> each symbol lookup. On the other hand, we can compile x86-64 ld.so
>>> >> with -msse2. I don't know what the final performance impact is.
>>> > Yes, these should be saved due to problems with modes. There could be
>>> > a problem that saving these takes longer. You don't need to
>>> > check CPU features on each call.
>>> > Make _dl_runtime_resolve a function pointer and on
>>> > startup initialize it to the correct variant.
>>> One more indirect call.
>> no, my proposal is different, we could do this:
>> void *_dl_runtime_resolve;
>> int startup()
>> {
>>   if (has_avx())
>>     _dl_runtime_resolve = _dl_runtime_resolve_avx;
>>   else
>>     _dl_runtime_resolve = _dl_runtime_resolve_sse;
>> }
>> Then we will assign the correct variant.
> Yes, this may work for both _dl_runtime_profile and
> _dl_runtime_resolve. I will see what I can do.
Please try the hjl/pr18661 branch. I implemented:
0000000000016fd0 t _dl_runtime_profile_avx
0000000000016b50 t _dl_runtime_profile_avx512
0000000000017450 t _dl_runtime_profile_sse
00000000000168d0 t _dl_runtime_resolve_avx
0000000000016780 t _dl_runtime_resolve_avx512
0000000000016a20 t _dl_runtime_resolve_sse
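
For readers following the thread, the startup-time selection among the three variants listed above can be sketched roughly as below. This is only an illustration, not the code on the branch. The feature bits, the resolver_fn type, select_resolver, startup and the resolve_* functions are hypothetical stand-ins; the real entry points are assembly trampolines, and ld.so has its own CPU feature data. The point is that the feature check runs once, and every later lookup goes through the already-chosen pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; the real ld.so consults its own CPU
   feature data rather than a mask like this.  */
enum { FEATURE_AVX = 1u << 0, FEATURE_AVX512F = 1u << 1 };

/* Hypothetical signature for a resolver variant.  The real trampolines
   are assembly; these C stand-ins only identify which one was chosen.  */
typedef const char *(*resolver_fn) (void);

static const char *resolve_sse (void)    { return "sse"; }
static const char *resolve_avx (void)    { return "avx"; }
static const char *resolve_avx512 (void) { return "avx512"; }

/* Chosen once at startup and used for every later lazy binding.  */
static resolver_fn runtime_resolve;

/* Pick the widest variant that the given feature mask allows.  */
static resolver_fn
select_resolver (unsigned int features)
{
  if (features & FEATURE_AVX512F)
    return resolve_avx512;
  if (features & FEATURE_AVX)
    return resolve_avx;
  return resolve_sse;
}

/* Run once during startup; no per-call feature check is needed
   afterwards.  */
static void
startup (unsigned int features)
{
  runtime_resolve = select_resolver (features);
  assert (runtime_resolve != NULL);
}
```

With this sketch, startup (FEATURE_AVX) leaves runtime_resolve pointing at resolve_avx, and a mask of 0 falls back to resolve_sse. As far as I understand, on x86-64 PLT0 already jumps through a GOT slot that is filled in at startup, so storing the address of a different variant there should not add another level of indirection.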