This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: Andreas Schwab <schwab at linux-m68k dot org>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Sun, 9 Aug 2015 10:56:02 -0700
- Subject: Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.
- References: <20150809013434 dot 0B16814B9A at panix1 dot panix dot com> <m28u9lotfk dot fsf at linux-m68k dot org> <55C76FCD dot 5020607 at panix dot com> <CAMe9rOoAWjRma_mG_FazVh3FGOyiGJ=g82=bsfGqa-COnt5p1g at mail dot gmail dot com> <55C78525 dot 40402 at panix dot com>
On Sun, Aug 9, 2015 at 9:51 AM, Zack Weinberg <zackw@panix.com> wrote:
> On 08/09/2015 11:39 AM, H.J. Lu wrote:
>> On Sun, Aug 9, 2015 at 8:20 AM, Zack Weinberg <zackw@panix.com> wrote:
>>> On further investigation it appears not to -- specifically, internal
>>> calls using __GI_foo appear to go straight to the default implementation
>>> of 'foo'.
>>>
>>> If so, I am inclined to think that that is a bug -- there are a *lot* of
>>> internal calls to memset and memcpy in libc, and they should not miss out on
>>> architectural tuning. I don't particularly understand how IFUNC works,
>>> but wouldn't it be sufficient to send internal calls to anything with an
>>> IFUNC through the PLT? (I suppose there would then be a question of
>>> whether the architectural optimizations made up for the PLT overhead.)
>>
>> Here is a description of IFUNC:
>>
>> https://sites.google.com/site/x32abi/documents/ifunc.txt?attredirects=0&d=1
>
> Thanks, that clarifies what IFUNC _does_, but it doesn't help me
> understand how it interacts with the libc_hidden_* optimization. I see
> in the code that e.g. __GI_memset is pointed directly at __memset_sse2
> (for amd64) but I do not understand whether that is a limitation of the
> current implementation, a deliberate choice to avoid indirection at
> the cost of missing out on AVX2 tuning, or both. And if it is a
> limitation, I don't know what options we might have for lifting that
> limitation. I'm sure this was discussed when these patches originally
> landed, but it was long enough ago that I am having trouble finding that
> discussion in the mailing list archive.

Those comments were made when IFUNC was first implemented.
The IFUNC implementation has improved since then, and those
comments may no longer be true. But we would first have to verify
that the extra indirection through the PLT doesn't hurt performance
on most current processors.
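For readers following the thread, a minimal sketch of the trade-off under discussion may help. It assumes GCC or Clang targeting x86-64 ELF with glibc (for the `ifunc` attribute and `__builtin_cpu_supports`). Every name in it (`my_memset`, `my_memset_avx2`, `internal_memset`, and so on) is hypothetical and is not glibc's actual code. Calls to the exported `my_memset` go through the IFUNC resolver, which picks a variant once at load time, while `internal_memset`, standing in for `__GI_memset` as described above, is bound directly to the baseline variant and so never reaches the tuned one.

```c
#include <stddef.h>

/* All names here are hypothetical illustrations, not glibc internals. */

/* Counters recording which variant actually ran. */
static int generic_calls;
static int tuned_calls;

/* Shared byte-fill loop used by both variants (real tuning omitted). */
static void fill_bytes(unsigned char *p, int c, size_t n)
{
    while (n--)
        *p++ = (unsigned char)c;
}

/* Baseline variant, playing the role of __memset_sse2. */
static void *my_memset_generic(void *s, int c, size_t n)
{
    generic_calls++;
    fill_bytes(s, c, n);
    return s;
}

/* "Tuned" variant, playing the role of an AVX2 implementation. */
static void *my_memset_avx2(void *s, int c, size_t n)
{
    tuned_calls++;
    fill_bytes(s, c, n);
    return s;
}

typedef void *(*memset_fn)(void *, int, size_t);

/* IFUNC resolver: runs once, when the loader relocates my_memset. */
static memset_fn resolve_my_memset(void)
{
    /* Needed because resolvers can run before constructors. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? my_memset_avx2 : my_memset_generic;
}

/* Exported entry point: calls are routed (via the PLT) to the resolved variant. */
void *my_memset(void *s, int c, size_t n)
    __attribute__((ifunc("resolve_my_memset")));

/* Internal entry point, analogous to __GI_memset: bound directly to the
   baseline at build time, so it skips the resolver and the PLT and never
   uses the tuned variant. */
static void *internal_memset(void *s, int c, size_t n)
{
    return my_memset_generic(s, c, n);
}
```

Binding the internal entry point directly saves the indirect jump but fixes the choice at build time; routing internal calls through the IFUNC symbol instead would pick up the tuned variant at the cost of that extra indirection, which is the cost that would need to be measured.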
--
H.J.