This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: Doing more inside an ifunc (Was Re: Document use of IFUNC support outside of libc.)
- From: pinskia at gmail dot com
- To: Siddhesh Poyarekar <sid at reserved-bit dot com>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, nd at arm dot com, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>
- Date: Fri, 15 Apr 2016 15:09:38 -0700
- Subject: Re: Doing more inside an ifunc (Was Re: Document use of IFUNC support outside of libc.)
- Authentication-results: sourceware.org; auth=none
- References: <56D8A849 dot 1020109 at redhat dot com> <56D9CBCB dot 2060207 at arm dot com> <56DA02C5 dot 7030208 at redhat dot com> <56DDBB64 dot 9050800 at arm dot com> <20160415151020 dot GA4831 at devel dot intra dot reserved-bit dot com>
> On Apr 15, 2016, at 8:10 AM, Siddhesh Poyarekar <email@example.com> wrote:
>> On Mon, Mar 07, 2016 at 05:33:24PM +0000, Szabolcs Nagy wrote:
>> there seems to be interest in optimizations/dispatch based
>> on the micro architecture which is not easily available in
>> userspace currently (on aarch64).
> Sorry, I was interested in this conversation but completely missed it,
> so starting it again. I hope it's not too late :)
>> linux exports various cpu info in /sys but that is not
>> stable abi and users probably don't want large number of
>> syscalls traversing the /sys tree at process startup just
>> to get slightly better tuned memcpy or similar.
>> one idea by Adhemerval Zanella was to use vdso for this.
>> (the kernel can provide a versioned function symbol there
>> to return a pointer to some cpu info struct, which can be
>> read only thus shared across processes).
>> there is no proposed design for this yet either on kernel
>> or libc side, but it would make sense if ifunc could use it.
>> currently the only reliable mechanisms for ifunc dispatch
>> are hwcap feature bits (if passed as argument) or cpuid
>> like instruction (e.g. on aarch64 cpuid like instructions
>> are not available to userspace, but can be emulated by the
>> kernel or provided as syscall, in either case it would be
>> context switch into the kernel, which can be bad if large
>> number of ifunc resolvers do it e.g. because function multi-
>> versioning is implemented that way, unless there is some
>> caching mechanism which is also not easy to do in ifunc...)
> The context switch is not the worst thing that can happen for the
> emulated instructions because we can easily cache the result and
> reduce the number of context switches to a minimum. The difficult bit
> for the emulated instruction (MRS) is heterogeneous systems, where it
> would be difficult (impossible?) for userspace to just use the
> emulated instruction to deterministically identify all of the
> processor cores.
> So the emulated instruction will only work for specific processor
> cores that are known to always be in a homogeneous configuration and
> never otherwise. For anything else, we will need the kernel to give
> us full information about all of the cores in another way, either via
> sysfs or vdso. The sysfs route has been proposed earlier but is
> hairy for us because it traverses the filesystem to identify all CPU
> cores, resulting in a proportional number of syscalls. The vdso
> alternative is better because the kernel can then give us all of the
> information in exactly one call and avoid the context switch at the
> same time.
> I had hacked up a patch to test using the sysfs patches in [1] and it
> required reimplementing some string functions to avoid referencing
> them but that was about the only thing needed to get it working.
> Safety however is a completely different issue and I don't know if we
> can even guarantee that during symbol resolution.
I proposed an alternative to this approach: passing the MIDR via the aux vector. It is still useful, and we can change the kernel to return "unknown" for the MIDR values that are known to be used in big.LITTLE systems. I don't have a link to my implementation right now, as I am traveling. It is much safer and easier to do the blacklisting inside the kernel, and the aux vector is basically free: no open/read/close from an ifunc resolver or during early startup either.
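To make the aux vector idea concrete, here is a rough sketch of what the userspace side might look like. Everything about the aux vector entry is an assumption: there is no AT_MIDR tag in mainline Linux, so AT_MIDR_HYPOTHETICAL and its value are placeholders, and the variant names are made up for illustration. The only real pieces are getauxval() from <sys/auxv.h> and the MIDR_EL1 field layout. A value of 0 is treated as "unknown", which covers both a missing entry and a kernel that blacklists big.LITTLE parts.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), available in glibc 2.16 and later */

/* HYPOTHETICAL: mainline Linux defines no aux vector entry for MIDR.
   The tag name and value are placeholders for illustration only. */
#define AT_MIDR_HYPOTHETICAL 0x7fffUL

/* MIDR_EL1 field layout: implementer [31:24], variant [23:20],
   architecture [19:16], part number [15:4], revision [3:0]. */
static inline unsigned midr_implementer(uint64_t midr) { return (midr >> 24) & 0xff; }
static inline unsigned midr_variant(uint64_t midr)     { return (midr >> 20) & 0xf; }
static inline unsigned midr_partnum(uint64_t midr)     { return (midr >> 4) & 0xfff; }
static inline unsigned midr_revision(uint64_t midr)    { return midr & 0xf; }

/* Hypothetical set of implementations a resolver could choose from. */
enum memcpy_variant { MEMCPY_GENERIC, MEMCPY_TUNED_A57 };

/* Pick a variant from a MIDR value.  0 means "unknown" (entry absent,
   or the kernel declined to report a single core type), so fall back
   to the generic code. */
static inline enum memcpy_variant select_memcpy_variant(uint64_t midr)
{
    if (midr == 0)
        return MEMCPY_GENERIC;
    /* Implementer 0x41 is ARM Ltd; part number 0xd07 is Cortex-A57. */
    if (midr_implementer(midr) == 0x41 && midr_partnum(midr) == 0xd07)
        return MEMCPY_TUNED_A57;
    return MEMCPY_GENERIC;
}

/* getauxval() returns 0 when the requested entry is not present, which
   maps directly onto the "unknown" case above.  No file I/O is needed. */
static inline uint64_t read_midr_from_auxv(void)
{
    return getauxval(AT_MIDR_HYPOTHETICAL);
}
```

A caller would do select_memcpy_variant(read_midr_from_auxv()). Whether getauxval() itself is safe to call from inside an ifunc resolver is part of the open question in this thread; on aarch64 the resolver is handed hwcap as an argument precisely to avoid that kind of call, so a real design would probably pass the value in the same way.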
> [1] https://lkml.org/lkml/2015/9/16/452