Doing more inside an ifunc (Was Re: Document use of IFUNC support outside of libc.)

Fri Apr 15 23:51:00 GMT 2016

> On Apr 15, 2016, at 8:10 AM, Siddhesh Poyarekar <sid@reserved-bit.com> wrote:
> 
>> On Mon, Mar 07, 2016 at 05:33:24PM +0000, Szabolcs Nagy wrote:
>> there seems to be interest in optimizations/dispatch based
>> on the micro architecture which is not easily available in
>> userspace currently (on aarch64).
> 
> Sorry, I was interested in this conversation but completely missed it,
> so starting it again.  I hope it's not too late :)
> 
>> linux exports various cpu info in /sys but that is not
>> stable abi and users probably don't want large number of
>> syscalls traversing the /sys tree at process startup just
>> to get slightly better tuned memcpy or similar.
>> 
>> one idea by Adhemerval Zanella was to use vdso for this.
>> (the kernel can provide a versioned function symbol there
>> to return a pointer to some cpu info struct, which can be
>> read only thus shared across processes).
>> there is no proposed design for this yet either on kernel
>> or libc side, but it would make sense if ifunc could use it.
>> 
>> currently the only reliable mechanisms for ifunc dispatch
>> are hwcap feature bits (if passed as argument) or cpuid
>> like instruction (e.g. on aarch64 cpuid like instructions
>> are not available to userspace, but can be emulated by the
>> kernel or provided as syscall, in either case it would be
>> context switch into the kernel, which can be bad if large
>> number of ifunc resolvers do it e.g. because function multi-
>> versioning is implemented that way, unless there is some
>> caching mechanism which is also not easy to do in ifunc...)
> 
> The context switch is not the worst thing that can happen for the
> emulated instructions because we can easily cache the result and
> reduce the number of context switches to a minimum.  The difficult bit
> for the emulated instruction (MRS) is heterogeneous systems, where it
> would be difficult (impossible?) for userspace to just use the
> emulated instruction to deterministically identify all of the
> processor cores.
> 
> So the emulated instruction will only work for specific processor
> cores that are known to always be in a homogeneous configuration and
> never otherwise.  For anything else, we will need the kernel to give
> us full information about all of the cores in another way, either via
> sysfs or vdso.  The sysfs route has been proposed earlier[1] but is
> hairy for us because it traverses the filesystem to identify all CPU
> cores, resulting in a proportional number of syscalls.  The vdso
> alternative is better because the kernel can then give us all of the
> information in exactly one call and avoid the context switch at the
> same time.
> 
> I had hacked up a patch to test using the sysfs patches in [1] and it
> required reimplementing some string functions to avoid referencing
> them but that was about the only thing needed to get it working.
> Safety however is a completely different issue and I don't know if we
> can even guarantee that during symbol resolution.

I proposed an alternative to this approach: passing the MIDR via the aux vector. It is still useful on heterogeneous systems, because we can change the kernel to return "unknown" for the known MIDR values that are used in big.LITTLE configurations. I don't have a link to my implementation right now, as I am traveling. With the blacklisting done inside the kernel, this is much safer and easier, and the aux vector is basically free: no open/read/close from an ifunc or during early startup either.
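
As a rough sketch of how a consumer of such an aux vector entry might look (this is not the implementation mentioned above): the entry name AT_MIDR_HYPOTHETICAL and its number are placeholders, since no such aux vector type exists in the kernel today, and the enum and function names are made up for illustration. The sketch assumes the kernel would report 0 for "unknown" and relies only on the architectural MIDR_EL1 layout (implementer in bits 31:24, part number in bits 15:4).

```c
#include <sys/auxv.h>

/* HYPOTHETICAL: no AT_MIDR entry exists in the Linux aux vector.
   The number is a placeholder for illustration only.  */
#define AT_MIDR_HYPOTHETICAL 0x4d49

/* Hypothetical set of tuned variants a resolver could choose from.  */
enum copy_variant { COPY_GENERIC, COPY_CORTEX_A53, COPY_CORTEX_A57 };

/* MIDR_EL1 layout: implementer [31:24], part number [15:4].  */
static unsigned int midr_implementer(unsigned long midr)
{
    return (midr >> 24) & 0xff;
}

static unsigned int midr_partnum(unsigned long midr)
{
    return (midr >> 4) & 0xfff;
}

/* Pick a variant from a MIDR value.  A value of 0 is assumed to mean
   "unknown", e.g. because the kernel hides the MIDR on big.LITTLE.  */
enum copy_variant select_copy_variant(unsigned long midr)
{
    if (midr == 0 || midr_implementer(midr) != 0x41 /* ARM Ltd.  */)
        return COPY_GENERIC;

    switch (midr_partnum(midr)) {
    case 0xd03: return COPY_CORTEX_A53;
    case 0xd07: return COPY_CORTEX_A57;
    default:    return COPY_GENERIC;
    }
}

/* getauxval returns 0 for a type that is not in the aux vector, so on
   current kernels this falls back to the generic variant.  */
enum copy_variant select_copy_variant_from_auxv(void)
{
    return select_copy_variant(getauxval(AT_MIDR_HYPOTHETICAL));
}
```

Keeping the decision in a function that takes the MIDR value as an argument means no file access is needed. Whether getauxval itself is safe to call from a resolver is a separate question; passing the value to the resolver as an argument, as is done for hwcap, would avoid it.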

Thanks,
Andrew

> 
> Siddhesh
> 
> [1] https://lkml.org/lkml/2015/9/16/452