This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
[RFC] A method for forcing IFUNC selector
- From: Paul Pluzhnikov <ppluzhnikov at gmail dot com>
- To: GLIBC Devel <libc-alpha at sourceware dot org>
- Cc: Ondrej Bilka <neleai at seznam dot cz>, Brooks Moses <bmoses at google dot com>
- Date: Thu, 6 Nov 2014 12:12:04 -0800
- Subject: [RFC] A method for forcing IFUNC selector
Greetings,
This commit:
    commit 2d48b41c8fa610067c4d664ac2339ae6ca43e78c
    Author: Ondrej Bilka <neleai@seznam.cz>
    Date: Mon May 20 08:20:00 2013 +0200

        Faster memcpy on x64.

        We add new memcpy version that uses unaligned loads which are fast
        on modern processors. This allows second improvement which is avoiding
        computed jump which is relatively expensive operation.

        Tests available here:
        http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
changed the default memcpy selected on all of our processors from
__memcpy_ssse3_back to __memcpy_sse2_unaligned.
That caused a nice 2-3% improvement on some of our benchmarks (thanks!),
but also a 10-15% degradation on others (boo!).
It appears that for certain sizes and alignments, the new memcpy can be
50% slower than the old one.
While we figure out how to re-tune our applications to get rid of the
"slow" size/alignment memcpy()s, we'd like to keep the applications that
suffer the degradation running on the old memcpy.
Unfortunately, glibc currently provides no way to do that [1].
Proposal: a new environment variable, say LD_IFUNC_SELECTOR, that would
contain a semicolon-separated list of ifunc->implementation mappings that
the end user wants to force. For example, for our degraded applications we
would set LD_IFUNC_SELECTOR to "memcpy=__memcpy_ssse3_back", while someone
who also wanted to force strcmp to __strcmp_sse42 would set it to
"memcpy=__memcpy_ssse3_back;strcmp=__strcmp_sse42".
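To make the proposed format concrete, here is a minimal user-space sketch of how such a value might be parsed and applied. It is an illustration under stated assumptions, not a glibc implementation: LD_IFUNC_SELECTOR is only proposed here and does not exist in glibc, the helpers ifunc_override and select_memcpy and the two stand-in memcpy variants are hypothetical, and real support would have to run inside the dynamic loader during IFUNC resolution, under constraints this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a glibc API): find the implementation forced for
   IFUNC_NAME in a spec such as
   "memcpy=__memcpy_ssse3_back;strcmp=__strcmp_sse42".
   On a match, copy the implementation name into OUT (size OUT_LEN) and
   return 1; otherwise return 0.  Entries without '=' are skipped, and the
   first matching entry wins.  */
static int
ifunc_override (const char *spec, const char *ifunc_name,
                char *out, size_t out_len)
{
  if (spec == NULL || ifunc_name == NULL || out_len == 0)
    return 0;
  size_t name_len = strlen (ifunc_name);
  const char *p = spec;
  while (*p != '\0')
    {
      /* Each entry ends at the next ';' or at the end of the string.  */
      const char *end = strchr (p, ';');
      if (end == NULL)
        end = p + strlen (p);
      const char *eq = memchr (p, '=', (size_t) (end - p));
      if (eq != NULL && (size_t) (eq - p) == name_len
          && strncmp (p, ifunc_name, name_len) == 0)
        {
          size_t impl_len = (size_t) (end - (eq + 1));
          if (impl_len == 0 || impl_len >= out_len)
            return 0;  /* Empty or over-long name: no usable override.  */
          memcpy (out, eq + 1, impl_len);
          out[impl_len] = '\0';
          return 1;
        }
      p = (*end == ';') ? end + 1 : end;
    }
  return 0;
}

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-ins for the real variants; both just forward to
   memcpy so the sketch stays self-contained.  */
static void *
sse2_unaligned (void *d, const void *s, size_t n) { return memcpy (d, s, n); }
static void *
ssse3_back (void *d, const void *s, size_t n) { return memcpy (d, s, n); }

static const struct { const char *name; memcpy_fn fn; } memcpy_impls[] = {
  { "__memcpy_sse2_unaligned", sse2_unaligned },
  { "__memcpy_ssse3_back", ssse3_back },
};

/* Return the forced variant if SPEC names a known one; otherwise return
   DEFAULT_FN, i.e. whatever the CPU-feature-based selector would pick.  */
static memcpy_fn
select_memcpy (const char *spec, memcpy_fn default_fn)
{
  char want[64];
  if (!ifunc_override (spec, "memcpy", want, sizeof want))
    return default_fn;
  for (size_t i = 0; i < sizeof memcpy_impls / sizeof memcpy_impls[0]; i++)
    if (strcmp (memcpy_impls[i].name, want) == 0)
      return memcpy_impls[i].fn;
  return default_fn;  /* Unknown name: ignore the override.  */
}
```

In this sketch, select_memcpy(getenv("LD_IFUNC_SELECTOR"), sse2_unaligned) would yield ssse3_back when the variable is "memcpy=__memcpy_ssse3_back". Falling back to the default for unknown or malformed entries is one possible design choice; a real implementation might instead warn, or refuse overrides that name a variant the CPU cannot execute.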
This mechanism could also simplify debugging when one particular
implementation appears to be broken (as has happened in the past).
If this is an acceptable approach, I will send a patch to implement it.
Thanks,
[1] Or rather, I have not found a way to do that without a gross hack.
If there is a way, I am all ears.
--
Paul Pluzhnikov