This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 0/2] Multiarch hooks for memcpy variants
On Wed, Aug 16, 2017 at 8:28 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Zack Weinberg wrote:
>>
>> Last time we had this argument, someone (Ondrej?) claimed that the
>> overhead of going through an ifunc for intra-libc calls (specifically
>> to memcpy, IIRC) was dwarfed by the I-cache costs of having both the
>> generic and the targeted version of the function get used. I would
>> really like to see measurements addressing that specific point.
>
> I think it might be more easily measured if we make the effect much worse,
> for example by adding several KB of NOPs at entry of generic memcpy.
I think this needs to be an A/B test of the real code, before and after
the real proposed change (i.e. sending intra-libc calls to memcpy
through the PLT and the ifuncs), in order to resolve the argument to
everyone's satisfaction. `perf`, looking specifically at cache misses
at all levels, ought to be able to pick out the signal even without
an artificial penalty.
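
As a rough sketch of how such an A/B comparison could be tallied, assuming
each build is run under `perf stat -x,` (CSV output) with the counters
written to a file: the helper below reads the two files and reports the
relative change per event. The function names, file names, and the choice
of events are illustrative assumptions, not anything from the proposed
patch, and which cache events are available depends on the hardware.

```python
import csv
import io
import sys

# Assumed collection step (hypothetical file and workload names):
#   perf stat -x, -o before.csv -e L1-icache-load-misses,LLC-load-misses ./workload
#   perf stat -x, -o after.csv  -e L1-icache-load-misses,LLC-load-misses ./workload


def parse_perf_csv(text):
    """Parse `perf stat -x,` output into a dict of {event name: count}.

    Assumes the plain layout (no -I interval or per-CPU columns), where the
    first three fields are: counter value, unit, event name.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # blank lines and comment lines
        try:
            value = float(row[0])
        except ValueError:
            continue  # e.g. "<not supported>" or "<not counted>"
        counts[row[2]] = value
    return counts


def relative_change(baseline, patched):
    """Return (patched - baseline) / baseline for events present in both."""
    return {
        event: (patched[event] - baseline[event]) / baseline[event]
        for event in baseline
        if event in patched and baseline[event] != 0
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        before = parse_perf_csv(f.read())
    with open(sys.argv[2]) as f:
        after = parse_perf_csv(f.read())
    for event, change in sorted(relative_change(before, after).items()):
        print(f"{event}: {change:+.2%}")
```

A single pair of runs will be noisy, so in practice each side would need
to be repeated enough times to tell a real difference from run-to-run
variation.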
> I could easily generate a trace of internal calls to memcpy, however the key
> question is which functions in GLIBC use memcpy in performance critical
> ways and which applications make heavy use of those?
I don't know. Maybe start with whole-program tests on big complicated
applications like Firefox and LibreOffice? Web and database servers
might also be interesting.
zw