This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 0/2] Multiarch hooks for memcpy variants
On Monday 14 August 2017 04:06 PM, Wilco Dijkstra wrote:
> I don't believe you can resolve this generally, it's highly dependent on the details.
> If the generic implementation is very efficient, the possible gain of specialized
> ifuncs may be so low that it can never offset the overhead of an ifunc. Also note
> that you're always slowing down the generic case, so if that version is used in
> many cases, an ifunc wouldn't make sense.
>
> I haven't looked in detail at memcpy use in GLIBC, however if the statistics are
> similar to typical use I measured then it makes no sense to use ifuncs. Large
> copies can benefit from special tweaks, and in that case the overhead of an ifunc
> would be much smaller (both relatively and absolutely due to lower frequency), so
> that's where an ifunc might be useful.
The first part is not true for falkor since its implementation is a good
10-15% faster on the falkor chip due to its design differences. glibc
makes pretty extensive use of memcpy throughout, but I don't have data
on how much difference a core-specific memcpy will make there, so I
don't have enough grounds for a generic change.
Your last point about hurting everything else is very valid though; it's
very likely that adding an extra indirection in cases where
__memcpy_generic is going to be called anyway is going to be expensive
given that a bulk of the memcpy calls will be for small sizes of less
than 1k.
Allowing a PLT only for __memcpy_chk and mempcpy would need a test case
waiver in check_localplt and that would become a blanket OK for PLT
usage for memcpy, which we don't want. Hence my patch is probably the
best compromise, especially since there is precedent for the approach in
x86.
Siddhesh