This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH 0/2] Multiarch hooks for memcpy variants

From: Zack Weinberg <zackw at panix dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: Siddhesh Poyarekar <siddhesh at gotplt dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
Date: Wed, 16 Aug 2017 21:09:54 -0400
Subject: Re: [PATCH 0/2] Multiarch hooks for memcpy variants
Authentication-results: sourceware.org; auth=none
References: <DB6PR0801MB20534ED1010DDF1B033821EE83890@DB6PR0801MB2053.eurprd08.prod.outlook.com> <18d2fdf8-ca55-1ded-fa66-3509b3bcf8fe@gotplt.org> <598DF02B.8010607@arm.com> <CAKCAbMg27DXDe=5vCCtBAW-g5BUkHKPb=_VTV7kr6cq_U91-Cg@mail.gmail.com> <4072a19f-eecb-8cdd-889f-46b4c8b968b4@gotplt.org> <CAKCAbMh8=u27ZcS9La4SdQ3UiHi76TZdv_KSCpX0pkY8WMohOQ@mail.gmail.com> <DB6PR0801MB20538D64F211A965ED3E806D838C0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <CAKCAbMhyg=WMgQet2EkeAw7m-FM=hWXqU8DcaU9=Bv19dnjS+Q@mail.gmail.com> <DB6PR0801MB2053D79838EE5B0FD134AB9483820@DB6PR0801MB2053.eurprd08.prod.outlook.com>

On Wed, Aug 16, 2017 at 8:28 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Zack Weinberg wrote:
>>
>> Last time we had this argument, someone (Ondrej?) claimed that the
>> overhead of going through an ifunc for intra-libc calls (specifically
>> to memcpy, IIRC) was dwarfed by the I-cache costs of having both the
>> generic and the targeted version of the function get used. I would
>> really like to see measurements addressing that specific point.
>
> I think it might be more easily measured if we make the effect much worse,
> for example by adding several KB of NOPs at entry of generic memcpy.

I think this needs to be an A/B test of the real code before and after
the real proposed change (i.e. sending intra-libc calls to memcpy
through the PLT and the ifuncs) in order to resolve the argument to
everyone's satisfaction.  `perf`, measuring cache misses at every
level of the hierarchy, ought to be able to pick out the signal even
without an artificial penalty.

> I could easily generate a trace of internal calls to memcpy, however the key
> question is which functions in GLIBC use memcpy in performance critical
> ways and which applications make heavy use of those?

I don't know.  Maybe start with whole-program tests on big complicated
applications like Firefox and LibreOffice?  Web and database servers
might also be interesting.

zw
