This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: RFC: x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]

From: Florian Weimer <fweimer at redhat dot com>
To: "H.J. Lu" <hjl dot tools at gmail dot com>, Carlos O'Donell <carlos at redhat dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Fri, 20 Oct 2017 15:21:24 +0200
Subject: Re: RFC: x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
References: <20171019174144.GA26576@gmail.com> <333f8427-2fce-f32a-604e-7e635df4ebf2@redhat.com> <CAMe9rOrNDJAW6equMd5G-1NEMawSGfPCj5D14a7Z_Q8yNjttFw@mail.gmail.com> <52541476-9a52-a82d-e058-6ddc6404f037@redhat.com> <CAMe9rOqsE=eiUKAbEYeVJ7KXgPYk=eU87bjD1OAqFy_iUOSBvA@mail.gmail.com> <8ba10843-d6e0-b55a-4f86-fcebc4356e59@redhat.com>

On 10/20/2017 02:58 PM, Florian Weimer wrote:

On 10/20/2017 01:09 PM, H.J. Lu wrote:
When there are many DSOs, it takes more time to lookup a symbol
and time to save/restore vector registers becomes noise.   The only
case when time to save/restore vector registers becomes non-trivial is

1. There are a few DSOs so that symbol lookup takes fewer cycles.  And
2. There are many external function calls which are executed only once.  And
3. These external functions take very few cycles.

I can create such a testcase.  But I don't think it is a typical case.

Completely agree. Basically, a program which is affected would have to (a) call many functions, (b) with short symbol lookup chains, and (c) do very little actual work. This seems to be a very unlikely scenario.

I have a test case. GCC scales poorly with many function calls, and it is difficult to get --export-dynamic to work with recent GCC/binutils. I will try to run it on various machines.


LD_DEBUG=statistics shows this:

      9506:
      9506:     runtime linker statistics:
      9506:       total startup time in dynamic loader: 19960074 cycles
      9506:                 time needed for relocation: 19105814 cycles (95.7%)
      9506:                      number of relocations: 87
      9506:           number of relocations from cache: 3
      9506:             number of relative relocations: 1226
      9506:                time needed to load objects: 701382 cycles (3.5%)
      9506:
      9506:     runtime linker statistics:
      9506:                final number of relocations: 781589
      9506:     final number of relocations from cache: 3
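
As a quick sanity check on the percentages in that output, here is a minimal Python sketch. The cycle counts are copied from the statistics above; the variable names and the `share` helper are my own and not part of glibc.

```python
# Cycle counts copied from the LD_DEBUG=statistics output above.
total_startup = 19_960_074   # total startup time in dynamic loader
relocation = 19_105_814      # time needed for relocation
load_objects = 701_382       # time needed to load objects


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


relocation_share = share(relocation, total_startup)
load_share = share(load_objects, total_startup)
print(relocation_share, load_share)
```

Both values should agree with the percentages the dynamic loader printed, which confirms that startup is dominated by relocation processing.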

This is a main program which contains 1,500 function calls. The functions are defined in a single DSO, and each function calls 520 other functions, giving a total of 781,500 relocations from the test.
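
The relocation count can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming one relocation per distinct call target; the variable names are illustrative.

```python
# Figures from the description of the test program.
calls_from_main = 1500       # function calls in the main program
calls_per_function = 520     # other functions called by each of those

# One relocation per call from main, plus one per call made by each function.
test_relocations = calls_from_main + calls_from_main * calls_per_function

# "final number of relocations" as reported by LD_DEBUG=statistics.
reported_final = 781_589
outside_test = reported_final - test_relocations
print(test_relocations, outside_test)
```

The small remainder is the part of the reported final count that does not come from the generated calls.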

On my laptop (Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz), I get this (ten runs, real time measured in seconds):


> t.test(prev_laptop, after_laptop)

	Welch Two Sample t-test

data:  prev_laptop and after_laptop
t = -14.932, df = 18, p-value = 1.392e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.05749145 -0.04330855
sample estimates:
mean of x mean of y
   0.2345    0.2849
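
The raw timings are not included here, but the reported summary numbers can be cross-checked against each other. The sketch below (plain Python, names are my own) recovers the standard error from the t statistic and then the critical value implied by the confidence interval. It assumes only the figures printed by R above, which are rounded, so the results are approximate.

```python
# Summary values copied from the R output above.
mean_prev = 0.2345
mean_after = 0.2849
t_stat = -14.932
ci_low, ci_high = -0.05749145, -0.04330855

# Difference in means (x minus y, as R reports it).
diff = mean_prev - mean_after

# t = diff / standard_error, so the standard error follows directly.
std_err = diff / t_stat

# The interval is diff +/- t_crit * std_err; recover t_crit from its half-width.
half_width = (ci_high - ci_low) / 2
implied_t_crit = half_width / std_err

ci_center = (ci_low + ci_high) / 2
print(diff, std_err, implied_t_crit, ci_center)
```

The implied critical value should come out close to 2.101, the two-sided 95% quantile of the t distribution with 18 degrees of freedom, and the interval should be centred on the difference in means.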

So it's definitely not in the noise. The penalty appears to be around 65 ns per relocation.
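
The per-relocation figure is just the difference in mean run time divided by the number of relocations from the test. A minimal Python sketch of that arithmetic follows; scaling the confidence interval the same way is my own addition and only gives a rough range.

```python
# Means and confidence interval from the t-test output, in seconds.
mean_prev = 0.2345
mean_after = 0.2849
ci_abs = (0.04330855, 0.05749145)   # magnitude of the 95% interval bounds

relocations = 781_500               # relocations from the test program

# Convert seconds per relocation to nanoseconds.
penalty_ns = (mean_after - mean_prev) / relocations * 1e9
penalty_range_ns = tuple(bound / relocations * 1e9 for bound in ci_abs)
print(round(penalty_ns, 1), [round(v, 1) for v in penalty_range_ns])
```

This gives roughly 64.5 ns per relocation, with the scaled interval running from about 55 ns to about 74 ns.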

I said in the past that we should use XSAVE in the trampoline, so that we do not have to touch the dynamic linker for each new CPU generation, and I think that alone is worth the slight additional cost.


I'll check a few additional machines over the coming hours.

Note that XSAVE will still not allow us to support *arbitrary* calling conventions, so we shouldn't advertise it as such. But hopefully, it will be sufficient to get the ABI-violating binaries mentioned in bug 21265 back into working order.


Thanks,
Florian

