This is the mail archive of the mailing list for the glibc project.
Re: [RFC] A method for forcing IFUNC selector
- From: Roland McGrath <roland at hack dot frob dot com>
- To: Paul Pluzhnikov <ppluzhnikov at gmail dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Ondrej Bilka <neleai at seznam dot cz>, Brooks Moses <bmoses at google dot com>
- Date: Thu, 6 Nov 2014 15:06:50 -0800 (PST)
- Subject: Re: [RFC] A method for forcing IFUNC selector
- Authentication-results: sourceware.org; auth=none
- References: <CALoOobMYNLsv6NSmXqwj7j4kCx1XaQU9m0VExFMrtb3SVKpNxg at mail dot gmail dot com> <20141106220528 dot BB51A2C3AC8 at topped-with-meat dot com> <CALoOobO2NrxWFBDkQDH-SZkcvfOmTg3KVQRGMhLWuF6JPo9DVA at mail dot gmail dot com>
> > Please contribute to benchtests so that they are more representative of a
> > variety of workloads.
> Extracting these is quite hard.
That's why I asked *you* to do it. ;-)
> It's trivial to show a synthetic case where new memcpy is 50% slower than
> the old one, but extracting a *real* memcpy trace showing 10% degradation
> is hard because the app is multithreaded, and executes billions of memcpy()s.
benchtests today is a small set of synthetic cases. Adding more synthetic
cases there sounds like an improvement to me.
As a long-term thing, we want not only to have tests representative of real
workloads but also to have mechanisms for arriving at such tests. Ideally,
whenever someone came along with an ill-served workload like the ones
you've identified, we would point them at a procedure for getting their
workload represented in our performance tests. So anything you and yours
can do to build infrastructure for collecting such traces (this seems
plausibly doable via LD_AUDIT, for example), turning traces into tests,
etc., would be a big contribution that will keep on giving.
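A minimal sketch of the LD_AUDIT idea, assuming x86_64 glibc and the rtld-audit interface described in rtld-audit(7). The module asks to see every symbol binding, hands back a wrapper whenever a symbol named memcpy is bound, and the wrapper counts call sizes in power-of-two buckets before forwarding to the address the dynamic linker resolved (presumably already past IFUNC selection). The file name, `traced_memcpy`, and `size_histogram` are hypothetical. Calls that never go through a dynamic binding, such as memcpy calls made inside libc itself, are not seen; depending on the glibc version, bindings done at load time (BIND_NOW) may also bypass `la_symbind64`. Dumping the histogram (for example from a destructor) is left out.

```c
/* memcpy_audit.c: hypothetical LD_AUDIT module counting memcpy sizes.
   Build: gcc -shared -fPIC -O2 -o memcpy_audit.so memcpy_audit.c
   Run:   LD_AUDIT=./memcpy_audit.so ./app                            */
#define _GNU_SOURCE
#include <link.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Bucket 0 holds n == 0; bucket b >= 1 holds sizes in [2^(b-1), 2^b). */
#define NBUCKETS 65
_Atomic unsigned long long size_histogram[NBUCKETS];

/* Address of the memcpy the dynamic linker bound; set before the
   wrapper's address is handed out. */
static _Atomic uintptr_t real_memcpy_addr;

static unsigned size_bucket(size_t n)
{
  return n == 0 ? 0 : 64 - (unsigned) __builtin_clzll((unsigned long long) n);
}

/* Wrapper returned to the program in place of memcpy. */
void *traced_memcpy(void *dst, const void *src, size_t n)
{
  /* Relaxed atomics: the app is multithreaded, but we only need counts. */
  atomic_fetch_add_explicit(&size_histogram[size_bucket(n)], 1,
                            memory_order_relaxed);
  memcpy_fn fn = (memcpy_fn) atomic_load_explicit(&real_memcpy_addr,
                                                  memory_order_relaxed);
  return fn(dst, src, n);
}

unsigned int la_version(unsigned int version)
{
  (void) version;
  return LAV_CURRENT;
}

/* Ask to be told about bindings to and from every loaded object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
  (void) map; (void) lmid; (void) cookie;
  return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Called when a symbol is bound; redirect memcpy, pass others through. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
                       uintptr_t *defcook, unsigned int *flags,
                       const char *symname)
{
  (void) ndx; (void) refcook; (void) defcook; (void) flags;
  if (strcmp(symname, "memcpy") == 0)
    {
      atomic_store_explicit(&real_memcpy_addr, (uintptr_t) sym->st_value,
                            memory_order_relaxed);
      return (uintptr_t) &traced_memcpy;
    }
  return sym->st_value;
}
```

Only a size histogram is kept rather than a per-call trace, since with billions of calls a full log would likely perturb the very workload being measured; the bucket counts could then seed a synthetic size distribution for a benchtests-style case.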
> In theory there is no difference between theory and practice, but in
> practice there is :-)
> In theory I may agree with you: it's bad to allow the user to specify
> internal memcpy symbol name.
> But in practice we need to run real applications, and these show real
I understand. In part, that's why I'm starting by decomposing the problem.
In theory, you would ideally do everything you want upstream first, and it
would solve your problems while helping everyone else too. But in
practice, we'll be conservative and slow about putting such pieces in place,
while you need to make progress for your users and will use local
modifications to get your job done. So, for example, we might take a long
time to figure out the tunables infrastructure we're willing to put in, but
you might just make local modifications that use new environment variables
without waiting for that to get hashed out. Meanwhile, for the next layer
down, which touches the IFUNC selector code itself and so on, you might be
able to develop and prove some code that we'd be entirely happy with and
whose details we could hash out sooner, even if there would be no way to get
the parameters into that code until we hash out a tunables infrastructure.
> Also note that specifying internal name maybe isn't so bad -- you can
> disclaim all warranties for any user who does that, and you can warn if
> e.g. internal name no longer exists, or is not available given the CPUID
> feature bits.
Right. This is the sort of compromise we've discussed (rather vaguely) in
the past for the tunables interface.
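To make the "force one selector by internal name" compromise concrete, here is a minimal sketch using plain function-pointer dispatch rather than glibc's actual IFUNC machinery. Every name in it (`select_memcpy`, the variant names, the `usable` predicates) is hypothetical. A caller-supplied name overrides the default for this one function only, and, as suggested above, an unknown name or a variant whose CPU requirements are not met produces a warning and the default instead of a crash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Stand-in variants; in glibc these would be the internal
   implementations the IFUNC selector chooses among. */
static void *memcpy_bytewise(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

static void *memcpy_system(void *dst, const void *src, size_t n)
{
  return memcpy(dst, src, n);
}

static bool always_usable(void) { return true; }
/* Pretends the CPU lacks whatever feature a variant needs. */
static bool never_usable(void) { return false; }

struct memcpy_impl
{
  const char *name;
  memcpy_fn fn;
  bool (*usable)(void);   /* would be a CPUID feature check in practice */
};

static const struct memcpy_impl impls[] = {
  { "system",   memcpy_system,   always_usable },   /* the default */
  { "bytewise", memcpy_bytewise, always_usable },
  { "wide",     memcpy_bytewise, never_usable },    /* hypothetical variant */
};

/* Return the default unless FORCED names a known, usable variant. */
static memcpy_fn select_memcpy(const char *forced)
{
  memcpy_fn fallback = impls[0].fn;
  if (forced == NULL || *forced == '\0')
    return fallback;
  for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++)
    {
      if (strcmp(impls[i].name, forced) != 0)
        continue;
      if (impls[i].usable())
        return impls[i].fn;
      fprintf(stderr, "warning: memcpy variant '%s' is not usable on this "
              "CPU; using default\n", forced);
      return fallback;
    }
  fprintf(stderr, "warning: unknown memcpy variant '%s'; using default\n",
          forced);
  return fallback;
}
```

A local modification might call it once at startup as `select_memcpy(getenv("MYAPP_MEMCPY_IMPL"))`, where `MYAPP_MEMCPY_IMPL` is a made-up variable, not one glibc reads. Falling back with a warning keeps a stale setting from breaking the application after the internal names change.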
> > As to the basic way in to do any kind of tweak like this, that is the
> > long-standing subject of "tunables".
> Sorry, I haven't seen that discussion, and don't understand how it would
> help me here.
See https://sourceware.org/glibc/wiki/TuningLibraryRuntimeBehavior for what
Carlos wrote up after the last major round of discussion. He might have
pointers to relevant threads in the list archives, though I don't recall
how much of that discussion was actually just in person at past Cauldrons.
How it would help you is that this is our high-level, general-purpose
vaporware plan, under which the specific tunability you are asking for
would be a straightforward example.
> > I'm more positively disposed towards ideas like a mask for cpuid
> > feature bits
> Setting or clearing CPUID feature bits will potentially switch multiple
> IFUNCs which I do *not* want to do. I want to only change one thing (at
> least one thing at a time).
Indeed, that's exactly what I meant by "not actually give you the knob you
really want." A CPUID-level feature that is some sort of analogue to
LD_HWCAP_MASK would be a relatively easy sell with me; that's why I
mentioned it as an example. The implementation-specific "tweak this IFUNC
selector like this" approach is something I would only entertain in the
world of tunables: non-binding, not entirely a stable ABI, and with the full
details of what is acceptable yet to be worked out.
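A toy model of why a feature-bit mask is a blunt instrument for the "change one thing at a time" goal. The feature bits, variant names, and selectors below are hypothetical stand-ins, not glibc's actual ones; the point is only that independent selectors consult the same bit, so masking it, as an LD_HWCAP_MASK-style knob would, moves several choices at once.

```c
#include <stdint.h>

/* Hypothetical feature bits standing in for CPUID-derived flags. */
enum { FEAT_SSSE3 = 1u << 0, FEAT_AVX2 = 1u << 1 };

enum memcpy_variant { MEMCPY_SSE2, MEMCPY_SSSE3, MEMCPY_AVX2 };
enum strlen_variant { STRLEN_SSE2, STRLEN_AVX2 };

/* Two independent selectors that both look at the AVX2 bit. */
static enum memcpy_variant select_memcpy_variant(uint32_t features)
{
  if (features & FEAT_AVX2)
    return MEMCPY_AVX2;
  if (features & FEAT_SSSE3)
    return MEMCPY_SSSE3;
  return MEMCPY_SSE2;
}

static enum strlen_variant select_strlen_variant(uint32_t features)
{
  return (features & FEAT_AVX2) ? STRLEN_AVX2 : STRLEN_SSE2;
}

/* The mask is applied once, before any selector runs. */
static uint32_t effective_features(uint32_t detected, uint32_t mask)
{
  return detected & mask;
}
```

With both bits detected and an all-ones mask, both selectors pick their AVX2 variants; clearing only `FEAT_AVX2` in the mask drops memcpy to the SSSE3 variant and strlen to SSE2 together, which is exactly the side effect the per-IFUNC knob is meant to avoid.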