This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: The CPU run-time library for C


Disclaimer: While I work for Oracle, I am not authorized to comment on Oracle product plans.

My work focus over the years has been primarily on performance issues on
various systems and HW platforms. I am sympathetic to the desire to get
performance improvements into end users' hands as quickly as is consistent
with security, reliability, etc.  I don't believe the add-on
glibc_(mem/string) approach will achieve this goal for most vendors and
most vendor-dependent users.

Ideally, any supported release of a product will require testing, documentation,
and a QA phase. Most vendors support multiple releases at any given time.
Something tightly tied to glibc would likely require QA work for each
glibc_(mem/string) with each glibc_(base). If a vendor has only 3 of each at
any given time, that would still mean 9 units of QA work instead of 3. The
potential market benefit would be small compared to the additional overhead
of just the QA work. Once you add in the increased cost of applying fixes
to yet more source trees [six instead of three in the above example],
it hardly seems an attractive path for SW maintenance.
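
To make the arithmetic above concrete, here is a minimal sketch in C. The
function names are made up for illustration, and it assumes every add-on
release has to be validated against every supported base release, which is
the worst case described above.

```c
#include <stdio.h>

/* QA passes needed when each add-on (mem/string) release must be
   validated against each supported base glibc release. */
unsigned qa_units_with_addon(unsigned base_releases, unsigned addon_releases)
{
    return base_releases * addon_releases;
}

/* Source trees that may need a given fix when the add-on is
   maintained separately from the base library. */
unsigned source_trees_with_addon(unsigned base_releases, unsigned addon_releases)
{
    return base_releases + addon_releases;
}

/* Print the comparison against shipping only the base releases. */
void print_overhead(unsigned base_releases, unsigned addon_releases)
{
    printf("QA units: %u instead of %u\n",
           qa_units_with_addon(base_releases, addon_releases),
           base_releases);
    printf("source trees: %u instead of %u\n",
           source_trees_with_addon(base_releases, addon_releases),
           base_releases);
}
```

Calling print_overhead(3, 3) reproduces the 9-versus-3 and six-versus-three
figures from the example.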

Today, a vendor can select the upstream performance related patches they
perceive as useful to their customers and apply them to their next update
of their newest glibc release. Older releases are likely to be left unchanged
as customers on older releases implicitly prefer stability. If they wanted
the latest stuff they'd switch to the newest vendor release.

To get improvements to customers faster, we need vendors to have pressure
from customers to make those improvements available. That means
customers need to be aware that improvements are happening.
Even simple synthetic open source benchmarks with a reasonable range of
input values can be useful in this regard. Then one can say:
"On the glibc strcpy benchmark, for platform y, the new strcpy code runs x% faster."
Simple, quantitative, easy to grasp the improvement, and easy to validate
by anyone with access to the src, the test, and platform y.
Then a vendor could pick up a set of improvements and tell customers
that "our newest version of glibc runs x% to y% faster on a range of
commonly used functions (see appendix for details) than glibc version zzz."
Customers who care would gravitate to vendors who release improvements
more quickly, giving vendors a reason to port the upstream improvements
more quickly.
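
As a rough illustration of the kind of simple synthetic benchmark meant
above, here is a sketch in C. It is not glibc's own benchmark code; the
names bench_strcpy, percent_faster and report, the input sizes, and the
iteration counts are all assumptions picked for the example. It uses the
ISO C clock() function, so it measures CPU time at whatever resolution the
platform provides, and "x% faster" is taken to mean the reduction in time
per call relative to the baseline.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by strcpy and any candidate replacement. */
typedef char *(*strcpy_fn)(char *dst, const char *src);

/* Average CPU nanoseconds per call when copying a string of `len`
   characters. Returns -1.0 on allocation failure or if iters is 0. */
double bench_strcpy(strcpy_fn fn, size_t len, unsigned long iters)
{
    char *src = malloc(len + 1);
    char *dst = malloc(len + 1);
    if (src == NULL || dst == NULL || iters == 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 'a', len);
    src[len] = '\0';

    /* Read part of each result so the copies are not trivially dead. */
    volatile char sink = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        fn(dst, src);
        sink ^= dst[len / 2];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}

/* Percentage by which the candidate's time per call is lower than
   the baseline's (negative if the candidate is slower). */
double percent_faster(double baseline_ns, double candidate_ns)
{
    return (baseline_ns - candidate_ns) / baseline_ns * 100.0;
}

/* Print one line per input size for a range of string lengths. */
void report(strcpy_fn baseline, strcpy_fn candidate)
{
    static const size_t sizes[] = { 8, 64, 512, 4096, 65536 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        /* Fewer iterations for longer strings: about 64 MiB copied per run. */
        unsigned long iters = (1UL << 26) / sizes[i];
        double b = bench_strcpy(baseline, sizes[i], iters);
        double c = bench_strcpy(candidate, sizes[i], iters);
        if (b <= 0.0 || c < 0.0) {
            printf("len=%zu: measurement unavailable\n", sizes[i]);
            continue;
        }
        printf("len=%zu: %.1f ns -> %.1f ns (%.1f%% faster)\n",
               sizes[i], b, c, percent_faster(b, c));
    }
}
```

A caller would pass the existing strcpy as the baseline and the new routine
as the candidate. Anyone with the source and the same platform can rerun it
and check the published numbers.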

- patrick


On 12/4/2018 2:34 PM, Carlos O'Donell wrote:
On 12/4/18 1:12 PM, Siddhesh Poyarekar wrote:
> On 03/12/18 11:16 PM, H.J. Lu wrote:
>> 1. Install libcpu-rt-c binary from their OS vendors if available.
> I'm curious to know what OS vendors think of this.  AFAICT, it's not
> too different from shipping an alternate glibc and in some ways, the
> latter might just be easier than munging scripts to build a separate
> library.
>
> Also, if the same ABI guarantees are expected of this new library,
> then again would OS vendors prefer to ship a whole new library or
> would they be better off just backporting these new routines?
>
> Basically, this doesn't make sense if OS vendors aren't going to ship
> it.  Building in this complexity just to make a downloadable binary
> in some arbitrary place sounds like an ugly hack that will come to
> bite us later.

H.J. posted an early RFC in June:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00259.html

My summary of consensus in June was:

- Suggest implementing in a distinct project: Adhemerval, Florian, Carlos.

- Request simpler design: Florian, Siddhesh.

(1) Why not an external preloadable library?

This RFC appears unchanged from the original proposal, and the outstanding
comments do not appear to have been discussed in any further detail,
particularly the cost/benefit ratio to the project of accepting such patches
versus a simpler mechanism. Likewise, it has not been explained why "most"
user needs cannot be met by something like ARM's cortex-strings, which
doesn't need deep integration with glibc-specific features.

(2) Current libcpu-rt-c proposal does not meet OS vendor needs.

The present libcpu-rt-c proposal as-is is not usable by OS vendors;
replacing the core string routines is equivalent to a library rebase
and requires revalidation efforts by the distribution and by QE. This
makes it *almost* as difficult to rebase and update libcpu-rt-c as it is
to rebase and update glibc (not to mention it requires using DTS in RHEL
to get a new-enough compiler/binutils). The other consequence is that a
newer compiler/binutils may need a newer gdb to even be able to debug
the code in question, and the problem is compounded. No distro that
I'm aware of has ever delivered something like this.

OS vendors already have a process to backport IFUNC and other
improvements to stable branches, and we do this in RHEL for Intel,
IBM, and ARM (just look at our public glibc.spec %changelog) e.g.
- Improve libm performance AArch64 (#1302086)
- Improve memcpy performance for POWER9 DD2.1 (#1498925)
- Add Intel AVX-512 optimized routines (#1298526).
- Improve performance on Intel Purley (#1335286).
- Add support for new IBM z14 (s390x) instructions (#1375235)

If you need key routines backported, please work with your
distribution contact to have key support backported. RHEL
point releases happen frequently.

Therefore this proposal only adds work to upstream glibc, and
doesn't provide customers with a supported libcpu-rt-c. At most
it gives customers a way to improve performance by using
libraries provided by a 3rd party. That 3rd party could equally
deploy a custom glibc and tell the customer to use that.

(3) Solution is too costly in terms of maintenance.

The solution lacks the simplicity of plans like --enable-math-private.

In this patch set from Florian:
https://sourceware.org/ml/libc-alpha/2018-09/msg00368.html

We see a proposal that is much simpler for the math routines,
in particular building libm.so such that it is distinct from
glibc and can be preloaded. This is easier for libm functions
because they are so distinct from libc, but it's just an example
of the kind of well-isolated solutions which are desirable
from upstream.

My opinion is that unless the solution becomes drastically
simpler, it has too high a cost in terms of maintenance
for the problem it solves.

---

In summary:

(1) Could solve "most" of the problem with an external
     pre-loadable library, without all the bells-and-whistles
     glibc has (tunables, etc) e.g. ARM's cortex-strings.

(2) Difficult to support from an OS vendor point of view.
     Easier to just ship a new glibc.

(3) Costly in terms of maintenance for the value it provides.
     Cost is ongoing maintenance and support of lots of
     conditionals to enable 3rd parties providing parts of
     new glibc's functionality to users.


