This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.

Re: Why do you want libc to be 5 times slower than other libraries?

From: Agner Fog <agner at agner dot org>
To: rsa at us dot ibm dot com
Cc: libc-help at sourceware dot org, Roland McGrath <roland at redhat dot com>
Date: Wed, 06 Aug 2008 10:04:03 +0200
Subject: Re: Why do you want libc to be 5 times slower than other libraries?
Organization: agner@agner.org
References: <4892AB88.9040905@agner.org> <489812AA.6070609@agner.org> <1217955154.7784.21.camel@localhost>

Thank you for a thorough and detailed answer.

Ryan S. Arnold wrote:

> a.) GLIBC is a GPL licensed project where the copyright for the
>code in question has been assigned to the Free Software Foundation. All
>code that is contributed must be copyright assigned to the Free Software
>Foundation. This means that, regardless of the license of the reference
>code, we can not use 'open source' code from other projects unless it
>has been explicitly copyright assigned to the FSF.

This means a lot of extra work reinventing the wheel and solving problems that others have already solved. That is not exactly the idea of the open source movement. Maybe other open source projects are willing to make a license-sharing agreement so the different projects can benefit from each other rather than working independently on the same problems. I explicitly stated in my mail that I was willing to assign the necessary rights to my code to the GNU project.

>It is a very limited contribution (in
>our eyes) to offer up a TODO list without following up with the time to
>do the work, prove it, and contribute it while following the proper
>process that makes it possible for us to accept the contribution.

I am fully aware of that, and I would do it all if I had the time. Unfortunately, I haven't. It would be a lot of work for me just to get into the proper procedures, and I would still get complaints about using the wrong type of tabs and spaces or whatever. I am testing different libraries and different algorithms and telling you which one is fastest and which ones can be improved. I am offering you optimized code, but I am not offering the tedious work of fitting it into the form required for libc.

>The libc-help mailing list is for a lot of things, not just questions.
>It is a place to develop ideas, vet patches, learn the development
>process, refine patches, etc.

Maybe the list descriptions need updating: "The libc-alpha list is for the discussion of glibc development" "The libc-help list is intended for all glibc questions including build problems, C library usage, and more"

>Performance improvements have been actively pursued for some time,
>especially by the companies who produce the architectures in question.
>Please engage this mailing list and the particular developers indicated
>below if you can identify problems with the current routines.

That's what I am doing.

>b.) You didn't CC any of the developers at AMD or Intel who've
>already worked on such optimizations, e.g. Evandro Menezes, Michael
>Meissner, H.J. Lu, Harsha Jagasia, et al.

I don't know them. Thanks for the names.

>d.) Your email didn't indicate how you gathered your data or
>whether you verified that what you were testing is an optimized version
>of the code for the processor in question. It is up to the Linux OS
>distributor to decide whether to compile and ship a CPU optimized
>library for a particular CPU or CPU subtype with their distribution.
>Did you compile your own versions of GLIBC for your tests? Are you sure
>your distribution isn't using the default (non-cpu specific) string routines?

It is not optimized for a specific CPU; that's indeed the problem. I couldn't find any implementation of libc that has different branches for different CPUs, e.g. SSE2, SSE3, Intel SSE4, AMD SSE4, etc.

Does such a CPU dispatching exist in libc? How does it work? It should be possible to compile a static binary on a system with SSE-whatever, and run it on a system with SSE-something-else. Therefore, I want the CPU-dispatching to be inside libc.
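To make concrete what I mean by CPU dispatching inside the library, here is a minimal sketch, not a description of how libc does or should do it. All names (my_copy, copy_generic, copy_sse2, select_copy) are hypothetical. It assumes GCC or Clang, whose __builtin_cpu_supports builtin queries the CPU at run time, and the "SSE2" branch is only a placeholder for a real optimized routine. The entry point calls through a function pointer that is resolved on the first call, so the same static binary picks its branch on whatever machine it runs on.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Baseline branch: plain byte loop that works on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholder for an SSE2-optimized branch (assumption: a real
   implementation would use XMM registers here). */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Choose a branch from run-time CPU features, not build-time flags. */
static copy_fn select_copy(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return copy_sse2;
#endif
    return copy_generic;
}

static void *copy_first_call(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver stub below. */
static copy_fn copy_impl = copy_first_call;

/* First call: resolve the branch once, then forward the call. */
static void *copy_first_call(void *dst, const void *src, size_t n)
{
    copy_impl = select_copy();
    return copy_impl(dst, src, n);
}

/* Public entry point: one indirect call after the first use. */
void *my_copy(void *dst, const void *src, size_t n)
{
    return copy_impl(dst, src, n);
}
```

A caller simply uses my_copy(dst, src, n) and never sees the dispatch. The sketch ignores thread safety of the first call; a real library would have to resolve the pointer in a thread-safe way or at load time.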

>e.) Any optimization of critical routines has to take into account
>many factors regarding the data being processed. Of concern is not only
>aligned and unaligned data, but also data length, e.g.
> short-aligned, short-unaligned, long-aligned, long-unaligned

I agree. The performance difference is highest when data are in the level-1 cache and aligned by less than 16. I just didn't want to bother you with excessive data when the main conclusion is so clear. The bottom line is that memory and string functions in libc have poor performance because you are not using XMM registers and you have no efficient way of dealing with unaligned data. The most efficient way of copying data when source and destination have different alignments is to read aligned into XMM registers; shift and combine consecutive reads so that they fit the alignment of the destination; then write aligned.
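The read-aligned, shift-and-combine, write-aligned idea can be sketched in portable C. This is only an illustration under stated assumptions: it uses 8-byte uint64_t words in place of 16-byte XMM registers, it assumes a little-endian machine, and it assumes that the aligned words enclosing the source range are readable. The function name copy_shift_combine and the demo function are hypothetical, not taken from any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nwords 8-byte words from src (any alignment) to an aligned dst.
   Assumptions: little-endian byte order; the aligned 8-byte words that
   enclose [src, src + 8*nwords) are readable. */
static void copy_shift_combine(uint64_t *dst, const unsigned char *src,
                               size_t nwords)
{
    size_t off = (size_t)((uintptr_t)src % 8);          /* source misalignment */
    const uint64_t *s = (const uint64_t *)(src - off);  /* round down to aligned */
    size_t i;

    if (off == 0) {                 /* same alignment: plain aligned copy */
        for (i = 0; i < nwords; i++)
            dst[i] = s[i];
        return;
    }

    unsigned rs = (unsigned)(8 * off);  /* bits to drop from the low word */
    unsigned ls = 64 - rs;              /* bits to take from the next word */
    uint64_t lo = s[0];                 /* first aligned read */

    for (i = 0; i < nwords; i++) {
        uint64_t hi = s[i + 1];             /* next aligned read */
        dst[i] = (lo >> rs) | (hi << ls);   /* shift and combine, aligned write */
        lo = hi;                            /* each source word is read once */
    }
}

/* Hypothetical demo: copy 24 bytes from a source misaligned by 3 bytes.
   Returns 1 if the result matches the source bytes. */
static int shift_copy_demo(void)
{
    uint64_t srcbuf[5];             /* aligned backing storage */
    uint64_t dst[3];
    unsigned char *bytes = (unsigned char *)srcbuf;
    size_t i;

    for (i = 0; i < sizeof srcbuf; i++)
        bytes[i] = (unsigned char)(i * 7 + 1);

    copy_shift_combine(dst, bytes + 3, 3);
    return memcmp(dst, bytes + 3, sizeof dst) == 0;
}
```

With XMM registers the loop has the same shape, only with 16-byte reads and writes; the scalar version is used here just to keep the sketch portable.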

>f.) You'll have to get consensus amongst the concerned parties (and
>with the maintainer) that the trade-offs you're suggesting are
>appropriate.

That's why I am discussing it here.

Follow-Ups:
- Re: Why do you want libc to be 5 times slower than other libraries?
  - From: Ryan S. Arnold

References:
- Memory and string functions can be improved dramatically on x86 and x86-64
  - From: Agner Fog
- Why do you want libc to be 5 times slower than other libraries?
  - From: Agner Fog
- Re: Why do you want libc to be 5 times slower than other libraries?
  - From: Ryan S. Arnold
