This is the mail archive of the
mailing list for the glibc project.
Re: Why do you want libc to be 5 times slower than other libraries?
Thank you for a thorough and detailed answer.
Ryan S. Arnold wrote:
> a.) GLIBC is a GPL licensed project where the copyright for the
>code in question has been assigned to the Free Software Foundation. All
>code that is contributed must be copyright assigned to the Free Software
>Foundation. This means that, regardless of the license of the reference
>code, we can not use 'open source' code from other projects unless it
>has been explicitly copyright assigned to the FSF.
This means a lot of extra work reinventing the wheel and solving
problems that others have already solved. Not exactly what the idea of
the open source movement is. Maybe other open source projects are
willing to make a license-sharing agreement so the different projects
can benefit from each other rather than working independently on the
same problems. I explicitly stated in my mail that I was willing to
assign the necessary rights of my code to the GNU project.
>It is a very limited contribution (in
>our eyes) to offer up a TODO list without following up with the time to
>do the work, prove it, and contribute it while following the proper
>process that makes it possible for us to accept the contribution.
I am fully aware of that, and I would do it all if I had the time.
Unfortunately, I haven't. It would be a lot of work for me just to get
into the proper procedures, and I would still get complaints about using
the wrong type of tabs and spaces or whatever. I am testing different
libraries and different algorithms and telling you which one is fastest
and which ones can be improved. I am offering you optimized code, but I
am not offering the tedious work of fitting it into the form required
>The libc-help mailing list is for a lot of things, not just questions.
>It is a place to develop ideas, vet patches, learn the development
>process, refine patches, etc.
Maybe the list descriptions need updating:
"The libc-alpha list is for the discussion of glibc development"
"The libc-help list is intended for all glibc questions including build
problems, C library usage, and more"
>Performance improvements have been actively pursued for some time,
>especially by the companies who produce the architectures in question.
>Please engage this mailing list and the particular developers indicated
>below if you can identify problems with the current routines.
That's what I am doing.
>b.) You didn't CC any of the developers at AMD or Intel who've
>already worked on such optimizations, e.g. Evandro Menezes, Michael
>Meissner, H.J. Lu, Harsha Jagasia, et al.
I don't know them. Thanks for the names.
>d.) Your email didn't indicate how you gathered your data or
>whether you verified that what you were testing is an optimized version
>of the code for the processor in question. It is up to the Linux OS
>distributor to decide whether to compile and ship a CPU optimized
>library for a particular CPU or CPU subtype with their distribution.
>Did you compile your own versions of GLIBC for your tests? Are you sure
>your distribution isn't using the default (non-cpu specific) string
It is not optimized for a specific CPU; that's indeed the problem. I
couldn't find any implementation of libc that has different branches for
different CPUs, e.g. SSE2, SSE3, Intel SSE4, AMD SSE4, etc.
Does such CPU dispatching exist in libc? How does it work? It should
be possible to compile a static binary on a system with SSE-whatever,
and run it on a system with SSE-something-else. Therefore, I want the
CPU-dispatching to be inside libc.
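
To illustrate the kind of run-time dispatching asked about above, here is a
minimal sketch. It is not how glibc does or should do it. The names
my_memcpy, copy_generic, copy_sse2 and cpu_has_sse2 are hypothetical, the
"SSE2" variant is only a stand-in that reuses the generic loop, and the CPU
check assumes the GCC/Clang builtins __builtin_cpu_init and
__builtin_cpu_supports on x86.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Baseline version that runs on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a hand-optimized SSE2 version (hypothetical). */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    return copy_generic(dst, src, n);
}

/* Ask the CPU at run time; report "no SSE2" where we cannot ask. */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

static void *copy_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver, which replaces it on first use. */
static copy_fn copy_impl = copy_resolve;

static void *copy_resolve(void *dst, const void *src, size_t n)
{
    /* Not synchronized; racing threads would store the same value. */
    copy_impl = cpu_has_sse2() ? copy_sse2 : copy_generic;
    assert(copy_impl != copy_resolve);
    return copy_impl(dst, src, n);
}

/* Public entry point: one indirect call after the first invocation. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    return copy_impl(dst, src, n);
}
```

Because the choice is made when the program runs rather than when the
library is built, a statically linked binary carries every variant with it
and picks the right one on whatever machine it is started on.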
>e.) Any optimization of critical routines has to take into account
>many factors regarding the data being processed. Of concern is not only
>aligned and unaligned data, but also data length, e.g.
> short-aligned, short-unaligned, long-aligned, long-unaligned
I agree. The performance difference is highest when data are in the
level-1 cache and aligned by less than 16. I just didn't want to bother
you with excessive data when the main conclusion is so clear. The bottom
line is that memory and string functions in libc have poor performance
because you are not using XMM registers and you have no efficient way of
dealing with unaligned data. The most efficient way of copying data when
source and destination have different alignments is to read aligned into
XMM registers; shift and combine consecutive reads so that they fit the
alignment of the destination; then write aligned.
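
The read-aligned, shift-and-combine, write-aligned scheme described above can
be sketched in portable C. This is a scaled-down model, not optimized code:
it uses 8-byte uint64_t words in place of 16-byte XMM registers, assumes a
little-endian machine, and assumes that every aligned word overlapping the
source range may be read. The name copy_realign is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 8-byte words from src (any alignment) to dst (8-byte aligned).
   Every load and every store is an aligned 8-byte access. */
static void copy_realign(void *dst, const void *src, size_t nwords)
{
    assert(((uintptr_t)dst & 7) == 0);

    uint64_t *d = (uint64_t *)dst;
    size_t k = (uintptr_t)src & 7;          /* source misalignment in bytes */
    const uint64_t *s =
        (const uint64_t *)((const unsigned char *)src - k); /* round down */

    if (k == 0) {                           /* same alignment: plain copy */
        for (size_t i = 0; i < nwords; i++)
            d[i] = s[i];
        return;
    }

    unsigned rs = (unsigned)(8 * k);        /* bits to drop from the low word */
    unsigned ls = 64 - rs;                  /* bits taken from the next word */
    uint64_t lo = s[0];                     /* first aligned read */

    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = s[i + 1];             /* next aligned read */
        d[i] = (lo >> rs) | (hi << ls);     /* shift and combine */
        lo = hi;                            /* reuse it in the next round */
    }
}
```

Each source word is read once and contributes to two consecutive destination
words. An XMM version would follow the same pattern with 16-byte loads and
byte-wise register shifts, plus separate handling of the first and last
partial blocks, which this sketch leaves out.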
>f.) You'll have to get consensus amongst the concerned parties (and
>with the maintainer) that the trade-offs you're suggesting are
That's why I am discussing it here.