This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)


On Thu, Jan 15, 2015 at 03:48:47PM -0000, Wilco Dijkstra wrote:
> Roland McGrath wrote:
> > Wilco Dijkstra wrote:
> > > We need something like this in string.h so we always optimize all calls to
> > > standard optimized functions, irrespective of the compiler and options used:
> > 
> > We would need that if we wanted to do that.  But these entrypoints are all
> > old and deprecated.  They are only for the benefit of old code.  Any code
> > so old that it hasn't been touched since there were actually systems to
> > build it on that don't have the C89 standard functions surely has worse
> > performance issues than this.  Making the deprecated functions optimal only
> > encourages people to keep using them.
> 
> Agreed; however, they appear to be used in a lot of code, including benchmarks.
> For example, a quick grep shows there are a large number of occurrences of
> bzero and bcopy in SPEC2006.
>
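
For reference, the kind of string.h redirect being discussed could be as
simple as mapping the legacy entry points onto the standard functions.
This is only a sketch, not the actual glibc definitions:

/* Sketch only; note that bcopy takes (src, dst, n) while memmove
   takes (dst, src, n).  */
#define bzero(s, n)         __builtin_memset ((s), 0, (n))
#define bcopy(src, dst, n)  __builtin_memmove ((dst), (src), (n))
#define bcmp(s1, s2, n)     __builtin_memcmp ((s1), (s2), (n))
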
Also, gcc could optimize memset into __bzero; I will probably write a
patch for x64 to save a few cycles. One complication is that gcc could
use the memset return value, so we need to check that it is dead, or
else create a new symbol.
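
To illustrate the two cases (sketch only; __bzero is a glibc-internal
entry point and the function names here are made up):

#include <string.h>

void
clear_unused_result (char *buf, size_t n)
{
  /* The return value of memset is dead here, so the compiler could
     emit a call to __bzero (buf, n) instead and skip materializing
     the returned pointer.  */
  memset (buf, 0, n);
}

char *
clear_used_result (char *buf, size_t n)
{
  /* The return value is live here, so the call has to stay memset,
     or a new symbol that also returns the pointer would be needed.  */
  return memset (buf, 0, n);
}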
 
> > > Now the only remaining one to deal with is mempcpy - I'd like something like
> > > this in string/strings2.h:
> > 
> > Why?  It's trivial enough for each memcpy implementation to implement
> > mempcpy too, and for many implementations rolling it in might save an
> > instruction or two over the generic addition.  It doesn't seem worth
> > the complexity to bother with anything in the header files.
> 
> Back to mempcpy, not only is inlining mempcpy simple and a good idea, it is
> also the most efficient implementation. If you create a separate optimized
> implementation of mempcpy, it requires 1-2 extra instructions and increases
> pressure on caches and branch predictors. Another approach would be to set

That was previously mentioned in the parent thread. With a separate
mempcpy you will likely pay an extra penalty of around 100 cycles, as
mempcpy is not called often and its code will be cold in the
instruction cache.

> the return value at the start of memcpy so that mempcpy can jump past it. 
> This means 1 extra instruction in every memcpy invocation plus an extra
> branch for mempcpy.

That is false. You only need to duplicate the starting fragment of
memcpy up to the point where the return value is set, and then jump
into memcpy after that point, which adds no overhead to memcpy itself.

That could be problematic on some architectures, as you need to do it
without spilling an extra register.
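
In C terms the idea looks roughly like this (the names are made up and
the real implementations are assembly, so take it as a sketch only):

#include <stddef.h>

/* Shared copy code; the return value is fixed by the caller before
   the copy starts.  */
static void *
copy_and_return (void *ret, void *dst, const void *src, size_t n)
{
  char *d = dst;
  const char *s = src;
  while (n--)
    *d++ = *s++;
  return ret;
}

void *
my_memcpy (void *dst, const void *src, size_t n)
{
  return copy_and_return (dst, dst, src, n);                /* dst */
}

void *
my_mempcpy (void *dst, const void *src, size_t n)
{
  return copy_and_return ((char *) dst + n, dst, src, n);   /* dst + n */
}

In assembly my_mempcpy would just set the return register to dst + n
and fall through into the memcpy code after the point where memcpy
sets its own return register.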

> Neither option is clearly better than just inlining.
> This ignores the additional effort to write/test mempcpy, which could be
> spent on more important things. It appears most targets have not bothered
> with mempcpy as a result.
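
For comparison, the header inline being asked for could be roughly the
following (a sketch of "something like this in string/strings2.h", not
an actual definition; it assumes GNU statement expressions and that
<string.h> has already declared memcpy):

#define mempcpy(dst, src, n)                                        \
  (__extension__ ({ size_t __n = (n);                               \
                    (void *) ((char *) memcpy ((dst), (src), __n)   \
                              + __n); }))

The statement expression evaluates n only once, and gcc can then apply
its usual memcpy builtin handling to the call.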

