This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] Inline mempcpy


> > On Wed, May 13, 2015 at 10:21:25PM +0000, Joseph Myers wrote:
> > > On Wed, 13 May 2015, Ondřej Bílka wrote:
> > >
> > > > Hi,
> > > > As pointed out that following patch should be generic
> > > > http://patchwork.sourceware.org/patch/6459/
> > > > here is sample patch that does it. Header for mempcpy now becomes
> > > > following:
> > > >
> > > > #ifdef __USE_GNU
> > > > # if !defined _HAVE_STRING_ARCH_mempcpy || defined _FORCE_INLINES
> > >
> > > Doesn't this change the semantics of _HAVE_STRING_ARCH_mempcpy?  That is,
> > > after this patch, architectures with efficient .S implementations of
> > > mempcpy (as opposed to ones that just use the default .c implementation)
> > > should define that macro rather than just those with inline
> > > implementations.  So the patch needs to update architectures as well.
> > >
> > Read the friendly thread. In short you should always expand mempcpy to
> > memcpy and keep mempcpy implementation just for legacy programs.
> >
> > Possible benefits are small, few cycles to save register spill at most.
> > It can't be better as you can implement memcpy with mempcpy.
> >
> > But there is huge cost involved. Effective mempcpy is often larger than
> > one kilobyte. Duplicating that kilobyte for mempcpy would increase
> > instruction cache pressure a lot. Easily it could cause one cache miss
> > per call resulting additional 30 cycle penalty negating any benefit you
> > try to make.
> 
> That cost sounds like it very much depends on the architecture's cache
> size on typical processors (presumably likely to go up over time) and the
> size of the function implementations.  On SPARC, for example, mempcpy is
> simply an alternative entry point in the same file as memcpy that sets up
> a different return value, so I see no apparent reason to avoid the
> out-of-line mempcpy implementation there.  Generally, you should get
> consensus for the change from architecture people on each architecture
> with its own mempcpy version (x86_64, x86, powerpc, sparc).
> 
> Regarding the patch in particular, you appear to lose the
> 
> > -/* In glibc we use this function frequently but for namespace reasons
> > -   we have to use the name `__mempcpy'.  */
> > -#   define mempcpy(dest, src, n) __mempcpy (dest, src, n)
> 
> so it's not clear the patch would even be effective for programs that call
> mempcpy rather than __mempcpy.  And in
> 
> > #define __mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)
> 
> you're missing the space before the '(' in the function call.  The patch
> removes the definition of __mempcpy_small from the headers, but
> __mempcpy_small is part of the libc ABI for existing binaries that were
> built with old GCC; I'd have expected this patch to cause string-inlines.c
> no longer to export a definition of __mempcpy_small, resulting in ABI
> tests failing (you didn't say how the patch was tested).  I'd expect you
> to need to arrange for the definition to go somewhere else so that it
> still gets exported from libc (taking care of x86 having its own versions
> of string-inlines.c).  

So the best option seems to add this at the end of string/string.h:


#if defined __USE_GNU && !defined TARGET_HAS_OPTIMIZED_MEMPCPY

#undef mempcpy
#undef __mempcpy
#define mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)
#define __mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)

__attribute__((__always_inline__)) inline void *
__mempcpy_inline (void *__restrict __dest,
                  const void *__restrict __src, size_t __n)
{
  return memcpy (__dest, __src, __n) + __n;
}

#endif


Now any target that wants to use an assembler version of mempcpy needs to define
TARGET_HAS_OPTIMIZED_MEMPCPY to avoid inlining mempcpy - looking at the replies
so far that might only be SPARC.

> And as __mempcpy_inline is __STRING_INLINE, the
> compiler could choose not to inline it, meaning that it needs to go in the
> Versions file and ABI baselines unless you can arrange for it to have
> __mempcpy as its asm name for any non-inlined calls (which would certainly
> be preferable, to avoid introducing a new ABI).

So we don't add it to bits/string2.h and it won't be in string-inlines.c.

> All those ABI complications around removing existing inlines from headers
> definitely show that any such removals should be separate from adding new
> inlines or other optimizations.

Agreed, it's best figure out how to remove/deprecate bits/string2.h as a 
separate project. Eg. string-inlines.c could define all the exported functions
with trivial implementations, making bits/string2.h redundant.

Wilco



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]