This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH] Inline mempcpy
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Joseph Myers'" <joseph at codesourcery dot com>, 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: <libc-alpha at sourceware dot org>
- Date: Mon, 18 May 2015 12:41:07 +0100
- Subject: RE: [PATCH] Inline mempcpy
- Authentication-results: sourceware.org; auth=none
- References: <A610E03AD50BFC4D95529A36D37FA55E769A83F7F7 at GEORGE dot Emea dot Arm dot com>
> > On Wed, May 13, 2015 at 10:21:25PM +0000, Joseph Myers wrote:
> > > On Wed, 13 May 2015, Ondřej Bílka wrote:
> > >
> > > > Hi,
> > > > As pointed out that following patch should be generic
> > > > http://patchwork.sourceware.org/patch/6459/
> > > > here is sample patch that does it. Header for mempcpy now becomes
> > > > following:
> > > >
> > > > #ifdef __USE_GNU
> > > > # if !defined _HAVE_STRING_ARCH_mempcpy || defined _FORCE_INLINES
> > >
> > > Doesn't this change the semantics of _HAVE_STRING_ARCH_mempcpy? That is,
> > > after this patch, architectures with efficient .S implementations of
> > > mempcpy (as opposed to ones that just use the default .c implementation)
> > > should define that macro rather than just those with inline
> > > implementations. So the patch needs to update architectures as well.
> > >
> > Read the friendly thread. In short you should always expand mempcpy to
> > memcpy and keep mempcpy implementation just for legacy programs.
> >
> > Possible benefits are small, few cycles to save register spill at most.
> > It can't be better as you can implement memcpy with mempcpy.
> >
> > But there is huge cost involved. Effective mempcpy is often larger than
> > one kilobyte. Duplicating that kilobyte for mempcpy would increase
> > instruction cache pressure a lot. Easily it could cause one cache miss
> > per call resulting additional 30 cycle penalty negating any benefit you
> > try to make.
>
> That cost sounds like it very much depends on the architecture's cache
> size on typical processors (presumably likely to go up over time) and the
> size of the function implementations. On SPARC, for example, mempcpy is
> simply an alternative entry point in the same file as memcpy that sets up
> a different return value, so I see no apparent reason to avoid the
> out-of-line mempcpy implementation there. Generally, you should get
> consensus for the change from architecture people on each architecture
> with its own mempcpy version (x86_64, x86, powerpc, sparc).
>
> Regarding the patch in particular, you appear to lose the
>
> > -/* In glibc we use this function frequently but for namespace reasons
> > - we have to use the name `__mempcpy'. */
> > -# define mempcpy(dest, src, n) __mempcpy (dest, src, n)
>
> so it's not clear the patch would even be effective for programs that call
> mempcpy rather than __mempcpy. And in
>
> > #define __mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)
>
> you're missing the space before the '(' in the function call. The patch
> removes the definition of __mempcpy_small from the headers, but
> __mempcpy_small is part of the libc ABI for existing binaries that were
> built with old GCC; I'd have expected this patch to cause string-inlines.c
> no longer to export a definition of __mempcpy_small, resulting in ABI
> tests failing (you didn't say how the patch was tested). I'd expect you
> to need to arrange for the definition to go somewhere else so that it
> still gets exported from libc (taking care of x86 having its own versions
> of string-inlines.c).
So the best option seems to add this at the end of string/string.h:
#if defined __USE_GNU && !defined TARGET_HAS_OPTIMIZED_MEMPCPY
#undef mempcpy
#undef __mempcpy
#define mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)
#define __mempcpy(dest, src, n) __mempcpy_inline(dest, src, n)
__attribute__((__always_inline__)) inline void *
__mempcpy_inline (void *__restrict __dest,
const void *__restrict __src, size_t __n)
{
return memcpy (__dest, __src, __n) + __n;
}
#endif
Now any target that wants to use an assembler version of mempcpy needs to define
TARGET_HAS_OPTIMIZED_MEMPCPY to avoid inlining mempcpy - looking at the replies
so far that might only be SPARC.
> And as __mempcpy_inline is __STRING_INLINE, the
> compiler could choose not to inline it, meaning that it needs to go in the
> Versions file and ABI baselines unless you can arrange for it to have
> __mempcpy as its asm name for any non-inlined calls (which would certainly
> be preferable, to avoid introducing a new ABI).
So we don't add it to bits/string2.h and it won't be in string-inlines.c.
> All those ABI complications around removing existing inlines from headers
> definitely show that any such removals should be separate from adding new
> inlines or other optimizations.
Agreed, it's best figure out how to remove/deprecate bits/string2.h as a
separate project. Eg. string-inlines.c could define all the exported functions
with trivial implementations, making bits/string2.h redundant.
Wilco