RE: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)

> Ondřej Bílka wrote:
> On Wed, Feb 04, 2015 at 04:30:43PM -0000, Wilco Dijkstra wrote:
> > > > the return value at the start of memcpy so that mempcpy can jump past it.
> > > > This means 1 extra instruction in every memcpy invocation plus an extra
> > > > branch for mempcpy.
> > >
> > > That is false. You need to copy starting memcpy fragment until you set
> > > return value and then jump which gives no overhead to memcpy.
> >
> > That's not how memcpy implementations work. You never set the return value
> > explicitly, you either don't change the destination register (which on most ABIs
> > also is the return value) or save/restore it on targets with few registers.
> > Additionally for small/medium copies you use the destination (and return value)
> > unchanged, so to support a different return value you need an extra instruction
> > to make a copy of the destination ...
> >
> No, my description is quite explicit. You take memcpy implementation and
> look at first instructions such that there is no read/write to return
> register/memory after reaching that instruction.
> Now for mempcpy you take memcpy as template and clone it until you reach
> instruction corresponding to one described before.
> On that position you change return value and jump to corresponding
> instruction in memcpy.
> It is obvious this does not add extra instruction to memcpy as memcpy is
> not changed.

Yes, but that means that if memcpy never makes a copy of the destination register
and uses it in *all* subcases (like my memcpy does), then you end up having
to duplicate almost all of the memcpy code.

While duplicating most of memcpy is not as bad as duplicating all of it, the
common cases in the duplicate will typically be cold: not in the instruction
cache and not known to the branch predictor. So the conclusion is that you
either make memcpy less efficient in order to share all of it (by using an
extra register and making a copy of the destination register early on) or
duplicate most of memcpy. Neither is a good idea.
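
To make the two options concrete, here is a rough C sketch. It only models
the structure; real implementations are hand-written assembly, and a C
compiler is free to allocate registers differently. All names (copy_core,
toy_memcpy_shared, toy_mempcpy_shared, toy_mempcpy_dup) are made up for
illustration and are not glibc symbols.

```c
#include <stddef.h>

/* Option 1: share everything. The return value travels in its own
   variable ("ret"), which models the extra register and the early copy
   of the destination that memcpy would otherwise not need. */
static void *copy_core(void *ret, unsigned char *d,
                       const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return ret;
}

void *toy_memcpy_shared(void *dst, const void *src, size_t n)
{
    /* memcpy returns dst, but still pays for carrying "ret" separately. */
    return copy_core(dst, dst, src, n);
}

void *toy_mempcpy_shared(void *dst, const void *src, size_t n)
{
    /* mempcpy returns dst + n, computed before entering the shared code. */
    return copy_core((unsigned char *)dst + n, dst, src, n);
}

/* Option 2: duplicate. memcpy can keep using dst unchanged everywhere,
   but mempcpy needs its own copy of the copy loop. */
void *toy_mempcpy_dup(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return d + n;
}
```

In the sketch the cost of option 1 is the extra "ret" argument on every
memcpy call, and the cost of option 2 is the second copy loop that only
mempcpy callers ever execute.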


