This is the mail archive of the libc-alpha mailing list for the glibc project.
RE: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'Roland McGrath'" <roland at hack dot frob dot com>, <libc-alpha at sourceware dot org>
- Date: Wed, 11 Feb 2015 14:51:44 -0000
- Subject: RE: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)
- References: <20150108185812 dot 285782C3BF6 at topped-with-meat dot com> <001901d02c0d$43cf9920$cb6ecb60$ at com> <20150109191632 dot 694692C3C1F at topped-with-meat dot com> <001a01d02dc9$bd6f0370$384d0a50$ at com> <20150113191449 dot AD91B2C39DC at topped-with-meat dot com> <001e01d03003$f67b8670$e3729350$ at com> <20150114193244 dot 44C022C39DB at topped-with-meat dot com> <002101d030da$c05f76f0$411e64d0$ at com> <20150131203619 dot GA13121 at domone dot leoexpresswifi dot com> <002b01d04097$ec2c9b10$c485d130$ at com> <20150211130656 dot GA7008 at domone>
> Ondřej Bílka wrote:
> On Wed, Feb 04, 2015 at 04:30:43PM -0000, Wilco Dijkstra wrote:
> > > > the return value at the start of memcpy so that mempcpy can jump past it.
> > > > This means 1 extra instruction in every memcpy invocation plus an extra
> > > > branch for mempcpy.
> > >
> > > That is false. You need to copy the starting fragment of memcpy up to the
> > > point where the return value is set and then jump, which adds no overhead to memcpy.
> > That's not how memcpy implementations work. You never set the return value
> > explicitly; you either leave the destination register unchanged (on most ABIs
> > it is also the return value register) or save/restore it on targets with few registers.
> > Additionally, for small/medium copies you use the destination (and return value)
> > unchanged, so to support a different return value you need an extra instruction
> > to make a copy of the destination ...
> No, my description is quite explicit. You take the memcpy implementation and
> find the first instruction after which there is no read or write of the return
> register/memory.
> For mempcpy, you take memcpy as a template and clone it up to the instruction
> corresponding to the one described above.
> At that position you set the return value and jump to the corresponding
> instruction in memcpy.
> Obviously this does not add an extra instruction to memcpy, as memcpy is
> not changed.
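Ondřej's scheme can only be modeled approximately in C, because C cannot branch into the middle of another function. In the minimal sketch below (assuming a plain byte loop in place of a real optimized copy body), the shared tail is a helper that receives the already-computed return value and never touches it while copying; memcpy and mempcpy differ only in the value they pass in. `copy_tail`, `my_memcpy` and `my_mempcpy` are hypothetical names. Note that the tail keeps the return value separate from the running destination pointer, which is exactly the extra live register Wilco objects to in the reply.

```c
#include <stddef.h>

/* Hypothetical shared tail: copies n bytes and returns ret unchanged.
   It never reads or writes ret while copying, so each caller may set ret
   to whatever it needs before "jumping" here. Keeping ret apart from d is
   the extra live register under discussion. */
static void *copy_tail(void *ret, unsigned char *d, const unsigned char *s,
                       size_t n)
{
    while (n--)
        *d++ = *s++;
    return ret;
}

/* Hypothetical memcpy: the return value is the original destination. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    return copy_tail(dst, dst, src, n);
}

/* Hypothetical mempcpy: identical except that the return value is set to
   dst + n before entering the shared tail (in assembly: set the register,
   then branch into memcpy's body). */
void *my_mempcpy(void *dst, const void *src, size_t n)
{
    return copy_tail((unsigned char *)dst + n, dst, src, n);
}
```

Chaining calls shows why mempcpy's return value is useful: with `char buf[8]`, `my_mempcpy(my_mempcpy(buf, "foo", 3), "bar", 4)` leaves the string "foobar" in `buf`.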
Yes, but that means that if memcpy never makes a copy of the destination register
and uses it in *all* subcases (as my memcpy does), then you end up having
to duplicate almost all of the memcpy code.
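To illustrate what using the destination unchanged in every subcase looks like, here is a minimal sketch of one small-copy path, a hypothetical `copy_8_to_16` valid only for 8 <= n <= 16, written in the overlapping head/tail style. It stores through `dst` and `dst + n - 8` without ever advancing `dst`, so the original pointer is still there as memcpy's return value at no cost; returning `dst + n` for mempcpy would need an extra addition on this path. The fixed-size `memcpy` calls stand in for single 8-byte loads and stores, which assumes the compiler lowers them that way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small-copy path, valid only for 8 <= n <= 16.
   Both 8-byte chunks are loaded before either is stored; for n < 16 the
   head and tail chunks overlap in the middle, which is harmless. */
void *copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    memcpy(&head, src, 8);                          /* first 8 bytes */
    memcpy(&tail, (const char *)src + n - 8, 8);    /* last 8 bytes */
    memcpy(dst, &head, 8);
    memcpy((char *)dst + n - 8, &tail, 8);

    /* dst was never modified, so it doubles as the return value. */
    return dst;
}
```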
While duplicating most of memcpy is not as bad as duplicating all of it, the
duplicated common-case paths in mempcpy will not share memcpy's instruction-cache
lines or branch-predictor history. So the conclusion is that you either make
memcpy less efficient in order to share all of it (by using an extra register
and copying the destination register early on) or duplicate most of
memcpy. Neither is a good idea.
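A third arrangement, not proposed in this message and offered here only as an assumption, sidesteps both options: leave memcpy exactly as it is and implement mempcpy as a thin wrapper around it. Every mempcpy call then runs memcpy's own code, sharing its cache and predictor state, at the cost of one addition after the call (which also means the call can no longer be a tail call), while memcpy itself is unchanged. `my_mempcpy_wrapper` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: mempcpy expressed as the unmodified memcpy plus one
   addition. memcpy keeps returning dst; only the caller-side result differs. */
static inline void *my_mempcpy_wrapper(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}
```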