This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Gcc builtin review: strcpy, stpcpy, strcat, stpcat?


On Wed, Jun 10, 2015 at 11:35:30AM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> > On Thu, Jun 04, 2015 at 02:50:07PM +0100, Wilco Dijkstra wrote:
> 
> > > The usual problem of knowing whether all targets define assembler versions of
> > > stpcpy applies - so I don't think it is a good idea to change all strcpy into
> > > stpcpy in general. The only useful case is strcpy(x,y)+strlen(x) which could
> > > potentially give a major speedup.
> > >
> > > Then it's a situation where the decision depends on implementation details,
> > > as on some architectures you could save some cycles with stpcpy itself.
> 
> Yes, I think the optimization to convert strcpy into stpcpy would need
> to be done in a target specific way in GLIBC headers for targets where it
> makes sense. It's not something you could easily do in GCC as stpcpy is
> not a standard function. In general it is best to optimize to use simpler,
> standard C90 functions (e.g. mempcpy->memcpy, even though mempcpy might
> be a better ABI to standardize on).
>
Not completely; it depends on what can be gained from a different function.
For example, for memcpy/mempcpy, at least on x64, I could introduce mempccpy
to implement them both. What matters is the register passing convention, as
the following would put start into the %rax register and end into %rdx.

struct ret
{
  char *start;
  char *end;
};
struct ret mempccpy(void *, void *, size_t);

I could ask on the gcc list whether they want to support it. There wouldn't
be a difference versus memcpy, as I need to calculate end anyway, so I just
store it in %rdx.
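
A minimal sketch of the idea in portable C, not an actual glibc
implementation. The wrappers my_memcpy and my_mempcpy are hypothetical names
added for illustration, and the source parameter is made const here to match
memcpy. Under the x86-64 System V calling convention a struct of two pointers
is returned in %rax and %rdx, so each wrapper only has to pick one register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pair-returning copy as proposed above; this body is only a sketch.  */
struct ret
{
  char *start;  /* what memcpy would return */
  char *end;    /* what mempcpy would return */
};

struct ret
mempccpy (void *dest, const void *src, size_t n)
{
  struct ret r;
  memcpy (dest, src, n);
  r.start = (char *) dest;
  r.end = (char *) dest + n;
  return r;
}

/* Hypothetical wrappers: both entry points reduce to picking one member.  */
void *
my_memcpy (void *dest, const void *src, size_t n)
{
  return mempccpy (dest, src, n).start;
}

void *
my_mempcpy (void *dest, const void *src, size_t n)
{
  return mempccpy (dest, src, n).end;
}
```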

> > As for useful cases, on the gcc thread I said that gcc could use the
> > available length to convert strchr to memchr, and make similar
> > optimizations, so strcpy will be called more.
> > 
> > As for the cache issues I mentioned, so far I have measured mostly noise.
> > I know that overall stpcpy is often called five times less than strcpy, so
> > the potential is there, but it depends on the actual savings when strcpy
> > costs a cycle less.
> > Data about strcpy and stpcpy when running make of zlib with debian gcc-5
> > follow:
> > 
> > ./summary_strcpy calls 52218 average n:   71.0    
> 
> > ./summary_stpcpy calls 4950 average n:    7.5
> 
> This says that stpcpy processes only 1% of the data that strcpy does,
> so that means optimization of strcpy is 100 times more important. Ie.
> slowing down strcpy just to share with stpcpy does not make any sense.
> 

You can't use averages just like that. The main problem is granularity: for
small inputs most of the time is spent determining the size, so the overhead
from extra bytes is quite small.
 
You need to look at the probabilities that I printed, which correspond to the
path the algorithm will take; for example, the difference in where the end
falls within the first 16 bytes is relatively small.

                strcpy    stpcpy
calls            52218      4950
average n         71.0       7.5
n <= 0            7.4%      1.7%
n <= 4           37.1%     76.9%
n <= 8           52.8%     77.2%
n <= 16          69.7%     79.1%
n <= 24          77.4%     87.6%
n <= 32          81.8%     95.5%
n <= 48          86.6%     95.5%
n <= 64          91.4%    100.0%
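
The figures above are cumulative shares of calls by length. Below is a sketch
of how such a summary could be computed from a list of recorded lengths. It
is not the actual summary_strcpy tool; the names summarize and print_summary
and the input array are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bucket limits taken from the printed summary.  */
static const size_t limits[] = { 0, 4, 8, 16, 24, 32, 48, 64 };
#define NLIMITS (sizeof limits / sizeof limits[0])

/* Hypothetical helper: fills pct[j] with the percentage of calls whose
   length is <= limits[j] and returns the average length.  */
double
summarize (const size_t *n, size_t calls, double pct[NLIMITS])
{
  size_t total = 0;
  size_t count[NLIMITS] = { 0 };

  for (size_t i = 0; i < calls; i++)
    {
      total += n[i];
      for (size_t j = 0; j < NLIMITS; j++)
        if (n[i] <= limits[j])
          count[j]++;
    }
  for (size_t j = 0; j < NLIMITS; j++)
    pct[j] = calls ? 100.0 * count[j] / calls : 0.0;
  return calls ? (double) total / calls : 0.0;
}

/* Prints one summary in roughly the layout used in this mail.  */
void
print_summary (const size_t *n, size_t calls)
{
  double pct[NLIMITS];
  double avg = summarize (n, calls, pct);

  printf ("calls %zu\naverage n: %6.1f", calls, avg);
  for (size_t j = 0; j < NLIMITS; j++)
    printf (" n <= %zu: %5.1f%%", limits[j], pct[j]);
  putchar ('\n');
}
```

For lengths 0, 3, 10 and 100 the average is 28.25 although 75% of the calls
are at most 16 bytes, which is why the average alone says little about which
path is taken.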

> Also given the relatively small strings the generic version of stpcpy would 
> be quite competitive already (the generic version using strlen+memcpy was
> beating optimized strcpy/stpcpy implementations on several targets at the
> time I made the change). So I'm just not convinced stpcpy needs a lot more
> optimization.
>
On those targets you would probably want gcc to inline calls into
strlen+memcpy pairs when you optimize for speed. But it depends on whether a
new implementation is possible or not. Size wouldn't matter as they are
small.
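
For reference, a sketch of what a strlen+memcpy pair looks like for stpcpy
and strcpy. The names generic_stpcpy and generic_strcpy are hypothetical;
this is neither the glibc source nor what gcc would emit, only the shape of
the expansion being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic stpcpy: one pass to find the length, one to copy.  */
char *
generic_stpcpy (char *dest, const char *src)
{
  size_t len = strlen (src);
  /* Copy len + 1 bytes so the terminating NUL is included.  */
  memcpy (dest, src, len + 1);
  return dest + len;
}

/* strcpy differs only in the return value.  */
char *
generic_strcpy (char *dest, const char *src)
{
  return memcpy (dest, src, strlen (src) + 1);
}
```

The two functions do the same work and differ only in whether the start or
the end of the copy is returned.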

