This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH 4/*] Generic string memchr and strnlen

From: OndÅej BÃlka <neleai at seznam dot cz>
To: Chris Metcalf <cmetcalf at ezchip dot com>
Cc: Wilco Dijkstra <wdijkstr at arm dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
Date: Tue, 28 Jul 2015 01:22:24 +0200
Subject: Re: [PATCH 4/*] Generic string memchr and strnlen
Authentication-results: sourceware.org; auth=none
References: <003b01d0c622$dd96f530$98c4df90$ at com> <20150724153758 dot GA855 at domone> <003c01d0c62f$3440aa00$9cc1fe00$ at com> <20150724170425 dot GA10041 at domone> <003d01d0c63b$0080b730$01822590$ at com> <20150724191248 dot GA2889 at domone> <004201d0c873$bd3b9fe0$37b2dfa0$ at com> <20150727165642 dot GA22842 at domone> <55B67BA7 dot 7030606 at ezchip dot com>

On Mon, Jul 27, 2015 at 02:42:47PM -0400, Chris Metcalf wrote:
> On 07/27/2015 12:56 PM, OndÅej BÃlka wrote:
> >Best example is strrchr where almost all architectures got it wrong
> >first time until I tell them otherwise. Problem is that they calculate
> >position of byte at each iteration instead just at end which is
> >ineffective. Now tile and powerpc have this problem. My proposed
> >alternative
> >
> >return memrchr (s, c, strlen (s) + 1);
> >
> >would beat these on suitable inputs.
> 
> Your comment and your code snippet don't really match, so can I ask
> you to elucidate a little what your concern is?
> 
It is mix of two concerns, so its both First a snippet was there 
mainly to show that generic code could be faster. I mentioned that
it applies for some inputs, as it would need larger inputs than usual
to overcome initialization and call overhead of assembly one.

> First, are you suggesting that at a high level the right thing to do
> is to do strlen() and then search backwards from the end?  This will
> certainly be faster if there is an instance of the character near the
> end of the string: on tilegx, strlen is 0.375 cycles/byte on hot cache
> vs 0.625 cycles/byte for strrchr.  But if the character is closer to
> the beginning of the string, you have to pay for the strlen, and then
> pay again for scanning the string backwards, and that cost is higher
> than just scanning forward for NUL and 'c' at the same time.  It's
> actually a bit slower to scan backwards since you have to do a counted
> loop, plus examine the loaded words, so you end up running at 0.750
> cycles/byte in reverse, so on average you would need to find the
> character in the last third of the string to make up the difference.
> 
> This is what your strlen + memchrr code snippet suggested.
> 
I didn't say that a generic is optimal only that it avoids problem with
tile implementation on some inputs.

A problem is that your analysis is incomplete. It assumes that character
occurs rarely which doesn't have to be case. A forward scan could be
very expensive if character has probability 1/16 which causes branch
misprediction each second iteration.

With strlen and memrchr you will likely have at most two mispredictions 
when they find zero and character.


> Or, are you concerned about the forward implementation of strrchr for
> tilegx itself?  I reviewed the tilegx implementation of strrchr (in C
> but using our multibyte "v1cmpeq" vector builtins for speed), and the
> inner loop is just four instruction bundles, operating on eight bytes
> at a time (with a one-cycle stall after the ld to fetch from L1D$):
> 
> string/../sysdeps/tile/tilegx/strrchr.c:61
>   40:    { addi r10, r10, 8 ; bnez r11, 90 <__GI_strrchr+0x90> }
> string/../sysdeps/tile/tilegx/strrchr.c:64
>   48:    { ld r11, r10 }
> string/../sysdeps/tile/tilegx/strrchr.c:43
>   50:    { v1cmpeq r12, r11, r13 ; v1cmpeqi r11, r11, 0 }
> string/../sysdeps/tile/tilegx/strrchr.c:49
>   58:    { beqzt r12, 40 <__GI_strrchr+0x40> }
> 
> This is what your English comment suggested (quoted above).  Despite
> the appearance of the C code, the compiler knows better than to waste
> time computing the precise character position at every iteration hit.
> 
A problem that I described in english is that this isn't inner loop as
you jump from it each second iteration when c probability is 1/16.

For example if application uses strrchr to find last / in path name then 
its likely that it would be among last 8 characters and occurs
frequently previously.

Here you would need a conditional move instead of branch to avoid that
issue.


> So I'm not sure which point you are making but unless you know
> something about the average distribution of the characters in the
> strrchr() string to suggest they are likely to occur in the last third
> of the string more than 50% of the time, I don't think I'm convinced.
> 
This mainly depends on size of string, if you assume that each character
has fixed probability it happens for sufficiently large strings.

A tricky part of writing these is that inputs are short so you need to
write separate header to get better performance.

Follow-Ups:
- RE: [PATCH 4/*] Generic string memchr and strnlen
  - From: Wilco Dijkstra
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: Chris Metcalf

References:
- [PATCH 4/*] Generic string memchr and strnlen
  - From: Wilco Dijkstra
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: OndÅej BÃlka
- RE: [PATCH 4/*] Generic string memchr and strnlen
  - From: Wilco Dijkstra
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: OndÅej BÃlka
- RE: [PATCH 4/*] Generic string memchr and strnlen
  - From: Wilco Dijkstra
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: OndÅej BÃlka
- RE: [PATCH 4/*] Generic string memchr and strnlen
  - From: Wilco Dijkstra
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: OndÅej BÃlka
- Re: [PATCH 4/*] Generic string memchr and strnlen
  - From: Chris Metcalf

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]