This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/*] Generic string memchr and strnlen


On 07/27/2015 12:56 PM, OndÅej BÃlka wrote:
Best example is strrchr where almost all architectures got it wrong
first time until I tell them otherwise. Problem is that they calculate
position of byte at each iteration instead just at end which is
ineffective. Now tile and powerpc have this problem. My proposed
alternative

return memrchr (s, c, strlen (s) + 1);

would beat these on suitable inputs.

Your comment and your code snippet don't really match, so can I ask
you to elucidate a little what your concern is?

First, are you suggesting that at a high level the right thing to do
is to do strlen() and then search backwards from the end?  This will
certainly be faster if there is an instance of the character near the
end of the string: on tilegx, strlen is 0.375 cycles/byte on hot cache
vs 0.625 cycles/byte for strrchr.  But if the character is closer to
the beginning of the string, you have to pay for the strlen, and then
pay again for scanning the string backwards, and that cost is higher
than just scanning forward for NUL and 'c' at the same time.  It's
actually a bit slower to scan backwards since you have to do a counted
loop, plus examine the loaded words, so you end up running at 0.750
cycles/byte in reverse, so on average you would need to find the
character in the last third of the string to make up the difference.

This is what your strlen + memchrr code snippet suggested.

Or, are you concerned about the forward implementation of strrchr for
tilegx itself?  I reviewed the tilegx implementation of strrchr (in C
but using our multibyte "v1cmpeq" vector builtins for speed), and the
inner loop is just four instruction bundles, operating on eight bytes
at a time (with a one-cycle stall after the ld to fetch from L1D$):

string/../sysdeps/tile/tilegx/strrchr.c:61
  40:    { addi r10, r10, 8 ; bnez r11, 90 <__GI_strrchr+0x90> }
string/../sysdeps/tile/tilegx/strrchr.c:64
  48:    { ld r11, r10 }
string/../sysdeps/tile/tilegx/strrchr.c:43
  50:    { v1cmpeq r12, r11, r13 ; v1cmpeqi r11, r11, 0 }
string/../sysdeps/tile/tilegx/strrchr.c:49
  58:    { beqzt r12, 40 <__GI_strrchr+0x40> }

This is what your English comment suggested (quoted above).  Despite
the appearance of the C code, the compiler knows better than to waste
time computing the precise character position at every iteration hit.

So I'm not sure which point you are making but unless you know
something about the average distribution of the characters in the
strrchr() string to suggest they are likely to occur in the last third
of the string more than 50% of the time, I don't think I'm convinced.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]