This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH 4/*] Generic string memchr and strnlen

> OndÅej BÃlka wrote:
> On Mon, Jul 27, 2015 at 02:42:47PM -0400, Chris Metcalf wrote:
> > On 07/27/2015 12:56 PM, OndÅej BÃlka wrote:
> > >Best example is strrchr where almost all architectures got it wrong
> > >first time until I tell them otherwise. Problem is that they calculate
> > >position of byte at each iteration instead just at end which is
> > >ineffective. Now tile and powerpc have this problem. My proposed
> > >alternative
> > >
> > >return memrchr (s, c, strlen (s) + 1);
> > >
> > >would beat these on suitable inputs.
> >
> > Your comment and your code snippet don't really match, so can I ask
> > you to elucidate a little what your concern is?
> >
> It is mix of two concerns, so its both First a snippet was there
> mainly to show that generic code could be faster. I mentioned that
> it applies for some inputs, as it would need larger inputs than usual
> to overcome initialization and call overhead of assembly one.

I believe the above sequence is the right one to use in the generic code,
especially when you have an optimized memrchr that scans backwards
(simpler inner loop as you quit at first match). The current generic
implementation repeatedly calls strchr so will be very slow if there
are multiple matches.

> For example if application uses strrchr to find last / in path name then
> its likely that it would be among last 8 characters and occurs
> frequently previously.
> Here you would need a conditional move instead of branch to avoid that
> issue.

Agreed - the AArch64 implementation does that. It's hard to say what
is best if you don't have such an instruction - it'll depend on the
statistics of real inputs and the relative performance of strlen+partial 
backwards memrchr scan vs a full forward strrchr scan.

> > So I'm not sure which point you are making but unless you know
> > something about the average distribution of the characters in the
> > strrchr() string to suggest they are likely to occur in the last third
> > of the string more than 50% of the time, I don't think I'm convinced.

So what we really need is better statistics on strrchr and friends. Looking
at GLIBC sources, it seems that >90% search for '/' in a path. 

Do you have actual stats for strrchr OndÅej? That would really help solving
this issue. What I'd like to know is:

1. Average length of the strings
2. What percentage fails to match
3. Average number of matches per string if it matches at least once
4. Average relative position within string of last match

With that info we could create a simple patch for bench-strrchr.c to make
it use realistic inputs. Then based on that we can fix the generic code and
let maintainers further tune their optimized implementations if they do not
beat the generic code.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]