This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On 12/31/1969 05:00 PM, wrote: >> I use only SSE2 instructions and my algorithm is faster than using SSE4 instructions. >> With trivial modifications I could also use plain arithmetic/AVX2 I knew that when I first posted the twoway strstr implementation that it was not optimal for short searches, and I totally welcome these attempts to approve things. My biggest concern is the avoidance of quadratic effects (strstr should always be linear speed, as proven by the twoway algorithm, and memmem should be sublinear with large enough needles, by skipping over portions of the haystack). > The biggest problem I have with reviewing *any* of this code is that > performance is relative to benchmarks. > > As a community we have no baseline benchmark numbers. Precisely - the very reason that I have not tried to further optimize anything in glibc beyond my twoway string, and the reason the SSE4 quadratic pessimization even got in to glibc in the first place, is because we don't have a good benchmark. > > I would like to see someone from the community come forward to help > sort this out. > > I think at a minimum we need: > > * A list of baseline benchmarks we are going to use to evaluate > performance related submissions. > * Collect data on the runs. > * Record everything in the wiki including how to run the benchmarks > and the collection of growing results. Among other things, the benchmark must cover: short needle, short haystack, at all possible starting alignments of both strings, as well as alignment of the match (when present) as well as time to detect a miss short needle, long haystack where the match or miss is late in the haystack long needle short haystack must exit early rather than wasting time computing the end of the needle long needle long haystack must not be quadratic, even if the needle is highly repetitive > > Are either of you interested in helping the community develop some > metrics for acceptance of performance related patches? While I can't promise to spend time writing such metrics from scratch, I will gladly help review them, as I'm interested in backporting whatever speedups we have here back into gnulib. I still think that there are some arch-specific assembly code tweaks that can always beat a generic C implementation, but having a fast and reliable generic C implementation portable to all platforms is a worthy goal for gnulib. The other thing I would really like to see is a way to pre-compile a string search. That is, the regcomp()/regexec()/regfree() model was invented precisely because it is very common to search for a common needle across multiple haystacks. strstr() and memmem() must remain for the ease of use on a one-shot search, but they have the overhead of recomputing the setup on every invocation. I think it would be worth adding additional functions to glibc to make repeated searches faster (for all but the case of long needle short haystack, because the compilation phase doesn't know to abort early). Maybe: struct strmem_search_t { /* contents should be treated as opaque */ }; /* Populate psrch with compilation of a string needle. Possible to fail with errno set to ENOMEM. Needle must not be altered until a call to strmem_free. */ int strstr_comp(struct strmem_search_t *psrch, const char *needle); /* Populate psrch with compilation of a memory needle. Possible to fail with errno set to ENOMEM. Needle must not be altered until a call to strmem_free. */ int memmem_comp(struct strmem_search_t *psrch, const void *needle, size_t len); /* Use a compiled representation to search a string haystack. */ char *strstr_exec(const char *haystack, const struct strmem_search_t *psrch); /* Use a compiled representation to search a memory haystack. */ void *memmem_exec(const void *haystack, size_t len, const struct strmem_search_t *psrch); /* Free a compiled needle */ void strmem_free(struct strmem_search_t *psrch); -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |