This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.
Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Geoff Keating <geoffk@cygnus.com> writes: > I'd appreciate it if you could run some benchmarks. I'm not convinced > that adding an extra taken branch to the most common path in this > routine would be insignificant. OK, I benchmarked it, and the upshot is that the overhead is in the noise at approx 0.8%. I suspected this all along, but didn't feel like arguing, because it would just come across as whining. 8^) Let this be an instructive case regarding optimizing for micro-efficiencies. Even so, it was a fun diversion, though I would have better spent time the time elsewhere. I wrote the following program: ------------------------------------------------------------------------------ #include <stdio.h> #include <string.h> const char *words_0[] = { "Aarhus", "Aaron", ... 45398 intervening words from /usr/dict/words omitted ... "Zulus", "Zurich" }; int word_count = sizeof (words_0) / sizeof (*words_0); #define SPINS 1000 int main (int argc, char **argv) { const char **words_N = &words_0[word_count]; int hits = 0; int c = **++argv; int spins; for (spins = 0; spins < SPINS; spins++) { const char **words; for (words = words_0; words < words_N; words++) { const char *word = *words; while ((word = strchr (word, c))) { hits++; word++; } } } printf ("%d/%d hits for '%c'\n", hits, hits + word_count * SPINS, c); return 0; } ------------------------------------------------------------------------------ I ran it on a quiescent PPC Linux system: $ cat /proc/cpuinfo processor : 0 cpu : 604r clock : 300MHz revision : 1.0 bogomips : 209.31 zero pages : total: 0 (0Kb) current: 0 (0Kb) hits: 0/0 (0%) machine : Power Macintosh motherboard : AAPL,8500 MacRISC memory : 272MB I ran it looking for three different chars: 'e', 'q' and '1'. This is the output for the tree cases: 42413000/87815000 hits for 'e' 664000/46066000 hits for 'q' 0/45402000 hits for '1' In order to boost the signal/noise, I tossed the program's output during the test run. words-new uses my modified strchr and words-old is the original version. ------------------------------------------------------------------------------ $ for c in e q 1 > do > for v in old new > do > for run in 1 2 3 > do > /usr/bin/time -f '%C: %E real, %U user, %S sys, %P CPU' ./words-$v $c > done > done > echo '' > done >/dev/null ./words-old e: 0:14.19 real, 14.16 user, 0.03 sys, 100% CPU ./words-old e: 0:14.16 real, 14.15 user, 0.01 sys, 99% CPU ./words-old e: 0:14.15 real, 14.14 user, 0.02 sys, 100% CPU ./words-new e: 0:14.79 real, 14.77 user, 0.02 sys, 99% CPU ./words-new e: 0:14.78 real, 14.77 user, 0.01 sys, 99% CPU ./words-new e: 0:14.80 real, 14.78 user, 0.02 sys, 99% CPU ./words-old q: 0:09.90 real, 9.89 user, 0.02 sys, 100% CPU ./words-old q: 0:09.95 real, 9.95 user, 0.01 sys, 100% CPU ./words-old q: 0:09.93 real, 9.91 user, 0.02 sys, 99% CPU ./words-new q: 0:10.60 real, 10.58 user, 0.03 sys, 100% CPU ./words-new q: 0:10.65 real, 10.63 user, 0.02 sys, 99% CPU ./words-new q: 0:10.52 real, 10.50 user, 0.02 sys, 99% CPU ./words-old 1: 0:09.77 real, 9.75 user, 0.02 sys, 99% CPU ./words-old 1: 0:09.85 real, 9.84 user, 0.01 sys, 99% CPU ./words-old 1: 0:09.83 real, 9.81 user, 0.02 sys, 100% CPU ./words-new 1: 0:10.51 real, 10.50 user, 0.02 sys, 100% CPU ./words-new 1: 0:10.50 real, 10.50 user, 0.00 sys, 99% CPU ./words-new 1: 0:10.50 real, 10.48 user, 0.02 sys, 99% CPU ------------------------------------------------------------------------------ Notice that runtime for the 'e' case over the 'q' was only approx 50% longer for almost 100% more calls to strchr. This is the result of some combination of test harness overhead and data-cache effects. My guess is that the latter predominates. Geoff, does this satisfy you? If not, I think I could do better register assignment and eliminate the extra branch, but I don't want to spend even one more minute on this. IMO, it's a waste of time. Greg
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |