This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: PATCH: powerpc/strchr.S for BPs

To: Geoff Keating <geoffk at cygnus dot com>
Subject: Re: PATCH: powerpc/strchr.S for BPs
From: Greg McGary <greg at mcgary dot org>
Date: 23 Jul 2000 02:08:47 -0700
Cc: libc-hacker at sourceware dot cygnus dot com
References: <200007220836.BAA25479@kayak.mcgary.org> <200007230357.UAA00545@localhost.cygnus.com>

Geoff Keating <geoffk@cygnus.com> writes:

> I'd appreciate it if you could run some benchmarks.  I'm not convinced
> that adding an extra taken branch to the most common path in this
> routine would be insignificant.

OK, I benchmarked it, and the upshot is that the overhead is in the
noise at approx 0.8%.  I suspected this all along, but didn't feel
like arguing, because it would just come across as whining. 8^)
Let this be an instructive case regarding optimizing for micro-efficiencies.

Even so, it was a fun diversion, though I would have better spent time
the time elsewhere.

I wrote the following program:

------------------------------------------------------------------------------
#include <stdio.h>
#include <string.h>

const char *words_0[] = {
  "Aarhus",
  "Aaron",
	... 45398 intervening words from /usr/dict/words omitted ...
  "Zulus",
  "Zurich"
};

int word_count = sizeof (words_0) / sizeof (*words_0);

#define SPINS 1000

int
main (int argc, char **argv)
{
  const char **words_N = &words_0[word_count];
  int hits = 0;
  int c = **++argv;
  int spins;

  for (spins = 0; spins < SPINS; spins++)
    {
      const char **words;
      for (words = words_0; words < words_N; words++)
	{
	  const char *word = *words;
	  while ((word = strchr (word, c)))
	    {
	      hits++;
	      word++;
	    }
	}
    }
  printf ("%d/%d hits for '%c'\n", hits, hits + word_count * SPINS, c);

  return 0;
}
------------------------------------------------------------------------------

I ran it on a quiescent PPC Linux system:

$ cat /proc/cpuinfo
processor       : 0
cpu             : 604r
clock           : 300MHz
revision        : 1.0
bogomips        : 209.31
zero pages      : total: 0 (0Kb) current: 0 (0Kb) hits: 0/0 (0%)
machine         : Power Macintosh
motherboard     : AAPL,8500 MacRISC
memory          : 272MB

I ran it looking for three different chars: 'e', 'q' and '1'.
This is the output for the tree cases:

42413000/87815000 hits for 'e'
664000/46066000 hits for 'q'
0/45402000 hits for '1'

In order to boost the signal/noise, I tossed the program's output
during the test run.

words-new uses my modified strchr and words-old is the original version.

------------------------------------------------------------------------------
$ for c in e q 1
> do
> for v in old new
> do
> for run in 1 2 3
> do
> /usr/bin/time -f '%C: %E real, %U user, %S sys, %P CPU' ./words-$v $c
> done
> done
> echo ''
> done >/dev/null
./words-old e: 0:14.19 real, 14.16 user, 0.03 sys, 100% CPU
./words-old e: 0:14.16 real, 14.15 user, 0.01 sys, 99% CPU
./words-old e: 0:14.15 real, 14.14 user, 0.02 sys, 100% CPU
./words-new e: 0:14.79 real, 14.77 user, 0.02 sys, 99% CPU
./words-new e: 0:14.78 real, 14.77 user, 0.01 sys, 99% CPU
./words-new e: 0:14.80 real, 14.78 user, 0.02 sys, 99% CPU

./words-old q: 0:09.90 real, 9.89 user, 0.02 sys, 100% CPU
./words-old q: 0:09.95 real, 9.95 user, 0.01 sys, 100% CPU
./words-old q: 0:09.93 real, 9.91 user, 0.02 sys, 99% CPU
./words-new q: 0:10.60 real, 10.58 user, 0.03 sys, 100% CPU
./words-new q: 0:10.65 real, 10.63 user, 0.02 sys, 99% CPU
./words-new q: 0:10.52 real, 10.50 user, 0.02 sys, 99% CPU

./words-old 1: 0:09.77 real, 9.75 user, 0.02 sys, 99% CPU
./words-old 1: 0:09.85 real, 9.84 user, 0.01 sys, 99% CPU
./words-old 1: 0:09.83 real, 9.81 user, 0.02 sys, 100% CPU
./words-new 1: 0:10.51 real, 10.50 user, 0.02 sys, 100% CPU
./words-new 1: 0:10.50 real, 10.50 user, 0.00 sys, 99% CPU
./words-new 1: 0:10.50 real, 10.48 user, 0.02 sys, 99% CPU

------------------------------------------------------------------------------

Notice that runtime for the 'e' case over the 'q' was only approx 50%
longer for almost 100% more calls to strchr.  This is the result of
some combination of test harness overhead and data-cache effects.  My
guess is that the latter predominates.

Geoff, does this satisfy you?  If not, I think I could do better
register assignment and eliminate the extra branch, but I don't want
to spend even one more minute on this.  IMO, it's a waste of time.

Greg

Follow-Ups:
- Re: PATCH: powerpc/strchr.S for BPs
  - From: Greg McGary
- Re: PATCH: powerpc/strchr.S for BPs
  - From: Greg McGary

References:
- PATCH: powerpc/strchr.S for BPs
  - From: Greg McGary

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]