This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

regexp profiling hell....



I compiled -O2 -g.  I gave up on looking at profiling output & just
started hacking out pieces of scm_regexp_exec one at a time to see
where the time was spent.  (BTW, this is under linux, rh4.2, gcc
2.7.2.1 & on a pentium II 300.  My other tests were on a similar linux
setup but on a pentium 120).

Here're the results:

guile-test1.scm:

   (debug-disable 'debug)
   (define r (make-regexp "^r"))
   (do ((i 0 (1+ i))) ((>= i 300000)) (regexp-exec r "asdr"))

guile-test2.scm:

   (debug-disable 'debug)
   (define r (make-regexp "^r"))
   (define (f a b) #f)
   (do ((i 0 (1+ i))) ((>= i 300000)) (f r "asdr"))

guile-test4.scm:
   (debug-disable 'debug)
   (define r (make-regexp "^r"))
   (do ((i 0 (1+ i))) ((>= i 300000)))


Base case:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -c '(exit)'
   0.35user 0.24system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (140major+108minor)pagefaults 0swaps

We've got a .6 second startup & exit time.

Empty case:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test4.scm
   2.12user 0.27system 0:02.38elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+111minor)pagefaults 0swaps
   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test4.scm

This is the best we can hope for - 2.38 seconds elapsed.

Minimal function case:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test2.scm
   3.39user 0.30system 0:03.68elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+111minor)pagefaults 0swaps
   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test2.scm

This is the best we can reasonably expect - 3.68 seconds elapsed.
That one defun + executing it 300,000 times costs 1.3 seconds.

How much the work really costs:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test1.scm
   8.92user 0.28system 0:09.21elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (153major+113minor)pagefaults 0swaps

We're 5.5 seconds over the minimal function case, so it looks like
regexp-exec itself costs 5.5 seconds.

How much of this is overhead:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex -s guile-test1.scm
   3.81user 0.31system 0:04.11elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+110minor)pagefaults 0swaps

I commented out the call to regexec & ran it again.  Comparing this to
the normal regexp case shows that the system regexec is taking most of
the time - 5.1 seconds, and scm_regexp_exec has .43 seconds of
overhead.

How a different regexp behaves in this test:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.gawk -s guile-test1.scm
   5.13user 0.25system 0:05.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (141major+109minor)pagefaults 0swaps

This is with gawk's regexp.c compiled in.  It's 1.29 over the case
with regexec commented out, so for this test, gawk's regexp takes 1.29
seconds vs the system regexp which takes 5.1 seconds.

Cutting back the overhead:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex.nomalloc -s guile-test1.scm
   3.54user 0.30system 0:03.83elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+110minor)pagefaults 0swaps

This is with the regexec call & the calls to scm_must_malloc &
scm_must_free commented out.  This cuts out .18 seconds.

Cutting back more overhead:


   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex.nomalloc.nocoerce -s guile-test1.scm
   3.48user 0.30system 0:03.77elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+110minor)pagefaults 0swaps

As above with SCM_COERCE_SUBSTR removed.

Everything out:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.nothing -s guile-test1.scm
   3.47user 0.32system 0:03.78elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (149major+110minor)pagefaults 0swaps

Everything except for variable declarations & return commented out
from scm_regexp_exec.  This is the real minimal case, and it's
slightly slower than the trival fcn case - .1 seconds.  Evidentally
dispatching regexp_exec is measurably (but slightly) slower than
running a trivial fcn.  Maybe because of the 2 optional arguments?  In
any case, comparing this to the case with the call to regexec
commented out gives the real measure of overhead, namely .33 seconds -
3% of the time.

gawk version:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time gawk 'BEGIN { for (i=0; i<300000; i++) "asdr" ~ /^r/; }' </dev/null
   0.80user 0.01system 0:00.80elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (108major+13minor)pagefaults 0swaps


The equivalent of test1 for STk, namely:

   (define r (string->regexp "^r"))
   (do ((i 0 (1+ i))) ((>= i 300000)) (r "asdr"))

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test1.scm
   3.16user 0.01system 0:03.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (114major+47minor)pagefaults 0swaps

This is lower than guile's with the body of scm_regexp_exec commented
out, and is even faster than this minus the guile startup time (albet
just barely).

STk with the trivial fcn:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test2.scm
   3.05user 0.01system 0:03.05elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (114major+47minor)pagefaults 0swaps

Looks like STk's regexp package is faster than gawk's!

STk with empty do loop:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test3.scm
   2.43user 0.01system 0:02.44elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (114major+47minor)pagefaults 0swaps
   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test3.scm

Now somehow STk is slower than the corresponding guile test!  I'd
guess that guile did 1 less gc call when removing the body of the do
loop, and STk didn't.  Just goes to show the difficulty of
benchmarking...

I'd like to compare to scm, but the ... person who built the scm rpm
didn't seem to include the regex feature.

Timing scm on the empty fcn, i.e. - 

   (define (f x y) #f)
   (do ((i 0 (1+ i))) ((>= i 300000)) (f 3 "asdr"))

yields:

   hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time scm -f scm-test1.scm 
   1.48user 0.00system 0:01.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (100major+69minor)pagefaults 0swaps

This is substantially faster than guile even with startup time deleted.

Conclusions:

1. The biggest help would be to use gawk's regexp.c instead of the system
regexp calls, at least for this particular regexec call.  Regular
expression testing is complicated, so one would have to do many more
tests for a variety of arguments (and a variety of platforms) before
being certain about it.

2. Overhead in the regexp_exec call is significant, but not
substantial.  Two things could be done to reduce it.  The easiest
would be to malloc the space needed for matching in the rgx
structure.  Then the calls to scm_must_malloc & scm_must_free could be
skipped.  The other would be to skip the call to SCM_COERCE_SUBSTR, if
that's possible.

3. Even with these changes, deleting startup time, and commenting
everything out of scm_regexp_exec, gawk is still fundamentally faster
gawk runtime is .8 seconds vs 3.19 seconds for guile - 4x slower!
Guile's also slower than STk & scm, even though it's descended from
scm.  In particular, scm manages to run the trivial function loop
about 2x faster than guile - startup time.  Has the interpreter been
changed so much that it's substantially slower now?  Another
possibility is that guile's sucking in so much stuff initially that
the gc scans of memory are killing us, but I don't see an easy way to
check this.

I think the fundamental questions are really wrt points 3 & 4 - Why is
guile so much slower than gawk, STk & scm?  I can think of 2 things
wrt gawk - gawk doesn't gc (i.e. - is just 1 big memory leak), and
guile's interpreter is just slow.  This would be in line with the
comparisons to STk & scm, although, again, it could be the gc scanning
of all that stuff that's scarfed up at the beginning.

This leaves two questions:

 1. How much does guile's eval differ from scm's?
 2. How much more time is guile spending in gc because it's initially
    sucking up so much code?

If the problem really is in 2, then it means that either we need a
generational gc for guile (hard), or it *must* ship with compiled init
files (not hard).  The latter would also help startup time & would
make the stuff in the init files run faster.  I really don't
understand the resistance to doing this.  I know there are things that
*should* be done with the startup files that would speed things up,
but why not just compile them anyway?  At least people wouldn't have
to live through the startup times waiting for the day when things are
cleaned up.

A general testing issue - Doing benchmarking is a pain because of the
need to include (debug-disable 'debug) at the top of files means I
need different files for other scheme implementations.  How about also
-nodebug & -debug command line args & make the (configurable) default
-nodebug?

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il