Bug 2776 - Strange profiling results
Summary: Strange profiling results
Status: WAITING
Alias: None
Product: binutils
Classification: Unclassified
Component: gprof (show other bugs)
Version: 2.17
: P2 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-06-15 10:15 UTC by Dmitry Antipov
Modified: 2006-06-22 14:55 UTC (History)
1 user (show)

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Antipov 2006-06-15 10:15:47 UTC
I've observed a strange results while profiling the following program:

$ cat test.c
#include <sys/time.h>

#define rdtscll(val) asm volatile("rdtsc" : "=A" (val))

struct timeval tv;
unsigned long long t;

void f (int n) 
{
  int x, y, z = 0;

  for (x = 0; x < n; x++)
    for (y = 0; y < n; y++) {
      z += x * y;
      rdtscll (t);
    }
}

void g (int n) 
{
  int x, y, z = 0;

  for (x = 0; x < n; x++)
    for (y = 0; y < n; y++) {
      z += x * y;
      gettimeofday (&tv, 0);
    }
}

int main (int argc, char *argv[])
{ 
  int n;

  for (n = 0; n < 200; n++) {
    if (n % 2)
      f (n);
    else
      g (n);
  }
  return 0;
}

My typical results are:

$ gcc -O2 -g -pg -o test test.c                        
$ /usr/bin/time ./test                               
0.13user 3.67system 0:03.80elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+107minor)pagefaults 0swaps
$ gprof test | head -n 10                            
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 95.85      0.40     0.40      100     4.03     4.03  f
  4.79      0.42     0.02      100     0.20     0.20  g

 %         the percentage of the total running time of the
time       program used by this function.

I.e. g() is much faster than f(). But this means that raw CPU insn (rdtsc) is
slower than kernel syscall wrapped by C library function (gettimeofday()), which
is just impossible. Let's split the program above and test f() and g() separately:

$ cat test1.c
#define rdtscll(val) asm volatile("rdtsc" : "=A" (val))

unsigned long long t;

void f (int n) 
{
  int x, y, z = 0;

  for (x = 0; x < n; x++)
    for (y = 0; y < n; y++) {
      z += x * y;
      rdtscll (t);
    }
}

int main (int argc, char *argv[])
{ 
  int n;

  for (n = 0; n < 200; n++)
      f (n);
  return 0;
}
$ gcc -O2 -g -pg -o test1 test1.c                      
$ /usr/bin/time ./test1
0.08user 0.00system 0:00.09elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
$ 
$ cat test2.c                                          
#include <sys/time.h>

struct timeval tv;

void g (int n) 
{
  int x, y, z = 0;

  for (x = 0; x < n; x++)
    for (y = 0; y < n; y++) {
      z += x * y;
      gettimeofday (&tv, 0);
    }
}

int main (int argc, char *argv[])
{ 
  int n;

  for (n = 0; n < 200; n++)
      g (n);
  return 0;
}
$ gcc -O2 -g -pg -o test2 test2.c                        
$ /usr/bin/time ./test2                                
0.23user 7.37system 0:07.61elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps

These examples obviously shows that gprof results are very strange at least. Can
anyone explain them ?
Comment 1 Nick Clifton 2006-06-22 14:03:17 UTC
Subject: Re:  New: Strange profiling results

Hi Dmitry,

> void f (int n) 
> {
>       rdtscll (t);

> void g (int n) 
> {
>       gettimeofday (&tv, 0);

> int main (int argc, char *argv[])
> { 
>   int n;
> 
>   for (n = 0; n < 200; n++) {
>     if (n % 2)
>       f (n);
>     else
>       g (n);

> My typical results are:
> 
> $ gcc -O2 -g -pg -o test test.c                        
> $ gprof test | head -n 10                            
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  95.85      0.40     0.40      100     4.03     4.03  f
>   4.79      0.42     0.02      100     0.20     0.20  g

Have you checked the assembler output of the compiler ?  Perhaps it is 
being clever.  Maybe it knows that all but the calls to gettimeofday are 
redundant ?

What about main(), is any time spent in that function ?

Also it appears that the two percentages above add up to more than 100%.

The most likely though is that bad profiling data is being generated. 
ie it is not gprof's fault but either gcc's (for not inserting the calls 
to the profiling hooks correctly) or else the run time C library's (for 
not implementing the profiling hooks correctly).  Check you version of 
gcc.  If it is an old one, then try updating it.  If it is a new one, 
then try using an older one and see if the problem goes away.

Cheers
   Nick

Comment 2 Dmitry Antipov 2006-06-22 14:55:14 UTC
(In reply to comment #1)

> Have you checked the assembler output of the compiler ?  Perhaps it is 
> being clever.  Maybe it knows that all but the calls to gettimeofday are 
> redundant ?

Strace shows a lot of gettimeofday() calls during execution of the first test
program above.

Moreover, I've checked an annotated assemler output produced with
'-fverbose-asm' by gcc 4.0.2 and  gcc 4.1.1 and found that even with '-O3', the
code is assembled 'as is', i.e.
at least:
 - rdtsc and gettimeofday() aren't eliminated
 - loops aren't unrolled

> What about main(), is any time spent in that function ?

Flat profile doesn't include  data for main(), i.e. it  contains f() and g()
only. But the call graph shows something for main() too, for example:

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    0.50                 main [1]
                0.49    0.00     100/100         f [2]
                0.01    0.00     100/100         g [3]
-----------------------------------------------
                0.49    0.00     100/100         main [1]
[2]     98.0    0.49    0.00     100         f [2]
-----------------------------------------------
                0.01    0.00     100/100         main [1]
[3]      2.0    0.01    0.00     100         g [3]
 
Assembler output says that mcount() is called from main(), so main() should be
accounted too. On the other side, it's possible that main() is just too fast
even for 1 profiling sample (all SIGPROFs was handled when we're in f() or g()
?). I believe this is another question for you - why main() isn't included into
flat profile :-) ?
 
> Also it appears that the two percentages above add up to more than 100%.

Yes, this is strange. 
 
> The most likely though is that bad profiling data is being generated. 
> ie it is not gprof's fault but either gcc's (for not inserting the calls 
> to the profiling hooks correctly) or else the run time C library's (for 
> not implementing the profiling hooks correctly).  Check you version of 
> gcc.  If it is an old one, then try updating it.  If it is a new one, 
> then try using an older one and see if the problem goes away.

I'll try a few versions of older gcc, starting from gcc 3.2...