This is the mail archive of the
mailing list for the Cygwin project.
Re: Is multithreaded profiling on cygwin possible?
- From: Brian Ford <ford at vss dot fsi dot com>
- To: cygwin at cygwin dot com
- Date: Tue, 14 Oct 2003 18:05:50 -0500 (CDT)
- Subject: Re: Is multithreaded profiling on cygwin possible?
- References: <email@example.com>
Sorry for the delay. I've been swamped, both with the setup issue and
work. I haven't had a chance to look at the actual patch you sent yet.
On Tue, 14 Oct 2003, peter garrone wrote:
> A list of active threads is maintained. A thread calling moncontrol(1) gets
> put in the list. When a call to SuspendThread fails, the thread is assumed
> to be defunct and taken off the list.
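
The list handling described in the quoted text could look roughly like the sketch that follows. It is portable C rather than real Win32 code: `thread_id`, `suspend_fn`, `thread_list_add`, `thread_list_suspend_all` and `demo_suspend` are hypothetical names introduced here, and the callback only mimics the way SuspendThread reports failure by returning (DWORD)-1. A real profiler would also have to resume each thread it suspended.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for a thread handle (a HANDLE on Windows). */
typedef int thread_id;

/* Callback that tries to suspend a thread. It returns -1 on failure,
   mirroring the (DWORD)-1 failure value of SuspendThread. */
typedef long (*suspend_fn)(thread_id);

struct thread_list {
    thread_id ids[MAX_THREADS];
    size_t count;
};

/* Called on behalf of a thread that invoked moncontrol(1).
   Returns 0 on success or if already listed, -1 if the list is full. */
int thread_list_add(struct thread_list *tl, thread_id id)
{
    assert(tl != NULL);
    for (size_t i = 0; i < tl->count; i++)
        if (tl->ids[i] == id)
            return 0;
    if (tl->count == MAX_THREADS)
        return -1;
    tl->ids[tl->count++] = id;
    return 0;
}

/* Try to suspend every listed thread. A thread whose suspend fails is
   assumed defunct and dropped. Returns the number of threads kept. */
size_t thread_list_suspend_all(struct thread_list *tl, suspend_fn suspend)
{
    assert(tl != NULL && suspend != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < tl->count; i++) {
        if (suspend(tl->ids[i]) != -1)
            tl->ids[kept++] = tl->ids[i];
    }
    tl->count = kept;
    return kept;
}

/* Demo stub only: pretends that threads with odd ids have exited. */
long demo_suspend(thread_id id)
{
    return (id % 2 != 0) ? -1 : 0;
}
```

Compacting the array in place keeps the list dense, so the sampling loop never has to skip dead entries.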
I guess I was originally thinking "just profile all threads all the
time", but I guess your method is more flexible. That could
probably be an option somehow after you polish this up.
> One of the fields in the thread is a counter corresponding to the sum of cpu
> returned by GetThreadTimes. This function has fields corresponding to
> kernel cpu and user cpu. The amount of time consumed by every thread is
> Generally only one thread will have consumed CPU. However to be general,
> and in case the profiling thread is inadvertently delayed, all threads are
> There is a partial tick problem. Suppose that a thread has consumed say
> 155% of the cpu time corresponding to a tick. I would assign one tick
> and use a local random number generator to assign an extra tick on
> average 55% of the time.
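
One possible reading of the partial-tick scheme in the quoted text is probabilistic rounding: assign the whole number of ticks, then add one more with probability equal to the leftover fraction. The sketch is an interpretation, not the code from the patch. The names `ticks_for`, `ticks_for_random` and `lcg_uniform` are hypothetical, and the generator is an ordinary linear congruential one, chosen only so the profiler does not touch the application's rand() state.

```c
#include <assert.h>
#include <stdint.h>

/* Private generator state, so the application's rand() is untouched. */
static uint32_t lcg_state = 1u;

/* Linear congruential generator; returns a value in [0, 1). */
double lcg_uniform(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return (double)lcg_state / 4294967296.0;
}

/* cpu_used and tick_len are in the same unit; u is uniform in [0, 1).
   Example: cpu_used is 155% of tick_len, so one whole tick is assigned
   and a second one is added whenever u < 0.55. */
unsigned ticks_for(double cpu_used, double tick_len, double u)
{
    assert(cpu_used >= 0.0 && tick_len > 0.0);
    double ratio = cpu_used / tick_len;
    unsigned whole = (unsigned)ratio;          /* full ticks            */
    double frac = ratio - (double)whole;       /* leftover partial tick */
    return whole + (u < frac ? 1u : 0u);
}

/* Convenience wrapper that draws u from the private generator. */
unsigned ticks_for_random(double cpu_used, double tick_len)
{
    return ticks_for(cpu_used, tick_len, lcg_uniform());
}
```

Over many samples the average number of ticks assigned approaches the true ratio (1.55 in the example), so the partial tick is neither always lost nor always rounded up.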
I'm getting lost here.
What is your tick definition? A sampling interval?
How can a thread ever consume more than 100%? I can see how two or more
threads might on a multi-CPU system.
I'm lost in the random number generator application too.
> I tried getting the program counter for all threads, but this was found
> not to work very well, consuming excessive cpu, on average 50 milliseconds.
I thought there might be an overhead issue.
> All the other calls were of the order of 1 microsecond. However getting
> the program counter only for any thread that used cpu according to
> GetThreadTimes appeared to take about 50 microseconds.
> Generally of course only one thread will have used CPU. The function
> GetThreadContext is used to obtain the PC.
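
The filtering step described in the quoted text (fetch the PC only for threads whose CPU time has advanced) can be sketched in portable C. The kernel and user values are assumed to have been read elsewhere, for instance from GetThreadTimes, which reports them in 100-nanosecond units. The names `thread_cpu`, `cpu_delta` and `threads_needing_pc` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-thread record: kernel + user CPU seen at the previous sample,
   in 100 ns units. */
struct thread_cpu {
    uint64_t last_total;
};

/* Returns the CPU consumed since the previous sample and updates
   the record. */
uint64_t cpu_delta(struct thread_cpu *t, uint64_t kernel_now,
                   uint64_t user_now)
{
    assert(t != NULL);
    uint64_t total = kernel_now + user_now;
    uint64_t delta = total - t->last_total;
    t->last_total = total;
    return delta;
}

/* Writes to `out` the indices of threads that used CPU since the last
   sample and returns how many there are. Only these threads would need
   the comparatively expensive program-counter lookup. */
size_t threads_needing_pc(struct thread_cpu *threads,
                          const uint64_t *kernel, const uint64_t *user,
                          size_t n, size_t *out)
{
    assert(threads != NULL && kernel != NULL && user != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (cpu_delta(&threads[i], kernel[i], user[i]) > 0)
            out[found++] = i;
    }
    return found;
}
```

Since usually only one thread will have run during an interval, the expensive call is made about once per sample instead of once per thread.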
That doesn't sound too bad.
> Brian Ford wrote:
> > I tried using a backtrace method to map the sampling time onto
> > DLL leaf functions (the import stubs) once, but it did not seem possible
> > to perfect. Also, that is not always what you want.
> I would be interested if you would expand on this. Do you mean looking at
> the stack to find the calling function?
Yes. All the way back into the application address space, and then
munging the address to assign it to the import stub. Calls into the
Microsoft DLLs don't have frame pointer info, so the backtrace is
difficult, if not impossible. I did have some success, though.
> > But, if you want this to be useful for the community at large, attacking
> > the two points in the previous email directly would probably be more
> > useful, i.e., figure out a way to store the samples using a
> > non-contiguous address space model, and modify gprof to load the symbol
> > tables for the dependent DLLs (gdb does this to some extent). Note that
> > UNIX shared libraries have similar issues. You may want to consult with
> > firstname.lastname@example.org for a general solution since they "own" gprof.
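
One way to picture a non-contiguous address space model is a small table of address ranges, each with its own histogram, instead of a single histogram spanning everything between the lowest and highest address. The sketch is illustrative only and is not how gprof or gmon.out work today. The names `region` and `record_sample`, the 16-bit counters and the power-of-two bucket size are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One profiled address range (for example the executable or one DLL). */
struct region {
    uintptr_t low, high;     /* covers [low, high)               */
    unsigned scale_shift;    /* bucket size is 1 << scale_shift  */
    uint16_t *buckets;       /* per-bucket sample counters       */
    size_t nbuckets;
};

/* Adds one sample for `pc` to whichever region contains it.
   Returns the region index, or -1 if no region covers the address. */
int record_sample(struct region *regions, size_t nregions, uintptr_t pc)
{
    assert(regions != NULL);
    for (size_t i = 0; i < nregions; i++) {
        struct region *r = &regions[i];
        if (pc >= r->low && pc < r->high) {
            size_t idx = (size_t)((pc - r->low) >> r->scale_shift);
            /* Saturate rather than wrap when a counter is full. */
            if (idx < r->nbuckets && r->buckets[idx] != UINT16_MAX)
                r->buckets[idx]++;
            return (int)i;
        }
    }
    return -1;
}
```

Memory use then scales with the total size of the profiled modules, not with the distance between them in the address space.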
> I am thinking of implementing a separate profil call so that it can be used
> simultaneously with -pg compilation and linking. Also a "profile-dll" call
> so that profiling of the space occupied by a dll would occur.
> My problem with profiling the entire dll address space is
> 1) The necessity of recompiling DLLs so that mapping and call counting
> is implemented
If you want call counts, there is never a way around that easily.
> 2) The difficulty of doing anything with proprietary DLLs
> 3) The size and sparsity of the resulting gmon.out data file.
It really needs a different algorithm. Maybe simply multiple gmon.outs in
I have also seen just a recording algorithm without the hash that
stops when the buffer supplied is full. That has limited use.
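
The plain recording approach mentioned here, with no hash and a hard stop when the caller's buffer fills up, is simple enough to sketch in a few lines. The names `sample_log` and `sample_log_record` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied buffer of raw program-counter samples. */
struct sample_log {
    uintptr_t *pcs;
    size_t capacity;
    size_t used;
};

/* Appends one sample. Returns 1 if it was stored, 0 if the buffer is
   already full, at which point recording effectively stops. */
int sample_log_record(struct sample_log *log, uintptr_t pc)
{
    assert(log != NULL && log->pcs != NULL);
    if (log->used == log->capacity)
        return 0;
    log->pcs[log->used++] = pc;
    return 1;
}
```

Every sample costs one slot regardless of how often the same address repeats, which is why the run length is bounded by the buffer size and the approach has limited use.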
> So I thought I would try attacking the problem using the import libraries.
> Perhaps it is a silly idea, but if it could be made to work it avoids
> these problems.
I think it is a good idea. I just don't understand or "see" the details
yet. Too bad this method wouldn't help other shared library platforms,
though. (No import libraries.) Well, maybe it could. You could probably
make the stub libs automatically and have them load the shared libs. Not
sure of the details, again, though.
> If I can get it to work, I'll be back.
Please do (come back with your results, that is). I'm definitely
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems