Cygwin multithreading performance

Mark Geisert mark@maxrnd.com
Sat Nov 21 09:21:00 GMT 2015


Kacper Michajlow wrote:
> Thanks for the reply, and sorry for not being specific enough before.
> 'git gc' is a driver which runs various git commands to do cleanup in
> a repository. Though I'm mostly concerned about the code I linked.
> Instead of 'git gc' it is better to test 'git repack -a -f' directly,
> preferably on a repository where it takes some time.
> 'git://sourceware.org/git/newlib-cygwin.git' is a good test case.
> Although with bigger repositories the performance hit is bigger, this
> is a good example to see what's going on.

I appreciate the more specific info on how you experience the issue.

> I'm well aware that forking on Windows is problematic, but I'm
> explicitly interested in the parallelized part of execution. I don't
> care about forks; while they slow things down too, they are not used
> in the compression process, which is parallelized over all the CPU
> threads. Each command is indeed forked, but I'm only interested in the
> pack-objects part, hence the code I linked.

OK, we're on the same page now :).

> $ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> Counting objects: 156690, done.
> Delta compression using up to 12 threads.
> Compressing objects: 100% (154730/154730), done.
> Writing objects: 100% (156690/156690), done.
> Total 156690 (delta 123449), reused 33146 (delta 0)
>
> $ grep "fork(" git.strace
>    559   53728 [main] git 24340 fork: 24368 = fork()
>    465   54022 [main] git 24368 fork: 0 = fork()
>
> Only two forks were created, while during compression only 25% CPU was
> used (on a big repo like the Linux kernel it doesn't exceed 8%). With
> native git the same workload easily uses 95-100% CPU and therefore is
> a lot faster.

I was able to reproduce your issue using a cloned newlib-cygwin repo.
On a 6-CPU machine I saw at most 36% CPU utilization during the
compression phase.  Process Explorer showed that all 6 threads were
getting CPU time (to varying degrees), and whenever they were suspended
they were trying to acquire a mutex.  I'd like to run some more straces
and perhaps investigate with some other tools before saying more.  This
may take a while.

What I've done so far is install the git-debuginfo and cygwin-debuginfo
packages so that I can convert hex RIP addresses to line numbers.  I've
run the testcase under gdb so I can interrupt at random times and poke 
around.  The straces from this testcase are ginormous so I hope I can 
figure out a better way to see why the compression threads aren't 
CPU-bound like they should be.  If you don't already know, 'strace 
--help' shows the available mask values.  The threads are each writing 
to disk, so I wonder if there's some unintentional serialization going 
on somewhere, but I don't know yet how I could verify that theory.
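
One possible way to make a huge strace log more digestible is to count
how many trace lines each thread produced.  The following is only a
minimal sketch: it assumes every line carries the thread name in square
brackets, as in the two fork lines quoted above, and the function name
count_by_thread is made up for illustration.

```shell
# Hypothetical helper: count strace lines per thread tag.
# Assumes the first "[...]" on each line is the thread name, as in
#   559   53728 [main] git 24340 fork: 24368 = fork()
count_by_thread() {
    awk '
        # Pull out the first bracketed token; tags may contain spaces.
        match($0, /\[[^]]*\]/) {
            tag = substr($0, RSTART + 1, RLENGTH - 2)
            n[tag]++
        }
        END {
            for (t in n)
                printf "%d %s\n", n[t], t
        }
    ' "$1" | sort -rn    # busiest thread tags first
}
```

Running 'count_by_thread git.strace' would then list one line per
thread tag, most active first.  A lopsided distribution might hint at
where to look, though by itself it proves nothing about serialization.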

..mark




