This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



Re: gcc development schedule [Re: sharing libcpp between GDB and GCC]


On Thu, Mar 28, 2002 at 10:22:49AM +0100, Gerald Pfeifer wrote:

> On Wed, 27 Mar 2002, Zack Weinberg wrote:
> >>> (E.g. it takes about 10x longer to do "cvs update" on the 3.0
> >>> branch than the trunk.)
> >> Yeah, what's up with that?  (I thought it was just me.)

The cvs server on sourceware uses an optimization (a patch written
by Ian Lance Taylor) to cache the information needed for a cvs
update in a single file per directory.  These are the CVS/fileattr
files--you know, the ones that get out of date once every few months
and need to be blown away.  One of them in the gcc dir looks like

F.cvsignore     _head=1.9;_expand=
F.gdbinit       _head=1.6;_attic=;_expand=
FABOUT-GCC-NLS  _head=1.5;_expand=
FABOUT-NLS      _head=1.3;_expand=
FCOPYING        _head=1.4;_expand=
FCOPYING.LIB    _head=1.4;_expand=
FChangeLog      _head=1.13542;_expand=
FChangeLog.0    _head=1.12;_expand=
FChangeLog.1    _head=1.6;_expand=

So the cvs server doesn't have to open every RCS file in the
directory to find its head revision.  A do-nothing cvs update goes
much, much faster.
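
To make the shortcut concrete, here is a rough sketch in Python of reading
head revisions out of lines like the ones listed above. It is only an
illustration and not the code in the patch: the function names are made
up, and it assumes the file name and the attribute list are separated by
whitespace (a tab or spaces) and that file names contain no spaces.

```python
def parse_fileattr_line(line):
    """Parse one 'F<name> <attrs>' line into (name, {attr: value}).

    Returns None for lines that are not file entries or are malformed.
    Assumption: name and attributes are separated by a run of whitespace.
    """
    line = line.rstrip("\n")
    if not line.startswith("F"):
        return None  # not a per-file entry
    parts = line[1:].split(None, 1)
    if len(parts) != 2:
        return None
    name, attr_text = parts
    attrs = {}
    for item in attr_text.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[key] = value
    return name, attrs


def head_revisions(lines):
    """Map each file name to its cached _head revision."""
    heads = {}
    for line in lines:
        parsed = parse_fileattr_line(line)
        if parsed and "_head" in parsed[1]:
            heads[parsed[0]] = parsed[1]["_head"]
    return heads


if __name__ == "__main__":
    sample = [
        "F.cvsignore     _head=1.9;_expand=",
        "FChangeLog      _head=1.13542;_expand=",
    ]
    for name, rev in head_revisions(sample).items():
        print(name, rev)
```

One pass over a single small file per directory replaces opening every
RCS file in that directory, which is where the trunk speedup comes from.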

For the cvs server to update on a branch, it has to read each RCS
file to find the latest revision on that branch.  Even worse, it
looks like it needs to read several blocks into each file for
frequently modified files.  CVS shouldn't have to compute diffs for
most files in a 'cvs update' - chances are that only a handful of
files were modified on the branch, so only a few of these expensive
diffs are computed.  It's the get-the-head-revision operation that
is killing it.


I did make a neato little mechanism for Apple's benefit a few months
back.  It logs every cvs commit as it happens to a plain text file,
and provides a web interface to get the list of files that were
modified since the last time the client contacted the server.  (The
scripts on users' systems would just use wget with a pre-set URL
to get this list of files to update.)  It was a neato little thing,
but for trunk updates I found that doing a cvs update for the gcc
repository consistently took less than two minutes, so it wasn't
really all that useful.  I could revisit it and finish it up if
people would use it - it would work particularly well for things
like a branch where a global cvs update becomes more expensive.
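
A rough Python sketch of the server-side lookup such a mechanism needs is
below. It is not the actual script: the log format (one line per
committed file, a Unix timestamp and a path separated by a tab), the
function name and the sample paths are all assumptions made for
illustration.

```python
def files_changed_since(log_lines, since):
    """Return the sorted, de-duplicated paths committed after `since`.

    Assumed (hypothetical) log format: '<unix-timestamp>\t<path>' per line.
    Malformed lines are ignored.
    """
    changed = set()
    for line in log_lines:
        stamp, _, path = line.rstrip("\n").partition("\t")
        if not path:
            continue  # no tab, or empty path
        try:
            when = int(stamp)
        except ValueError:
            continue  # timestamp is not an integer
        if when > since:
            changed.add(path)
    return sorted(changed)


if __name__ == "__main__":
    # Hypothetical log contents.
    log = [
        "1017300000\tgcc/ChangeLog",
        "1017300500\tgcc/cpplib.c",
        "1017300900\tgcc/ChangeLog",
    ]
    for path in files_changed_since(log, 1017300400):
        print(path)
```

A client would remember the time of its last contact, fetch this list,
and run cvs update on just those files instead of walking the whole tree.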

> > That's the problem I know about; there may be others.
> 
> Overseers, is there anything we can do?

We could always disable the trunk cvs update optimization.  The
cvs updates on branches would still be slower than trunk cvs updates,
but the difference wouldn't be so stark.

OK, kidding aside, Ian's patch could be modified to keep the head
revisions of branches in the fileattr cache file in addition to
the trunk.  But someone would have to do that.  Goodness knows I
don't have the time.
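
To sketch what that might look like (in Python, purely as an
illustration): each file's cached attributes would carry one extra entry
per branch, and the server would fall back to reading the RCS file only
when that entry is missing. The `_branch_<name>` attribute, the function
names and the revision numbers below are hypothetical; nothing like this
exists in the patch today.

```python
def head_for(attrs, branch=None, read_rcs_file=None):
    """Look up a head revision from one file's cached attributes.

    attrs: dict of attribute name -> value for one file.
    branch: branch tag, or None for the trunk.
    read_rcs_file: optional slow-path callback taking the branch name,
        used only when the cache has no entry for that branch.
    """
    if branch is None:
        return attrs.get("_head")
    key = "_branch_" + branch  # hypothetical attribute name
    if key in attrs:
        return attrs[key]  # fast path: no RCS file read
    if read_rcs_file is not None:
        return read_rcs_file(branch)  # slow path: what happens today
    return None


if __name__ == "__main__":
    # Revision numbers are made up for the example.
    attrs = {"_head": "1.13542", "_branch_gcc-3_0-branch": "1.9.2.4"}
    print(head_for(attrs))
    print(head_for(attrs, "gcc-3_0-branch"))
```

The cache would also have to be kept current on every branch commit,
which is presumably where most of the work in modifying the patch lies.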

The sourceware system is under a bit of memory pressure these days,
which doesn't help; if these blocks could be cached for a longer
period of time, this RCS file reading/parsing would go faster.

(Incidentally, Ian's patch is one of the reasons we're still running
cvs 1.10.  Stock cvs performance regressed between 1.10 and 1.11,
and no one has gotten Ian's patch to work fully with 1.11, so moving
to 1.11 from 1.10+Ian's patch would be a big lose.  To be fair, except
for one weekend that Chris Faylor spent on Ian's patch, no one has
really put any effort into making it work fully with 1.11.)

> I noticed that we have more anoncvs processes running than authenticated
> users; is there any way we could find out which CVS modules these
> anonymous users currently access? (If it's gcc, we could see whether
> we can simply disable anoncvs access to the gcc module.)

None that I know of.  cvs logging bites.  We do track the # of
bytes being sent/received from different hosts and the frequency
with which hosts connect.  The host that downloaded the most bytes
last week by anoncvs?  A purdue.edu site.  Two redhat.com sites and
one suse.de site are also in the top six; a bare IP# and a cable
modem (rogers.com) round it out.  Not very useful without some
idea what repository or what sorts of operations we're talking
about.


I haven't followed the discussion that led up to this note, so I
apologize if I'm repeating things already said.

Jason

