This is the mail archive of the mailing list for the GDB project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Monster testcase generator for performance testsuite

On Mon, Jan 5, 2015 at 5:32 AM, Yao Qi <> wrote:
> Doug Evans <> writes:
> Doug,
> First of all, it is great to have such a generator for performance testing,
> but it doesn't have to be a monster, and we don't need a parallel build so
> far.  The parallel build will get the generator over-complicated.  See
> more below.
>> This patch adds preliminary support for generating large programs.
>> "Large" as in 10000 compunits or 5000 shared libraries or 3M ELF symbols.
> Is there any reason we define the workload like this?  Can it
> represent a typical, practical super large program?  I feel that the
> workload you defined is too heavy to be practical, and the excess weight
> causes the long compilation time you mentioned below.

Those are just loose (i.e., informal) characterizations of real programs
my users run gdb on.
And that's an incomplete list btw.
So, yes, they do represent practical super large programs.
The programs these benchmarks will be based on are as real as it gets.
As for whether they're typical ... depends on what you're used to I guess. :-)

>> There's still a bit more I want to add to this, but it's at a point
>> where I can use it, and thus now's a good time to get some feedback.
>> One difference between these tests and current perf tests is that
>> one .exp is used to build the program and another .exp is used to
>> run the test.  These programs take a while to compile and link.
>> Generating the sources for these monster testcases takes hardly any time
>> at all relative to the amount of time to compile them.  I measured 13.5
>> minutes to compile the included gmonster1 benchmark (with -j6!), and about
>> an equivalent amount of time to run the benchmark.  Therefore it makes
>> sense to be able to use one program in multiple performance tests, and
>> therefore it makes sense to separate the build from the test run.
> Compilation and the test run each take about 10 minutes.  However, I
> don't understand the importance of making tests run for 10
> minutes, which is too long for a perf test case.  IMO, a program with
> a two-minute run should be representative enough...

I'm not suggesting compile/run time is the defining characteristic
that makes them useful. gmonster1 (and others) are intended to be
representative of real programs (gmonster1 isn't there yet, but it's
not because it's too big ..., I still have to tweak the kind of bigness
it has, as well as add more specially crafted code to exercise real issues).
Its compile time is what it is. The program is that big.
As for test run time, that depends on the test.
At the moment it's still early, and I'm still writing tests and
calibrating them.

As for general importance:

If a change to gdb increases the time it takes to run a particular command
by one second, is that ok? Maybe. And if my users see the increase
become ten seconds, is that still ok? Also maybe, but I'd like to make the
case that it'd be preferable to have mechanisms in place to find out sooner
rather than later.

Similarly, if a change to gdb increases memory usage by 40MB, is that ok?
Maybe. And if my users see that increase become 400MB, is that still ok?
Possibly (depending on the nature of the change). But, again, one of my
goals here is to have in place mechanisms to find out sooner rather than later.
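That sort of early-warning check is simple to express.  A minimal sketch
(purely illustrative; the function name and the 10% tolerance are my
assumptions, not the gdb.perf framework's actual API):

```python
def check_regression(baseline, measured, tolerance=0.10):
    """Return True if a measurement stays within the allowed slack.

    Catching a 1s -> 1.1s slip at commit time is much cheaper than
    discovering a 1s -> 10s slip after it reaches users.  (Illustrative
    only; names and the 10% tolerance are assumptions.)"""
    return measured <= baseline * (1.0 + tolerance)
```

The same shape works for memory: compare a measured resident size against
a recorded baseline with a tolerance.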

Note that, as I said, there's more I wish to add here.
For example, it's not enough to just machine generate a bunch of generic
code. We also need the ability to add specific cases that trip gdb up,
and thus I also plan to add the ability to add hand-written code to
these benchmarks.
Plus, my plan is to make gmonster1 contain a variety of such cases
and use it in multiple benchmarks. Otherwise we're compiling/linking
multiple programs and I *am* trying to cut down on build times here! :-)
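For concreteness, a machine generator with a hook for hand-written sources
could look roughly like this hypothetical Python sketch (every name here is
invented for illustration and not taken from the patch):

```python
import os

def generate_compunit(index, n_functions=10):
    """Emit the source text for one machine-generated compilation unit."""
    lines = ["/* Machine-generated compunit %d. */" % index]
    for f in range(n_functions):
        # Generic filler code; real gdb stressors would be richer.
        lines.append("int cu%d_fn%d (int x) { return x + %d; }" % (index, f, f))
    return "\n".join(lines) + "\n"

def generate_program(outdir, n_compunits, extra_sources=()):
    """Write n_compunits generated .c files and append any hand-written
    sources that exercise specific gdb issues."""
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for i in range(n_compunits):
        path = os.path.join(outdir, "cu%d.c" % i)
        with open(path, "w") as src:
            src.write(generate_compunit(i))
        paths.append(path)
    paths.extend(extra_sources)  # hand-written testcases join the build
    return paths
```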

>> These tests currently require separate build-perf and check-perf steps,
>> which is different from normal perf tests.  However, due to the time
>> it takes to build the program I've added support for building the pieces
>> of the test in parallel, and hooking this parallel build support into
>> the existing framework required some pragmatic compromise.
> ... so the parallel build part may not be needed.

I'm not sure what the hangup is on supporting parallel builds here.
Can you elaborate? It's really not that much code, and while I could
have done things differently, I'm just using mechanisms that are
already in place. The only real "complexity" is that the existing
mechanism is per-.exp-file based, so I needed one .exp file per worker.
I think we could simplify this with some cleverness, but this isn't
what I want to focus on right now. Any change will just be to the
infrastructure, not to the tests. If someone wants to propose a different
mechanism to achieve the parallelism go for it. OTOH, there is value
in using existing mechanisms. Another way to go (and I'm not suggesting
this is a better or worse way, it's just an example) would be to have
hand-written worker .exp files and check those in. I don't have a
strong opinion on that, machine generating them is easy enough and
gives me some flexibility (which is nice) in these early stages.
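As an illustration of the one-.exp-file-per-worker idea, a generator might
split the compunits into even slices like this (the file names and the
compile_compunit_range proc are invented for this sketch; the real harness
differs):

```python
import os

def write_worker_exp_files(outdir, n_compunits, n_workers):
    """Machine-generate one worker .exp per parallel job, each covering
    an even slice of the generated compilation units."""
    os.makedirs(outdir, exist_ok=True)
    per_worker = (n_compunits + n_workers - 1) // n_workers  # ceiling
    files = []
    for w in range(n_workers):
        lo = w * per_worker
        hi = min(lo + per_worker, n_compunits)
        path = os.path.join(outdir, "gmonster1-worker%d.exp" % w)
        with open(path, "w") as exp:
            exp.write("# Auto-generated worker %d: compunits %d..%d\n"
                      % (w, lo, hi - 1))
            exp.write("compile_compunit_range %d %d\n" % (lo, hi))
        files.append(path)
    return files
```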

>> Running the gmonster1-ptype benchmark requires about 8G to link the program,
>> and 11G to run it under gdb.  I still need to add the ability to
>> have a small version enabled by default, and turn on the bigger version
>> from the command line.  I don't expect everyone to have a big enough
>> machine to run the test configuration that I do.
> It looks like a monster rather than a perf test case :)

Depends.  How long do your users still wait for gdb to do something?
My users are still waiting too long for several things (e.g., startup time).
And I want to be able to measure what my users see.
And I want to be able to provide upstream with demonstrations of that.

> It is good to
> have a small version enabled by default, which requires less than 1G,
> for example, to run it under GDB.  How much time does it take to compile
> (sequential build) and run the small version?

There are mechanisms in place to control the amount of parallelism.
One could make it part of the test spec, but I'm not sure it'd be useful
enough.  Thus I think there's no need to compile small testcases sequentially.

As for what upstream wants the "default" to be, I don't have
a strong opinion, beyond it being minimally useful.  If the default isn't
useful to me, it's easy enough to tweak the test with a local change
to cover what I need.

Note that I'm not expecting the default to be these
super long times, which I noted in my original email. OTOH, I do want
the harness to be able to usefully handle (as in not wait an hour for the
testcase to be built) the kind of large programs that I need to run the
tests on.  Thus my plan is to have a harness that can handle what
I need, but have defaults that don't impose that on everyone.
Given appropriate knobs it will be easy enough to have useful
defaults and still be able to run the tests with larger programs.
And then if my runs find a problem, it will be straightforward for
me to provide a demonstration of what I'm seeing (which is part
of what I want to accomplish here).
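One way to express such knobs is a table of named sizes, with the small one
as the default (a hypothetical sketch; the names and numbers are
placeholders, not the actual defaults):

```python
# Hypothetical size knob: "small" is the default so everyone can run the
# tests; the larger configurations are opt-in for big machines.
SIZES = {
    "small":  {"compunits": 10,    "shlibs": 2},
    "medium": {"compunits": 1000,  "shlibs": 500},
    "large":  {"compunits": 10000, "shlibs": 5000},
}

def config_for(size="small"):
    """Return the generator parameters for the requested test size."""
    return SIZES[size]
```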
