This is the mail archive of the gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [RFC] Monster testcase generator for performance testsuite
- From: Doug Evans <dje at google dot com>
- To: Yao Qi <yao at codesourcery dot com>
- Cc: gdb-patches <gdb-patches at sourceware dot org>
- Date: Wed, 7 Jan 2015 14:33:18 -0800
- Subject: Re: [RFC] Monster testcase generator for performance testsuite
- Authentication-results: sourceware.org; auth=none
- References: <m3lhllpkd6 dot fsf at seba dot sebabeach dot org> <87mw5xuzdc dot fsf at codesourcery dot com> <CADPb22TdP5ZG=xHD-9EH1JoyUZtOkD1nZfzcx9TuVOPdJTU++Q at mail dot gmail dot com> <871tn7udyt dot fsf at codesourcery dot com>
On Wed, Jan 7, 2015 at 1:39 AM, Yao Qi <yao at codesourcery dot com> wrote:
> Doug Evans <dje at google dot com> writes:
>> If a change to gdb increases the time it takes to run a particular command
>> by one second is that ok? Maybe. And if my users see the increase
>> become ten seconds is that still ok? Also maybe, but I'd like to make the
>> case that it'd be preferable to have mechanisms in place to find out sooner
>> than later.
> Yeah, I agree that it is better to find out problems sooner than later.
> That is why we create perf test cases. If a one-second time increase is
> sufficient to find the performance problem, isn't that good? Why do we
> still need to run a bigger version that demonstrates a ten-second increase?
Some performance problems only present themselves at scale.
We need a perf test framework that lets us explore such things.
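A framework along those lines might parameterize each measurement by a size knob, so the same test can be run small by default and at scale on demand. A minimal hypothetical sketch (the names here are illustrative, not from gdb's actual perftest module):

```python
import time

def run_perf_test(name, setup, operation, sizes):
    """Time `operation` on workloads produced by `setup`, once per size,
    so scaling behaviour (linear vs. quadratic, etc.) becomes visible."""
    results = {}
    for n in sizes:
        workload = setup(n)
        start = time.perf_counter()
        operation(workload)
        results[n] = time.perf_counter() - start
    return results

# Example: per-element list membership is O(n), so the whole scan is
# O(n^2) -- cheap at small sizes, untenable at large ones.
timings = run_perf_test(
    "membership-scan",
    setup=lambda n: list(range(n)),
    operation=lambda xs: [x in xs for x in xs],
    sizes=(100, 1000),
)
for n, t in sorted(timings.items()):
    print(f"n={n}: {t:.6f}s")
```

Running only the small size would never reveal the quadratic blow-up; that is the argument for keeping the large configurations runnable.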
The point of the 1 second vs 10 second scenario is that the community
may find that 1 second is acceptable (IOW *not* a performance problem
significant enough to address). It'll depend on the situation.
But at scale the performance may be untenable, causing one to want
to rethink one's algorithm or data structure or whatever.
Similar issues arise elsewhere btw.
E.g., gdb may handle 10 or 100 threads ok, but how about 1000 threads?
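The thread-count scenario could be driven by a generator in the same spirit as the monster testcase generator under discussion. A hypothetical Python sketch (not the generator from the patch) that emits a C test program with a configurable number of threads:

```python
def generate_thread_testcase(num_threads):
    """Emit C source for a program that starts num_threads idle threads,
    so gdb's thread handling can be exercised at 10, 100, or 1000."""
    lines = [
        "#include <pthread.h>",
        "#include <unistd.h>",
        "",
        "static void *worker (void *arg) { pause (); return arg; }",
        "",
        "int main (void)",
        "{",
        f"  pthread_t threads[{num_threads}];",
        f"  for (int i = 0; i < {num_threads}; ++i)",
        "    pthread_create (&threads[i], 0, worker, 0);",
        "  pause ();  /* set a breakpoint here and inspect threads */",
        "  return 0;",
        "}",
    ]
    return "\n".join(lines)

print(generate_thread_testcase(1000))
```

The same generator run with 10, 100, and 1000 makes it cheap to check whether per-thread costs in gdb stay linear.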
>> Similarly, if a change to gdb increases memory usage by 40MB is that ok?
>> Maybe. And if my users see that increase become 400MB is that still ok?
>> Possibly (depending on the nature of the change). But, again, one of my
>> goals here is to have in place mechanisms to find out sooner than later.
> Similarly, if a 40MB memory usage increase is sufficient to show the
> performance problem, why do we still have to use a bigger one?
> Perf test case is used to demonstrate the real performance problems in
> some super large programs, but it doesn't mean the perf test case should
> be as big as these super large programs.
One may think 40MB is a reasonable price to pay for some change
or some new feature. But at scale that price may become unbearable.
So, yes, we do need perf testcases that let one exercise gdb at scale.
>>>> These tests currently require separate build-perf and check-perf steps,
>>>> which is different from normal perf tests. However, due to the time
>>>> it takes to build the program I've added support for building the pieces
>>>> of the test in parallel, and hooking this parallel build support into
>>>> the existing framework required some pragmatic compromise.
>>> ... so the parallel build part may not be needed.
>> I'm not sure what the hangup is on supporting parallel builds here.
>> Can you elaborate? It's really not that much code, and while I could
> I'd like to keep the gdb perf tests simple.
How simple? What about parallel builds adds too much complexity?
make check-parallel adds complexity, but I'm guessing no one is
advocating removing it, or was advocating against checking it in.
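Since the testcase is generated as many independent pieces, the parallel build being debated amounts to fanning the per-piece compiles out over a worker pool. A hypothetical sketch (the compile step is injected so the shape is visible without invoking a real compiler):

```python
from concurrent.futures import ThreadPoolExecutor

def build_pieces(sources, compile_one, jobs=8):
    """Compile each generated source, at most `jobs` at a time, mirroring
    how the pieces of a large generated testcase can be built in parallel."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_one, sources))

# A real build would pass something like:
#   lambda src: subprocess.check_call(["gcc", "-c", src]) or src
# Here we just map each .c name to its .o name to show the flow.
objs = build_pieces([f"piece{i}.c" for i in range(16)],
                    compile_one=lambda src: src.replace(".c", ".o"),
                    jobs=4)
print(objs[:3])
```

The extra machinery over a sequential loop is small, which is the crux of Doug's "how much complexity does this really add?" question.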
>>> It looks like a monster rather than a perf test case :)
>> Depends. How long do your users still wait for gdb to do something?
>> My users are still waiting too long for several things (e.g., startup time).
>> And I want to be able to measure what my users see.
>> And I want to be able to provide upstream with demonstrations of that.
> IMO, your expectation is beyond the scope or the purpose of perf test
> cases. The purpose of each perf test case is to make sure there is no
> performance regression and to expose performance problems as code
It's precisely within the scope and purpose of the perf testsuite!
We need to measure how well gdb will work on real programs,
and make sure changes introduced don't adversely affect such programs.
How do you know a feature/change/improvement will work at scale unless
you test it at scale?
> It is not reasonable to me that we measure what users see by
> running our perf test cases.
Perf test cases aren't an end unto themselves.
They exist to help serve our users. If we're not able to measure
what our users see, how do we know what their gdb experience is?
> Each perf test case is to measure the
> performance of gdb on a certain path, so it doesn't have to behave
> exactly the same as the application users are debugging.
>>> It is good to
>>> have a small version enabled by default, which requires less than 1 G,
>>> for example, to run it under GDB. How much time it takes to compile
>>> (sequential build) and run the small version?
>> There are mechanisms in place to control the amount of parallelism.
>> One could make it part of the test spec, but I'm not sure it'd be useful
>> enough. Thus I think there's no need to compile small testcases
> Is it possible (or necessary) that we divide it into two parts: 1) perf
> test case generator and 2) parallel build? As we increase the size of
> generated perf test cases, the long compilation time can justify having
> a parallel build.
I'm not sure what you're advocating for here.
Can you rephrase/elaborate?
>> As for what upstream wants the "default" to be, I don't have
>> a strong opinion, beyond it being minimally useful. If the default isn't
>> useful to me, it's easy enough to tweak the test with a local change
>> to cover what I need.
>> Note that I'm not expecting the default to be these
>> super long times, which I noted in my original email. OTOH, I do want
>> the harness to be able to usefully handle (as in not wait an hour for the
>> testcase to be built) the kind of large programs that I need to run the
>> tests on. Thus my plan is to have a harness that can handle what
>> I need, but have defaults that don't impose that on everyone.
>> Given appropriate knobs it will be easy enough to have useful
>> defaults and still be able to run the tests with larger programs.
>> And then if my runs find a problem, it will be straightforward for
>> me to provide a demonstration of what I'm seeing (which is part
>> of what I want to accomplish here).
> Yeah, I agree.
Yao (齐尧)
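The knobs-and-defaults arrangement Doug describes, with a modest default that everyone runs and an opt-in large configuration for at-scale runs, could look something like this (a hypothetical sketch; these spec names and sizes are illustrative, not from the patch):

```python
# Small default spec: cheap enough for everyone's normal test runs.
DEFAULT_SPEC = {"compunits": 10, "functions_per_unit": 100}

# Opt-in "monster" spec for measuring gdb at scale.
MONSTER_SPEC = {"compunits": 1000, "functions_per_unit": 10000}

def resolve_spec(overrides=None):
    """Start from the small default and apply any local overrides,
    so a local tweak is all that's needed to run the large version."""
    spec = dict(DEFAULT_SPEC)
    spec.update(overrides or {})
    return spec

print(resolve_spec())
print(resolve_spec(MONSTER_SPEC))
```

A local one-line override then reproduces a large-scale problem, which is exactly the "provide upstream with demonstrations" workflow described above.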