This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH v2] Add new case for stress test

From: David Smith <dsmith at redhat dot com>
To: Zhou, Wenjian/åæå <zhouwj-fnst at cn dot fujitsu dot com>, systemtap at sourceware dot org
Date: Tue, 12 Jan 2016 09:20:01 -0600
Subject: Re: [PATCH v2] Add new case for stress test
Authentication-results: sourceware.org; auth=none
References: <1451889421-9209-1-git-send-email-zhouwj-fnst at cn dot fujitsu dot com> <56942599 dot 9090401 at redhat dot com> <569459D0 dot 5050203 at cn dot fujitsu dot com>
On 01/11/2016 07:41 PM, "Zhou, Wenjian/åæå" wrote:
> On 01/12/2016 05:58 AM, David Smith wrote:
>> On 01/04/2016 12:37 AM, Zhou Wenjian wrote:
>>>     * testsuite/lib/gen_load.exp: New test file
>>>     * testsuite/systemtap.stress/all_kernel_functions.exp: Use
>>> gen_load.exp
>>>     * testsuite/systemtap.stress/current.exp: Use gen_load.exp
>>>     * testsuite/systemtap.stress/parallel_exec.exp: New test case
>>> ---
>>>   testsuite/lib/gen_load.exp                         | 19 ++++++++++
>>>   .../systemtap.stress/all_kernel_functions.exp      | 16 +--------
>>>   testsuite/systemtap.stress/current.exp             | 19 ++--------
>>>   testsuite/systemtap.stress/parallel_exec.exp       | 42
>>> ++++++++++++++++++++++
>>>   4 files changed, 64 insertions(+), 32 deletions(-)
>>>   create mode 100644 testsuite/lib/gen_load.exp
>>>   create mode 100644 testsuite/systemtap.stress/parallel_exec.exp
>>>
>>> diff --git a/testsuite/lib/gen_load.exp b/testsuite/lib/gen_load.exp
>>> new file mode 100644
>>> index 0000000..43b9df1
>>> --- /dev/null
>>> +++ b/testsuite/lib/gen_load.exp
>>> @@ -0,0 +1,19 @@
>>> +# genload is used by stress test
>>> +proc gen_load {} {
>>> +    # if 'genload' from the ltp exists, use it to create a real load
>>> +    catch {exec which genload 2>/dev/null} genload
>>> +    if {![file executable $genload]} {
>>> +        set genload {/usr/local/ltp/testcases/bin/genload}
>>> +    }
>>> +    if [file executable $genload] {
>>> +        exec $genload -c 10 -i 10 -m 10 -t 10
>>> +        #                               ^^^^^ run for 10 seconds
>>> +        #                         ^^^^^ 10 procs spinning on malloc
>>> +        #                   ^^^^^ 10 procs spinning on sync
>>> +        #             ^^^^^ 10 procs spinning on sqrt
>>> +    } else {
>>> +        # sleep for a bit
>>> +        wait_n_secs 10
>>> +    }
>>> +    return 0
>>> +}
>>
>> I know you didn't write the code in gen_load.exp, but I'd modify the
>> comment before 'wait_n_secs' to remind readers that wait_n_secs runs
>> 'stress' if available.
>>
>>> diff --git a/testsuite/systemtap.stress/all_kernel_functions.exp
>>> b/testsuite/systemtap.stress/all_kernel_functions.exp
>>> index d7c2fbc..d546cfc 100644
>>> --- a/testsuite/systemtap.stress/all_kernel_functions.exp
>>> +++ b/testsuite/systemtap.stress/all_kernel_functions.exp
>>> @@ -1,18 +1,4 @@
>>> -proc genload {} {
>>> -    # if 'genload' from the ltp exists, use it to create a real load
>>> -    set genload {/usr/local/ltp/testcases/bin/genload}
>>> -    if [file executable $genload] {
>>> -        exec $genload -c 10 -i 10 -m 10 -t 10
>>> -        #                               ^^^^^ run for 10 seconds
>>> -        #                         ^^^^^ 10 procs spinning on malloc
>>> -        #                   ^^^^^ 10 procs spinning on sync
>>> -        #             ^^^^^ 10 procs spinning on sqrt
>>> -    } else {
>>> -        # sleep for a bit
>>> -        wait_n_secs 10
>>> -    }
>>> -    return 0
>>> -}
>>> +load_lib "gen_load.exp"
>>
>> OK, this change (calling 'load_lib') I don't get. That shouldn't be
>> necessary. All the code in testsuite/lib should be available to all
>> testcases with calling load_lib.
>>
>> Did you find this necessary?
>>
> 
> load_lib should be called either here or in
> systemtap/testsuite/config/unix.exp.
> Otherwise it can't be found.
> 
> Since the gen_load.exp is only used by the stress test, I didn't choose
> calling it
> in systemtap/testsuite/config/unix.exp.

I always forget about testsuite/config/unix.exp.

I've gone ahead and checked in the portion of this patch that moves
genload to its own file as commit 86cfd04. Note that I did add
genload.exp to testsuite/config/unix.exp, since all the other files in
testsuite/lib automatically get added.

>>>   proc probe_ok {probepoint} {
>>>       set cmd {exec stap -p2 -e}
>>> diff --git a/testsuite/systemtap.stress/current.exp
>>> b/testsuite/systemtap.stress/current.exp
>>> index 186baa3..83814b6 100644
>>> --- a/testsuite/systemtap.stress/current.exp
>>> +++ b/testsuite/systemtap.stress/current.exp
>>> @@ -2,23 +2,8 @@
>>>   # function, install it, and get some output.
>>>
>>>   set test "current"
>>> -
>>> -proc current_load {} {
>>> -    # if 'genload' from the ltp exists, use it to create a real load
>>> -    set genload {/usr/local/ltp/testcases/bin/genload}
>>> -    if [file executable $genload] {
>>> -        exec $genload -c 10 -i 10 -m 10 -t 10
>>> -        #                               ^^^^^ run for 10 seconds
>>> -        #                         ^^^^^ 10 procs spinning on malloc
>>> -        #                   ^^^^^ 10 procs spinning on sync
>>> -        #             ^^^^^ 10 procs spinning on sqrt
>>> -    } else {
>>> -        # sleep for a bit
>>> -        wait_n_secs 10
>>> -    }
>>> -    return 0
>>> -}
>>> +load_lib "gen_load.exp"
>>
>> Ditto comment about calling 'load_lib' here.
>>
>>>   set output_string "(\\w+ =
>>> \\d+\r\n){5}(${all_pass_string}){2}(WARNING.*skipped.*)?"
>>>
>>> -stap_run $srcdir/$subdir/$test.stp current_load $output_string -g -w
>>> +stap_run $srcdir/$subdir/$test.stp gen_load $output_string -g -w
>>> diff --git a/testsuite/systemtap.stress/parallel_exec.exp
>>> b/testsuite/systemtap.stress/parallel_exec.exp
>>> new file mode 100644
>>> index 0000000..9b6769e
>>> --- /dev/null
>>> +++ b/testsuite/systemtap.stress/parallel_exec.exp
>>> @@ -0,0 +1,42 @@
>>> +#!/bin/expect -f
>>> +
>>> +set test "parallel execute"
>>> +
>>> +load_lib "gen_load.exp"
>>> +
>>> +set process_num 10
>>> +
>>> +set script {
>>> +    probe syscall.open
>>> +    {
>>> +        if (pid() != target()) next
>>> +        log("open")
>>> +    }
>>> +
>>> +    probe syscall.close
>>> +    {
>>> +        if (pid() != target()) next
>>> +        log("close")
>>> +    }
>>> +}
>>> +
>>> +for { set i 0 } { $i < $process_num } { incr i } {
>>> +    exec stap -e "$script" -c "cat /etc/hosts > /dev/null"
>>> >>/tmp/parallel_exec &
>>> +}
>>> +
>>> +if {[eval gen_load] == 0} then {
>>> +    pass "$test load generation"
>>> +} else {
>>> +    fail "$test load generation"
>>> +}
>>
>> I'm not sure I understand the above. As far as I can see, gen_load
>> always returns a 0, so this test will always pass.
>>
> 
> Perhaps it will always pass.
> But it is the same of code in stap_run.exp:
> 
>             if {$LOAD_GEN_FUNCTION != ""} then {
>                 #run the interesting test here
>                 if {[eval $LOAD_GEN_FUNCTION] == 0} then {
>                     pass "$TEST_NAME load generation"
>                 } else {
>                     fail "$TEST_NAME load generation"
>                 }
>             }

Ah, but that's because stap_run.exp is a general function, and the load
generation function passed to it is designed to be flexible. Let's say
that it needs to create a file for instance, but that fails. The load
function could return non-zero in this case so that the user would
realize that this probably isn't a systemtap problem, but a
testsuite/config/system problem.

I guess someone in the future could modify genload() to return nonzero,
so it does make some sense to leave this code as is.

> 
>> In addition, I'd remove /tmp/parallel_exec before you start, otherwise
>> you might be appending to a file from the previous run. You might think
>> about giving your temporary file a unique name using mktemp.
>>
> 
> I see.
> 
>> I think you are really hoping there that waiting 10 seconds will be
>> enough time for all the systemtap scripts to finish. I'm not sure you
>> can depend on that, especially on a slow system like an arm/arm64. One
>> thing that would help here is to precompile your script (stap -p4 -e
>> '$script") before your run of 10, so that all 10 can use the precompiled
>> systemtap module from the cache.
>>
> 
> Previously, I think it is not important whether systemtap will finish
> in 10 seconds or not, for starting 10 systemtap processes in 10 seconds
> is enough.
> 
> But now, I'm confused. I not sure if it is necessary to test the
> parallel compiling. Maybe compiling and executing should be test
> separately.
> 
> How do you think?

See below.

>>> +set num1 [exec stap -e "$script" -c "cat /etc/hosts > /dev/null" |
>>> grep \[open|close\] | wc -l]
>>> +set num2 [exec grep \[open|close\] /tmp/parallel_exec | wc -l]
>>
>> I'm not sure I understand this logic, perhaps I'm missing something.
>>
>> I'd think we'd see num1 as 10 (since there are 10 invocations of stap).
>> Why are you grepping for open|close for num1?
>>
>> I'd think we'd you'd want to see num2 as 20 (10 lines of 'open' and 10
>> lines of 'close'), but I'd guess that your regexp is going to find the
>> stap invocation lines also. Why not grep for something like
>> '^(open|close)$'?
>>
>>> +
>>> +exec rm -f /tmp/parallel_exec
>>> +
>>> +if {$num1 * $process_num != $num2} {
>>> +    fail $test
>>> +} else {
>>> +    pass $test
>>> +}
>>
>> I'm lost on your math here. I'd think you'd want num1 as 10 (10
>> invocations of stap) and num2 as 20 (10 lines of 'open', 10 lines of
>> 'close'), but num1 (10) * process_num (10) == 100, which isn't close to
>> num2 (20). How do you see this working?
>>
> 
> I'm afraid you have misunderstood it.
> 
> num1 is the number of [open|close] in the output by 1 systemtap process.
> num2 is the number of [open|close] in the output by 10 systemtap processes.
> 10 processes' outputs are all in one file.
> 
> So if there is no problem in parallel executing, num2 should equal num1
> * 10.

Ah, I see what is going on now. I'd add comments that say the above to
the new testcase.

In addition, I've got an idea here. I'm afraid that on slower systems
the stap commands might not finish in 10 seconds. So, we need to ensure
that they do. Otherwise, we've added more testsuite timing problems,
where tests randomly fail.

To do this, instead of using "exec" to run the 10 parallel invocations
of stap, we can use "spawn", save the spawn ids, then wait on each spawn
id to finish before looking at the result file. You can see the basic
idea in systemtap.base/rename_module.exp.

It would look something like the following (untested) code:

====
set spawn_ids {}
for { set i 0 } { $i < $process_num } { incr i } {
    spawn stap -e "$script" -c "cat /etc/hosts > /dev/null"
>> /tmp/parallel_exec
    lappend spawn_ids $spawn_id
}

# Run genload...

# Wait on all the stap commands to finish.
foreach id $spawn_ids {
    catch {close -i $id}; catch {wait -i $id}
}
====

When that foreach loop is finished, you can be sure that all the stap
invocations are finished and it is safe to look at the result file.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
References:
- [PATCH v2] Add new case for stress test
  - From: Zhou Wenjian
- Re: [PATCH v2] Add new case for stress test
  - From: David Smith
- Re: [PATCH v2] Add new case for stress test
  - From: "Zhou, Wenjian/åæå"
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]