This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [tests patchv2] FYI unwinder: tests/ update

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Fri, 29 Nov 2013 17:02:26 +0100
Subject: Re: [tests patchv2] FYI unwinder: tests/ update
On Sat, 2013-11-23 at 21:57 +0100, Jan Kratochvil wrote:
> On Tue, 12 Nov 2013 21:11:43 +0100, Mark Wielaard wrote:
> > As a side note, I see it depends on new conditional BIARCH.
> > It would probably be a good idea if configure warned when detecting a
> > 64bit host, that didn't support biarch. So the user knows some tests
> > won't be run.
> 
> Done.
> 	m4/biarch.m4
> 	AC_MSG_WARN([not running biarch tests, $biarch_CC does not work])

Nice, thanks.

> > Although if I am reading the patch correct the backtrace-child-biarch
> > test program is still being build in that case. But then it is just the
> > host arch? Yes, it is, there is a comment in run-backtrace.sh that says
> > so.
> 
> This has been fixed now:
> 
> Currently the *-biarch.sh tests report SKIP if not supported:
> 	SKIP: run-backtrace-native-biarch.sh
> 	PASS: run-backtrace-native.sh
> 	SKIP: run-backtrace-native-core-biarch.sh
> 	PASS: run-backtrace-native-core.sh
> Using a bit ugly
> 	if !BIARCH
> 	export ELFUTILS_DISABLE_BIARCH = 1
> 	endif
> 
> One could just omit those unsupported testcases but that does not seem right
> to me.
> 
> One could also SKIP them by some less ugly hack than the environment variable.

SKIP them (with a hack) seems fine to me.

> > A little explanation of the general design of the test(s) would be
> > appreciated. Just something in run-backtrace.sh that explains what
> > program is called to do what, which signals are send to trigger which
> > actions and which parts are generic and which are x86 specific.
> 
> TBH I do not put so high quality + documentation requirements on testcases,
> hey are just the testcases...

But it should still be clear what they are testing. If they fail after
some code update someone still needs to be able to debug them. And since
this is for a new feature people will take a look at them just to see
how the new feature is used (so then, at least some comment saying
whether or not the usage is normal or exceptional/strange - just to test
a corner case, would be good).

> > I don't really understand the need for backtrace-data.
> 
> I had to write it as an example code anyway so I included it therein.  Also
> this public part of API would not be really tested without this testcase.
> But sure it is not a real test, it does not test anything much interesting.
> 
> But I have put there run-backtrace-data.sh and added it also to TESTS.

Good. So it is mostly a test of whether the api makes sense.

> > Does CC_BIARCH also take care of any include or library flags for 32bit?
> > Or is it assumed that all it takes is adding -m32?
> > 
> > OK, testing on at least fedora and debian seems to confirm you don't
> > need anything else as long as you have to right multilib/32bit
> > compiler/libs installed (gcc-multilib and libc6-dev-i386 on debian).
> 
> m4/biarch.m4 has been written by Roland.
> 
> But for example recent Ubuntu includes support for x32 which is not handled by
> elfutils at all I think.

No, it isn't indeed. I don't know how hard adding a backend for that
would be. We can see how to support it in the tests if someone thinks it
is important enough to provide one. Not really a priority IMHO.

> > Why not build backtrace-child-biarch pie?
> 
> It is built as PIE.  I have put there just a new comment:
> 
> # backtrace-child-biarch also uses those *_CFLAGS and *_LDLAGS variables:
> backtrace_child_CFLAGS = -fPIE
> backtrace_child_LDFLAGS = -pie -pthread

Thanks, I missed that.

> > It can be called with three options: --ptraceme, --gencore or --run.
> > 
> > It will always first call three dummy functions that are empty no clone,
> > no inline asm volatile ("") functions. (why?)
> 
> noinline is probably clear, Jakub Jelinek always also puts there noclone,
> which in fact is also clear why.  That asm volatile ("") is needed otherwise
> static noinline noclone empty function is completely omitted from its caller.

OK. I understand that it is used to make sure they aren't
omitted/inlined completely. But why are they called in the first place?
And why are they in the test source code?

> > Then it will create a new
> > thread that calls a function "start" which will call a function
> > "backtracegen" which will call a vararg no-return function "stdarg" that
> > will install a SIGUSR2 signal handler ("sigusr2").
> > 
> > If --ptraceme is given both the main and start thread will call ptrace
> > traceme.
> > 
> > If --gencore is given the main thread call pthread_join on the start
> > thread. The start thread will call the sigusr2 handler (directly, not
> > through raising a signal), which will then call abort().
> > 
> > If --run is given the main thread will raise (SIGUSR2) and the start
> > thread will, depending on whether it is build on x86_64 either raise
> > SIGUSR1 or call the sigusr2 handler directly (without raising the
> > signal). The sigusr2 handler will then raise SIGUSR1 itself.
> 
> I have put into backtrace-child.c this comment:
> 
> /* Command line syntax: ./backtrace-child [--ptraceme|--gencore] --run
>    --ptraceme will call ptrace (PTRACE_TRACEME) in the two threads.
>    --gencore will call abort () at its end.
>    --run is always required.
>    Main thread will signal SIGUSR2.  Other thread will signal SIGUSR1.
>    On x86_64 only:
>      PC will get changed to function 'jmp' by backtrace.c function
>      prepare_thread.  Then SIGUSR2 will be signalled to backtrace-child
>      which will invoke function sigusr2.
>    On non-x86_64:
>      sigusr2 gets called by normal function call from function stdarg.
>    On any arch then sigusr2 calls raise (SIGUSR1) for --ptraceme.
>    abort () is called otherwise, expected for --gencore core dump.
> 
>    Expected x86_64 output:
>    TID 10276:
>    # 0 0x7f7ab61e9e6b      raise
>    # 1 0x7f7ab661af47 - 1  main
>    # 2 0x7f7ab5e3bb45 - 1  __libc_start_main
>    # 3 0x7f7ab661aa09 - 1  _start
>    TID 10278:
>    # 0 0x7f7ab61e9e6b      raise
>    # 1 0x7f7ab661ab3c - 1  sigusr2
>    # 2 0x7f7ab5e4fa60      __restore_rt
>    # 3 0x7f7ab661ab47      jmp
>    # 4 0x7f7ab661ac92 - 1  stdarg
>    # 5 0x7f7ab661acba - 1  backtracegen
>    # 6 0x7f7ab661acd1 - 1  start
>    # 7 0x7f7ab61e2c53 - 1  start_thread
>    # 8 0x7f7ab5f0fdbd - 1  __clone
> 
>    Expected non-x86_64 (i386) output; __kernel_vsyscall are skipped if found:
>    TID 10408:
>    # 0 0xf779f430          __kernel_vsyscall
>    # 1 0xf7771466 - 1      raise
>    # 2 0xf77c1d07 - 1      main
>    # 3 0xf75bd963 - 1      __libc_start_main
>    # 4 0xf77c1761 - 1      _start
>    TID 10412:
>    # 0 0xf779f430          __kernel_vsyscall
>    # 1 0xf7771466 - 1      raise
>    # 2 0xf77c18f4 - 1      sigusr2
>    # 3 0xf77c1a10 - 1      stdarg
>    # 4 0xf77c1a2c - 1      backtracegen
>    # 5 0xf77c1a48 - 1      start
>    # 6 0xf77699da - 1      start_thread
>    # 7 0xf769bbfe - 1      __clone
>    */

That is very helpful. Thanks.
It doesn't explain why --run is always required, that seems odd to me.

> > (This is really confusing, why the difference between x86_64 and the
> > rest?)
> 
> 1. Changing inferior PC is arch-dependent (PTRACE_POKEUSER on x86_64).
> 2. Some of the arch-independent tests like this one I find sufficient to run
>    only on one arch.  And everyone has x86_64.
> 3. I found it good enough.  One could port it too all the other archs but see
>    item 2.

OK. I don't think it necessarily needs to be ported to each arch, but
having enough documentation in the test to explain what is arch
dependent and why is good.

I am not really clear on why the changing the PC is needed though.

> > There is also an assumption that if the process is build on x86_64 and
> > ptraced and a SIGUSR1 signal is caught by the tracee that the PC will be
> > adjusted to call the sigusr2 handler (this is also makes this test case
> > somewhat hard to follow IMHO).
> 
> Yes.

But why is that? What does it add to the test?

> > There are a couple of aborts in this code which I am unsure of whether
> > they are meant to be ever called/reached or not. A comment would be nice
> > to tell apart the abort () calls that are meant to actually generate the
> > core file and those that are there to make sure the test case fails in
> > an erroneous state.
> 
> I have extended the comment at:
>   /* This abort () call is reached to dump core for --gencore.
>      Without --gencore it should not be reached.  */
>   abort ();
> 
> There was already this comment
>   /* Not reached, signal will get ptrace-spawn to jump into sigusr2.  */
>   abort ();
> 
> I have put new comment here:
>   sigusr2 (SIGUSR2);
>   /* Not reached.  */
>   abort ();
> 
> here:
>   backtracegen ();
>   /* Not reached.  */
>   abort ();
> 
> And here:
>     raise (SIGUSR2);
>   /* Not reached */
>   abort ();

Thanks, that is helpful.

> > > +/* Test child for parent backtrace test.
> > > [...]
> > > +/* Execution will arrive here from jmp by an artificial ptrace-spawn signal.  */
> > > +
> > > +static void
> > > +sigusr2 (int signo)
> > > +{
> > > +  assert (signo == SIGUSR2);
> > > +  if (! gencore)
> > > +    raise (SIGUSR1);
> > > +
> > > +  /* Catch the .plt jump, it will come from this abort call.  */
> > > +  abort ();
> > > +}
> > 
> > Could you explain the .plt part a bit more. This seems to be the most
> > tricky part since it seems to rely on a fairly (2 years) recent
> > binutils. Is it specific to how you call abort here?
> 
> I have removed the .plt testing code now.

I don't really understand why it was there in the first place, so cannot
tell whether that is good or bad.

> > It is fine for a test to depend on a good toolchain and FAIL if the
> > toolchain produces broken/missing frame data. But if at all possible
> > lets split that out in its own test.
> 
> As a separate test it would be better to include the Jakub's gcc testcase for
> DWARF expressions, which I have therefore did now.

That is a cool test to have.

> > > +++ b/tests/backtrace.c
> > > [...]
> > > +/* Older systems (such as RHEL-5; fixed as binutils PR ld/12570 on 2011-06-20)
> > > +   will fail this testcase as they do not provide CFI for the .plt section.
> > > +   elfutils 'portable' patch/branch disables the .plt unwind test.  */
> > 
> > It would be nice to add how they fail. So the user can see if it is
> > this .plt case or something else. And as mentioned above can we split
> > out this .plt corner case in its own test?
> 
> As written above I have removed the .plt stuff as you seem too much concerned
> about the obsolete systems, despite it was removed in the 'portable' branch.
> 
> At least it has forced me to include the more complete Jakub's test, thanks.

I am not so concerned about obsolute systems. I just didn't understand
what precisely the .plt part was testing and it looked very complicated.
So I couldn't really say whether the complexity was worth it or not. If
you think the thing you wanted to test by including the .plt testing is
covered now by the new test that is good.

> > OK. nitpick call this dwfl_from_pid.
> 
> Couldn't you choose a different name?  IMO dwfl_* names are reserved for
> libdwfl.

Yes, good point. Lets not call it something starting with dwfl_. Just
keep the name as is.

> > > +static const char *executable;
> > > +
> > > +static int
> > > +find_elf (Dwfl_Module *mod, void **userdata, const char *modname,
> > > +	  Dwarf_Addr base, char **file_name, Elf **elfp)
> > > +{
> > > +  if (executable && modname != NULL
> > > +      && (strcmp (modname, "[exe]") == 0 || strcmp (modname, "[pie]") == 0))
> > > +    {
> > > +      char *executable_dup = strdup (executable);
> > > +      if (executable_dup)
> > > +	{
> > > +	  free (*file_name);
> > > +	  *file_name = executable_dup;
> > > +	  return -1;
> > > +	}
> > > +    }
> > > +  return dwfl_build_id_find_elf (mod, userdata, modname, base, file_name, elfp);
> > > +}
> > 
> > Is this still needed now that dwfl_core_file_report takes an executable?
> 
> Yes, unfortunately it is.  It predates the parameter.  It is planned to be
> simplified but so far I would fine OK to keep it as is, the "wrong" code is
> already checked in.

OK, but lets make sure we fix dwfl_build_id_find_elf to do this directly
too.

> > > +static Dwfl *
> > > +corefile_to_dwfl (const char *corefile)
> > > +{
> > > +  return report_corefile (dwfl_offline (), corefile);
> > > +}
> > 
> > OK. nitpick call it dwfl_from_corefile?
> 
> Again, I find dwfl_* prefix as reserved, don't you?

Yes, please ignore my silly nitpicks.

> > Currently when we run tests under valgrind we use --trace-children=yes
> > which will cause this to launch valgrind as main executable instead. But
> > don't worry. I'll fix that when I get to fixing the other valgrind issue
> > we found.
> 
> Yes, sorry, the valgrind run does not seem to be currently working anyway.

No worries, just ignore it for now. I first have to fix valgrind to work
properly for our testsuite.

> > > +  /* Make it true on x86_64 with i386 inferior.  */
> > > +  int disable = ehdr->e_ident[EI_CLASS] == ELFCLASS32;
> > 
> > This could use an explanation of "it". This doesn't give a clue of what
> > is being disabled.
>
> There is now instead:
>   /* It is false also on x86_64 with i386 inferior.  */
>   bool is_x86_64;

Aha, OK.

> > > +#ifdef __x86_64__
> > > +  Dwarf_Addr plt_start = scn_shdr->sh_addr + loadbase;
> > > +  Dwarf_Addr plt_end = plt_start + scn_shdr->sh_size;
> > > +  void (*jmp) (void);
> > > +  if (! disable)
> > > +    {
> > > +      int nsym = dwfl_module_getsymtab (data.mod);
> > > +      int symi;
> > > +      for (symi = 1; symi < nsym; ++symi)
> > > +	{
> > > +	  GElf_Sym symbol;
> > > +	  const char *symbol_name = dwfl_module_getsym (data.mod, symi, &symbol, NULL);
> > > +	  if (symbol_name == NULL)
> > > +	    continue;
> > > +	  switch (GELF_ST_TYPE (symbol.st_info))
> > > +	    {
> > > +	    case STT_SECTION:
> > > +	    case STT_FILE:
> > > +	    case STT_TLS:
> > > +	      continue;
> > > +	    default:
> > > +	      if (strcmp (symbol_name, "jmp") != 0)
> > > +		continue;
> > > +	      break;
> > > +	    }
> > > +	  /* LOADBASE is already applied here.  */
> > > +	  jmp = (void (*) (void)) (uintptr_t) symbol.st_value;
> > > +	  break;
> > > +	}
> > > +      assert (symi < nsym);
> > > +    prepare_thread (pid2, plt_start, plt_end, jmp);
> > > +    }
> > > +#endif
> > 
> > Like above, this could use a comment for porters to explain what address
> > is looked up and why. I wonder if this could be done a little more
> > portable by having the child just print the needed address and the
> > parent reading that. Maybe even use a pipe between the two for
> > synchronization.
> 
> Isn't it easier this dwfl_module_getsym way?  It is difficult to debug the
> multi-process piping things IMO.
> 
> I have put there:
>       // Find inferior symbol named "jmp".

OK. But that doesn't really explain what it is used for and why it is
x86_64 specific.

> > This really could use some documentation or better argument parsing IMHO.
> > Why not have simpler arguments:
> > 
> > --backtrace-child
> > --backtrace-pid <pid>
> > --backtrace-core <corefile>
> > --backtrace-exe <exefile>
> > --backtrace-exe-core <exefile> <corefile>
> > 
> > But if I read run-backtrace.sh correctly you only need the last two. It
> > might make the testcase a little more readable/easier to follow to just
> > drop the other cases.
> 
> TBH I use ./tests/backtrace as a general backtracing tool as src/stack is too
> conformant / complicated to use / with less verbose output.
> 
> While it may be more nice for the testcase having to type --backtrace-exe etc.
> etc. each time during debugging/troubleshooting I find very inconvenient.
> 
> Do you agree here?

Not really, because I don't see the logic of the current argument
parsing. But at least the command line syntax is now documented. It is
hard to see this as some general backtracing tool since it has so many
x86_64 specific hacks in it.

> > > +++ b/tests/run-backtrace.sh
> [...]
> > > +  # Backtrace core file.
> > > +  core="core.`ulimit -c unlimited; set +ex; testrun ${abs_builddir}/$child --gencore --run; true`"
> > 
> > Why the true? Don't you want to fail early if the core file cannot be generated?
> 
> Without 'true' the script will fail due to 'set -e'.
> 
> After program dumps a core its exit code is not zero.  'true' overrides it
> there.

Aha. How inconvenient. So you cannot see the difference between whether
the core file was actually generated or not because of some failure?

> Done, now:
> 	tests/run-backtrace-core-i386.sh
> 	tests/run-backtrace-core-x86_64.sh
> 	tests/run-backtrace-data.sh
> 	tests/run-backtrace-dwarf.sh
> 	tests/run-backtrace-native-biarch.sh
> 	tests/run-backtrace-native-core-biarch.sh
> 	tests/run-backtrace-native-core.sh
> 	tests/run-backtrace-native.sh

Thanks for splitting the test.

Next up, actually reviewing the new patch.

Cheers,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]