This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: frame unwinding patches


On Mon, 2017-04-03 at 23:15 +0200, Mark Wielaard wrote:
> On Mon, Apr 03, 2017 at 11:02:53AM +0200, Ulf Hermann wrote:
> > > Ping? Any progress on merging this functionality upstream?
> > > It can make quite a difference in unwinding.
> > 
> > The patches have also been in perfparser releases for over a year now. I
> > would like to see them upstream.
> 
> Yes, sorry. The patches looked fine. Then I thought I should try
> adding similar support to another arch to double check. And then
> I got distracted by something else. I'll get back to is asap and
> make sure they get in before the next release end of the month.

OK. Sorry for the delay. In general I like these patches. I even made a
i386 fp unwind backend function to play with the idea a bit more. See
attached. Does that look sane?

I do agree with Jan that frame pointer unwinding is notoriously
untrustworthy. Even with some sanity checks it is hard to know whether
you are doing a correct unwind. gdb gets away with it through pretty
advanced frame sniffers, which take a lot of low-level compiler
generation knowledge to get correct (and even then...). You only restore
the pc, stack and frame pointer registers (maybe incorrectly). So
afterwards, even if you got valid CFI data you might be stuck.

Ideally we provide the user with a flag to check how trustworthy a frame
is and/or an option to drop inexact unwinding completely. But that would
be an extension for another time. I do think that in the context of
dwfl_thread_getframes () it is appropriate. Currently all you can get
from a Dwfl_Frame is the pc anyway.

I do however have a problem with the testcases. I approved the
--allow-unknown option for the backtrace fp core tests (commit f9971cb)
but I now believe that was a mistake. It makes it really hard to see
whether the test actually tests anything. In fact for my i386 case the
testcase even succeeded when disabling the whole frame pointer unwinder
fallback...

I believe the issue with the missing symbol names is not caused by the
frame pointer unwinding not matching real function addresses, because
they should! But because of the way the testcase binaries are generated.
What you do is the following:

# gcc -static -O2 -fno-omit-frame-pointer -I ../ -I ../lib/ -D_GNU_SOURCE
#     -pthread -o backtrace.x86_64.fp.exec backtrace-child.c
#
# Then strip the .eh_frame and .eh_frame_hdr sections from the binary:
#
# strip -R .eh_frame backtrace.x86_64.fp.exec
# strip -R .eh_frame_hdr backtrace.x86_64.fp.exec
#
# The core is generated by calling the binary with --gencore

But the .eh_frame sections are allocated sections. Which means that by
removing them strip will alter the address layout of the program. Which
causes the symbol table addresses to not match up anymore.

I think they should be generated with -fno-asynchronous-unwind-tables to
avoid them being generated in the first place (in the main binary
functions, but keep those from the static library functions pulled in):

# gcc -static -O2 -fno-omit-frame-pointer -fno-asynchronous-unwind-tables \
#     -D_GNU_SOURCE -pthread -o tests/backtrace.i386.fp.exec -I. -Ilib \
#     tests/backtrace-child.c

That way the difference between having a frame pointer unwinder or not
is much clearer:

$ eu-stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core
PID 12044 - core
TID 12045:
#0  0x008a7416 __kernel_vsyscall
#1  0x08051ab9 raise
#2  0x080485c1 sigusr2
eu-stack: dwfl_thread_getframes tid 12045 at 0x80485c0 in [exe]: No DWARF information found
TID 12044:
#0  0x008a7416 __kernel_vsyscall
#1  0x0804ae30 pthread_join
#2  0x0804847c main
eu-stack: dwfl_thread_getframes tid 12044 at 0x804847b in [exe]: No DWARF information found

$ LD_LIBRARY_PATH=backends:libelf:libdw src/stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core
PID 12044 - core
TID 12045:
#0  0x008a7416 __kernel_vsyscall
#1  0x08051ab9 raise
#2  0x080485c1 sigusr2
#3  0x08048699 stdarg
#4  0x08048702 backtracegen
#5  0x0804871b start
#6  0x0804a7cf start_thread
#7  0x080746fe __clone
TID 12044:
#0  0x008a7416 __kernel_vsyscall
#1  0x0804ae30 pthread_join
#2  0x0804847c main
#3  0x08053188 __libc_start_main
#4  0x080481e1 _start
src/stack: dwfl_thread_getframes tid 12044 at 0x80481e0 in [exe]: No DWARF information found

Then if we check for one of the function names in the middle of the
frame pointer only unwound frames, like backtracegen, this would give us
a much better test of whether things worked correctly.

Could you try regenerating your test binaries with
-fno-asynchronous-unwind-tables? Then I'll try to adapt the testsuite to
check for the one of the frame pointer only unwound function names.

Thanks,

Mark
/* Get previous frame state for an existing frame state using frame pointers.
   Copyright (C) 2017 Red Hat, Inc.
   This file is part of elfutils.

   This file is free software; you can redistribute it and/or modify
   it under the terms of either

     * the GNU Lesser General Public License as published by the Free
       Software Foundation; either version 3 of the License, or (at
       your option) any later version

   or

     * the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at
       your option) any later version

   or both in parallel, as here.

   elfutils is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program.  If
   not, see <http://www.gnu.org/licenses/>.  */

#ifdef HAVE_CONFIG_H
# include <config.h>
#endif

#include <stdlib.h>
#include <assert.h>

#define BACKEND i386_
#include "libebl_CPU.h"

/* Register numbers for frame and stack pointers.  We take advantage of
   them being next to each other when calling getfunc and setfunc.  */
#define ESP 4
#define EBP (ESP + 1)

/* Most basic frame pointer chasing with EBP as frame pointer.
   PC = *(FP + 4), SP = FP + 8, FP = *FP.  */
bool
i386_unwind (Ebl *ebl __attribute__ ((unused)),
	     Dwarf_Addr pc __attribute__ ((unused)),
	     ebl_tid_registers_t *setfunc, ebl_tid_registers_get_t *getfunc,
	     ebl_pid_memory_read_t *readfunc, void *arg,
	     bool *signal_framep __attribute__ ((unused)))
{
  /* sp = 0, fp = 1 */
  Dwarf_Word regs[2];

  /* Get current stack and frame pointers.  */
  if (! getfunc (ESP, 2, regs, arg))
    return false;

  Dwarf_Word sp = regs[0];
  Dwarf_Word fp = regs[1];

  /* Sanity check.  We only support traditional stack frames.  */
  if (fp == 0 || sp == 0 || fp < sp)
    return false;

  /* Get the return address from the stack, it is our new pc.  */
  Dwarf_Word ret_addr;
  if (! readfunc (fp + 4, &ret_addr, arg) || ret_addr == 0)
    return false;

  /* Get new sp and fp.  */
  sp = fp + 8;
  if (! readfunc (fp, &fp, arg) || fp == 0 || sp >= fp)
    return false;

  /* Set new sp, fp and pc.  */
  regs[0] = sp;
  regs[1] = fp;
  if (! setfunc (ESP, 2, regs, arg) || ! setfunc (-1, 1, &ret_addr, arg))
    return false;

  return true;
}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]