This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Prelinking on ARM with Debug Link


Hi Torsten,

Sorry this is taking so long. I don't believe I fully understand the
issue. Because I don't immediately know what the right answer is I put
it away for a bit. Then it takes time again to get the full context
(which I believe I still don't fully have).

On Fri, 2016-04-01 at 23:18 +0200, Torsten Polle wrote:
> > Am 01.04.2016 um 15:07 schrieb Mark Wielaard <mjw@redhat.com>:
> > On Wed, 2016-03-30 at 22:05 +0200, Torsten Polle wrote:
> >>> Am 28.02.2016 um 21:51 schrieb Torsten Polle <Torsten.Polle@gmx.de>: 
> >>>> Am 23.02.2016 um 23:16 schrieb Torsten Polle <Torsten.Polle@gmx.de>:
> >>>>> Am 23.02.2016 um 17:46 schrieb Mark Wielaard <mjw@redhat.com>:
> >>>>> 
> >>>>> On Mon, 2016-02-22 at 22:45 +0100, Torsten Polle wrote:
> >>>>>> the principle idea of your patch works. But I had to rewrite it to
> >>>>>> really work in my environment. Could you please have a look at my
> >>>>>> proposal? Would something like that be acceptable?
> >>>>> 
> >>>>> Could you give an example of what didn't work?
> >>>>> You cast both debug_frame_off and dwbias to uint32_t before
> >>>>> substracting. Is that really necessary/correct? Both are Dwarf_Addrs
> >>>>> which are always uint64_t. And we care about the difference here.
> >>>> 
> >>>> In my case debug_frame_off is larger than dwbias. If for instance debug_frame_off is 0 and dwbias is -1, the result is -1. As the arithmethics on the 64bit host is unsigned, the result is 0xffffffffffffffff in hex notation and 18446744073709551615 in decimal notation. As stap-symbols.h is used for the target, the values are too large for the 32bit wide unsigned int, which is the data type for the field sec_load_offset. Therefore the compiler complains. The type cast to uint32_t forces the host compiler to use only a 32bit value. Therefore the result is 0xffffffff (4294967295). At least that is what happens on my host. I need the âuâ because even 4294967295 is considered to be too large.
> >>>> 
> >>>>>> I would like to double check in my environment if we could fall back to the hex notation again.
> >>>>> 
> >>>>> Isn't it simpler and consistent to just always use decimal in this case?
> >>>>> There should at least be a comment why we use a different notation for
> >>>>> elf32 vs elf64 targets.
> >>>> 
> >>>> I would go for a consistent notation. Personally, I prefer the hex notation for the addresses. I hope to be able to check whether the hex notation with casts works tomorrow and get back to you.
> >> 
> >>> Please find a new version of the patch attached. I changed the patch to use hexadecimal numbers consistently.
> >>> 
> >>> Kind Regards,
> >>> Torsten
> >>> <0001-PATCH-v2-Fix-Compilation-fails-for-prelinked-librari.patch>
> >> 
> >> any comments or thoughts on my proposal?
> > 
> > Sorry I had forgotten about this issue.
> > In your example you have a debug_frame_off with zero and a dwbias of -1.
> > That looks somewhat suspicious. Are we sure that isn't a special case?
> 
> No, this is not a special case. Itâs just a simplified example to illustrate the calculations. ;-)
> 
> In my initial mail [1], you can see the original values and calculations.
> 
> > The offending line in stap-symbols.h looks as follows:
> > .sec_load_offset = 0xffffffffffffebc0 /* 4dec8000 4dec9440 */
> 
> The values in the comment are the result of the following instrumentation.
> 
> >                  Dwarf_Addr dwbias = 0;
> >                  dwfl_module_getdwarf (m, &dwbias);
> >                  c->output << ".sec_load_offset = 0x"
> >                                    << hex << debug_frame_off - dwbias << dec
> >                                    << "/* "  << hex << debug_frame_off << " " << dwbias << dec << "*/"
> >                                    << "\n";
> 
> This means the actual values are 
> debug_frame_off = 0x4dec8000
> dwbias          = 0x4dec9440
> 
> > The offset of the .text section in the non-prelinked library is 0x15c40. 
> > The offset of the .text section in the prelinked library is 0x4decf080 - 0xdeb8000 = 0x17080.
> > 
> > This means that .sec_load_offset is exactly the difference 0x15c40 - 0x17080 ~ 0xffffffffffffebc0.

OK. So the issue is that we use a hex representation of a negative
number that might not fit when converted to an unsigned value.
The simplest fix seems to be to just use a (signed) decimal
representation, that can be converted to the unsigned value on
assignment. But you prefer to use the hex representation instead?

I don't completely understand what sec_load_offset represents and why it
is an unsigned value. We use it in two places:

runtime/sym.c (_stp_linenumber_lookup):

  // if addr is a kernel address, it will need to be adjusted
  if (!task)
    {
      int i;
      unsigned long offset = 0;
      // have to factor in the load_offset of (specifically) the .text section
      for (i=0; i<m->num_sections; i++)
        if (!strcmp(m->sections[i].name, ".text"))
          {
            offset = (m->sections[i].static_addr - m->sections[i].sec_load_offset);
            break;
          }

      if (addr < offset)
        return 0;
      addr = addr - offset;
    }

runtime/unwind.c (adjustStartLoc):

  if (is_ehframe)
    return startLoc + vm_addr;
  else
    return startLoc + vm_addr - s->sec_load_offset;

The first is clearly only used for the kernel. The second is only used
for .debug_frame addresses. And if you read the rest of the code
(".dynamic" is user space dynamic library (ET_DYN), ".absolute" is user
space executable (ET_EXEC), the kernel has a special name "_stext" and
only kernel modules have section names attached because they are
ET_REL), it looks like sec_load_offset is only used when handling kernel
modules.

This makes sense. Except for kernel modules all other "modules" exist of
just one executable segment/section. But kernel modules (ET_REL files)
can have multiple executable sections with different names and offsets.

So in practice sec_load_offset is only used for kernel modules to
calculate the section static_addr - sec_load_offset.

Given that I think we are solving the wrong problem. Calculating
sec_load_offset for anything except kernel modules doesn't make sense.
So maybe the simplest solution is the attached patch?

It is odd that we only seem to do this for the ".text" section. A kernel
module can have multiple code sections with different names. This isn't
a regression since we never seemed to handle any other section. But if
there are problems unwinding through some kernel modules then this might
explain it.

Cheers,

Mark
diff --git a/translate.cxx b/translate.cxx
index 5daf725..f10c98d 100644
--- a/translate.cxx
+++ b/translate.cxx
@@ -6911,10 +6911,17 @@ dump_unwindsym_cxt (Dwfl_Module *m,
 		    << "_debug_frame_hdr_" << secidx << ",\n";
           c->output << ".debug_hdr_len = " << debug_frame_hdr_len << ", \n";
 
-	  Dwarf_Addr dwbias = 0;
-	  dwfl_module_getdwarf (m, &dwbias);
-	  c->output << ".sec_load_offset = 0x"
-		    << hex << debug_frame_off - dwbias << dec << "\n";
+	  /* Only kernel modules has section names. And the only section
+	     name we are currently interested in (see above) is ".text".  */
+	  if (secname == ".text")
+	    {
+	      Dwarf_Addr dwbias = 0;
+	      dwfl_module_getdwarf (m, &dwbias);
+	      c->output << ".sec_load_offset = 0x"
+			<< hex << debug_frame_off - dwbias << dec << "\n";
+	    }
+	  else
+	    c->output << ".sec_load_offset = 0\n";
 
 	  c->output << "#else\n";
 	  c->output << ".debug_hdr = NULL,\n";

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]