Bug 10678 - vta-gcc: cannot find module nfs debuginfo: relocation refers to undefined symbol
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-21 12:30 UTC by Mark Wielaard
Modified: 2010-07-19 18:05 UTC
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Mark Wielaard 2009-09-21 12:30:32 UTC
With a kernel compiled with gcc-vta (as on Fedora Rawhide), we get the following warning:
WARNING: cannot find module nfs debuginfo: relocation refers to undefined symbol

This causes the following testsuite failures:
systemtap.pass1-4/buildok.exp buildok/linuxmib-all-probes.stp
systemtap.pass1-4/buildok.exp buildok/nfsd-all-probes.stp
systemtap.pass1-4/buildok.exp buildok/rpc-embedded.stp

This is probably a gcc issue, but we need to track down what exactly is
happening before we can report it.
Comment 1 Mark Wielaard 2009-09-21 22:56:56 UTC
I don't fully understand the failure yet. The issue seems to be that we are
trying to relocate the (nfs) kernel module, but the kernel module contains
relocations against symbols that are undefined, in this case kmalloc_caches,
which is defined in the kernel (vmlinux). For a query like
stap -e 'probe module("nfs").function("*") { exit(); }' we don't seem to
have the kernel/vmlinux in the dwfl->modulelist, so we never find the symbol.
Obviously it would be even better if we didn't make libdwfl try to relocate
the ET_REL file at all.
Comment 2 Roland McGrath 2009-09-22 00:15:49 UTC
Ok, the case here is something that never came up before: the DWARF for an entry
in nfs.ko really does require relocation against a symbol defined in another
module.  This happens because there is a DW_AT_const_value on a variable that
has the compile-time constant value of &kmalloc_caches[10].  This is a proper
constant and it's the right thing for the compiler to produce.  In general, this
could happen in any module depending on another (not just depending on vmlinux).

In the medium term, this will be mitigated implicitly by the future libdw
behavior that won't care about this reloc until you want to evaluate that
DW_AT_const_value (or equivalent situation).  With that and nothing else, the
impact will be limited to the particular $foo being unresolvable when requested.

In the short term, it is pretty likely that this is only coming up with
references from foo.ko to vmlinux.  You can find all the other affected
.ko.debug's on hand easily by running tests/dwflmodtest -e foo.ko.debug on them
each and looking for the relocation error.  In all likelihood, -e vmlinux -e
foo.ko.debug will make that work on each of those.  If so, then it is an
adequate workaround for now to have stap just always include the "kernel" module
in the set admitted by its predicate.

In the long term/general case, you may have to figure out what other module is
required and go add it to the Dwfl.  In the medium term scenario, you could do
this lazily upon getting a "DWARF data requires relocation" error (and we can
make those give you the symbol details to make it easy)--or even possibly just
resolve those to a load-time intermodule reference in the generated module
(though that is trickier for a stap module loaded before its probee modules). 
In the short term scenario (i.e. current elfutils), you could do that
preemptively.  Whenever you want to do that, you can do it fairly easily by
using the modules.dep data.
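
A minimal sketch of reading the modules.dep data mentioned above, assuming the
usual "module-path: dependency-path ..." line format. The function name
parse_modules_dep and the "kernel/..." path layout in the sample are
illustrative assumptions, not code from systemtap or elfutils; only the
virtio_pci -> virtio_ring dependency comes from this bug.

```python
from typing import Dict, List


def parse_modules_dep(text: str) -> Dict[str, List[str]]:
    """Map each module path to the list of module paths it depends on.

    Assumes each non-empty line looks like "path/to/mod.ko: dep1.ko dep2.ko".
    """
    deps: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        target, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "module: deps" line
        deps[target.strip()] = rest.split()
    return deps


# Hypothetical sample contents; the directory layout is an assumption.
sample = (
    "kernel/drivers/virtio/virtio_pci.ko: kernel/drivers/virtio/virtio_ring.ko\n"
    "kernel/drivers/virtio/virtio_ring.ko:\n"
)
deps = parse_modules_dep(sample)
```

With a mapping like this, the modules needed to resolve a given module's
relocations can be looked up before any of them are added to the Dwfl.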
Comment 3 Mark Wielaard 2009-09-22 11:52:11 UTC
(In reply to comment #2)
> In the short term, it is pretty likely that this is only coming up with
> references from foo.ko to vmlinux.  You can find all the other affected
> .ko.debug's on hand easily by running tests/dwflmodtest -e foo.ko.debug on them
> each and looking for the relocation error.  In all likelihood, -e vmlinux -e
> foo.ko.debug will make that work on each of those.  If so, then it is an
> adequate workaround for now to have stap just always include the "kernel" module
> in the set admitted by its predicate.

dwflmodtest shows that 1195 (out of 1964) ko.debug files have unresolved
relocations. Adding vmlinux resolves most of them, but 31 files still have
unresolved symbols:

sound/pci/vx222/snd-vx222.ko.debug
fs/cachefiles/cachefiles.ko.debug
drivers/i2c/busses/i2c-amd756-s4882.ko.debug
drivers/i2c/busses/i2c-nforce2-s4985.ko.debug
drivers/message/fusion/mptctl.ko.debug
drivers/message/i2o/i2o_proc.ko.debug
drivers/message/i2o/i2o_config.ko.debug
drivers/parport/parport_pc.ko.debug
drivers/scsi/pcmcia/qlogic_cs.ko.debug
drivers/isdn/hardware/avm/b1pcmcia.ko.debug
drivers/isdn/hardware/avm/b1pci.ko.debug
drivers/isdn/hardware/avm/t1pci.ko.debug
drivers/media/video/saa7134/saa7134-empress.ko.debug
drivers/media/video/saa7134/saa7134.ko.debug
drivers/media/video/cx88/cx88-alsa.ko.debug
drivers/media/video/cx88/cx88xx.ko.debug
drivers/media/video/ir-kbd-i2c.ko.debug
drivers/media/video/bt8xx/bttv.ko.debug
drivers/media/dvb/dm1105/dm1105.ko.debug
drivers/virtio/virtio_pci.ko.debug
drivers/net/ne2k-pci.ko.debug
drivers/net/wireless/orinoco/orinoco_tmd.ko.debug
drivers/net/wireless/orinoco/orinoco_plx.ko.debug
drivers/net/wireless/orinoco/orinoco_nortel.ko.debug
drivers/net/wireless/orinoco/orinoco_pci.ko.debug
net/irda/ircomm/ircomm.ko.debug
net/irda/irnet/irnet.ko.debug
net/dccp/dccp_ipv6.ko.debug
net/dccp/dccp_ipv4.ko.debug
net/mac80211/mac80211.ko.debug
net/atm/clip.ko.debug

Take virtio_pci, for example: it has a relocation against vring_interrupt,
which is defined in virtio_ring.ko. So it seems we need to do something more
clever than just adding vmlinux itself.
Comment 4 Frank Ch. Eigler 2009-09-22 16:08:44 UTC
(In reply to comment #2)
> Ok, the case here is something that never came up before: the DWARF for an entry
> in nfs.ko really does require relocation against a symbol defined in another
> module.

Could you spell out why this couldn't be represented the same
way as an unresolved reference to an extern symbol (SHN_UNDEF)?
Comment 5 Mark Wielaard 2009-09-23 08:13:37 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > Ok, the case here is something that never came up before: the DWARF for an entry
> > in nfs.ko really does require relocation against a symbol defined in another
> > module.
> 
> Could you spell out why this couldn't be represented the same
> way as an unresolved reference to an extern symbol (SHN_UNDEF)?

It isn't exactly like a plain undefined symbol. The issue is that some DWARF
construct uses a relocation based on a symbol address that cannot be resolved. I
guess Roland's medium term plan is to track such relocations to the affected
DWARF DIEs and mark them "incomplete" somehow. But currently we would just end
up with bogus DWARF info, since we don't know which parts are affected and which
aren't.

It seems that for now the best we can do is make sure that the dwfl always
contains vmlinux and every module the probed module depends on, as listed in
/lib/modules/<release>/modules.dep[.bin], just like depmod does.
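
The selection rule described above can be sketched as follows. This is an
illustration only, not the systemtap implementation: the function name
modules_to_report and the dictionary-based dependency map are assumptions.
The "kernel" entry stands for vmlinux, and dependencies are followed
transitively as a precaution.

```python
from typing import Dict, Iterable, List, Set


def modules_to_report(wanted: Iterable[str],
                      deps: Dict[str, List[str]]) -> Set[str]:
    """Return the module names to admit into the Dwfl.

    Always includes "kernel" (vmlinux), every requested module, and every
    module reachable through the dependency map.
    """
    result: Set[str] = {"kernel"}
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name in result:
            continue  # already handled; also guards against cycles
        result.add(name)
        stack.extend(deps.get(name, []))
    return result


# Hypothetical dependency map using the example from comment #3.
example_deps = {"virtio_pci": ["virtio_ring"], "virtio_ring": []}
needed = modules_to_report(["virtio_pci"], example_deps)
```

Walking the map with an explicit stack keeps the sketch safe even if the
dependency data contained a cycle.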
Comment 6 Mark Wielaard 2009-10-01 22:39:21 UTC
commit ae06d951b9775c39b92913417c902eae0775e4b6
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Oct 2 00:28:46 2009 +0200

    PR10678 vta-gcc: module debuginfo: relocation refers to undefined symbol
    
    libdwfl tries to resolve all relocations in a module debuginfo file and
    if it cannot find a symbol used in a relocation it will fail when
    dwfl_module_getdwarf() is called. So we must make sure all possible
    dependencies of the module are also in the dwfl. We do this by trying
    to find and parse the modules.dep file and insert all dependencies
    into the dwfl.
    
    * setupdwfl.cxx (elfutils_kernel_path): Lift from setup_dwfl_kernel and
      make static.
      (is_comma_dash): New function.
      (modname_from_path): Likewise.
      (setup_mod_deps): Likewise.
      (setup_dwfl_report_kernel_p): Call setup_mod_deps().
    * testsuite/buildok/pr10678.stp: New test.
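
For illustration, here is a guess at what a helper like modname_from_path
might do, judging only by the function names in the commit message: derive a
module name from a modules.dep path and replace commas and dashes with
underscores. The behaviour below is an assumption, written in Python rather
than the C++ of setupdwfl.cxx, and is not taken from the actual commit.

```python
import posixpath


def modname_from_path(path: str) -> str:
    """Turn a path such as ".../ne2k-pci.ko" into the name "ne2k_pci".

    Returns an empty string when the path does not end in ".ko".
    (Assumed behaviour; the real helper may differ.)
    """
    base = posixpath.basename(path)
    name, ext = posixpath.splitext(base)
    if ext != ".ko":
        return ""
    # Module file names may use '-' or ',', while the names used in
    # probe points such as module("ne2k_pci") use '_'.
    return name.replace("-", "_").replace(",", "_")


example = modname_from_path("kernel/drivers/net/ne2k-pci.ko")
```

This kind of normalization would explain why the file ne2k-pci.ko.debug in
comment #3 corresponds to module("ne2k_pci") in the new test.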
Comment 7 David Smith 2010-07-19 17:23:01 UTC
(In reply to comment #6)
> commit ae06d951b9775c39b92913417c902eae0775e4b6
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Fri Oct 2 00:28:46 2009 +0200
> 
>     * testsuite/buildok/pr10678.stp: New test.

This test probes 'module("ne2k_pci").function("ne2k_pci_open")'.  If a kernel
doesn't compile this module (or builds it into the kernel image itself) we'll
get an invalid failure here.  Are there other examples of this problem we could
probe?
Comment 8 Mark Wielaard 2010-07-19 18:05:02 UTC
(In reply to comment #7)
> This test probes 'module("ne2k_pci").function("ne2k_pci_open")'.  If a kernel
> doesn't compile this module (or builds it into the kernel image itself) we'll
> get an invalid failure here.  Are there other examples of this problem we could
> probe?

You can set a probe in any kernel module that shows up with unresolved
relocations. Comment #3 lists such modules, found by running dwflmodtest from
elfutils as described in comment #2.