With a kernel compiled with gcc-vta (like on Fedora rawhide), we get the following:

WARNING: cannot find module nfs debuginfo: relocation refers to undefined symbol

This causes the following testsuite failures:

systemtap.pass1-4/buildok.exp buildok/linuxmib-all-probes.stp
systemtap.pass1-4/buildok.exp buildok/nfsd-all-probes.stp
systemtap.pass1-4/buildok.exp buildok/rpc-embedded.stp

This is probably a gcc issue, but we need to track down what exactly is happening before we can report it.
I don't fully understand the failure yet. The issue seems to be that we are trying to relocate the (nfs) kernel module, but the module contains relocations against undefined symbols, in this case kmalloc_caches, which is defined in the kernel (vmlinux). For a query like

stap -e 'probe module("nfs").function("*") { exit(); }'

we don't seem to have the kernel/vmlinux in the dwfl->modulelist, so we never find the symbol. Obviously it would be even better if we didn't make libdwfl try to relocate the ET_REL file at all.
Ok, the case here is something that never came up before: the DWARF for an entry in nfs.ko really does require relocation against a symbol defined in another module. This happens because there is a DW_AT_const_value on a variable that has the compile-time constant value of &kmalloc_caches[10]. This is a proper constant and it's the right thing for the compiler to produce.

In general, this could happen in any module depending on another (not just depending on vmlinux).

In the medium term, this will be mitigated implicitly by the future libdw behavior that won't care about this reloc until you want to evaluate that DW_AT_const_value (or equivalent situation). With that and nothing else, the impact will be limited to the particular $foo being unresolvable when requested.

In the short term, it is pretty likely that this is only coming up with references from foo.ko to vmlinux. You can find all the other affected .ko.debug's on hand easily by running tests/dwflmodtest -e foo.ko.debug on them each and looking for the relocation error. In all likelihood, -e vmlinux -e foo.ko.debug will make that work on each of those. If so, then it is an adequate workaround for now to have stap just always include the "kernel" module in the set admitted by its predicate.

In the long term/general case, you may have to figure out what other module is required and go add it to the Dwfl. In the medium term scenario, you could do this lazily upon getting a "DWARF data requires relocation" error (and we can make those give you the symbol details to make it easy)--or even possibly just resolve those to a load-time intermodule reference in the generated module (though that is trickier for a stap module loaded before its probee modules). In the short term scenario (i.e. current elfutils), you could do that preemptively. Whenever you want to do that, you can do it fairly easily by using the modules.dep data.
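As a rough illustration of the modules.dep idea, here is a minimal Python sketch that parses text in the "path: dep dep ..." layout and collects every module reachable from a given one. The function names and the sample contents are made up for illustration; this is not what stap does internally, and the dependency lines in the sample are assumptions, not data from a real system.

```python
def parse_modules_dep(text):
    """Map each module path to the list of module paths it depends on."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each line looks like "target.ko: dep1.ko dep2.ko"
        target, _, rest = line.partition(":")
        deps[target.strip()] = rest.split()
    return deps


def all_dependencies(deps, module):
    """Return every module reachable from `module`, in discovery order."""
    seen = []
    queue = list(deps.get(module, []))
    while queue:
        dep = queue.pop(0)
        if dep == module or dep in seen:
            continue
        seen.append(dep)
        queue.extend(deps.get(dep, []))
    return seen


# Hypothetical sample; real files live under /lib/modules/<release>/.
SAMPLE = """\
kernel/drivers/virtio/virtio_pci.ko: kernel/drivers/virtio/virtio_ring.ko
kernel/drivers/virtio/virtio_ring.ko: kernel/drivers/virtio/virtio.ko
kernel/drivers/virtio/virtio.ko:
"""

if __name__ == "__main__":
    deps = parse_modules_dep(SAMPLE)
    for path in all_dependencies(deps, "kernel/drivers/virtio/virtio_pci.ko"):
        print(path)
```

Each path returned would then be a candidate to report into the Dwfl alongside the module being probed and the kernel itself.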
(In reply to comment #2)
> In the short term, it is pretty likely that this is only coming up with
> references from foo.ko to vmlinux. You can find all the other affected
> .ko.debug's on hand easily by running tests/dwflmodtest -e foo.ko.debug on them
> each and looking for the relocation error. In all likelihood, -e vmlinux -e
> foo.ko.debug will make that work on each of those. If so, then it is an
> adequate workaround for now to have stap just always include the "kernel" module
> in the set admitted by its predicate.

dwflmodtest shows that 1195 (out of 1964) ko.debug files have unresolved relocations. Adding vmlinux resolves most of them, but 31 still have unresolved symbols:

sound/pci/vx222/snd-vx222.ko.debug
fs/cachefiles/cachefiles.ko.debug
drivers/i2c/busses/i2c-amd756-s4882.ko.debug
drivers/i2c/busses/i2c-nforce2-s4985.ko.debug
drivers/message/fusion/mptctl.ko.debug
drivers/message/i2o/i2o_proc.ko.debug
drivers/message/i2o/i2o_config.ko.debug
drivers/parport/parport_pc.ko.debug
drivers/scsi/pcmcia/qlogic_cs.ko.debug
drivers/isdn/hardware/avm/b1pcmcia.ko.debug
drivers/isdn/hardware/avm/b1pci.ko.debug
drivers/isdn/hardware/avm/t1pci.ko.debug
drivers/media/video/saa7134/saa7134-empress.ko.debug
drivers/media/video/saa7134/saa7134.ko.debug
drivers/media/video/cx88/cx88-alsa.ko.debug
drivers/media/video/cx88/cx88xx.ko.debug
drivers/media/video/ir-kbd-i2c.ko.debug
drivers/media/video/bt8xx/bttv.ko.debug
drivers/media/dvb/dm1105/dm1105.ko.debug
drivers/virtio/virtio_pci.ko.debug
drivers/net/ne2k-pci.ko.debug
drivers/net/wireless/orinoco/orinoco_tmd.ko.debug
drivers/net/wireless/orinoco/orinoco_plx.ko.debug
drivers/net/wireless/orinoco/orinoco_nortel.ko.debug
drivers/net/wireless/orinoco/orinoco_pci.ko.debug
net/irda/ircomm/ircomm.ko.debug
net/irda/irnet/irnet.ko.debug
net/dccp/dccp_ipv6.ko.debug
net/dccp/dccp_ipv4.ko.debug
net/mac80211/mac80211.ko.debug
net/atm/clip.ko.debug

Take virtio_pci, for example: it refers to vring_interrupt in a relocation, and that symbol is defined in virtio_ring.ko. So it seems we need to do something more clever than just adding vmlinux itself.
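To make the virtio_pci case concrete, here is a small Python sketch of the lookup involved: given the undefined symbols a module's debuginfo relocates against, find which other module defines each one. The function name and the EXPORTS table are hypothetical; the table holds only the two symbol/module pairs mentioned in this report.

```python
def find_providers(undefined, exports):
    """Map each undefined symbol to the module defining it, or None."""
    owner = {}
    for module, symbols in exports.items():
        for sym in symbols:
            # Keep the first module seen for a symbol.
            owner.setdefault(sym, module)
    return {sym: owner.get(sym) for sym in undefined}


# Hypothetical symbol table, limited to the symbols named in this report.
EXPORTS = {
    "kernel": ["kmalloc_caches"],
    "virtio_ring": ["vring_interrupt"],
}

if __name__ == "__main__":
    needed = find_providers(["kmalloc_caches", "vring_interrupt"], EXPORTS)
    # Modules other than the kernel that must also be in the Dwfl.
    extra = sorted({m for m in needed.values() if m and m != "kernel"})
    print(needed, extra)
```

With only "kernel" in the table, vring_interrupt maps to None, which corresponds to the 31 modules that stay unresolved after adding vmlinux.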
(In reply to comment #2)
> Ok, the case here is something that never came up before: the DWARF for an entry
> in nfs.ko really does require relocation against a symbol defined in another
> module.

Could you spell out why this couldn't be represented the same way as an unresolved reference to an extern symbol (SHF_UNDEF)?
(In reply to comment #4)
> (In reply to comment #2)
> > Ok, the case here is something that never came up before: the DWARF for an entry
> > in nfs.ko really does require relocation against a symbol defined in another
> > module.
>
> Could you spell out why this couldn't be represented the same
> way as an unresolved reference to an extern symbol (SHF_UNDEF)?

It isn't exactly like a plain undefined symbol. The issue is that some DWARF construct uses a relocation based on a symbol address that cannot be resolved. I guess Roland's medium-term plan is to trace such relocations to the affected DWARF DIEs and mark them "incomplete" somehow. But currently we would just end up with bogus DWARF info, since we don't know which parts are affected and which aren't.

It seems that for now the best we can do is make sure that the dwfl always contains vmlinux and any module that is listed as a dependency in /lib/modules/<release>/modules.dep[.bin], just like depmod does.
commit ae06d951b9775c39b92913417c902eae0775e4b6
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Oct 2 00:28:46 2009 +0200

    PR10678 vta-gcc: module debuginfo: relocation refers to undefined symbol

    libdwfl tries to resolve all relocations in a module debuginfo file and
    if it cannot find a symbol used in a relocation it will fail when
    dwfl_module_getdwarf() is called. So we must make sure all possible
    dependencies of the module are also in the dwfl. We do this by trying
    to find and parse the modules.dep file and insert all dependencies into
    the dwfl.

    * setupdwfl.cxx (elfutils_kernel_path): Lift from setup_dwfl_kernel
    and make static.
    (is_comma_dash): New function.
    (modname_from_path): Likewise.
    (setup_mod_deps): Likewise.
    (setup_dwfl_report_kernel_p): Call setup_mod_deps().
    * testsuite/buildok/pr10678.stp: New test.
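The helper names in the ChangeLog suggest that module names are derived from the paths in modules.dep, with commas and dashes mapped to underscores (the file is ne2k-pci.ko, but the module is probed as ne2k_pci). The following Python sketch shows that mapping under that assumption; it borrows the helper names for readability but is not the code from setupdwfl.cxx.

```python
import os


def is_comma_dash(ch):
    """True for the characters that become '_' in a module name."""
    return ch in (",", "-")


def modname_from_path(path):
    """Derive a module name from a path such as ".../ne2k-pci.ko".

    Assumption: the name is the file name up to its first dot, with
    commas and dashes replaced by underscores.
    """
    base = os.path.basename(path)
    if "." not in base:
        return ""  # not a module file name
    stem = base.split(".", 1)[0]
    return "".join("_" if is_comma_dash(c) else c for c in stem)


if __name__ == "__main__":
    print(modname_from_path("kernel/drivers/net/ne2k-pci.ko"))
```

A dependency setup step along these lines would normalize every path found for the probed module before asking the dwfl to report the module of that name.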
(In reply to comment #6)
> commit ae06d951b9775c39b92913417c902eae0775e4b6
> Author: Mark Wielaard <mjw@redhat.com>
> Date: Fri Oct 2 00:28:46 2009 +0200
>
> * testsuite/buildok/pr10678.stp: New test.

This test probes 'module("ne2k_pci").function("ne2k_pci_open")'. If a kernel doesn't compile this module (or builds it into the kernel image itself) we'll get an invalid failure here. Are there other examples of this problem we could probe?
(In reply to comment #7)
> This test probes 'module("ne2k_pci").function("ne2k_pci_open")'. If a kernel
> doesn't compile this module (or builds it into the kernel image itself) we'll
> get an invalid failure here. Are there other examples of this problem we could
> probe?

You can set a probe in any kernel module that shows up with unresolved relocations. Comment #3 lists such modules; the list was obtained with dwflmodtest from elfutils, as described in comment #2.