[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name



The ksymtab address entry might not be actually pointing to a GLOBAL
visible symbol in .symtab of the same binary. E.g. that happens when
compiling the kernel with clang's -fsanitize-cfi* and friends. In
contrast, the ksymtab.name entry has to match an entry in .symtab. It
would otherwise upset linkers very much. So, rather than relying on the
address being resolvable in any way, we might as well look up the symbol
name in the .symtab directly as this is the symbol we later want to
analyze anyway and ksymtab.name has to match a visible symbol. In this
patch we fall back to a name lookup if anything we tried so far was not
successful. That makes the patch a bit more reviewable.

On the other hand, we should be able to entirely rely on the name based
lookup and drop the address based one. That would require some
refactoring that I did not start at this point.

I created some local test cases to validate my work. As of now,
tests/test-read-dwarf.cc has no explicit support for the
linux_kernel_mode, hence is not fully capable of asserting this
implementation.

Note, there is one additional case missing: If we have CFI binaries with
position relative relocations, we also need to touch how
populate_symbol_map_from_ksymtab_reloc works to do a similar method
there.

Long story short: if we decide to go this approach, I will refactor the
code and add tests accordingly and fill the gap for the missing cases.

Yes, this patch is a bit hacky at this point, but is supposed ot proof
that name based lookup can work here.

	* src/abg-dwarf-reader.cc (populate_symbol_map_from_ksymtab):
	add support for symtab lookup by ksymtab.name entries.
	(try_reading_first_ksymtab_entry): Likewise.

Signed-off-by: Matthias Maennich <maennich@google.com>
---
 src/abg-dwarf-reader.cc | 135 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 85b8e2461721..d1a220643945 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -7681,6 +7681,66 @@ public:
 
     symbol_address = maybe_adjust_fn_sym_address(symbol_address);
     symbol = lookup_elf_symbol_from_address(symbol_address);
+
+    // the ksymtab.addr might not be a match in .symtab.addr (e.g. if CFI is
+    // enabled. We can still lookup the symbol by name in the symtab.
+    if (!symbol)
+      {
+	bytes += symbol_value_size;  // advance now to the ksymtab.name entry
+	if (position_relative_relocations)
+	  {
+	    int32_t offset = 0;
+	    ABG_ASSERT(read_int_from_array_of_bytes(bytes, symbol_value_size,
+						    is_big_endian, offset));
+	    GElf_Shdr section_header;
+	    gelf_getshdr(section, &section_header);
+	    // the actual symbol address is relative to its position. Since we
+	    // do not know the position, we take the beginning of the section,
+	    // add the read_offset that we might have and finally apply the
+	    // offset we read from the section.
+	    symbol_address = section_header.sh_addr + read_offset + offset;
+	  }
+	else
+	  ABG_ASSERT(read_int_from_array_of_bytes(
+	      bytes, symbol_value_size, is_big_endian, symbol_address));
+
+	// the ksymtab.name entry has a corresponding entry in ksymtab_strings
+	// that we now discover.
+
+	Elf_Data* ksymtab_strings
+	    = elf_getdata(find_ksymtab_strings_section(), 0);
+	char*	  strings = reinterpret_cast<char*>(ksymtab_strings->d_buf);
+	GElf_Shdr section_header;
+	gelf_getshdr(find_ksymtab_strings_section(), &section_header);
+
+	GElf_Addr ksymtab_strings_offset
+	    = symbol_address - section_header.sh_addr;
+
+	if (ksymtab_strings_offset < section_header.sh_size)
+	  {
+	    const std::string& name
+		= strings + symbol_address - section_header.sh_addr;
+
+	    // now we got the name, let's find a match in the function or
+	    // variable symbols
+
+	    string_elf_symbols_map_type::const_iterator it
+		= fun_syms().find(name);
+	    if (it != fun_syms().end())
+	      {
+		symbol = it->second[0];
+	      }
+
+	    if (!symbol)
+	      {
+		it = var_syms().find(name);
+		if (it != var_syms().end())
+		  {
+		    symbol = it->second[0];
+		  }
+	      }
+	  }
+      }
     return symbol;
   }
 
@@ -8067,6 +8127,81 @@ public:
 	    adjusted_symbol_address =
 	      maybe_adjust_var_sym_address(symbol_address);
 	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
+
+	    // the ksymtab.addr might not be a match in .symtab.addr (e.g. if
+	    // CFI is enabled. We can still lookup the symbol by name in the
+	    // symtab.
+	    if (!symbol)
+	      {
+		ABG_ASSERT(read_int_from_array_of_bytes(
+		    &bytes[entry_offset
+			   + symbol_value_size], // advance to ksymtab.name
+		    symbol_value_size, is_big_endian, symbol_address));
+
+		// the ksymtab.name entry has a corresponding entry in
+		// ksymtab_strings that we now discover.
+
+		Elf_Data* ksymtab_strings
+		    = elf_getdata(find_ksymtab_strings_section(), 0);
+		char* strings
+		    = reinterpret_cast<char*>(ksymtab_strings->d_buf);
+		GElf_Shdr section_header;
+		gelf_getshdr(find_ksymtab_strings_section(), &section_header);
+
+		GElf_Addr ksymtab_strings_offset
+		    = symbol_address - section_header.sh_addr;
+		if (ksymtab_strings_offset < section_header.sh_size)
+		  {
+		    const std::string& name
+			= strings + symbol_address - section_header.sh_addr;
+
+		    // now we got the name, let's find a match in the function
+		    // or variable symbols. We also need to discover the actual
+		    // symbol address in .symtab to discover the dwarf
+		    // information later. For that, we iterate
+		    // (fun|var)_addr_sym maps to find a match. That is overly
+		    // expensive and can be optimized: TODO: use a proper
+		    // lookup map.
+
+		    addr_elf_symbol_sptr_map_type::const_iterator I, E;
+		    string_elf_symbols_map_type::const_iterator it
+			= fun_syms().find(name);
+		    if (it != fun_syms().end())
+		      {
+			symbol = it->second[0];
+			for (I = fun_addr_sym_map().begin(),
+			    E = fun_addr_sym_map().end();
+			     I != E; ++I)
+			  {
+			    if (I->second->get_name() == name)
+			      {
+				adjusted_symbol_address = I->first;
+			      }
+			  }
+		      }
+
+		    // we continue our search in the variable maps ...
+
+		    if (!symbol)
+		      {
+			it = var_syms().find(name);
+			if (it != var_syms().end())
+			  {
+			    symbol = it->second[0];
+			    for (I = var_addr_sym_map().begin(),
+				E = var_addr_sym_map().end();
+				 I != E; ++I)
+			      {
+				if (I->second->get_name() == name)
+				  {
+				    adjusted_symbol_address = I->first;
+				  }
+			      }
+			  }
+		      }
+		  }
+	      }
+
 	    if (!symbol)
 	      // This must be a symbol that is of type neither FUNC
 	      // (function) nor OBJECT (variable).  There are for intance,
-- 
2.24.0.393.g34dc348eaf-goog