This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [patch 2/2 v2] Fix loading core files without build-ids

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Mon, 27 May 2013 16:05:46 +0200
Subject: Re: [patch 2/2 v2] Fix loading core files without build-ids
On Thu, 2013-05-23 at 21:15 +0200, Jan Kratochvil wrote:
> commit 457b771b842a0f6d23118e4b39d93f6581fb9b41
> Author: Jan Kratochvil <jan.kratochvil@redhat.com>
> Date:   Thu May 23 20:19:16 2013 +0200
> 
>     Use DT_DEBUG library search first.
>     
>     libdwfl/
>     2013-05-23  Jan Kratochvil  <jan.kratochvil@redhat.com>
>     
>     	* argp-std.c (parse_opt) <ARGP_KEY_SUCCESS> <opt->core> <opt->e>: Set
>     	executable_for_core before calling dwfl_core_file_report.

OK.

>     	* core-file.c (free_r_debug_info): New function.

Since it doesn't actually free the struct itself, lets call it
clear_r_debug_info to avoid confusion.

>     	(dwfl_core_file_report): Move raw segments reporting
>     	lower.  New variable r_debug_info.
>     	(dwfl_core_file_report) (report_module): New function.
>     	(dwfl_core_file_report):  Call also report_module for EXEC_VADDR and
>     	VDSO_L_LD.  Call free_r_debug_info in the end.  Return sum of LISTED
>     	and SNIFFED.

OK, but see comments below.

>     	* dwfl_module_build_id.c (check_notes): Move into
>     	__libdwfl_find_elf_build_id.
>     	(__libdwfl_find_build_id): Rename to ...
>     	(__libdwfl_find_elf_build_id): ... here.  Add parameters build_id_bits,
>     	build_id_elfaddr and build_id_len.  Verify MOD vs. ELF.
>     	(__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
>     	set, rename data_vaddr to data_elfaddr.  Do not call found_build_id.
>     	(__libdwfl_find_elf_build_id): Update the check_notes caller, do not
>     	adjust its data_elfaddr parameter.
>     	(__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.

OK.

>     	* dwfl_segment_report_module.c (dwfl_segment_report_module): New
>     	parameter r_debug_info.  New variable name_is_final.  Adjust addresses
>     	according to R_DEBUG_INFO->MODULE.  Check conflicts against DWFL.
>     	Do not overwise NAME by SONAME if NAME_IS_FINAL.

s/overwise/overwrite/
Also see some questions below.

>     	* libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
>     	(struct r_debug_info_module, struct r_debug_info): New definitions.
>     	(dwfl_segment_report_module, dwfl_link_map_report): Add parameter
>     	r_debug_info.

OK.

>     	* link_map.c: Include fcntl.h.
>     	(report_r_debug): Add parameter r_debug_info, describe them in the
>     	function comment.  Delete dwfl_addrmodule call and its dependent code.
>     	Verify build-id before calling dwfl_report_elf, also supply
>     	executable_for_core to it and skip vDSO.  Report vdso_bias and
>     	vdso_l_ld when found.  Clear exec_vaddr when found.  Store
>     	r_debug_info->module info when appropriate.
>     	(dwfl_link_map_report): Add parameter r_debug_info.
>     	(dwfl_link_map_report) (consider_phdr): Add parameter offset.  Set
>     	r_debug_info.exec_vaddr when found.
>     	(dwfl_link_map_report): New variable in_ok.  Try to read IN from
>     	EXECUTABLE_FOR_CORE.  Update consider_phdr caller parameters.  Update
>     	report_r_debug caller parameters.

Was a bit hard to follow since I didn't know much about the link-map,
and there is no real documentation it seems except the glibc sources.
But in general looks OK. See some questions below.

>     tests/
>     2013-05-23  Jan Kratochvil  <jan.kratochvil@redhat.com>
>     
>     	* Makefile.am (EXTRA_DIST): Add test-core-lib.so.bz2,
>     	test-core.core.bz2 and test-core.exec.bz2.
>     	* run-addrname-test.sh: New test for these files.
>     	* run-unstrip-n.sh: Update expected output.  New test for these files.
>     	* test-core-lib.so.bz2: New file.
>     	* test-core.core.bz2: New file.
>     	* test-core.exec.bz2: New file.

The changes in test results now make sense to me. Reporting the full
file names was arguably wrong to begin with. I don't think we should
restore that, but if people do feel this is a regression we could add a
guesser to unstrip.c itself (but I don't think we should).

>  int
>  dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
>  {
> @@ -431,33 +444,85 @@ dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
>    /* Now we have NT_AUXV contents.  From here on this processing could be
>       used for a live process with auxv read from /proc.  */
>  
> +  struct r_debug_info r_debug_info;
> +  memset (&r_debug_info, 0, sizeof r_debug_info);
>    int listed = dwfl_link_map_report (dwfl, auxv, auxv_size,
> -				     dwfl_elf_phdr_memory_callback, elf);
> +				     dwfl_elf_phdr_memory_callback, elf,
> +				     &r_debug_info);
> +
> +  /* Call dwfl_segment_report_module, update *NDXP when appropriate.  Return
> +     true for a successfully reported module.  Otherwise return false and also
> +     call free_r_debug_info.  */
> +
> +  inline bool report_module (int *ndxp, int *count, const char *name)

Document that count gets updated on success.

The clearing of r_debug_info on failure is a nice because it prevents
accidental memory leaks, but also confused me on first read of the patch
(because I forgot the comment when reading the later code).
 
>    /* We return the number of modules we found if we found any.
>       If we found none, we return -1 instead of 0 if there was an
> -     error rather than just nothing found.  If link_map handling
> -     failed, we still have the sniffed modules.  */
> -  return sniffed == 0 || listed > sniffed ? listed : sniffed;
> +     error rather than just nothing found.  */
> +  return sniffed || listed >= 0 ? listed + sniffed : listed;

What about the case of sniffed == 0 && listed == 0?
Should that return -1?

> diff --git a/libdwfl/dwfl_segment_report_module.c b/libdwfl/dwfl_segment_report_module.c
> index 7cf7499..e2d0337 100644
> --- a/libdwfl/dwfl_segment_report_module.c
> +++ b/libdwfl/dwfl_segment_report_module.c
> @@ -83,7 +83,8 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const char *name,
>  			    Dwfl_Memory_Callback *memory_callback,
>  			    void *memory_callback_arg,
>  			    Dwfl_Module_Callback *read_eagerly,
> -			    void *read_eagerly_arg)
> +			    void *read_eagerly_arg,
> +			    const struct r_debug_info *r_debug_info)
>  {
>    size_t segment = ndx;
>  
> @@ -433,6 +434,43 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const char *name,
>  
>    dyn_vaddr += bias;
>  
> +  /* NAME found from link map has precedence over DT_SONAME possibly read
> +     below.  */
> +  bool name_is_final = false;
> +
> +  /* Try to match up DYN_VADDR against L_LD as found in link map.
> +     Segments sniffing may guess invalid address as the first read-only memory
> +     mapping may not be dumped to the core file (if ELF headers are not dumped)
> +     and the ELF header is dumped first with the read/write mapping of the same
> +     file at higher addresses.  */
> +  if (r_debug_info != NULL)
> +    for (const struct r_debug_info_module *module = r_debug_info->module;
> +	 module != NULL; module = module->next)
> +      if (module_start <= module->l_ld && module->l_ld < module_end)
> +	{
> +	  GElf_Addr fixup = module->l_ld - dyn_vaddr;
> +	  if (module_start + fixup <= module->l_ld
> +	      && module->l_ld < module_end + fixup)
> +	    {
> +	      module_start += fixup;
> +	      module_end += fixup;
> +	      dyn_vaddr += fixup;
> +	      bias += fixup;
> +	      if (module->name[0] != '\0')
> +		{
> +		  name = module->name;
> +		  name_is_final = true;
> +		}
> +	      break;
> +	    }
> +	}

fixup might need a comment. It wasn't immediately clear to me. The fixup
is the difference between the PT_LOAD address that the link-map know is
the correct one and the PT_DYNAMIC address (of which there can be only
one). And module_start points to the PT_LOAD address that was just
gotten from the phdrs. Right? Actually now that I do understand, that is
kind of what the comment above already says. I just didn't immediately
made the connection with the offset/fixup being static. Is fixup always
positive?

BTW. Why does link-map set module->name to the empty string when no name
(NULL) was found? Just convenience like here, so no NULL check is
needed, or is the empty name significant?

> diff --git a/libdwfl/link_map.c b/libdwfl/link_map.c
> @@ -352,9 +355,108 @@ report_r_debug (uint_fast8_t elfclass, uint_fast8_t elfdata,
>        /* We have to find the file's phdrs to compute along with l_addr
>  	 what its runtime address boundaries are.  */
>  
> -      // XXX hook for sysroot
> -      mod = INTUSE(dwfl_report_elf) (dwfl, basename (name),
> -				     name, -1, l_addr, true);
> +      Dwfl_Module *mod = NULL;
> +      if (iterations == 1 && dwfl->executable_for_core != NULL)
> +	{
> +	  /* Find the main executable.
> +	     FIXME: Try to also find it via build-id.  */
> +	  name = dwfl->executable_for_core;
> +	}

To solve that FIXME we should do somewhat the same as is done below with
the "inlined dwfl_report_elf" snippet? And we already do try to open the
executable if we cannot read the phdrs. Maybe this can be factored out a
bit? On the other hand, we are going way beyond call of duty here just
to read a somewhat incomplete/corrupt core file. So it might be lots of
work for not much gain.

> +      if (iterations != 2 && name != NULL)
> +	{
> +	  /* Find a shared library or main executable.  Do not try to
> +	     find a file for vDSO (where ITERATIONS equals 2).  */

I couldn't find where the iteration == 1 => exe, iteration == 2 => vDSO
comes from. Shouldn't the second check be iterations > 2 ?

> +	  /* This code is mostly inlined dwfl_report_elf.  */
> +	  // XXX hook for sysroot
> +	  int fd = open64 (name, O_RDONLY);
> +	  if (fd >= 0)
> +	    {
> +	      Elf *elf;
> +	      Dwfl_Error error = __libdw_open_file (&fd, &elf, true, false);
> +	      if (error == DWFL_E_NOERROR)
> +		{
> +		  const void *build_id_bits;
> +		  GElf_Addr build_id_elfaddr;
> +		  int build_id_len;
> +		  bool valid = true;
> +
> +		  /* FIXME: Bias L_ADDR should be computed from the prelink
> +		     state in memory (when the file got loaded), not against
> +		     the current on-disk file state as is computed below.
> +
> +		     This verification gives false positive if in-core ELF had
> +		     build-id but on-disk ELF does not have any.  But we cannot
> +		     reliably find ELF header and/or the ELF build id just from
> +		     the link map (and checking core segments is also not
> +		     reliable).  */
> +
> +		  if (__libdwfl_find_elf_build_id (NULL, elf, &build_id_bits,
> +						   &build_id_elfaddr,
> +						   &build_id_len) > 0
> +		      && build_id_elfaddr != 0)
> +		    {
> +		      GElf_Addr build_id_vaddr = build_id_elfaddr + l_addr;
> +		      release_buffer (0);
> +		      int segndx = INTUSE(dwfl_addrsegment) (dwfl,
> +							     build_id_vaddr,
> +							     NULL);
> +		      if (! (*memory_callback) (dwfl, segndx,
> +						&buffer, &buffer_available,
> +						build_id_vaddr, build_id_len,
> +						memory_callback_arg)
> +			  || memcmp (build_id_bits, buffer, build_id_len) != 0)
> +			{
> +			  /* File has valid build-id which cannot be verified
> +			     in memory.  */
> +			  valid = false;
> +			}
> +		    }
> +
> +		  if (valid)
> +		    // XXX hook for sysroot
> +		    mod = __libdwfl_report_elf (dwfl, basename (name), name,
> +						fd, elf, l_addr, true, true);
> +		  if (mod == NULL)
> +		    {
> +		      elf_end (elf);
> +		      close (fd);
> +		    }
> +		}
> +	    }
> +	}
> +      if (mod != NULL && iterations == 1)
> +	{
> +	  /* Cancel reporting of raw segments for the main executable as
> +	     its file has been found now.  */
> +	  if (r_debug_info != NULL)
> +	    r_debug_info->exec_vaddr = 0;
> +	}
> +      if (iterations == 2)
> +	{
> +	  /* vDSO is only reported to the caller, it is never searched
> +	     for on the disk.  */
> +	  assert (mod == NULL);
> +	  if (r_debug_info != NULL)
> +	    {
> +	      r_debug_info->vdso_bias = l_addr;
> +	      r_debug_info->vdso_l_ld = l_ld;
> +	    }
> +	}
> +      if (r_debug_info != NULL && mod == NULL)
> +	{
> +	  /* Save link map information about valid shared library (or
> +	     executable) which has not been found on disk.  */
> +	  const char *base = name == NULL ? "" : basename (name);
> +	  struct r_debug_info_module *module;
> +	  module = malloc (sizeof (*module) + strlen (base) + 1);
> +	  if (module == NULL)
> +	    return release_buffer (result);
> +	  module->l_ld = l_ld;
> +	  strcpy (module->name, base);
> +	  module->next = r_debug_info->module;
> +	  r_debug_info->module = module;
> +	}

Should this last if condition have iterations > 2 && ... ?
Or do you want the executable and vdso also in the r_debug_info?

> @@ -668,8 +773,65 @@ dwfl_link_map_report (Dwfl *dwfl, const void *auxv, size_t auxv_size,
>  	      .d_size = phnum * phent,
>  	      .d_buf = NULL
>  	    };
> -	  if ((*memory_callback) (dwfl, phdr_segndx, &in.d_buf, &in.d_size,
> -				  phdr, phnum * phent, memory_callback_arg))
> +	  bool in_ok = (*memory_callback) (dwfl, phdr_segndx, &in.d_buf,
> +					   &in.d_size, phdr, phnum * phent,
> +					   memory_callback_arg);
> +	  if (! in_ok && dwfl->executable_for_core != NULL)
> +	    {
> +	      /* AUXV -> PHDR -> DYNAMIC
> +		 Both AUXV and DYNAMIC should be always present in a core file.
> +		 PHDR may be missing in core file, try to read it from
> +		 EXECUTABLE_FOR_CORE to find where DYNAMIC is located in the
> +		 core file.  */
> +
> +	      int fd = open (dwfl->executable_for_core, O_RDONLY);
> +	      Elf *elf;
> +	      Dwfl_Error error = DWFL_E_ERRNO;
> +	      if (fd != -1)
> +		error = __libdw_open_file (&fd, &elf, true, false);
> +	      if (error != DWFL_E_NOERROR)
> +		{
> +		  __libdwfl_seterrno (error);
> +		  return false;
> +		}
> +	      GElf_Ehdr ehdr_mem, *ehdr = gelf_getehdr (elf, &ehdr_mem);
> +	      if (ehdr == NULL)
> +		{
> +		  elf_end (elf);
> +		  close (fd);
> +		  __libdwfl_seterrno (DWFL_E_LIBELF);
> +		  return false;
> +		}
> +	      if (ehdr->e_phnum != phnum || ehdr->e_phentsize != phent)
> +		{
> +		  elf_end (elf);
> +		  close (fd);
> +		  __libdwfl_seterrno (DWFL_E_BADELF);
> +		  return false;
> +		}
> +	      off_t off = ehdr->e_phoff;
> +	      assert (in.d_buf == NULL);
> +	      assert (in.d_size == phnum * phent);
> +	      in.d_buf = malloc (in.d_size);
> +	      if (unlikely (in.d_buf == NULL))
> +		{
> +		  elf_end (elf);
> +		  close (fd);
> +		  __libdwfl_seterrno (DWFL_E_NOMEM);
> +		  return false;
> +		}
> +	      ssize_t nread = pread_retry (fd, in.d_buf, in.d_size, off);
> +	      elf_end (elf);
> +	      close (fd);
> +	      if (nread != (ssize_t) in.d_size)
> +		{
> +		  free (in.d_buf);
> +		  __libdwfl_seterrno (DWFL_E_ERRNO);
> +		  return false;
> +		}
> +	      in_ok = true;
> +	    }

This looks right. But what a lot of extra work just so some parts can be
left out of a core file. And we better hope the given executable really
matches the core file.

Even though I had some questions above I do think in general this looks
OK.

Thanks,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]