This is the mail archive of the
elfutils-devel@sourceware.org
mailing list for the elfutils project.
Re: [patch 2/2 v2] Fix loading core files without build-ids
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Mon, 27 May 2013 16:05:46 +0200
- Subject: Re: [patch 2/2 v2] Fix loading core files without build-ids
On Thu, 2013-05-23 at 21:15 +0200, Jan Kratochvil wrote:
> commit 457b771b842a0f6d23118e4b39d93f6581fb9b41
> Author: Jan Kratochvil <jan.kratochvil@redhat.com>
> Date: Thu May 23 20:19:16 2013 +0200
>
> Use DT_DEBUG library search first.
>
> libdwfl/
> 2013-05-23 Jan Kratochvil <jan.kratochvil@redhat.com>
>
> * argp-std.c (parse_opt) <ARGP_KEY_SUCCESS> <opt->core> <opt->e>: Set
> executable_for_core before calling dwfl_core_file_report.
OK.
> * core-file.c (free_r_debug_info): New function.
Since it doesn't actually free the struct itself, lets call it
clear_r_debug_info to avoid confusion.
> (dwfl_core_file_report): Move raw segments reporting
> lower. New variable r_debug_info.
> (dwfl_core_file_report) (report_module): New function.
> (dwfl_core_file_report): Call also report_module for EXEC_VADDR and
> VDSO_L_LD. Call free_r_debug_info in the end. Return sum of LISTED
> and SNIFFED.
OK, but see comments below.
> * dwfl_module_build_id.c (check_notes): Move into
> __libdwfl_find_elf_build_id.
> (__libdwfl_find_build_id): Rename to ...
> (__libdwfl_find_elf_build_id): ... here. Add parameters build_id_bits,
> build_id_elfaddr and build_id_len. Verify MOD vs. ELF.
> (__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
> set, rename data_vaddr to data_elfaddr. Do not call found_build_id.
> (__libdwfl_find_elf_build_id): Update the check_notes caller, do not
> adjust its data_elfaddr parameter.
> (__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
OK.
> * dwfl_segment_report_module.c (dwfl_segment_report_module): New
> parameter r_debug_info. New variable name_is_final. Adjust addresses
> according to R_DEBUG_INFO->MODULE. Check conflicts against DWFL.
> Do not overwise NAME by SONAME if NAME_IS_FINAL.
s/overwise/overwrite/
Also see some questions below.
> * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
> (struct r_debug_info_module, struct r_debug_info): New definitions.
> (dwfl_segment_report_module, dwfl_link_map_report): Add parameter
> r_debug_info.
OK.
> * link_map.c: Include fcntl.h.
> (report_r_debug): Add parameter r_debug_info, describe them in the
> function comment. Delete dwfl_addrmodule call and its dependent code.
> Verify build-id before calling dwfl_report_elf, also supply
> executable_for_core to it and skip vDSO. Report vdso_bias and
> vdso_l_ld when found. Clear exec_vaddr when found. Store
> r_debug_info->module info when appropriate.
> (dwfl_link_map_report): Add parameter r_debug_info.
> (dwfl_link_map_report) (consider_phdr): Add parameter offset. Set
> r_debug_info.exec_vaddr when found.
> (dwfl_link_map_report): New variable in_ok. Try to read IN from
> EXECUTABLE_FOR_CORE. Update consider_phdr caller parameters. Update
> report_r_debug caller parameters.
Was a bit hard to follow since I didn't know much about the link-map,
and there is no real documentation it seems except the glibc sources.
But in general looks OK. See some questions below.
> tests/
> 2013-05-23 Jan Kratochvil <jan.kratochvil@redhat.com>
>
> * Makefile.am (EXTRA_DIST): Add test-core-lib.so.bz2,
> test-core.core.bz2 and test-core.exec.bz2.
> * run-addrname-test.sh: New test for these files.
> * run-unstrip-n.sh: Update expected output. New test for these files.
> * test-core-lib.so.bz2: New file.
> * test-core.core.bz2: New file.
> * test-core.exec.bz2: New file.
The changes in test results now make sense to me. Reporting the full
file names was arguably wrong to begin with. I don't think we should
restore that, but if people do feel this is a regression we could add a
guesser to unstrip.c itself (but I don't think we should).
> int
> dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
> {
> @@ -431,33 +444,85 @@ dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
> /* Now we have NT_AUXV contents. From here on this processing could be
> used for a live process with auxv read from /proc. */
>
> + struct r_debug_info r_debug_info;
> + memset (&r_debug_info, 0, sizeof r_debug_info);
> int listed = dwfl_link_map_report (dwfl, auxv, auxv_size,
> - dwfl_elf_phdr_memory_callback, elf);
> + dwfl_elf_phdr_memory_callback, elf,
> + &r_debug_info);
> +
> + /* Call dwfl_segment_report_module, update *NDXP when appropriate. Return
> + true for a successfully reported module. Otherwise return false and also
> + call free_r_debug_info. */
> +
> + inline bool report_module (int *ndxp, int *count, const char *name)
Document that count gets updated on success.
The clearing of r_debug_info on failure is a nice because it prevents
accidental memory leaks, but also confused me on first read of the patch
(because I forgot the comment when reading the later code).
> /* We return the number of modules we found if we found any.
> If we found none, we return -1 instead of 0 if there was an
> - error rather than just nothing found. If link_map handling
> - failed, we still have the sniffed modules. */
> - return sniffed == 0 || listed > sniffed ? listed : sniffed;
> + error rather than just nothing found. */
> + return sniffed || listed >= 0 ? listed + sniffed : listed;
What about the case of sniffed == 0 && listed == 0?
Should that return -1?
> diff --git a/libdwfl/dwfl_segment_report_module.c b/libdwfl/dwfl_segment_report_module.c
> index 7cf7499..e2d0337 100644
> --- a/libdwfl/dwfl_segment_report_module.c
> +++ b/libdwfl/dwfl_segment_report_module.c
> @@ -83,7 +83,8 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const char *name,
> Dwfl_Memory_Callback *memory_callback,
> void *memory_callback_arg,
> Dwfl_Module_Callback *read_eagerly,
> - void *read_eagerly_arg)
> + void *read_eagerly_arg,
> + const struct r_debug_info *r_debug_info)
> {
> size_t segment = ndx;
>
> @@ -433,6 +434,43 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const char *name,
>
> dyn_vaddr += bias;
>
> + /* NAME found from link map has precedence over DT_SONAME possibly read
> + below. */
> + bool name_is_final = false;
> +
> + /* Try to match up DYN_VADDR against L_LD as found in link map.
> + Segments sniffing may guess invalid address as the first read-only memory
> + mapping may not be dumped to the core file (if ELF headers are not dumped)
> + and the ELF header is dumped first with the read/write mapping of the same
> + file at higher addresses. */
> + if (r_debug_info != NULL)
> + for (const struct r_debug_info_module *module = r_debug_info->module;
> + module != NULL; module = module->next)
> + if (module_start <= module->l_ld && module->l_ld < module_end)
> + {
> + GElf_Addr fixup = module->l_ld - dyn_vaddr;
> + if (module_start + fixup <= module->l_ld
> + && module->l_ld < module_end + fixup)
> + {
> + module_start += fixup;
> + module_end += fixup;
> + dyn_vaddr += fixup;
> + bias += fixup;
> + if (module->name[0] != '\0')
> + {
> + name = module->name;
> + name_is_final = true;
> + }
> + break;
> + }
> + }
fixup might need a comment. It wasn't immediately clear to me. The fixup
is the difference between the PT_LOAD address that the link-map know is
the correct one and the PT_DYNAMIC address (of which there can be only
one). And module_start points to the PT_LOAD address that was just
gotten from the phdrs. Right? Actually now that I do understand, that is
kind of what the comment above already says. I just didn't immediately
made the connection with the offset/fixup being static. Is fixup always
positive?
BTW. Why does link-map set module->name to the empty string when no name
(NULL) was found? Just convenience like here, so no NULL check is
needed, or is the empty name significant?
> diff --git a/libdwfl/link_map.c b/libdwfl/link_map.c
> @@ -352,9 +355,108 @@ report_r_debug (uint_fast8_t elfclass, uint_fast8_t elfdata,
> /* We have to find the file's phdrs to compute along with l_addr
> what its runtime address boundaries are. */
>
> - // XXX hook for sysroot
> - mod = INTUSE(dwfl_report_elf) (dwfl, basename (name),
> - name, -1, l_addr, true);
> + Dwfl_Module *mod = NULL;
> + if (iterations == 1 && dwfl->executable_for_core != NULL)
> + {
> + /* Find the main executable.
> + FIXME: Try to also find it via build-id. */
> + name = dwfl->executable_for_core;
> + }
To solve that FIXME we should do somewhat the same as is done below with
the "inlined dwfl_report_elf" snippet? And we already do try to open the
executable if we cannot read the phdrs. Maybe this can be factored out a
bit? On the other hand, we are going way beyond call of duty here just
to read a somewhat incomplete/corrupt core file. So it might be lots of
work for not much gain.
> + if (iterations != 2 && name != NULL)
> + {
> + /* Find a shared library or main executable. Do not try to
> + find a file for vDSO (where ITERATIONS equals 2). */
I couldn't find where the iteration == 1 => exe, iteration == 2 => vDSO
comes from. Shouldn't the second check be iterations > 2 ?
> + /* This code is mostly inlined dwfl_report_elf. */
> + // XXX hook for sysroot
> + int fd = open64 (name, O_RDONLY);
> + if (fd >= 0)
> + {
> + Elf *elf;
> + Dwfl_Error error = __libdw_open_file (&fd, &elf, true, false);
> + if (error == DWFL_E_NOERROR)
> + {
> + const void *build_id_bits;
> + GElf_Addr build_id_elfaddr;
> + int build_id_len;
> + bool valid = true;
> +
> + /* FIXME: Bias L_ADDR should be computed from the prelink
> + state in memory (when the file got loaded), not against
> + the current on-disk file state as is computed below.
> +
> + This verification gives false positive if in-core ELF had
> + build-id but on-disk ELF does not have any. But we cannot
> + reliably find ELF header and/or the ELF build id just from
> + the link map (and checking core segments is also not
> + reliable). */
> +
> + if (__libdwfl_find_elf_build_id (NULL, elf, &build_id_bits,
> + &build_id_elfaddr,
> + &build_id_len) > 0
> + && build_id_elfaddr != 0)
> + {
> + GElf_Addr build_id_vaddr = build_id_elfaddr + l_addr;
> + release_buffer (0);
> + int segndx = INTUSE(dwfl_addrsegment) (dwfl,
> + build_id_vaddr,
> + NULL);
> + if (! (*memory_callback) (dwfl, segndx,
> + &buffer, &buffer_available,
> + build_id_vaddr, build_id_len,
> + memory_callback_arg)
> + || memcmp (build_id_bits, buffer, build_id_len) != 0)
> + {
> + /* File has valid build-id which cannot be verified
> + in memory. */
> + valid = false;
> + }
> + }
> +
> + if (valid)
> + // XXX hook for sysroot
> + mod = __libdwfl_report_elf (dwfl, basename (name), name,
> + fd, elf, l_addr, true, true);
> + if (mod == NULL)
> + {
> + elf_end (elf);
> + close (fd);
> + }
> + }
> + }
> + }
> + if (mod != NULL && iterations == 1)
> + {
> + /* Cancel reporting of raw segments for the main executable as
> + its file has been found now. */
> + if (r_debug_info != NULL)
> + r_debug_info->exec_vaddr = 0;
> + }
> + if (iterations == 2)
> + {
> + /* vDSO is only reported to the caller, it is never searched
> + for on the disk. */
> + assert (mod == NULL);
> + if (r_debug_info != NULL)
> + {
> + r_debug_info->vdso_bias = l_addr;
> + r_debug_info->vdso_l_ld = l_ld;
> + }
> + }
> + if (r_debug_info != NULL && mod == NULL)
> + {
> + /* Save link map information about valid shared library (or
> + executable) which has not been found on disk. */
> + const char *base = name == NULL ? "" : basename (name);
> + struct r_debug_info_module *module;
> + module = malloc (sizeof (*module) + strlen (base) + 1);
> + if (module == NULL)
> + return release_buffer (result);
> + module->l_ld = l_ld;
> + strcpy (module->name, base);
> + module->next = r_debug_info->module;
> + r_debug_info->module = module;
> + }
Should this last if condition have iterations > 2 && ... ?
Or do you want the executable and vdso also in the r_debug_info?
> @@ -668,8 +773,65 @@ dwfl_link_map_report (Dwfl *dwfl, const void *auxv, size_t auxv_size,
> .d_size = phnum * phent,
> .d_buf = NULL
> };
> - if ((*memory_callback) (dwfl, phdr_segndx, &in.d_buf, &in.d_size,
> - phdr, phnum * phent, memory_callback_arg))
> + bool in_ok = (*memory_callback) (dwfl, phdr_segndx, &in.d_buf,
> + &in.d_size, phdr, phnum * phent,
> + memory_callback_arg);
> + if (! in_ok && dwfl->executable_for_core != NULL)
> + {
> + /* AUXV -> PHDR -> DYNAMIC
> + Both AUXV and DYNAMIC should be always present in a core file.
> + PHDR may be missing in core file, try to read it from
> + EXECUTABLE_FOR_CORE to find where DYNAMIC is located in the
> + core file. */
> +
> + int fd = open (dwfl->executable_for_core, O_RDONLY);
> + Elf *elf;
> + Dwfl_Error error = DWFL_E_ERRNO;
> + if (fd != -1)
> + error = __libdw_open_file (&fd, &elf, true, false);
> + if (error != DWFL_E_NOERROR)
> + {
> + __libdwfl_seterrno (error);
> + return false;
> + }
> + GElf_Ehdr ehdr_mem, *ehdr = gelf_getehdr (elf, &ehdr_mem);
> + if (ehdr == NULL)
> + {
> + elf_end (elf);
> + close (fd);
> + __libdwfl_seterrno (DWFL_E_LIBELF);
> + return false;
> + }
> + if (ehdr->e_phnum != phnum || ehdr->e_phentsize != phent)
> + {
> + elf_end (elf);
> + close (fd);
> + __libdwfl_seterrno (DWFL_E_BADELF);
> + return false;
> + }
> + off_t off = ehdr->e_phoff;
> + assert (in.d_buf == NULL);
> + assert (in.d_size == phnum * phent);
> + in.d_buf = malloc (in.d_size);
> + if (unlikely (in.d_buf == NULL))
> + {
> + elf_end (elf);
> + close (fd);
> + __libdwfl_seterrno (DWFL_E_NOMEM);
> + return false;
> + }
> + ssize_t nread = pread_retry (fd, in.d_buf, in.d_size, off);
> + elf_end (elf);
> + close (fd);
> + if (nread != (ssize_t) in.d_size)
> + {
> + free (in.d_buf);
> + __libdwfl_seterrno (DWFL_E_ERRNO);
> + return false;
> + }
> + in_ok = true;
> + }
This looks right. But what a lot of extra work just so some parts can be
left out of a core file. And we better hope the given executable really
matches the core file.
Even though I had some questions above I do think in general this looks
OK.
Thanks,
Mark