Bug 29257 - Double free of demangled symbol name
Summary: Double free of demangled symbol name
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: 10.1
: P2 normal
Target Milestone: 14.1
Assignee: Tom Tromey
URL:
Keywords:
: 30339 (view as bug list)
Depends on:
Blocks:
 
Reported: 2022-06-16 16:41 UTC by Magne Hov
Modified: 2023-04-13 20:00 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
test patch (1.03 KB, patch)
2022-12-06 19:07 UTC, Tom Tromey
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Magne Hov 2022-06-16 16:41:58 UTC
We've captured a GDB crash where a demangled C++ symbol name is freed twice - double free or corruption (fasttop):


#0  0x00007f270ffcdc8e in abort () from /tmp/undodb.639706.1655366126.1865206.f7b28b36e39ecdfc/debuggee-1-il8cy4y9/symbol-files/lib64/libc.so.6
#1  0x00007f271003d057 in __libc_message () from /tmp/undodb.639706.1655366126.1865206.f7b28b36e39ecdfc/debuggee-1-il8cy4y9/symbol-files/lib64/libc.so.6
#2  0x00007f27100441bc in malloc_printerr () from /tmp/undodb.639706.1655366126.1865206.f7b28b36e39ecdfc/debuggee-1-il8cy4y9/symbol-files/lib64/libc.so.6
#3  0x00007f2710045c04 in _int_free () from /tmp/undodb.639706.1655366126.1865206.f7b28b36e39ecdfc/debuggee-1-il8cy4y9/symbol-files/lib64/libc.so.6
#4  0x00000000008f8a59 in xfree<char> (ptr=0x5daaff0 "\300\304\070\a") at .././gdb-10.2/gdb/../gdbsupport/common-utils.h:62
#5  gdb::xfree_deleter<char>::operator() (this=<synthetic pointer>, ptr=0x5daaff0 "\300\304\070\a") at .././gdb-10.2/gdb/../gdbsupport/gdb_unique_ptr.h:34
#6  std::unique_ptr<char, gdb::xfree_deleter<char> >::~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimised out>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/unique_ptr.h:274
#7  general_symbol_info::compute_and_set_names (this=this@entry=0x7f26ebcd0808, linkage_name=..., copy_name=copy_name@entry=false, per_bfd=0x5742db0, hash=...)
    at .././gdb-10.2/gdb/symtab.c:881
#8  0x000000000079ad85 in minimal_symbol_reader::<lambda(minimal_symbol*, minimal_symbol*)>::operator() (end=<optimised out>, start=<optimised out>, __closure=<synthetic pointer>)
    at .././gdb-10.2/gdb/symtab.h:421
#9  gdb::parallel_for_each<minimal_symbol*, minimal_symbol_reader::install()::<lambda(minimal_symbol*, minimal_symbol*)> > (callback=..., last=<optimised out>, first=<optimised out>)
    at .././gdb-10.2/gdb/../gdbsupport/parallel-for.h:76
#10 minimal_symbol_reader::install (this=this@entry=0x7ffdedc85590) at .././gdb-10.2/gdb/minsyms.c:1409
#11 0x00000000006ad0a2 in elf_read_minimal_symbols (symfile_flags=8, ei=0x7ffdedc85570, objfile=0x82e8980) at .././gdb-10.2/gdb/elfread.c:1175
#12 elf_symfile_read (objfile=0x82e8980, symfile_flags=...) at .././gdb-10.2/gdb/elfread.c:1216
#13 0x00000000008f0c95 in read_symbols (objfile=0x82e8980, add_flags=...) at .././gdb-10.2/gdb/symfile.c:782
#14 0x00000000008f0595 in syms_from_objfile_1 (add_flags=..., addrs=0x7ffdedc856f0, objfile=0x82e8980) at .././gdb-10.2/gdb/symfile.c:979
#15 syms_from_objfile (add_flags=..., addrs=<optimised out>, objfile=0x82e8980) at .././gdb-10.2/gdb/symfile.c:996
#16 symbol_file_add_with_addrs (abfd=<optimised out>, name=0x3b21d00 ..., add_flags=..., addrs=<optimised out>, flags=..., parent=<optimised out>)
    at .././gdb-10.2/gdb/symfile.c:1099
#17 0x00000000008c9aa4 in solib_read_symbols (so=0x3b21af0, flags=...) at .././gdb-10.2/gdb/solib.c:688
#18 0x00000000008ca5d2 in solib_add (pattern=pattern@entry=0x0, from_tty=from_tty@entry=0, readsyms=1) at .././gdb-10.2/gdb/solib.c:993
#19 0x00000000008ca769 in handle_solib_event () at .././gdb-10.2/gdb/solib.c:1266


I've been trying to understand how interning of demangled symbol names works and I'm still learning, so apologies if this report isn't very precise.

During this crash, in minimal_symbol_reader::install I can see that we're finished collecting, sorting and compacting our list of existing and new minimal symbols. In my instance it looks like we have 9673 minsyms already defined for this objfile, and we're about to potentially add 8509 more. We're in the point where we compute and insert demangled names in parallel in gdb::parallel_for_each (though the machine has decided to use a single thread in this instance, so the main thread is doing all of the work).

We're crashing in the second phase of the parallel workload. general_symbol_info::compute_and_set_names has just finished and the demangled name that is managed by the following unique_xmalloc_ptr is released: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/symtab.c;hb=ce35d7163e779b1321058b22f005c70ce1524b25#l880.

It turns out that the same instance of this demangled name is assigned to multiple symbols. These symbols have identical mangled and demangled names, but they come from different source files and have different addresses.

At the time of crash, the first of these similar symbols I can trace back as being one of the existing symbols for this objfile. I.e we've called elf_symfile_read on this object before and we loaded a bunch of symbols then. At that point the demangled name got allocated from libc heap while first computing the demangled name. Then, because the demangled name hadn't been encountered yet, it was std::move'ed into the hash table that contains the demangled names.

Fair enough, we allocate the name while demangling it, and the ownership of that storage is transferred to the hash table slot.

Now - the second time we're doing elf_symfile_read for this object, we're again considering this symbol which was already interned, and this time the symbol already has had a demangled name assigned to, so we don't allocate a new name for it. The demangled name is already present in the hash table, so we don't go through the effort of allocating a new hash table slot for the demangled name and we also don't end up moving the storage for the demangled name into a new hash slot.

Because of this, the unique_xmalloc_ptr is still the rightful owner of this demangled name, so at the end of general_symbol_info::compute_and_set_names it decides to free that storage when it goes out of scope. But wait - since this is a symbol that is being re-examined this demangled name is the exact same one that was previously inserted into the hash table! We go ahead freeing it - and when the next pre-existing symbol that has the same demangled name is being considered we try to free the same name again - leading to our crash.

general_symbol_info::compute_and_set_names seems to do exactly one of these:
1. transfer the storage of the demangled name into the hash table
2. let the unique ptr free the storage on scope exit

I think the rationale behind doing this is to allow multiple identical demangled names to be allocated upfront while ensuring that only one unique name is interned into the hash table. Subsequent duplicates will find the existing hash table slot, will not have their storage transferred into the hash table and so will be freed by the unique ptr.

However, I'm not sure that this takes into account the fact that, for existing symbols, the demangled name may actually already be in the hashtable instead of having been allocated by symbol_find_demangled_name.

We're seeing this problem with our Time-Travel debugger UDB, which will move around in execution history and might make it seem like the same library is being loaded multiple times. Is it otherwise normal to have elf_symfile_read called on the same object multiple times? 

Note that I have a recording of this failure so I can produce more details if required. I only have this failure captured for GDB 10.2.
Comment 1 Magne Hov 2022-06-16 18:07:40 UTC
Found some extra details to add.

- The first time we're reading the symbols for this object we lookup separate debug info via Build ID.
- libdebuginfod is used to download a debug info file which is then passed to symfile_bfd_open which throws an error due to failing bfd_check_format.

So we manage to read in symbols from the executable file, but we error out while trying to read in debug info from a separate debug info file. I assume that this means we skip some of the initialisation of the shared library. Not sure whether that's part of this issue.

I'll try to cook up a smaller reproducer for this.
Comment 2 Tom Tromey 2022-06-18 16:29:43 UTC
The main question I have is why re-reading minimal symbols happens.
I guess this 'if' fails to trigger:

  if (objfile->per_bfd->minsyms_read
      && ei->stabsect == NULL
      && ei->mdebugsect == NULL
      && ei->ctfsect == NULL)
    {
      if (symtab_create_debug)
	gdb_printf (gdb_stdlog,
		    "... minimal symbols previously read\n");
      return;
    }

... but for which reason exactly?

That would help make a reproducer, I guess.
You could probably trigger it by making a new inferior
and using 'file' to open the same ELF file again.
Comment 3 Magne Hov 2022-06-23 13:53:43 UTC
Turns out that this is caused by a truncated separate debug info file provided by debuginfod.

---

Reproduction details:

------------------------
// lib.cpp:
void foo(void) {}
------------------------
// main.cpp:
void foo(void);
int main(void) { foo(); return 0; }
------------------------
# Makefile
all:
        c++ -shared lib.cpp -o libtest.so
        c++ -L. main.cpp -ltest -o test
------------------------

Check the Build ID of libtest.so and populate an empty debuginfo file in the debuginfod cache:

$ mkdir -p /home/mhov/.cache/debuginfod_client/cac153df26d5e817739e1b234746cb368da05664/
$ touch /home/mhov/.cache/debuginfod_client/cac153df26d5e817739e1b234746cb368da05664/debuginfo

Now launch the program in GDB and see that it fail to load the separate debug info file. Do a set solib-search-path to trigger a reload of the libraries, and see glibc complain:

$ LD_LIBRARY_PATH=. DEBUGINFOD_URLS=foo gdb --silent ./test
Reading symbols from ./test...
Download failed: No route to host.  Continuing without debug info for /home/mhov/work/GDB29257/./test.
(No debugging symbols found in ./test)
(gdb) start
Temporary breakpoint 1 at 0x1151
Starting program: /home/mhov/work/GDB29257/test
Download failed: No route to host.  Continuing without debug info for /home/mhov/work/GDB29257/system-supplied DSO at 0x7ffff7fce000.
Error while reading shared library symbols for ./libtest.so:
`/home/mhov/.cache/debuginfod_client/cac153df26d5e817739e1b234746cb368da05664/debuginfo': can't read symbols: file format not recognized.

Temporary breakpoint 1, 0x0000555555555151 in main ()
(gdb) set solib-search-path
Reading symbols from ./libtest.so...
Error while reading shared library symbols for ./libtest.so:
`/home/mhov/.cache/debuginfod_client/cac153df26d5e817739e1b234746cb368da05664/debuginfo': can't read symbols: file format not recognized.
Reading symbols from ./libtest.so...
free(): double free detected in tcache 2
Aborted (core dumped)

Sometime I must run set solib-search-path multiple times, and glibc reports different failure modes.

---

Analysis:

When opening a separate debug file via Build ID we normally verify its build id after opening it:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/build-id.c;hb=ce35d7163e779b1321058b22f005c70ce1524b25#l107

This is part of `find_separate_debug_file_by_buildid`, so if it returns a valid string (path) we can be fairly sure that the object is somewhat valid and it's safe to pass it to `symfile_bfd_open`.

When a build id file has been provided by debuginfod, however, we don't use `find_separate_debug_file_by_buildid`, and we first check it with `build_id_verify` after already having tried to open it with `symfile_bfd_open`.

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/elfread.c;hb=ce35d7163e779b1321058b22f005c70ce1524b25#l1329


This means that the failure modes of a truncated separate debug info file are different depending on whether the debug info file was found "locally" or via the debuginfod cache.

- For the "local" case the failure mode is that `find_separate_debug_file_by_buildid` returns an empty string.
- For the debuginfod case the failure mode is that `symfile_bfd_open` raises an exception.

The exception is caught here:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/solib.c;hb=ce35d7163e779b1321058b22f005c70ce1524b25#l696

That means that we got interrupted at this point in the sequence of loading symbols:

```
solib_read_symbols
    ...
    read_symbols
        elf_symfile_read
            elf_read_minimal_symbols  (this function completes fully)
            symfile_bfd_open          (for separate debug file fetched by debuginfod)
                throw                 (exception is thrown due to truncated separate debug file)

        (this is where minsyms_read would have been set, but it's skipped by the non-local exit)
    catch exception
```


Since elf_read_minimal_symbols completed successfully we have in fact read in minimal symbols. However, the exception that is caused by the truncated separate debug file prevents read_symbols from tracking this in objfile->per_bfd->minsyms_read.

---

Ideas of ways to fix this:

- Change the order of `symbfile_bfd_open` and `build_id_verify` for files provided from debuginfod. This would prevent `symfile_bfd_open` from being called with a corrupt file and thus prevent the exception from being raised.
- Catch this kind of exception when trying to open the file provided from debuginfod.
- Mark minimal symbols as read (objfile->per_bfd->minsyms_read = true) as soon as elf_read_minimal_symbols has completed. I'm not confident about this. There could be some other state that must also have been completed before we want to set this to true?
Comment 4 Magne Hov 2022-06-23 14:05:53 UTC
Presumably, an empty debuginfo file could end up in the debuginfod cache if you interrupt a download at the right time?
Comment 5 Aaron Merey 2022-06-23 16:43:05 UTC
Hi Magne, which version of elfutils are you using? I was only able to reproduce this using an older package (elfutils-debuginfod-client-0.186-1.fc35.x86_64) that uses 000-permission files to represent negative-cache-hits. When you used touch to create the file it didn't have 000 permissions, so debuginfod returns an empty file to gdb as if it's valid. Without manually adding the cache file this situation should not occur.

More recent versions of debuginfod use empty files to represent negative-cache-hits. In this case I can't reproduce the bug because debuginfod correctly doesn't return the empty file to gdb.

(It looks like F36 contains an update version of elfutils-debuginfod-client but F35 still provides an older 000-permisssion build, we should probably update F35's elfutils-debuginfod-client.)
Comment 6 Magne Hov 2022-06-24 16:16:20 UTC
Hi Aaron, 

Thanks for your reply. We're using elfutils-0.186 and it sounds like debuginfodclient switching over to using empty files as miss markers would explain why we see this issue. If mixing use of GDB built with 0.186 and 0.187 then the former will interpret the cache misses from the latter as valid entries.

Sticking to only using one version of debuginfodclient would solve this, but I think there might still be an argument for patching GDB to deal better with being given an empty debug info file.

For our use case we might want to ensure that we're using a custom DEBUGINFOD_CACHE_PATH as we often run on machines alongside other builds of GDB.
Comment 7 Tom Tromey 2022-12-06 18:42:07 UTC
> - Mark minimal symbols as read (objfile->per_bfd->minsyms_read = true) as soon as elf_read_minimal_symbols has completed. I'm not confident about this. There could be some other state that must also have been completed before we want to set this to true?

I think the problem with this is that minsyms are implemented weirdly.
There may be multiple sources of minsyms in a given objfile.
Now, probably how this should be done is each reader should make
its own object.  However, what's actually done is that they all
write to the same basic data structure, so it isn't marked as
completed until the very end.
Comment 8 Tom Tromey 2022-12-06 19:01:53 UTC
It looks like the debuginfod code thinks that symfile_bfd_open
won't throw but will instead return null:

		  gdb_bfd_ref_ptr debug_bfd (symfile_bfd_open (symfile_path.get ()));

		  if (debug_bfd == nullptr)
		    warning (_("File \"%s\" from debuginfod cannot be opened as bfd"),
			     filename);

... but that's wrong.

I have a patch to catch the exception.
Comment 9 Tom Tromey 2022-12-06 19:07:44 UTC
Created attachment 14485 [details]
test patch

Could you try this patch?
Comment 10 Aaron Merey 2022-12-07 20:55:58 UTC
(In reply to Tom Tromey from comment #9)
> Created attachment 14485 [details]
> test patch
> 
> Could you try this patch?

LGTM
Comment 11 Magne Hov 2022-12-09 17:55:57 UTC
Thanks Tom, your patch fixes the issue on my machine when applied on top of current master.
Comment 13 Tom Tromey 2023-04-13 17:46:13 UTC
*** Bug 30339 has been marked as a duplicate of this bug. ***
Comment 14 Sourceware Commits 2023-04-13 20:00:21 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f96328accde1e6302b62aa880675594618079cb3

commit f96328accde1e6302b62aa880675594618079cb3
Author: Tom Tromey <tromey@adacore.com>
Date:   Tue Dec 6 12:07:12 2022 -0700

    Avoid double-free with debuginfod
    
    PR gdb/29257 points out a possible double free when debuginfod is in
    use.  Aside from some ugly warts in the symbol code (an ongoing
    issue), the underlying issue in this particular case is that elfread.c
    seems to assume that symfile_bfd_open will return NULL on error,
    whereas in reality it throws an exception.  As this code isn't
    prepared for an exception, bad things result.
    
    This patch fixes the problem by introducing a non-throwing variant of
    symfile_bfd_open and using it in the affected places.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29257
Comment 15 Tom Tromey 2023-04-13 20:00:48 UTC
Fixed.