Bug 29434 - Memory leak in `dwarf_getscopes`
Summary: Memory leak in `dwarf_getscopes`
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: libdw (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Mark Wielaard
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-01 16:07 UTC by Matt Wozniski
Modified: 2023-02-28 11:38 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2022-08-04 00:00:00


Attachments
dwarf_getscopes debug trace (1.48 KB, patch)
2022-08-29 15:28 UTC, Mark Wielaard
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Wozniski 2022-08-01 16:07:49 UTC
Found by valgrind:

==173857== 64 bytes in 2 blocks are definitely lost in loss record 3,155 of 8,232
==173857==    at 0x480B7BB: malloc (vg_replace_malloc.c:380)
==173857==    by 0x90143DC: pc_record (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9019ABC: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9019A69: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9014691: dwarf_getscopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)

`dwarf_getscopes` ends with:
```
  if (result > 0)
    *scopes = a.scopes;

  return result;
```

but this is incorrect, since `a.scopes` may be non-NULL even if `result` is <= 0 and is leaked in this case since no reference is retained to it. Seems like this needs to be:
```
  if (result > 0)
    *scopes = a.scopes;
  else
    free(a.scopes);

  return result;
```
Comment 1 Mark Wielaard 2022-08-04 14:30:11 UTC
Thanks. That seems correct. If in the first visit a.scopes was allocated, but an error occurs later or in the second visit a.scopes isn't freed. Except in the case of a realloc failure. It would make sense to also not dealloc in that failure case so we can always free on failure before we return the error as you suggest.

BTW. Do you know what why result was negative (I assume it was negative, it could in theory also be zero, but that would be surprising if that would cause the leak)?
Comment 2 Pablo Galindo Salgado 2022-08-04 14:43:40 UTC
> it could in theory also be zero

In this case, the result was 0 (I am working with Matt)
Comment 3 Mark Wielaard 2022-08-04 15:13:18 UTC
(In reply to Pablo Galindo Salgado from comment #2)
> > it could in theory also be zero
> 
> In this case, the result was 0 (I am working with Matt)

Interesting, I assumed it was on a failure path.
My proposed patch would have been:

diff --git a/libdw/dwarf_getscopes.c b/libdw/dwarf_getscopes.c
index 5662eecf..676d62f3 100644
--- a/libdw/dwarf_getscopes.c
+++ b/libdw/dwarf_getscopes.c
@@ -100,7 +100,7 @@ origin_match (unsigned int depth, struct Dwarf_Die_Chain *die, void *arg)
   Dwarf_Die *scopes = realloc (a->scopes, nscopes * sizeof scopes[0]);
   if (scopes == NULL)
     {
-      free (a->scopes);
+      /* a->scopes will be freed by dwarf_getscopes on error.  */
       __libdw_seterrno (DWARF_E_NOMEM);
       return -1;
     }
@@ -198,6 +198,8 @@ dwarf_getscopes (Dwarf_Die *cudie, Dwarf_Addr pc, Dwarf_Die **scopes)
 
   if (result > 0)
     *scopes = a.scopes;
+  else if (result < 0)
+    free (a.scopes);
 
   return result;
 }

But if the result is zero I don't believe I fully understand yet how the leak happens.
Comment 4 Pablo Galindo Salgado 2022-08-04 15:52:32 UTC
The first result is already 0 in

int result = __libdw_visit_scopes (0, &cu, NULL, &pc_match, &pc_record, &a);

I think this is because walk_children finds no "real children" so it returns with the last return (ret < 0 ? -1 : 0;) every single time.
Comment 5 Mark Wielaard 2022-08-04 15:57:37 UTC
(In reply to Mark Wielaard from comment #3)
> But if the result is zero I don't believe I fully understand yet how the
> leak happens.

So I believe this may happen if we found an inlined subroutine with pc_record, but then don't find the abstract origin DIE with origin_match.

That does feels like it should be an error, but (currently) isn't.

Maybe this happens if the reference to the abstract origin is in a partial unit that isn't imported into the current CU.

It would be really nice to have an example of the DIE tree when this happens.
Comment 6 Pablo Galindo Salgado 2022-08-04 16:05:42 UTC
What would be the best way to provide that?
Comment 7 Pablo Galindo Salgado 2022-08-04 16:07:18 UTC
The dwarf trie comes from /usr/lib64/libc-2.17.so in a RHEL7 system
Comment 8 Pablo Galindo Salgado 2022-08-04 16:11:11 UTC
If you give me a patch adding print statements to libdw/dwarf_getscopes.c and libdw/libdw_visit_scopes.c I can give you the output.
Comment 9 Mark Wielaard 2022-08-29 15:28:08 UTC
Created attachment 14302 [details]
dwarf_getscopes debug trace

Maybe the following would give us a clue about what is really happening. It should print the DIE offsets that are being visited.
Comment 10 Mark Wielaard 2022-10-27 20:37:10 UTC
(In reply to Pablo Galindo Salgado from comment #8)
> If you give me a patch adding print statements to libdw/dwarf_getscopes.c
> and libdw/libdw_visit_scopes.c I can give you the output.

See attachment 14302 [details] on comment #9.
Comment 11 Pablo Galindo Salgado 2022-11-03 16:21:13 UTC
Unfortunately I cannot easily patch dwarf_getscopes and run the experiment :(
Comment 12 Pablo Galindo Salgado 2022-11-03 16:22:53 UTC
Hopefully I can report back with the output in a few days. Apologies for the delay!
Comment 13 Pablo Galindo Salgado 2022-11-03 16:44:25 UTC
Hummmmm, it seems that I cannot attach the output because is 87 MB of output.
Comment 14 Pablo Galindo Salgado 2022-11-03 16:45:44 UTC
I can tell you that the file finishes with:

  pc_record => [5d9f]
  pc_record <= [5d9f] 3
__libdw_visit_scopes result 3, scopes 0xa689d0, inlined 0, nscopes 3

:)
Comment 15 Pablo Galindo Salgado 2022-11-03 16:58:20 UTC
Here is the parts that reference the parts:

__libdw_visit_scopes for pc 0xf5b23, cudie [29ce89]
__libdw_visit_scopes result 0, scopes (nil), inlined 0, nscopes 0
__libdw_visit_scopes for pc 0x252208, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1a761b0, inlined 2, nscopes 3
__libdw_visit_scopes for origin_match 0x252208, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1a761b0, inlined 2, nscopes 3
__libdw_visit_scopes for pc 0x13dd69, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1b83810, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x131d71, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1a94580, inlined 7, nscopes 2
__libdw_visit_scopes for origin_match 0x131d71, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1a94580, inlined 7, nscopes 2
__libdw_visit_scopes for pc 0x13e992, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1b49b40, inlined 2, nscopes 1
__libdw_visit_scopes for origin_match 0x13e992, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1b49b40, inlined 2, nscopes 1
__libdw_visit_scopes for pc 0x12d6cb, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1ad8ba0, inlined 7, nscopes 2
__libdw_visit_scopes for origin_match 0x12d6cb, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1ad8ba0, inlined 7, nscopes 2
__libdw_visit_scopes for pc 0x12c246, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1a21b70, inlined 2, nscopes 1
__libdw_visit_scopes for origin_match 0x12c246, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1a21b70, inlined 2, nscopes 1
__libdw_visit_scopes for pc 0x12bef0, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1e44600, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x1e2e22, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1c0aa70, inlined 2, nscopes 1
__libdw_visit_scopes for origin_match 0x1e2e22, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1c0aa70, inlined 2, nscopes 1
__libdw_visit_scopes for pc 0x20ca3c, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1de8920, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x1ff6aa, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1ccc3f0, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x8d1a9, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1eb3c20, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x1faad0, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1eb18b0, inlined 2, nscopes 2
__libdw_visit_scopes for origin_match 0x1faad0, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1eb18b0, inlined 2, nscopes 2
__libdw_visit_scopes for pc 0x1f4538, cudie [b]
__libdw_visit_scopes result 0, scopes 0x1eaad70, inlined 4, nscopes 2
__libdw_visit_scopes for origin_match 0x1f4538, cudie [b]
__libdw_visit_scopes origin_match result 0, scopes 0x1eaad70, inlined 4, nscopes 2
__libdw_visit_scopes for pc 0x1d61a8, cudie [b]
__libdw_visit_scopes result 2, scopes 0x1ef26b0, inlined 0, nscopes 2
__libdw_visit_scopes for pc 0x22554, cudie [22ec]
__libdw_visit_scopes result 3, scopes 0x1917ca0, inlined 0, nscopes 3
Comment 16 Mark Wielaard 2022-12-21 13:06:33 UTC
Sorry this is a bit difficult to debug.

Would you be able to show the start of the log?
It should show how you are calling dwarf_getscopes.

I think I have found the RHEL7 glibc debug file. So I can match up some of the DIE offsets, but I have some trouble understanding what is going on.

It would also be helpful to see the other log statements in dwarf_getscopes to understand what the first and second calls to __libdw_visit_scopes have as arguments and return value.
Comment 17 Mark Wielaard 2023-02-22 17:18:51 UTC
So the code here changed a little with this patch:

commit b7c7d8776ed46e2237d18fb15c6b72e83cfa259b
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Jan 22 00:31:57 2023 +0100

    libdw: Search for abstract origin in the correct CU
    
    With gcc -flto the abstract origin of an inlined subroutine
    could be in a different CU. dwarf_getscopes might return an
    empty scope if it cannot find the abstract origin scope. So
    make sure to search in the
    
    We also tried to add the origin match in pc_record directly
    in the current inlined scope. This always failed, causing
    to do a needless traversal, followed by the full CU scan in
    dwarf_getscopes. Just always stop the pc_record search and
    then do the CU origin_match in dwarf_getscopes.
    
    Signed-off-by: Mark Wielaard <mark@klomp.org>

Which makes the condition of the first check slightly different:

-  if (result == 0 && a.scopes != NULL)
-    result = __libdw_visit_scopes (0, &cu, NULL, &origin_match, NULL, &a);
+  if (result >= 0 && a.scopes != NULL && a.inlined > 0)
+    {
+      /* We like the find the inline function's abstract definition
+         scope, but that might be in a different CU.  */
+      cu.die = CUDIE (a.inlined_origin.cu);
+      result = __libdw_visit_scopes (0, &cu, NULL, &origin_match, NULL, &a);
+    }

So with that I think my proposed patch in comment #3 might work.
But I have not been able to replicate the issue. So cannot easily check.
Comment 18 Mark Wielaard 2023-02-22 22:41:01 UTC
I could replicate a leak when there was an error looking up the inlined scopes, so I submitted the patch to fix that:
https://inbox.sourceware.org/elfutils-devel/20230222223901.1089881-1-mark@klomp.org/T/#u

Hopefully that also fixes the original issue (which I have been unable to replicate).
Comment 19 Mark Wielaard 2023-02-28 11:38:14 UTC
I hope this (plus the other changes to dwarf_getscopes) will fix the original issue too. It should be part of 0.189 (which should be released this week). If it didn't solve the issue please feel free to reopen this bug or file a new one.

commit e24d8a4a3ea106608bb3e8d33c4639cf71d0f08d
Author: Mark Wielaard <mark@klomp.org>
Date:   Wed Feb 22 23:34:00 2023 +0100

    libdw: Fix dwarf_getscopes memory leak on error
    
    When there is an error in dwarf_getscopes after the initial scopes
    have been allocated, e.g. when looking for the inlined scopes, then
    the scopes would leak. Fix this by explicitly free the scopes on error.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=29434
    
    Signed-off-by: Mark Wielaard <mark@klomp.org>