Bug 29586

Summary: Debugger hangs on a shared library in memory-mapped file.
Product: gdb
Reporter: Marcin Copik <mcopik>
Component: shlibs
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: normal
CC: asaffisher.dev, ppluzhnikov
Priority: P2
Version: 12.1
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed: 2024-08-27 00:00:00

Description Marcin Copik 2022-09-19 13:03:04 UTC
Hi!

We have an application that ships code to remote executors by transmitting a shared library as a binary payload. The shared library data is written to a memory-mapped file, and we pass its file descriptor to dlopen as a /proc/self/fd/N path. This works fine in our use case. However, we recently learned that this setup prevents us from debugging the code with gdb.

We noticed that gdb hangs when the program tries to execute a function obtained from a shared library that was loaded from a memory-mapped file. If we replace that call with a regular dlopen of a file on disk, gdb handles the application correctly.

We attached a second gdb instance to the hung gdb instance. After building gdb from source in debug mode, we found that gdb hangs in a call to fread made from its internal cache of file descriptors (cache_bread_1 in cache.c; see the backtrace below).

We reproduced the issue on gdb version 12.0.90-0ubuntu1, then on the official release 12.1. To reproduce the problem, run `build.sh` to compile the shared library from `lib.c` and the two versions of `main.c`. The script creates two executables: `from_file` works fine with gdb, and `from_memory` reproduces the problem.

Gist with all source code samples: https://gist.github.com/mcopik/c6dc64e6b24aea9576d517ca00d1a9c0

Hung gdb session:

(gdb) r
Starting program: /home/mcopik/bug_report/from_memory 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading from /proc/self/fd/4

Attached gdb session:

Attaching to process 573765
[New LWP 573767]
[New LWP 573768]
[New LWP 573769]
[New LWP 573770]
[New LWP 573771]
[New LWP 573772]
[New LWP 573773]
[New LWP 573774]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__GI___libc_read (nbytes=4096, buf=0x558434f1b220, fd=15) at ../sysdeps/unix/sysv/linux/read.c:26
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) bt
#0  __GI___libc_read (nbytes=4096, buf=0x558434f1b220, fd=15) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=15, buf=0x558434f1b220, nbytes=4096) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f82b8697cb6 in _IO_new_file_underflow (fp=0x558434e4f2c0) at ./libio/libioP.h:947
#3  0x00007f82b86964b8 in __GI__IO_file_xsgetn (fp=0x558434e4f2c0, data=<optimized out>, n=64) at ./libio/fileops.c:1321
#4  0x00007f82b868ac29 in __GI__IO_fread (buf=buf@entry=0x7ffc32027bb0, size=size@entry=1, count=count@entry=64, fp=fp@entry=0x558434e4f2c0)
    at ./libio/iofread.c:38
#5  0x000055843364f53e in fread (__stream=0x558434e4f2c0, __n=64, __size=1, __ptr=0x7ffc32027bb0) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:293
#6  cache_bread_1 (nbytes=64, buf=0x7ffc32027bb0, f=0x558434e4f2c0) at cache.c:319
#7  cache_bread (abfd=<optimized out>, buf=0x7ffc32027bb0, nbytes=64) at cache.c:358
#8  0x000055843364e564 in bfd_bread (ptr=ptr@entry=0x7ffc32027bb0, size=<optimized out>, size@entry=64, abfd=<optimized out>, abfd@entry=0x558434f0f210)
    at bfdio.c:259
#9  0x000055843366ca53 in bfd_elf64_object_p (abfd=0x558434f0f210) at /home/mcopik/bug_report/build/gdb-12.1/bfd/elfcode.h:519
#10 0x000055843365199c in bfd_check_format_matches (abfd=0x558434f0f210, format=<optimized out>, matching=0x0) at format.c:344
#11 0x000055843351edfe in solib_bfd_open (pathname=0x558434e90200 "/proc/self/fd/4") at ./../gdbsupport/gdb_ref_ptr.h:130
#12 0x000055843351dee7 in solib_map_sections (so=0x558434f0ff00) at solib.c:540
#13 0x000055843351fe56 in update_solib_list (from_tty=<optimized out>) at solib.c:860
#14 0x0000558433520877 in solib_add (pattern=pattern@entry=0x0, from_tty=from_tty@entry=0, readsyms=1) at solib.c:960
#15 0x0000558433520b00 in handle_solib_event () at solib.c:1269
#16 0x0000558433252165 in bpstat_stop_status (aspace=<optimized out>, bp_addr=bp_addr@entry=140737353900800, thread=thread@entry=0x558434ddf090, ws=..., 
    stop_chain=stop_chain@entry=0x0) at breakpoint.c:5455
#17 0x00005584333dbc8b in handle_signal_stop (ecs=0x7ffc32028700) at infrun.c:6191
#18 0x00005584333dda68 in handle_stop_requested (ecs=<optimized out>) at infrun.c:4465
#19 handle_stop_requested (ecs=<optimized out>) at infrun.c:4460
#20 handle_inferior_event (ecs=0x7ffc32028700) at infrun.c:5695
#21 0x00005584333df48e in fetch_inferior_event () at infrun.c:4085
#22 0x00005584336f7ef6 in gdb_wait_for_event (block=block@entry=0) at event-loop.cc:700
#23 0x00005584336f83da in gdb_wait_for_event (block=0) at event-loop.cc:596
#24 gdb_do_one_event () at event-loop.cc:212
#25 0x0000558433424275 in start_event_loop () at main.c:421
#26 captured_command_loop () at main.c:481
#27 0x0000558433425e75 in captured_main (data=0x7ffc320288a0) at main.c:1351
#28 gdb_main (args=args@entry=0x7ffc320288d0) at main.c:1366
#29 0x00005584331b9d10 in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32

System: Linux master-node 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Compiler: gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0

GDB configuration: x86_64-pc-linux-gnu
Comment 1 Asaf Fisher 2022-10-17 20:05:53 UTC
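This happens because /proc/self refers to a different process for the inferior and for GDB: when GDB opens /proc/self/fd/4, it gets its own file descriptor 4 rather than the inferior's. I proposed a patch to fix it today.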
Comment 2 Paul Pluzhnikov 2024-08-27 03:48:39 UTC
I just hit this as well. Trivial repro with current trunk:

--- foo.c ---
#include <assert.h>

int fn (int x)
{
  assert (x > 0);
  return x + 1;
}

--- main1.c ---
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
  int fd = open("./foo.so", O_RDONLY);
  char path[1024];

  sprintf(path, "/proc/self/fd/%d", fd);
  fprintf (stderr, "Before dlopen %s\n", path);

  void *h = dlopen(path, RTLD_LOCAL|RTLD_LAZY);
  int (*fn)(int) = dlsym(h, "fn");

  fprintf (stderr, "Before call to fn\n");
  return fn(0);
}

gcc -g -fPIC -shared -o foo.so foo.c &&
gcc -g main1.c -ldl && ./a.out



Before dlopen /proc/self/fd/3
Before call to fn
a.out: foo.c:5: fn: Assertion `x > 0' failed.
Aborted

---
Everything is as expected up to this point


gdb/gdb -q -ex run ./a.out

Reading symbols from ./a.out...
Starting program: /tmp/memfd/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Before dlopen /proc/self/fd/3

... GDB hangs forever trying to read from some pipe and blocks all signals, so I have to pkill -9 it from another terminal.


strace -p $(pgrep gdb)
strace: Process 3885955 attached
read(16, ^Cstrace: Process 3885955 detached
 <detached ...>


ls -l /proc/3885955/fd/16
lr-x------ 1 ppluzhnikov primarygroup 64 Aug 27 03:43 /proc/3885955/fd/16 -> 'pipe:[95221642]'

---
It's also easy to make GDB fail to load symbols instead of hanging:


--- main.c ---
#include <assert.h>
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
  int fd = open("./foo.so", O_RDONLY);
  char path[1024];

  assert(dup2(fd, 133) != -1);
  sprintf(path, "/proc/self/fd/%d", 133);

  void *h = dlopen(path, RTLD_LOCAL|RTLD_LAZY);
  int (*fn)(int) = dlsym(h, "fn");

  return fn(0);
}

gcc -g main.c -ldl &&
gdb/gdb -q -ex run ./a.out


Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: Could not load shared library symbols for /proc/self/fd/133.
Do you need "set solib-search-path" or "set sysroot"?
a.out: foo.c:5: fn: Assertion `x > 0' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff7e520ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7e520ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7e04102 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7ded4f2 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff7ded415 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff7dfcd32 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff7fbf142 in ?? ()
#6  0x00007fffffffd868 in ?? ()
#7  0x00000000ffffd868 in ?? ()
#8  0x00007fffffffd750 in ?? ()
#9  0x000055555555524c in main () at main.c:18
Backtrace stopped: frame did not save the PC