31352 – [gdb/cli, recursive internal problem] sig_write uses gdb_stderr, which may be a string_file, which doesn't support write_async_safe

Status:	NEW

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	cli
Version:	HEAD

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2024-02-07 13:50 UTC by Tom de Vries
Modified:	2024-04-12 20:11 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:

Description Tom de Vries 2024-02-07 13:50:33 UTC

I tried debugging a DAP test-case (gdb.dap/pause.exp) by making the following change in quit ():
...
@@ -661,13 +661,16 @@ quit (void)
 #else
   if (job_control
       /* If there is no terminal switching for this target, then we can't
 	 possibly get screwed by the lack of job control.  */
       || !target_supports_terminal_ours ())
-    throw_quit ("Quit");
+    __builtin_abort ();
   else
     throw_quit ("Quit (expect signal SIGINT when the program is resumed)");
 #endif
...
to try to produce a core file.

I noticed this didn't produce a core file, but it did mention recursive internal problems, so I decided to try a bit harder:
...
@@ -347,7 +347,7 @@ internal_vproblem (struct internal_problem *problem,
   /* Don't allow infinite error/warning recursion.  */
   {
     static const char msg[] = "Recursive internal problem.\n";
-
+    __builtin_abort ();
     switch (dejavu)
       {
       case 0:
...
and managed to produce a core file, due to a segfault.

The segfault is due to running out of stack, and the repeating sequence of frames looks like:
...
(gdb) 
#16321 0x00000000014f89f5 in internal_error_loc (file=0x160fac0 "/data/vries/gdb/src/gdb/ui-file.h", line=72, 
    fmt=0x160faa4 "%s: write_async_safe") at /data/vries/gdb/src/gdbsupport/errors.cc:58
58	  internal_verror (file, line, fmt, ap);
(gdb) down
#16320 0x0000000000d2433d in internal_verror (file=0x160fac0 "/data/vries/gdb/src/gdb/ui-file.h", line=72, 
    fmt=0x160faa4 "%s: write_async_safe", ap=0x7ffc703be958) at /data/vries/gdb/src/gdb/utils.c:495
495	  internal_vproblem (&internal_error_problem, file, line, fmt, ap);
(gdb) 
#16319 0x0000000000d24307 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x276b5e0 <internal_error_problem>, 
    file=0x160fac0 "/data/vries/gdb/src/gdb/ui-file.h", line=72, fmt=0x160faa4 "%s: write_async_safe", 
    ap=0x7ffc703be958) at /data/vries/gdb/src/gdb/utils.c:350
350	    __builtin_abort ();
(gdb) 
#16318 0x00007f1a9e6553e5 in abort () from /lib64/libc.so.6
(gdb) 
#16317 0x00007f1a9e653d2b in raise () from /lib64/libc.so.6
(gdb) 
#16316 <signal handler called>
(gdb) 
#16315 0x00000000007a35eb in handle_fatal_signal (sig=6) at /data/vries/gdb/src/gdb/event-top.c:898
898	      sig_write ("\n\n");
(gdb) 
#16314 0x00000000007a35b1 in <lambda(char const*)>::operator()(const char *) const (__closure=0x7ffc703bd8af, 
    msg=0x15ed81c "\n\n") at /data/vries/gdb/src/gdb/event-top.c:893
893	    gdb_stderr->write_async_safe (msg, strlen (msg));
(gdb) 
#16313 0x000000000082e644 in ui_file::write_async_safe (this=0x7ffc703c1970, buf=0x15ed81c "\n\n", length_buf=2)
    at /data/vries/gdb/src/gdb/ui-file.h:72
72	  { gdb_assert_not_reached ("write_async_safe"); }
(gdb) 
#16312 0x00000000014f89f5 in internal_error_loc (file=0x160fac0 "/data/vries/gdb/src/gdb/ui-file.h", line=72, 
    fmt=0x160faa4 "%s: write_async_safe") at /data/vries/gdb/src/gdbsupport/errors.cc:58
58	  internal_verror (file, line, fmt, ap);
(gdb) 
...

AFAICT, what happens is:
- abort is raised
- abort is caught
- attempt to write backtrace using sig_write
- sig_write does gdb_stderr->write_async_safe
- since gdb_stderr is set to a string_file, which doesn't implement
  write_async_safe, an internal_error is thrown
- the internal_error ends up calling the abort I added in internal_vproblem,
  and another abort is raised
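
The loop described above can be modelled with a small, self-contained sketch. It is not GDB code: the `*_model` types and `simulate_fatal_signal` are hypothetical stand-ins, and the real assertion/abort path is replaced by a boolean return value and a depth counter so that the example terminates instead of exhausting the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ui_file.  Returning false marks the spot where
// the real base class hits gdb_assert_not_reached ("write_async_safe").
struct ui_file_model
{
  virtual ~ui_file_model () = default;
  virtual bool write_async_safe (const char *, std::size_t)
  { return false; }
};

// Like string_file in the report: no write_async_safe override, so the
// base-class version is used.
struct string_file_model : ui_file_model
{
  std::string contents;
};

// A stream that does support async-safe writes.
struct stderr_file_model : ui_file_model
{
  std::size_t written = 0;
  bool write_async_safe (const char *, std::size_t len) override
  { written += len; return true; }
};

// Counts how often the "fatal signal handler" is entered.  Each failed
// write stands for: internal error -> abort -> handler entered again.
// MAX_DEPTH plays the role of the finite stack.
int simulate_fatal_signal (ui_file_model &gdb_stderr_model, int max_depth)
{
  assert (max_depth > 0);
  int depth = 0;
  while (depth < max_depth)
    {
      ++depth;                                    // handler entered
      if (gdb_stderr_model.write_async_safe ("\n\n", 2))
        break;                                    // write worked, no new abort
    }
  return depth;
}
```

With a `string_file_model` the handler is re-entered until the depth limit is reached, which corresponds to the stack overflow seen in the core file; with a `stderr_file_model` it is entered once.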

This can easily be avoided by printing to stderr instead:
...
diff --git a/gdb/bt-utils.c b/gdb/bt-utils.c
index 6f68e269c51..f93e45688e8 100644
--- a/gdb/bt-utils.c
+++ b/gdb/bt-utils.c
@@ -56,7 +56,7 @@ libbacktrace_error (void *data, const char *errmsg, int errnum)
 
   const auto sig_write = [] (const char *msg) -> void
   {
-    gdb_stderr->write_async_safe (msg, strlen (msg));
+    fprintf (stderr, "%s", msg);
   };
 
   sig_write ("error creating backtrace: ");
@@ -80,7 +80,7 @@ libbacktrace_print (void *data, uintptr_t pc, const char *filename,
 {
   const auto sig_write = [] (const char *msg) -> void
   {
-    gdb_stderr->write_async_safe (msg, strlen (msg));
+    fprintf (stderr, "%s", msg);
   };
 
   /* Buffer to print addresses and line numbers into.  An 8-byte address
@@ -131,7 +131,7 @@ gdb_internal_backtrace_1 ()
 {
   const auto sig_write = [] (const char *msg) -> void
   {
-    gdb_stderr->write_async_safe (msg, strlen (msg));
+    fprintf (stderr, "%s", msg);
   };
 
   /* Allow up to 25 frames of backtrace.  */
@@ -159,7 +159,7 @@ gdb_internal_backtrace ()
 #ifdef GDB_PRINT_INTERNAL_BACKTRACE
   const auto sig_write = [] (const char *msg) -> void
   {
-    gdb_stderr->write_async_safe (msg, strlen (msg));
+    fprintf (stderr, "%s", msg);
   };
 
   sig_write (_("----- Backtrace -----\n"));
diff --git a/gdb/event-top.c b/gdb/event-top.c
index 33aef7d7cc5..b3d16ecd710 100644
--- a/gdb/event-top.c
+++ b/gdb/event-top.c
@@ -890,7 +890,7 @@ handle_fatal_signal (int sig)
 #ifdef GDB_PRINT_INTERNAL_BACKTRACE
   const auto sig_write = [] (const char *msg) -> void
   {
-    gdb_stderr->write_async_safe (msg, strlen (msg));
+    fprintf (stderr, "%s", msg);
   };
 
   if (bt_on_fatal_signal)
...

With this patch, I can get rid of the abort in internal_vproblem and still get my core dump.

I don't know what is a proper fix for this.
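
One thing to keep in mind for any fix: fprintf is not on the POSIX list of async-signal-safe functions, while write(2) is. The following is only a sketch of what a handler-friendly helper could look like, not a proposed patch. `sig_write_fd` is a hypothetical name that does not exist in GDB, and the sketch assumes a POSIX host and that strlen may be called from a signal handler (recent POSIX revisions list it as async-signal-safe).

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Hypothetical helper, not part of GDB: write MSG to file descriptor FD
// using only write(2).  Returns true if the whole message was written.
bool sig_write_fd (int fd, const char *msg)
{
  std::size_t remaining = strlen (msg);
  while (remaining > 0)
    {
      ssize_t n = write (fd, msg, remaining);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           // interrupted before any byte was written
          return false;         // real error, give up
        }
      if (n == 0)
        return false;           // no progress; avoid spinning forever
      msg += n;                 // short write: continue with the rest
      remaining -= static_cast<std::size_t> (n);
    }
  return true;
}
```

A handler would call it as `sig_write_fd (STDERR_FILENO, "\n\n")`. This bypasses gdb_stderr entirely, so it shares the drawback of the fprintf patch: output no longer goes wherever gdb_stderr has been redirected.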

Comment 1 Guinevere Larsen 2024-04-11 19:06:07 UTC

I've come across this same bug while trying to debug a segfault from my frame_unwind move to classes[1] on 32-bit Arm; no changes to internal_vproblem or the quit function were required.

If you stop inside frame_unwind_legacy::sniffer and send a segfault to inner GDB, you can trigger the situation where internal_verror is writing to a gdb_stderr that doesn't have write_async_safe.

[1] https://inbox.sourceware.org/gdb-patches/20240408201915.1482831-4-blarsen@redhat.com/T/#u

Comment 2 Guinevere Larsen 2024-04-12 20:11:11 UTC

Just popping by to say that calling any nullptr function pointer is enough to reproduce this; no specific architecture is needed, actually.