28711 – gdb closes when displaying structs with long field names in eclipse

Summary: gdb closes when displaying structs with long field names in eclipse

Status:	RESOLVED FIXED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	mi
Version:	HEAD

Importance:	P2 normal
Target Milestone:	12.1
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-12-17 13:07 UTC by Cristian Lupascu
Modified:	2022-10-31 16:54 UTC
CC List:	4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:	2021-12-17 00:00:00

Attachments
testcase and traces (16.59 KB, application/zip) 2021-12-17 13:07 UTC, Cristian Lupascu

Description Cristian Lupascu 2021-12-17 13:07:56 UTC

Created attachment 13862
testcase and traces

Requirements for reproducing:
1) Eclipse (Version: 2021-12 (4.22.0)) with C/C++ plugin (C/C++ Development Tools	10.4.1.202109150103	org.eclipse.cdt.feature.group	Eclipse CDT).
I have set up Eclipse to use the nightly GDB version (GNU gdb (GDB) 12.0.50.20211217-git), and the bug still occurs.

2) OS: Linux (Linux vm 5.11.0-43-generic #47~20.04.2-Ubuntu SMP Mon Dec 13 11:06:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux). If you use the provided test case, the bug can be reproduced on both physical and virtual machines.


Steps to reproduce:
1) Create a C project in eclipse and add test.c to it (from the attachment). The file contains a simple testcase that defines a struct with 2000 fields with increasing name lengths up to 2000 characters.

2) Build and debug. After initiating the debug session the program should be stopped at main(). Mouse over the variable "var" to display it OR open the Variables view in eclipse and expand the struct.

3) The debug session unexpectedly ends. GDB has exited with code 0.


Analysis:
1) Note that eclipse starts GDB with "/home/cristi/Downloads/gdb-12.0.50.20211217/gdb/gdb --interpreter mi2 --nx -q --interpreter console -ex new-ui mi /dev/pts/0 -ex set pagination off -ex show version". Eclipse uses "new-ui mi /dev/pts/0" only on Linux. This is why this bug is not reproducible on Windows or Mac.

2) Using "sudo strace -s 5000 -p $(pidof gdb)" reveals the issue. I've included the full output in the attachment. The crucial part is the following lines:

```
read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
lseek(9, -66, SEEK_CUR)                 = -1 ESPIPE (Illegal seek)
lseek(9, -66, SEEK_CUR)                 = -1 ESPIPE (Illegal seek)
lseek(9, -66, SEEK_CUR)                 = -1 ESPIPE (Illegal seek)
```

GDB doesn't receive the full command in the first read, then it attempts to lseek() back to the first "\n" from the second read(). The lseek() fails with ESPIPE and soon after GDB exits.

3) File descriptor 9 is "/dev/pts/0", the communication terminal between IDE and GDB, so lseek() can't be used on it.
lrwx------ 1 cristi cristi 64 dec 17 14:49 9 -> /dev/pts/0
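
The failing lseek() can be illustrated outside GDB. The following is a minimal sketch, not GDB code: it uses a pipe as a stand-in for the pseudo terminal (an assumption made so the example needs no tty), and the function name seek_errno_on_pipe is hypothetical. Like /dev/pts/0, a pipe does not support seeking, so the same relative seek seen in the strace output is rejected.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to seek backwards on the read end of a pipe and return the errno
   that lseek() reports, or 0 if the seek unexpectedly succeeds.
   Returns -1 if the pipe itself could not be created.  */
int seek_errno_on_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    /* Same kind of relative seek as in the strace output above.  */
    off_t pos = lseek(fds[0], -66, SEEK_CUR);
    int err = (pos == (off_t)-1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux the returned value should compare equal to ESPIPE, matching the "Illegal seek" lines in the trace.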


Additional info:
On a physical machine, this issue is not noticeable in a practical scenario. You would need a field name with a length of 1000 characters in order to reproduce it.
However, this issue can easily happen on a VM in a realistic scenario. As you can see above, a struct field with a name shorter than 64 characters could easily reproduce the issue.

Comment 1 Andrew Burgess 2021-12-17 16:53:17 UTC

Was able to reproduce this without using Eclipse, with current HEAD.

First, I had to patch GDB to slow it down a little.  I observed that when this bug triggers in Eclipse the complete command is not read in the first "read" call.  When GDB issues the second "read" call it gets not only the remainder of the first command, including the trailing \n, but also the start of the next command sent from Eclipse.

When I tried to reproduce this in GDB without Eclipse, I couldn't reproduce this read pattern, even when bulk pasting in the commands; at least for me, GDB seemed to process things too quickly.

So I added this patch to GDB:

### START ####

diff --git a/gdb/event-top.c b/gdb/event-top.c
index 530ea298247..e12a87a7910 100644
--- a/gdb/event-top.c
+++ b/gdb/event-top.c
@@ -854,6 +874,7 @@ gdb_readline_no_editing_callback (gdb_client_data client_data)
 
   buffer_grow_char (&line_buffer, '\0');
   result = buffer_finish (&line_buffer);
+  sleep (1);
   ui->input_handler (gdb::unique_xmalloc_ptr<char> (result));
 }
 

### END ###

And now I am able to reproduce the failure.

(1) Create a pseudo terminal to use as the MI terminal - in another shell I do `tty ; tail -f /dev/null`, and copy the path to the pseudo terminal that is printed, then

(2) Compile the test.c program from the original bug report,

(3) Start GDB: ./gdb/gdb --data-directory ./gdb/data-directory/ --interpreter mi2 --nx -q --interpreter console -ex "new-ui mi /dev/pts/13" -ex "set pagination off" -ex "show version" -ex "start" ./test.x
    Replace "/dev/pts/13" with the path copied in step #1.

(4) Now switch back to the terminal used in step #1, this should have started up as an MI terminal.  Paste in the following commands.  These need to be pasted in a single paste action, NOT one at a time:

### START ### 
-var-create --thread 1 --frame 0 - * var
-var-list-children var1
-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
### END ###

For me, GDB will then exit every time.  What I observe is that the call to `fgetc` in gdb_readline_no_editing_callback returns EOF after it has read the 1024 long command - this is the first command where the \n appears in the next read block.  I still don't understand why that call is returning EOF....
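
The single-paste requirement in step (4) amounts to delivering several commands to the MI terminal in one write. As a rough sketch of what a front end or test harness might do (the helper name send_commands_in_one_write is hypothetical and not part of GDB or Eclipse), the commands can be joined into one buffer and passed to a single write() call:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Join COUNT commands, each terminated by '\n', into one buffer and hand
   the whole buffer to a single write() call on FD.  Returns the number of
   bytes written, or -1 on error.  */
ssize_t send_commands_in_one_write(int fd, const char *const *cmds, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(cmds[i]) + 1;   /* +1 for the newline */

    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return -1;

    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(cmds[i]);
        memcpy(buf + off, cmds[i], len);
        off += len;
        buf[off++] = '\n';
    }

    /* One write call, so the reader can see all commands at once.  */
    ssize_t written = write(fd, buf, total);
    free(buf);
    return written;
}
```

Here fd would be the descriptor of the terminal opened for the MI channel; a short write is possible in principle and is reported through the return value rather than retried.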

Comment 2 Andrew Burgess 2021-12-17 16:55:17 UTC

I suspect the command I included in the last comment will probably not display correctly (unless you have a crazy wide monitor, or tiny font).  You'll need to reformat these as 4 lines, each starting with `-var-.....`.  Hope that makes sense.

Comment 3 Andrew Burgess 2021-12-18 11:39:08 UTC

I tried starting GDB without placing the MI interface onto a separate terminal, like this:

  ./gdb/gdb --data-directory ./gdb/data-directory/ --interpreter mi2 --nx -q -ex "set pagination off" -ex "show version" -ex "start" ./test.x

Then pasted the same set of MI commands as before, and the test completes successfully.  So it seems to be something about the way the separate MI interface is handled.

I looked in top.c at `new_ui_command` function, and saw this comment:

    /* Open specified terminal.  Note: we used to open it three times,
       once for each of stdin/stdout/stderr, but that does not work
       with Windows named pipes.  */

So, on a whim I tried this patch:

### START ###

diff --git a/gdb/top.c b/gdb/top.c
index 1f9e649a85d..470881b5db6 100644
--- a/gdb/top.c
+++ b/gdb/top.c
@@ -366,10 +366,12 @@ new_ui_command (const char *args, int from_tty)
     /* Open specified terminal.  Note: we used to open it three times,
        once for each of stdin/stdout/stderr, but that does not work
        with Windows named pipes.  */
-    gdb_file_up stream = open_terminal_stream (tty_name);
+    gdb_file_up stream1 = open_terminal_stream (tty_name);
+    gdb_file_up stream2 = open_terminal_stream (tty_name);
+    gdb_file_up stream3 = open_terminal_stream (tty_name);
 
     std::unique_ptr<ui> ui
-      (new struct ui (stream.get (), stream.get (), stream.get ()));
+      (new struct ui (stream1.get (), stream2.get (), stream3.get ()));
 
     ui->async = 1;
 
@@ -380,7 +382,9 @@ new_ui_command (const char *args, int from_tty)
     interp_pre_command_loop (top_level_interpreter ());
 
     /* Make sure the file is not closed.  */
-    stream.release ();
+    stream1.release ();
+    stream2.release ();
+    stream3.release ();
 
     ui.release ();
   }

### END ###

And retested the original GDB command line (with separate MI interface), and now the problem is resolved.

I don't know exactly what this means, but maybe someone else understands. I'll just keep digging...

Comment 4 Andrew Burgess 2021-12-20 18:44:00 UTC

So what happens is that a lot of input arrives on the read file descriptor in one go.  GDB does a fgetc, and glibc then does a read on the file descriptor.  I see glibc read up to 1024 bytes.  Clearly, the original bug reporter saw much smaller reads from glibc, but that's not really important.

If the first command that arrives (including the command's final \n character) is larger than one read buffer (so for me, larger than 1024 bytes), then glibc will perform a second read, also of up to 1024 bytes, to find the rest of the command.

If we imagine that the final \n character is the first character in the second read buffer, and that we get a full 1024 bytes in the second read buffer, then GDB has read 1023 bytes more than it actually needed.  As a result, the file position of the file descriptor is 1023 bytes ahead of where glibc actually thinks it should be in the file.

But, moving on, GDB processes the first command, which results in some output.  GDB wants to print this output, and eventually, this output is sent to the output file descriptor via glibc.

glibc notices that the file position is 1023 bytes ahead of where it should be, and so tries to lseek the file position back to the expected location.

lseek isn't supported on terminals, and so things start to go wrong.  I haven't bothered to track down exactly what causes GDB to exit, because I'm not convinced it's important.  What matters is that by sharing the file descriptor for both reading and writing, we end up triggering these invalid lseeks from within glibc.
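
The 1023-byte figure above is just the number of bytes that follow the first \n in the second read buffer. A small sketch of that arithmetic (the function name bytes_read_past_newline is made up for illustration and does not exist in GDB or glibc):

```c
#include <stddef.h>
#include <string.h>

/* Given one chunk returned by read(), return how many bytes follow the
   first '\n'.  Those bytes belong to later commands but are already
   sitting in the stdio buffer.  Returns 0 if the chunk has no newline.  */
size_t bytes_read_past_newline(const char *chunk, size_t len)
{
    const char *nl = memchr(chunk, '\n', len);
    if (nl == NULL)
        return 0;
    return len - (size_t)(nl - chunk) - 1;
}
```

For a full 1024-byte chunk whose first byte is the newline this gives 1023, which is the distance glibc then tries to seek back.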


It turns out that the code I changed in comment #3, once upon a time, did open the terminal 3 times.  This was changed in this commit:

  commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
  Date:   Thu Jul 18 17:20:04 2019 +0100

    Fix for using named pipes on Windows

The idea seemed to be to use a named pipe on Windows instead of a terminal.

First, I don't know anything about named pipes on windows, but...

... if we consider named pipes on Linux, I'm not convinced that using named pipes will work here.  My assumption is that the debugger frontend would create a named pipe, and then try to attach the MI interface to that pipe.

GDB would then be writing MI output to the pipe, and also, reading incoming commands from the pipe.

The problem is, that when GDB writes out MI output, it will also see that output as incoming commands.

On the other end, we have the same problem.  The front end writes commands to the pipe and then tries to read MI output from the pipe.  But the command that was just written will be available for reading, and so will be consumed as MI output.

In short, you end up in a mess with everyone pumping output into the pipe and then competing to consume that same output.

What I wonder instead is, maybe we should change the `new-ui` command.  Currently we allow `new-ui INTERPRETER PATH-TO-PTTY`.  Maybe we should allow something like this too: `new-ui INTERPRETER PATH-TO-INPUT-PIPE PATH-TO-OUTPUT-PIPE PATH-TO-ERROR-PIPE`

If only one path is provided, we open it 3 times for in/out/err.  If three paths are provided then each is opened once.  This would require the front end to then manage three named pipes though.

Maybe named pipes on windows behave differently though...

Comment 5 Florian Weimer 2021-12-24 17:03:14 UTC

(In reply to Andrew Burgess from comment #4)
> So what happens is that a lot of input arrives on the read file descriptor
> in one go.  GDB does a fgetc, and glibc then does a read on the file
> descriptor.  I see glibc read up to 1024 bytes.  Clearly, the original bug
> reporter saw much smaller reads from glibs, but that's not really important.
> 
> If the first command that arrives (including the commands final \n
> character) is larger than one read buffer (so for me, larger than 1024
> bytes), then glibc will perform a second read, also of up to 1024 bytes to
> find the rest of the command.
> 
> If we imagine that the final \n character is the first character in the
> second read buffer, and that we get a full 1024 bytes in the second read
> buffer, then GDB has read 1023 bytes more than it actually needed.  As a
> result, the file position of the file descriptor is 1023 bytes ahead of
> where glibc actually thinks it should be in the file.
> 
> But, moving on, GDB processes the first command, which results in some
> output.  GDB wants to print this output, and eventually, this output is sent
> to the output file descriptor via glibc.
> 
> glibc notices that the file position is 1023 bytes ahead of where it should
> be, and so tries to lseek the file position back to the expected location.

This description suggests to me that GDB does not use two different stdio streams for reading and writing. With one stream, POSIX requires fseek calls when switching from reading to writing and vice versa. This is obviously a non-starter for terminal devices because they do not support seeking.

So opening the same terminal device multiple times is the way to go, it's how stdin, stdout, stderr can coexist on one terminal.
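
As an illustration of the point above, here is a minimal sketch of opening one path twice so that input and output use independent stdio streams. It is not the GDB implementation; the helper name open_separate_streams is hypothetical, and the path would be the terminal device handed to new-ui.

```c
#include <stdio.h>

/* Open PATH twice so that reading and writing go through independent
   FILE objects, each with its own buffer and file descriptor.
   Note: mode "w" truncates a regular file, so this is meant for devices
   such as terminals.  Returns 0 on success, -1 on failure (in which case
   nothing is left open).  */
int open_separate_streams(const char *path, FILE **in, FILE **out)
{
    *in = fopen(path, "r");
    if (*in == NULL)
        return -1;

    *out = fopen(path, "w");
    if (*out == NULL) {
        fclose(*in);
        *in = NULL;
        return -1;
    }
    return 0;
}
```

With two streams, any read-ahead sits only in the input stream's buffer, so writing to the output stream gives glibc no reason to reposition the file offset.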

Comment 6 Tom Tromey 2021-12-26 19:06:16 UTC

(In reply to Andrew Burgess from comment #4)

> What I wonder instead is, maybe we should change the `new-ui` command. 
> Currently we allow `new-ui INTERPRETER PATH-TO-PTTY`.  Maybe we should allow
> something like this too: `new-ui INTERPRETER PATH-TO-INPUT-PIPE
> PATH-TO-OUTPUT-PIPE PATH-TO-ERROR-PIPE`
> 
> If only one path is provided, we open it 3 times for in/out/err.  If three
> paths are provided then each is opened once.  This would require the front
> end to then manage three named pipes though.
> 
> Maybe named pipes on windows behave differently though...

We could ask the original author perhaps.

One other idea is to detect the named pipe case on Windows, either
by detecting it after the first open, or maybe by moving open_terminal_stream
to the *-hdep files and changing what it returns somehow.

Comment 7 Andrew Burgess 2021-12-27 10:15:09 UTC

* tromey at sourceware dot org via Gdb-prs <gdb-prs@sourceware.org> [2021-12-26 19:06:16 +0000]:

> https://sourceware.org/bugzilla/show_bug.cgi?id=28711
> 
> Tom Tromey <tromey at sourceware dot org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |tromey at sourceware dot org
> 
> --- Comment #6 from Tom Tromey <tromey at sourceware dot org> ---
> (In reply to Andrew Burgess from comment #4)
> 
> > What I wonder instead is, maybe we should change the `new-ui` command. 
> > Currently we allow `new-ui INTERPRETER PATH-TO-PTTY`.  Maybe we should allow
> > something like this too: `new-ui INTERPRETER PATH-TO-INPUT-PIPE
> > PATH-TO-OUTPUT-PIPE PATH-TO-ERROR-PIPE`
> > 
> > If only one path is provided, we open it 3 times for in/out/err.  If three
> > paths are provided then each is opened once.  This would require the front
> > end to then manage three named pipes though.
> > 
> > Maybe named pipes on windows behave differently though...
> 
> We could ask the original author perhaps.

I already reached out to the original patch author asking for any
insights they might have, but given the time of year, I'm not
expecting a reply any time soon.

Comment 8 Andrew Burgess 2022-01-17 16:43:16 UTC

I posted this possible fix to the mailing list:

  https://sourceware.org/pipermail/gdb-patches/2022-January/185209.html

Comment 9 Sourceware Commits 2022-02-07 10:25:07 UTC

The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d08cbc5d3203118da5583296e49273cf82378042

commit d08cbc5d3203118da5583296e49273cf82378042
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Wed Dec 22 12:57:44 2021 +0000

    gdb: unbuffer all input streams when not using readline
    
    This commit should fix PR gdb/28711.  What's actually going on is
    pretty involved, and there's still a bit of the story that I don't
    understand completely, however, from my observed results, I think that
    the change I propose making here (or something very similar) is going
    to be needed.
    
    The original bug report involves using eclipse to drive gdb using mi
    commands.  A separate tty is spun off in which to send gdb the mi
    commands, this tty is created using the new-ui command.
    
    The behaviour observed is that, given a particular set of mi commands
    being sent to gdb, we sometimes see an ESPIPE error from a lseek
    call, which ultimately results in gdb terminating.
    
    The problems all originate from gdb_readline_no_editing_callback in
    gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
    allow glibc to perform buffering on the FILE object being used.
    
    I say sometimes, because gdb_readline_no_editing_callback already
    includes a call to disable the glibc buffering, but this is only done
    if the input stream is not a tty.  In our case the input stream is a
    tty, so the buffering is left in place.
    
    The first step to understanding why this problem occurs is to
    understand that eclipse sends multiple commands to gdb very quickly
    without waiting for an answer to each command; eclipse plans to
    collect all of the command results after sending all the commands to
    gdb.  In fact, eclipse sends the commands to gdb so quickly that they
    appear to arrive in the gdb process as a single block of data.  When
    reproducing this issue within the testsuite I find it necessary to
    send multiple commands using a single write call.
    
    The next bit of the story gets a little involved, and this is where my
    understanding is not complete.  I can describe the behaviour that I
    observe, and (for me at least) I'm happy that what I'm seeing, if a
    little strange, is consistent.  In order to fully understand what's
    going on I think I would likely need to dive into kernel code, which
    currently seems unnecessary given that I'm happy with the solution I'm
    proposing.
    
    The following description all relates to input from a tty in which I'm
    not using readline.  I see the same problems either when using a
    new-ui tty, or with gdb's standard, non-readline, mi tty.
    
    Here's what I observe happening when I send multiple commands to gdb
    using a single write, if I send gdb this:
    
      command_1\ncommand_2\ncommand_3
    
    then gdb's event loop will wake up (from its select) as it sees there
    is input available.  We call into gdb_readline_no_editing_callback,
    where we call fgetc, glibc will do a single big read, and get back
    just:
    
      command_1\n
    
    that is, despite there being multiple lines of input available, I
    consistently get just a single line.  From glibc a single character is
    returned from the fgetc call, and within gdb we accumulate characters,
    one at a time, into our own buffer.  Eventually gdb sees the '\n'
    character, and dispatches the whole 'command_1' into gdb's command
    handler, which processes the command and prints the result.  We then
    return to gdb_readline_no_editing_callback, which in turn returns to
    gdb's event loop where we re-enter the select.
    
    Inside the select we immediately see that there is more input waiting
    on the input stream, drop out of the select, and call back into
    gdb_readline_no_editing_callback.  In this function we again call
    fgetc where glibc performs another big read.  This time glibc gets:
    
      command_2\n
    
    that is, we once again get just a single line, despite there being a
    third line available.  Just like the first command we copy the whole
    string, character by character into gdb's buffer, then handle the
    command.  After handling the command we go to the event loop, enter,
    and then exit the select, and call back to the function
    gdb_readline_no_editing_callback.
    
    In gdb_readline_no_editing_callback we again call fgetc, this time
    glibc gets the string:
    
      command_3\n
    
    like before, we copy this to gdb's buffer and handle the command, then
    we return to the event loop.  At this point the select blocks while we
    wait for more input to arrive.
    
    The important bit of this is that someone, somewhere is, it appears,
    taking care to split the incoming write into lines.
    
    My next experiment is to try something like:
    
      this_is_a_very_long_command\nshort_command\n
    
    However, I actually make 'this_is_a_very_long_command' very long, as
    in many hundreds of characters long.  One way to do this is:
    
      echo xxxxxx.....xxxxx
    
    and just adding more and more 'x' characters as needed.  What I'm
    aiming for is to have the first command be longer than glibc's
    internal read buffer, which, on my machine, is 1024 characters.
    
    However, for this discussion, let's imagine that glibc's buffer is just
    8 characters (we can create just this situation by adding a suitable
    setbuf call into gdb_readline_no_editing_callback).
    
    Now, if I send gdb this data:
    
      abcdefghij\nkl\n
    
    The first read from glibc will get 'abcdefgh', that is a full 8
    character buffer.  Once gdb has copied these to its buffer we call
    fgetc again, and now glibc will get 'ij\n', that is, just like before,
    multiple lines are split at the '\n' character.  The full command,
    which is now in gdb's buffer can be handled 'abcdefghij', after which
    we go (via the event loop) back to gdb_readline_no_editing_callback.
    Now we call fgetc, and glibc will get 'kl\n', which is then handled in
    the normal way.
    
    So far, so good.  However, there is, apparently, one edge case where
    the above rules don't apply.
    
    If the '\n' character is the first character read from the kernel,
    then the incoming lines are not split up.  So, given glibc's 8
    character buffer, if I send gdb this:
    
      abcdefgh\nkl\n
    
    that is the first command is 8 characters plus a newline, then, on the
    first read (from within glibc) we get 'abcdefgh' in a single buffer.
    As there's no newline gdb calls fgetc again, and glibc does another
    large read, now we get:
    
      \nkl\n
    
    which doesn't follow the above pattern - the lines are not split into
    separate buffers!
    
    So, gdb reads the first character from glibc using fgetc, this is the
    newline.  Now gdb has a complete command, and so the command is
    handled.  We then return to the event loop and enter the select.
    
    The problem is that, as far as the kernel is concerned, there is no
    more input pending, it's all been read into glibc's buffer, and so the
    select doesn't return.  The second command is basically stuck in
    glibc's buffer.
    
    If I send another command to gdb, or even just send an empty
    command (a lone newline) then the select returns, we call into
    gdb_readline_no_editing_callback, and now gdb sees the second
    command.
    
    OK, so the above is interesting, but it doesn't explain the ESPIPE
    error.
    
    Well, that's a slightly different, but related issue.  The ESPIPE
    case will _only_ show up when using new-ui to create the separate tty
    for mi commands, and is a consequence of this commit:
    
      commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
      Date:   Thu Jul 18 17:20:04 2019 +0100
    
          Fix for using named pipes on Windows
    
    Prior to this commit, the new-ui command would open the tty three
    times, once each for stdin, stderr, and stdout.  After this commit we
    open the tty just once and reuse the FILE object for all three roles.
    
    Consider the problem case, where glibc has (unexpectedly) read the
    second command into its internal buffer.  When we handle the first
    command we usually end up having to write something to the mi output
    stream.
    
    After the above commit the same FILE object represents both the input
    and output streams, so, when gdb tries to write to the FILE object,
    glibc spots that there is input pending within the input buffer, and
    so assumes that we have read ahead of where we should be in the input
    file.  To correct for this glibc tries to do an lseek call to
    reposition the file offset of the output stream prior to writing to
    it.  However, as the output stream is a tty, and seeking is not
    supported on a tty, this lseek call fails, this results in the ESPIPE,
    which ultimately causes gdb to terminate.
    
    So, now we understand why the ESPIPE triggers (which was what caused
    the gdb crash in the original bug report), and we also understand that
    sometimes gdb will not handle the second command in a timely
    fashion (if the first command is just the wrong length). So, what to
    do about all this?
    
    We could revert the commit mentioned above (and implement its
    functionality another way).  This would certainly resolve the ESPIPE
    issue, the buffered input would now only be on the input stream, the
    output stream would have no buffered input, and so glibc would never
    try to lseek, and so we'd never get the ESPIPE error.
    
    However, this only solves one of the two problems.  We would still
    suffer from the problem where, if the first command is just the wrong
    length, the second command will not (immediately) get handled.
    
    The only solution I can see to this problem is to unbuffer the input
    stream.  If glibc is not buffering the input, but instead, we read
    incoming data character by character from the kernel, then everything
    will be fine.  As soon as we see the newline at the end of the first
    command we will handle the first command.  As glibc will have no
    buffered input it will not be tempted to lseek, so no ESPIPE error.
    When we go back to the event loop there will be more data pending in
    the kernel, so the select will immediately return, and the second
    command will be processed.
    
    I'm tempted to suggest that we should move the unbuffering of the
    input stream out of gdb_readline_no_editing_callback and do it
    somewhere earlier, more like when we create the input streams.
    However, I've not done that in this commit for a couple of reasons:
    
      1. By keeping the unbuffering in gdb_readline_no_editing_callback
      I'm making the smallest possible change that fixes the bug.  Moving
      the unbuffering somewhere better can be done as a refactor later, if
      that's felt to be important,
    
      2. I don't think making repeated calls to unbuffer the input will
      have that much performance impact.  We only make the unbuffer call
      once per call to gdb_readline_no_editing_callback, and, if the input
      stream is already unbuffered we'll return pretty quickly, so I don't
      see this as being massively costly,
    
      3. Tom is currently doing lots of gdb stream management changes and
      I want to minimise the chances we'll conflict.
    
    So, this commit just changes gdb_readline_no_editing_callback to
    always unbuffer the input stream.
    
    The test for this issue sends two commands in a loop, with the first
    command growing bigger each time around the loop.  I actually make the
    first command bigger by just adding whitespace to the front, as gdb
    still has to read the complete command (including whitespace) via
    glibc, so this is enough to trigger the bug.
    
    The original bug was reported when using a virtual machine, and in
    this situation we see this in the strace output:
    
      read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
      read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
    
    I'm not completely sure what's going on here, but it appears that the
    kernel on the virtual machine is delivering the input to glibc slower
    than I see on my real hardware; glibc asks for 1024 bytes, but only
    gets 64 bytes the first time.  In the second read we see the problem
    case, the first character is the newline, but then the entire second
    command is included.
    
    If I run this exact example on my real hardware then the first command
    would not be truncated at 64 bytes, instead, I'd expect to see the
    newline included in the first read, with the second command split into
    a second read.
    
    So, for testing, I check cases where the first command is just a few
    characters (starting at 8 characters), all the way up to 2048
    characters.  Hopefully, this should mean we hit the problem case for
    most machine setups.
    
    The last question relates to commit afe09f0b6311a4d that I
    mentioned earlier.  That commit was intended to provide support for
    Microsoft named pipes:
    
      https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
    
    I know next to nothing about this topic beyond a brief scan of the
    above link, but I think these Windows named pipes are closer in
    behaviour to unix sockets than to unix named fifos.
    
    I am a little nervous that, after the above commit, we now use the
    same FILE for in, err, and out streams.  In contrast, in a vanilla C
    program, I would expect different FILE objects for each stream.
    
    Still, I'm reluctant to revert the above commit (and provide the same
    functionality a different way) without a specific bug to point at,
    and, now that the streams are unbuffered, I expect a lot of the read
    and write calls are going straight to the kernel with minimal glibc
    involvement, so maybe it doesn't really matter.  Anyway, I haven't
    touched the above patch, but it is something to keep in mind when
    working in this area.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711

Comment 10 Andrew Burgess 2022-02-08 10:16:34 UTC

I think this issue should now be resolved.  Feel free to reopen the bug if you are still seeing this problem.

Comment 11 Cristian Lupascu 2022-02-09 19:48:02 UTC

Thank you! I've tested with eclipse and I can't reproduce the issue anymore.

Comment 12 Jonah Graham 2022-10-31 16:48:58 UTC

@Andrew,

I am concerned that the fix for this issue has caused a regression on Windows that breaks Eclipse CDT in a new way. I don't have a setup to build GDB for Windows, but the user reported trying the ARM pre-built one and I can reproduce the problem with it. https://github.com/eclipse-embed-cdt/eclipse-plugins/issues/546

Here is a little snippet that demonstrates the problem:

$ cat input
1234-list-thread-gro
5678-list-thread-gro
9012-list-thread-gro

$ cat input | ./arm-none-eabi-gdb --silent --interpreter=mi2
=thread-group-added,id="i1"
(gdb)
1234^error,msg="Undefined MI command: list-thread-gro",code="undefined-command"
(gdb)
678^error,msg="Undefined MI command: list-thread-gro",code="undefined-command"
(gdb)
9012^error,msg="Undefined MI command: list-thread-gro",code="undefined-command"
(gdb)

As you can see above, the 5 (in 5678) is missing. The command (-list-thread-gro) is incorrect on purpose, as the length of the command and the newline of the input file both seem to affect whether or not this problem is seen.

This is also printed at the end of the run, but I think that is because of EOF on input:
&"warning: Exception condition detected on fd 0\n"
&"error detected on stdin\n"

Comment 13 Jonah Graham 2022-10-31 16:54:17 UTC

(In reply to Jonah Graham from comment #12)
> @Andrew,
> 
> I am concerned that the fix for this issue has caused a regression on

I have just seen that ac16b09d7e5fd0013ffa27e4d0531c0af12a529a seems to be the fix for this part, and that the GDB from ARM, even though released recently, was built from an old commit that didn't include ac16b09d7e5fd0013ffa27e4d0531c0af12a529a. Sorry for the noise.