28800 – non-ascii character cannot display correctly in tui-mode's extended-prompt

Summary: non-ascii character cannot display correctly in tui-mode's extended-prompt

Status:	UNCONFIRMED

Alias:	None

Product:	gdb
Classification:	Unclassified
Component:	tui
Version:	11.1

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2022-01-20 16:03 UTC by wuzy01
Modified:	2023-05-26 13:26 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:	2022-01-27 00:00:00

Attachments
Tentative patch (843 bytes, patch) 2023-05-24 17:56 UTC, Tom de Vries

Description wuzy01 2022-01-20 16:03:26 UTC

## Steps to Reproduce

```
gdb -tui main
```

Then `set extended-prompt \w \f:\t\n❯ `

After each character is typed, the prompt string becomes garbled.

It looks like non-ASCII characters cannot be displayed correctly in TUI mode.

OS: 5.15.11

Comment 1 Andreas Schwab 2022-01-20 16:11:35 UTC

(gdb) set extended-prompt \w \f:\t\n❯ 
Python Exception <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\u276f' in position 10: ordinal not in range(128)

Comment 2 Andrew Burgess 2022-01-27 12:26:20 UTC

Andreas,

What you are seeing is a slightly different issue. I'm guessing you have gdb.prompt_hook set, which ends up calling gdbpy_before_prompt_hook in python.c.

If we assume Python 3 for a moment, then in this function we convert the prompt to a unicode object, assuming UTF-8 encoding.  This unicode object is then passed to the users python code.

If the user returns the same prompt unchanged, or even some other UTF-8 encoded prompt string, we then convert that string back to bytes using the host charset.

From the error message you see, it would appear that your host charset is 'ascii'; in any case it is almost certainly not UTF-8.

You could try 'set host-charset UTF8' and see whether that resolves the problem.

The asymmetry in our use of different Unicode encodings seems like a bad thing to me. I wonder if we should just settle on one particular scheme, perhaps UTF-8, for cases like this?

However, we should probably move this conversation to a separate bug, as this is different from the original Unicode-within-TUI bug.

Comment 3 Tom de Vries 2023-05-24 15:40:43 UTC

The prompt is printed by tui_puts_internal, which outputs every byte in the string individually.

As a demonstrator patch, I made tui_puts_internal behave more like tui_puts, that is, output entire strings rather than single bytes:
...
diff --git a/gdb/tui/tui-io.c b/gdb/tui/tui-io.c
index a1eadcd937d..5cc26f02174 100644
--- a/gdb/tui/tui-io.c
+++ b/gdb/tui/tui-io.c
@@ -521,8 +521,23 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
   int prev_col = 0;
   bool saw_nl = false;
 
-  while ((c = *string++) != 0)
+  while (true)
     {
+      const char *next = strpbrk (string, "\n\1\2\033\t");
+
+      /* Print the plain text prefix.  */
+      size_t n_chars = next == nullptr ? strlen (string) : next - string;
+      if (n_chars > 0)
+	waddnstr (w, string, n_chars);
+
+      /* We finished.  */
+      if (next == nullptr)
+	break;
+
+      c = *next;
+      if (c == 0)
+	break;
+      
       if (c == '\n')
 	saw_nl = true;
 
@@ -530,6 +545,7 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
 	{
 	  /* Ignore these, they are readline escape-marking
 	     sequences.  */
+	  ++next;
 	}
       else
 	{
@@ -538,10 +554,12 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
 	      size_t bytes_read = apply_ansi_escape (w, string - 1);
 	      if (bytes_read > 0)
 		{
-		  string = string + bytes_read - 1;
+		  next = next + bytes_read - 1;
 		  continue;
 		}
 	    }
+	  else
+	    next++;
 	  do_tui_putc (w, c);
 
 	  if (height != nullptr)
@@ -552,6 +570,8 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
 	      prev_col = col;
 	    }
 	}
+
+      string = next;
     }
   if (TUI_CMD_WIN != nullptr && w == TUI_CMD_WIN->handle.get ())
     update_cmdwin_start_line ();
...
I can see that the behaviour is now correct:
...
└────────────────────────────────────────────────────────────────┘
None No process In:                                   ??   PC: ?? 
/data/vries/gdb <no frame>:<no attribute num on current thread>
❯
...

I'm not sure yet whether this is a proper fix; I suspect a proper fix will involve accumulating bytes using mbrtowc or some such.

Comment 4 Tom de Vries 2023-05-24 17:56:26 UTC

Created attachment 14907
Tentative patch

Comment 5 Tom de Vries 2023-05-26 13:26:50 UTC

Submitted patch: https://sourceware.org/pipermail/gdb-patches/2023-May/199880.html