[newlib-cygwin] Cygwin: console: Handle Unicode surrogate pairs.

Takashi Yano tyan0@sourceware.org
Tue Nov 16 14:23:00 GMT 2021


https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=782aac590af7f065877168848d5fbb20535bfcf9

commit 782aac590af7f065877168848d5fbb20535bfcf9
Author: Johannes Schindelin <johannes.schindelin@gmx.de>
Date:   Tue Nov 16 11:26:10 2021 +0100

    Cygwin: console: Handle Unicode surrogate pairs.
    
    When running Cygwin's Bash in the Windows Terminal (see
    https://docs.microsoft.com/en-us/windows/terminal/ for details), Cygwin
    receives keyboard input as a sequence of UTF-16 code units.
    
    A single 16-bit UTF-16 code unit cannot represent the full Unicode
    range. To make up for it, the ranges U+D800-U+DBFF (high surrogates)
    and U+DC00-U+DFFF (low surrogates) are reserved: they are illegal on
    their own and valid only as a high/low pair that encodes a Unicode
    character beyond U+FFFF.
    
    Cygwin does not handle such surrogate pairs correctly at the moment.
    This can be seen, e.g., when running Cygwin's Bash in the Windows
    Terminal and then inserting an emoji (e.g. via Windows + <dot>, which
    opens an emoji picker on recent Windows versions): instead of the
    emoji, the console shows the infamous question mark in a black
    diamond, i.e. the Unicode replacement character U+FFFD.
    
    Let's special-case surrogate pairs in this scenario.
    
    This fixes https://github.com/git-for-windows/git/issues/3281
    
    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Diff:
---
 winsup/cygwin/fhandler_console.cc | 17 ++++++++++++++++-
 winsup/cygwin/release/3.3.3       |  6 ++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
index 0501b36fa..f4241ee82 100644
--- a/winsup/cygwin/fhandler_console.cc
+++ b/winsup/cygwin/fhandler_console.cc
@@ -919,7 +919,22 @@ fhandler_console::process_input_message (void)
 	    }
 	  else
 	    {
-	      nread = con.con_to_str (tmp + 1, 59, unicode_char);
+	      WCHAR second = unicode_char >= 0xd800 && unicode_char <= 0xdbff
+		  && i + 1 < total_read ?
+		  input_rec[i + 1].Event.KeyEvent.uChar.UnicodeChar : 0;
+
+	      if (second < 0xdc00 || second > 0xdfff)
+		{
+		  nread = con.con_to_str (tmp + 1, 59, unicode_char);
+		}
+	      else
+		{
+		  /* handle surrogate pairs */
+		  WCHAR pair[2] = { unicode_char, second };
+		  nread = sys_wcstombs (tmp + 1, 59, pair, 2);
+		  i++;
+		}
+
 	      /* Determine if the keystroke is modified by META.  The tricky
 		 part is to distinguish whether the right Alt key should be
 		 recognized as Alt, or as AltGr. */
diff --git a/winsup/cygwin/release/3.3.3 b/winsup/cygwin/release/3.3.3
index 1eb25e2fc..c1e8cefbd 100644
--- a/winsup/cygwin/release/3.3.3
+++ b/winsup/cygwin/release/3.3.3
@@ -16,3 +16,9 @@ Bug Fixes
 - Fix long-standing problem that new files don't get created with the
   FILE_ATTRIBUTE_ARCHIVE DOS attribute set.
   Addresses: https://cygwin.com/pipermail/cygwin/2021-November/249909.html
+
+- Handle Unicode surrogate pairs in console. Cygwin console does not
+  handle surrogate pairs correctly at the moment.  Fix issue that
+  running bash in Windows Terminal and inserting an emoji does not
+  work as expected.
+  Addresses: https://github.com/git-for-windows/git/issues/3281

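For readers unfamiliar with surrogate pairs, the lookahead in the patch
can be illustrated outside of Cygwin with the standalone sketch below. It
is not the Cygwin implementation: every function name here is
hypothetical, the UTF-8 encoder is hand-written rather than
sys_wcstombs, and emitting U+FFFD for an unpaired surrogate is an
assumption of this sketch, not a statement about what con_to_str does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; none of these are Cygwin APIs.
static bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Combine a high/low surrogate pair into one code point beyond U+FFFF.
static char32_t combine_surrogates(char16_t hi, char16_t lo) {
  assert(is_high_surrogate(hi) && is_low_surrogate(lo));
  return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                 + (static_cast<char32_t>(lo) - 0xDC00);
}

// Encode one code point (assumed <= U+10FFFF) as UTF-8.
static std::string encode_utf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Convert UTF-16 code units to UTF-8 with one unit of lookahead. As in the
// patch, the second unit is only inspected when the first is a high
// surrogate and another unit is available (i + 1 < n).
std::string utf16_to_utf8(const char16_t *in, std::size_t n) {
  std::string out;
  for (std::size_t i = 0; i < n; i++) {
    char16_t c = in[i];
    if (is_high_surrogate(c) && i + 1 < n && is_low_surrogate(in[i + 1])) {
      out += encode_utf8(combine_surrogates(c, in[i + 1]));
      i++;  // the low surrogate has been consumed as well
    } else if (is_high_surrogate(c) || is_low_surrogate(c)) {
      out += encode_utf8(0xFFFD);  // assumption: replace unpaired surrogates
    } else {
      out += encode_utf8(c);
    }
  }
  return out;
}
```

With this sketch, the pair 0xD83D 0xDE00 (U+1F600) becomes the four
UTF-8 bytes F0 9F 98 80, whereas converting each code unit separately
cannot yield a valid character.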
