New implementation of pseudo console support (experimental)

Johannes Schindelin Johannes.Schindelin@gmx.de
Tue Sep 1 04:46:53 GMT 2020


Hi Corinna,

On Mon, 31 Aug 2020, Corinna Vinschen wrote:

> On Aug 31 21:17, Johannes Schindelin wrote:
> > [...]
> > So I had a look at the code, and it seems that
> > `fhandler_pty_slave::setup_locale()` forces the output encoding to
> > C.ASCII if Pseudo Console support is enabled:
> >
> >   char locale[ENCODING_LEN + 1] = "C";
> >   char charset[ENCODING_LEN + 1] = "ASCII";
> >   LCID lcid = get_langinfo (locale, charset);
> >
> >   /* Set console code page from locale */
> >   if (get_pseudo_console ())
> >     {
> >       UINT code_page;
> >       if (lcid == 0 || lcid == (LCID) -1)
> >         code_page = 20127; /* ASCII */
>
> This looks wrong, actually.  The default behaviour of Cygwin since
> Cygwin 1.7 was to assume UTF-8, even if the application doesn't call
> setlocale.  This means the locale is "C", so ASCII is expected.
> However, even in this case, the internal conversions use UTF-8.
> See function internal_setlocale() in nlsfuncs.cc, lines 1553/1554.
>
> We never switched the console codepage, though, because the codepage
> doesn't make much sense when using wide character functions only,
> i. e. WriteConsoleW.  Only the alternate charset is 437/ASCII.  So,
> if the pseudo console actually *requires* to set the charset...

Well, it is worse, as I have reported elsewhere in this thread. For some
reason (which has not been explained yet, and which I am still very much
interested in knowing), the Console output code page is _still_ used
in `disable_pcon`.

That smells completely wrong. Why would the actual Console output encoding
be involved when Pseudo Console support is disabled, when it was not at
all used in v3.0.7 (which is supposedly using the same code paths that
`disable_pcon` is still expected to use)?

>
>
> >       else if (!GetLocaleInfo (lcid,
> >                                LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
> >                                (char *) &code_page, sizeof (code_page)))
> >         code_page = 20127; /* ASCII */
> >       SetConsoleCP (code_page);
> >       SetConsoleOutputCP (code_page);
>
> can we please default to UTF-8 here even if the code page is ASCII?

Yes, please. In fact, I am tempted to do this:

-- snip --
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 43eebc174..65b4d45fa 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -2867,7 +2867,16 @@ fhandler_pty_slave::setup_locale (void)
   char charset[ENCODING_LEN + 1] = "ASCII";
   LCID lcid = get_langinfo (locale, charset);

-  /* Set console code page form locale */
+  /* Special-case the UTF-8 character set */
+  if (strcasecmp (charset, "UTF-8") == 0)
+    {
+      get_ttyp ()->term_code_page = CP_UTF8;
+      SetConsoleCP (CP_UTF8);
+      SetConsoleOutputCP (CP_UTF8);
+      return;
+    }
+
+  /* Set console code page from locale */
   if (get_pseudo_console ())
     {
       UINT code_page;
-- snap --
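
To make the intended selection logic easier to discuss, here is a minimal,
portable sketch of it in plain C. It is not Cygwin code: the function name
`sketch_pick_code_page` and the `SKETCH_CP_*` constants are made up for
illustration, and the `locale_cp` parameter merely stands in for whatever the
locale lookup produced (0 meaning the lookup failed). The first branch mirrors
the special case in the diff above; the second branch is an assumption that
models the request to default to UTF-8 instead of ASCII and is not part of the
diff.

```c
#include <strings.h>  /* strcasecmp (POSIX) */

/* Code page numbers as used by Windows: 20127 is US-ASCII, 65001 is UTF-8. */
enum { SKETCH_CP_ASCII = 20127, SKETCH_CP_UTF8 = 65001 };

/* Hypothetical helper, not part of Cygwin: choose a console code page.
   charset   - charset name reported for the locale, e.g. "UTF-8" or "ASCII"
   locale_cp - code page derived from the locale, or 0 if the lookup failed */
unsigned
sketch_pick_code_page (const char *charset, unsigned locale_cp)
{
  /* Special-case the UTF-8 character set, as in the diff. */
  if (strcasecmp (charset, "UTF-8") == 0)
    return SKETCH_CP_UTF8;

  /* Assumption: prefer UTF-8 over ASCII when nothing better is known. */
  if (locale_cp == 0 || locale_cp == SKETCH_CP_ASCII)
    return SKETCH_CP_UTF8;

  /* Otherwise keep the code page that the locale asked for. */
  return locale_cp;
}
```

With this sketch, `sketch_pick_code_page ("ISO-8859-1", 1252)` keeps 1252,
while a failed lookup or a plain "C" locale ends up at 65001 rather than
20127.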

The main reason why I am hesitating is that I smell a bigger problem here:
the mere fact that a code path that is supposed not to use Console
functions at all (`disable_pcon`) _does_ respect the output code page
indicates to me that that code path was changed in a totally unintended
way.

Ciao,
Johannes

