[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Johannes Schindelin Johannes.Schindelin@gmx.de
Mon Sep 7 21:08:03 GMT 2020


Hi,

On Mon, 7 Sep 2020, Takashi Yano via Cygwin-patches wrote:

> On Mon, 7 Sep 2020 10:26:33 +0200
> Corinna Vinschen wrote:
> > Hi Takashi,
> > On Sep  5 17:43, Takashi Yano via Cygwin-patches wrote:
> > > On Fri, 4 Sep 2020 21:22:35 +0200
> > > Corinna Vinschen wrote:
> > >
> > > > Btw., the main loop in
> > > > fhandler_pty_master::pty_master_fwd_thread() calls
> > > >
> > > >   char *buf = convert_mb_str (cygheap->locale.term_code_page,
> > > >                               &nlen, CP_UTF8, ptr, wlen);
> > > >                                      ^^^^^^^
> > > >   [...]
> > > >   WriteFile (to_master_cyg, ...
> > > >
> > > > But then, after the code breaks from that loop, it calls
> > > >
> > > >   char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
> > > >                               GetConsoleOutputCP (), ptr, wlen);
> > > >                               ^^^^^^^^^^^^^^^^^^^^^
> > > >   [...]
> > > >   process_opost_output (to_master_cyg, ...
> > > >
> > > > process_opost_output then calls WriteFile on that to_master_cyg handle,
> > > > just like the WriteFile call above.
> > > >
> > > > Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
> > > > well?
> > >
> > > That is correct. The first conversion is for the case that pseudo
> > > console is enabled, and the second one is for the case that pseudo
> > > console is disabled.
> > >
> > > Pseudo console converts the charset from the console code page to
> > > UTF-8. Therefore, data read from from_slave is always UTF-8 when
> > > pseudo console is enabled. Moreover, OPOST processing is done in
> > > pseudo console, so simply writing the data with WriteFile() is enough.
> > >
> > > If pseudo console is disabled, cmd.exe and similar programs use the
> > > console code page, so the code page of data read from from_slave is
> > > GetConsoleOutputCP(). In this case, OPOST processing is necessary.
> >
> > This is really confusing me.  We never set the console codepage in the
> > old pty code before, it was just pipes transmitting bytes.  Why do we
> > suddenly have to handle native apps running in a console in this case?!?
>
> This is actually not related to pseudo console. In a Japanese environment,
> cmd.exe outputs CP932 strings by default. This caused garbled output in old
> Cygwin versions such as 3.0.7. The code for the case where pseudo console
> is disabled is there to fix this.

It is related to Pseudo Console insofar as it was slipped in as part of
the Pseudo Console patches.

And what Takashi reports as a bug fix is the underlying reason for the
tickets in MSYS2 (and elsewhere) that I mentioned.

In fact, I even suggested in
https://github.com/msys2/MSYS2-packages/issues/1974#issuecomment-685475967
to revert that change.

What Takashi describes as "correct behavior" unfortunately seems not to be
very common in practice, which is why I contend that from the users' point
of view, it could not matter less whether the console applications are
"correct" or not. From the point of view of users who have their `LANG`
set to something like `en_US.UTF-8`, the encoding was correct before, and
now it is no longer correct. And _that_ is the correctness users actually
care about.
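
For reference, the rule quoted above for pty_master_fwd_thread() can be
sketched as follows. This is only an illustration under assumptions: the
names `forwarding_plan` and `plan_forwarding` are hypothetical and do not
appear in the Cygwin sources, and 65001 is the numeric value of CP_UTF8 in
the Windows headers. It does not perform any conversion itself; it only
shows which source code page is assumed and whether OPOST processing is
still needed.

```cpp
#include <cstdint>

// Numeric value of CP_UTF8 in the Windows headers.
constexpr std::uint32_t kCpUtf8 = 65001;

// Hypothetical summary of how data read from from_slave is treated.
struct forwarding_plan {
  std::uint32_t source_code_page;  // assumed code page of the incoming bytes
  bool needs_opost;                // true if OPOST processing is still required
};

// Hypothetical helper mirroring the quoted explanation:
// - pseudo console enabled:  data is already UTF-8, OPOST already done
// - pseudo console disabled: data is in the console output code page,
//                            OPOST processing is still necessary
constexpr forwarding_plan plan_forwarding (bool pcon_enabled,
                                           std::uint32_t console_output_cp)
{
  return pcon_enabled ? forwarding_plan{kCpUtf8, false}
                      : forwarding_plan{console_output_cp, true};
}
```

With pseudo console disabled and a console output code page of 932, this
yields a source code page of 932 with OPOST processing required; with pseudo
console enabled, it yields 65001 with no further OPOST processing.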

Ciao,
Dscho

