[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
Fri Sep 4 14:05:13 GMT 2020


On 2020-09-04 06:44, Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>>
>> On Thu, 3 Sep 2020 19:59:12 +0200
>> Corinna Vinschen wrote:
>>> The only idea I had so far was, changing the way __set_charset_from_locale
>>> works from within _setlocale_r:
>>>
>>> We could add a Cygwin-specific function only fetching the codepage and
>>> call it unconditionally from _setlocale_r.  __set_charset_from_locale is
>>> called with a new parameter "codepage", so it doesn't have to fetch the
>>> CP by itself, but it's still only called from _setlocale_r if necessary.
>>>
>>> Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
>>> could be done at the point the codepage is actually required.
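A minimal sketch of the deferral idea quoted above. It assumes the stored value is a plain Windows codepage number that is recorded unchanged. Only the consumer that actually needs a console codepage promotes 20127 (US-ASCII, the "C" locale) to 65001 (UTF-8). The helper name `codepage_for_pty` and the enum names are hypothetical, not Cygwin APIs.

```c
/* Windows codepage identifiers mentioned in this thread
   (names are illustrative, not the <windows.h> macros). */
enum {
    CP_US_ASCII_ID = 20127, /* what the "C" locale corresponds to */
    CP_UTF_8_ID    = 65001
};

/* Hypothetical helper, called only where a codepage is really needed.
   Cygwin defaults to UTF-8 internally, so plain ASCII is promoted to
   UTF-8 here rather than inside _setlocale_r.  All other codepages
   pass through unchanged. */
unsigned int codepage_for_pty(unsigned int stored_cp)
{
    return stored_cp == CP_US_ASCII_ID ? CP_UTF_8_ID : stored_cp;
}
```

The point of this design is that _setlocale_r only records what it sees. The ASCII-to-UTF-8 policy lives in one place, next to the code that needs it.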
>>
>> I think I have found the answer to your request.
>> Patch attached. What do you think of this patch?
>>
>> Calling initial_setlocale() is necessary because
>> nl_langinfo() always returns "ANSI_X3.4-1968"
>> regardless of the locale setting if this is not called.
> 
> Well, this is correct.  Per POSIX, a standard-conformant application is
> running in the "C" locale unless it calls setlocale() explicitly.
> That's one reason Cygwin defaults to UTF-8 internally.  Everything,
> including the terminal, is supposed to default to UTF-8.  That's the
> most sane default, even if an application is not locale-aware.
> 
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the app's default locale to "C".  That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
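The startup order described above can be mimicked in portable C. This is a sketch and not Cygwin's actual code. It adopts the locale from the environment, remembers its charset, and then resets LC_CTYPE to "C" the way the runtime does before main(). `startup_charset` and the three functions are hypothetical names. glibc, like Cygwin, reports "ANSI_X3.4-1968" for the "C" locale. Other libcs may use a different name.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical stand-in for charset state a runtime keeps internally. */
static char startup_charset[64];

/* Mimics the order described above (not Cygwin's implementation):
   1. adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG),
   2. remember its charset before it is lost,
   3. reset LC_CTYPE to "C", since a POSIX app starts in the "C" locale. */
void simulate_startup(void)
{
    /* If the environment names an unknown locale, setlocale returns
       NULL and leaves the current ("C") locale in place. */
    setlocale(LC_CTYPE, "");
    snprintf(startup_charset, sizeof startup_charset, "%s",
             nl_langinfo(CODESET));
    setlocale(LC_CTYPE, "C");
}

/* Charset of the environment's locale, captured in step 2. */
const char *remembered_charset(void) { return startup_charset; }

/* What the application itself sees after step 3. */
const char *current_charset(void) { return nl_langinfo(CODESET); }
```

Suppose the program runs with LANG=ja_JP.UTF-8 and that locale is installed. remembered_charset() then returns "UTF-8", while current_charset() still returns "ANSI_X3.4-1968". That second value is the behaviour described above.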
> 
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions.  What it *doesn't* do yet at the moment is
> to store the charset name itself or, better, the equivalent codepage.
> 
> If we change that, setup_locale can simply go away.  Below is a patch
> doing just that.  Can you please check if that works in your test
> scenarios?
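A minimal sketch of "store the charset name or, better, the equivalent codepage" at startup, assuming a simple name-to-codepage lookup. The table is an illustrative subset, and every identifier here is hypothetical. The call in internal_setlocale() that would feed it is not shown. With this approach, a ja_JP.UTF-8 locale yields 65001 instead of whatever the ANSI codepage happens to be (e.g. 932).

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical global: codepage equivalent of the charset chosen at
   startup.  0 means "unknown / not set". */
static unsigned int startup_codepage;

/* Illustrative subset; a real table would cover every supported charset. */
static const struct {
    const char *charset;
    unsigned int codepage;
} cp_table[] = {
    { "UTF-8",          65001 },
    { "ASCII",          20127 },
    { "ANSI_X3.4-1968", 20127 },
    { "SJIS",             932 },
    { "ISO-8859-1",     28591 },
};

/* Look up the Windows codepage for a charset name; 0 if not listed. */
unsigned int charset_to_codepage(const char *charset)
{
    for (size_t i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++)
        if (strcasecmp(charset, cp_table[i].charset) == 0)
            return cp_table[i].codepage;
    return 0;
}

/* Call once while the startup locale is active, before the reset to "C". */
void remember_startup_charset(const char *charset)
{
    startup_codepage = charset_to_codepage(charset);
}

/* Later consumers (e.g. pty code) read the stored value. */
unsigned int get_startup_codepage(void) { return startup_codepage; }
```

Storing the number rather than the name lets later code skip any string handling. That is the reason the quoted paragraph prefers the codepage.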
> 
> However, there's something which worries me.  Why do we need to set the
> pseudo terminal codepage at all?  I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes.  Question is this: Why not just convert the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console?  That would simplify
> this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> the console, we simply don't need to know or care for the console CP.
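A rough illustration of the UTF-16 route suggested above, assuming the input is UTF-8. A hand-written decoder stands in for sys_mbstowcs, which is Cygwin-internal and not reproduced here. The decoder does only minimal validation: it does not reject overlong forms or encoded surrogates. Once the text is UTF-16, WriteConsoleW can write it, and the console codepage never enters the picture. `utf8_to_utf16` and `write_console_utf16` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Decode UTF-8 into UTF-16 code units.  Returns the number of units
   written, or (size_t)-1 on malformed/truncated input or if out is full. */
size_t utf8_to_utf16(const unsigned char *in, size_t len,
                     uint16_t *out, size_t outcap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t extra;                          /* continuation bytes */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;
        if (extra > len - i - 1) return (size_t)-1;  /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((in[i + k] & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF) return (size_t)-1;
        if (cp >= 0x10000) {                   /* encode as surrogate pair */
            if (outcap - n < 2) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (outcap - n < 1) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

#ifdef _WIN32
/* Write UTF-16 text to a console handle; the console CP is irrelevant. */
BOOL write_console_utf16(HANDLE h, const uint16_t *buf, size_t n)
{
    DWORD written;
    return WriteConsoleW(h, (const WCHAR *)buf, (DWORD)n, &written, NULL);
}
#endif
```

With this approach the sender only needs to know its own charset. The receiving side never has to agree with it on a multibyte codepage.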

IIRC his locale was ja_JP.UTF-8 but he got English messages on CP 932!

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]