This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Console codepage setting via chcp?


On Sep 25 06:40, Andy Koppe wrote:
> 2009/9/24 Corinna Vinschen:
> > Note that this affects all strings used in Cygwin internally, not only
> > filenames.  User and group names, environment strings, ...
> 
> I hadn't thought of that, but yep, it's the logical conclusion. I
> think the charset specified via setlocale(LC_CTYPE,...) should only
> affect what's specified in POSIX, i.e. the ctype stuff itself,
> multibyte conversion functions, wchar I/O, and anything I'm
> forgetting.

The problem is that some of these strings are fetched only once, when
the first Cygwin process starts.  The user name, for instance, which is
inherited by child processes.  The CWD is also typically only fetched
once, when the first process starts up or when a process calls chdir.
Then it's inherited by child processes as is.  Right now setlocale
will change the CWD according to the locale settings, but this seems
wrong.

> > If an application switches to another locale, all the names internally
> > stored are not switched as well.  So they are potentially wrong after
> > a setlocale.
> 
> The important thing is that file names, user names, and env variables
> are represented by the same byte sequences throughout the life of a
> program. Determining their translation at program startup ensures
> that.

But that's not the only important point.  If two applications having
different locales talk to each other, they have different ideas of the
filenames.  Even their CWD could look different, even though it's
actually the same.
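
To illustrate the point, here is a minimal sketch in Python (not Cygwin
code).  It assumes a hypothetical helper, to_multibyte, standing in for
whatever conversion a process applies to a native UTF-16 name, and a
made-up name "Müller".  Two processes converting the same native name
under different charsets end up with different byte strings:

```python
# Hypothetical sketch: one native name, converted by two processes that
# run with different charsets.  "Müller" and to_multibyte are assumptions
# made for illustration only.
native_name = "Müller"  # stands in for the UTF-16 name stored by Windows


def to_multibyte(name: str, charset: str) -> bytes:
    """Return the byte string a process using `charset` would work with."""
    return name.encode(charset)


bytes_utf8 = to_multibyte(native_name, "utf-8")
bytes_latin1 = to_multibyte(native_name, "iso-8859-1")

print(bytes_utf8)                  # b'M\xc3\xbcller'
print(bytes_latin1)                # b'M\xfcller'
print(bytes_utf8 == bytes_latin1)  # False: same object, different names
```

If one process hands its byte string to the other, the receiver cannot
match it against its own idea of the name.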

> Now, if an application calls setlocale with a charset other than
> what's set in the environment and it interprets the byte sequences
> according to that charset, then that's its own responsibility. This
> would go awry on Linux too, e.g. if a system is set up with
> LANG=en.ISO-8859-1, apps shouldn't try to display filenames as UTF-8.

Yeah, but it's only a problem displaying data on the screen.  How bad is
that?  As far as the actual filenames/usernames/wossnames are concerned,
the applications all see the same thing, since the strings are stored as
byte streams, not as UTF-16 values which are subject to different
interpretations in different multibyte environments.
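
A rough Python sketch of that situation, assuming a made-up name stored
as ISO-8859-1 bytes: both applications hold the identical byte string,
and only the decoding done for display differs.

```python
# Hypothetical sketch: the name is stored once as bytes; two applications
# merely decode it differently when drawing it on the screen.
stored = "Müller".encode("iso-8859-1")  # the shared byte string, b'M\xfcller'

# Application A displays with the charset the bytes were created in.
shown_by_a = stored.decode("iso-8859-1")

# Application B wrongly assumes UTF-8; the lone 0xFC byte is not valid
# UTF-8, so it is shown as the replacement character U+FFFD.
shown_by_b = stored.decode("utf-8", errors="replace")

print(shown_by_a)            # Müller
print(repr(shown_by_b))      # 'M\ufffdller'
print(stored == b"M\xfcller")  # True: the stored name itself is unchanged
```

The display in B is garbled, but any lookup done with the stored bytes
still refers to the same object in both applications.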

Given that, it seems to me that the best approach is to stick to one
single representation of system object names throughout the lifetime
of a process tree.
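
One way to picture this, as a sketch only: the names are converted once
when the first process of the tree starts, and every child inherits the
resulting bytes regardless of its own locale.  FrozenNames, startup and
spawn_child are hypothetical names invented for this illustration; they
do not describe how Cygwin implements anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenNames:
    """Hypothetical holder for names converted once at tree startup."""
    charset: str
    names: dict


def startup(native: dict, charset: str) -> FrozenNames:
    # Convert every native name exactly once, using the startup charset.
    converted = {key: value.encode(charset) for key, value in native.items()}
    return FrozenNames(charset, converted)


def spawn_child(parent: FrozenNames, child_locale_charset: str) -> FrozenNames:
    # The child inherits the parent's bytes as is.  Its own locale charset
    # is deliberately ignored for system object names.
    return parent


root = startup({"user": "Jörg", "cwd": "/home/Jörg"}, "utf-8")
child = spawn_child(root, "iso-8859-1")

print(child.names["user"])        # b'J\xc3\xb6rg'
print(child.names == root.names)  # True
```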

In contrast, the console charset is a rather minor problem and it could
even be fun to change that at will.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

