This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?

On Mar 19 21:11, Corinna Vinschen wrote:
> On Mar 19 19:41, Eric Blake wrote:
> > Corinna Vinschen <corinna-cygwin <at>> writes:
> > > ...unless Cygwin itself would call setlocale().
> > 
> > I'm not a fan of that.  POSIX is explicit that an application that 
> > intentionally avoids calling setlocale() shall behave as though it had called 
> > setlocale(LC_ALL,"C").
> > [...]
> But I admit that I'm not very happy with this idea either.  Still, we
> have to convert from MB to WC and vice-versa independently of the
> application, while other systems based on byte charsets simply don't
> have this problem.

Here's another idea:

If the codeset is not UTF-8, and if a filename contains wide chars not
representable in the current ANSI codeset, use the good old ASCII "SO/SI"
sequences to represent them.

Example:  Assume the ANSI codepage is CP1252 and the filename, in
UTF-16, contains the character \x1234.

All chars except for \x1234 are convertible to the current ANSI
codepage.  The convertible chars are converted as usual.  The
non-convertible characters are converted to an ASCII SO/SI sequence.
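
As a rough illustration of the encoding direction, here is a minimal
Python sketch.  The escape layout (each UTF-16 code unit written as four
uppercase hex digits between SO, 0x0E, and SI, 0x0F) and the function
name encode_filename are assumptions made for this example, not
something specified in this mail, and Python's cp1252 codec merely
stands in for the Windows conversion routines.

```python
SO = 0x0E  # ASCII Shift Out
SI = 0x0F  # ASCII Shift In


def encode_filename(name: str, codepage: str = "cp1252") -> bytes:
    """Convert a filename to `codepage`, escaping unconvertible chars.

    Hypothetical escape format: SO, then the char's UTF-16 code units
    as uppercase hex digits, then SI.
    """
    out = bytearray()
    for ch in name:
        try:
            # Convertible chars are converted as usual.
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # Not representable in the codepage: emit an SO/SI sequence.
            units = ch.encode("utf-16-be")
            out.append(SO)
            out += units.hex().upper().encode("ascii")
            out.append(SI)
    return bytes(out)


print(encode_filename("abc\u1234def"))  # b'abc\x0e1234\x0fdef'
```

A character outside the BMP would come out as two code units (eight hex
digits) inside a single SO/SI pair under this assumed layout.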


On the way back, Cygwin converts SO/SI sequences back to their
UTF-16 counterparts and converts everything else using the current
codepage-to-UTF-16 conversion.
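
The way back could be sketched in Python as follows.  This is only an
illustration under assumptions: the bytes between SO (0x0E) and SI
(0x0F) are taken to be the hex digits of the original UTF-16 code
units, the name decode_filename is hypothetical, and Python's cp1252
codec stands in for the Windows conversion routines.

```python
SO = 0x0E  # ASCII Shift Out
SI = 0x0F  # ASCII Shift In


def decode_filename(raw: bytes, codepage: str = "cp1252") -> str:
    """Turn codepage bytes with SO/SI escapes back into a Unicode string.

    Hypothetical escape format: SO, hex digits of UTF-16 code units, SI.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == SO:
            # Escaped run: find the closing SI (ValueError if missing).
            end = raw.index(SI, i + 1)
            units = bytes.fromhex(raw[i + 1:end].decode("ascii"))
            out.append(units.decode("utf-16-be"))
            i = end + 1
        else:
            # Plain run: everything up to the next SO uses the codepage.
            nxt = raw.find(SO, i)
            if nxt == -1:
                nxt = len(raw)
            out.append(raw[i:nxt].decode(codepage))
            i = nxt
    return "".join(out)


name = decode_filename(b"abc\x0e1234\x0fdef")
print(name == "abc\u1234def")  # True
```

Calling name.encode("utf-16-le") on the result would give the byte
layout Windows uses for wide-character filenames.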

This would allow manipulating all files on the disk, even those whose
names contain characters invalid in the current CP.

Does that solution make sense?


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
