cygwin_conv_path POSIX->WIN_A conversion

Andy Koppe andy.koppe@gmail.com
Fri Nov 18 10:57:00 GMT 2011


On 14 November 2011 10:33, Corinna Vinschen wrote:
> On Nov 11 09:27, Eric Blake wrote:
>> On 11/11/2011 07:46 AM, Corinna Vinschen wrote:
>> > So I was wondering if the CCP_POSIX_TO_WIN_A function shouldn't be
>> > changed so that it converts the pathname to the current ANSI or OEM
>> > charset instead, depending on the value returned by the AreFileApisANSI
>> > function.
>>
>> Yes, that sounds right to me,
>>
>> >
>> > I think this would be more correct than converting to the current Cygwin
>> > multibyte charset.  The downside is, that this *might* break backward
>> > compatibility.  However, if an application converts a Cygwin POSIX path
>> > to a native Windows multibyte path, isn't it always for the sake of
>> > calling a Win32 ANSI function or to submit the path to a native Windows
>> > application?
>>
>> Precisely for this reason - the only sane reason to convert to native is
>> to use the resulting string in native calls.
>
> I'm just worried that this would open a can of worms.
>
> If CCP_POSIX_TO_WIN_A always converts to ANSI/OEM, shouldn't
> CCP_WIN_A_TO_POSIX always convert from ANSI/OEM?

Yes, I think so.

> However, if the DOS
> path has been entered on the Cygwin command line, it will very likely
> not be given in the current ANSI/OEM CP, but rather in the Cygwin
> charset.

A program that assumes something other than the Cygwin charset for
command line arguments is buggy.

Having said that, I assume the concern here is about pre-1.7 programs,
where assuming the ANSI/OEM codepage for command line arguments would
have been reasonable. However, such programs won't actually be using
CCP_POSIX_TO_WIN_A and CCP_WIN_A_TO_POSIX, since those were only
introduced with 1.7. Instead, they'll be using the deprecated
cygwin_conv_to_posix_path() and its relatives.

I understand those currently do the same as their cygwin_conv_path_t
equivalents, but it doesn't have to stay that way. So how about the
legacy functions keep their current behaviour in an attempt to
maximise backward compatibility, whereas CCP_POSIX_TO_WIN_A and
CCP_WIN_A_TO_POSIX are changed to do what their names say they do?


> Having said that, I'm wondering if we shouldn't leave the current
> conversion alone and rather add new flags to cygwin_conv_path, so that
> the *caller* can specify whether the conversion should be done using the
> Cygwin or the Windows multibyte charset, or always UTF-8.  Something
> along these lines:
>
>  CCP_CYGWIN_CODESET = 0,       <-- Do you have a better idea?
>  CCP_WIN32_ANSI_CP  = 0x10,
>  CCP_WIN32_OEM_CP   = 0x20,
>  CCP_UTF8_CODESET   = 0x30,

It's a possibility, but I find it a bit confusing and unnecessary.
Windows paths can already be converted to/from any required codeset by
going via the wide (i.e. WIN_W) version of the path and converting
with the appropriate choice of
MultiByteToWideChar/WideCharToMultiByte/mbstowcs/wcstombs.
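
To illustrate the wcstombs route, here is a minimal sketch rather than a
definitive implementation. It assumes the wide path has already been
obtained, e.g. from cygwin_conv_path with CCP_POSIX_TO_WIN_W. The helper
name wide_path_to_multibyte and its locale_name parameter are
hypothetical, made up for this example. It only uses standard C, so it
covers codesets reachable through setlocale; targeting a specific Windows
codepage would use WideCharToMultiByte instead.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a wide (WIN_W style) path to a multibyte
 * string in the codeset of the locale named by locale_name.
 * Returns a malloc'ed string that the caller must free, or NULL on failure.
 */
char *wide_path_to_multibyte(const wchar_t *wide, const char *locale_name)
{
    assert(wide != NULL && locale_name != NULL);

    /* Copy the current LC_CTYPE name; later setlocale calls may
       overwrite the string that setlocale returns. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return NULL;
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, current);

    char *result = NULL;
    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        /* Worst case: every wide character needs MB_CUR_MAX bytes. */
        size_t cap = wcslen(wide) * MB_CUR_MAX + 1;
        result = malloc(cap);
        if (result != NULL && wcstombs(result, wide, cap) == (size_t)-1) {
            /* A character cannot be represented in the target codeset. */
            free(result);
            result = NULL;
        }
    }

    /* Restore the caller's locale before returning. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

A caller would pass the wide path together with a locale name such as
"C" or one naming the desired codeset, and free the result afterwards.
Switching the locale temporarily is not thread-safe, so this is only
meant to show the shape of the conversion.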

Andy


