This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: 16-bit wchar_t on Windows and Cygwin

On 2 February 2011 16:35, Corinna Vinschen wrote:
> On Feb 2 17:28, Corinna Vinschen wrote:
>> On Feb 2 17:02, Bruno Haible wrote:
>> > But if you say that the application should convert UTF-16 surrogates
>> > to UTF-32 before calling iswalpha: That's certainly a requirement
>> > for Cygwin 1.7.x applications that want to support the entire Unicode
>> > character set. But it's outside of POSIX, and many GNU programs will
>> > not want to include this added complexity. Just try to apply this
>> > suggestion to gnulib's quotearg.c, then estimate the time someone
>> > would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
>> > coreutils/src/wc.c, and so on.
>> Cygwin's regcomp is taken from FreeBSD and is UTF-16 capable, including
>> surrogate handling. It only required two changes in the code.
> Btw., I would be sure glad if Cygwin would use a wchar_t of 4 bytes as
> well. The problem is that this requires too many changes at once to
> work right, and it would introduce a lot of backward compatibility
> problems which would have to be handled.

Cygwin 1.7 might have been a good point for that change, because the
lack of proper locale and charset support in previous versions meant
that backward compatibility was much less of a concern than it is now.
But it's a difficult change indeed, and it's not entirely clear that
it's worthwhile. I guess 64-bit Cygwin (if or when it happens) might
be the next opportunity.

> If only the ones who decided that wchar_t in Cygwin should have the
> same size as WCHAR_T in the underlying Windows would have thought twice
> about the implications...

Windows Unicode support was introduced with Windows NT in 1993,
whereas Unicode was only extended beyond 16 bits with version 2.0 in
1996. Cygwin was first released in 1995, the year before. If the Unicode
extension was a consideration at all (which I'd doubt), wchar_t !=
WCHAR probably seemed far more daunting than having to deal with
surrogates at some point down the line.
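
For anyone wondering what "dealing with surrogates" amounts to in
application code, here is a minimal sketch in C of the UTF-16 to UTF-32
step discussed in the quoted text. The helper name utf16_decode and its
interface are made up for illustration; this is not taken from Cygwin,
newlib or gnulib. It assumes the input is an array of 16-bit code units
and simply passes unpaired surrogates through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Cygwin or gnulib API): decode one code point
   from the UTF-16 code units s[0..n-1] and store it in *cp.
   Returns the number of code units consumed: 2 for a valid surrogate pair,
   1 for anything else, 0 if the input is empty.
   Assumption: an unpaired surrogate is returned as-is rather than rejected. */
static size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    uint16_t hi = s[0];

    /* A high surrogate (0xD800..0xDBFF) followed by a low surrogate
       (0xDC00..0xDFFF) encodes a code point at or above U+10000. */
    if (hi >= 0xD800 && hi <= 0xDBFF && n >= 2) {
        uint16_t lo = s[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            *cp = 0x10000
                  + (((uint32_t)(hi - 0xD800) << 10)
                     | (uint32_t)(lo - 0xDC00));
            return 2;
        }
    }

    /* BMP character or unpaired surrogate: one unit, value unchanged. */
    *cp = hi;
    return 1;
}
```

The arithmetic itself is small. The cost Bruno describes comes from
having to thread a step like this through every loop that currently
treats one wchar_t as one character.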

