raise(-1) has stopped returning an error recently
Corinna Vinschen
corinna-cygwin@cygwin.com
Thu Nov 25 12:54:42 GMT 2021
On Nov 24 11:01, Brian Inglis via Cygwin wrote:
> On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
> > > On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
> > > > Do Cygwin and/or Windows support surrogate pairs in UTF-8?
> >
> > You mean UTF-16. UTF-8 doesn't know surrogate pairs, UTF-16 does.
> > Originally there was UCS-2, 16 bits, with only 65536 code points.
> > However, Unicode left the BMP already with version 2.0 in 1996, so
> > UTF-16 and surrogate pairs became necessary. Windows as well as Cygwin
> > support them.
>
> How does Cygwin support UTF-16 locales with surrogate pairs?
UTF-16 locales? There's no such thing. UTF-16 is just the 16 bit
representation for Unicode, and as such, is independent of the locale.
On the user side, Cygwin only supports UTF-8 as the Unicode representation.
Internally you can then convert UTF-8 strings to wchar_t, which is 16 bits
wide on Cygwin and holds UTF-16.
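
To make the surrogate pair arithmetic concrete, here is a minimal sketch in plain C. It is independent of any locale and of Cygwin's own conversion code; the function names utf16_encode and utf16_decode_pair are made up for this illustration and are not part of newlib or Cygwin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp
   is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: one unit, as in UCS-2 */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: combine a valid high/low surrogate pair back
   into a code point. The caller must pass a well-formed pair. */
uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                      | (uint32_t)(lo - 0xDC00));
}
```

For example, U+1F600 lies outside the BMP and encodes as the two units 0xD83D 0xDE00, while U+00E9 fits into a single unit.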
> Are they the "native" locales inherited from Windows if others are not
> specified e.g. UTF-8, some OEM SBCS or MBCS?
Just try `locale -av' and you'll see all supported locales and their
respective default codeset. All of them can be used with .utf8
specifier to use UTF-8 instead of the default codeset. Some of them
use UTF-8 as default codeset anyway, e.g., fa_IR or yo_NG.
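
The same information can be queried from a program. The following is only a sketch using the POSIX calls setlocale() and nl_langinfo(CODESET); the helper name codeset_for is invented for this example, and which locale names are accepted depends on the system it runs on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the codeset name of locale `name` into buf.
   Returns 0 on success, -1 if the locale cannot be selected.
   The previous LC_CTYPE setting is restored before returning. */
int codeset_for(const char *name, char *buf, size_t len)
{
    char saved[256];
    const char *cur;

    assert(name != NULL && buf != NULL && len > 0);

    /* setlocale(cat, NULL) only queries; copy the result because a
       later setlocale() call may overwrite the returned string. */
    cur = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;                      /* locale not available */

    snprintf(buf, len, "%s", nl_langinfo(CODESET));
    setlocale(LC_CTYPE, saved);
    return 0;
}
```

Calling codeset_for("en_US.utf8", buf, sizeof buf) should report UTF-8 wherever that locale is available; the exact codeset strings for other locales are whatever the system's locale data provides.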
> > > There are 3 tests in surrogate-pair and only the 3rd one failed. So I guess
> > > surrogate pairs in UTF-8 "mostly work".
> >
> > UTF-16. The surrogate stuff is evil at times. Have a look at the
> > __utf8_wctomb function in
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
> > Lone surrogate halves in an input stream are a problem, for instance.
>
> Thus the confusion with grep surrogate pair tests which appear to be running
> under a UTF-8 locale: see attached surrogate pair extract from cygport
> --debug grep.cygport check.
An STC in plain C might be helpful.
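
As a starting point, the lone-surrogate problem itself can be shown without any locale involved. The sketch below is not the grep test and not the newlib code; first_lone_surrogate is a made-up name, and the function only checks whether a sequence of UTF-16 units is well-formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first unpaired
   surrogate in s[0..n), or -1 if every surrogate is properly paired. */
long first_lone_surrogate(const uint16_t *s, size_t n)
{
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
                i++;                    /* valid pair, skip low half */
            else
                return (long)i;
        } else if (s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return (long)i;
        }
    }
    return -1;
}
```

A converter to UTF-8 has to decide what to do at the index this returns: fail, substitute a replacement character, or wait for more input if the high surrogate is the last unit seen so far.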
Corinna