This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: regex library fails git tests

On Jul 22 11:17, Eric Blake wrote:
> On 07/22/2013 02:12 AM, Corinna Vinschen wrote:
> >>> However, please note that this behaviour, while being provided by glibc
> >>> and now by Cygwin, is *not* standards-compliant.  In the narrow sense
> >>> the characters beyond 0x7f are still invalid ASCII chars, and other
> >>> functions working with wchar_t strings won't be as forgiving when using
> >>> invalid input.
> >>>
> > After some sleep, I think I now understand why the glibc devs made
> > regcomp work this way.  This behaviour is backward compatible with
> > non-locale-aware applications.  In the "C" locale, a char is just some
> > arbitrary byte between 0 and 255.  So this pattern always worked before
> > in the "C" locale, therefore it makes sense that it continues to work,
> > even if it won't when using other locales/codesets.
> By the way, there is currently a big debate going on in the Austin Group
> (the people responsible for POSIX) on whether the "C" locale must be
> 8-bit clean (the way glibc behaves) or whether it was intended to allow
> UTF-8 encoding by default (the way musl libc wants to behave); and
> resolution of the debate will require input from the C standards
> committee.  There may be some interesting fallout, no matter which
> solution is finally reached.

Thanks for letting us know.  This really may get interesting...
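
For readers following the thread, here is a rough sketch of the distinction
under discussion.  It is only an analogy in Python, not the regcomp code path
in glibc or Cygwin: a bytes pattern in Python's re module treats every byte
as an opaque value between 0 and 255 (roughly the "8-bit clean" reading of
the "C" locale), while decoding the same bytes as ASCII or UTF-8 rejects a
lone byte above 0x7f.  The helper names and the sample string are made up
for illustration.

```python
import re

# Bytes pattern for "any byte outside 7-bit ASCII". In bytes mode each byte
# is an opaque value from 0 to 255, so the range 0x80-0xff is accepted.
HIGH_BYTE = re.compile(rb"[\x80-\xff]")


def has_high_byte(data: bytes) -> bool:
    """Return True if data contains at least one byte above 0x7f."""
    return HIGH_BYTE.search(data) is not None


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: "cafe" with an e-acute encoded as ISO-8859-1 (0xE9).
sample = b"caf\xe9"

print(has_high_byte(sample))          # byte-wise matching finds 0xE9
print(decodes_as(sample, "ascii"))    # strict ASCII rejects bytes > 0x7f
print(decodes_as(sample, "latin-1"))  # a single-byte codeset accepts it
print(decodes_as(sample, "utf-8"))    # a lone 0xE9 is a truncated sequence
```

The byte-wise match and the latin-1 decode succeed, while the ascii and utf-8
decodes fail.  That is the kind of difference an application could observe
depending on how the "C" locale question is eventually settled.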


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
