This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Bug in collation functions?

On Oct 29 08:59, Ken Brown wrote:
> On 10/29/2015 4:30 AM, Corinna Vinschen wrote:
> >On Oct 29 08:50, Corinna Vinschen wrote:
> >>On Oct 28 21:58, Eric Blake wrote:
> >>>On 10/28/2015 04:14 PM, Ken Brown wrote:
> >>>>It's my understanding that collation is supposed to take whitespace and
> >>>>punctuation into account in the POSIX locale but not in other locales.
> >>>
> >>>Not quite right. It is up to the locale definition whether whitespace
> >>>affects collation.  But you are correct that in the POSIX locale,
> >>>whitespace must not be ignored in collation.
> >>>
> >>>>This doesn't seem to be the case on Cygwin.  Here's a test case using
> >>>>wcscoll, but the same problem occurs with strcoll.
> >>>
> >>>That's because the locale definitions are different in cygwin than they
> >>>are in glibc.  But it is not a bug in Cygwin; POSIX allows for different
> >>>systems to have different locale definitions while still using the same
> >>>locale name like en_US.UTF-8.
> >>
> >>Btw, strcoll and wcscoll in Cygwin are implemented using the Windows
> >>function CompareStringW with the LCID set to the locale matching the
> >>POSIX locale setting.  I'm rather glad I didn't have to implement this
> >>by myself... :}
> >
> >OTOH, CompareString has a couple of flags to control its behaviour, see
> >
> >
> >Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
> >are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS.  I'm open to a
> >discussion how to change the settings to more closely resemble the rules
> >on Linux.
> >
> >E.g.  wcscoll simply calls wcscmp rather than CompareStringW for the
> >C/POSIX locale anyway.  So, would it make sense to set the flags to
> >NORM_IGNORESYMBOLS in other locales?
> I think so.  That's what the native Windows build of emacs does in this
> situation.

Is that all it's doing?  I'm asking because using NORM_IGNORESYMBOLS
does not exactly resemble the behaviour on Linux on my W10 box:

    "11" > "1.1" in POSIX locale
!!! "11" > "1.1" in en_US.UTF-8 locale
    "11" > "1 2" in POSIX locale
    "11" < "1 2" in en_US.UTF-8 locale
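
For anyone who wants to reproduce comparisons like the ones above on their own system, here is a minimal sketch using only standard C. The helper names coll_sign and report are made up for illustration, and the sketch assumes the locale names "POSIX" and "en_US.UTF-8" are what the system calls them; it reports a missing locale instead of guessing.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: collate a against b under the named locale.
   Returns -1, 0 or 1 for less, equal, greater, or 2 if setlocale
   reports that the locale is not available on this system. */
int coll_sign(const char *locale_name, const char *a, const char *b)
{
    assert(locale_name && a && b);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: print one result line in the layout used above. */
void report(const char *locale_name, const char *a, const char *b)
{
    int s = coll_sign(locale_name, a, b);
    if (s == 2)
        printf("    locale %s not available\n", locale_name);
    else
        printf("    \"%s\" %c \"%s\" in %s locale\n",
               a, "<=>"[s + 1], b, locale_name);
}
```

Calling report("POSIX", "11", "1.1"), report("en_US.UTF-8", "11", "1.1") and the same pair of calls for "1 2" from main prints four lines to compare across systems. Only the POSIX results are fixed by the standard; the en_US.UTF-8 lines depend on the locale definition of the system the program runs on.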

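To make the idea of ignoring symbols during collation concrete, here is a rough portable sketch. It is not what CompareStringW does internally and it is not Cygwin's code. It only assumes that both punctuation and whitespace count as ignorable, and it compares the remaining bytes in plain code point order. The function names are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: advance past characters that this sketch treats
   as ignorable (ASCII punctuation and whitespace in the C locale). */
const char *skip_symbols(const char *s)
{
    while (*s && (ispunct((unsigned char)*s) || isspace((unsigned char)*s)))
        s++;
    return s;
}

/* Hypothetical helper: compare two strings byte by byte while skipping
   ignorable characters. Returns -1, 0 or 1. */
int cmp_ignoring_symbols(const char *a, const char *b)
{
    assert(a && b);
    for (;;) {
        a = skip_symbols(a);
        b = skip_symbols(b);
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;
        if (ca != cb || ca == '\0')
            return (ca > cb) - (ca < cb);
        a++;
        b++;
    }
}
```

With this sketch "11" and "1.1" compare equal, because the dot is skipped, and "11" sorts before "1 2", because the space is skipped and "11" < "12". A real collation implementation still has to decide how to break the tie in the first case, and that is where systems can differ.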

Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

