Bug in collation functions?
Eric Blake
eblake@redhat.com
Thu Oct 29 18:09:00 GMT 2015
On 10/29/2015 10:13 AM, Ken Brown wrote:
> Never mind. My test case was flawed, because it didn't check for the
> possibility that wcscoll might return 0. Here's a revised definition of
> the "compare" function:
>
> void
> compare (const wchar_t *a, const wchar_t *b, const char *loc)
> {
>   setlocale (LC_COLLATE, loc);
>   int res = wcscoll (a, b);
>   char c = res < 0 ? '<' : res > 0 ? '>' : '=';
>   printf ("\"%ls\" %c \"%ls\" in %s locale\n", a, c, b, loc);
> }
>
> With this change (and the use of NORM_IGNORESYMBOLS) the test returns
> the following on Cygwin:
>
> $ ./wcscoll_test
> "11" > "1.1" in POSIX locale
> "11" = "1.1" in en_US.UTF-8 locale
> "11" > "1 2" in POSIX locale
> "11" < "1 2" in en_US.UTF-8 locale
>
> It still differs from Linux, but it's good enough to make the emacs test
> pass. Moreover, this behavior actually seems more reasonable to me than
> the Linux behavior. After all, if you're ignoring punctuation, how can
> you decide which of "11" or "1.1" comes first?
Careful. POSIX is proposing wording that says normal locales should
always implement a fallback of last resort (and that locales that do
not do so should have a special name including '@', to make it
obvious). It is not standardized yet, but it is worth thinking about.
http://austingroupbugs.net/view.php?id=938
http://austingroupbugs.net/view.php?id=963
The intent of that wording is that when ignoring punctuation would
otherwise make two different strings compare equal, the fallback of a
total ordering on all characters ensures that strcoll() does not
return 0 unless the two strings are identical.
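
To make the idea concrete, here is a rough caller-side sketch of a
last-resort tie-break. It is only an illustration: the helper name
coll_with_fallback is hypothetical, and the proposal itself is about
how the locale's collation is defined, not about wrapping wcscoll() in
application code. The sketch assumes that ordering by wchar_t value
(which is what wcscmp() does) is an acceptable total order.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper, not part of POSIX or Cygwin: compare using the
   current locale's collation rules first, then break ties with
   wcscmp() so that 0 is returned only for identical strings.  */
int
coll_with_fallback (const wchar_t *a, const wchar_t *b)
{
  assert (a != NULL && b != NULL);

  int res = wcscoll (a, b);
  if (res == 0)
    res = wcscmp (a, b);        /* last resort: order by wchar_t value */
  return res;
}
```

With a helper like this, L"11" and L"1.1" can no longer compare equal
in any locale, because wcscmp() only returns 0 for identical strings.
Which of the two sorts first in the tie-break case then depends on the
wchar_t values, not on the locale.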
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org