This is the mail archive of the
mailing list for the Cygwin project.
Re: Bug in collation functions?
- From: Ken Brown <kbrown at cornell dot edu>
- To: cygwin at cygwin dot com
- Date: Sat, 31 Oct 2015 13:13:30 -0400
- Subject: Re: Bug in collation functions?
On 10/30/2015 10:07 AM, Ken Brown wrote:
On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
On Oct 29 18:21, Ken Brown wrote:
The fallback I had in mind is to return the shorter string if they have
different lengths and otherwise to revert to wcscmp.
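For illustration, the fallback described here could be sketched roughly as below. This is only a reading of the suggestion, not code from the thread: the name fallback_compare is hypothetical, and "return the shorter string" is interpreted as "the shorter string sorts first". It would only be consulted when the symbol-ignoring comparison has already reported the two strings as equal.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (name not from the thread): tie-breaker for two
   strings that a symbol-ignoring comparison reported as equal. */
int
fallback_compare (const wchar_t *a, const wchar_t *b)
{
  assert (a != NULL && b != NULL);

  size_t la = wcslen (a);
  size_t lb = wcslen (b);

  /* Different lengths: the shorter string sorts first. */
  if (la != lb)
    return la < lb ? -1 : 1;

  /* Same length: revert to plain wcscmp ordering. */
  return wcscmp (a, b);
}
```

With this sketch, L"11" would sort before L"1.1" purely because it is shorter.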
I had a longer look into this suggestion and the below code and it took
me some time to find out what bugged me with it:
What about str/wcsxfrm?
Per POSIX, calling strcmp on the result of strxfrm is equivalent to
calling strcoll (and analogously for wcs*). If you extend *coll to perform an
extra check on the length, you will have cases in which the above rule
fails. You can't perform the length test on the result of *xfrm and
expect the same result as in *coll.
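The POSIX rule mentioned above can be checked with a small sketch like the following. It uses only standard C library calls; the helper names sign_of and compare_via_xfrm are made up for this illustration. For any pair of strings, the sign returned by compare_via_xfrm should match the sign of strcoll in the current locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so results can be compared. */
int
sign_of (int v)
{
  return (v > 0) - (v < 0);
}

/* Compare a and b by transforming both with strxfrm and running strcmp
   on the resulting keys.  Returns the sign of the comparison, or -2 if
   memory allocation fails. */
int
compare_via_xfrm (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  /* With size 0, strxfrm returns the key length, excluding the NUL. */
  size_t na = strxfrm (NULL, a, 0) + 1;
  size_t nb = strxfrm (NULL, b, 0) + 1;
  char *ka = malloc (na);
  char *kb = malloc (nb);
  int result = -2;

  if (ka != NULL && kb != NULL)
    {
      strxfrm (ka, a, na);
      strxfrm (kb, b, nb);
      result = sign_of (strcmp (ka, kb));
    }
  free (ka);
  free (kb);
  return result;
}
```

Any extra test that *coll applies to the original strings has no counterpart here, because compare_via_xfrm only ever sees the transformed keys.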
In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
have to do this anyway if we add this flag in *coll), the resulting
transformed strings created from the input strings "11" and "1.1" would
be identical, so a length test on the xfrm string is not meaningful at all.
The bottom line is, afaics, we must make sure that CompareStringW and
LCMapStringW are called the same way, and their result/output has to be
returned to the caller. Performing an extra check in *coll which can't
be reliably performed in *xfrm is not feasible.
Does that make sense?
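The "11" versus "1.1" point can be illustrated with a toy transform. To be clear, this is NOT LCMapStringW and makes no claim about the actual sort keys Windows produces; toy_xfrm_ignore_symbols is a made-up stand-in that simply drops every non-alphanumeric byte, which is enough to show why a length test on the keys cannot reproduce a length test on the input strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a symbol-ignoring transform; NOT LCMapStringW.
   Copies only the alphanumeric bytes of src into dst (at most
   dstsize - 1 of them) and returns the length of the key. */
size_t
toy_xfrm_ignore_symbols (char *dst, const char *src, size_t dstsize)
{
  assert (dst != NULL && src != NULL && dstsize > 0);

  size_t n = 0;
  for (; *src != '\0' && n + 1 < dstsize; src++)
    if (isalnum ((unsigned char) *src))
      dst[n++] = *src;
  dst[n] = '\0';
  return n;
}
```

Under this toy transform the keys for "11" and "1.1" are the same two bytes, so the information that the inputs had different lengths is gone by the time strcmp sees them.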
Yes, I see the problem, and I don't see a good way around it. So I
think we probably have to leave things as they are and live with the
fact that we can't do comparisons that ignore whitespace and punctuation.
The alternative of allowing str/wcscoll to return 0 on unequal strings
doesn't seem feasible in view of Eric's comments.
I have one other idea. What would you think of defining a function
cygwin_strcoll that's like strcoll but with an extra bool parameter
'ignoresymbols'? If ignoresymbols = false, this would be the same as
strcoll. If ignoresymbols = true, this would use NORM_IGNORESYMBOLS
with the fallback I suggested.
That way applications that prefer to be more glibc-compatible and don't
need strxfrm could do something like
#define strcoll(A,B) cygwin_strcoll ((A), (B), true)
If you think this is reasonable, I'll submit a patch. If not, no problem.
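To make the proposal a bit more concrete, here is a rough, portable sketch of the behaviour I have in mind. It is not Cygwin code: cygwin_strcoll_sketch and strip_symbols are hypothetical names, stripping whitespace and punctuation with isspace/ispunct is only a stand-in for what NORM_IGNORESYMBOLS would do, and the narrow-string version falls back to strcmp where the wide version would use wcscmp.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Copy s without whitespace and punctuation.  Toy stand-in for
   NORM_IGNORESYMBOLS; the caller frees the result. */
char *
strip_symbols (const char *s)
{
  char *out = malloc (strlen (s) + 1);
  size_t n = 0;

  if (out == NULL)
    return NULL;
  for (; *s != '\0'; s++)
    if (!isspace ((unsigned char) *s) && !ispunct ((unsigned char) *s))
      out[n++] = *s;
  out[n] = '\0';
  return out;
}

/* Hypothetical sketch of the proposed function, not Cygwin code. */
int
cygwin_strcoll_sketch (const char *a, const char *b, bool ignoresymbols)
{
  assert (a != NULL && b != NULL);

  /* ignoresymbols = false: identical to strcoll. */
  if (!ignoresymbols)
    return strcoll (a, b);

  /* ignoresymbols = true: compare with symbols removed.  On allocation
     failure, degrade to plain strcoll. */
  char *sa = strip_symbols (a);
  char *sb = strip_symbols (b);
  int r = (sa != NULL && sb != NULL) ? strcoll (sa, sb) : strcoll (a, b);
  free (sa);
  free (sb);
  if (r != 0)
    return r;

  /* Fallback for ties: shorter string first, then plain strcmp. */
  size_t la = strlen (a);
  size_t lb = strlen (b);
  if (la != lb)
    return la < lb ? -1 : 1;
  return strcmp (a, b);
}
```

The point of the fallback is that the function returns 0 only for identical strings, so unequal strings never compare as equal. As discussed above, there is no matching *xfrm for this, which is why it would have to be a separate function rather than a change to strcoll itself.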