This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Consistency between strxfrm and strcoll?


On 24 Mar 2016 01:35, Carlos O'Donell wrote:
> POSIX requires that strxfrm and strcoll produce consistent results.
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html
> ~~~
> The transformation is such that if strcmp() is applied to two transformed
> strings, it shall return a value greater than, equal to, or less than 0,
> corresponding to the result of strcoll() or strcoll_l(), respectively,
> applied to the same two original strings with the same locale.
> ~~~
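A minimal sketch of what that requirement means as a check, assuming the caller has already selected a locale with setlocale (LC_COLLATE, ...). In the default "C" locale the check is trivially true, because strxfrm is then the identity and strcoll behaves like strcmp. The function name xfrm_coll_consistent is hypothetical, not an existing glibc interface:

```c
#include <stdlib.h>
#include <string.h>

/* Sign of an int: -1, 0 or 1.  */
static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Hypothetical helper.  Return 1 if strcoll (a, b) and strcmp applied to
   the strxfrm images of a and b agree in sign under the current
   LC_COLLATE locale, 0 if they disagree, -1 if allocation fails.  */
int
xfrm_coll_consistent (const char *a, const char *b)
{
  /* With n == 0, strxfrm may be given a null buffer and just returns
     the transformed length (not counting the terminating NUL).  */
  size_t la = strxfrm (NULL, a, 0) + 1;
  size_t lb = strxfrm (NULL, b, 0) + 1;
  char *xa = malloc (la);
  char *xb = malloc (lb);
  if (xa == NULL || xb == NULL)
    {
      free (xa);
      free (xb);
      return -1;
    }
  strxfrm (xa, a, la);
  strxfrm (xb, b, lb);
  int ok = sign (strcmp (xa, xb)) == sign (strcoll (a, b));
  free (xa);
  free (xb);
  return ok;
}
```

Only the sign is compared, since POSIX promises matching signs, not matching magnitudes.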
> 
> However, the program attached to this upstream Red Hat bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1320356
> 
> shows that for some locales, randomly generated UTF-8 strings made of
> 2-byte sequences (the 11-bit range U+0080 to U+07FF) sort inconsistently.
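The program attached to the bug is not reproduced here. As a hypothetical sketch of the same input class, the generator below picks code points in U+0080..U+07FF and encodes each as a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx). The small xorshift32 generator is a swapped-in assumption, chosen so that a failing seed can be replayed exactly; all names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 PRNG; *state must be nonzero.  */
static uint32_t
xorshift32 (uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Encode a code point in U+0080..U+07FF as 2-byte UTF-8.  */
static void
encode_2byte (uint32_t cp, char *out)
{
  out[0] = (char) (0xC0 | (cp >> 6));   /* 110xxxxx: top 5 bits */
  out[1] = (char) (0x80 | (cp & 0x3F)); /* 10xxxxxx: low 6 bits */
}

/* Fill BUF with NCHARS random code points from U+0080..U+07FF, UTF-8
   encoded and NUL-terminated.  BUF must hold 2 * NCHARS + 1 bytes.  */
void
random_utf8_2byte (uint32_t *state, size_t nchars, char *buf)
{
  for (size_t i = 0; i < nchars; i++)
    {
      /* 0x7FF - 0x80 + 1 == 0x780 code points in the range.  */
      uint32_t cp = 0x80 + xorshift32 (state) % 0x780;
      encode_2byte (cp, buf + 2 * i);
    }
  buf[2 * nchars] = '\0';
}
```

For example, U+00E9 encodes as the bytes 0xC3 0xA9. Every lead byte falls in 0xC2..0xDF, so the output is always well-formed UTF-8.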
> 
> Would it be beneficial to make our testing more robust and cover a
> broader, more deterministic set of sorting tests?
> 
> Our current scripts/sort-test.sh tests are pretty limited, both in the
> languages they cover and in their character-set coverage for sorting.
> 
> Then we'd have to determine why strxfrm and strcoll return different answers.
> It's not entirely surprising given the algorithmic differences.

i wouldn't mind having a random fuzzer, and turning all the failures it
finds into a static list of regression tests.  we could then run the
fuzzer by hand from time to time, while the regression tests would be
part of our static test list.
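One hypothetical way to wire up the fuzzer-to-regression-list split: a static table of (locale, a, b) entries run on every test pass, plus a printer that emits a failing pair as a hex-escaped C literal ready to paste into that table. This does not describe glibc's existing test infrastructure; struct coll_case, run_regressions and print_c_literal are illustrative names, and no real failing pairs are listed because none appear in this thread:

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a pair the fuzzer reported, and its locale.  */
struct coll_case
{
  const char *locale;
  const char *a;
  const char *b;
};

static int
sign_of (int v)
{
  return (v > 0) - (v < 0);
}

/* 1 if strcoll and strcmp over the strxfrm images agree in sign.  Fixed
   buffers keep the sketch short; a transform that does not fit is
   reported as a failure so that it gets looked at.  */
static int
consistent (const char *a, const char *b)
{
  char xa[4096], xb[4096];
  if (strxfrm (xa, a, sizeof xa) >= sizeof xa
      || strxfrm (xb, b, sizeof xb) >= sizeof xb)
    return 0;
  return sign_of (strcmp (xa, xb)) == sign_of (strcoll (a, b));
}

/* Print S as a C string literal with every byte hex-escaped, so a pair
   found by the fuzzer can be pasted into the regression table.  */
void
print_c_literal (FILE *f, const char *s)
{
  fputc ('"', f);
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    fprintf (f, "\\x%02x", *p);
  fputc ('"', f);
}

/* Run every entry and return the number of inconsistent pairs.  Entries
   whose locale is not installed are skipped.  Leaves LC_ALL set to the
   last locale that loaded successfully.  */
int
run_regressions (const struct coll_case *cases, size_t n)
{
  int failures = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (setlocale (LC_ALL, cases[i].locale) == NULL)
        continue;
      if (!consistent (cases[i].a, cases[i].b))
        {
          failures++;
          printf ("FAIL [%s] ", cases[i].locale);
          print_c_literal (stdout, cases[i].a);
          fputs (" vs ", stdout);
          print_c_literal (stdout, cases[i].b);
          fputc ('\n', stdout);
        }
    }
  return failures;
}
```

Escaping every byte, rather than only the non-ASCII ones, avoids the C pitfall where a hex escape greedily swallows a following hex-digit character.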
-mike


