strfry() tries to shuffle its string using random swaps, but it uses the wrong strategy, and thus not all permutations are equally likely. The code doing the shuffling itself looks like this: len = strlen (string); for (i = 0; i < len; ++i) { int32_t j; char c; __random_r (&rdata, &j); j %= len; c = string[i]; string[i] = string[j]; string[j] = c; } In other words, for the string 'abc' j will always be between 0 and 2 (inclusive), giving the following possibilities (all equally likely): j0 j1 j2 result 0 0 0 cab 0 0 1 bca 0 0 2 bac 0 1 0 cba 0 1 1 acb 0 1 2 abc 0 2 0 bca 0 2 1 abc 0 2 2 acb 1 0 0 cba 1 0 1 acb 1 0 2 abc 1 1 0 cab 1 1 1 bca 1 1 2 bac 1 2 0 acb 1 2 1 bac 1 2 2 bca 2 0 0 acb 2 0 1 bac 2 0 2 bca 2 1 0 abc 2 1 1 cab 2 1 2 cba 2 2 0 bac 2 2 1 cba 2 2 2 cab Sorting and counting gives us the following distribution: 4 abc 5 acb 5 bac 5 bca 4 cab 4 cba In other words, this is clearly skewed; some strings will appear 25% more often than others. Steinar H. Gunderson <sgunderson@bigfoot.com> proposed the following patch: --- string/strfry.c 2007-04-21 23:12:47.000000000 +0200 +++ string/strfry.c 2007-04-21 23:22:46.000000000 +0200 @@ -38,17 +38,17 @@ } len = strlen (string); - for (i = 0; i < len; ++i) + for (i = 0; i < len - 1; ++i) { int32_t j; char c; __random_r (&rdata, &j); - j %= len; + j %= (len - i); c = string[i]; - string[i] = string[j]; - string[j] = c; + string[i] = string[j + i]; + string[j + i] = c; } return string; It turns strfry() into a proper Fisher-Yates shuffle. This gives exactly n! paths for a string with N characters, and since there are n! possible permutations, this means a one-to-one mapping, and all possibilities are equally likely.
This function is a joke. Don't you have better things to do? It's not worth arguing about and so to safe me time I added a modifeed versson of the patch.
Your version is still not right, as now some strings could not appear anymore. For example for the string "abc" the strings "abc" and "acb" could never appear. The version already attached to this bug report returns all cases with a correct distribution.
Stop wasting people's time! Nobody cares about this crap. I made a last change but it's just too embarasing to even admit that.
(In reply to comment #3) > Stop wasting people's time! Nobody cares about this crap. I made a last change > but it's just too embarasing to even admit that. Refusing someone else's better free code without reason? You're a dick. You--
Please do not reopen bugs spuriously, thank you.
This bug is apparently still not fixed. See: http://www.eglibc.org/archives/patches/msg00646.html which provides a patch. Ulrich wrote, earlier in the thread: > This function is a joke. Don't you have better things to do? The problem is that the function is not marked as a joke, and only its name (and some elements of the description) is actually funny. The task that is does is something reasonable to want to do, and it seems reasonably simple to provide a good implementation, so why not? glibc is a high quality product, so let's make the jokes high quality too! If on the other hand the glibc maintainers do not consider this function to be worth maintaining to the high standards of glibc, then please either remove it, or remove its documentation, or mark it as "obsolete, do not use", which should stop people both from using it and from reporting bugs. Without any such action, it seems unreasonable to say that bug reports about this function are wasting the maintainers' time. memfrob is a much better example of how to make this type of joke: its much simpler algorithm is obviously correct, as it has no statistical complexities, and its man page explains that it's useless for encryption. Hence, no-one is going to complain that it's bad for encryption, and no-one is going to find implementation bugs. In a library as high-profile and important as glibc, bad jokes are going to come back and bite you just like other bad design decisions. If the maintainers don't understand this, the joke, if there is one here, is very much on them.
Just a small dwarf wishing you a happy April 1st!
Still a issue in current git.
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla.
Well, nothing in the doco (http://www.gnu.org/software/libc/manual/html_node/strfry.html) for strfry() stated that all combinations would be equally likely. The only mention of uniform distribution was that it was used to do the swaps themselves, not that the results would be uniformly distributed. In any case, when I make stir fry, it's rarely distributed perfectly. Having said that, Ulrich probably spent more time complaining about the patch than it would have taken to just put the damn thing in :-) I have to go with Reuben here. If it's a joke, get rid of it, it has no place in a serious piece of software. If not, we should be willing to accept patches that make it better in some way, given any constraints from elsewhere (such as time needed by maintainers to do it).
Isn't this fixed ? Looking at current source it appears to have a proper Fisher-Yates shuffle for strfry now
The function Ravier mentions is at https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strfry.c;h=6092d4315842bd581d04a620f3621d0455929b2c;hb=HEAD. The commit responsible is written on 2007-05-08 by Ulrich Drepper. https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=string/strfry.c;h=a8b202d176885240d2d70d4247ae0c48b4ac51f2;hp=112dd4adb334a766f722ebca5fc52cd0e5a6dd93;hb=c306d807358c5ef73f7cd7d79e5f401c60892ca4;hpb=37f402350d24576e21f81a209e77705932f0181f I *believe*, from visual inspection, that Drepper's version is equivalent to Gunderson's patch: * the len - 1 is obvious * the j + i is rolled into the j assignment I cannot agree with Jarno's claim in comment 2 as a result. Of course just looking at code is very likely wrong, but at least Ravier agrees with me. PS: if we *really* want to burn some time, we can probably talk about modulo bias [rejection sampling] and how it does not handle strings more than 2^31 long. That would be silly though.
(In reply to Mingye Wang from comment #12) > I *believe*, from visual inspection, that Drepper's version is equivalent to > Gunderson's patch: > * the len - 1 is obvious > * the j + i is rolled into the j assignment > > I cannot agree with Jarno's claim in comment 2 as a result. Of course just > looking at code is very likely wrong, but at least Ravier agrees with me. Please do not rewrite history ! My comment applies to the version at the time of my comment, that is: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strfry.c;h=a8b202d176885240d2d70d4247ae0c48b4ac51f2;hb=c306d807358c5ef73f7cd7d79e5f401c60892ca4 Ulrich Drepper fixed the issue in that commit *after* my comment: https://sourceware.org/git/?p=glibc.git;a=commit;h=0008163a30fec920ede228aea2f6e06e75e76514
Closing, this has been fixed with this commit in glibc version 2.7: commit 0008163a30fec920ede228aea2f6e06e75e76514 Author: Ulrich Drepper <drepper@redhat.com> Date: Wed Aug 22 16:04:18 2007 +0000 * nis/nis_table.c (nis_list): Don't fail if __follow_path returned NIS_NOTFOUND.
The actual CVS commit was https://repo.or.cz/glibc/history.git/commit/dfd1fbd6e2 commit dfd1fbd6e2 Author: Ulrich Drepper <drepper@redhat.com> Date: Wed Aug 22 07:27:39 2007 +0000 (strfry): It's not even worth documenting...