<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://sourceware.org/bugzilla/page.cgi?id=bugzilla.dtd">

<bugzilla version="4.4+"
          urlbase="http://sourceware.org/bugzilla/"
          
          maintainer="overseers@sourceware.org"
>

    <bug>
          <bug_id>4403</bug_id>
          
          <creation_ts>2007-04-21 22:36:00 +0000</creation_ts>
          <short_desc>strfry() gives skewed distributions</short_desc>
          <delta_ts>2012-05-06 18:56:55 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>glibc</product>
          <component>libc</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>REOPENED</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Aurelien Jarno">aurelien</reporter>
          <assigned_to name="Not yet assigned to anyone">unassigned</assigned_to>
          <cc>awreece</cc>
    
    <cc>glibc-bugs</cc>
    
    <cc>kirill</cc>
    
    <cc>slashtrooll</cc>
          <cf_gcchost>x86_64-unknown-linux-gnu</cf_gcchost>
          <cf_gcctarget>x86_64-unknown-linux-gnu</cf_gcctarget>
          <cf_gccbuild>x86_64-unknown-linux-gnu</cf_gccbuild>
          <cf_reconfirmed_on>2012-05-06 00:00:00</cf_reconfirmed_on>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>16519</commentid>
    <comment_count>0</comment_count>
    <who name="Aurelien Jarno">aurelien</who>
    <bug_when>2007-04-21 22:36:13 +0000</bug_when>
    <thetext>strfry() tries to shuffle its string using random swaps, but it uses the
wrong strategy, and thus not all permutations are equally likely. The
code doing the shuffling itself looks like this:

  len = strlen (string);
  for (i = 0; i &lt; len; ++i)
    {
      int32_t j;
      char c;

      __random_r (&amp;rdata, &amp;j);
      j %= len;

      c = string[i];
      string[i] = string[j];
      string[j] = c;
    }

In other words, for the string &apos;abc&apos; j will always be between 0 and 2
(inclusive), giving the following possibilities (all equally likely):

  j0 j1 j2  result
   0  0  0  cab
   0  0  1  bca
   0  0  2  bac
   0  1  0  cba
   0  1  1  acb
   0  1  2  abc
   0  2  0  bca
   0  2  1  abc
   0  2  2  acb
   1  0  0  cba
   1  0  1  acb
   1  0  2  abc
   1  1  0  cab
   1  1  1  bca
   1  1  2  bac
   1  2  0  acb
   1  2  1  bac
   1  2  2  bca
   2  0  0  acb
   2  0  1  bac
   2  0  2  bca
   2  1  0  abc
   2  1  1  cab
   2  1  2  cba
   2  2  0  bac
   2  2  1  cba
   2  2  2  cab

Sorting and counting gives us the following distribution:

   4  abc
   5  acb
   5  bac
   5  bca
   4  cab
   4  cba

In other words, this is clearly skewed; some strings will appear 25% more often
than others.

Steinar H. Gunderson &lt;sgunderson@bigfoot.com&gt; proposed the following patch:

--- string/strfry.c     2007-04-21 23:12:47.000000000 +0200
+++ string/strfry.c     2007-04-21 23:22:46.000000000 +0200
@@ -38,17 +38,17 @@
     }

   len = strlen (string);
-  for (i = 0; i &lt; len; ++i)
+  for (i = 0; i &lt; len - 1; ++i)
     {
       int32_t j;
       char c;

       __random_r (&amp;rdata, &amp;j);
-      j %= len;
+      j %= (len - i);

       c = string[i];
-      string[i] = string[j];
-      string[j] = c;
+      string[i] = string[j + i];
+      string[j + i] = c;
     }

   return string;

It turns strfry() into a proper Fisher-Yates shuffle. This gives exactly
n! paths for a string with N characters, and since there are n! possible
permutations, this means a one-to-one mapping, and all possibilities are
equally likely.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16823</commentid>
    <comment_count>1</comment_count>
    <who name="Ulrich Drepper">drepper.fsp</who>
    <bug_when>2007-05-08 04:28:42 +0000</bug_when>
    <thetext>This function is a joke.  Don&apos;t you have better things to do?  It&apos;s not worth
arguing about and so to safe me time I added a modifeed versson of the patch.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17084</commentid>
    <comment_count>2</comment_count>
    <who name="Aurelien Jarno">aurelien</who>
    <bug_when>2007-05-20 21:48:50 +0000</bug_when>
    <thetext>Your version is still not right, as now some strings could not appear anymore.

For example for the string &quot;abc&quot; the strings &quot;abc&quot; and &quot;acb&quot; could never 
appear.

The version already attached to this bug report returns all cases with a 
correct distribution.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>18729</commentid>
    <comment_count>3</comment_count>
    <who name="Ulrich Drepper">drepper.fsp</who>
    <bug_when>2007-08-22 07:29:53 +0000</bug_when>
    <thetext>Stop wasting people&apos;s time!  Nobody cares about this crap.  I made a last change
but it&apos;s just too embarasing to even admit that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36094</commentid>
    <comment_count>4</comment_count>
    <who name="Evan Carroll">me</who>
    <bug_when>2009-05-09 17:00:37 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Stop wasting people&apos;s time!  Nobody cares about this crap.  I made a last change
&gt; but it&apos;s just too embarasing to even admit that.

Refusing someone else&apos;s better free code without reason? You&apos;re a dick. You--</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36100</commentid>
    <comment_count>5</comment_count>
    <who name="Petr Baudis">pasky</who>
    <bug_when>2009-05-10 08:03:52 +0000</bug_when>
    <thetext>Please do not reopen bugs spuriously, thank you.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>45120</commentid>
    <comment_count>6</comment_count>
    <who name="Reuben Thomas">rrt</who>
    <bug_when>2010-09-16 12:05:23 +0000</bug_when>
    <thetext>This bug is apparently still not fixed. See:

http://www.eglibc.org/archives/patches/msg00646.html

which provides a patch.

Ulrich wrote, earlier in the thread:

&gt; This function is a joke.  Don&apos;t you have better things to do?

The problem is that the function is not marked as a joke, and only its name (and
some elements of the description) is actually funny. The task that is does is
something reasonable to want to do, and it seems reasonably simple to provide a
good implementation, so why not? glibc is a high quality product, so let&apos;s make
the jokes high quality too!

If on the other hand the glibc maintainers do not consider this function to be
worth maintaining to the high standards of glibc, then please either remove it,
or remove its documentation, or mark it as &quot;obsolete, do not use&quot;, which should
stop people both from using it and from reporting bugs.

Without any such action, it seems unreasonable to say that bug reports about
this function are wasting the maintainers&apos; time.

memfrob is a much better example of how to make this type of joke: its much
simpler algorithm is obviously correct, as it has no statistical complexities,
and its man page explains that it&apos;s useless for encryption. Hence, no-one is
going to complain that it&apos;s bad for encryption, and no-one is going to find
implementation bugs.

In a library as high-profile and important as glibc, bad jokes are going to come
back and bite you just like other bad design decisions. If the maintainers don&apos;t
understand this, the joke, if there is one here, is very much on them.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54321</commentid>
    <comment_count>7</comment_count>
    <who name="My small pet dwarf">slashtrooll</who>
    <bug_when>2012-03-31 21:05:54 +0000</bug_when>
    <thetext>Just a small dwarf wishing you a happy April 1st!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55020</commentid>
    <comment_count>8</comment_count>
    <who name="Andreas Jaeger">aj</who>
    <bug_when>2012-05-06 18:56:55 +0000</bug_when>
    <thetext>Still a issue in current git.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>