Bug 15615 - Poor quality output from rand_r
Summary: Poor quality output from rand_r
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-12 23:39 UTC by Rich Felker
Modified: 2014-06-13 15:07 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
test program to generate data for analysis by dieharder (209 bytes, text/x-csrc)
2013-06-12 23:39 UTC, Rich Felker
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rich Felker 2013-06-12 23:39:09 UTC
Created attachment 7075 [details]
test program to generate data for analysis by dieharder

Implementing a decent rand_r is very tricky because the interface requirement forces the full PRNG state to fit in 32 bits; this rules out pretty much all good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be.

glibc's rand_r is based on the LCG published in the C standard:

    next = next * 1103515245 + 12345;
    return next / 65536 % 32768;

This sample code in the standard assumes RAND_MAX == 32767. The lowest bit of the output has period 2^17 and the highest bit has period 2^31. As such, if only some of the bits are to be used, the highest bits should be kept and the lower bits discarded; this is the principle the sample code above is using for generating 15-bit output from a 32-bit internal state.

However, glibc is trying to generate 31-bit output from a 32-bit state. Just discarding the lowest bit would not be acceptable (the new lowest bit would have period 4), so something different needs to be done. glibc's approach is to advance the LCG 3 times (since 3 and 2^32 are relatively prime, this does not hurt the period of the internal state) and take 10-11 bits from each iteration to make a 31-bit output. Unfortunately, this approach has 2 serious flaws:

1. The 10-11 bits taken are bits 16-25 or 16-26, which have moderately low period, rather than bits 21-31 or 22-31 which would have maximal period. Thus, a lot of the potential period length is being thrown away.

2. Concatenating the bits from multiple steps, rather than using a single step of the LCG, introduces bias into the output. The frequency of each possible output is not the same; instead, some outputs occur much more often, and some not at all. Bias is especially noticable when the period is short, which it necessarily is with a 32-bit state, and which is made much worse by item 1 above.

If nothing else, item 1 should be fixed by adjusting the bitshifts to use the highest available bits, rather than the middle bits, to obtain the 3 10-11 bit values. This is straightforward and a pure improvement.

To fully fix rand_r, the approach of concatenating multiple iterations should be abandoned in favor of a single-LCG-iteration approach followed by an invertable transformation on the output. Obviously a 32-bit cryptographic block cipher would give the best statistical properties, but it would be slow. In musl, we are now using the "temper" function from mersenne twister. Another solution that seems to work is performing a mix of bit rotations, bswap, and multiplication by well-chosen odd numbers, applied to the output of the LCG.

I am attaching a test program for use with the dieharder PRNG test suite that demonstrates the low quality of glibc's rand_r. In particular, at least the OPSO, OQSO, and DNA tests show problems (tests 5, 6, and 7) and offhand it makes sense that this kind of bias would affect them. Note that, since dieharder requires 32-bit input as opposed to 31-bit input, I slightly modified the glibc algorithm to pull 11,11,10 bits instead of 11,10,10 bits, and included it in the test program rather than calling glibc's rand_r. Producing tests that work directly with glibc's rand_r to demonstrate the problem would be a lot more work.

To run the test, compile glibc_rand_r.c and pipe output to dieharder:

./glibc_rand_r | dieharder -a -g 200
Comment 1 Ondrej Bilka 2013-06-13 08:26:27 UTC
On Wed, Jun 12, 2013 at 11:39:09PM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615
> 
>             Bug ID: 15615
>            Summary: Poor quality output from rand_r
>            Product: glibc
>            Version: unspecified
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: libc
>           Assignee: unassigned at sourceware dot org
>           Reporter: bugdal at aerifal dot cx
>                 CC: drepper.fsp at gmail dot com
> 
> Created attachment 7075 [details]
>   --> http://sourceware.org/bugzilla/attachment.cgi?id=7075&action=edit
> test program to generate data for analysis by dieharder
> 
> Implementing a decent rand_r is very tricky because the interface requirement
> forces the full PRNG state to fit in 32 bits; this rules out pretty much all
> good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be.
> 
> glibc's rand_r is based on the LCG published in the C standard:
> 
>     next = next * 1103515245 + 12345;
>     return next / 65536 % 32768;
>
A problem here is that for many users predictability is much more
important than quality. Developer expects that when he uses rand_r with
state that he controls will not vary. This can cause extra debbuging hastle when
code mysteriously fails on one machine but not other or desync issues.

> To fully fix rand_r, the approach of concatenating multiple iterations should
> be abandoned in favor of a single-LCG-iteration approach followed by an
> invertable transformation on the output. Obviously a 32-bit cryptographic block
> cipher would give the best statistical properties, but it would be slow. In

This is false, I have a replacement of this with four rounds of AES. On
intel using aesenc I performance is better than current, I did not
propose that due of problems above. I wrote a RFC for random
replacement on libc-alpha, browse archives.
Comment 2 Rich Felker 2013-06-13 12:38:42 UTC
On Thu, Jun 13, 2013 at 08:26:27AM +0000, neleai at seznam dot cz wrote:
> A problem here is that for many users predictability is much more
> important than quality. Developer expects that when he uses rand_r with
> state that he controls will not vary. This can cause extra debbuging hastle
> when
> code mysteriously fails on one machine but not other or desync issues.

Could you explain better what you're concerned about? By
"predictable", do you mean keeping the same sequence it's had in the
past? Aside from that, any PRNG with 32-bit state and 31-bit output is
equally "predictable".

> > To fully fix rand_r, the approach of concatenating multiple iterations should
> > be abandoned in favor of a single-LCG-iteration approach followed by an
> > invertable transformation on the output. Obviously a 32-bit cryptographic block
> > cipher would give the best statistical properties, but it would be slow. In
> 
> This is false, I have a replacement of this with four rounds of AES. On
> intel using aesenc I performance is better than current, I did not
> propose that due of problems above. I wrote a RFC for random
> replacement on libc-alpha, browse archives.

AES itself does not use 32-bit blocks, so you must be using a modified
version. Would you care to explain? I searched the archives but could
not find your post.
Comment 3 Ondrej Bilka 2013-06-14 12:10:59 UTC
On Thu, Jun 13, 2013 at 12:38:42PM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615
> 
> --- Comment #2 from Rich Felker <bugdal at aerifal dot cx> ---
> On Thu, Jun 13, 2013 at 08:26:27AM +0000, neleai at seznam dot cz wrote:
> > A problem here is that for many users predictability is much more
> > important than quality. Developer expects that when he uses rand_r with
> > state that he controls will not vary. This can cause extra debbuging hastle
> > when
> > code mysteriously fails on one machine but not other or desync issues.
> 
> Could you explain better what you're concerned about? By
> "predictable", do you mean keeping the same sequence it's had in the
> past? Aside from that, any PRNG with 32-bit state and 31-bit output is
> equally "predictable".
> 
> > > To fully fix rand_r, the approach of concatenating multiple iterations should
> > > be abandoned in favor of a single-LCG-iteration approach followed by an
> > > invertable transformation on the output. Obviously a 32-bit cryptographic block
> > > cipher would give the best statistical properties, but it would be slow. In
> > 
> > This is false, I have a replacement of this with four rounds of AES. On
> > intel using aesenc I performance is better than current, I did not
> > propose that due of problems above. I wrote a RFC for random
> > replacement on libc-alpha, browse archives.
> 
> AES itself does not use 32-bit blocks, so you must be using a modified
> version. Would you care to explain? I searched the archives but could
> not find your post.
> 
Here, I wrote a version relevant to random. I did this to see how fast I
could get if I employ paralellism and inlining.

http://sourceware.org/ml/libc-help/2012-12/msg00005.html

To test rand_r equivalent I wrote a simple generator (which is for
mostly to test performance, I did not look for quality.)

  movd    (%rdi),%xmm0
  movdqa %xmm0,%xmm1

  aesenc %xmm0,%xmm1
  aesenc %xmm0,%xmm1
  aesenc %xmm0,%xmm1
  aesenc %xmm0,%xmm1
  movd %xmm1, (%rdi)
  movd %xmm1, %eax
  shr $1, %eax


On sandy bridge this code runs at half of speed of rand_r.
Comment 4 Rich Felker 2013-06-14 15:37:30 UTC
On Fri, Jun 14, 2013 at 12:10:59PM +0000, neleai at seznam dot cz wrote:
> To test rand_r equivalent I wrote a simple generator (which is for
> mostly to test performance, I did not look for quality.)
> 
>   movd    (%rdi),%xmm0
>   movdqa %xmm0,%xmm1
> 
>   aesenc %xmm0,%xmm1
>   aesenc %xmm0,%xmm1
>   aesenc %xmm0,%xmm1
>   aesenc %xmm0,%xmm1
>   movd %xmm1, (%rdi)
>   movd %xmm1, %eax
>   shr $1, %eax

There's no reason to believe this code will have acceptable period or
be unbiased. Instead of storing the AES result back to the state, you
should simply increment the state value (or advance it via a LCG). In
other words, low-period PRNG using a cryptographic block cipher must
use it in CTR mode unless the cipher itself has proper period when
composed with itself (which is extremely unlikely but easily testable
when the period is bounded by 2^32).

In any case, I think the extreme low quality of rand_r qualifies as a
bug. I'm not partial to any particular fix, but any fix should have:

- maximal possible period given the constraint of 32-bit state, i.e.
  period of 2^32.

- no bias (equal frequency of all outputs)

- minimal/no statistical flaws other than those mandated by the
  constraint of short period (which in turn comes from the constraint
  of 32-bit state).
Comment 5 Ondrej Bilka 2013-06-25 06:58:21 UTC
On Fri, Jun 14, 2013 at 03:37:30PM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615
> 
> --- Comment #4 from Rich Felker <bugdal at aerifal dot cx> ---
> On Fri, Jun 14, 2013 at 12:10:59PM +0000, neleai at seznam dot cz wrote:
> > To test rand_r equivalent I wrote a simple generator (which is for
> > mostly to test performance, I did not look for quality.)
> > 
> >   movd    (%rdi),%xmm0
> >   movdqa %xmm0,%xmm1
> > 
> >   aesenc %xmm0,%xmm1
> >   aesenc %xmm0,%xmm1
> >   aesenc %xmm0,%xmm1
> >   aesenc %xmm0,%xmm1
> >   movd %xmm1, (%rdi)
> >   movd %xmm1, %eax
> >   shr $1, %eax
> 
> There's no reason to believe this code will have acceptable period or
> be unbiased. Instead of storing the AES result back to the state, you

Well I wrote above 

> > To test rand_r equivalent I wrote a simple generator (which is for
> > mostly to test performance, I did not look for quality.)
Which was only to show that using cryptographic functions is not too
slow. (You can alternatively use CRC32 instruction.)

I am still not convinced that changing implementation is improvement as
everybody which cares about quality uses random_r. 

I would accept an warning that rand_r is weak and one should use
random_r.
Comment 6 Rich Felker 2013-06-25 12:25:46 UTC
On Tue, Jun 25, 2013 at 06:58:21AM +0000, neleai at seznam dot cz wrote:
> I am still not convinced that changing implementation is improvement as
> everybody which cares about quality uses random_r. 

It's definitely an improvement in quality. This has been measured
extensively, and much of the improvement is not just empirical but
provable mathematically (uniformity).

The only question in my mind is whether there are applications which
depend on the existing low quality, e.g. to keep generating the same
outputs based on the same seeds.

> I would accept an warning that rand_r is weak and one should use
> random_r.

random_r is not portable. I agree rand_r is low quality, but the
motivation for using it is that it's the only portable prng that's
thread-safe, restartable, and has thread-local state (so that its
output is not affected by simultaneous use in other threads). I would
not promote its use (an equally-trivial 64- or 128-bit-state prng is
just as small and easy to write and can easily be dropped into any
application) but I can think of two potential reasons for wanting a
rand_r with the best quality it can provide within its interface
limitations:

1. Applications currently using rand_r that have not been tested
heavily, at least not on glibc.

2. Naive "fixing" of libraries that use rand to use rand_r by
programmers not aware of the quality issues with rand_r.