Created attachment 7075 [details] test program to generate data for analysis by dieharder Implementing a decent rand_r is very tricky because the interface requirement forces the full PRNG state to fit in 32 bits; this rules out pretty much all good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be. glibc's rand_r is based on the LCG published in the C standard: next = next * 1103515245 + 12345; return next / 65536 % 32768; This sample code in the standard assumes RAND_MAX == 32767. The lowest bit of the output has period 2^17 and the highest bit has period 2^31. As such, if only some of the bits are to be used, the highest bits should be kept and the lower bits discarded; this is the principle the sample code above is using for generating 15-bit output from a 32-bit internal state. However, glibc is trying to generate 31-bit output from a 32-bit state. Just discarding the lowest bit would not be acceptable (the new lowest bit would have period 4), so something different needs to be done. glibc's approach is to advance the LCG 3 times (since 3 and 2^32 are relatively prime, this does not hurt the period of the internal state) and take 10-11 bits from each iteration to make a 31-bit output. Unfortunately, this approach has 2 serious flaws: 1. The 10-11 bits taken are bits 16-25 or 16-26, which have moderately low period, rather than bits 21-31 or 22-31 which would have maximal period. Thus, a lot of the potential period length is being thrown away. 2. Concatenating the bits from multiple steps, rather than using a single step of the LCG, introduces bias into the output. The frequency of each possible output is not the same; instead, some outputs occur much more often, and some not at all. Bias is especially noticable when the period is short, which it necessarily is with a 32-bit state, and which is made much worse by item 1 above. If nothing else, item 1 should be fixed by adjusting the bitshifts to use the highest available bits, rather than the middle bits, to obtain the 3 10-11 bit values. This is straightforward and a pure improvement. To fully fix rand_r, the approach of concatenating multiple iterations should be abandoned in favor of a single-LCG-iteration approach followed by an invertable transformation on the output. Obviously a 32-bit cryptographic block cipher would give the best statistical properties, but it would be slow. In musl, we are now using the "temper" function from mersenne twister. Another solution that seems to work is performing a mix of bit rotations, bswap, and multiplication by well-chosen odd numbers, applied to the output of the LCG. I am attaching a test program for use with the dieharder PRNG test suite that demonstrates the low quality of glibc's rand_r. In particular, at least the OPSO, OQSO, and DNA tests show problems (tests 5, 6, and 7) and offhand it makes sense that this kind of bias would affect them. Note that, since dieharder requires 32-bit input as opposed to 31-bit input, I slightly modified the glibc algorithm to pull 11,11,10 bits instead of 11,10,10 bits, and included it in the test program rather than calling glibc's rand_r. Producing tests that work directly with glibc's rand_r to demonstrate the problem would be a lot more work. To run the test, compile glibc_rand_r.c and pipe output to dieharder: ./glibc_rand_r | dieharder -a -g 200
On Wed, Jun 12, 2013 at 11:39:09PM +0000, bugdal at aerifal dot cx wrote: > http://sourceware.org/bugzilla/show_bug.cgi?id=15615 > > Bug ID: 15615 > Summary: Poor quality output from rand_r > Product: glibc > Version: unspecified > Status: NEW > Severity: normal > Priority: P2 > Component: libc > Assignee: unassigned at sourceware dot org > Reporter: bugdal at aerifal dot cx > CC: drepper.fsp at gmail dot com > > Created attachment 7075 [details] > --> http://sourceware.org/bugzilla/attachment.cgi?id=7075&action=edit > test program to generate data for analysis by dieharder > > Implementing a decent rand_r is very tricky because the interface requirement > forces the full PRNG state to fit in 32 bits; this rules out pretty much all > good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be. > > glibc's rand_r is based on the LCG published in the C standard: > > next = next * 1103515245 + 12345; > return next / 65536 % 32768; > A problem here is that for many users predictability is much more important than quality. Developer expects that when he uses rand_r with state that he controls will not vary. This can cause extra debbuging hastle when code mysteriously fails on one machine but not other or desync issues. > To fully fix rand_r, the approach of concatenating multiple iterations should > be abandoned in favor of a single-LCG-iteration approach followed by an > invertable transformation on the output. Obviously a 32-bit cryptographic block > cipher would give the best statistical properties, but it would be slow. In This is false, I have a replacement of this with four rounds of AES. On intel using aesenc I performance is better than current, I did not propose that due of problems above. I wrote a RFC for random replacement on libc-alpha, browse archives.
On Thu, Jun 13, 2013 at 08:26:27AM +0000, neleai at seznam dot cz wrote: > A problem here is that for many users predictability is much more > important than quality. Developer expects that when he uses rand_r with > state that he controls will not vary. This can cause extra debbuging hastle > when > code mysteriously fails on one machine but not other or desync issues. Could you explain better what you're concerned about? By "predictable", do you mean keeping the same sequence it's had in the past? Aside from that, any PRNG with 32-bit state and 31-bit output is equally "predictable". > > To fully fix rand_r, the approach of concatenating multiple iterations should > > be abandoned in favor of a single-LCG-iteration approach followed by an > > invertable transformation on the output. Obviously a 32-bit cryptographic block > > cipher would give the best statistical properties, but it would be slow. In > > This is false, I have a replacement of this with four rounds of AES. On > intel using aesenc I performance is better than current, I did not > propose that due of problems above. I wrote a RFC for random > replacement on libc-alpha, browse archives. AES itself does not use 32-bit blocks, so you must be using a modified version. Would you care to explain? I searched the archives but could not find your post.
On Thu, Jun 13, 2013 at 12:38:42PM +0000, bugdal at aerifal dot cx wrote: > http://sourceware.org/bugzilla/show_bug.cgi?id=15615 > > --- Comment #2 from Rich Felker <bugdal at aerifal dot cx> --- > On Thu, Jun 13, 2013 at 08:26:27AM +0000, neleai at seznam dot cz wrote: > > A problem here is that for many users predictability is much more > > important than quality. Developer expects that when he uses rand_r with > > state that he controls will not vary. This can cause extra debbuging hastle > > when > > code mysteriously fails on one machine but not other or desync issues. > > Could you explain better what you're concerned about? By > "predictable", do you mean keeping the same sequence it's had in the > past? Aside from that, any PRNG with 32-bit state and 31-bit output is > equally "predictable". > > > > To fully fix rand_r, the approach of concatenating multiple iterations should > > > be abandoned in favor of a single-LCG-iteration approach followed by an > > > invertable transformation on the output. Obviously a 32-bit cryptographic block > > > cipher would give the best statistical properties, but it would be slow. In > > > > This is false, I have a replacement of this with four rounds of AES. On > > intel using aesenc I performance is better than current, I did not > > propose that due of problems above. I wrote a RFC for random > > replacement on libc-alpha, browse archives. > > AES itself does not use 32-bit blocks, so you must be using a modified > version. Would you care to explain? I searched the archives but could > not find your post. > Here, I wrote a version relevant to random. I did this to see how fast I could get if I employ paralellism and inlining. http://sourceware.org/ml/libc-help/2012-12/msg00005.html To test rand_r equivalent I wrote a simple generator (which is for mostly to test performance, I did not look for quality.) movd (%rdi),%xmm0 movdqa %xmm0,%xmm1 aesenc %xmm0,%xmm1 aesenc %xmm0,%xmm1 aesenc %xmm0,%xmm1 aesenc %xmm0,%xmm1 movd %xmm1, (%rdi) movd %xmm1, %eax shr $1, %eax On sandy bridge this code runs at half of speed of rand_r.
On Fri, Jun 14, 2013 at 12:10:59PM +0000, neleai at seznam dot cz wrote: > To test rand_r equivalent I wrote a simple generator (which is for > mostly to test performance, I did not look for quality.) > > movd (%rdi),%xmm0 > movdqa %xmm0,%xmm1 > > aesenc %xmm0,%xmm1 > aesenc %xmm0,%xmm1 > aesenc %xmm0,%xmm1 > aesenc %xmm0,%xmm1 > movd %xmm1, (%rdi) > movd %xmm1, %eax > shr $1, %eax There's no reason to believe this code will have acceptable period or be unbiased. Instead of storing the AES result back to the state, you should simply increment the state value (or advance it via a LCG). In other words, low-period PRNG using a cryptographic block cipher must use it in CTR mode unless the cipher itself has proper period when composed with itself (which is extremely unlikely but easily testable when the period is bounded by 2^32). In any case, I think the extreme low quality of rand_r qualifies as a bug. I'm not partial to any particular fix, but any fix should have: - maximal possible period given the constraint of 32-bit state, i.e. period of 2^32. - no bias (equal frequency of all outputs) - minimal/no statistical flaws other than those mandated by the constraint of short period (which in turn comes from the constraint of 32-bit state).
On Fri, Jun 14, 2013 at 03:37:30PM +0000, bugdal at aerifal dot cx wrote: > http://sourceware.org/bugzilla/show_bug.cgi?id=15615 > > --- Comment #4 from Rich Felker <bugdal at aerifal dot cx> --- > On Fri, Jun 14, 2013 at 12:10:59PM +0000, neleai at seznam dot cz wrote: > > To test rand_r equivalent I wrote a simple generator (which is for > > mostly to test performance, I did not look for quality.) > > > > movd (%rdi),%xmm0 > > movdqa %xmm0,%xmm1 > > > > aesenc %xmm0,%xmm1 > > aesenc %xmm0,%xmm1 > > aesenc %xmm0,%xmm1 > > aesenc %xmm0,%xmm1 > > movd %xmm1, (%rdi) > > movd %xmm1, %eax > > shr $1, %eax > > There's no reason to believe this code will have acceptable period or > be unbiased. Instead of storing the AES result back to the state, you Well I wrote above > > To test rand_r equivalent I wrote a simple generator (which is for > > mostly to test performance, I did not look for quality.) Which was only to show that using cryptographic functions is not too slow. (You can alternatively use CRC32 instruction.) I am still not convinced that changing implementation is improvement as everybody which cares about quality uses random_r. I would accept an warning that rand_r is weak and one should use random_r.
On Tue, Jun 25, 2013 at 06:58:21AM +0000, neleai at seznam dot cz wrote: > I am still not convinced that changing implementation is improvement as > everybody which cares about quality uses random_r. It's definitely an improvement in quality. This has been measured extensively, and much of the improvement is not just empirical but provable mathematically (uniformity). The only question in my mind is whether there are applications which depend on the existing low quality, e.g. to keep generating the same outputs based on the same seeds. > I would accept an warning that rand_r is weak and one should use > random_r. random_r is not portable. I agree rand_r is low quality, but the motivation for using it is that it's the only portable prng that's thread-safe, restartable, and has thread-local state (so that its output is not affected by simultaneous use in other threads). I would not promote its use (an equally-trivial 64- or 128-bit-state prng is just as small and easy to write and can easily be dropped into any application) but I can think of two potential reasons for wanting a rand_r with the best quality it can provide within its interface limitations: 1. Applications currently using rand_r that have not been tested heavily, at least not on glibc. 2. Naive "fixing" of libraries that use rand to use rand_r by programmers not aware of the quality issues with rand_r.