This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.



Re: True Random Numbers


Robert,

Have you considered the view that putting pseudo-random number
generators through statistical tests to see when they fail is like
going fishing and taking a picture when it fails?  This was Pierre
L'Ecuyer's argument for putting pseudo-random number generators
through application tests instead.


Ken
-----------------------------------------------------------------------
News: OptimaNumerics Libraries 3.0: Over 200% Faster
-----------------------------------------------------------------------
C. J. Kenneth Tan, Ph.D.
OptimaNumerics Ltd.
E-mail: cjtan@OptimaNumerics.com      Telephone: +44 798 941 7838    
Web: http://www.OptimaNumerics.com    Facsimile: +44 289 066 3015 
-----------------------------------------------------------------------




On 2004-11-15 10:47 -0500 Robert G. Brown (rgb@phy.duke.edu) wrote:

> Date: Mon, 15 Nov 2004 10:47:11 -0500 (EST)
> From: Robert G. Brown <rgb@phy.duke.edu>
> To: C J Kenneth Tan -- OptimaNumerics <cjtan@OptimaNumerics.com>
> Cc: Rodney Sparapani <rsparapa@post.its.mcw.edu>, hobbsk@ohiou.edu,
>     gsl-discuss@sources.redhat.com
> Subject: Re: True Random Numbers
> 
> On Mon, 15 Nov 2004, C J Kenneth Tan -- OptimaNumerics wrote:
> 
> > 
> > 
> > On 2004-11-05 14:17 -0600 Rodney Sparapani (rsparapa@post.its.mcw.edu) wrote:
> > 
> > > I guess it is possible.  I think the main problem with true random numbers is 
> > > that you can't reproduce them as needed.  That's why pseudo-random numbers are 
> > > so popular.  As long as you keep track of your seed, then you can replay the 
> > > sequence when necessary.
> > > 
> > > >Is it possible, or easy, to use the GSL random number distributions with
> > > >true random number devices?  I'm thinking of a generator that really
> > > >just reads from /dev/random or a device driver for one of the really
> > > >fancy hardware random number devices.
> > 
> > Another problem is that you cannot verify the quality of random
> > numbers obtained from physical sources.  It has also been shown by
> > George Marsaglia that physical sources which may be expected to be
> > "random" are actually not good sources of randomness.  This was
> > also covered in Knuth's "Art of Computer Programming".
> 
> An excellent point; this is one of the reasons that I've been working on
> the testing program.  A member of our department has discovered an
> optical process that he believes might generate truly random bits, and
> talked to me about how he might go about testing the output of his
> prototype device for randomness.  Interestingly, although the process is
> (supposedly) random, the noise it produces is not white -- the signal
> has a distribution.  There are ways of taking a random signal that isn't
> uniform and transforming it into one that is first-order uniform (that
> is, has the right 0/1 bit balance), but a) the transformation costs two
> bits to make one bit, so it slows the generation process; and b) it
> leaves open the question of whether >>higher<< order bit correlations
> are correctly (randomly) distributed, e.g. do 00, 01, 10, 11 occur in
> the right proportions?  How about 000, 001, 010, etc?  Many of the
> supposedly "random" physical processes that generate the bits have (if
> you look closely) the right first order correlations but the wrong
> second or higher order correlations.  Even supposedly purely random
> quantum processes such as photon counting from a fluorescent source can
> fail, because while single photon emission from an atom known to be
> precisely in the excited state is random enough, photon emission in
> fluorescence is antibunched.  Not even quantum processes are TRULY
> random; at best they are unpredictable.
> 
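The uniformizing transformation mentioned above -- paying two input bits
for at most one output bit -- matches the classic von Neumann debiasing
trick.  A minimal sketch (hypothetical Python, not rand_rate or GSL
code; `von_neumann_extract` is an assumed name), valid only under the
assumption that successive input bits are independent:

```python
import random

def von_neumann_extract(bits):
    """Debias a bit stream: look at non-overlapping pairs, emit 0 for
    (0,1) and 1 for (1,0), and discard (0,0) and (1,1).  Each output
    bit therefore consumes at least two input bits."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

# A heavily biased source: 1 with probability 0.8.
rng = random.Random(42)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(100000)]
whitened = von_neumann_extract(biased)

# The output balance is close to 50/50 even though the input is 80/20,
# but only because the simulated input bits are independent -- exactly
# the higher-order property the transformation does NOT guarantee.
print(sum(biased) / len(biased))      # ~0.8
print(sum(whitened) / len(whitened))  # ~0.5
```

Note that the output rate is P(01) + P(10) = 2p(1-p) per pair, so a
badly biased source also generates bits much more slowly.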
> Testing generators is a highly nontrivial process.  As I am fairly
> actively rewriting Marsaglia's diehard and the NIST STS tests into a
> GPL shell tightly integrated with the GSL (from their descriptions in
> his documentation and/or the literature, as most of them come from
> the literature), it is becoming clear that even these tests are not
> enough.  Most of
> the diehard implementations of these tests date from when an IBM PC was
> an adequate testing platform and/or when the actual tests were run on an
> IBM mainframe from compiled OLD fortran code.  They were designed to be
> run on sequences of a few thousand rands -- the biggest tests on a few
> million -- read in from a file.  As I scale some of the tests up, I'm
> learning things about even the "good" GSL generators that suggest that
> they aren't "perfect" -- in many cases the question may not be whether
> a generator fails a diehard test, but at what POINT it fails a diehard
> test scaled up to a level appropriate to a generator that might have
> to produce 10^18+ rands over the course of a single computation (where
> samples of only a few thousand or even a few million represent a tiny
> fraction of the sequence).
> 
> Anyway, I spent a LOT of hours working on rand_rate over the weekend,
> and added three diehard tests (runs, birthdays, and 3dsphere) to the
> small list of bitlevel tests I already had implemented.  This required
> that I write a couple of Kolmogorov-Smirnov tests for the uniformity of
> the p's produced in these tests over independent runs (the regular test
> for max|observed - expected| over the cumulative sorted list, and the
> Kuiper variant, which sums the largest positive and largest negative
> deviations and treats the tails better).  I still need to do
> Anderson-Darling (which is what Marsaglia actually used) and also need
> to do all the variants that test a vector supposedly drawn from an
> arbitrary known distribution against the actual distribution.
> 
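The two statistics described above can be sketched as follows (a
hypothetical Python illustration, not the actual rand_rate code;
`ks_and_kuiper` is an assumed name).  The KS statistic is the largest
deviation between the empirical CDF and the target CDF; Kuiper's
statistic adds the largest positive and largest negative deviations:

```python
import random

def ks_and_kuiper(sample):
    """KS statistic D = max(D+, D-) and Kuiper's V = D+ + D- for a
    sample tested against Uniform(0,1).  Kuiper's statistic weights
    deviations in both tails equally, which plain KS does not."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus + d_minus

rng = random.Random(1)
uniform = [rng.random() for _ in range(10000)]
skewed = [rng.random() ** 2 for _ in range(10000)]  # clearly non-uniform

d_u, v_u = ks_and_kuiper(uniform)
d_s, v_s = ks_and_kuiper(skewed)
print(d_u, v_u)  # small: consistent with uniformity
print(d_s, v_s)  # large: uniformity clearly rejected
```

For the squared-uniform sample the true CDF is sqrt(x), so D converges
to max|sqrt(x) - x| = 0.25, far above any sensible critical value.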
> Today I'll probably add the diehard "minimum distance test" which is
> really "2D spheres" (the same test as 3d spheres, but in 2D).  These
> tests make me a bit nervous because the test (target) distributions
> aren't necessarily known analytically, so the test statistic contains
> a number that is itself evaluated by "extensive simulation".  This
> raises a variety of interesting questions about whether these tests CAN be
> safely cranked up -- at some point one expects that a modern "good"
> generator such as the mt19937 will end up revealing the failure of the
> older generators used to numerically determine the target distributions
> used in the test code and not the other way around...
> 
> I added a major.minor version number e.g. [0].[4.31] to the project
> abstract link for rand_rate so people who are playing with this along
> with me can see when I add a test (incrementing minor by 0.1), fix an
> important bug (incrementing minor by 0.01), or do a major structural
> rewrite to incorporate ad hoc tools into a uniform interface available
> to all the tests (bump minor by 1).  I probably won't bump major until I
> "finish" incorporating all the existing diehard and sts tests.
> 
> I've also got a few other ideas for extending the already tight
> integration between the tests and the GSL, including implementing a
> "distribution test" sequence that basically generates a random sample
> drawn from a GSL-supported distribution and performs a KS test on it,
> for a given/selected GSL rng.  This is actually a very simple test of
> both the GSL routine generating the distribution and the underlying rng
> (most of the tests of rngs are of exactly this structure, but for some
> exotic distributions).  These are listed in the project abstract.  The
> man page and
> usage documentation will likely lag development for a while as I add
> tests, BTW.
> 
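The structure of such a distribution test -- feed an underlying rng
through a sampler for a known distribution, then KS-test the output
against the analytic CDF -- might be sketched like this (hypothetical
Python, not GSL or rand_rate code; the exponential distribution and its
inverse-transform sampler are chosen purely for illustration):

```python
import math
import random

def ks_statistic(sample, cdf):
    """One-sample KS statistic of `sample` against a known CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

# The "distribution test": take an underlying uniform rng, push it
# through a sampler for a target distribution (exponential with mean
# mu, via inverse transform), and KS-test the output against the
# target's analytic CDF.  A failure implicates either the sampler or
# the underlying rng.
rng = random.Random(7)
mu = 2.0
sample = [-mu * math.log(1.0 - rng.random()) for _ in range(20000)]
exp_cdf = lambda x: 1.0 - math.exp(-x / mu)

d = ks_statistic(sample, exp_cdf)
# An often-quoted 1% critical value for large n is about 1.63/sqrt(n).
print(d, 1.63 / math.sqrt(len(sample)))
```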
> If anybody else is interested in participating in this, please let me
> know.  I've got one more major structural rewrite planned -- the need
> for it has become evident as I implemented the diehard tests, which use
> a slightly different validation process (using e.g. KS) compared to the
> STS tests (which typically do not use KS).  The planned interface will
> permit a variety of ways of looking at result histograms of p-values
> to determine whether a low p is just Marsaglia's "p happens" or a real
> signal of rng weakness.  Of course, by the time I'm done it will be a
> true miracle if a weak rng fails only one test of the battery that
> will be available, and since each test in the rand_rate framework
> already has a variety of adjustable parameters (where possible or
> appropriate), one can also crank a test up until a failure that is
> marginal at the existing diehard or sts levels becomes glaringly
> apparent.  I've had great fun doing that with some of the tests
> already implemented, using known-bad generators such as randu, which
> can sometimes manage not to obviously fail some of the diehard or sts
> tests at their existing parameterization, but which will generally
> fail spectacularly when pushed to a higher level.
> 
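One simple way of looking at a histogram of p-values from many runs, as
described above, is a chi-square test of the binned p's against
uniformity (a hypothetical Python sketch, not rand_rate code;
`chisq_uniform` is an assumed name, and the cubed-uniform "bad" sample
just simulates p's piling up near zero):

```python
import random

def chisq_uniform(pvalues, bins=10):
    """Chi-square statistic for the hypothesis that `pvalues` is
    uniform on [0,1): bin the p's and compare observed counts to the
    expected count per bin.  Under uniformity the statistic has
    roughly bins-1 degrees of freedom (expected value bins-1)."""
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1
    expected = len(pvalues) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(3)

# "p happens": among many runs of a good generator some individual p's
# WILL be small; only a distorted *distribution* of p's over runs
# signals real weakness.
good = [rng.random() for _ in range(1000)]
bad = [rng.random() ** 3 for _ in range(1000)]  # p's piled near 0

print(chisq_uniform(good))  # expected value is bins-1 = 9 under uniformity
print(chisq_uniform(bad))   # hugely inflated: distribution is distorted
```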
>     rgb
> 
> > 
> > 
> > Kenneth Tan
> > -----------------------------------------------------------------------
> > News: OptimaNumerics Increases Performance of World's Fastest IBM
> >       eServer BladeCenter JS20 Supercomputer!
> > -----------------------------------------------------------------------
> > C. J. Kenneth Tan, Ph.D.
> > OptimaNumerics Ltd.
> > E-mail: cjtan@OptimaNumerics.com      Telephone: +44 798 941 7838    
> > Web: http://www.OptimaNumerics.com    Facsimile: +44 289 066 3015 
> > -----------------------------------------------------------------------
> > 
> 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu
> 
> 
> 

