Random Number Seed

Robert G. Brown rgb@phy.duke.edu
Mon Mar 1 17:55:00 GMT 2004


On Mon, 1 Mar 2004, Przemyslaw Sliwa wrote:

> Hi,
> 
> I have a question:
> When one wants to use a random number seed different from the default one (equal to 0), one can set the environment variable GSL_RNG_SEED=seed from the command line. I would like to use the system time as the seed and have no idea how to do that from the command line. Therefore I want to use the function clock() in my C program. Could you help me with how the seed can be initialized from the function clock() within my C program?

This is getting to be a faq.  Here is a short discursion on seeds yet
again.

Depending on the rng chosen, using the clock as a seed ranges from a
maybe-safe bad idea to a really BAD bad idea.  Obviously the seeds on
all jobs started in (say) any given hour will have substantial bit-level
correlations.  Whether or not those bit-level correlations will cause
supposedly "independent" jobs started with nearby seeds to exhibit
unexpected correlations depends in part on the quality of the rng
selected, but LOTS of the GSL rngs are not terribly high quality and
would be likely to exhibit the problem.
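To put a rough number on "substantial", assume (for this estimate only) that every job is seeded with the clock's seconds count and that the seed is a 32-bit integer. If all jobs start within the same hour, beginning at second t_0, their seeds s_i satisfy:

```latex
t_0 \le s_i < t_0 + 3600
\quad\Longrightarrow\quad
|s_i - s_j| < 3600 < 2^{12},
\qquad
\frac{3600}{2^{32}} \approx 8.4 \times 10^{-7}
```

So every seed falls in a window covering less than a millionth of the seed space, and because that window is narrower than 2^12 it contains at most one multiple of 2^12: the top 20 bits of all those seeds take at most two distinct values.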

Seeding by hand can also be problematic as humans have a hard time
selecting random unsigned long integers from the full range of available
values.

The "best" solution (in my opinion) for seeding a rng to get unique rng
series in disparate computations (so one can, for example, apply
statistics safely to results from the computations under the assumption
that those results are "independent, identically distributed" numbers
according to the requirements of statistics and the central limit
theorem) is to do the following:

  a) Use an rng with a very, very, very,...very long period.  The period
really should be long enough that all of your samplings from the rng are
"unlikely" to overlap.

  b) Use a "high quality" rng, one that passes the Diehard suite or most
of the NIST/FPE suite of tests of randomness.  

The default GSL rng, mt19937, is a very good choice wrt both a) and b).
It has a period of 2^19937 - 1, which is, yes, a very large number, and it
has passed the Diehard tests.  It is also pretty fast -- one of the faster
generators in the GSL suite.

  c) Seed the generator from /dev/random when it is available.
/dev/random is slow and unsuitable for Monte Carlo sampling in most
cases, but it is highly "unpredictable" and appears to do well on
bit-level randomness tests.  It is almost certainly adequate and may
even be ideal.  Note that EVEN mt19937 had problems with bit
correlations caused by certain seeds -- the current version is
supposedly fixed but it still cannot hurt at all to use the most random
seed you have available.

  d) If you DO want to ensure that all your samplings drawn from each
seed are unique, record the seeds and use them to label your answers in
such a way that IF by any miracle you get two seeds that are identical,
the answer derived from those two runs is only counted once.  Of course,
if one is pulling seeds from bit-level-random unsigned long ints, this
will make no observable difference to the answer in most cases, but it is
still good practice.

  e) Only if /dev/random is not available should you consider using the
clock.  In that case you can use a bit of common sense to determine
whether or not to take extra measures.  If you're writing a game, don't
bother.  If you're doing simulations, you MIGHT want to use the clock to
seed one (good) rng, and use the first rng to determine e.g. a
bit-shuffling or other "randomization" of the original seed to create a
new, less obviously correlated seed for the second (better) rng.  I don't
have an explicit theoretical foundation for this (although there may be
one), but intuitively doing this in two stages with good rngs will break
up bit-level correlations in the second while diluting overall seed-based
correlation by something like the product of the available phase spaces.

A code snippet for seeding from /dev/random (with a fallback to the
clock) is included below.  When it has to fall back, it returns an
unsigned long integer with at least some of its bits set by the faster
microsecond-scale clock in gettimeofday().  If you prefer, you could use
only the seconds portion of this.  It is important to note that the
addition it uses has a distinct nonzero probability of returning the same
seed for different clock readings, but it is generally more "random";
using seconds alone is very strongly correlated (and will OFTEN return the
same seed value if multiple jobs are started per second, or on a cluster
where there is a bit of clock drift).

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

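#include <stdio.h>
#include <sys/time.h>

/* verbose and D_SEED were defined elsewhere in the original program;
   they are defined here only so that the snippet compiles on its own. */
#define D_SEED 1
static int verbose = 0;

/* Return a seed, preferably from /dev/random, otherwise from the clock. */
unsigned long int random_seed(void)
{

 unsigned long int seed;
 struct timeval tv;
 FILE *devrandom;

 /* Read sizeof(seed) bytes from /dev/random (on some systems this can
    block until the kernel has gathered enough entropy). */
 if ((devrandom = fopen("/dev/random","r")) != NULL &&
     fread(&seed,sizeof(seed),1,devrandom) == 1) {
   if(verbose == D_SEED) printf("Got seed %lu from /dev/random\n",seed);
 } else {
   /* Fallback: seconds plus microseconds from gettimeofday(). */
   gettimeofday(&tv,NULL);
   seed = (unsigned long int) tv.tv_sec + (unsigned long int) tv.tv_usec;
   if(verbose == D_SEED) printf("Got seed %lu from gettimeofday()\n",seed);
 }
 if (devrandom != NULL) fclose(devrandom);

 return(seed);

}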
