Re: GSL 1.6 random numbers (gauss.c and rng.c)
- From: James Theiler <jt at lanl dot gov>
- To: Charles Karney <ckarney at sarnoff dot com>
- Cc: gsl-discuss <gsl-discuss at sources dot redhat dot com>
- Date: Wed, 26 Jan 2005 10:52:03 -0700 (MST)
- Subject: Re: GSL 1.6 random numbers (gauss.c and rng.c)
Hi Charles,
And thanks for the comments, which with this email I am cc'ing
to the gsl-discuss list. To the list, I am recommending that we
adopt both of his suggestions, though perhaps we should think a bit
about whether there is an even better solution to the issue he raised
in the gsl_rng_uniform_int routine.
On Wed, 26 Jan 2005, Charles Karney wrote:
] Hi!
]
] We exchanged some E-mail back in Feb 2003 concerning implementations of
] the Walker algorithm.
And a valuable exchange it was, as I recall; thanks again for those
comments as well.
]
] Now, I've got some comments on
]
] gsl_ran_gaussian_ratio_method
] gsl_rng_uniform_int
]
] in GSL 1.6
]
] (a) [MINOR NIT] In gsl_ran_gaussian_ratio_method, the last digit of the
] constant for sqrt(8/e) is wrong. Worse still, the number you have
] is too small. Rounding the result UP still gives the correct
] results, however (at the expense of a few extra rejects). So you
] might replace
]
]       x = 1.71552776992141359295 * (v - 0.5) / u;
] by
]       x = 1.715527769921413593 * (v - 0.5) / u;
] or
]       x = 1.71552777 * (v - 0.5) / u;
]
] In my implementation I round up after additional digits:
]
] 1.71552776992141359296037928256
Hmmm, if x were used solely as the rejection criterion, then I would
be inclined to agree that there is no harm to accuracy (and utterly
negligible harm to efficiency) in using a rounded-up value for the
constant.
But it's a little more complicated since x is also used as part of the
returned variate. You may still be right that it doesn't matter,
maybe everything just scales up, but it's not obvious to me.
(Ok, on further thought, I've decided it's not obvious to me that
rounding up is valid even at just the rejection stage ... but this
is undoubtedly just a failing on my part; I'm not saying you are
wrong, just that it's not obvious to me that rounding up is
harmless.)
However, I just used 'bc' to verify your constant, and I agree with you
on that; the very least we should do is change that trailing '295' to
'296'. I'd be happy to recommend changing that '295' to just '3' --
it's still bound to be adequately precise and it does round up.
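For the record, a one-liner along these lines is enough to check it
(e(1) is exp(1) in bc's -l math library; sqrt is built in):

    echo "scale=30; sqrt(8/e(1))" | bc -l

and the digits it prints agree with the trailing '296' and with the
longer value you quote above.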
But what we should do is follow your practice, and use the still more
precise number. No harm to us in being more precise, and if the
round-up is a conceptually better thing, then all the better.
]
] (b) [EFFICIENCY] Leva gives quadratic tests to perform in
] gsl_ran_gaussian_ratio_method which result in log being evaluated
] 0.012 times per normal deviate instead of 1.369 times. (Knuth's tests
] result in 0.232 logs/deviate). Depending on the relative speed of
] log vs other arithmetic operations, this might speed up
] gsl_ran_gaussian_ratio_method significantly. The reference is
]
] J. L. Leva, A Fast Normal Random Number Generator,
] ACM TOMS 18, 449-453 and 454-455 (1992).
Worth looking into. In my experience, logs are pretty fast on machines
I use (eg, Pentiums), so that the actual gains in reducing their
number are often not as substantial as one might hope. But I haven't
done any benchmarks lately, and besides, if this is a better
algorithm, we should at least look into it.
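For concreteness, here is roughly how Leva's bounds would slot into
the existing ratio-method loop. This is only a sketch -- the function
name is just for illustration, and the constants (s, t, a, b, r1, r2)
are the ones from the TOMS paper as I recall them, so they would need
to be checked against the reference before we adopt anything:

    #include <math.h>
    #include <gsl/gsl_rng.h>

    double
    gaussian_ratio_leva (const gsl_rng * r, const double sigma)
    {
      double u, v, x, y, Q;
      const double s = 0.449871, t = -0.386595;   /* Leva's constants */
      const double a = 0.19600, b = 0.25472;
      const double r1 = 0.27597, r2 = 0.27846;

      do
        {
          /* (u,v) uniform in a rectangle enclosing the acceptance
             region v^2 <= -4 u^2 log(u); 1.7156 rounds sqrt(8/e) up.
             u is forced into (0,1] so log(u) and v/u are safe. */
          u = 1 - gsl_rng_uniform (r);
          v = (gsl_rng_uniform (r) - 0.5) * 1.7156;

          /* Leva's quadratic form: accept immediately if Q < r1,
             reject immediately if Q > r2, and only otherwise pay
             for the log test in the while clause below. */
          x = u - s;
          y = fabs (v) - t;
          Q = x * x + y * (a * y - b * x);
        }
      while (Q >= r1 && (Q > r2 || v * v > -4 * u * u * log (u)));

      return sigma * (v / u);
    }

If the constants are right, the log should only be evaluated about
0.012 times per deviate, as Charles says.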
]
] (c) [GENERALIZE?] In gsl_rng_uniform_int you quit if n > range. However
] within that test you could allow the case n-1 == range. Thus
]
] if (n > range)
]   {
]     if (n - 1 == range)
]       return ((r->type->get) (r->state)) - offset;
]     GSL_ERROR_VAL ("n exceeds maximum value of generator",
]                    GSL_EINVAL, 0) ;
]   }
]
] (Of course if range = maxint - 1, n would not be representable...)
]
I have to admit, I haven't looked at this particular function before.
It is confusing to me; it looks like range is defined in a funny way:
eg, if min=0 and max=99, shouldn't the range be 100?  Or, if we are
going to define range=max-min, then we should define the scale
differently, as

    unsigned long int scale = (range + 1)/n;

where the "+ 1" is added to what we have now. Then the rest proceeds
correctly and even more efficiently.
...except when r->type->max is the maximum unsigned long (ie,
0xffffffffUL), which is the case for a lot of the generators. Then
the "+ 1" causes an overflow.
So the current algorithm does give correct values; it just isn't as
efficient as it could be (if there were a nice way to deal with this
overflow). Eg, if n=(range+1)/2 -- think n=50 in the case above
where min=0 and max=99 -- then the do/while loop is executed
twice as often as is actually necessary.
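Spelling that case out:

    range = max - min       = 99
    scale = range / n       = 99/50  = 1  ->  k ranges over 0..99; about half rejected
    scale = (range + 1) / n = 100/50 = 2  ->  k ranges over 0..49; nothing rejected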
However, as Charles points out, the current algorithm isn't quite
correct when n-1==range. His fix solves that problem.
Another alternative would be something like

    scale = ((range+1) > range ? range+1 : range)/n;

except I'm not really sure it would work, and yes, it is ugly.
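Just so we're all looking at the same thing, here is roughly what the
function would look like with Charles's special case folded in. This
is a sketch only -- the surrounding code is paraphrased from rng.c
from memory rather than copied -- so treat it as illustrative:

    unsigned long int
    gsl_rng_uniform_int (const gsl_rng * r, unsigned long int n)
    {
      unsigned long int offset = r->type->min;
      unsigned long int range = r->type->max - offset;
      unsigned long int scale, k;

      if (n > range)
        {
          /* Charles's case: n == range + 1 is still satisfiable,
             by returning the raw generator output */
          if (n - 1 == range)
            return ((r->type->get) (r->state)) - offset;

          GSL_ERROR_VAL ("n exceeds maximum value of generator",
                         GSL_EINVAL, 0);
        }

      /* note: range/n, not (range+1)/n, to avoid the overflow
         discussed above; this is the source of the inefficiency */
      scale = range / n;

      do
        {
          k = (((r->type->get) (r->state)) - offset) / scale;
        }
      while (k >= n);

      return k;
    }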
jt
--
James Theiler Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL Los Alamos National Laboratory
Los Alamos, NM 87545 http://nis-www.lanl.gov/~jt