covariance matrix data sampling

Well Howell whowell@superlink.net
Sat Sep 6 13:59:00 GMT 2003


If the Cholesky is blowing up, how about using Singular Value 
Decomposition to factor the hard way:
That would also get the rank or help decide if there are really 5 or 6  
coeffecients based on the smallest
singular value you wish to work with....

If you're fortunate enough to have the second edition of the Gelman et 
al text referenced previously,
you'll be looking for page 578 for the basic way to generate draws from 
a multivariate normal.
Seems as if most of us that care about such things are into MCMC 
techniques for Bayesian analysis ~;)

Well Howell

Pasquale Tricarico wrote:

> Martin Jansche wrote:
>
>> Does this mean you want to sample from the multivariate normal
>> distribution?  If so, you're right that there is no code in the GSL
>> (except for the bivariate case), unless I'm missing something.  To
>> simulate a random draw from the multivariate normal, consult any
>> statistics book that discusses computational techniques.  For example,
>> Gelman, Carlin, Stern & Rubin ("Bayesian Data Analysis", 1995, p.478)
>> give the following algorithm.  The basic idea is to use the Cholesky
>> decomposition of the covariance matrix (GSL can do that) in
>> conjunction with several draws from the univariate normal (GSL can do
>> that too).
>>
>> More specifically, gsl_linalg_cholesky_decomp(Sigma) computes the
>> Cholesky decomposition of the covariance matrix Sigma and stores it in
>> the lower triangular part of Sigma.  Call that matrix (the lower
>> triangular one) A.  Perform d independent draws from a standard normal
>> (using gsl_ran_ugaussian()) and store them in a d-dimensional vector
>> x.  Then transform x into a new vector y as by letting y = mu + A x,
>> where mu is the d-dimensional mean vector.  The d-dimensional vector y
>> is then a draw from the mulitvariate normal with covariance matrix
>> Sigma and mean vector mu.
>>
>> Does that answer your question?
>>
>> - martin
>>  
>>
>
> Dear Martin,
>
> Thank you very much! You focused exactly the problem, and now I'm going
> to try the code you have outlined. Using the GSL gives to me much
> control over the generated data, i.e. the possibility to select a random
> seed for the generation of the x vector, and better error control.
>
> Unfortunately sometimes the covariance matrix is not positive-definite,
> i.e. because two variables are too-much correlated, and the
> decomposition fails. I've read that the correct approach to this problem
> should be to reduce the number of variables by 1, i.e. from 6 to 5,
> removing one of the two too-much correlated variables, and then use the
> reduced covariance matrix to generate 5-vectors instead of 6-vectors.
> The only thing I don't understand is how to correctly recover, after the
> data generation, a 6-vector from a 5-vector; I suppose I should use a
> formula relating the mu, the standard deviation of the removed variable
> and the generated 5-vector to the 6th element. Maybe there is a better
> solution to this. The solution to this issue would definitely solve the
> problem.
>
> Best regards
>
> --Pasquale
>
>
>
>
>



More information about the Gsl-discuss mailing list