This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: covariance matrix data sampling

From: Well Howell <whowell at superlink dot net>
To: pasquale dot tricarico at pd dot infn dot it
Cc: gsl-discuss at sources dot redhat dot com
Date: Sat, 06 Sep 2003 09:58:23 -0400
Subject: Re: covariance matrix data sampling
References: <Pine.SOL.4.33.0309060457150.3033-100000@julius.ling.ohio-state.edu> <3F59C5E1.1050409@pd.infn.it>

If the Cholesky is blowing up, how about using Singular Value Decomposition to factor the hard way: That would also get the rank or help decide if there are really 5 or 6 coeffecients based on the smallest singular value you wish to work with....

If you're fortunate enough to have the second edition of the Gelman et al text referenced previously, you'll be looking for page 578 for the basic way to generate draws from a multivariate normal. Seems as if most of us that care about such things are into MCMC techniques for Bayesian analysis ~;)

Well Howell

Pasquale Tricarico wrote:

Martin Jansche wrote:

Does this mean you want to sample from the multivariate normal
distribution?  If so, you're right that there is no code in the GSL
(except for the bivariate case), unless I'm missing something.  To
simulate a random draw from the multivariate normal, consult any
statistics book that discusses computational techniques.  For example,
Gelman, Carlin, Stern & Rubin ("Bayesian Data Analysis", 1995, p.478)
give the following algorithm.  The basic idea is to use the Cholesky
decomposition of the covariance matrix (GSL can do that) in
conjunction with several draws from the univariate normal (GSL can do
that too).

More specifically, gsl_linalg_cholesky_decomp(Sigma) computes the
Cholesky decomposition of the covariance matrix Sigma and stores it in
the lower triangular part of Sigma.  Call that matrix (the lower
triangular one) A.  Perform d independent draws from a standard normal
(using gsl_ran_ugaussian()) and store them in a d-dimensional vector
x.  Then transform x into a new vector y as by letting y = mu + A x,
where mu is the d-dimensional mean vector.  The d-dimensional vector y
is then a draw from the mulitvariate normal with covariance matrix
Sigma and mean vector mu.

Does that answer your question?

- martin

Dear Martin,

Thank you very much! You focused exactly the problem, and now I'm going
to try the code you have outlined. Using the GSL gives to me much
control over the generated data, i.e. the possibility to select a random
seed for the generation of the x vector, and better error control.

Unfortunately sometimes the covariance matrix is not positive-definite,
i.e. because two variables are too-much correlated, and the
decomposition fails. I've read that the correct approach to this problem
should be to reduce the number of variables by 1, i.e. from 6 to 5,
removing one of the two too-much correlated variables, and then use the
reduced covariance matrix to generate 5-vectors instead of 6-vectors.
The only thing I don't understand is how to correctly recover, after the
data generation, a 6-vector from a 5-vector; I suppose I should use a
formula relating the mu, the standard deviation of the removed variable
and the generated 5-vector to the 6th element. Maybe there is a better
solution to this. The solution to this issue would definitely solve the
problem.

Best regards

--Pasquale

References:
- Re: covariance matrix data sampling
  - From: Pasquale Tricarico

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]