High-dimensional Minimization without analytical derivatives
Anatoliy Belaygorod
belaygorod@wustl.edu
Wed Sep 1 15:30:00 GMT 2004
Thank you for replying.
It is funny that you are telling me this about Bayesian methods though. The fact is that I am as hard-core Bayesian as you can find :). In fact, my advisor is one of the biggest people in the Bayesian field.
The maximization that I am writing is needed to implement namely M-H algorithm. The simplest form of it is Random Walk MH. However, in order to calculate the variance for the proposal density, I need to mind MLE and evaluate the negative inverse Hessian at MLE to get the variance for my proposal density.
Another algorithm I will be writing involves something called MH Tailoring (my advisor developed it), which also requires maximization routine within each iteration for each MH-block. In this case though, the dimensionality of the maximization problem will be smaller, depending on Blocking.
So, my question still stands:
What is the most efficient way ( I am talking <computational speed>, not necessarily precision) to maximize a high-dimensional (anywhere between 4 and 20 dimensions) function using GSL <without> specifying analytic derivatives? Am I better of using Nelder Mead Simplex algorithm that doesn't require gradient at all, or should I write "my_df" function to include a loop for numerically evaluating gradient one direction-at-a-time (N-dimensional gradient is nothing but a collection of N one-dimensional derivatives, so I should be able to loop from 1 to N and use gsl_diff_central) and use gsl_multimin_fdfminimizer_vector_bfgs?
My co-author uses Gauss, and he tells me that in Gauss there is a driver that uses BFGS, while calculating numerical derivatives automatically on the background. It seems like I should be able to replicate this driver with GSL (if I do what I just described above), but is there anything that I am missing? I would hate to see Gauss driver to run faster than my C code, because I missed some important point or some trick. Clearly, if I provided analytical derivatives, BFGS would be faster than Nelder Mead Simplex. But if BFGS has access only to numerical derivatives... Which one is better in most cases?
I think this question should have come up for many people looking to solve a high dimensional min/max problem, because even if in theory analytic derivatives were available, it is simply not feasible to code them all up in, say, 100, or 1000-dimensional problem. So, one either has to loop through numerical derivatives, or live without gradient all together.
But which one is better?
________________________________
From: Jason Stover [mailto:jstover@sdf.lonestar.org]
Sent: Wed 9/1/2004 9:42 AM
To: Anatoliy Belaygorod
Cc: gsl-discuss@sources.redhat.com
Subject: Re: High-dimensional Minimization without analytical derivatives
Maximizing a likelihood function with 17 parameters and
80 observations is likely to get you odd answers, e.g., many
of your estimates will be on the boundary of the parameter space.
Another possible approach is a Bayesian one: define some reasonable
prior distributions for your parameters, then run a simulation to find
the regions where your posterior density function is highest. A common
simulation algorithm here is the Metropolis-Hastings algorithm, about
which there is a lot of literature. One popular book is "Markov Chain
Monte Carlo in Practice" by W. R. Gilks, S. Richardson,
D. J. Spiegelhalter. Coding such a simulation isn't difficult.
With MCMC simulations, you can also specify the dimension of the
parameter vector as a parameter itself. Peter Green wrote a paper
about this:
Reversible jump Markov chain Monte Carlo computation and Bayesian
model determination, Biometrika, 82, 711-732 (1995)
-Jason
On Mon, Aug 30, 2004 at 05:11:11PM -0500, Anatoliy Belaygorod wrote:
> Hello,
> I need to find a (local) maximum of my likelihood function with 80 datapoints over the 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs, (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe much more if we change the model) analytic derivatives.
> Can you please tell me if I have better options? Can I use the one-dimensional numerical derivatives gsl_diff_central instead of analytic ones when I write "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to Nelder Mead Simplex algorithm provided in my version of GSL 1.4? Is there a better option that would involve numerical gradient evaluation coupled with BFGS method?
>
> I really appreciate any help or advice you can give me.
> Sincerely,
> Anatoliy
>
>
--
jstover@sdf.lonestar.org
SDF Public Access UNIX System - http://sdf.lonestar.org
More information about the Gsl-discuss
mailing list