This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.



Re: Robust linear least squares


Hi Peter,

The most common robust least squares algorithm is "M-estimation", which is what I've implemented. At each iteration, you calculate the residuals and apply a weighting function designed to assign large weights to small residuals and small weights to large residuals, so that the large residuals (outliers) contribute less and less to the model as the iteration proceeds. Each iteration also needs an estimate of the residual standard deviation (sigma), and I am using the Median-Absolute-Deviation (MAD) of the n-p+1 largest residuals (where n is the number of observations and p is the number of model parameters). There are alternatives for computing sigma, but the MAD seems to be the most widely used.
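
For readers who want to see the reweighting loop spelled out, here is a minimal sketch for a straight-line model y = c0 + c1*x. It is not the code committed to GSL. The function name robust_line_fit, the bisquare weight function with tuning constant 4.685, the scale estimate sigma = MAD/0.6745 taken over the n-p+1 largest absolute residuals, and the stopping tolerance are all assumptions made for illustration. The sketch also skips refinements such as leverage adjustment of the residuals.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending */
static int cmp_double(const void *a, const void *b)
{
  double x = *(const double *) a, y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Weighted least squares fit of y = c0 + c1*x via the 2x2 normal equations */
static void wls_line(const double *x, const double *y, const double *w,
                     size_t n, double *c0, double *c1)
{
  double sw = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (size_t i = 0; i < n; i++) {
    sw += w[i];
    sx += w[i] * x[i];
    sy += w[i] * y[i];
    sxx += w[i] * x[i] * x[i];
    sxy += w[i] * x[i] * y[i];
  }
  *c1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx);
  *c0 = (sy - *c1 * sx) / sw;
}

/* Hypothetical helper (not a GSL function): iteratively reweighted
 * straight-line fit. Returns the number of weighted fits performed. */
int robust_line_fit(const double *x, const double *y, size_t n,
                    int max_iter, double *c0, double *c1)
{
  const size_t p = 2;         /* model parameters: intercept and slope */
  const double tune = 4.685;  /* assumed bisquare tuning constant */
  double *w = malloc(n * sizeof *w);  /* current weights */
  double *r = malloc(n * sizeof *r);  /* signed residuals */
  double *a = malloc(n * sizeof *a);  /* sorted absolute residuals */
  double b0 = 0.0, b1 = 0.0;
  int iter;

  assert(n > p && w && r && a);
  for (size_t i = 0; i < n; i++) w[i] = 1.0;  /* first pass is plain OLS */

  for (iter = 1; iter <= max_iter; iter++) {
    double old0 = b0, old1 = b1;
    wls_line(x, y, w, n, &b0, &b1);
    if (iter > 1 && fabs(b0 - old0) + fabs(b1 - old1) < 1e-12) break;

    for (size_t i = 0; i < n; i++) {
      r[i] = y[i] - (b0 + b1 * x[i]);
      a[i] = fabs(r[i]);
    }
    qsort(a, n, sizeof *a, cmp_double);

    /* scale estimate: median of the n-p+1 largest absolute residuals */
    const double *s = a + (p - 1);
    size_t m = n - p + 1;
    double mad = (m % 2) ? s[m / 2] : 0.5 * (s[m / 2 - 1] + s[m / 2]);
    double sigma = mad / 0.6745;
    if (sigma <= 0.0) break;  /* most points fit exactly; nothing to rescale */

    /* bisquare weights: near 1 for small residuals, 0 beyond tune*sigma */
    for (size_t i = 0; i < n; i++) {
      double u = r[i] / (tune * sigma);
      w[i] = (fabs(u) < 1.0) ? (1.0 - u * u) * (1.0 - u * u) : 0.0;
    }
  }

  free(w); free(r); free(a);
  *c0 = b0;
  *c1 = b1;
  return (iter > max_iter) ? max_iter : iter;
}
```

Called on ten points lying on y = 1 + 2x with a single gross outlier in the middle of the x range, this sketch gives the outlier zero weight after the first pass and recovers the underlying line. An outlier at the edge of the x range has more leverage and may need more iterations or a different starting fit.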

If you check out the latest repository, have a look at the manual since I've documented everything including a description of the algorithm used. Let me know if you have more questions.

Patrick

On 05/12/2013 11:56 AM, Peter Teuben wrote:
Patrick
I agree, this is a useful option!

    Can you say a little more here about how you define robustness? The one I
know takes the quartiles Q1 and Q3 (where Q2 would
be the median), then defines D=Q3-Q1 and only uses points between
Q1-1.5*D and Q3+1.5*D to define things like a robust mean and variance.
Why 1.5 I don't know; I guess you could keep that a variable and tinker
with it.
For OLS you can imagine applying this in an iterative way to the Y
values, since formally the errors in X are negligible compared to those
in Y. I'm saying iterative, since in theory the 2nd iteration could have
rejected points that should have
been part of the "core points".  For non-linear fitting this could be a
lot more tricky.

peter


On 05/10/2013 06:01 PM, Patrick Alken wrote:
Hi all,

   I just committed a significant chunk of code related to robust
linear regression into GSL and mainly wanted to update the other
developers and any other interested parties. The main idea here is
that ordinary least squares is very sensitive to data outliers, and
the robust algorithm tries to identify and downweight outlier points
so they don't drastically affect the model. I think this is something
that has been needed in GSL for a while.

   I've been developing the code for a while and have been using it
successfully in my own work, and have also validated it pretty extensively
against the MATLAB implementation. I still need to write some automated
tests for it, which I should get to next week.

   In the meantime, the code is usable and working, so feel free to
try it out.

Patrick

