This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.



Re: spearman coefficient


I've added gsl_stats_spearman to the repository and have tested it on a few sample datasets. I used the Octave and Numerical Recipes routines as references, but wrote everything from scratch so there are no copyright issues.

I added the function gsl_sort_vector2, similar to the Numerical Recipes sort2() function, which eliminates the need to allocate a permutation and a sort vector. The workspace for the rank vectors is passed directly to the function, so there is no need to allocate a separate workspace now.
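To illustrate the idea of sorting two arrays together, here is a minimal sketch. It is not the GSL code: the name cosort2 is hypothetical, and a plain insertion sort is used only to keep the example short. It sorts one array in ascending order and applies the same rearrangement to a companion array, in place, so no permutation vector is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT gsl_sort_vector2): sort key[] in ascending
   order and move the entries of companion[] along with it, in place.
   Insertion sort is used for brevity; it is O(n^2). */
void cosort2(double key[], double companion[], size_t n)
{
    assert(n == 0 || (key != NULL && companion != NULL));

    for (size_t i = 1; i < n; i++) {
        double k = key[i];
        double c = companion[i];
        size_t j = i;

        /* Shift larger keys (and their companions) one slot to the right. */
        while (j > 0 && key[j - 1] > k) {
            key[j] = key[j - 1];
            companion[j] = companion[j - 1];
            j--;
        }
        key[j] = k;
        companion[j] = c;
    }
}
```

After the call, each companion value still sits at the same index as the key it was originally paired with.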

It is possible to write the function to calculate the rank vectors in-place in the data vectors, but I opted to keep those inputs untouched to stay consistent with the rest of the statistics routines. The user must pass in a workspace of size 2*n.

I put the function in statistics/covariance_source.c so it will be defined for all the different types (float, double, int, short, etc.), and it's documented in the manual.

I'm sorry I wasn't able to directly use a lot of your code, but I do think this implementation is much more consistent with the rest of the library design. If you are using this function regularly in your work, I would appreciate any feedback you can give (i.e., testing it with a wide range of inputs).

Patrick

On 05/25/2013 03:25 PM, Timothée Flutre wrote:
Hi Patrick,

thanks for your detailed reply. (I don't know why I didn't receive
your email; I had to check the GSL mailing list archive to see it,
which is why I'm replying to you directly this time.)

About introducing a new workspace, I did it based on your advice from last year:
http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html

I don't have a strong opinion on what is the best, but someone else
commented on my code and also thought that it would be better to have
a workspace:
https://gist.github.com/timflutre/1784199#comment-82458

Maybe the code could offer two functions, with or without the
workspace? In this case, are there any guidelines for naming the
functions?

I had a look at the implementation in R. The description of the
interface is here:
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html

Even though it indicates that the argument "method" can take the value
"spearman", I don't see it anymore in the R code and thus I am a bit
confused by their implementation:
https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21

Moreover, the R code calls C code:
https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623

The file with the C code has several macros and functions to compute
covariance or correlation, to handle missing data in different ways,
to deal with Pearson, Spearman and Kendall coefficients, etc. All this
makes it really hard for me to understand it...

Finally, I looked at the algorithm in Numerical Recipes in C, the pdf
of the book is available here:
www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf

However, the GSL web site says that we can't use algorithms from this
book because of the non-free license.

Also, it seems to me that spear() from Numerical Recipes (pdf page 641)
uses the function sort2() (Quicksort with 2 arrays, page 334), which
seems to require allocating another array, "istack". Therefore, in
the end, it doesn't seem to me that it's much better than my d and
perm vectors, which have the advantage of using other functions of the
GSL (gsl_sort_vector and gsl_sort_vector_index).
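For comparison, the permutation-based approach mentioned here can be sketched as follows. This is only an illustration, not the code from the gist or from the GSL: the name ranks_via_permutation is hypothetical, the index sort is a simple insertion sort rather than gsl_sort_vector_index, and ties are not averaged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute 1-based ranks of x[] through an index
   permutation. perm[] and rank[] must each hold n entries; x[] is not
   modified. Ties receive distinct consecutive ranks (no averaging). */
void ranks_via_permutation(const double x[], size_t n,
                           size_t perm[], double rank[])
{
    assert(n == 0 || (x != NULL && perm != NULL && rank != NULL));

    for (size_t i = 0; i < n; i++)
        perm[i] = i;

    /* Sort the indices so that x[perm[0]] <= x[perm[1]] <= ... */
    for (size_t i = 1; i < n; i++) {
        size_t p = perm[i];
        size_t j = i;
        while (j > 0 && x[perm[j - 1]] > x[p]) {
            perm[j] = perm[j - 1];
            j--;
        }
        perm[j] = p;
    }

    /* The element at sorted position i gets rank i+1. */
    for (size_t i = 0; i < n; i++)
        rank[perm[i]] = (double)(i + 1);
}
```

This variant needs an extra index array of length n per input in addition to the rank storage, which is the allocation the co-sorting approach avoids.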

But again, I'm really not an expert programmer, in C or any other
language. So I tried to see how I could change my code based on what
you said, but I don't see any obvious way to do it (except copying the
code from Numerical Recipes).

If you don't want to include the code as it is into the next release
of the GSL, I'm fine with that. Of course, if you have a better
understanding of all this and you can explain to me what to do, I can try
to help.

Best,

Timothée Flutre

