July 3, 2008

Terrible NA’s

Filed under: R project — izabela @ 4:32 pm

In R, most things are simple unless you have NA values (missing data). I already struggled with them in PCA. This time, I am not talking about any complicated programming, models or whatnot. No. All I want to do at this moment is calculate a median!
I found on more than one page on the Net that summary functions like mean(data), median(data), var(data) or range(data) need na.rm=T to return a usable result when the data set contains NA:

median(data, na.rm=T)

Surprisingly, summary(data) works without any NA argument. Who would have guessed.
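A small sketch of that behaviour, using a made-up vector x rather than real data (the values are an assumption for illustration only):

```r
# A toy numeric vector with one missing value (illustrative, not real data).
x <- c(3, NA, 7, 1)

mean(x)                  # NA: the missing value propagates
median(x)                # NA for the same reason
median(x, na.rm = TRUE)  # NAs are dropped before the median is taken

# summary() needs no na.rm argument; it reports the NA count as its own entry.
summary(x)
```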
But, to make my life more miserable, mean(data, na.rm=T) works on my data set, while median(data, na.rm=T) produces only an error :( :

Error in median.default(set) : need numeric data

(Problem unsolved so far. E-mailed my R-guru, but he is on vacation till next week)

After a few e-mail exchanges with my R guru, here are the conclusions:

median(as.matrix(data), na.rm=T)

gives you the median of the whole data set, not variable by variable. Useless.

median(data[,3],na.rm=T)

gives you the median of the variable in column 3. If you have 120 variables, I guess you need to do it one by one. Not much more helpful.
Bottom line? No solution at this time…
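One approach that may cover the 120-variable case is to apply median() to each column in turn with sapply(). The sketch below is only that, a sketch: the data frame and its column names a, b and g are made up, and it assumes the "need numeric data" error comes from passing a whole data frame (possibly with non-numeric columns) to median(), which expects a numeric vector.

```r
# Small made-up data frame with missing values and one non-numeric column
# (hypothetical stand-in for the real data set).
data <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 40),
  g = c("x", "y", "x", "y")
)

# Keep only the numeric columns, since median() cannot handle text or factors.
num <- data[sapply(data, is.numeric)]

# Apply median() column by column, dropping NAs within each column.
col_medians <- sapply(num, median, na.rm = TRUE)
print(col_medians)
```

The result is a named vector with one median per numeric variable, so no column has to be typed out by hand.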


R for Dummies

Filed under: R project — izabela @ 3:57 pm

Just a short note on what I just discovered on the Internet while looking for some R stuff. It turns out there is something called R-Commander. In simple words: R with pull-down menus and basic features, something like R for Dummies :). And by basic features I mean data summaries and graphs, t-test, ANOVA, clusters, PCA, distributions and some more!
I am installing it on my computer right now. All you need to do is install the package Rcmdr and load it. If you are missing any packages (quite a few, in my case), you get a message and R takes care of it by itself.
R-Commander itself seems rather intuitive and easy to use.
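The install-and-load step could look roughly like this. The wrapper name start_rcmdr is made up for illustration; only install.packages(), requireNamespace() and library() are standard R, and installing needs an Internet connection and a CRAN mirror.

```r
# Hypothetical helper: install Rcmdr if it is missing, then load it.
# Loading the package is what opens the R Commander window.
start_rcmdr <- function() {
  if (!requireNamespace("Rcmdr", quietly = TRUE)) {
    install.packages("Rcmdr")
  }
  library(Rcmdr)
}

# start_rcmdr()  # uncomment to install (if needed) and launch
```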

Technorati Tags: R-Commander

July 2, 2008

Hotelling T2 test

Filed under: R project — izabela @ 12:06 pm

This time an easy assignment for R. I needed Hotelling's T2 test to compare multivariate means between two data sets. It turns out all you need to do is load the package ICSNP and use the function HotellingsT2. OK, with R there always has to be a trick, and you always have to learn something else in the meantime. Here, you need to have your data divided into two sets, one per group, with no group labels. So, if you have everything in one data frame, the function subset comes in handy. Nicely described in Verzani's book, it lets you restrict rows using a logical expression in subset=, or columns using numbers in select=, as in the example below:

X <- subset(data, subset = group == 'group1', select = 2:10)
Y <- subset(data, subset = group == 'group2', select = 2:10)

and then it is as simple as that:

HotellingsT2(X, Y, test = 'f')

The other option for test is 'chi', if your data is not-so-perfectly-normal (you can check the package manual for more details).

Short note: the function needs to be told what to do with missing values (NA). na.action=na.omit or similar solves the problem.
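Putting the pieces together, here is a sketch that uses the built-in iris data as a stand-in for the real data frame (an assumption, as are the column positions 1:4 and the NA planted by hand). The HotellingsT2 call only runs if ICSNP is installed, and the way na.action is passed follows my reading of the package manual, so check it against your version.

```r
# iris stands in for the real data frame; keep two of its three groups.
two <- subset(iris, Species %in% c("setosa", "versicolor"))

# Rows chosen by a logical condition, columns by position (no group label kept).
X <- subset(two, subset = Species == "setosa", select = 1:4)
Y <- subset(two, subset = Species == "versicolor", select = 1:4)

# Plant one missing value to show why na.action matters.
X[1, 1] <- NA

# Run the test only when the ICSNP package is available.
if (requireNamespace("ICSNP", quietly = TRUE)) {
  res <- ICSNP::HotellingsT2(X, Y, test = "f", na.action = na.omit)
  print(res)
}
```

With na.omit, any row containing an NA is dropped from its group before the test is computed.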

