July 3, 2008

Terrible NA’s

Filed under: R project — izabela @ 4:32 pm

In R, most things are simple unless you have NA values (missing data). I already struggled with them in PCA. This time I am not talking about any complicated programming, models or whatnot. No. All I want to do at this moment is calculate a median!
More than one page on the Net told me that summary functions like mean(data), median(data), var(data) or range(data) need na.rm=T to give a real answer when the data set contains NAs (otherwise the result is just NA):

median(data, na.rm=T)

Surprisingly, summary(data) works without any na.rm argument. Who would have guessed.
But, to make my life more miserable, mean(data, na.rm=T) works on my data set, while median(data, na.rm=T) produces only an error :( :
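
To make the na.rm behaviour concrete, here is a minimal sketch on a made-up vector (x and its values are hypothetical, not my real data). It only shows how a single NA propagates through these functions and what na.rm does about it:

```r
# Hypothetical toy vector with one missing value
x <- c(4, 8, NA, 15, 16)

mean(x)                  # NA: the missing value propagates
median(x)                # NA
mean(x, na.rm = TRUE)    # 10.75
median(x, na.rm = TRUE)  # 11.5
range(x, na.rm = TRUE)   # 4 16
summary(x)               # quartiles and mean, plus a count of the NA's
```

Writing TRUE instead of T is a bit safer, since T is an ordinary variable that can be overwritten.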

Error in median.default(set) : need numeric data

(Problem unsolved so far. E-mailed my R-guru, but he is on vacation till next week)

After a few e-mail exchanges with my R guru, here are the conclusions:

median(as.matrix(data), na.rm=T)

gives you the median of the whole data set, not variable by variable. Useless.
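
A small sketch of what seems to be going on, assuming the data set is a data frame (the toy data frame df and its column names are made up for illustration). A data frame is not a numeric vector, which would explain the "need numeric data" error, while as.matrix() pools every column into one block of numbers:

```r
# Hypothetical toy data frame; column names are invented
df <- data.frame(a = c(1, 2, NA),
                 b = c(10, NA, 30),
                 c = c(100, 200, 300))

is.numeric(df)           # FALSE: a data frame is a list of columns
m <- as.matrix(df)       # numeric matrix with all columns together
is.numeric(m)            # TRUE

# One median over all 7 non-missing values, not one per column
overall <- median(m, na.rm = TRUE)
overall                  # 30
```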

median(data[,3],na.rm=T)

gives you the median of the variable in column 3. If you have 120 variables, I guess you need to do them one by one. Not much more helpful.
Bottom line? No solution at this time…
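
One thing that may still be worth trying is applying median to each column in turn with sapply(). This is only a sketch on a made-up data frame (df, its column names and the site column are hypothetical), and it assumes the variables of interest are numeric columns:

```r
# Hypothetical toy data frame with one non-numeric column
df <- data.frame(a = c(1, 2, NA),
                 b = c(10, NA, 30),
                 c = c(100, 200, 300),
                 site = c("x", "y", "z"))

# Keep only the numeric columns, since median() rejects text and factors
num_cols <- df[sapply(df, is.numeric)]

# Call median(column, na.rm = TRUE) once per column
col_medians <- sapply(num_cols, median, na.rm = TRUE)
col_medians   # a = 1.5, b = 20, c = 200
```

With 120 variables the call stays the same; the result is a named vector with one median per column.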
