Terrible NAs
In the R project, most things are simple unless you have NA values (missing data). I already struggled with them in PCA. This time I am not talking about complicated programming, models or whatnot. No. All I want to do at this moment is calculate a median!
I discovered on more than one page on the Net that when a data set contains NAs, summary functions like mean(data), median(data), var(data) or range(data) need na.rm=TRUE to skip the missing values; otherwise they just return NA:
median(data, na.rm=TRUE)
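A small illustration of what na.rm does, using a made-up vector x (not my real data): without na.rm the NA propagates into the result, and with na.rm = TRUE it is dropped before the calculation. TRUE is spelled out here because T is an ordinary variable that can be reassigned.

```r
# Hypothetical example vector with one missing value
x <- c(2, 4, NA, 9)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # 5, computed from 2, 4, 9
median(x, na.rm = TRUE)  # 4
var(x, na.rm = TRUE)     # 13
range(x, na.rm = TRUE)   # 2 9
```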
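Surprisingly, summary(data) works without the na.rm argument: it leaves the NAs out of the statistics and reports how many there were as a separate count. Who would have guessed?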
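To see this on the same kind of made-up vector x (hypothetical data): summary() computes its statistics on the non-missing values and appends an "NA's" entry with the count of missing ones.

```r
# Hypothetical example vector with one missing value
x <- c(2, 4, NA, 9)

s <- summary(x)
print(s)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max. and an "NA's" count

# unclass() gives a plain named numeric vector for easy lookup
unclass(s)[["NA's"]]    # 1
unclass(s)[["Median"]]  # 4
```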
But, to make my life more miserable, mean(data, na.rm=TRUE) works on my data set, while median(data, na.rm=TRUE) produces only an error:
Error in median.default(set) : need numeric data
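A likely explanation, assuming data is a data frame (the data[,3] indexing further down suggests it is): median.default() checks whether its argument is a data frame or a factor and stops with "need numeric data" before na.rm is even considered. mean() had a data-frame method at the time, which is why it seemed to work; in R 3.0.0 and later, mean() on a whole data frame returns NA with a warning instead. A small reproduction with a hypothetical data frame df:

```r
# Hypothetical two-column data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

# median() refuses a whole data frame, whether or not na.rm is given
msg <- tryCatch(median(df, na.rm = TRUE),
                error = function(e) conditionMessage(e))
msg   # "need numeric data"
```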
(Problem unsolved so far. I e-mailed my R guru, but he is on vacation until next week.)
After a few e-mail exchanges with my R guru, we reached the following conclusions:
median(as.matrix(data), na.rm=TRUE)
gives you the median of the whole data set with all values pooled together, not variable by variable. Useless.
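Why the result is a single number: as.matrix() turns the data frame into one numeric matrix, and median() then treats every cell as part of one pool. A sketch with the same hypothetical df as above:

```r
# Hypothetical two-column data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

# All non-missing cells (1, 3, 10, 20) are pooled into one median
median(as.matrix(df), na.rm = TRUE)   # 6.5
```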
median(data[, 3], na.rm=TRUE)
gives you the median of the variable in column 3. If you have 120 variables, I guess you need to do it one by one. Not much more helpful.
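For the record, the one-by-one approach can at least be looped instead of typed 120 times. A minimal sketch, assuming every column of the data frame is numeric and using the hypothetical df from above: a data frame is a list of columns, so sapply() can call median(column, na.rm = TRUE) on each one.

```r
# Hypothetical two-column data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

# Apply median() to each column; extra arguments (na.rm) are passed through
col_medians <- sapply(df, median, na.rm = TRUE)
col_medians   # a = 2, b = 15
```

A character or factor column would make median() fail again, so such columns would have to be dropped before the call.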
Bottom line? No solution at this time…