Backward elimination in R

When comparing different submodels, they must all be fitted to the same set of data; otherwise the comparison is meaningless. (Consider two predictors A and B, each missing on a different part of your observations: the model y~A+B is fitted only to the rows where both A and B are present, while y~A and y~B are each fitted to different, larger subsets of the data.) lm automatically drops any case with an NA in a variable the model uses, so submodels with fewer predictors can end up using more rows. For this reason, step won't compare submodels that are using different subsets of the original data set.
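A minimal sketch of the mismatch, using simulated data (the data frame d and its columns y, A and B are made up for illustration and are not part of the original question):

```r
# Simulated data: A is missing on the first 10 rows, B on the last 10.
set.seed(1)
n <- 30
d <- data.frame(y = rnorm(n), A = rnorm(n), B = rnorm(n))
d$A[1:10]  <- NA
d$B[21:30] <- NA

# lm() drops rows with an NA in any variable the model uses,
# so each model is fitted to a different number of rows.
n_full <- nobs(lm(y ~ A + B, data = d))  # rows 11-20 only
n_A    <- nobs(lm(y ~ A, data = d))      # rows 11-30
n_B    <- nobs(lm(y ~ B, data = d))      # rows 1-20

c(full = n_full, A = n_A, B = n_B)
```

The three fits use 10, 20 and 20 rows respectively, so quantities such as AIC are not comparable between them.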

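Using na.omit on the original data set should fix the problem, since every submodel is then fitted to the same complete cases. Note that na.omit drops rows with an NA in any column, including columns the model does not use, so it can help to keep only the model's variables first.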

fullmodel <- lm(Eeff ~ NDF + ADF + CP + NEL + DMI + FCM, data = na.omit(phuong))
step(fullmodel, direction = "backward", trace = FALSE)
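If the data frame has extra columns that the model never uses, one possible variant is to keep only the model's variables before calling na.omit. The sketch below uses a simulated stand-in (phuong_demo, with an invented notes column) because the original phuong data is not available here; the variable names are taken from the formula above, everything else is an assumption.

```r
# Simulated stand-in for the real data; values are arbitrary.
set.seed(42)
n <- 40
phuong_demo <- data.frame(
  NDF = rnorm(n), ADF = rnorm(n), CP = rnorm(n),
  NEL = rnorm(n), DMI = rnorm(n), FCM = rnorm(n)
)
phuong_demo$Eeff <- 1 + 0.5 * phuong_demo$DMI - 0.3 * phuong_demo$FCM + rnorm(n)
phuong_demo$NDF[1:3] <- NA          # missing values in a model variable
phuong_demo$notes <- "ok"
phuong_demo$notes[10:30] <- NA      # missing values in an unused column

form <- Eeff ~ NDF + ADF + CP + NEL + DMI + FCM

# Keep only the variables in the formula, then drop incomplete rows.
model_data <- na.omit(phuong_demo[, all.vars(form)])

fullmodel <- lm(form, data = model_data)
reduced <- step(fullmodel, direction = "backward", trace = FALSE)

nrow(na.omit(phuong_demo))  # rows left if every column is considered
nrow(model_data)            # rows left if only model variables are considered
```

Here na.omit on the whole data frame would discard rows only because of the unused notes column, while restricting to the model's variables keeps them.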

However, if you have a lot of NA values scattered across different predictors, you may end up losing much of your data set; in an extreme case you could lose all of it. If this happens, you have to reconsider your modeling strategy …
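Before committing to na.omit, it may be worth checking how much data would be lost. A small sketch with a made-up data frame (d, y, x1 and x2 are illustrative names only):

```r
# Toy data with missing values spread across columns.
d <- data.frame(
  y  = c(1, 2, NA, 4, 5, 6),
  x1 = c(NA, 2, 3, 4, 5, 6),
  x2 = c(1, 2, 3, NA, NA, 6)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(d))

# Number of rows that would survive na.omit().
n_complete <- sum(complete.cases(d))

na_per_column
n_complete
```

In this toy case only 2 of the 6 rows are complete, even though no single column is missing more than 2 values.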
