I recommend using the dummyVars function in the caret package:
```r
library(caret)

# stringsAsFactors = TRUE keeps gender and mood as factors on R >= 4.0,
# where data.frame() no longer converts strings to factors by default
customers <- data.frame(
  id      = c(10, 20, 30, 40, 50),
  gender  = c('male', 'female', 'female', 'male', 'female'),
  mood    = c('happy', 'sad', 'happy', 'sad', 'happy'),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE)
customers
#>   id gender  mood outcome
#> 1 10   male happy       1
#> 2 20 female   sad       1
#> 3 30 female happy       0
#> 4 40   male   sad       0
#> 5 50 female happy       0

# dummify the data: one 0/1 column per factor level
dmy <- dummyVars(" ~ .", data = customers)
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#>   id gender.female gender.male mood.happy mood.sad outcome
#> 1 10             0           1          1        0       1
#> 2 20             1           0          0        1       1
#> 3 30             1           0          1        0       0
#> 4 40             0           1          0        1       0
#> 5 50             1           0          1        0       0
```
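Apply the same procedure to both the training and validation sets: fit `dummyVars()` once on the training data, then call `predict()` with that same fitted object on each set so both end up with identical dummy columns.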
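A minimal sketch of that workflow, assuming `caret` is installed. The `train`/`valid` split and its values are hypothetical, invented for illustration. `stringsAsFactors = TRUE` turns the string columns into factors whose levels `dummyVars` can record from the training data.

```r
library(caret)

# Hypothetical training set (illustrative values only)
train <- data.frame(
  id      = c(10, 20, 30, 40),
  gender  = c('male', 'female', 'female', 'male'),
  mood    = c('happy', 'sad', 'happy', 'sad'),
  outcome = c(1, 1, 0, 0),
  stringsAsFactors = TRUE)

# Hypothetical validation set: note it contains no 'male' rows
valid <- data.frame(
  id      = c(50, 60),
  gender  = c('female', 'female'),
  mood    = c('happy', 'sad'),
  outcome = c(0, 1),
  stringsAsFactors = TRUE)

# Fit the encoder once, on the training set only
dmy <- dummyVars(" ~ .", data = train)

# Reuse the same fitted object to transform both sets
train_trsf <- data.frame(predict(dmy, newdata = train))
valid_trsf <- data.frame(predict(dmy, newdata = valid))

train_trsf
valid_trsf
```

Because `dmy` stores the factor levels seen in `train`, `valid_trsf` should still get a `gender.male` column (all zeros) even though that level never appears in `valid`. Calling `dummyVars()` separately on `valid` would only know the level `female`, so the two sets would end up with different columns.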