Your code is not entirely reproducible (it never actually runs the randomForest algorithm), but the problem is visible anyway: you are not replacing the Inf values with the column means. The na.rm = TRUE argument in the call to mean() within your impute.mean function does exactly what it says: it removes NA values, not Inf ones. The mean of a column that still contains Inf is itself infinite (or NaN if both Inf and -Inf are present), so the value written back is not finite either.
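You can see this by running, for example:
# The replacement value is mean(x, na.rm = TRUE), which ignores NA but not Inf
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x, na.rm = TRUE))
losses <- apply(losses, 2, impute.mean)
# Count the infinite values remaining across all columns
sum(apply(losses, 2, function(col) sum(is.infinite(col))))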
# [1] 696
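The same behaviour shows up on a small vector. This is only a sketch: the values in x are made up for illustration and are not taken from your data.

```r
# Toy vector, purely illustrative
x <- c(1, 2, NA, Inf)

# na.rm = TRUE drops the NA but keeps Inf, so the mean is Inf
m <- mean(x, na.rm = TRUE)
m
# [1] Inf

# impute.mean with na.rm = TRUE therefore writes Inf into the NA and Inf slots
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x, na.rm = TRUE))
impute.mean(x)
# [1]   1   2 Inf Inf
```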
To get rid of infinite values, use:
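# Compute the mean over the finite, non-missing entries only
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x[!is.na(x) & !is.nan(x) & !is.infinite(x)]))
losses <- apply(losses, 2, impute.mean)
# Count the infinite values remaining across all columns
sum(apply(losses, 2, function(col) sum(is.infinite(col))))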
# [1] 0
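A more compact way to write the same idea, offered as a sketch rather than a drop-in for your data: is.finite() is FALSE for NA, NaN, Inf and -Inf, so it can serve both as the replacement mask and as the subset used for the mean. The name impute.mean.finite and the toy matrix are made up for illustration.

```r
# Hypothetical helper: replace every non-finite entry with the mean of the finite ones
impute.mean.finite <- function(x) replace(x, !is.finite(x), mean(x[is.finite(x)]))

# Toy matrix, purely illustrative
m <- cbind(a = c(1, 3, NA, Inf), b = c(-Inf, 2, NaN, 4))
m <- apply(m, 2, impute.mean.finite)

# No non-finite values remain
sum(!is.finite(m))
# [1] 0
m[, "a"]
# [1] 1 3 2 2
```

Note that a column with no finite values at all would still come back as NaN, because the mean of an empty vector is NaN.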