Conditional mean statement

If you want to exclude the non-smokers, you have a few options. The easiest is probably this:

mean(bwght[bwght$cigs>0,"cigs"])

With a data frame, the first index selects rows and the second selects columns. So dataframe[1,2] gives you the first row, second column. You can also put a logical condition in the row position: with bwght$cigs>0 as the first index, you keep only the rows where cigs is greater than zero.
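Here is a small sketch of that subsetting on a made-up stand-in for bwght (the values and the id column are invented purely for illustration, so your real numbers will differ). It also shows two other spellings that should give the same answer:

```r
# Toy stand-in for the real data; these values are invented for illustration
bwght <- data.frame(
  id   = 1:6,                       # hypothetical extra column
  cigs = c(0, 0, 10, 20, 0, 30)
)

# Row index is a logical vector, column index is a column name
smokers_mean <- mean(bwght[bwght$cigs > 0, "cigs"])

# Same idea, subsetting the cigs vector directly
smokers_mean2 <- mean(bwght$cigs[bwght$cigs > 0])

# subset() evaluates the condition inside the data frame
smokers_mean3 <- mean(subset(bwght, cigs > 0)$cigs)

smokers_mean   # 20, the mean of 10, 20 and 30
```

If cigs contains missing values, the logical index will contain NA as well. Wrapping the condition in which() is one common way to drop those rows.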

Your other attempts didn’t work for the following reasons:

mean(bwght$cigs| bwght$cigs>0)

This is effectively a logical comparison. You’re asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean of that. R will happily take the mean of a logical vector (TRUE counts as 1 and FALSE as 0), so what you get back is the proportion of rows where cigs is nonzero, not the average number of cigarettes.
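A quick check on a toy vector (invented values, not the real data) shows what that kind of call returns in practice:

```r
cigs <- c(0, 0, 10, 20, 0, 30)   # invented values for illustration

# Nonzero numbers are coerced to TRUE, zero to FALSE, before the OR
either <- cigs | cigs > 0
either                           # FALSE FALSE TRUE TRUE FALSE TRUE

# mean() treats TRUE as 1 and FALSE as 0, so this is a proportion
prop_nonzero <- mean(either)
prop_nonzero                     # 0.5, i.e. 3 of the 6 rows are smokers
```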

mean(bwght$cigs>0 | bwght$cigs=TRUE)

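Same problem: the | operator returns a logical, so you’d be taking the mean of TRUE / FALSE values again. On top of that, a single = is assignment (or argument naming inside a function call), not comparison, so I’d expect this line to stop with a syntax error before it computes anything. Comparison in R is ==.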

if(bwght$cigs > 0){sum(bwght$cigs)}

By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn’t work the same way in R as it does in SAS. In R, if() expects a single TRUE or FALSE, not one value per row. Here the condition bwght$cigs > 0 is a whole vector: older versions of R looked only at its first element (with a warning), and R 4.2.0 and later stop with an error. R handles row-wise logic through vectorized operations rather than an implicit loop over observations. Check out subsetting, ifelse(), and functions like lapply, tapply, and so on.
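As a rough sketch of the vectorized style, here are a few ways to get the effect that if() statement was reaching for. The data frame is a made-up stand-in and the smoker_flag name is just something I picked for the example:

```r
# Toy stand-in for the real data; values invented for illustration
bwght <- data.frame(cigs = c(0, 0, 10, 20, 0, 30))

# Vectorized "if": keep only the elements that meet the condition, then sum
total_cigs <- sum(bwght$cigs[bwght$cigs > 0])
total_cigs                       # 60

# ifelse() tests every element, unlike if(), which wants one TRUE/FALSE
smoker_flag <- ifelse(bwght$cigs > 0, "smoker", "non-smoker")

# tapply() applies a function within each group
mean_by_flag <- tapply(bwght$cigs, smoker_flag, mean)
mean_by_flag[["smoker"]]         # 20
```

The subsetting version is usually the closest replacement for a SAS-style conditional sum. tapply() is handy once you want the same statistic for every group at once.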

x <-as.numeric(bwght$cigs, rm="0")
mean(x)

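As far as I can tell, this just gives you cigs back unchanged. as.numeric() has no rm argument, and the extra argument appears to be silently ignored, so mean(x) still includes all the zeros. Taking the quotes off rm="0" wouldn’t help. You may have been thinking of na.rm=TRUE, which is an argument to mean(), but that drops missing values (NA), not zeros.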
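A quick experiment on invented data (with one NA thrown in so the na.rm behaviour is visible) suggests what happens here. Treat it as a sketch of the behaviour I would expect, not a guarantee for every R version:

```r
cigs <- c(0, 0, 10, NA, 20, 30)   # invented values, including one missing

# The unknown rm argument appears to be ignored: x matches cigs exactly
x <- as.numeric(cigs, rm = "0")
unchanged <- identical(x, cigs)

# na.rm = TRUE drops the NA but keeps the zeros: (0 + 0 + 10 + 20 + 30) / 5
mean_all <- mean(x, na.rm = TRUE)
mean_all                          # 12

# To leave out zeros you still have to subset; which() also drops the NA
mean_pos <- mean(x[which(x > 0)])
mean_pos                          # 20
```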
